CN111861880A - Image super-fusion method based on regional information enhancement and block self-attention - Google Patents

Image super-fusion method based on regional information enhancement and block self-attention Download PDF

Info

Publication number
CN111861880A
CN111861880A CN202010506835.XA CN202010506835A CN111861880A CN 111861880 A CN111861880 A CN 111861880A CN 202010506835 A CN202010506835 A CN 202010506835A CN 111861880 A CN111861880 A CN 111861880A
Authority
CN
China
Prior art keywords
super
fusion
resolution
block
source image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010506835.XA
Other languages
Chinese (zh)
Other versions
CN111861880B (en)
Inventor
李华锋
岑悦亮
余正涛
张亚飞
原铭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology filed Critical Kunming University of Science and Technology
Priority to CN202010506835.XA priority Critical patent/CN111861880B/en
Publication of CN111861880A publication Critical patent/CN111861880A/en
Application granted granted Critical
Publication of CN111861880B publication Critical patent/CN111861880B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4053Super resolution, i.e. output image resolution higher than sensor resolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4046Scaling the whole image or part thereof using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20092Interactive image processing based on input by user
    • G06T2207/20104Interactive definition of region of interest [ROI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging

Abstract

The invention relates to an image super-fusion method based on regional information enhancement and block self-attention, and belongs to the technical field of digital image processing. The method comprises a source image super-resolution branch and a fusion super-resolution branch. In the source image super-resolution branch, feature extraction blocks are used iteratively to extract source image feature maps, and dense connections are used to make full use of the feature map information before and after each block. The output of each feature extraction block also passes through a region information enhancement block to explore the region occupied by each object in the source image, and this information assists the fusion super-resolution branch in accurately predicting the fusion decision map. In the fusion super-resolution branch, the two source images are spliced together as input, and fusion blocks based on a block self-attention mechanism are used iteratively, combined with the region-enhanced source image information from the source image super-resolution branch, to better distinguish focused and unfocused regions. Finally, sub-pixel convolution is performed in each branch to generate the super-resolution source images and the fusion image.

Description

Image super-fusion method based on regional information enhancement and block self-attention
Technical Field
The invention relates to an image super-fusion method based on regional information enhancement and block self-attention, and belongs to the technical field of image information processing.
Background
The purpose of image fusion is to fuse the information of two or more source images captured in the same scene by different cameras into one image while ensuring that the information of each source image is preserved. Image fusion has very wide applications in fields such as security surveillance, medical imaging and satellite remote sensing. In recent years, much research has achieved good fusion effects, but existing methods are usually designed for fusion on high-resolution multi-focus source image datasets, whereas the images obtained by real-world imaging systems are not necessarily of high resolution. When low-resolution source images are fused, the fused image is also of low resolution, or even blurred and lacking in detail information, which reduces the utility of image fusion techniques. To feed low-resolution source images into traditional fusion methods, bicubic interpolation and nearest-neighbor interpolation are generally adopted as up-sampling operations to unify the resolution of the source images. However, these interpolation methods are too simple, are not adapted to different data, and introduce erroneous information that reduces the accuracy of image texture details, resulting in poor fusion effects; in addition, for the multi-focus image fusion task, the accuracy of the fusion decision map is also reduced. Therefore, to overcome these disadvantages and make low-resolution image fusion more effective, a method capable of accurately super-resolving and fusing images is urgently needed.
In recent years, many image fusion methods based on deep learning have been proposed; they have a greater ability to extract texture and detail than fusion methods based on the transform domain and the spatial domain. One class of methods uses an encoder-decoder network: an encoder extracts the features of the source images, and a decoding network fuses the features and gradually enlarges them to obtain the fused image. Another class of methods uses a pre-trained classification convolutional network and inputs image patches into the network to predict whether each patch is focused, thereby generating a fusion decision map. A further class of methods decomposes a source image into a base layer containing large-scale contours or intensity variations and a detail layer containing important textures, and fuses them separately. Still other methods are based on generative adversarial networks: the generator produces the fused image, while the discriminator is only used to distinguish the fused image from the visible image, so that more texture is extracted from the visible image. Although innovative and successful, these methods still suffer from two major drawbacks: 1) when the resolution of the source images is low, the resolution of the fused image is also low and texture details are lacking; 2) the regional extent of the salient features in the image cannot be accurately estimated, so the salient features of the source images contained in the fusion result are not complete enough.
To overcome these drawbacks, some work combines super-resolution with the image fusion task. Dictionary-learning-based methods learn a set of multi-scale dictionaries from high-resolution images and then fuse low-resolution image patches using sparse coefficients based on local information content, but these methods require storing dictionaries mapping from low-resolution to high-resolution images, which consumes memory. Some methods fuse images via compressed sensing; however, these methods require two steps, i.e. decomposing the task into super-resolution and fusion of the images, which is very time-consuming. Other methods use structure tensors, fractional differentiation and variational techniques to perform fusion and super-resolution in one step, but they can only super-resolve at integer scale factors, are not flexible and practical enough, and their fusion results are not good enough.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an image super-resolution and fusion method based on region information enhancement and block self-attention, so as to solve the image fusion problem when the resolution of the source images is low and to improve the quality of the fusion result.
The technical scheme adopted by the invention is as follows: an image super-resolution and fusion method based on region information enhancement and block self-attention is disclosed. Taking low-resolution multi-focus image fusion as an example, with the flow chart shown in fig. 1, the method specifically comprises the following steps:
Step1, in the task of super-resolution and fusion of multi-focus images, as shown in FIG. 1, the two low-resolution source images are respectively input into the source image super-resolution branch; at the same time, the two source images are spliced together along the channel dimension and input into the fusion and super-resolution branch. A 3×3 convolutional layer is placed at the beginning of both the source image super-resolution branch and the fusion and super-resolution branch to preliminarily extract features. After that, the source image super-resolution branch contains 16 feature extraction blocks and 16 region information enhancement blocks, and the fusion and super-resolution branch contains 16 fusion blocks based on a block self-attention mechanism. The 16 feature extraction blocks, 16 region information enhancement blocks and 16 fusion blocks based on the block self-attention mechanism correspond one to one, and i (0 ≤ i ≤ 16) is defined as the index of the i-th feature extraction block / region information enhancement block / fusion block based on the block self-attention mechanism.
Step2, in the source image super-resolution branch, the initial feature map passes through the 16 feature extraction blocks, which are connected in a dense connection manner. The output of the (i-1)-th feature extraction block, besides being passed on to the i-th feature extraction block to construct the super-resolution source image, is also input into the i-th region information enhancement block to assist the fusion and super-resolution branch in obtaining the decision weight map. The region information enhancement block enhances the information of the salient feature regions, especially the feature information of the focused regions. The information output by the region information enhancement block is input into the i-th fusion block based on the block self-attention mechanism in the fusion and super-resolution branch;
step3, in the fusion and super-resolution branch, the initial feature map passes through 16 fusion blocks based on a block self-attention mechanism, the features are fully extracted, and the information is fused in a self-adaptive manner;
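The one-to-one correspondence between feature extraction blocks, region information enhancement blocks and fusion blocks described in Step1-Step3 can be summarized with the following minimal PyTorch-style sketch. It is an illustration under assumptions, not the patented implementation: the module names, the single-channel source images, the 64-channel width and the simplified stand-in blocks are all assumed for readability, and the dense connections of Step2 are omitted here (a detailed sketch appears later).

```python
import torch
import torch.nn as nn

# Stand-in modules; more detailed sketches of each block appear later in this description.
class FeatureExtractionBlock(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 3, padding=1))
    def forward(self, x):
        return x + self.body(x)                                   # residual learning

class RegionInfoEnhanceBlock(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.conv = nn.Conv2d(c, c, 3, padding=1)                 # placeholder for the deformable-convolution block
    def forward(self, x):
        return self.conv(x)

class BlockSelfAttentionFusionBlock(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.mix = nn.Conv2d(3 * c, c, 1)                         # placeholder: splice + 1x1 conv (+ 3x3 convs + block attention)
    def forward(self, g, ea, eb):
        return self.mix(torch.cat([g, ea, eb], dim=1))

class TwoBranchNet(nn.Module):
    """Illustrative wiring: 16 blocks per branch, in one-to-one correspondence."""
    def __init__(self, c=64, n=16):
        super().__init__()
        self.head_sr = nn.Conv2d(1, c, 3, padding=1)              # head of the (shared) source image SR branch
        self.head_fu = nn.Conv2d(2, c, 3, padding=1)              # head of the fusion/SR branch (two spliced images)
        self.feat = nn.ModuleList(FeatureExtractionBlock(c) for _ in range(n))
        self.enh = nn.ModuleList(RegionInfoEnhanceBlock(c) for _ in range(n))
        self.fuse = nn.ModuleList(BlockSelfAttentionFusionBlock(c) for _ in range(n))

    def forward(self, img_a, img_b):
        fa, fb = self.head_sr(img_a), self.head_sr(img_b)         # both source images share one SR branch
        g = self.head_fu(torch.cat([img_a, img_b], dim=1))
        for feat, enh, fuse in zip(self.feat, self.enh, self.fuse):
            fa, fb = feat(fa), feat(fb)                           # dense connections omitted here for brevity
            g = fuse(g, enh(fa), enh(fb))                         # region-enhanced features assist the fusion branch
        return fa, fb, g                                          # fed to the 1x1 conv + sub-pixel conv of Step4

net = TwoBranchNet()
outs = net(torch.randn(1, 1, 60, 80), torch.randn(1, 1, 60, 80))
print([o.shape for o in outs])
```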
Step4, after the 16 feature extraction blocks in the source image super-resolution branch, and after the 16 fusion blocks based on the block self-attention mechanism in the fusion and super-resolution branch, there is a 1×1 convolutional layer followed by a sub-pixel convolutional layer. The 1×1 convolution reduces the number of channels of the outputs of the 16th feature extraction block of each source image super-resolution branch and of the 16th fusion block based on the block self-attention mechanism to the square of the magnification factor r. The sub-pixel convolution up-samples the outputs of the 1×1 convolutional layers to the target size H×W, where H and W respectively denote the height and width of the target size. After sub-pixel convolution, the source image super-resolution branch yields the super-resolution results of the two source images, denoted I_A^SR and I_B^SR. In the fusion and super-resolution branch, normalization is performed through a Sigmoid function and the decision weight map W_SR for multi-focus image fusion is obtained through threshold division; finally, the super-resolution fusion result image I_F^SR is obtained by combining the source images.
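The up-sampling tail of each branch described in Step4 can be illustrated with the short sketch below (PyTorch, assumed implementation): a 1×1 convolution reduces the channel count to r² and a sub-pixel convolution (pixel shuffle) rearranges those channels into an r-times larger plane. The 64-channel input and magnification r = 4 are example values, not taken from the patent.

```python
import torch
import torch.nn as nn

class SubPixelUpsample(nn.Module):
    """1x1 conv to r^2 channels, then sub-pixel convolution (pixel shuffle) to reach H x W."""
    def __init__(self, in_channels, r):
        super().__init__()
        self.reduce = nn.Conv2d(in_channels, r * r, kernel_size=1)   # channels -> square of the magnification factor r
        self.shuffle = nn.PixelShuffle(r)                            # rearranges r^2 channels into an r-times larger plane

    def forward(self, x):
        return self.shuffle(self.reduce(x))                          # (B, C, h, w) -> (B, 1, h*r, w*r)

# Example: a 64-channel feature map at 60x80 up-sampled by r = 4 to 240x320.
up = SubPixelUpsample(64, r=4)
print(up(torch.randn(1, 64, 60, 80)).shape)   # torch.Size([1, 1, 240, 320])
```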
Step5, obtaining the network parameter through Step4 in the network parameter training process
Figure BDA0002526821440000038
Super-resolution results of
Figure BDA0002526821440000039
Figure BDA00025268214400000310
And a decision weight graph WSRSuper-resolution fusion result image
Figure BDA00025268214400000311
And then, calculating the loss between the label and the label, and minimizing the loss by using an optimizer based on a gradient descent method, thereby optimizing the parameters of the network, finishing the network training when the loss gradually decreases to be flat, and obtaining a high-quality super-resolution and fusion result by testing.
Specifically, the dense connection mode proposed in Step2 means that the initial feature map f_0 output by the first convolutional layer of the source image super-resolution branch and the outputs of the previous i-1 feature extraction blocks are taken as the input of the i-th feature extraction block. Finally, f_0 and the outputs of all the blocks are spliced together, and dimension reduction and information integration are performed through a 1×1 convolution. The structure of the feature extraction block is shown in fig. 2(a); it consists of three 3×3 convolutional layers and uses residual learning to alleviate the degradation problem caused by deep networks;
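The dense connection and the residual feature extraction block can be sketched as follows (PyTorch). The channel count of 64, the spatial size and the 1×1 compression convolution inside each block (used here to absorb the growing dense input) are assumptions made for illustration; the patent only specifies three 3×3 convolutional layers with residual learning and a final 1×1 integration convolution.

```python
import torch
import torch.nn as nn

class FeatureExtractionBlock(nn.Module):
    """Three 3x3 conv layers with residual learning; a 1x1 conv first compresses the dense input."""
    def __init__(self, in_channels, channels=64):
        super().__init__()
        self.compress = nn.Conv2d(in_channels, channels, 1)              # integrate f_0 + previous block outputs
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, dense_input):
        x = self.compress(dense_input)
        return x + self.body(x)                                          # residual learning

channels, num_blocks = 64, 16
blocks = nn.ModuleList(FeatureExtractionBlock(channels * (i + 1), channels) for i in range(num_blocks))
final_1x1 = nn.Conv2d(channels * (num_blocks + 1), channels, 1)          # fuse f_0 and all block outputs

f0 = torch.randn(1, channels, 60, 80)                                    # initial feature map from the first conv layer
features = [f0]
for block in blocks:
    features.append(block(torch.cat(features, dim=1)))                   # dense connection: f_0 + all previous outputs
out = final_1x1(torch.cat(features, dim=1))                              # dimension reduction and information integration
```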
Specifically, the region information enhancement block proposed in Step2 is shown in fig. 2(c). First, a convolutional layer acts on the input feature map, and the dimension of the output feature map is 2 times that of the input feature map; the output feature map is sliced along the channel dimension into two feature maps of the same dimension, which are the offsets of the input feature map in the horizontal and vertical directions. That is, the convolutional layer learns the offset of each position of the input feature map in the horizontal and vertical directions, and the horizontal and vertical offsets together with the input feature map are fed into a deformable convolution to obtain a feature map that is closer to the shape and size of the objects. Denote the outputs of the i-th feature extraction block of the super-resolution branches of the two source images A and B as F_A^i and F_B^i, and their offsets in the horizontal and vertical directions as (Δh_A^i, Δv_A^i) and (Δh_B^i, Δv_B^i) respectively. The feature maps of salient object region information input to the fusion and super-resolution branch at the i-th stage, E_A^i and E_B^i, are calculated as follows:

(Δh_A^i, Δv_A^i) = split(Conv(F_A^i))

(Δh_B^i, Δv_B^i) = split(Conv(F_B^i))

E_A^i = LeakyRelu(DConv(F_A^i, Δh_A^i, Δv_A^i))

E_B^i = LeakyRelu(DConv(F_B^i, Δh_B^i, Δv_B^i))

where split(·) is the channel slicing operation, DConv(·) denotes the deformable convolution, Conv(·) denotes a convolutional layer with convolution kernel size k of 3, and LeakyRelu(·) is a commonly used nonlinear activation function with slope s set to 0.2.
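A possible realization of this block is sketched below using torchvision's DeformConv2d. It is an assumption-laden illustration rather than the patented design: the offset-predicting convolution here outputs one (vertical, horizontal) pair per position, which is broadcast to every kernel sampling location so that the standard deformable-convolution interface can be reused, whereas the text describes offset maps with the same dimensionality as the input feature map.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class RegionInfoEnhanceBlock(nn.Module):
    """Sketch of the region information enhancement block: learn offsets, then deformable convolution."""
    def __init__(self, channels=64, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.offset_conv = nn.Conv2d(channels, 2, kernel_size, padding=pad)        # predicts (dv, dh) per position
        self.deform_conv = DeformConv2d(channels, channels, kernel_size, padding=pad)
        self.act = nn.LeakyReLU(0.2, inplace=True)                                  # slope s = 0.2 as in the text
        self.num_kernel_points = kernel_size * kernel_size

    def forward(self, x):
        dv, dh = torch.chunk(self.offset_conv(x), 2, dim=1)                         # channel slicing: vertical / horizontal offsets
        offset = torch.cat([dv, dh], dim=1).repeat(1, self.num_kernel_points, 1, 1) # one pair per kernel sampling point
        return self.act(self.deform_conv(x, offset))

# Example: enhance a 64-channel feature map.
block = RegionInfoEnhanceBlock(64)
print(block(torch.randn(1, 64, 60, 80)).shape)   # torch.Size([1, 64, 60, 80])
```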
Specifically, the block self-attention mechanism proposed in Step3 means that, when the local feature of a pixel is considered, attention should be paid to the pixels that strongly influence it. In the present invention, the feature relationship of each position to its 7×7 neighborhood is explored. For a position p in the input feature map, define N_p as the 7×7 neighborhood range with p as the center point, and x_q as the feature value at a position q within N_p; the information within the neighborhood range is fused, and Sigmoid(·) is an intra-block normalization function used to calculate the weight of the features at other positions in the neighborhood with respect to the feature at the center point p. After the block self-attention mechanism, the feature value y_p at position p can be calculated as:

y_p = BatchNormalize( Σ_{q∈N_p} Sigmoid(x_p^T x_q) · x_q )

where x_p^T x_q calculates, by transposed multiplication, the correlation between the feature vector x_p at position p and the feature vector x_q at position q, and BatchNormalize(·) is a batch normalization operation.
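The following PyTorch sketch implements the 7×7 block self-attention as reconstructed above (sigmoid-weighted aggregation over the neighborhood followed by batch normalization); it should be read as an assumed, illustrative implementation, with the unfold-based gathering of neighborhoods being one possible way to realize it.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlockSelfAttention(nn.Module):
    """For each position p, weight its 7x7 neighbours q by Sigmoid(x_p . x_q) and sum them."""
    def __init__(self, channels, neighborhood=7):
        super().__init__()
        self.k = neighborhood
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):
        b, c, h, w = x.shape
        pad = self.k // 2
        # Gather every k*k neighbourhood: (B, C*k*k, H*W) -> (B, C, k*k, H*W)
        patches = F.unfold(x, kernel_size=self.k, padding=pad).view(b, c, self.k * self.k, h * w)
        center = x.view(b, c, 1, h * w)                                         # feature vector x_p
        weights = torch.sigmoid((center * patches).sum(dim=1, keepdim=True))    # correlation x_p^T x_q, normalised in-block
        y = (weights * patches).sum(dim=2).view(b, c, h, w)                     # weighted aggregation over the neighbourhood
        return self.bn(y)                                                       # BatchNormalize(.)

attn = BlockSelfAttention(64)
print(attn(torch.randn(2, 64, 32, 32)).shape)   # torch.Size([2, 64, 32, 32])
```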
The fusion block based on the block self-attention mechanism proposed in Step3 means that the previously output fusion feature map is spliced with the feature maps of the salient focused regions input from the source image super-resolution branch; after information integration through a 1×1 convolution and several 3×3 convolutional layers, the self-attention mechanism based on the block range is used to more accurately highlight the extent of salient objects.
Specifically, the Sigmoid normalization in Step4 means the following. Let S denote the single-channel output of the sub-pixel convolution in the fusion and super-resolution branch, whose size is the target size, and let (m, n) denote a coordinate position; then

P(m,n) = Sigmoid(S(m,n)) = 1 / (1 + exp(−S(m,n)))

and the decision weight map for multi-focus image fusion is obtained by division with a threshold t. The invention sets t to 0.5, and the decision weight map W_SR is obtained by the following formula:

W_SR(m,n) = 1 if P(m,n) > 0.5, and W_SR(m,n) = 0 otherwise.

Then the fusion result I_F^SR can be obtained from the decision weight map W_SR and the super-resolution results I_A^SR and I_B^SR of the two source images:

I_F^SR(m,n) = W_SR(m,n) · I_A^SR(m,n) + (1 − W_SR(m,n)) · I_B^SR(m,n)
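This thresholding and blending step can be summarized with the short sketch below (assumed implementation; sr_a, sr_b and fusion_logits are hypothetical names standing for the two super-resolved source images and the single-channel output of the fusion branch's sub-pixel convolution).

```python
import torch

def fuse_with_decision_map(fusion_logits, sr_a, sr_b, t=0.5):
    """Sigmoid-normalise the fusion branch output, threshold at t, and blend the two SR images."""
    prob = torch.sigmoid(fusion_logits)                 # normalisation to (0, 1)
    w_sr = (prob > t).float()                           # binary decision weight map W_SR
    fused = w_sr * sr_a + (1.0 - w_sr) * sr_b           # pick the focused source at each pixel
    return fused, w_sr

# Example with random tensors of target size H x W = 240 x 320.
fused, w_sr = fuse_with_decision_map(torch.randn(1, 1, 240, 320),
                                     torch.rand(1, 1, 240, 320),
                                     torch.rand(1, 1, 240, 320))
print(fused.shape, w_sr.unique())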
Specifically, for the loss calculation proposed in Step5, the loss is calculated with the L1 norm, which has good convex optimization properties, and an Adam optimizer is used to minimize the loss value. Define I_A^HR, I_B^HR and I_F^HR as the label values, i.e. the high-resolution images corresponding to the two low-resolution source images and the high-resolution fused image, and let W_SR and W_HR be the super-resolution fusion decision map and the high-resolution label fusion decision map; then the loss is calculated as follows:

Loss = ||I_A^SR − I_A^HR||_1 + ||I_B^SR − I_B^HR||_1 + ||I_F^SR − I_F^HR||_1 + ||W_SR − W_HR||_1
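The training objective as reconstructed above can be written as the following sketch (PyTorch, assumed; the equal weighting of the four terms is an assumption not stated in the text). In practice the continuous Sigmoid output would presumably stand in for the binarized decision map during training so that gradients can propagate, though the text does not specify this.

```python
import torch
import torch.nn.functional as F

def total_loss(sr_a, sr_b, fused, w_sr, hr_a, hr_b, hr_fused, w_hr):
    """L1 losses between network outputs and their high-resolution labels (assumed equal weights)."""
    return (F.l1_loss(sr_a, hr_a) +          # super-resolved source A vs its HR label
            F.l1_loss(sr_b, hr_b) +          # super-resolved source B vs its HR label
            F.l1_loss(fused, hr_fused) +     # SR fusion result vs HR fused label
            F.l1_loss(w_sr, w_hr))           # predicted decision map vs label decision map

# Training step sketch with Adam, as mentioned in the text:
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# loss = total_loss(...); optimizer.zero_grad(); loss.backward(); optimizer.step()
```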
specifically, Relu is used as the nonlinear activation function after all convolutional layers, except where specifically noted; the convolutional layers are all SAME type convolution, namely the input and output of the convolutional layers are consistent in size, and all source images share one source image super-resolution branch.
The invention has the beneficial effects that: the method comprises a source image super-resolution branch and a fusion super-resolution branch, where the source image super-resolution branch assists the fusion super-resolution branch in obtaining an accurate fusion decision map. In the source image super-resolution branch, feature extraction blocks are used iteratively to extract source image feature maps, and dense connections are used to make full use of the feature map information before and after each block. The output of each feature extraction block also passes through a region information enhancement block to explore the extent and region of each object in the source image, and this information is passed to the fusion super-resolution branch to accurately predict the fusion decision weight map. In the fusion super-resolution branch, the two source images are spliced together as input, and fusion blocks based on a block self-attention mechanism are used iteratively, combined with the region-enhanced source image information from the source image super-resolution branch, so that focused and unfocused regions are better distinguished. Finally, sub-pixel convolution is used as the up-sampling layer to generate the super-resolution source images and the fusion image.
Drawings
FIG. 1 is an overall architecture diagram of the present invention incorporating an embodiment;
fig. 2 is a structure diagram of each sub-module: (a) the structure of the feature extraction block in the source image super-resolution branch; (b) the structure of the fusion block based on the self-attention mechanism in the super-resolution and fusion branch; (c) the structure of the region information enhancement block.
Detailed Description
The following detailed description of the embodiments refers to specific examples and the flow diagram shown in FIG. 1. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following examples do not represent all embodiments consistent with the present application; they are merely examples of systems and methods consistent with certain aspects of the application, as recited in the claims.
Example 1: referring to fig. 1, a schematic diagram of the steps of the image super-resolution and fusion method based on region information enhancement and block self-attention according to the present application is shown, in which the input source images and the output result image of a specific example are also shown. As shown in fig. 1, the present application is composed of a source image super-resolution branch and a super-resolution and fusion branch, and provides an image super-resolution and fusion method based on region information enhancement and block self-attention, including:
Step1, in the task of super-resolution and fusion of multi-focus images, as shown in FIG. 1, the two low-resolution source images are respectively input into the source image super-resolution branch; at the same time, the two source images are spliced together along the channel dimension and input into the fusion and super-resolution branch. A 3×3 convolutional layer is placed at the beginning of both the source image super-resolution branch and the fusion and super-resolution branch to preliminarily extract features. After that, the source image super-resolution branch contains 16 feature extraction blocks and 16 region information enhancement blocks, and the fusion and super-resolution branch contains 16 fusion blocks based on a block self-attention mechanism. The 16 feature extraction blocks, 16 region information enhancement blocks and 16 fusion blocks based on the block self-attention mechanism correspond one to one, and i (0 ≤ i ≤ 16) is defined as the index of the i-th feature extraction block / region information enhancement block / fusion block based on the block self-attention mechanism.
Step2, in the source image super-resolution branch, the initial feature map passes through the 16 feature extraction blocks, which are connected in a dense connection manner. The output of the (i-1)-th feature extraction block, besides being passed on to the i-th feature extraction block to construct the super-resolution source image, is also input into the i-th region information enhancement block to assist the fusion and super-resolution branch in obtaining the decision weight map. The region information enhancement block enhances the information of the salient feature regions, especially the feature information of the focused regions. The information output by the region information enhancement block is input into the i-th fusion block based on the block self-attention mechanism in the fusion and super-resolution branch;
step3, in the fusion and super-resolution branch, the initial feature map passes through 16 fusion blocks based on a block self-attention mechanism, the features are fully extracted, and the information is fused in a self-adaptive manner;
Step4, after the 16 feature extraction blocks in the source image super-resolution branch, and after the 16 fusion blocks based on the block self-attention mechanism in the fusion and super-resolution branch, there is a 1×1 convolutional layer followed by a sub-pixel convolutional layer. The 1×1 convolution reduces the number of channels of the outputs of the 16th feature extraction block of each source image super-resolution branch and of the 16th fusion block based on the block self-attention mechanism to the square of the magnification factor r. The sub-pixel convolution up-samples the outputs of the 1×1 convolutional layers to the target size H×W, where H and W respectively denote the height and width of the target size. After sub-pixel convolution, the source image super-resolution branch yields the super-resolution results of the two source images, denoted I_A^SR and I_B^SR. In the fusion and super-resolution branch, normalization is performed through a Sigmoid function and the decision weight map W_SR for multi-focus image fusion is obtained through threshold division; finally, the super-resolution fusion result image I_F^SR is obtained by combining the source images.
Step5, obtaining the network parameter through Step4 in the network parameter training process
Figure BDA0002526821440000078
Super-resolution results of
Figure BDA0002526821440000079
Figure BDA00025268214400000710
And a decision weight graph WSRSuper-resolution fusion result image
Figure BDA00025268214400000711
And then, calculating the loss between the label and the label, and minimizing the loss by using an optimizer based on a gradient descent method, thereby optimizing the parameters of the network, finishing the network training when the loss gradually decreases to be flat, and obtaining a high-quality super-resolution and fusion result by testing.
Furthermore, in Step2, the dense connection mode means that the initial feature map f_0 output by the first convolutional layer of the source image super-resolution branch and the outputs of the previous i-1 feature extraction blocks are taken as the input of the i-th feature extraction block. Finally, f_0 and the outputs of all the blocks are spliced together, and dimension reduction and information integration are performed through a 1×1 convolution. The structure of the feature extraction block is shown in fig. 2(a); it consists of three 3×3 convolutional layers and uses residual learning to alleviate the degradation problem caused by deep networks;
further, in Step2, the proposed region information enhancement block is shown in fig. 2 (c). Firstly, a layer of convolution layer acts on an input characteristic diagram, and the dimension of an output characteristic diagram is 2 times that of the input characteristic diagram; the output characteristic diagram is sliced according to the channel to obtain two characteristic diagrams with the same dimension, and the two characteristic diagrams are the offset of the input characteristic diagram in the horizontal direction and the vertical direction; namely, the convolutional layer learns the offset of each position of the input feature map in the horizontal and vertical directions, and the horizontal and vertical offsets and the input feature map are input into the deformable convolution, so that the feature map closer to the shape and the size of the object is obtained. Definition of
Figure BDA00025268214400000712
Are respectively as
Figure BDA00025268214400000713
The amount of offset in the horizontal and vertical directions of the,
Figure BDA00025268214400000714
are respectively as
Figure BDA00025268214400000715
Is offset in the horizontal and vertical directions, wherein
Figure BDA00025268214400000716
Are respectively
Figure BDA00025268214400000717
The output of the ith feature extraction block of the super-resolution branch of the source image,
Figure BDA00025268214400000718
And outputting the ith characteristic extraction block of the source image super-resolution branch. Therefore, the feature map of the salient object region information input to the super-resolution and fusion branch i-th time
Figure BDA0002526821440000081
The calculation method is as follows:
Figure BDA0002526821440000082
Figure BDA0002526821440000083
Figure BDA0002526821440000084
Figure BDA0002526821440000085
where split (-) is the channel slicing operation, DConv (-) represents the deformable convolution, Conv (-) represents the convolution layer with a convolution kernel size k of 3, and LeakyRelu (-) is a commonly used nonlinear activation function with a slope s set to 0.2.
Further, in Step3, the block self-attention mechanism means that, when the local feature of a pixel is considered, attention should be paid to the pixels that strongly influence it. In the present invention, the feature relationship of each position to its 7×7 neighborhood is explored. For a position p in the input feature map, define N_p as the 7×7 neighborhood range with p as the center point, and x_q as the feature value at a position q within N_p; the information within the neighborhood range is fused, and Sigmoid(·) is an intra-block normalization function used to calculate the weight of the features at other positions in the neighborhood with respect to the feature at the center point p. After the block self-attention mechanism, the feature value y_p at position p can be calculated as:

y_p = BatchNormalize( Σ_{q∈N_p} Sigmoid(x_p^T x_q) · x_q )

where x_p^T x_q calculates, by transposed multiplication, the correlation between the feature vector x_p at position p and the feature vector x_q at position q, and BatchNormalize(·) is a batch normalization operation.
Further, in Step3, the fusion block based on the block self-attention mechanism means that the previously output fusion feature map is spliced with the feature map of the salient focusing region input by the source image super-resolution branch, and after information integration is performed through convolution of 1 × 1 and convolution of several layers of 3 × 3, the self-attention mechanism based on the block range is used to more accurately highlight the range of the salient object.
In Step4, the Sigmoid normalization means the following. Let S denote the single-channel output of the sub-pixel convolution in the fusion and super-resolution branch, whose size is the target size, and let (m, n) denote a coordinate position; then

P(m,n) = Sigmoid(S(m,n)) = 1 / (1 + exp(−S(m,n)))

and the decision weight map for multi-focus image fusion is obtained by division with a threshold t. The invention sets t to 0.5, and the decision weight map W_SR is obtained by the following formula:

W_SR(m,n) = 1 if P(m,n) > 0.5, and W_SR(m,n) = 0 otherwise.

Then the fusion result I_F^SR can be obtained from the decision weight map W_SR and the super-resolution results I_A^SR and I_B^SR of the two source images:

I_F^SR(m,n) = W_SR(m,n) · I_A^SR(m,n) + (1 − W_SR(m,n)) · I_B^SR(m,n)
further, in Step5, regarding the loss calculation, the present invention calculates the loss using the L1 norm with better convex optimization properties and uses an Adam optimizer to minimize the loss value. Definition of
Figure BDA0002526821440000094
Are tag values, respectively
Figure BDA0002526821440000095
Corresponding high resolution image,
Figure BDA0002526821440000096
Corresponding high resolution image, high resolution fused image, WSR、WHRIf the fusion decision graph and the high-resolution label fusion decision graph are super-resolution, the loss is calculated as follows:
Figure BDA0002526821440000097
in Step5, the input test image, i.e. the two low-resolution source images on the left side in fig. 1, is the input low-resolution source image of the specific example, and the intermediate image on the right side in fig. 1 is the fusion result image of the specific example, it can be seen that the super-resolution fusion result contains abundant texture detail information of the two low-resolution source images, which indicates that the method can capture information in the low-resolution source images deeply and further generate natural high-quality details. The focused boundary and the non-focused boundary are accurately estimated, which shows that the region information enhancement block of the invention has the effect of accurately estimating the object contour, the fusion block based on the block attention mechanism has the effect of accurately estimating the focused region, and the combination of the two blocks ensures the information fusion of the focused regions of the two source images.
Further, unless otherwise specified, Relu is used as the nonlinear activation function after all convolutional layers; the convolutional layers are all SAME type convolution, namely the input and output of the convolutional layers are consistent in size, and all source images share one source image super-resolution branch.
While the present invention has been described in detail with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, and various changes can be made without departing from the spirit and scope of the present invention.

Claims (7)

1. An image super-fusion method based on region information enhancement and block self-attention is characterized in that: the method comprises the following specific steps:
Step1, in the task of super-resolution and fusion of multi-focus images, the two low-resolution source images are respectively input into the source image super-resolution branch; at the same time, the two source images are spliced together along the channel dimension and input into the fusion and super-resolution branch; a 3×3 convolutional layer at the beginning of each of the source image super-resolution branch and the fusion and super-resolution branch preliminarily extracts features; then, the source image super-resolution branch contains 16 feature extraction blocks and 16 region information enhancement blocks, and the fusion and super-resolution branch contains 16 fusion blocks based on a block self-attention mechanism; the 16 feature extraction blocks, the 16 region information enhancement blocks and the 16 fusion blocks based on the block self-attention mechanism are in one-to-one correspondence, and i, with 0 ≤ i ≤ 16, is defined as the index of the i-th feature extraction block or region information enhancement block or fusion block based on the block self-attention mechanism;
Step2, in the source image super-resolution branch, the initial feature map passes through the 16 feature extraction blocks, which are connected in a dense connection manner; the output of the (i-1)-th feature extraction block, in addition to being passed on to the i-th feature extraction block to construct the super-resolution source image, is also input into the i-th region information enhancement block to assist the fusion and super-resolution branch in obtaining the decision weight map; the region information enhancement block enhances the information of the salient feature regions, particularly the feature information of the focused regions, and the information output by the region information enhancement block is input into the i-th fusion block based on the block self-attention mechanism in the fusion and super-resolution branch;
step3, in the fusion and super-resolution branch, the initial feature map passes through 16 fusion blocks based on a block self-attention mechanism, the features are fully extracted, and the information is fused in a self-adaptive manner;
Step4, following the 16 feature extraction blocks in the source image super-resolution branch and the 16 fusion blocks based on the block self-attention mechanism in the fusion and super-resolution branch, there is a 1×1 convolutional layer and a sub-pixel convolutional layer; the 1×1 convolution reduces the number of channels of the outputs of the 16th feature extraction block of each source image super-resolution branch and of the 16th fusion block based on the block self-attention mechanism to the square of the magnification factor r; the sub-pixel convolution up-samples the outputs of the 1×1 convolutional layers to the target size H×W, where H and W respectively denote the height and width of the target size; after sub-pixel convolution, the source image super-resolution branch yields the super-resolution results of the two source images, denoted I_A^SR and I_B^SR; in the fusion and super-resolution branch, normalization is performed through a Sigmoid function and the decision weight map W_SR for multi-focus image fusion is obtained through threshold division; finally, the super-resolution fusion result image I_F^SR is obtained by combining the source images;
Step5, obtaining the network parameter through Step4 in the network parameter training process
Figure FDA0002526821430000021
Super-resolution results of
Figure FDA0002526821430000022
Figure FDA0002526821430000023
And a decision weight graph WSRSuper-resolution fusion result image
Figure FDA0002526821430000024
And then, calculating the loss between the label and the label, and minimizing the loss by using an optimizer based on a gradient descent method, thereby optimizing the parameters of the network, finishing the network training when the loss gradually decreases to be flat, and obtaining a high-quality super-resolution and fusion result by testing.
2. The image super-fusion method based on region information enhancement and block self-attention of claim 1, wherein:
the dense connection mode proposed in Step2 means that: the initial feature map f_0 output by the first convolutional layer of the source image super-resolution branch and the outputs of the previous i-1 feature extraction blocks are taken as the input of the i-th feature extraction block; finally, f_0 and the outputs of all the blocks are spliced together, and dimension reduction and information integration are performed through a 1×1 convolution; the feature extraction block consists of three 3×3 convolutional layers and uses residual learning to alleviate the degradation problem caused by deep networks.
3. The image super-fusion method based on region information enhancement and block self-attention of claim 1, wherein:
the region information enhancement block proposed in Step2 is as follows: first, a convolutional layer acts on the input feature map, and the dimension of the output feature map is 2 times that of the input feature map; the output feature map is sliced along the channel dimension into two feature maps of the same dimension, which are the offsets of the input feature map in the horizontal and vertical directions; that is, the convolutional layer learns the offset of each position of the input feature map in the horizontal and vertical directions, and the horizontal and vertical offsets together with the input feature map are input into a deformable convolution to obtain a feature map that is closer to the shape and size of the objects; denoting the outputs of the i-th feature extraction block of the super-resolution branches of the two source images A and B as F_A^i and F_B^i, and their offsets in the horizontal and vertical directions as (Δh_A^i, Δv_A^i) and (Δh_B^i, Δv_B^i) respectively, the feature maps of salient object region information input to the fusion and super-resolution branch at the i-th stage, E_A^i and E_B^i, are calculated as follows:

(Δh_A^i, Δv_A^i) = split(Conv(F_A^i))

(Δh_B^i, Δv_B^i) = split(Conv(F_B^i))

E_A^i = LeakyRelu(DConv(F_A^i, Δh_A^i, Δv_A^i))

E_B^i = LeakyRelu(DConv(F_B^i, Δh_B^i, Δv_B^i))

where split(·) is the channel slicing operation, DConv(·) denotes the deformable convolution, Conv(·) denotes a convolutional layer with convolution kernel size k of 3, and LeakyRelu(·) is a commonly used nonlinear activation function with slope s set to 0.2.
4. The image super-fusion method based on region information enhancement and block self-attention of claim 3, characterized in that:
the block self-attention mechanism proposed in Step3 means that, when the local feature of a pixel is considered, attention should be paid to the pixels that strongly influence it, and the feature relationship between each position and its 7×7 neighborhood is explored; for a position p in the input feature map, define N_p as the 7×7 neighborhood range with p as the center point, and x_q as the feature value at a position q within N_p; the information within the neighborhood range is fused, and Sigmoid(·) is an intra-block normalization function used to calculate the weight of the features at other positions in the neighborhood with respect to the feature at the center point p; after the block self-attention mechanism, the feature value y_p at position p can be calculated as:

y_p = BatchNormalize( Σ_{q∈N_p} Sigmoid(x_p^T x_q) · x_q )

where x_p^T x_q calculates, by transposed multiplication, the correlation between the feature vector x_p at position p and the feature vector x_q at position q, and BatchNormalize(·) is a batch normalization operation;
the fusion block based on the block self-attention mechanism proposed in Step3 means that the previously output fusion feature map is spliced with the feature maps of the salient focused regions input from the source image super-resolution branch; after information integration through a 1×1 convolution and several 3×3 convolutional layers, the self-attention mechanism based on the block range is used to more accurately highlight the extent of salient objects.
5. The image super-fusion method based on region information enhancement and block self-attention of claim 1, wherein: the Sigmoid normalization in Step4 means the following: let S denote the single-channel output of the sub-pixel convolution in the fusion and super-resolution branch, whose size is the target size, and let (m, n) denote a coordinate position; then

P(m,n) = Sigmoid(S(m,n)) = 1 / (1 + exp(−S(m,n)))

and the decision weight map for multi-focus image fusion is obtained by division with a threshold t; t is set to 0.5, and the decision weight map W_SR is obtained by the following formula:

W_SR(m,n) = 1 if P(m,n) > 0.5, and W_SR(m,n) = 0 otherwise;

then the fusion result I_F^SR can be obtained from the decision weight map W_SR and the super-resolution results I_A^SR and I_B^SR of the two source images:

I_F^SR(m,n) = W_SR(m,n) · I_A^SR(m,n) + (1 − W_SR(m,n)) · I_B^SR(m,n)
6. The image super-fusion method based on region information enhancement and block self-attention of claim 1, wherein: in the loss calculation proposed in Step5, the loss is calculated with the L1 norm, which has good convex optimization properties, and an Adam optimizer is used to minimize the loss value; define I_A^HR, I_B^HR and I_F^HR as the label values, i.e. the high-resolution images corresponding to the two low-resolution source images and the high-resolution fused image, and let W_SR and W_HR be the super-resolution fusion decision map and the high-resolution label fusion decision map; then the loss is calculated as follows:

Loss = ||I_A^SR − I_A^HR||_1 + ||I_B^SR − I_B^HR||_1 + ||I_F^SR − I_F^HR||_1 + ||W_SR − W_HR||_1
7. the image super-fusion method based on region information enhancement and block self-attention according to any one of claims 1-6, characterized in that: relu was used as the nonlinear activation function after all convolutional layers, except where specifically noted; the convolutional layers are all SAME type convolution, namely the input and output of the convolutional layers are consistent in size, and all source images share one source image super-resolution branch.
CN202010506835.XA 2020-06-05 2020-06-05 Image super-fusion method based on regional information enhancement and block self-attention Active CN111861880B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010506835.XA CN111861880B (en) 2020-06-05 2020-06-05 Image super-fusion method based on regional information enhancement and block self-attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010506835.XA CN111861880B (en) 2020-06-05 2020-06-05 Image super-fusion method based on regional information enhancement and block self-attention

Publications (2)

Publication Number Publication Date
CN111861880A true CN111861880A (en) 2020-10-30
CN111861880B CN111861880B (en) 2022-08-30

Family

ID=72986067

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010506835.XA Active CN111861880B (en) 2020-06-05 2020-06-05 Image super-fusion method based on regional information enhancement and block self-attention

Country Status (1)

Country Link
CN (1) CN111861880B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112418163A (en) * 2020-12-09 2021-02-26 北京深睿博联科技有限责任公司 Multispectral target detection blind guiding system
CN112784909A (en) * 2021-01-28 2021-05-11 哈尔滨工业大学 Image classification and identification method based on self-attention mechanism and self-adaptive sub-network
CN113094972A (en) * 2021-03-15 2021-07-09 西南大学 Basement depth prediction method and system based on generation of confrontation network and environmental element data
CN113537246A (en) * 2021-08-12 2021-10-22 浙江大学 Gray level image simultaneous coloring and hyper-parting method based on counterstudy
CN113705675A (en) * 2021-08-27 2021-11-26 合肥工业大学 Multi-focus image fusion method based on multi-scale feature interaction network
CN113837946A (en) * 2021-10-13 2021-12-24 中国电子技术标准化研究院 Lightweight image super-resolution reconstruction method based on progressive distillation network
CN113963009A (en) * 2021-12-22 2022-01-21 中科视语(北京)科技有限公司 Local self-attention image processing method and model based on deformable blocks

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180182109A1 (en) * 2016-12-22 2018-06-28 TCL Research America Inc. System and method for enhancing target tracking via detector and tracker fusion for unmanned aerial vehicles
US20190122103A1 (en) * 2017-10-24 2019-04-25 International Business Machines Corporation Attention based sequential image processing
CN109714592A (en) * 2019-01-31 2019-05-03 天津大学 Stereo image quality evaluation method based on binocular fusion network
US20190156220A1 (en) * 2017-11-22 2019-05-23 Microsoft Technology Licensing, Llc Using machine comprehension to answer a question
CN109859106A (en) * 2019-01-28 2019-06-07 桂林电子科技大学 A kind of image super-resolution rebuilding method based on the high-order converged network from attention
CN110033410A (en) * 2019-03-28 2019-07-19 华中科技大学 Image reconstruction model training method, image super-resolution rebuilding method and device
CN110322402A (en) * 2019-04-30 2019-10-11 武汉理工大学 Medical image super resolution ratio reconstruction method based on dense mixing attention network
CN110334765A (en) * 2019-07-05 2019-10-15 西安电子科技大学 Remote Image Classification based on the multiple dimensioned deep learning of attention mechanism
CN111179167A (en) * 2019-12-12 2020-05-19 天津大学 Image super-resolution method based on multi-stage attention enhancement network

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180182109A1 (en) * 2016-12-22 2018-06-28 TCL Research America Inc. System and method for enhancing target tracking via detector and tracker fusion for unmanned aerial vehicles
US20190122103A1 (en) * 2017-10-24 2019-04-25 International Business Machines Corporation Attention based sequential image processing
US20190156220A1 (en) * 2017-11-22 2019-05-23 Microsoft Technology Licensing, Llc Using machine comprehension to answer a question
CN109859106A (en) * 2019-01-28 2019-06-07 桂林电子科技大学 A kind of image super-resolution rebuilding method based on the high-order converged network from attention
CN109714592A (en) * 2019-01-31 2019-05-03 天津大学 Stereo image quality evaluation method based on binocular fusion network
CN110033410A (en) * 2019-03-28 2019-07-19 华中科技大学 Image reconstruction model training method, image super-resolution rebuilding method and device
CN110322402A (en) * 2019-04-30 2019-10-11 武汉理工大学 Medical image super resolution ratio reconstruction method based on dense mixing attention network
CN110334765A (en) * 2019-07-05 2019-10-15 西安电子科技大学 Remote Image Classification based on the multiple dimensioned deep learning of attention mechanism
CN111179167A (en) * 2019-12-12 2020-05-19 天津大学 Image super-resolution method based on multi-stage attention enhancement network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
LIANG X C ET AL.: ""MCFNet: multi-layer concatenation fusion network for medical images fusion"", 《IEEE SENSORS JOURNAL》 *
Y QING-MING LIU: ""Face Super-Resolution Reconstruction Based on Self-Attention Residual Network"", 《IEEE ACCESS》 *
朱欣鑫: "Research on Image Description Algorithms Based on Deep Learning" (基于深度学习的图像描述算法研究), 《信息科技》 *
杨默远等: "Joint Implementation of Image Fusion and Super-Resolution via Convolutional Sparse Representation" (卷积稀疏表示图像融合与超分辨率联合实现), 《光学技术》 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112418163A (en) * 2020-12-09 2021-02-26 北京深睿博联科技有限责任公司 Multispectral target detection blind guiding system
CN112418163B (en) * 2020-12-09 2022-07-12 北京深睿博联科技有限责任公司 Multispectral target detection blind guiding system
CN112784909A (en) * 2021-01-28 2021-05-11 哈尔滨工业大学 Image classification and identification method based on self-attention mechanism and self-adaptive sub-network
CN113094972A (en) * 2021-03-15 2021-07-09 西南大学 Basement depth prediction method and system based on generation of confrontation network and environmental element data
CN113094972B (en) * 2021-03-15 2022-08-02 西南大学 Bedrock depth prediction method and system based on generation of confrontation network and environmental element data
CN113537246A (en) * 2021-08-12 2021-10-22 浙江大学 Gray level image simultaneous coloring and hyper-parting method based on counterstudy
CN113705675A (en) * 2021-08-27 2021-11-26 合肥工业大学 Multi-focus image fusion method based on multi-scale feature interaction network
CN113705675B (en) * 2021-08-27 2022-10-04 合肥工业大学 Multi-focus image fusion method based on multi-scale feature interaction network
CN113837946A (en) * 2021-10-13 2021-12-24 中国电子技术标准化研究院 Lightweight image super-resolution reconstruction method based on progressive distillation network
CN113963009A (en) * 2021-12-22 2022-01-21 中科视语(北京)科技有限公司 Local self-attention image processing method and model based on deformable blocks

Also Published As

Publication number Publication date
CN111861880B (en) 2022-08-30

Similar Documents

Publication Publication Date Title
CN111861880B (en) Image super-fusion method based on regional information enhancement and block self-attention
Islam et al. Simultaneous enhancement and super-resolution of underwater imagery for improved visual perception
CN110111366B (en) End-to-end optical flow estimation method based on multistage loss
CN111325794B (en) Visual simultaneous localization and map construction method based on depth convolution self-encoder
Engin et al. Cycle-dehaze: Enhanced cyclegan for single image dehazing
CN109791697B (en) Predicting depth from image data using statistical models
CN111950453B (en) Random shape text recognition method based on selective attention mechanism
CN113657388B (en) Image semantic segmentation method for super-resolution reconstruction of fused image
CN110717851A (en) Image processing method and device, neural network training method and storage medium
CN110378838B (en) Variable-view-angle image generation method and device, storage medium and electronic equipment
CN110381268B (en) Method, device, storage medium and electronic equipment for generating video
CN112733950A (en) Power equipment fault diagnosis method based on combination of image fusion and target detection
CN110910437A (en) Depth prediction method for complex indoor scene
CN117253154B (en) Container weak and small serial number target detection and identification method based on deep learning
CN115713679A (en) Target detection method based on multi-source information fusion, thermal infrared and three-dimensional depth map
CN114724155A (en) Scene text detection method, system and equipment based on deep convolutional neural network
CN116612468A (en) Three-dimensional target detection method based on multi-mode fusion and depth attention mechanism
Hu et al. Effective local-global transformer for natural image matting
CN116645598A (en) Remote sensing image semantic segmentation method based on channel attention feature fusion
Li et al. An improved method for underwater image super-resolution and enhancement
CN116563103A (en) Remote sensing image space-time fusion method based on self-adaptive neural network
CN112950653B (en) Attention image segmentation method, device and medium
CN113780305B (en) Significance target detection method based on interaction of two clues
Nie et al. Binocular image dehazing via a plain network without disparity estimation
CN114565764A (en) Port panorama sensing system based on ship instance segmentation

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant