CN111861880A - Image super-fusion method based on regional information enhancement and block self-attention
- Publication number
- CN111861880A CN111861880A CN202010506835.XA CN202010506835A CN111861880A CN 111861880 A CN111861880 A CN 111861880A CN 202010506835 A CN202010506835 A CN 202010506835A CN 111861880 A CN111861880 A CN 111861880A
- Authority
- CN
- China
- Prior art keywords
- super
- fusion
- resolution
- block
- source image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06T 3/4053 - Super resolution, i.e. output image resolution higher than sensor resolution
- G06T 3/4046 - Scaling the whole image or part thereof using neural networks
- G06T 5/50 - Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
- G06N 3/045 - Neural networks; combinations of networks
- G06N 3/08 - Neural network learning methods
- G06T 2207/10004 - Still image; photographic image
- G06T 2207/20081 - Training; learning
- G06T 2207/20104 - Interactive definition of region of interest [ROI]
- G06T 2207/20221 - Image fusion; image merging
Abstract
The invention relates to an image super-fusion method based on regional information enhancement and block self-attention, and belongs to the technical field of digital image processing. The method comprises a source image super-resolution branch and a fusion super-resolution branch. In the source image super-resolution branch, feature extraction blocks are applied iteratively to extract source image feature maps, and dense connections fully exploit the feature information before and after each block. The output of each feature extraction block also passes through a region information enhancement block that explores the region occupied by each object in the source image; this information assists the fusion super-resolution branch in accurately predicting the fusion decision map. In the fusion super-resolution branch, the two source images are concatenated as input, and fusion blocks based on a block self-attention mechanism are applied iteratively, combined with the region-enhanced source image information from the source image super-resolution branch, to better distinguish focused from unfocused regions. Finally, sub-pixel convolution in each branch generates the super-resolution source images and the fused image.
Description
Technical Field
The invention relates to an image super-fusion method based on regional information enhancement and block self-attention, and belongs to the technical field of image information processing.
Background
The purpose of image fusion is to merge the information of two or more source images of the same scene, captured by different cameras, into a single image while preserving the information of each source image. Image fusion is widely applied in fields such as security surveillance, medical imaging, and satellite remote sensing. In recent years, many studies have achieved good fusion results, but existing methods usually assume high-resolution multi-focus source image datasets, whereas the images produced by real-world imaging systems are not necessarily high-resolution. When low-resolution source images are fused, the fused image is also low-resolution, often blurred and lacking detail, which reduces the utility of image fusion. To feed low-resolution source images into traditional fusion methods, bicubic or nearest-neighbor interpolation is generally adopted as an upsampling operation to unify the source image resolutions. However, these interpolation methods are too simple, are not adapted to different data, and introduce erroneous information that degrades the accuracy of image texture details, resulting in poor fusion; moreover, for multi-focus fusion tasks, the accuracy of the fusion decision map also suffers. Therefore, to overcome these drawbacks and make low-resolution image fusion more effective, a method that can accurately super-resolve and fuse images is urgently needed.
In recent years, many deep-learning-based image fusion methods have been proposed; they extract texture and detail better than transform-domain and spatial-domain fusion methods. One class of methods uses an encoder-decoder network: an encoder extracts source image features, and a decoder fuses the features and progressively upsamples them into a fused image. Another class uses a pre-trained classification convolutional network to predict whether input image patches are in focus, thereby generating a fusion decision map. A third class decomposes a source image into a base layer containing large-scale contours or intensity variations and a detail layer containing important textures, and fuses them separately. Still other methods are based on generative adversarial networks: a generator produces the fused image, while the discriminator distinguishes the fused image from the visible image so that more texture is extracted from the visible image. These methods, while innovative and successful, still suffer from two major drawbacks: 1) when the source images are low-resolution, the fused image is also low-resolution and lacks texture detail; 2) the regional extent of salient features in the image cannot be accurately estimated, so the salient features of the source images are not preserved completely enough in the fusion result.
To overcome the first drawback, some work combines super-resolution with the image fusion task. Dictionary-learning-based methods learn a set of multi-scale dictionaries from high-resolution images and then fuse low-resolution image patches using sparse coefficients based on local information content, but these methods must store the low-resolution-to-high-resolution dictionaries, which consumes memory. Some methods fuse images via compressed sensing; however, they require two steps, decomposing the task into separate super-resolution and fusion stages, which is very time-consuming. Other methods use structure tensors, fractional differentiation, and variational techniques to perform fusion and super-resolution in one step, but they can only super-resolve by integer factors, are not flexible or practical enough, and their fusion results are not good enough.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an image super-resolution and fusion method based on regional information enhancement and block self-attention, so as to solve the image fusion problem when the source image resolution is low and to improve the quality of the fusion result.
The technical scheme adopted by the invention is as follows: an image super-resolution and fusion method based on region information enhancement and block self-attention, illustrated with low-resolution multi-focus image fusion as an example; the flow chart is shown in fig. 1. The method specifically comprises the following steps:
Step1, in the multi-focus image super-resolution and fusion task, as shown in fig. 1, the low-resolution source images I_1^LR and I_2^LR are each input into the source image super-resolution branch; at the same time, I_1^LR and I_2^LR are concatenated along the channel dimension and input into the fusion and super-resolution branch. A 3 x 3 convolutional layer at the beginning of each branch preliminarily extracts features. The source image super-resolution branch then contains 16 feature extraction blocks and 16 region information enhancement blocks, and the fusion and super-resolution branch contains 16 fusion blocks based on the block self-attention mechanism. The 16 feature extraction blocks, 16 region information enhancement blocks, and 16 fusion blocks based on the block self-attention mechanism correspond one-to-one; define i (0 ≤ i ≤ 16) as the index of the i-th feature extraction block / region information enhancement block / fusion block based on the block self-attention mechanism.
Step2, in the source image super-resolution branch, the initial feature map passes through the 16 feature extraction blocks, which are connected in a dense connection mode. The output f_{i-1} of the (i-1)-th feature extraction block is not only passed on to the i-th feature extraction block to construct the super-resolution source image, but is also input into the i-th region information enhancement block to help the fusion and super-resolution branch obtain the decision weight map. The region information enhancement block enhances the information of salient feature regions, especially the feature information of focused regions. The output of the region information enhancement block is input into the i-th fusion block based on the block self-attention mechanism in the fusion and super-resolution branch;
Step3, in the fusion and super-resolution branch, the initial feature map passes through the 16 fusion blocks based on the block self-attention mechanism, which fully extract features and adaptively fuse information;
Step4, after the 16 feature extraction blocks in the source image super-resolution branch, and after the 16 fusion blocks based on the block self-attention mechanism in the fusion and super-resolution branch, there is one 1 x 1 convolutional layer and one sub-pixel convolution layer. The 1 x 1 convolution reduces the number of channels of f_16^1 and f_16^2 (the outputs of the 16th feature extraction block of the source image super-resolution branch for I_1^LR and I_2^LR, respectively) and of the output of the 16th fusion block based on the block self-attention mechanism to the square of the magnification factor r. Sub-pixel convolution then upsamples the outputs of the 1 x 1 convolutional layers to the target size H x W, where H and W denote the height and width of the target size. After sub-pixel convolution, the source image super-resolution branch yields the super-resolution results I_1^SR and I_2^SR; in the fusion and super-resolution branch, the output is normalized by a Sigmoid function and thresholded to obtain the decision weight map W_SR for multi-focus image fusion; finally, the super-resolution fusion result image I_F^SR is obtained by combining the source images.
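As an illustration of the Step4 tail described above, the following minimal PyTorch sketch reduces a branch's feature map to r^2 channels with a 1 x 1 convolution and then applies sub-pixel convolution (PixelShuffle). The channel width of 64 and the class name are assumptions of this sketch, not taken from the disclosure.

```python
import torch
import torch.nn as nn

class SubPixelTail(nn.Module):
    """1x1 channel reduction to r^2 channels, then sub-pixel convolution."""
    def __init__(self, in_channels: int = 64, r: int = 4):
        super().__init__()
        self.reduce = nn.Conv2d(in_channels, r * r, kernel_size=1)  # channels -> r^2
        self.shuffle = nn.PixelShuffle(r)  # r^2 channels -> 1 channel, upscaled r times

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.shuffle(self.reduce(x))

# a 64-channel 32x32 feature map becomes a 1-channel 128x128 map for r = 4
out = SubPixelTail(64, 4)(torch.randn(1, 64, 32, 32))
assert out.shape == (1, 1, 128, 128)
```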
Step5, obtaining the network parameter through Step4 in the network parameter training processSuper-resolution results of And a decision weight graph WSRSuper-resolution fusion result imageAnd then, calculating the loss between the label and the label, and minimizing the loss by using an optimizer based on a gradient descent method, thereby optimizing the parameters of the network, finishing the network training when the loss gradually decreases to be flat, and obtaining a high-quality super-resolution and fusion result by testing.
Specifically, the dense connection mode proposed in Step2 means that the initial feature map f_0 output by the first convolutional layer in the source image super-resolution branch, together with the outputs of the previous i-1 feature extraction blocks, forms the input of the i-th feature extraction block. Finally, f_0 and the outputs of all blocks are concatenated, and a 1 x 1 convolution performs dimension reduction and information integration. The structure of the feature extraction block is shown in fig. 2(a); it consists of three 3 x 3 convolutional layers and uses residual learning to alleviate the degradation problem of deep networks;
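A minimal sketch of the dense connection pattern and the three-convolution residual feature extraction block follows. The channel width, the placement of the 1 x 1 compression at the entry of each block, and the ReLU positions are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class FeatureExtractionBlock(nn.Module):
    """Three 3x3 convolutions with a residual connection, as in Fig. 2(a).
    A 1x1 convolution first compresses the densely concatenated input."""
    def __init__(self, in_channels: int, channels: int = 64):
        super().__init__()
        self.compress = nn.Conv2d(in_channels, channels, 1)  # dimension reduction / integration
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.compress(x)
        return x + self.body(x)  # residual learning eases deep-network degradation

class DenseBranch(nn.Module):
    """Dense connections: block i receives f0 and the outputs of blocks 1..i-1."""
    def __init__(self, channels: int = 64, num_blocks: int = 16):
        super().__init__()
        self.blocks = nn.ModuleList(
            [FeatureExtractionBlock((i + 1) * channels, channels) for i in range(num_blocks)]
        )
        self.fuse = nn.Conv2d((num_blocks + 1) * channels, channels, 1)

    def forward(self, f0: torch.Tensor) -> torch.Tensor:
        feats = [f0]
        for block in self.blocks:
            feats.append(block(torch.cat(feats, dim=1)))
        return self.fuse(torch.cat(feats, dim=1))  # concatenate f0 and all block outputs
```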
Specifically, the region information enhancement block proposed in Step2 is shown in fig. 2(c). First, a convolutional layer acts on the input feature map, and the output feature map has twice the number of channels of the input; the output feature map is sliced along the channel dimension into two feature maps of the same dimension, which are the offsets of the input feature map in the horizontal and vertical directions. That is, the convolutional layer learns, for each position of the input feature map, offsets in the horizontal and vertical directions; the horizontal and vertical offsets together with the input feature map are then fed into a deformable convolution, yielding a feature map that conforms more closely to the shape and size of the objects. Let f_i^1 and f_i^2 be the outputs of the i-th feature extraction block of the source image super-resolution branch for the two source images, and let (Δh_i^1, Δv_i^1) and (Δh_i^2, Δv_i^2) be their offsets in the horizontal and vertical directions. For each source image, the feature map g_i of salient object region information input to the fusion and super-resolution branch for the i-th time is computed as:

$$\Delta h_i,\ \Delta v_i = \mathrm{split}\big(\mathrm{Conv}(f_i)\big),\qquad g_i = \mathrm{LeakyRelu}\big(\mathrm{DConv}(f_i;\ \Delta h_i,\ \Delta v_i)\big)$$

where split(·) is the channel slicing operation, DConv(·) denotes the deformable convolution, Conv(·) denotes a convolutional layer with kernel size k = 3, and LeakyRelu(·) is a commonly used nonlinear activation function with its slope s set to 0.2.
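The following sketch approximates the region information enhancement block with torchvision's DeformConv2d. Two assumptions apply: the disclosure predicts offset maps with the same channel count as the input for each direction, whereas this sketch predicts a single horizontal and a single vertical offset per position to match DeformConv2d's interface, which expects one (dy, dx) pair per kernel tap and therefore receives the per-position pair tiled across the 3 x 3 taps; class and variable names are illustrative.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class RegionInfoEnhancementBlock(nn.Module):
    """Predict horizontal/vertical offsets, split by channel, and feed them
    with the input into a deformable convolution."""
    def __init__(self, channels: int = 64, k: int = 3):
        super().__init__()
        self.k = k
        self.offset_conv = nn.Conv2d(channels, 2, k, padding=k // 2)  # (dh, dv) per position
        self.act = nn.LeakyReLU(0.2)  # slope s = 0.2 as in the text
        self.dconv = DeformConv2d(channels, channels, k, padding=k // 2)

    def forward(self, f_i: torch.Tensor) -> torch.Tensor:
        dh, dv = torch.chunk(self.act(self.offset_conv(f_i)), 2, dim=1)  # split(.)
        # tile the single per-position (dv, dh) pair across all k*k kernel taps
        # (an assumption of this sketch, not stated in the disclosure)
        offset = torch.cat([dv, dh], dim=1).repeat(1, self.k * self.k, 1, 1)
        return self.act(self.dconv(f_i, offset))  # DConv(f_i; offsets)
```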
Specifically, the block self-attention mechanism proposed in Step3 means that, when considering the local features of a pixel, attention should be paid to the pixels that strongly influence it. In the present invention, the feature relationship of each position to its 7 x 7 neighborhood is explored. For a position p in the feature map, define N(p) as the 7 x 7 neighborhood centered at p, and let x_q be the feature vector corresponding to a position q ∈ N(p); the weighted sum over N(p) fuses the information in the neighborhood. Sigmoid(·) serves as an intra-block normalization function that computes the weight of each neighborhood feature with respect to the feature at the center point p. After the block self-attention mechanism, the feature value y_p at position p is computed as:

$$y_p = \mathrm{BatchNormalize}\Big(\sum_{q\in N(p)} \mathrm{Sigmoid}\big(x_p^{\top} x_q\big)\, x_q\Big)$$

where x_p^⊤ x_q computes, by transposed multiplication, the correlation between the feature vector x_p at position p and the feature vector x_q at position q, and BatchNormalize(·) is a batch normalization operation.
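A minimal sketch of the 7 x 7 block self-attention computation is given below, using F.unfold to gather each position's neighborhood. The module wrapping and the batch-norm placement at the output follow the formula above; the tensor-layout details are this sketch's assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlockSelfAttention(nn.Module):
    """y_p = BatchNormalize( sum_{q in N(p)} Sigmoid(x_p^T x_q) * x_q ) over 7x7 windows."""
    def __init__(self, channels: int, window: int = 7):
        super().__init__()
        self.window = window
        self.bn = nn.BatchNorm2d(channels)  # BatchNormalize(.)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        k = self.window
        # gather every position's k x k neighborhood: (B, C*k*k, H*W)
        neigh = F.unfold(x, kernel_size=k, padding=k // 2).view(b, c, k * k, h * w)
        center = x.view(b, c, 1, h * w)                                  # x_p
        attn = torch.sigmoid((center * neigh).sum(dim=1, keepdim=True))  # Sigmoid(x_p^T x_q)
        y = (attn * neigh).sum(dim=2).view(b, c, h, w)                   # sum over q in N(p)
        return self.bn(y)
```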
The fusion block based on the block self-attention mechanism proposed in Step3 concatenates the previously output fusion feature map with the salient-focus-region feature map input from the source image super-resolution branch; after information integration by a 1 x 1 convolution and several 3 x 3 convolutions, the block-range self-attention mechanism is applied to highlight the extent of salient objects more accurately.
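The fusion block can then be sketched as follows, reusing the BlockSelfAttention module from the previous sketch; the exact number of 3 x 3 layers inside the block is not specified in the text and is assumed here.

```python
import torch
import torch.nn as nn

class BlockAttentionFusionBlock(nn.Module):
    """Concatenate previous fusion features with region-enhanced features,
    integrate with 1x1 and 3x3 convolutions, then apply block self-attention."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.integrate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 1),  # information integration
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.attn = BlockSelfAttention(channels)  # from the previous sketch

    def forward(self, fusion_feat: torch.Tensor, region_feat: torch.Tensor) -> torch.Tensor:
        return self.attn(self.integrate(torch.cat([fusion_feat, region_feat], dim=1)))
```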
Specifically, the normalization by the Sigmoid function in Step4 means:

$$W(m,n) = \frac{1}{1 + e^{-F(m,n)}}$$

where F denotes the sub-pixel convolution result of the fusion and super-resolution branch, a single-channel feature map of the target size, and (m, n) denotes a coordinate position. A decision weight map for multi-focus image fusion is then obtained by thresholding with a threshold t. The invention sets t to 0.5, and the decision weight map W_SR is obtained by the following formula:

$$W_{SR}(m,n) = \begin{cases} 1, & W(m,n) > t \\ 0, & W(m,n) \le t \end{cases}$$
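The Sigmoid normalization and thresholding, together with the final combination of the super-resolved sources, might look as follows. The convex combination W_SR·I_1^SR + (1 - W_SR)·I_2^SR is the standard multi-focus rule and is an assumption of this sketch, since the disclosure only says the source images are "combined".

```python
import torch

def decision_map_and_fuse(fusion_out: torch.Tensor,
                          i1_sr: torch.Tensor,
                          i2_sr: torch.Tensor,
                          t: float = 0.5):
    """Sigmoid-normalize the fusion branch output, threshold at t to get W_SR,
    and combine the super-resolved source images."""
    w = torch.sigmoid(fusion_out)  # W(m, n) = 1 / (1 + exp(-F(m, n)))
    w_sr = (w > t).float()         # binary decision weight map W_SR
    fused = w_sr * i1_sr + (1.0 - w_sr) * i2_sr  # assumed multi-focus combination rule
    return w_sr, fused
```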
specifically, the loss calculation proposed in Step5 adopts the method with better convex optimizationThe L1 norm of the property is used to calculate the loss and an Adam optimizer is used to minimize the loss value. Definition ofAre tag values, respectivelyCorresponding high resolution image,Corresponding high resolution image, high resolution fused image, WSR、WHRIf the fusion decision graph and the high-resolution label fusion decision graph are super-resolution, the loss is calculated as follows:
specifically, Relu is used as the nonlinear activation function after all convolutional layers, except where specifically noted; the convolutional layers are all SAME type convolution, namely the input and output of the convolutional layers are consistent in size, and all source images share one source image super-resolution branch.
The invention has the following beneficial effects: the method comprises a source image super-resolution branch and a fusion super-resolution branch, with the former assisting the latter in obtaining an accurate fusion decision map. In the source image super-resolution branch, feature extraction blocks are applied iteratively to extract source image feature maps, and dense connections fully exploit the feature information before and after each block. The output of each feature extraction block also passes through a region information enhancement block that explores the range and region of each object in the source image, and this information is passed to the fusion super-resolution branch to accurately predict the fusion decision weight map. In the fusion super-resolution branch, the two source images are concatenated as input, and fusion blocks based on the block self-attention mechanism are applied iteratively, combined with the region-enhanced source image information from the source image super-resolution branch, to better distinguish focused from unfocused regions. Finally, sub-pixel convolution is used as the upsampling layer to generate the super-resolution source images and the fused image.
Drawings
FIG. 1 is an overall architecture diagram of the present invention incorporating an embodiment;
fig. 2 shows the structure of each sub-module: (a) the structure of the feature extraction block in the source image super-resolution branch; (b) the structure of the fusion block based on the block self-attention mechanism in the fusion and super-resolution branch; (c) the structure of the region information enhancement block.
Detailed Description
The following detailed description of the embodiments, with specific examples, is illustrated by the flow diagram shown in fig. 1. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described below do not represent all embodiments consistent with the present application; they are merely examples of systems and methods consistent with certain aspects of the application, as recited in the claims.
Example 1: fig. 1 shows a schematic diagram of the steps of the image super-resolution and fusion method based on region information enhancement and block self-attention of the present application, together with the input source images and output result image of a specific example. As shown in fig. 1, the present application consists of a source image super-resolution branch and a fusion and super-resolution branch, and provides an image super-resolution and fusion method based on region information enhancement and block self-attention.
Steps 1 to 5, as well as the details of the dense connections, the region information enhancement block, the block self-attention mechanism, the fusion block based on the block self-attention mechanism, the Sigmoid normalization, and the loss calculation, are as described in the Disclosure of Invention above and are not repeated here.
in Step5, the input test image, i.e. the two low-resolution source images on the left side in fig. 1, is the input low-resolution source image of the specific example, and the intermediate image on the right side in fig. 1 is the fusion result image of the specific example, it can be seen that the super-resolution fusion result contains abundant texture detail information of the two low-resolution source images, which indicates that the method can capture information in the low-resolution source images deeply and further generate natural high-quality details. The focused boundary and the non-focused boundary are accurately estimated, which shows that the region information enhancement block of the invention has the effect of accurately estimating the object contour, the fusion block based on the block attention mechanism has the effect of accurately estimating the focused region, and the combination of the two blocks ensures the information fusion of the focused regions of the two source images.
While the present invention has been described in detail with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, and various changes can be made without departing from the spirit and scope of the present invention.
Claims (7)
1. An image super-fusion method based on region information enhancement and block self-attention, characterized in that the method comprises the following specific steps:
Step1, in the multi-focus image super-resolution and fusion task, the low-resolution source images I_1^LR and I_2^LR are each input into the source image super-resolution branch; at the same time, I_1^LR and I_2^LR are concatenated along the channel dimension and input into the fusion and super-resolution branch; a 3 x 3 convolutional layer at the beginning of each branch preliminarily extracts features; the source image super-resolution branch then contains 16 feature extraction blocks and 16 region information enhancement blocks, and the fusion and super-resolution branch contains 16 fusion blocks based on a block self-attention mechanism; the 16 feature extraction blocks, 16 region information enhancement blocks, and 16 fusion blocks based on the block self-attention mechanism correspond one-to-one, and i, 0 ≤ i ≤ 16, is defined as the index of the i-th feature extraction block, region information enhancement block, or fusion block based on the block self-attention mechanism;
Step2, in the source image super-resolution branch, the initial feature map passes through the 16 feature extraction blocks, which are connected in a dense connection mode; the output f_{i-1} of the (i-1)-th feature extraction block, besides being passed on to the i-th feature extraction block to construct the super-resolution source image, is also input into the i-th region information enhancement block to help the fusion and super-resolution branch obtain the decision weight map; the region information enhancement block enhances the information of salient feature regions, particularly the feature information of focused regions, and its output is input into the i-th fusion block based on the block self-attention mechanism in the fusion and super-resolution branch;
Step3, in the fusion and super-resolution branch, the initial feature map passes through the 16 fusion blocks based on the block self-attention mechanism, which fully extract features and adaptively fuse information;
Step4, following the 16 feature extraction blocks in the source image super-resolution branch, and the 16 fusion blocks based on the block self-attention mechanism in the fusion and super-resolution branch, is one 1 x 1 convolutional layer and one sub-pixel convolution layer; the 1 x 1 convolution reduces the number of channels of f_16^1 and f_16^2 (the outputs of the 16th feature extraction block of the source image super-resolution branch for I_1^LR and I_2^LR, respectively) and of the output of the 16th fusion block based on the block self-attention mechanism to the square of the magnification factor r; sub-pixel convolution then upsamples the outputs of the 1 x 1 convolutional layers to the target size H x W, where H and W denote the height and width of the target size; after sub-pixel convolution, the source image super-resolution branch yields the super-resolution results I_1^SR and I_2^SR; in the fusion and super-resolution branch, the output is normalized by a Sigmoid function and thresholded to obtain the decision weight map W_SR for multi-focus image fusion; finally, the super-resolution fusion result image I_F^SR is obtained by combining the source images;
Step5, during network parameter training, Step4 yields the super-resolution results I_1^SR and I_2^SR, the decision weight map W_SR, and the super-resolution fusion result image I_F^SR; the losses between these outputs and their labels are then computed and minimized with an optimizer based on the gradient descent method, thereby optimizing the network parameters; network training ends when the loss gradually decreases and flattens, and testing then yields high-quality super-resolution and fusion results.
2. The image super-fusion method based on region information enhancement and block self-attention of claim 1, characterized in that:
the dense connection mode proposed in Step2 refers to the following steps: initial characteristic diagram f output by first layer convolution layer in source image super-resolution branch0And the output of the previous i-1 feature extraction blocks will be the input of the ith feature extraction block, and finally, f0The outputs of all the blocks are spliced together, and dimension reduction and information integration are carried out through convolution of 1 multiplied by 1; the structure of the feature extraction block is composed of three convolution layers of 3 x 3, and residual learning is used to alleviate the degradation problem caused by the deep network.
3. The image super-fusion method based on region information enhancement and block self-attention of claim 1, characterized in that:
the region information enhancement block proposed in Step2 is: firstly, a layer of convolution layer acts on an input characteristic diagram, and the dimension of an output characteristic diagram is 2 times that of the input characteristic diagram; the output characteristic diagram is sliced according to the channel to obtain two characteristic diagrams with the same dimension, and the two characteristic diagrams are the offset of the input characteristic diagram in the horizontal direction and the vertical direction; that is, the convolutional layer learns that each position of the input feature map is in horizontal and vertical The directional offset, horizontal and vertical offsets and the input feature map are input into a deformable convolution to obtain a feature map that is closer to the shape and size of the object, definingAre respectively asThe amount of offset in the horizontal and vertical directions of the,are respectively asIs offset in the horizontal and vertical directions, whereinAre respectivelyThe output of the ith feature extraction block of the super-resolution branch of the source image,The output of the ith feature extraction block in the super-resolution branch of the source image, and therefore the feature map of the salient object region information input to the super-resolution and fusion branch for the ith timeThe calculation method is as follows:
where split (-) is the channel slicing operation, DConv (-) represents the deformable convolution, Conv (-) represents the convolution layer with a convolution kernel size k of 3, and LeakyRelu (-) is a commonly used nonlinear activation function with a slope s set to 0.2.
4. The image super-fusion method based on region information enhancement and block self-attention of claim 3, characterized in that:
the block self-attention mechanism proposed in Step3 means that when the local feature of a pixel is considered, attention should be paid to the pixel which has a large influence on the local feature, and the feature relationship between each position and the 7 x 7 neighborhood of each position is explored, wherein In for position p, defineA neighborhood range of 7 × 7 with p as the center point;is composed ofThe characteristic values corresponding to the regions, () fuse the information in the neighborhood range; sigmoid (·) is an intra-block normalization function, and is used for calculating the weight of other position features in the neighborhood to the feature at the central point p; after the block self-attention mechanism, the characteristic value y of the p positionpCan be calculated as:
wherein I.e. calculating the eigenvector x of the p position by using the transposition multiplication modepAnd the feature vector x of the q positionqThe relevance of (1), BatchNormalize (·) is a batch normalization operation;
the fusion block based on the block self-attention mechanism proposed in Step3 means that a fusion feature map output in the front is spliced with a feature map of a salient focusing area input by a source image super-resolution branch, information integration is carried out through convolution of 1 × 1 and convolution of several layers of 3 × 3, and then the self-attention mechanism based on the block range is used for more accurately highlighting the range of a salient object.
5. The image super-fusion method based on region information enhancement and block self-attention of claim 1, characterized in that the normalization by the Sigmoid function in Step4 means:

$$W(m,n) = \frac{1}{1 + e^{-F(m,n)}}$$

where F denotes the sub-pixel convolution result of the fusion and super-resolution branch, a single-channel feature map of the target size, and (m, n) denotes a coordinate position; a decision weight map for multi-focus image fusion is then obtained by thresholding with a threshold t, with t set to 0.5, and the decision weight map W_SR is obtained by the following formula:

$$W_{SR}(m,n) = \begin{cases} 1, & W(m,n) > t \\ 0, & W(m,n) \le t \end{cases}$$
6. The image super-fusion method based on region information enhancement and block self-attention of claim 1, characterized in that: the loss calculation proposed in Step5 uses the L1 norm, which has good convex optimization properties, and an Adam optimizer minimizes the loss value; I_1^HR, I_2^HR, I_F^HR, and W_HR are defined as the label values: the high-resolution images corresponding to I_1^LR and I_2^LR, the high-resolution fused image, and the high-resolution label fusion decision map, respectively; W_SR is the super-resolution fusion decision map; the loss is calculated as:

$$L = \big\|I_1^{SR}-I_1^{HR}\big\|_1 + \big\|I_2^{SR}-I_2^{HR}\big\|_1 + \big\|I_F^{SR}-I_F^{HR}\big\|_1 + \big\|W_{SR}-W_{HR}\big\|_1$$
7. The image super-fusion method based on region information enhancement and block self-attention according to any one of claims 1-6, characterized in that: except where otherwise noted, Relu is used as the nonlinear activation function after all convolutional layers; all convolutions are SAME convolutions, i.e., the input and output of each convolutional layer have the same spatial size, and all source images share one source image super-resolution branch.
Priority Applications (1)
- CN202010506835.XA (granted as CN111861880B) - priority date 2020-06-05, filing date 2020-06-05 - Image super-fusion method based on regional information enhancement and block self-attention

Publications (2)
- CN111861880A (publication) - 2020-10-30
- CN111861880B (grant) - 2022-08-30
Family
- Family ID: 72986067

Family Applications (1)
- CN202010506835.XA - priority date 2020-06-05, filing date 2020-06-05 - CN111861880B (granted, active)

Country Status (1)
- CN: CN111861880B (en)
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180182109A1 (en) * | 2016-12-22 | 2018-06-28 | TCL Research America Inc. | System and method for enhancing target tracking via detector and tracker fusion for unmanned aerial vehicles |
US20190122103A1 (en) * | 2017-10-24 | 2019-04-25 | International Business Machines Corporation | Attention based sequential image processing |
US20190156220A1 (en) * | 2017-11-22 | 2019-05-23 | Microsoft Technology Licensing, Llc | Using machine comprehension to answer a question |
CN109859106A (en) * | 2019-01-28 | 2019-06-07 | 桂林电子科技大学 | A kind of image super-resolution rebuilding method based on the high-order converged network from attention |
CN109714592A (en) * | 2019-01-31 | 2019-05-03 | 天津大学 | Stereo image quality evaluation method based on binocular fusion network |
CN110033410A (en) * | 2019-03-28 | 2019-07-19 | 华中科技大学 | Image reconstruction model training method, image super-resolution rebuilding method and device |
CN110322402A (en) * | 2019-04-30 | 2019-10-11 | 武汉理工大学 | Medical image super resolution ratio reconstruction method based on dense mixing attention network |
CN110334765A (en) * | 2019-07-05 | 2019-10-15 | 西安电子科技大学 | Remote Image Classification based on the multiple dimensioned deep learning of attention mechanism |
CN111179167A (en) * | 2019-12-12 | 2020-05-19 | 天津大学 | Image super-resolution method based on multi-stage attention enhancement network |
Non-Patent Citations (4)
- Liang X C et al., "MCFNet: multi-layer concatenation fusion network for medical images fusion", IEEE Sensors Journal
- Qing-Ming Liu, "Face Super-Resolution Reconstruction Based on Self-Attention Residual Network", IEEE Access
- Zhu Xinxin, "Research on image captioning algorithms based on deep learning", Information Science and Technology
- Yang Moyuan et al., "Joint implementation of image fusion and super-resolution via convolutional sparse representation", Optical Technique
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112418163A (en) * | 2020-12-09 | 2021-02-26 | 北京深睿博联科技有限责任公司 | Multispectral target detection blind guiding system |
CN112418163B (en) * | 2020-12-09 | 2022-07-12 | 北京深睿博联科技有限责任公司 | Multispectral target detection blind guiding system |
CN112784909A (en) * | 2021-01-28 | 2021-05-11 | 哈尔滨工业大学 | Image classification and identification method based on self-attention mechanism and self-adaptive sub-network |
CN113094972A (en) * | 2021-03-15 | 2021-07-09 | 西南大学 | Basement depth prediction method and system based on generation of confrontation network and environmental element data |
CN113094972B (en) * | 2021-03-15 | 2022-08-02 | 西南大学 | Bedrock depth prediction method and system based on generation of confrontation network and environmental element data |
CN113537246A (en) * | 2021-08-12 | 2021-10-22 | 浙江大学 | Gray level image simultaneous coloring and hyper-parting method based on counterstudy |
CN113705675A (en) * | 2021-08-27 | 2021-11-26 | 合肥工业大学 | Multi-focus image fusion method based on multi-scale feature interaction network |
CN113705675B (en) * | 2021-08-27 | 2022-10-04 | 合肥工业大学 | Multi-focus image fusion method based on multi-scale feature interaction network |
CN113837946A (en) * | 2021-10-13 | 2021-12-24 | 中国电子技术标准化研究院 | Lightweight image super-resolution reconstruction method based on progressive distillation network |
CN113963009A (en) * | 2021-12-22 | 2022-01-21 | 中科视语(北京)科技有限公司 | Local self-attention image processing method and model based on deformable blocks |
Also Published As
Publication number | Publication date |
---|---|
CN111861880B (en) | 2022-08-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111861880B (en) | Image super-fusion method based on regional information enhancement and block self-attention | |
Islam et al. | Simultaneous enhancement and super-resolution of underwater imagery for improved visual perception | |
CN110111366B (en) | End-to-end optical flow estimation method based on multistage loss | |
CN111325794B (en) | Visual simultaneous localization and map construction method based on depth convolution self-encoder | |
Engin et al. | Cycle-dehaze: Enhanced cyclegan for single image dehazing | |
CN109791697B (en) | Predicting depth from image data using statistical models | |
CN111950453B (en) | Random shape text recognition method based on selective attention mechanism | |
CN113657388B (en) | Image semantic segmentation method for super-resolution reconstruction of fused image | |
CN110717851A (en) | Image processing method and device, neural network training method and storage medium | |
CN110378838B (en) | Variable-view-angle image generation method and device, storage medium and electronic equipment | |
CN110381268B (en) | Method, device, storage medium and electronic equipment for generating video | |
CN112733950A (en) | Power equipment fault diagnosis method based on combination of image fusion and target detection | |
CN110910437A (en) | Depth prediction method for complex indoor scene | |
CN117253154B (en) | Container weak and small serial number target detection and identification method based on deep learning | |
CN115713679A (en) | Target detection method based on multi-source information fusion, thermal infrared and three-dimensional depth map | |
CN114724155A (en) | Scene text detection method, system and equipment based on deep convolutional neural network | |
CN116612468A (en) | Three-dimensional target detection method based on multi-mode fusion and depth attention mechanism | |
Hu et al. | Effective local-global transformer for natural image matting | |
CN116645598A (en) | Remote sensing image semantic segmentation method based on channel attention feature fusion | |
Li et al. | An improved method for underwater image super-resolution and enhancement | |
CN116563103A (en) | Remote sensing image space-time fusion method based on self-adaptive neural network | |
CN112950653B (en) | Attention image segmentation method, device and medium | |
CN113780305B (en) | Significance target detection method based on interaction of two clues | |
Nie et al. | Binocular image dehazing via a plain network without disparity estimation | |
CN114565764A (en) | Port panorama sensing system based on ship instance segmentation |
Legal Events
- PB01: Publication
- SE01: Entry into force of request for substantive examination
- GR01: Patent grant