CN111563513A - Defocus blur detection method based on attention mechanism - Google Patents

Defocus blur detection method based on attention mechanism

Info

Publication number
CN111563513A
CN111563513A (application CN202010411177.6A)
Authority
CN
China
Prior art keywords: order, feature map, attention, feature, low
Prior art date
Legal status
Granted
Application number
CN202010411177.6A
Other languages
Chinese (zh)
Other versions
CN111563513B (en)
Inventor
Zhu Ce (朱策)
Jiang Zeyu (姜泽宇)
Liu Yipeng (刘翼鹏)
Liu Xiaoning (刘晓宁)
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202010411177.6A
Publication of CN111563513A
Application granted
Publication of CN111563513B
Status: Active

Classifications

    • G06V 10/26, 10/267 — Segmentation of patterns in the image field; detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06F 18/24 — Pattern recognition; classification techniques
    • G06F 18/253 — Pattern recognition; fusion techniques of extracted features
    • G06N 3/045 — Neural networks; architectures, combinations of networks
    • G06N 3/048 — Neural networks; activation functions


Abstract

The invention belongs to the technical field of image processing, and particularly relates to a defocus blur detection method based on an attention mechanism. The network structure of the invention uses a channel attention mechanism to extract the relationships between feature channels from a global perspective, which effectively improves the expressive power of the features, and simultaneously applies a spatial attention mechanism that combines high-order semantic information to selectively extract low-order information. The invention addresses two important problems in blur detection: correctly classifying smooth, in-focus regions, and effectively suppressing the influence of cluttered backgrounds.

Description

Defocus blur detection method based on attention mechanism
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a defocus blur detection method based on an attention mechanism.
Background
Defocus blur detection is a basic image processing task whose purpose is to segment the blurred parts of a picture. Blur detection has many applications, such as image deblurring, blur enhancement, and depth estimation. Among current state-of-the-art defocus blur detection methods, the convolutional neural network (CNN) is the most common and effective tool. Compared with traditional blur detection methods based on hand-crafted features, a convolutional neural network can effectively extract deep semantic information and thereby greatly improve detection results. Deep semantic information can effectively locate the blurred region, while low-order features can be used to determine the edge information of the detected region. Existing neural-network blur detection methods fuse multi-level features by constructing larger and deeper networks so that the network obtains better feature representations. For example, Zhao et al. propose a bottom-up fully convolutional network that fuses low-level cues and high-level semantic information for defocus blur detection. Tang et al. propose DeFusionNet, which recurrently fuses and refines multi-level feature maps, and then combines these multi-level feature maps to obtain the final detection result.
Although current defocus blur detection methods based on deep neural networks can extract deep semantic information and thereby improve detection accuracy, existing network models do not fully exploit the feature representation capability of the convolutional neural network (CNN). They perform poorly on the two main problems of defocus blur detection: first, smooth, in-focus regions can be wrongly classified as blurred blocks; second, because background clutter affects the detection result, these networks misclassify low-contrast in-focus regions and defocused regions with strong background noise. Some defocus blur detection methods increase the feature representation capability of the network by constructing larger and deeper structures, but these structures cannot effectively extract the relationships between intermediate feature layers, which limits the discrimination capability of the convolutional neural network. In addition, these methods superimpose all low-order feature information without distinction; however, different low-order information does not contribute equally to the detection result, and some structural features of the background can degrade detection performance or even cause some regions to be classified incorrectly.
Disclosure of Invention
The invention aims to solve the above problems and provides a novel neural network structure that, through attention mechanisms, gives the network better discrimination ability and better suppression of background noise.
The technical scheme adopted by the invention, as shown in FIG. 1, is a defocus blur detection method based on an attention mechanism, comprising the following steps:
Step 1: feed the input picture into a pre-trained VGG-16 network and extract multi-level feature maps;
Step 2: divide the multi-level feature maps into two groups, one used as high-order features and the other as low-order features;
Step 3: send the high-order and low-order feature maps through a channel attention mechanism, respectively, to enhance the feature representation of the network and obtain better discrimination and learning ability;
Step 4: upsample the high-order feature map to the same size as the low-order feature map; then apply a spatial attention mechanism that, guided by high-order semantic information, weights the detail features at different positions in the low-order feature map, giving effective detail information a larger weight and suppressing the influence of background clutter;
Step 5: fuse the high-order and low-order feature maps of different levels together through cross-channel concatenation;
Step 6: further fuse the features through a convolution layer and obtain the final detection result after a Sigmoid function.
The beneficial effect of the method is that it targets two important problems in blur detection: correctly classifying smooth, in-focus regions, and effectively suppressing the influence of cluttered backgrounds. The former problem, failing to classify whether a region is defocused, arises mainly because the discrimination ability of the network is not strong enough; to improve this, a channel attention mechanism is used in the network structure to extract the connections among feature layers from a global perspective, which effectively improves the expressive power of the features. For the latter problem, existing methods use all low-order information indiscriminately. However, different detail information contributes differently to the detection result: only the detail at the boundary between sharp and blurred regions contributes the most, and low-order information from a strongly cluttered background may even cause blurred regions to be wrongly judged as sharp. Therefore, the invention applies a spatial attention mechanism and selectively extracts low-order information by combining high-order semantic information.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a schematic illustration of an attention mechanism process flow;
FIG. 3 is a schematic diagram of a spatial attention mechanism fused with low-level features;
FIG. 4 is a schematic diagram showing the comparison of the detection results of the detection method of the present invention with those of other detection methods;
FIG. 5 is a table comparing the MAE and F-measure evaluation criteria on the two public data sets DUT and Shi.
Detailed Description
The technical scheme of the invention is further described in detail with reference to the accompanying drawings.
In step 2 of the invention, the input picture is first sent to a VGG-16 network pre-trained on ImageNet to obtain the initial high-order and low-order feature maps. Specifically, the convolutional layers of VGG-16 are divided into two groups: conv1_2 and conv2_2 serve as the shallow network that extracts low-order information from the image, while conv3_3, conv4_3 and conv5_3 serve as the deep network that extracts high-order semantic information. Then an upsampling operation is applied to the feature maps of each group: conv2_2 is resized to the size of conv1_2, and conv4_3 and conv5_3 are resized to the size of conv3_3. This yields the initial high-order and low-order features. The high-order and low-order features are then sent, respectively, through a channel attention mechanism to extract the dependency relationships among the features.
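The grouping-and-resizing step can be sketched as follows. The channel counts are VGG-16's standard ones; the 64×64 input resolution and the nearest-neighbour interpolation are illustrative assumptions (the patent does not specify the interpolation method):

```python
import numpy as np

def upsample_nn(x, size):
    """Nearest-neighbour upsampling of a (C, H, W) feature map to (C, *size)."""
    c, h, w = x.shape
    th, tw = size
    rows = np.arange(th) * h // th      # source row for each target row
    cols = np.arange(tw) * w // tw      # source column for each target column
    return x[:, rows][:, :, cols]

# Hypothetical VGG-16 feature maps for a 64x64 input, shapes (C, H, W).
rng = np.random.default_rng(0)
feats = {
    "conv1_2": rng.random((64, 64, 64)),
    "conv2_2": rng.random((128, 32, 32)),
    "conv3_3": rng.random((256, 16, 16)),
    "conv4_3": rng.random((512, 8, 8)),
    "conv5_3": rng.random((512, 4, 4)),
}

# Low-order group: bring conv2_2 up to conv1_2's spatial size.
low_group = [feats["conv1_2"], upsample_nn(feats["conv2_2"], (64, 64))]
# High-order group: bring conv4_3 and conv5_3 up to conv3_3's spatial size.
high_group = [feats["conv3_3"],
              upsample_nn(feats["conv4_3"], (16, 16)),
              upsample_nn(feats["conv5_3"], (16, 16))]
```

After this step every map in a group shares one spatial size, so the later attention and fusion stages can operate per group.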
As shown in FIG. 2, in step 3, for an input feature map X ∈ R^(C×H×W) (where C is the number of channels, H the height and W the width of the feature map), the feature map is first reshaped into X1 ∈ R^(C×(H·W)). A matrix multiplication is then performed between X1 and its transpose X1^T ∈ R^((H·W)×C), and a softmax layer is applied to the product to obtain the attention map R ∈ R^(C×C):

r_ji = exp(X1_j · X1_i) / Σ_{i=1}^{C} exp(X1_j · X1_i),

where r_ji, the element in row j and column i of R, is the influence factor of the i-th channel on the j-th channel. The more similar two channel feature maps are, the stronger this connection. The attention map R is then matrix-multiplied with X1 to obtain an output of size C×(H·W), which is reshaped back to C×H×W. Finally, the output of the channel attention is multiplied by a scale coefficient α and superimposed on the original feature map through a residual connection to obtain the final output Y:

Y = α · reshape(R X1) + X.

The scale coefficient α is initialised to 0 at the beginning of training and gradually learns a suitable value during the training process. As the formula shows, the final output of the module is a weighted sum of all feature maps superimposed on the original input. Similar feature maps can reinforce each other, highlight regions of common interest and reduce variance. By considering the interrelationships between feature layers from a global perspective, the network achieves stronger discrimination.
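A minimal NumPy sketch of this channel-attention computation follows. The shapes and the residual form track the description; the row-wise softmax convention and the numerical-stability shift are implementation assumptions:

```python
import numpy as np

def channel_attention(x, alpha=0.0):
    """Channel attention: re-weight each channel by its similarity to all
    other channels, then add the result back through a residual connection.

    x     : input feature map of shape (C, H, W)
    alpha : learned scale coefficient, initialised to 0 in the patent
    """
    c, h, w = x.shape
    x1 = x.reshape(c, h * w)                      # X1 in R^(C x (H*W))
    energy = x1 @ x1.T                            # (C, C) channel similarities
    energy -= energy.max(axis=1, keepdims=True)   # stabilise the softmax
    r = np.exp(energy)
    r /= r.sum(axis=1, keepdims=True)             # attention map R, rows sum to 1
    out = (r @ x1).reshape(c, h, w)               # weighted sum of all channels
    return alpha * out + x                        # Y = alpha * reshape(R X1) + X
```

With α initialised to 0 the module starts as an identity mapping, so training begins from the unmodified backbone features and gradually learns how much cross-channel mixing to apply.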
As shown in FIG. 3, in step 4, let F_low ∈ R^(C×H×W) denote the low-order feature map and F_high ∈ R^(C×H×W) the upsampled high-order feature map. To extract global information and enlarge the receptive field without incurring too much computational cost, the invention applies two consecutive atrous (dilated) convolutions with 3×3 kernels and a dilation rate of 5 to the high-order feature map, and then uses a Sigmoid function to map the values into the [0, 1] interval; the result serves as the spatial attention map. The low-order feature map output by the module is the element-wise product of the input low-order feature map and the spatial attention map. Through this structure, the network explicitly and selectively extracts low-order detail information: detail that is more useful for the detection result receives a larger weight, while interference from the background is effectively suppressed.
In step 5, the high-order feature map is first upsampled to the same size as the low-order feature map. For defocus blur detection, high-order features can better locate the blurred block but lack detail information on irregular boundaries, while low-order features can be used to refine the detected boundaries but lack semantic information; features of different levels therefore need to be fused to obtain complementary information and thus optimise the detection result. Specifically, the high-order and low-order information is merged together through cross-channel concatenation to obtain the multi-level feature maps.
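The fusion and prediction steps can be sketched as follows. The patent only says "a convolution layer" followed by a Sigmoid, so the 1×1 kernel and the single output channel here are illustrative assumptions:

```python
import numpy as np

def fuse_and_predict(high, low, w, b=0.0):
    """Fuse high- and low-order feature maps by channel concatenation, then
    apply a 1x1 convolution (a per-pixel mix of channels) followed by a
    Sigmoid to produce the per-pixel blur probability map."""
    f = np.concatenate([high, low], axis=0)            # (C_high + C_low, H, W)
    logits = np.tensordot(w, f, axes=([0], [0])) + b   # 1x1 conv, one output channel
    return 1.0 / (1.0 + np.exp(-logits))               # probabilities in (0, 1)
```

The Sigmoid makes every pixel an independent blur probability, which matches the binary segmentation target of defocus blur detection.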
FIG. 4 compares the detection results produced by the method of the invention with those of other state-of-the-art defocus blur detection methods. It can be seen that the method of the invention accurately distinguishes smooth, in-focus regions and effectively suppresses interference from background noise.
The table in FIG. 5 compares the MAE (smaller is better) and F-measure (larger is better) evaluation criteria on the two public data sets DUT and Shi. It can be seen that the defocus blur detection method of the invention achieves the best performance in multiple respects, which demonstrates its effectiveness.

Claims (1)

1. The defocus blur detection method based on the attention mechanism is characterized by comprising the following steps of:
s1, inputting the picture into a pre-trained VGG-16 network, and extracting a multi-level feature map;
s2, dividing the multi-level feature graphs into two types, wherein one type is used as a high-level feature graph, and the other type is used as a low-level feature graph; the method specifically comprises the following steps: dividing convolutional layers of the VGG-16 network into two types, taking conv1_2 and conv2_2 as shallow networks to extract low-order information in the image, namely defining feature maps extracted by conv1_2 and conv2_2 as low-order feature maps; the conv3_3, conv4_3 and conv5_3 serve as deep network extraction image high-order information, namely feature maps extracted by the conv3_3, conv4_3 and conv5_3 are defined as high-order feature maps; then, using up-sampling operation on the low-order and high-order feature maps respectively, changing the feature map extracted by con2_2 into the same size as the feature map extracted by conv1_2, and changing the feature maps extracted by conv4_3 and conv5_3 into the same size as the feature map extracted by conv3_3, thereby obtaining an initial low-order feature map and a high-order feature map;
s3, respectively enabling the low-order characteristic diagram and the high-order characteristic diagram to pass through a channel attention mechanism to obtain a low-order attention characteristic diagram and a high-order attention characteristic diagram; the processing method of the channel attention mechanism comprises the following steps: for input feature maps
Figure FDA0002493310040000011
Wherein C represents the number of channels, H represents the length of the feature map, and W represents the width of the feature map, and the feature map is firstly deformed into
Figure FDA0002493310040000012
Then for x1And its device
Figure FDA0002493310040000013
Matrix multiplication is carried out, finally, a softmax layer is used for the multiplication result to obtain an attention characteristic graph R,
Figure FDA0002493310040000014
wherein r isjiRepresenting the value of the ith element in the jth row and the ith column in the attention feature map R, namely the influence factor of the ith channel on the jth channel;
transpose of input feature graph XTPerforming matrix multiplication with the characteristic diagram R to obtain the size of
Figure FDA0002493310040000015
An output of (d);
the final output Y is obtained by multiplying the output of the channel attention by a proportionality coefficient alpha and superposing the product on the original characteristic diagram in a residual error connection mode:
Figure FDA0002493310040000016
the proportionality coefficient alpha is initialized to 0 when training is started, and then is updated through the training process;
defining a feature map obtained after the low-order feature map passes through a channel attention mechanism as a low-order attention feature map, and defining a feature map obtained after the high-order feature map passes through the channel attention mechanism as a high-order attention feature map;
s4, the size of the obtained high-order attention feature graph is changed to be the same as that of the low-order attention feature graph through an upsampling operation, then the obtained high-order attention feature graph is subjected to hollow convolution with two continuous convolution kernels of 3 x 3 and an expansion rate of 5, and an output value is mapped into a [0,1] interval by using a Sigmoid function to obtain a space attention feature graph;
s5, fusing the space attention feature map and the low-order attention feature map through cross-channel connection to the obtained space attention feature map, and obtaining a fused low-order feature map;
and S6, further fusing the fused low-order feature map and the spatial attention feature map through a convolution layer, and obtaining a final detection result through a Sigmoid function.
CN202010411177.6A 2020-05-15 2020-05-15 Defocus blur detection method based on attention mechanism Active CN111563513B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010411177.6A CN111563513B (en) 2020-05-15 2020-05-15 Defocus blur detection method based on attention mechanism


Publications (2)

Publication Number Publication Date
CN111563513A (en) 2020-08-21
CN111563513B (en) 2022-06-24

Family

ID=72072132

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010411177.6A Active CN111563513B (en) 2020-05-15 2020-05-15 Defocus blur detection method based on attention mechanism

Country Status (1)

Country Link
CN (1) CN111563513B (en)


Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103996198A (en) * 2014-06-04 2014-08-20 天津工业大学 Method for detecting region of interest in complicated natural environment
KR101649185B1 (en) * 2015-02-27 2016-08-18 서울대학교 산학협력단 Method and apparatus for calculating visual attention score
US20170124432A1 (en) * 2015-11-03 2017-05-04 Baidu Usa Llc Systems and methods for attention-based configurable convolutional neural networks (abc-cnn) for visual question answering
CN109872306A (en) * 2019-01-28 2019-06-11 腾讯科技(深圳)有限公司 Medical image cutting method, device and storage medium
CN110084210A (en) * 2019-04-30 2019-08-02 电子科技大学 The multiple dimensioned Ship Detection of SAR image based on attention pyramid network
CN110287960A (en) * 2019-07-02 2019-09-27 中国科学院信息工程研究所 The detection recognition method of curve text in natural scene image
CN110490189A (en) * 2019-07-04 2019-11-22 上海海事大学 A kind of detection method of the conspicuousness object based on two-way news link convolutional network
US20190362199A1 (en) * 2018-05-25 2019-11-28 Adobe Inc. Joint blur map estimation and blur desirability classification from an image
CN110648334A (en) * 2019-09-18 2020-01-03 中国人民解放军火箭军工程大学 Multi-feature cyclic convolution saliency target detection method based on attention mechanism
CN111079584A (en) * 2019-12-03 2020-04-28 东华大学 Rapid vehicle detection method based on improved YOLOv3


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CHANG TANG: "BR2Net: defocus blur detection via a bidirectional channel attention residual refining network", IEEE Transactions on Multimedia *
XUEWEI WANG: "Accurate and fast blur detection using a pyramid M-shaped deep neural network", IEEE Access *
ZHOU SHUANGSHUANG et al. (周双双等): "Deep correlation tracking based on enhanced semantics and multi-attention mechanism learning", Computer Engineering (《计算机工程》) *
MA SENQUAN et al. (麻森权等): "Improved small-object detection algorithm based on attention mechanism and feature fusion", Computer Applications and Software (《计算机应用与软件》) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112137613A (en) * 2020-09-01 2020-12-29 沈阳东软智能医疗科技研究院有限公司 Method and device for determining abnormal position, storage medium and electronic equipment
CN112137613B (en) * 2020-09-01 2024-02-02 沈阳东软智能医疗科技研究院有限公司 Determination method and device of abnormal position, storage medium and electronic equipment
CN113298154A (en) * 2021-05-27 2021-08-24 安徽大学 RGB-D image salient target detection method
CN113298154B (en) * 2021-05-27 2022-11-11 安徽大学 RGB-D image salient object detection method

Also Published As

Publication number Publication date
CN111563513B (en) 2022-06-24


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant