CN113344005B - Image edge detection method based on optimized small-scale features
- Publication number: CN113344005B
- Application number: CN202110518411.XA
- Authority: CN (China)
- Prior art keywords: scale, edge detection, edge, feature, features
- Prior art date: 2021-05-12
- Legal status: Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/251—Fusion techniques of input or preprocessed data
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention discloses an image edge detection method based on optimized small-scale features. A top-down attention guide module adaptively learns an attention mechanism from features of different scales and uses the semantic information of high-level features to guide bottom-layer feature extraction and edge prediction. In addition, a pixel-level fusion module is designed that predicts confidence scores for edge pixels at different scales and fuses the network's edge detection results across scales at the pixel level, further improving the fineness of the detected edges. The method effectively resolves the inaccurate edge localization and under-utilization of small-scale features common in deep-learning-based image edge detection algorithms, yields finer edge detection results, better preserves the semantic information carried by the edges, strongly suppresses false-edge detections, and effectively improves the performance of the image edge detection algorithm.
Description
Technical Field
The invention belongs to the technical field of image edge detection, and particularly relates to an image edge detection method based on optimized small-scale features.
Background
As a fundamental computer vision task, image edge detection has been studied extensively and is widely used in other vision tasks. Conventional edge detection algorithms, however, are susceptible to noise and detect many false edges. With the rapid development of deep learning in recent years, deep-learning-based edge detection algorithms have surpassed traditional algorithms, markedly improving detection accuracy, and have gradually become the mainstream approach in the field. Nevertheless, edges, as a generic low-level feature, exhibit multi-scale characteristics; at the same time, edge detection is a pixel-level task with demanding requirements on pixel localization and semantics, so the task still faces considerable challenges. Early work addressed the task mainly with local image features: common edge patterns were first obtained by clustering, the whole image was then divided into patches, features were extracted by a neural network to classify the edge type of each patch, and finally the local detection results were stitched into an edge detection result. However, such methods use only local image information, and the number of predefined patterns is limited, which greatly restricts model performance. In view of this, researchers proposed edge detection methods based on global information. These methods model edge detection as a semantic segmentation task and achieve end-to-end detection by building holistically nested, deeply supervised network models; they make full use of the semantic information in high-level network features, greatly narrow the gap between algorithmic and human performance, and have gradually become the mainstream approach at the present stage.
Currently, edge detection methods based on global information can be further divided into two categories. The first is based on the holistically nested edge detection network model: through a deep supervision mechanism, richer multi-scale information is extracted from the network to improve edge detection at each scale, and the per-scale edge predictions are then weighted and averaged into the final result. The second is based on the feature pyramid network architecture: side connections fuse features of different scales so as to balance the semantic and positional information they carry and achieve fine edge detection. Both categories, however, usually overlook the network's insufficient ability to learn low-level features, so the shallow-layer predictions contain many false edges, which greatly limits detection accuracy. The specific reasons can be summarized in two respects. First, small-scale features have high resolution and provide accurate spatial position information, but existing methods assign the small-scale predictions little weight in the final result, so the detected edges are coarse and the final result lacks accurate position information. Second, edge features are very sensitive to scale, so edges predicted from features of the same scale usually have different confidences, yet existing methods treat them as equally confident, which greatly limits the expressive power of the model.
Disclosure of Invention
In view of the above, the present invention provides an image edge detection method based on optimized small-scale features. The method builds an edge detection network model around optimized small-scale features: a top-down attention guide module and a pixel-level fusion module greatly improve the network's ability to learn small-scale features and sharpen the edge pixels detected by existing models. The method is the first to use a long short-term memory (LSTM) network to adaptively learn a top-down attention guide module for the edge detection task, aiming to exploit the rich semantic information in large-scale edge predictions to guide low-level feature learning and edge pixel localization. In addition, the method provides a pixel-level fusion module that fuses the edge detection results at different scales pixel by pixel, effectively raising the confidence of the detected edge pixels. The method effectively resolves the inaccurate edge localization and under-utilization of small-scale features common in deep-learning-based image edge detection algorithms, yields finer edge detection results, and effectively improves edge detection performance.
In order to achieve the purpose of the invention, the following technical scheme is adopted. An image edge detection method based on optimized small-scale features comprises the following steps:
(1) in the backbone network, extract rich multi-scale features at each network layer with a scale feature enhancement module, obtaining a feature map and a confidence score map which, after upsampling, serve as inputs to the attention guide module and the pixel-level fusion module;
(2) establish a top-down attention guide module based on a long short-term memory (LSTM) network model, enriching the semantic information of small-scale features with the edge predictions made at large scales, thereby guiding the network to learn small-scale features and enhancing their representational power;
(3) fuse the edge detection results at different scales pixel by pixel with the pixel-level fusion module, extracting the edge pixels with the highest confidence as the final detection result.
The method comprises the following concrete steps:
step 1), input a natural scene image and extract convolutional feature maps of different spatial resolutions with a convolutional neural network;
step 2), further extract multi-scale features from each convolutional feature map with the scale feature enhancement module, obtaining a feature map and a confidence score map at each scale;
step 3), upsample the feature maps and confidence score maps with deconvolution layers, restoring them to the size of the original image, and use them respectively as input features of the attention guide module and the pixel-level fusion module (a sketch of steps 1)–3) follows this list);
step 4), take the feature maps upsampled in step 3) as input to the attention guide module and build long short-term memory (LSTM) network units sequentially in a top-down manner; each unit guides and enhances feature learning at the current scale according to the current features and the high-level semantic information contained in the historical predictions, predicts the edges at the current scale to obtain a preliminary edge prediction, and passes that prediction to the next unit as a hidden state;
step 5), supervise the preliminary edge detection result produced by the LSTM unit at each scale so as to improve the quality of the features learned at each scale;
step 6), feed the upsampled confidence score maps from step 3) at all scales into the pixel-level fusion module for fusion and dimensionality reduction, compute the confidence of each pixel of the preliminary edge prediction at each scale with a softmax function, and weight the per-scale predictions by pixel-wise multiplication to obtain the pixel-level fused edge detection result;
step 7), finally, fine-tune the edge detection result through a skip connection and weighted fusion to obtain a more robust edge detection result as the final output of the whole method.
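As a hedged illustration of steps 1)–3), the following PyTorch sketch extracts convolutional feature maps at five scales and restores each to the input resolution with a deconvolution layer. The VGG16 backbone, the stage split, and all names are assumptions introduced for this sketch, not details fixed by the method.

```python
import torch.nn as nn
import torchvision

class MultiScaleBackbone(nn.Module):
    """Sketch of steps 1)-3): five backbone stages yield feature maps at
    decreasing resolutions; a deconvolution restores each to input size."""
    def __init__(self):
        super().__init__()
        vgg = torchvision.models.vgg16(weights=None).features  # untrained, illustrative
        # one stage per scale: full, 1/2, 1/4, 1/8, 1/16 resolution
        self.stages = nn.ModuleList([vgg[:4], vgg[4:9], vgg[9:16],
                                     vgg[16:23], vgg[23:30]])
        chans = [64, 128, 256, 512, 512]          # channels after each stage
        # stage i is downsampled by 2**i, so deconvolve by the same factor
        self.deconvs = nn.ModuleList([
            nn.Identity() if i == 0 else
            nn.ConvTranspose2d(c, c, kernel_size=2 ** i, stride=2 ** i)
            for i, c in enumerate(chans)
        ])

    def forward(self, x):
        feats = []
        for stage, deconv in zip(self.stages, self.deconvs):
            x = stage(x)
            feats.append(deconv(x))  # each map restored to input resolution
        return feats
```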
Further, the scale feature enhancement module in step 2) proceeds as follows: it takes the feature map of the current scale as input, obtains multi-channel image features through a 3 × 3 convolution, extracts features at different scales with several parallel dilated (atrous) convolutions of different dilation rates, and then feeds the extracted multi-scale features into convolution layers with 1 × 1 kernels for feature fusion and dimension reduction, yielding the feature map and the confidence score map at each scale.
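A minimal sketch of such a scale feature enhancement module follows; the four dilation rates, channel counts, and all names are assumptions for illustration, not values fixed by the text above.

```python
import torch
import torch.nn as nn

class ScaleFeatureEnhancement(nn.Module):
    """Sketch: 3x3 entry conv -> parallel dilated convs -> two 1x1 convs
    emitting a feature map and a confidence score map."""
    def __init__(self, in_ch, mid_ch=32, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.entry = nn.Conv2d(in_ch, mid_ch, 3, padding=1)
        self.branches = nn.ModuleList([
            nn.Conv2d(mid_ch, mid_ch, 3, padding=d, dilation=d)
            for d in dilations
        ])
        fused = mid_ch * len(dilations)
        self.to_feat = nn.Conv2d(fused, mid_ch, 1)  # fused feature map
        self.to_score = nn.Conv2d(fused, 1, 1)      # confidence score map
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.entry(x))
        ms = torch.cat([self.relu(b(x)) for b in self.branches], dim=1)
        return self.to_feat(ms), self.to_score(ms)
```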
Further, the attention guide module in step 4) has the following concrete structure:
41) first, a convolution layer with a 1 × 1 kernel is applied to the highest-level features to output a preliminary edge prediction, which is supervised and serves as the hidden-state input of the subsequent long short-term memory (LSTM) network unit;
42) an LSTM network unit is built at each scale in a top-down manner to guide small-scale feature learning; each unit receives the previous unit's edge prediction and hidden state together with the feature map at the current scale as input, and, through its three gating mechanisms (forget gate, input gate, and output gate), combines the historical information in the hidden state with the semantic information in the previous unit's prediction to guide feature learning at the current scale, adjusting and enhancing the feature representation learned there; it then outputs a preliminary edge detection result at the current scale, which is supervised.
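To make the gating concrete, the following convolutional-LSTM-style sketch shows one plausible form of such a unit. The explicit cell state, the 3 × 3 gate kernels, and the channel widths are assumptions; the text above only fixes the inputs (previous edge prediction, hidden state, current-scale features), the three gates, and the supervised per-scale edge output.

```python
import torch
import torch.nn as nn

class AttentionGuideUnit(nn.Module):
    """Sketch of one top-down attention guide unit: gates combine the
    current-scale features with the hidden state and edge prediction
    handed down from the larger-scale unit above."""
    def __init__(self, feat_ch=32, hid_ch=32):
        super().__init__()
        gate_in = feat_ch + hid_ch + 1  # features + hidden state + edge map
        self.forget = nn.Conv2d(gate_in, hid_ch, 3, padding=1)
        self.input_ = nn.Conv2d(gate_in, hid_ch, 3, padding=1)
        self.cand = nn.Conv2d(gate_in, hid_ch, 3, padding=1)
        self.output = nn.Conv2d(gate_in, hid_ch, 3, padding=1)
        self.predict = nn.Conv2d(hid_ch, 1, 1)  # preliminary edge map

    def forward(self, feat, h_prev, c_prev, edge_prev):
        z = torch.cat([feat, h_prev, edge_prev], dim=1)
        f = torch.sigmoid(self.forget(z))            # forget gate
        i = torch.sigmoid(self.input_(z))            # input gate
        o = torch.sigmoid(self.output(z))            # output gate
        c = f * c_prev + i * torch.tanh(self.cand(z))
        h = o * torch.tanh(c)                        # enhanced features
        edge = torch.sigmoid(self.predict(h))        # supervised per scale
        return h, c, edge
```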
Furthermore, a deep supervision mechanism is adopted to supervise the preliminary edge prediction results; this mechanism drives the network layer at each scale to extract feature expressions that are more effective for edge prediction. The loss is computed as in formula (1):

L(P, Y) = −α·Σ_{y_j∈Y+} log p_j − β·Σ_{y_j∈Y−} log(1 − p_j)  (1)

Specifically, P = {p_i | i = 1, 2, …, |P|} denotes the edge detection result predicted by the network, Y = {y_i | i = 1, 2, …, |Y|} denotes the ground-truth edge labels, Y+ = {y_j | y_j ≥ γ} denotes the set of positively labeled edge pixels, and Y− = {y_j | y_j = 0} denotes the set of negatively labeled edge pixels; α = |Y−|/(|Y+|+|Y−|) and β = λ·|Y+|/(|Y+|+|Y−|) are weight coefficients that balance the discrepancy caused by the imbalance of positive and negative samples, and λ is an empirical parameter that further controls the weight difference between positive and negative samples.
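A hedged PyTorch sketch of the class-balanced loss of formula (1) follows; the threshold γ = 0.5 and λ = 1.1 are placeholder values, not values stated in the patent.

```python
import torch

def balanced_edge_loss(pred, label, gamma=0.5, lam=1.1):
    """Class-balanced cross-entropy in the spirit of formula (1).
    pred: sigmoid edge probabilities, shape (N, 1, H, W);
    label: ground-truth edge map in [0, 1];
    gamma, lam: assumed values for the threshold and lambda."""
    pos = (label >= gamma).float()  # Y+ : positively labeled pixels
    neg = (label == 0).float()      # Y- : negatively labeled pixels
    n_pos, n_neg = pos.sum(), neg.sum()
    alpha = n_neg / (n_pos + n_neg)       # weight on positive terms
    beta = lam * n_pos / (n_pos + n_neg)  # weight on negative terms
    eps = 1e-6                            # numerical stability
    loss = -(alpha * pos * torch.log(pred + eps)
             + beta * neg * torch.log(1.0 - pred + eps))
    return loss.sum()  # pixels with 0 < label < gamma contribute nothing
```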
Further, the pixel-level fusion module in step 6) has the following concrete structure:
61) the confidence score maps obtained at different scales are stacked along the channel dimension; they are first fused by a 3 × 3 convolution layer, the fused confidence score map is then reduced in dimension by a 1 × 1 convolution layer, and the result is finally normalized with a softmax function to obtain as many confidence scores as there are edge prediction maps across scales. These scores represent the confidence of the pixels at each position at each scale and are computed as:

W = softmax(S_1, S_2, …, S_n)  (2)

where S_i denotes the confidence score map of the pixels at every spatial position at the i-th scale, and W denotes the pixel-level fusion weights obtained through the softmax function;
62) the per-scale pixel confidences obtained in the previous step are used as weights to multiply the edge detection results of the corresponding scales pixel by pixel, yielding the fused edge detection result, as in formula (3):

P′ = Σ_{i=1}^{n} W_i ⊙ P_i  (3)

where ⊙ denotes element-wise multiplication, P_i denotes the edge detection result at the i-th scale, and Cat(·) denotes the channel stacking used to assemble S_1, …, S_n in formula (2).
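A minimal sketch of this pixel-level fusion, with the softmax taken over the scale (channel) dimension; the kernel sizes follow the text above, everything else (names, shapes) is assumed.

```python
import torch
import torch.nn as nn

class PixelLevelFusion(nn.Module):
    """Sketch: stack per-scale confidence score maps, fuse (3x3 conv),
    reduce (1x1 conv), softmax over scales -- formula (2) -- then weight
    the per-scale edge maps pixel by pixel -- formula (3)."""
    def __init__(self, n_scales):
        super().__init__()
        self.fuse = nn.Conv2d(n_scales, n_scales, 3, padding=1)
        self.reduce = nn.Conv2d(n_scales, n_scales, 1)

    def forward(self, scores, edges):
        # scores, edges: lists of n_scales maps, each of shape (N, 1, H, W)
        s = torch.cat(scores, dim=1)                         # Cat(S_1..S_n)
        w = torch.softmax(self.reduce(self.fuse(s)), dim=1)  # formula (2)
        p = torch.cat(edges, dim=1)
        return (w * p).sum(dim=1, keepdim=True)              # formula (3)
```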
Further, the skip connection and weighted fusion in step 7) are computed as:

P_e = Conv_{1×1}(Cat(P′, P_1, P_2, …, P_n))  (4)

where Conv_{1×1}(·) denotes a 1 × 1 convolution operation, the skip connection is concretely expressed as Cat(P′, P_1, P_2, …, P_n), and Cat(·) denotes channel stacking; P_e is finally output as the final edge detection result.
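Formula (4) amounts to a single 1 × 1 convolution over the channel-stacked predictions; a short sketch (names assumed):

```python
import torch
import torch.nn as nn

class SkipFusion(nn.Module):
    """Sketch of formula (4): stack P' with P_1..P_n, fuse with Conv1x1."""
    def __init__(self, n_scales):
        super().__init__()
        self.conv1x1 = nn.Conv2d(n_scales + 1, 1, 1)

    def forward(self, p_fused, preliminary):
        # p_fused: P' from the pixel-level fusion module, (N, 1, H, W)
        # preliminary: list [P_1, ..., P_n] from the attention guide module
        stacked = torch.cat([p_fused, *preliminary], dim=1)  # Cat(P', P_1..P_n)
        return torch.sigmoid(self.conv1x1(stacked))          # P_e
```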
Compared with the prior art, the invention has the following advantages and beneficial effects:
1) by optimizing the extraction and learning of small-scale features, the method makes full use of the more accurate position information they contain and effectively improves the localization accuracy of edge pixels;
2) by fusing the edge detection results predicted at different scales and retaining the edge pixels with the highest confidence, the invention jointly accounts for the position and semantic information of edges, effectively suppresses false-edge detections, and finally obtains visually finer and more robust edge detection results.
Drawings
FIG. 1 is a schematic flow chart of the practice of the present invention.
FIG. 2 is a structural diagram of the top-down attention guide module according to an embodiment of the present invention.
FIG. 3 is a diagram of a pixel level fusion module according to an embodiment of the invention.
FIG. 4 is a diagram illustrating an image edge detection effect according to an embodiment of the present invention.
Detailed Description
To help those of ordinary skill in the art understand and implement the present invention, it is described in further detail below with reference to the accompanying drawings and examples. It should be understood that the embodiments described here merely illustrate and explain the invention and do not limit it.
As shown in FIG. 1, the technical solution adopted by the present invention is an image edge detection method based on optimized small-scale features, comprising the following steps:
1) input a natural scene image and extract convolutional feature maps of different spatial resolutions with a convolutional neural network; in this embodiment, convolutional feature maps at 5 scales are obtained;
2) further extract multi-scale features from each convolutional feature map with the scale feature enhancement module. The module takes the feature map of the current scale as input, obtains 32-channel image features through a 3 × 3 convolution, extracts features at different scales with several parallel dilated convolutions of different dilation rates, and then feeds the extracted multi-scale features into two convolution layers with 1 × 1 kernels for feature fusion and dimension reduction, yielding the feature map and the confidence score map at each scale;
3) upsample the feature maps and confidence score maps obtained in the previous step with deconvolution layers, restoring them to the size of the original image, and use them respectively as input features of the attention guide module and the pixel-level fusion module;
4) take the feature maps upsampled in step 3) as input to the attention guide module and build long short-term memory (LSTM) network units sequentially in a top-down manner. Each unit guides and enhances feature learning at the current scale according to the current features and the high-level semantic information contained in the historical predictions; at the same time, each unit predicts the edges at the current scale to obtain a preliminary edge prediction, which is passed to the next unit as a hidden state;
5) supervise the preliminary edge detection result produced by the LSTM unit at each scale so as to improve the quality of the features learned at each scale;
6) feed the upsampled confidence score maps from step 3) at all scales into the pixel-level fusion module for fusion and dimensionality reduction, compute the confidence of each pixel position of the preliminary edge prediction at each scale with a softmax function, and weight the per-scale preliminary predictions by pixel-wise multiplication to obtain the pixel-level fused edge detection result;
7) finally, fine-tune the edge detection result through the skip connection and weighted fusion to obtain a more robust edge detection result as the final output of the whole method. The results show superior performance in detecting edge details.
FIG. 2 shows the structure of the top-down attention guide module; the specific details are as follows:
1) after the feature expressions at the different scales are obtained, a convolution layer with a 1 × 1 kernel is first applied to the highest-level features to output an edge prediction, which is supervised and serves as the hidden-state input of the subsequent long short-term memory (LSTM) network unit;
2) LSTM network units are built at each scale in a top-down manner to guide small-scale feature learning. Each unit receives the previous unit's edge prediction and hidden state together with the feature map at the current scale as input; through its three gating mechanisms (forget gate, input gate, and output gate), it combines the historical information in the hidden state with the semantic information in the previous unit's prediction to guide feature learning at the current scale, adjusts and enhances the feature representation learned there, outputs a preliminary edge detection result at the current scale, and supervises it;
3) a deep supervision mechanism supervises the prediction of each LSTM unit, i.e., a loss against the ground-truth labels is computed to train the model parameters. This mechanism drives the network layer at each scale to extract feature expressions that are more effective for edge prediction, using formula (1) above.
The symbols are as in formula (1): P = {p_i | i = 1, 2, …, |P|} denotes the edge detection result predicted by the network, Y = {y_i | i = 1, 2, …, |Y|} the ground-truth edge labels, Y+ = {y_j | y_j ≥ γ} the set of positively labeled edge pixels, and Y− = {y_j | y_j = 0} the set of negatively labeled edge pixels; α = |Y−|/(|Y+|+|Y−|) and β = λ·|Y+|/(|Y+|+|Y−|) are weight coefficients balancing the imbalance of positive and negative samples, and λ is an empirical parameter that further controls the weight difference between them.
As shown in FIG. 3, the specific details of the pixel-level fusion module are as follows:
1) first, the confidence score maps obtained at different scales are stacked along the channel dimension and fused with a 3 × 3 convolution layer. On this basis, the fused confidence score map is reduced in dimension with a 1 × 1 convolution layer;
2) the confidence score maps obtained after these operations are normalized with a softmax function, yielding as many fused confidence score maps as there are edge prediction maps across scales. The scores represent the confidence of the pixels at each spatial position at each scale:

W = softmax(S_1, S_2, …, S_n)  (2)

where S_i denotes the confidence score map of the pixels at every spatial position at the i-th scale, and W denotes the pixel-level fusion weights obtained through the softmax function;
3) further, the per-scale pixel confidences obtained in the previous step are used as weights to multiply the edge detection results of the corresponding scales element by element, yielding a finely fused edge detection result and further improving edge localization accuracy, as in formula (3):

P′ = Σ_{i=1}^{n} W_i ⊙ P_i  (3)

where ⊙ denotes element-wise multiplication and P_i denotes the edge detection result at the i-th scale;
4) in practical use, to improve the robustness of the edge detection result, the method further introduces a skip connection and weighted fusion after the pixel-level fusion module: all preliminary edge detection results P_1, P_2, …, P_n of the attention guide module are channel-stacked with the prediction P′ of the pixel-level fusion module and fused with learned weights, and the computed result P_e is output as the final edge detection result of the method. The skip connection and weighted fusion step is computed as:

P_e = Conv_{1×1}(Cat(P′, P_1, P_2, …, P_n))  (4)

where Conv_{1×1}(·) denotes a 1 × 1 convolution operation, the skip connection is concretely expressed as Cat(P′, P_1, P_2, …, P_n), and Cat(·) denotes channel stacking.
FIG. 4 shows the effect of the image edge detection method proposed by the present invention. The method effectively resolves the inaccurate edge localization and under-utilization of small-scale features common in image edge detection, detects edge pixels more accurately while suppressing false edges, better preserves the semantic information carried by the edges, and effectively improves the performance of the image edge detection algorithm.
It should be understood that the above description of the preferred embodiments is given for clarity and not for any purpose of limitation, and that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (6)
1. An image edge detection method based on optimized small-scale features is characterized by comprising the following steps:
step 1), inputting a natural scene image and extracting convolutional feature maps of different spatial resolutions with a convolutional neural network;
step 2), further extracting multi-scale features from each convolutional feature map with a scale feature enhancement module to obtain a feature map and a confidence score map at each scale;
step 3), upsampling the feature maps and confidence score maps with deconvolution layers, restoring them to the size of the original image, and using them respectively as input features of an attention guide module and a pixel-level fusion module;
step 4), taking the feature maps upsampled in step 3) as input to the attention guide module and building long short-term memory (LSTM) network units sequentially in a top-down manner, wherein each unit guides and enhances feature learning at the current scale according to the current features and the high-level semantic information contained in the historical predictions, predicts the edges at the current scale to obtain a preliminary edge prediction, and passes that prediction to the next unit as a hidden state;
step 5), supervising the preliminary edge detection result produced by the LSTM unit at each scale so as to improve the quality of the features learned at each scale;
step 6), feeding the upsampled confidence score maps from step 3) at all scales into the pixel-level fusion module for fusion and dimensionality reduction, computing the confidence of each pixel of the preliminary edge prediction at each scale with a softmax function, and weighting the per-scale predictions by pixel-wise multiplication to obtain the pixel-level fused edge detection result;
step 7), finally, fine-tuning the edge detection result through a skip connection and weighted fusion to obtain a more robust edge detection result as the final output of the whole method.
2. The image edge detection method based on optimized small-scale features as claimed in claim 1, characterized in that: the scale feature enhancement module in step 2) takes the feature map of the current scale as input, obtains multi-channel image features through a 3 × 3 convolution, extracts features at different scales with several parallel dilated convolutions of different dilation rates, and then feeds the extracted multi-scale features into convolution layers with 1 × 1 kernels for feature fusion and dimension reduction, obtaining the feature map and the confidence score map at each scale.
3. The image edge detection method based on optimized small-scale features as claimed in claim 1, characterized in that the attention guide module in step 4) has the following concrete structure:
41) first, a convolution layer with a 1 × 1 kernel is applied to the highest-level features to output a preliminary edge prediction, which is supervised and serves as the hidden-state input of the subsequent long short-term memory (LSTM) network unit;
42) an LSTM network unit is built at each scale in a top-down manner to guide small-scale feature learning; each unit receives the previous unit's edge prediction and hidden state together with the feature map at the current scale as input, and, through its three gating mechanisms (forget gate, input gate, and output gate), combines the historical information in the hidden state with the semantic information in the previous unit's prediction to guide feature learning at the current scale, adjusting and enhancing the feature representation learned there; it then outputs a preliminary edge detection result at the current scale, which is supervised.
4. The image edge detection method based on optimized small-scale features as claimed in claim 1 or 3, characterized in that: a deep supervision mechanism is adopted to supervise the preliminary edge prediction results, computed as in formula (1), which drives the network layer at each scale to extract feature expressions that are more effective for edge prediction:

L(P, Y) = −α·Σ_{y_j∈Y+} log p_j − β·Σ_{y_j∈Y−} log(1 − p_j)  (1)

where P = {p_i | i = 1, 2, …, |P|} denotes the edge detection result predicted by the network, Y = {y_i | i = 1, 2, …, |Y|} denotes the ground-truth edge labels, Y+ = {y_j | y_j ≥ γ} denotes the set of positively labeled edge pixels, and Y− = {y_j | y_j = 0} denotes the set of negatively labeled edge pixels; α = |Y−|/(|Y+|+|Y−|) and β = λ·|Y+|/(|Y+|+|Y−|) are weight coefficients that balance the discrepancy caused by the imbalance of positive and negative samples, and λ is an empirical parameter that further controls the weight difference between positive and negative samples.
5. The image edge detection method based on optimized small-scale features as claimed in claim 1, characterized in that the pixel-level fusion module in step 6) has the following concrete structure:
61) the confidence score maps obtained at different scales are stacked along the channel dimension; they are first fused by a 3 × 3 convolution layer, the fused confidence score map is then reduced in dimension by a 1 × 1 convolution layer, and the result is finally normalized with a softmax function to obtain as many confidence scores as there are edge prediction maps across scales, the scores representing the confidence of the pixels at each position at each scale:

W = softmax(S_1, S_2, …, S_n)  (2)

where S_i denotes the confidence score map of the pixels at every spatial position at the i-th scale, and W denotes the pixel-level fusion weights obtained through the softmax function;
62) the per-scale pixel confidences obtained in the previous step are used as weights to multiply the edge detection results of the corresponding scales pixel by pixel, yielding the fused edge detection result, as in formula (3):

P′ = Σ_{i=1}^{n} W_i ⊙ P_i  (3)

where ⊙ denotes element-wise multiplication and P_i denotes the edge detection result at the i-th scale.
6. The image edge detection method based on optimized small-scale features as claimed in claim 5, characterized in that: the skip connection and weighted fusion in step 7) are computed as:

P_e = Conv_{1×1}(Cat(P′, P_1, P_2, …, P_n))  (4)

where Conv_{1×1}(·) denotes a 1 × 1 convolution operation, the skip connection is concretely expressed as Cat(P′, P_1, P_2, …, P_n), and Cat(·) denotes channel stacking; P_e is finally output as the final edge detection result.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202110518411.XA | 2021-05-12 | 2021-05-12 | Image edge detection method based on optimized small-scale features

Publications (2)

Publication Number | Publication Date
---|---
CN113344005A | 2021-09-03
CN113344005B | 2022-04-15
Families Citing this family (1)

Publication Number | Priority Date | Publication Date | Assignee | Title
---|---|---|---|---
CN113781510B | 2021-09-15 | 2024-07-19 | 上海金仕达软件科技股份有限公司 | Edge detection method and device and electronic equipment
Citations (2)

Publication Number | Priority Date | Publication Date | Assignee | Title
---|---|---|---|---
CN108446696A | 2018-02-09 | 2018-08-24 | 杭州雄迈集成电路技术有限公司 | An end-to-end license plate recognition method based on deep learning
CN112580661A | 2020-12-11 | 2021-03-30 | 江南大学 | Multi-scale edge detection method under deep supervision

Non-Patent Citations (2)

- Xiaoyan Wei et al., "Developing an image manipulation detection algorithm based on edge detection and faster R-CNN," MDPI, 2019-10-01.
- Wu Yiquan et al., "Research progress of vision-based lane line detection methods," Chinese Journal of Scientific Instrument, vol. 40, no. 12, 2019.
Legal Events

Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant