CN117809048A - Intelligent image edge extraction system and method - Google Patents


Info

Publication number
CN117809048A
Authority
CN
China
Prior art keywords
edge
image
module
scale
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311857751.0A
Other languages
Chinese (zh)
Inventor
赵建勇
黄利星
孙丹枫
陈佰平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority: CN202311857751.0A
Publication: CN117809048A
Legal status: Pending


Abstract

The invention discloses an intelligent image edge extraction system and method. An image input module processes input image data into an output image P with a uniform format. A multi-scale feature extraction module extracts initial edge features of the image P, enriches them and suppresses background noise to obtain a multi-scale edge feature map Q. A brightness edge type decoder module, built on a deep learning network structure, analyzes the multi-scale edge feature map Q, identifies different brightness edge types and reconstructs a high-resolution edge prediction map M. A depth-supervision-based multi-scale edge map fusion module applies up-sampling and activation to the edge prediction map M obtained by the brightness edge type decoder module, computes a loss value from a loss function, performs back-propagation, and finally fuses the multi-scale edge maps to obtain the final edge map K of the original image.

Description

Intelligent image edge extraction system and method
Technical Field
The invention relates to the field of computer vision, in particular to an intelligent image edge extraction system and method.
Background
Although traditional image edge extraction methods are widely used, they struggle in special scenes. Changes in illumination and in the color and material of objects introduce noise and interference that make edge detection inaccurate. Moreover, traditional methods generally rely on manually designed feature extractors and rules, which cannot meet the edge extraction requirements of complex scenes. Convolutional neural networks can learn features automatically from large amounts of image data and have achieved significant results in tasks such as edge extraction. However, most existing deep learning methods depend on complex network architectures with many parameters, incurring large computation and storage overhead that limits their use in practical applications.
To remedy these drawbacks of the prior art, a new technical solution is needed.
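To make the limitation of hand-designed extractors concrete, the following sketch implements the classic Sobel operator — a fixed 3x3 gradient filter of the kind the background refers to. It responds to any strong intensity change, including illumination-induced ones, which is exactly the noise sensitivity learned methods aim to avoid. The kernels and the toy step-edge image are standard illustrations, not taken from the patent.

```python
import numpy as np

def sobel_edges(img: np.ndarray) -> np.ndarray:
    """Classic hand-designed edge extractor: gradient magnitude via Sobel kernels."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)  # horizontal gradient
    ky = kx.T                                                          # vertical gradient
    h, w = img.shape
    pad = np.pad(img.astype(float), 1, mode="edge")
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            win = pad[i:i + 3, j:j + 3]
            gx[i, j] = (win * kx).sum()
            gy[i, j] = (win * ky).sum()
    return np.hypot(gx, gy)  # gradient magnitude

# A vertical step edge: the response is strongest along the boundary column.
img = np.zeros((5, 5))
img[:, 3:] = 1.0
mag = sobel_edges(img)
```

Any change of illumination that shifts the step height scales `mag` directly — the fixed kernel cannot distinguish a material boundary from a lighting gradient.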
Disclosure of Invention
In view of the foregoing, it is desirable to provide an intelligent image edge extraction system and method that adopt a lightweight multi-scale supervised neural network, so that image edges can be extracted efficiently and accurately while the neural network model keeps low computational complexity and a small number of parameters.
In order to solve the technical problems in the prior art, the technical scheme of the invention is as follows:
an intelligent image edge extraction system comprising: an image input module (1), a multi-scale feature extraction module (2), a brightness edge type decoder module (3) and a multi-scale edge map fusion module (4) based on depth supervision, wherein,
an image input module (1) for processing input image data and forming an output image P in a unified format;
the multi-scale feature extraction module (2) is used for extracting initial edge features of the image P, enriching the initial edge features and eliminating background noise so as to obtain a multi-scale edge feature map Q;
the brightness edge type decoder module (3) adopts a deep learning network structure and is used for analyzing the multi-scale edge feature map Q, identifying different brightness edge types and reconstructing a high-resolution edge prediction map M;
and the multi-scale edge map fusion module (4) uses the depth supervision layer to apply up-sampling and activation to the edge prediction map M obtained by the brightness edge type decoder module (3), then calculates a loss value according to a loss function, performs back-propagation, and finally fuses the multi-scale edge maps to obtain the final edge map K of the original image.
As a further improvement, the image input module (1) comprises an image standardization module, an image normalization module and an image resizing module;
the multi-scale feature extraction module (2) comprises a feature extraction layer, a compact expansion convolution sub-module and a compact space attention sub-module, wherein the feature extraction layer is used for extracting initial edge features from an output image P of the image input module, and the compact expansion convolution sub-module and the compact space attention sub-module are used for enriching the initial edge features and eliminating background noise so as to obtain a multi-scale edge feature map Q.
As a further improvement, the feature extraction layer in the multi-scale feature extraction module (2) is a depth-separable convolution layer comprising convolution, pooling and residual-connection sub-modules; during forward propagation the sub-modules down-sample with different strides and perform a differential convolution operation, and finally the down-sampled features and the differentially convolved features are added through a residual connection to obtain the final output.
As a further improvement, the compact expansion convolution sub-module in the multi-scale feature extraction module (2) comprises a linear rectification activation layer (ReLU), a 1x1 convolution layer and four 3x3 convolution layers with different expansion (dilation) rates of 5, 7, 9 and 11 respectively, and an output result is obtained through a summation operation; the compact expansion convolution formula is:

y = Σ_{i=1}^{4} DConv_{3x3}^{(2i+3)}( Conv_{1x1}( ReLU(x) ) )

where x is the input tensor; ReLU(·) denotes the linear rectification activation function; Conv_{1x1}(·) denotes a 1x1 convolution operation; and DConv_{3x3}^{(2i+3)}(·) denotes a 3x3 convolution with dilation rate 2i+3 (i.e., 5, 7, 9 and 11 for i = 1, …, 4).
As a further improvement, the compact spatial attention sub-module in the multi-scale feature extraction module (2) comprises a linear rectification active layer (ReLU), a 1x1 convolution layer, a 3x3 convolution layer and a nonlinear active layer (Sigmoid), and the enhanced output result is obtained by multiplying the attention weight with the input image.
As a further improvement, the brightness edge type decoder module (3) introduces a weight-layer mechanism: fusion features are generated through the weight layer and then sent to the brightness edge type decoder, so that low-level features and high-level cues are fused adaptively in a learnable manner without increasing the feature dimension. Let the input low-level feature be F_low, the high-level cue be F_hint, the weight be W and the fusion feature be F_fusion; the weight layer and the fusion feature are given by

W = Sigmoid( w_learnable(F_hint) )
F_fusion = W ⊙ F_low + (1 − W) ⊙ F_hint

where w_learnable is a learnable parameter function, Sigmoid normalizes the weights to the range [0, 1], and ⊙ denotes element-wise multiplication.
As a further improvement, the brightness edge type decoder comprises a 3x3 convolution layer, a 1x1 convolution layer, two linear rectification activation functions (ReLU) and two batch normalization layers; convolution, activation and batch normalization are applied to the fused feature map to generate an edge prediction map of the corresponding type.
As a further improvement scheme, the multi-scale edge map fusion module (4) comprises an upsampling layer, a nonlinear activation function (Sigmoid), a multi-scale feature joint layer, a 1x1 convolution layer and a loss function; performing up-sampling operation on the image by a bilinear interpolation method, and amplifying details of the image; then, a probability map is obtained through a Sigmoid activation function; calculating a loss value through a loss function, and carrying out back propagation to update training parameters according to the loss value; finally, carrying out image multi-scale fusion and feature channel integration to obtain a final edge map;
wherein the loss function is:

L = − Σ_{i=1}^{N} ω(Y_i) [ Y_i log P_i + (1 − Y_i) log(1 − P_i) ]

where the prediction probability is P, the target label is Y, the total number of samples is N, the number of positive samples is N_1, the number of negative samples is N_0, the number of ignored samples is N_2, and β is the negative-sample balancing coefficient; the positive-class sample weight is ω_1 = N_0 / (N_0 + N_1), the negative-class sample weight is ω_0 = β · N_1 / (N_0 + N_1), and the ignored-sample weight is ω_2 = 0; ignore(Y_i) takes the value 0 when Y_i is an ignore label and 1 otherwise.
The invention also discloses an intelligent image edge extraction method, which comprises the following steps:
step S1: constructing a lightweight multi-scale supervision neural network and training to obtain an image edge intelligent extraction model;
step S2: inputting the image data into the intelligent image edge extraction model trained in the step S1, and performing image processing to output edge image information; the intelligent image edge extraction model comprises an image input module (1), a multi-scale feature extraction module (2), a brightness edge type decoder module (3) and a multi-scale edge map fusion module (4) based on depth supervision;
an image input module (1) for processing input image data and forming an output image P in a unified format;
the multi-scale feature extraction module (2) is used for extracting initial edge features of the image P, enriching the initial edge features and eliminating background noise so as to obtain a multi-scale edge feature map Q;
the brightness edge type decoder module (3) adopts a deep learning network structure and is used for analyzing the multi-scale edge feature map Q, identifying different brightness edge types and reconstructing a high-resolution edge prediction map M;
and the multi-scale edge map fusion module (4) uses the depth supervision layer to apply up-sampling and activation to the edge prediction map M obtained by the brightness edge type decoder module (3), then calculates a loss value according to a loss function, performs back-propagation, and finally fuses the multi-scale edge maps to obtain the final edge map K of the original image.
Compared with the prior art, the multi-scale feature extraction module adopts efficient depth-separable convolutions, which effectively reduce the computational complexity of the network and keep the number of model parameters very small, while enhancing global modeling capability and enlarging the receptive field without increasing the computational burden, thereby improving feature extraction. As a result, the image edge extraction algorithm can run efficiently even on embedded systems or mobile devices, opening up wider possibilities for various applications.
According to the invention, the multi-scale characteristics extracted through the backbone network are combined with the compact expansion convolution sub-module and the compact space attention sub-module, so that multi-scale edge information can be enriched, and the perception capability of edges with different scales can be effectively improved. Thus, the network can more comprehensively capture the edge structure in the image, and has good response to edges with different sizes.
The brightness edge type decoder module of the invention adopts a novel weight-layer mechanism: fusion features are generated through the weight layer and then passed to the brightness edge type decoder, realizing adaptive fusion of low-level features and high-level cues. This design addresses the information loss and ambiguity that traditional methods may face when fusing features of different scales, providing a more accurate and robust solution for the image brightness edge extraction task.
According to the invention, the edge prediction graph is supervised and optimized by adopting the depth supervision module, so that the guidance of the features is enhanced, the training stability is improved, and the edge extraction performance is further improved.
Drawings
FIG. 1 is a schematic diagram of an intelligent image edge extraction system according to the present invention;
FIG. 2 is a network block diagram of a multi-scale feature extraction module in an embodiment of the invention;
FIG. 3 is a flow chart of a depth separable convolutional layer in an embodiment of the invention;
FIG. 4 is a flow chart of a compact expansion convolution sub-module in an embodiment of the present invention;
FIG. 5 is a flow chart of a compact spatial attention sub-module in an embodiment of the invention;
the invention will be further illustrated by the following specific examples in conjunction with the above-described figures.
Detailed Description
The technical scheme provided by the invention is further described below with reference to the accompanying drawings.
The invention provides an intelligent image edge extraction method, which comprises the following steps:
step S1: constructing a lightweight multi-scale supervision neural network and training to obtain an image edge intelligent extraction model;
Step S2: inputting the image data into the intelligent image edge extraction model trained in step S1 for image processing, and outputting edge image information.
Referring to fig. 1, a schematic structural diagram of an image edge intelligent extraction system is provided, and an image edge intelligent extraction model is obtained based on training of a lightweight multi-scale supervised neural network, and the image edge intelligent extraction system comprises an image input module (1), a multi-scale feature extraction module (2), a brightness edge type decoder module (3) and a multi-scale edge map fusion module (4) based on depth supervision; the method comprises the following specific steps:
Step one: the training images are passed through the image input module, where the image data are standardized, normalized and resized to ensure a consistent input format, forming the output image P. The training dataset in this example is BSDS500, a standard dataset widely used for image edge extraction; it contains 5000 natural-scene images, comprising 3000 training images, 1000 validation images and 1000 test images, each providing accurate edge annotations for image edge detection.
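A minimal sketch of this preprocessing step. The patent fixes neither the normalization statistics nor the resizing method, so the `mean`/`std` values, the nearest-neighbour resize, and the output size are illustrative assumptions:

```python
import numpy as np

def preprocess(img: np.ndarray, out_hw=(321, 481), mean=0.5, std=0.25) -> np.ndarray:
    """Sketch of the input module: normalize to [0,1], standardize, resize."""
    x = img.astype(np.float64)
    x = (x - x.min()) / (x.max() - x.min() + 1e-8)   # min-max normalization to [0, 1]
    x = (x - mean) / std                              # standardization (assumed statistics)
    h, w = x.shape
    oh, ow = out_hw
    rows = (np.arange(oh) * h / oh).astype(int)       # nearest-neighbour index maps
    cols = (np.arange(ow) * w / ow).astype(int)
    return x[rows][:, cols]                           # resized output image P

P = preprocess(np.arange(12.0).reshape(3, 4), out_hw=(6, 8))
```

In a real pipeline the resize would use interpolation and per-channel dataset statistics; the structure — normalize, standardize, fix the spatial size — is what the module guarantees.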
Step two: the output image P is input to the multi-scale feature extraction module of the pre-constructed lightweight multi-scale supervised neural network model, and multi-scale features of the image are extracted to obtain the multi-scale edge feature map Q.
In a specific implementation, as shown in fig. 2, the network structure of the multi-scale feature extraction module in this embodiment is as follows: the module has four feature extraction layers, named Stage1, Stage2, Stage3 and Stage4, connected by max-pooling layers with kernel size 2 and stride 2. The first feature extraction layer, Stage1, comprises a two-dimensional conventional convolution layer with kernel size 3 and padding 1, followed by three depth-separable convolution layers; the structure of a depth-separable convolution layer is shown in fig. 3, comprising a differential convolution with kernel size 3, a linear rectification activation function (ReLU) and a conventional convolution with kernel size 1. The next three feature extraction layers, Stage2-Stage4, each contain only four depth-separable convolution layers of the same structure.
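The lightweight claim rests on factorizing a standard convolution into a depthwise and a pointwise part. The counts below are a generic calculation (not figures from the patent) for an assumed 64-channel 3x3 layer:

```python
def conv_params(cin: int, cout: int, k: int) -> int:
    """Weight count of a standard k x k convolution (bias ignored)."""
    return cin * cout * k * k

def depthwise_separable_params(cin: int, cout: int, k: int) -> int:
    """k x k depthwise convolution plus 1x1 pointwise convolution (bias ignored)."""
    return cin * k * k + cin * cout

cin = cout = 64
std = conv_params(cin, cout, 3)                 # standard 3x3 conv
dws = depthwise_separable_params(cin, cout, 3)  # depthwise-separable equivalent
```

For this configuration the separable form needs roughly an eighth of the weights, which is the kind of saving that lets the backbone stay small while stacking several such layers per stage.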
After the first depth-separable convolution layer, where the feature map has C input channels, the channel count is doubled to 2C before the second layer, so the feature map of the second layer has 2C input channels; it is doubled again before the third layer, so the feature maps of the third and fourth layers have 4C input channels. The main purpose of this schedule is to increase the network depth of the model while limiting the growth of the channel count, achieving a lightweight design.
The feature maps output by the feature extraction layers Stage1, Stage2, Stage3 and Stage4 are then respectively fed into a compact expansion convolution sub-module and a compact spatial attention sub-module to enrich multi-scale edge information and suppress background noise, yielding four multi-scale edge feature maps named Q1, Q2, Q3 and Q4.
In a specific implementation, the input image passes through the four feature extraction layers in sequence, and after each layer the feature map is fed into a compact expansion convolution sub-module and a compact spatial attention sub-module, whose structures are shown in fig. 4 and fig. 5. The compact expansion convolution sub-module mainly comprises one linear rectification activation layer (ReLU), one 1x1 convolution layer and four 3x3 convolution layers with different expansion (dilation) rates of 5, 7, 9 and 11 respectively. In this embodiment, the ReLU activation function is first applied to the input feature map x; the activated tensor is then convolved by the 1x1 convolution layer; next, the result is convolved by the four convolution layers with different dilation rates to obtain four output tensors x1, x2, x3 and x4; finally the four tensors are summed to obtain the output image y. The compact expansion convolution formula is:

y = Σ_{i=1}^{4} DConv_{3x3}^{(2i+3)}( Conv_{1x1}( ReLU(x) ) )

where x is the input tensor; ReLU(·) denotes the linear rectification activation function; Conv_{1x1}(·) denotes a 1x1 convolution operation; and DConv_{3x3}^{(2i+3)}(·) denotes a 3x3 convolution with dilation rate 2i+3 (i.e., 5, 7, 9 and 11 for i = 1, …, 4).
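A single-channel sketch of this branch-and-sum structure. Identity 3x3 kernels stand in for learned weights (an assumption for testability: each branch then passes an impulse through unchanged, so the sum counts the four branches), and on one channel the 1x1 convolution degenerates to a scalar multiply:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def dilated_conv3x3(x, kernel, d):
    """'Same'-padded 3x3 convolution with dilation rate d on a 2-D map."""
    h, w = x.shape
    xp = np.pad(x, d)
    y = np.zeros_like(x, dtype=float)
    for ki in range(3):
        for kj in range(3):
            # tap offset grows with the dilation rate, enlarging the receptive field
            y += kernel[ki, kj] * xp[ki * d:ki * d + h, kj * d:kj * d + w]
    return y

def compact_dilated_block(x, kernels):
    """y = sum_i DConv3x3^{2i+3}(Conv1x1(ReLU(x))) with dilation rates 5, 7, 9, 11."""
    t = 1.0 * relu(x)  # 1x1 conv on a single channel == scalar weight (1.0 here)
    return sum(dilated_conv3x3(t, k, d) for k, d in zip(kernels, (5, 7, 9, 11)))

x = np.zeros((23, 23))
x[11, 11] = 1.0                     # unit impulse
delta = np.zeros((3, 3))
delta[1, 1] = 1.0                   # identity kernel per branch (assumed)
y = compact_dilated_block(x, [delta] * 4)
```

With learned kernels the four branches aggregate context at four different spatial extents before summation, which is what enriches the multi-scale edge information.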
the output image y is then input to a compact spatial attention sub-module, the entire module comprising 1 linear rectifying active layer (ReLU), 1 convolution layer 1x1, 1 convolution layer 3x3 and 1 nonlinear active layer (Sigmoid); in this embodiment, a linear rectification activation function (ReLU) is first applied to the output image y of the previous module, and then a convolution operation is performed by 1x1 convolution layer and 3x3 convolution layer: then activating the convolution result through a Sigmoid activation function to obtain the attention weight z; and finally multiplying the input image y with the attention weight z to obtain an enhanced output result, namely an enhanced feature map of the original image. In the implementation, after four layers of branches, four feature images are obtained and used for learning the edge features of the feature images in the next step.
Step three: the four feature maps Q1-Q4 obtained in step two are respectively input to the brightness edge type decoder to strengthen the brightness edge features, and four edge prediction maps, named M1, M2, M3 and M4, are output.
In a specific implementation, the feature maps Q1-Q4 are first input to the weight layer. In this embodiment the weight layer comprises two paths: the first path receives the high-level features, recovers high resolution through a deconvolution layer, and then mines adaptive semantic cues with two 3x3 convolution layers, each followed by batch normalization and a linear rectification activation function (ReLU); the other path encodes the low-level features with two convolution layers, likewise with batch normalization and ReLU activation. The features of the two paths are then fused by element-wise multiplication to generate the fusion feature. Concretely, let the input low-level feature be F_low, the high-level cue be F_hint, the weight be W and the fusion feature be F_fusion; then

W = Sigmoid( w_learnable(F_hint) )
F_fusion = W ⊙ F_low + (1 − W) ⊙ F_hint

where w_learnable is a learnable parameter function, Sigmoid normalizes the weights to the range [0, 1], and ⊙ denotes element-wise multiplication;
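The gated blending above can be sketched directly. A toy stand-in (an assumption, since w_learnable is a trained sub-network in the patent) uses zero logits, so the gate is 0.5 everywhere and the fusion is a plain average:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse(f_low, f_hint, w_learnable):
    """W = Sigmoid(w_learnable(F_hint)); F_fusion = W * F_low + (1 - W) * F_hint."""
    W = sigmoid(w_learnable(f_hint))          # per-element gate in (0, 1)
    return W * f_low + (1.0 - W) * f_hint     # convex blend, same shape as inputs

f_low = np.array([[0.0, 1.0], [2.0, 3.0]])    # low-level detail features
f_hint = np.array([[4.0, 4.0], [4.0, 4.0]])   # high-level semantic cue
F = fuse(f_low, f_hint, lambda t: 0.0 * t)    # zero logits -> W = 0.5 everywhere
```

Because the gate is a convex combination, the fused map stays in the range spanned by the two inputs and the feature dimension is unchanged, which is the stated point of the mechanism.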
Finally, the fusion feature is sent to the brightness edge type decoder for decoding. The brightness edge type decoder comprises a 3x3 convolution layer, a 1x1 convolution layer, two linear rectification activation functions (ReLU) and two batch normalization layers. Through convolution, activation and batch normalization applied to the fused feature map, the four edge prediction maps M1-M4 are finally generated.
Step four: the depth supervision module applies up-sampling and activation functions to the edge prediction maps M1-M4 obtained in step three, then computes the loss against the ground truth, performs back-propagation, and finally fuses the multi-scale edge maps to obtain the final edge map K of the original image.
In a specific implementation, the overall depth supervision module comprises an up-sampling layer, a nonlinear activation function (Sigmoid), a multi-scale feature joint layer and a 1x1 convolution layer. In this embodiment, each of the four edge prediction branches performs the following operations: first, the edge prediction map is up-sampled by bilinear interpolation, enlarging the image and its details without losing image information; then the value of each pixel is mapped to the range [0, 1] by the Sigmoid activation function, and the output is interpreted as the probability that the pixel belongs to an edge, yielding a probability map; a loss value is computed through the loss function and back-propagated to update the model parameters; finally, multi-scale image fusion and feature-channel integration are carried out through the multi-scale feature joint layer and the 1x1 convolution layer, giving the final edge map of the original image.
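The up-sample-then-activate step of each branch can be sketched as follows; the align-corners bilinear convention and the 2x2 toy logit map are assumptions for illustration:

```python
import numpy as np

def bilinear_upsample(x, scale=2):
    """Bilinear up-sampling (align-corners convention) of a 2-D map."""
    h, w = x.shape
    oh, ow = h * scale, w * scale
    ys = np.linspace(0, h - 1, oh)                 # fractional source rows
    xs = np.linspace(0, w - 1, ow)                 # fractional source columns
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    top = x[y0][:, x0] * (1 - wx) + x[y0][:, x1] * wx
    bot = x[y1][:, x0] * (1 - wx) + x[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

M = np.array([[0.0, 2.0], [2.0, 4.0]])   # coarse edge logits from one branch
up = bilinear_upsample(M)                # 4x4 map, values interpolated smoothly
prob = sigmoid(up)                       # per-pixel edge probability in (0, 1)
```

Interpolation preserves the corner values while filling intermediate pixels, and the Sigmoid turns the enlarged logits into the probability map the loss is computed on.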
In this embodiment, let the prediction probability be P, the target label Y, the total number of samples N, the number of positive samples N_1, the number of negative samples N_0, the number of ignored samples N_2, and the negative-sample balancing coefficient β. The loss function is:

L = − Σ_{i=1}^{N} ω(Y_i) [ Y_i log P_i + (1 − Y_i) log(1 − P_i) ]

where the positive-class sample weight is ω_1 = N_0 / (N_0 + N_1), the negative-class sample weight is ω_0 = β · N_1 / (N_0 + N_1), and the ignored-sample weight is ω_2 = 0; ignore(Y_i) takes the value 0 when Y_i is an ignore label and 1 otherwise.
The above description of the embodiments is only for aiding in the understanding of the method of the present invention and its core ideas. It should be noted that it will be apparent to those skilled in the art that various modifications and adaptations of the invention can be made without departing from the principles of the invention and these modifications and adaptations are intended to be within the scope of the invention as defined in the following claims.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

1. An intelligent image edge extraction system, comprising: an image input module (1), a multi-scale feature extraction module (2), a brightness edge type decoder module (3) and a multi-scale edge map fusion module (4) based on depth supervision, wherein,
an image input module (1) for processing input image data and forming an output image P in a unified format;
the multi-scale feature extraction module (2) is used for extracting initial edge features of the image P, enriching the initial edge features and eliminating background noise so as to obtain a multi-scale edge feature map Q;
the brightness edge type decoder module (3) adopts a deep learning network structure and is used for analyzing the multi-scale edge feature map Q, identifying different brightness edge types and reconstructing a high-resolution edge prediction map M;
and the multi-scale edge map fusion module (4) uses the depth supervision layer to apply up-sampling and activation to the edge prediction map M obtained by the brightness edge type decoder module (3), then calculates a loss value according to a loss function, performs back-propagation, and finally fuses the multi-scale edge maps to obtain the final edge map K of the original image.
2. The intelligent image edge extraction system according to claim 1, wherein the image input module (1) comprises an image standardization module, an image normalization module and an image resizing module;
the multi-scale feature extraction module (2) comprises a feature extraction layer, a compact expansion convolution sub-module and a compact space attention sub-module, wherein the feature extraction layer is used for extracting initial edge features from an output image P of the image input module, and the compact expansion convolution sub-module and the compact space attention sub-module are used for enriching the initial edge features and eliminating background noise so as to obtain a multi-scale edge feature map Q.
3. The intelligent image edge extraction system according to claim 2, wherein the feature extraction layer in the multi-scale feature extraction module (2) is a depth-separable convolution layer comprising convolution, pooling and residual-connection sub-modules; during forward propagation the sub-modules down-sample with different strides and perform a differential convolution operation, and finally the down-sampled features and the differentially convolved features are added through a residual connection to obtain the final output.
4. The intelligent image edge extraction system according to claim 3, wherein the compact expansion convolution sub-module in the multi-scale feature extraction module (2) comprises a linear rectification activation layer (ReLU), a 1x1 convolution layer and four 3x3 convolution layers with different expansion (dilation) rates of 5, 7, 9 and 11 respectively, and an output result is obtained through a summation operation; the compact expansion convolution formula is:

y = Σ_{i=1}^{4} DConv_{3x3}^{(2i+3)}( Conv_{1x1}( ReLU(x) ) )

where x is the input tensor; ReLU(·) denotes the linear rectification activation function; Conv_{1x1}(·) denotes a 1x1 convolution operation; and DConv_{3x3}^{(2i+3)}(·) denotes a 3x3 convolution with dilation rate 2i+3 (i.e., 5, 7, 9 and 11 for i = 1, …, 4).
5. The image edge intelligence extraction system of claim 4, wherein the compact spatial attention sub-modules in the multi-scale feature extraction module (2) comprise a linear rectification active layer (ReLU), a 1x1 convolution layer, a 3x3 convolution layer, and a non-linear active layer (Sigmoid), and wherein the enhanced output result is obtained by multiplying the attention weight with the input image.
6. The intelligent image edge extraction system according to claim 5, wherein the brightness edge type decoder module (3) introduces a weight-layer mechanism: fusion features are generated through the weight layer and then sent to the brightness edge type decoder, so that low-level features and high-level cues are fused adaptively in a learnable manner without increasing the feature dimension; let the input low-level feature be F_low, the high-level cue be F_hint, the weight be W and the fusion feature be F_fusion; the weight layer and the fusion feature are given by

W = Sigmoid( w_learnable(F_hint) )
F_fusion = W ⊙ F_low + (1 − W) ⊙ F_hint

where w_learnable is a learnable parameter function, Sigmoid normalizes the weights to the range [0, 1], and ⊙ denotes element-wise multiplication.
7. The intelligent image edge extraction system of claim 6, wherein the luminance edge type decoder comprises a 3x3 convolution layer, a 1x1 convolution layer, two linear rectification activation functions (ReLU) and two batch normalization layers, and wherein the processing of the layers of convolution, activation and batch normalization is performed on the basis of the fused feature map to generate the edge prediction map of the corresponding type.
8. The image edge intelligent extraction system according to claim 7, wherein the multi-scale edge map fusion module (4) comprises an upsampling layer, a nonlinear activation function (Sigmoid), a multi-scale feature joint layer, a 1x1 convolution layer and a loss function; performing up-sampling operation on the image by a bilinear interpolation method, and amplifying details of the image; then, a probability map is obtained through a Sigmoid activation function; calculating a loss value through a loss function, and carrying out back propagation to update training parameters according to the loss value; finally, carrying out image multi-scale fusion and feature channel integration to obtain a final edge map;
wherein the loss function is:
L(P, Y) = −(1/N) Σ_{i=1}^{N} ignore(Y_i) · ω(Y_i) · [Y_i·log(P_i) + (1 − Y_i)·log(1 − P_i)]
where P is the prediction probability, Y is the target label, N is the total number of samples, N_1 is the number of positive samples, N_0 is the number of negative samples, N_2 is the number of ignored samples, and β is the negative-sample weight coefficient; the positive-sample weight is ω_1 = N_0/(N_0 + N_1), the negative-sample weight is ω_0 = β·N_1/(N_0 + N_1), and the ignored-sample weight is ω_2 = 0; ignore(Y_i) takes the value 0 when Y_i is an ignored sample label and 1 otherwise.
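The class-balanced loss of claim 8 can be sketched as follows. The per-class weight formulas in the original text are partially illegible, so this sketch uses the common RCF-style balancing; the ignore-label convention (`y == 2`), the `beta` value, and the normalization by the non-ignored count are illustrative assumptions.

```python
import numpy as np

def balanced_bce(p, y, beta=1.1, eps=1e-12):
    """Class-balanced cross-entropy with an ignore label (y == 2 is ignored here;
    the ignore convention and beta value are illustrative, not from the patent)."""
    pos, neg, ign = (y == 1), (y == 0), (y == 2)
    n0, n1 = neg.sum(), pos.sum()
    w = np.zeros_like(p, dtype=float)
    w[pos] = n0 / (n0 + n1)             # positive-sample weight
    w[neg] = beta * n1 / (n0 + n1)      # negative-sample weight, scaled by beta
    w[ign] = 0.0                        # ignored samples contribute nothing
    ce = -(np.where(pos, np.log(p + eps), 0.0)
           + np.where(neg, np.log(1.0 - p + eps), 0.0))
    return float((w * ce).sum() / max(1, (~ign).sum()))
```

A well-calibrated prediction drives the loss toward zero, while a poor one raises it, which is what back-propagation in the fusion module exploits.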
9. An intelligent image edge extraction method, characterized by comprising the following steps:
step S1: constructing a lightweight multi-scale supervision neural network and training to obtain an image edge intelligent extraction model;
step S2: inputting the image data into the intelligent image edge extraction model trained in the step S1, and performing image processing to output edge image information; the intelligent image edge extraction model comprises an image input module (1), a multi-scale feature extraction module (2), a brightness edge type decoder module (3) and a multi-scale edge map fusion module (4) based on depth supervision;
an image input module (1) for processing input image data and forming an output image P in a unified format;
the multi-scale feature extraction module (2) is used for extracting initial edge features of the image P, enriching the initial edge features and eliminating background noise so as to obtain a multi-scale edge feature map Q;
the brightness edge type decoder module (3) adopts a deep learning network structure and is used for analyzing the multi-scale edge characteristic diagram Q, identifying different brightness edge types and reconstructing a high-resolution edge prediction diagram M;
and the multi-scale edge map fusion module (4) performs operations such as up-sampling and activation on the edge prediction map M obtained by the brightness edge type decoder module (3) by utilizing the depth supervision layer, then calculates a loss value according to a loss function, performs back propagation, and finally performs multi-scale edge map fusion to obtain a final edge map K of the original image.
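The up-sampling and activation step in the fusion module can be sketched with a plain NumPy bilinear interpolation. The align-corners coordinate mapping and the 2x2 toy prediction map are illustrative choices; the patent specifies bilinear interpolation but not the exact variant.

```python
import numpy as np

def bilinear_upsample(img, out_h, out_w):
    """Bilinear interpolation with an align-corners coordinate mapping."""
    h, w = img.shape
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # map each output pixel back to fractional source coordinates
            y = i * (h - 1) / (out_h - 1) if out_h > 1 else 0.0
            x = j * (w - 1) / (out_w - 1) if out_w > 1 else 0.0
            y0, x0 = int(np.floor(y)), int(np.floor(x))
            y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
            dy, dx = y - y0, x - x0
            out[i, j] = ((1 - dy) * (1 - dx) * img[y0, x0]
                         + (1 - dy) * dx * img[y0, x1]
                         + dy * (1 - dx) * img[y1, x0]
                         + dy * dx * img[y1, x1])
    return out

m = np.array([[0.0, 1.0], [2.0, 3.0]])                  # toy low-resolution edge prediction map M
k = 1.0 / (1.0 + np.exp(-bilinear_upsample(m, 3, 3)))   # upsample, then Sigmoid probability map
```

Corner pixels are preserved under this mapping, and the Sigmoid squashes the upsampled map into (0, 1) probabilities before fusion.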
CN202311857751.0A 2023-12-29 2023-12-29 Intelligent image edge extraction system and method Pending CN117809048A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311857751.0A CN117809048A (en) 2023-12-29 2023-12-29 Intelligent image edge extraction system and method


Publications (1)

Publication Number Publication Date
CN117809048A true CN117809048A (en) 2024-04-02

Family

ID=90425005




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination