CN112001403A - Image contour detection method and system - Google Patents


Info

Publication number
CN112001403A
CN112001403A
Authority
CN
China
Prior art keywords
feature
feature maps
contour
image
maps
Prior art date
Legal status
Granted
Application number
CN202010802590.5A
Other languages
Chinese (zh)
Other versions
CN112001403B (en)
Inventor
李瑞瑞
翟新茹
Current Assignee
Beijing University of Chemical Technology
Original Assignee
Beijing University of Chemical Technology
Priority date
Filing date
Publication date
Application filed by Beijing University of Chemical Technology
Priority to CN202010802590.5A
Publication of CN112001403A
Application granted
Publication of CN112001403B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Abstract

The invention discloses an image contour detection method, which comprises the following steps: S1, performing multi-scale feature extraction on the image to be detected to obtain a plurality of first feature maps; S2, performing multi-level feature extraction on the plurality of first feature maps to obtain a plurality of second feature maps; and S3, performing feature-weighted fusion on the plurality of second feature maps to obtain a contour detection map. The invention also discloses an image contour detection system. The invention extracts global features and boundary features from feature maps of different scales, performs feature-weighted fusion of feature maps of different levels so that shallow and deep features complement each other, and trains the network through deep supervision, which improves the precision of image contour detection.

Description

Image contour detection method and system
Technical Field
The invention relates to the technical field of image detection, in particular to an image contour detection method and system.
Background
Image contour detection is the extraction of object boundaries and perceptually salient contours from natural images. At present, when a convolutional neural network is used to detect image contours, most methods only consider features of different layers while ignoring global features and boundary information; moreover, when feature maps of different scales are fused, every scale is given the same weight, so the precision of contour detection is limited.
Disclosure of Invention
In order to solve the above problems, an object of the present invention is to provide an image contour detection method and system that extract global features and boundary features from feature maps of different scales, perform feature-weighted fusion of feature maps of different levels so that shallow and deep features complement each other, and train through deep supervision, thereby improving the precision of image contour detection.
The invention provides an image contour detection method, which comprises the following steps:
S1, performing multi-scale feature extraction on the image to be detected to obtain a plurality of first feature maps;
S2, performing multi-level feature extraction on the plurality of first feature maps to obtain a plurality of second feature maps;
and S3, performing feature weighted fusion on the plurality of second feature maps to obtain a contour detection map, wherein the weights corresponding to the second feature maps are different.
As a further improvement of the present invention, the performing multi-scale feature extraction on the image to be detected to obtain a plurality of first feature maps includes:
carrying out multi-scale feature extraction on an image to be detected to obtain a plurality of feature maps;
carrying out global feature extraction on the extracted feature maps respectively to obtain a plurality of global feature maps;
and respectively carrying out boundary processing on the global feature maps to obtain the first feature maps.
As a further improvement of the present invention, the performing multi-level feature extraction on the plurality of first feature maps to obtain a plurality of second feature maps includes:
respectively carrying out feature fusion on the first feature maps with the same scale in the plurality of first feature maps to obtain a plurality of fused feature maps;
and respectively carrying out size conversion on the plurality of fused feature maps to restore the fused feature maps to the original image size to obtain a plurality of second feature maps.
As a further improvement of the present invention, the performing feature weighted fusion on the plurality of second feature maps to obtain a contour detection map includes:
determining the weights corresponding to the second feature maps, wherein the weights corresponding to the second feature maps are different;
and respectively carrying out feature weighting fusion processing on the plurality of second feature maps by adopting corresponding weights to obtain the contour detection map.
As a further improvement of the present invention, the method is implemented by a neural network, and the method further comprises: training the neural network according to a training set,
the neural network comprises a plurality of first convolution layers, a plurality of global feature extraction modules, a plurality of boundary refinement modules, a plurality of second convolution layers, a plurality of deconvolution layers and a fusion layer;
the plurality of first convolution layers, the plurality of global feature extraction modules and the plurality of boundary refinement modules are used for performing multi-scale feature extraction on an image to be detected to obtain a plurality of first feature maps;
the second convolution layers and the deconvolution layers are used for carrying out multi-level feature extraction on the first feature maps to obtain second feature maps;
the fusion layer is used for carrying out feature weighted fusion on the plurality of second feature maps to obtain a contour detection map.
As a further improvement of the invention, the training set comprises a plurality of sample images, and each sample image comprises a plurality of contour label maps,
wherein training the neural network according to a training set comprises:
and for each sample image, adding and averaging the feature vectors of the plurality of contour label maps of the sample image to generate a contour probability map of the sample image, wherein the contour probability of each pixel point in the contour probability map ranges from 0 to 1, 0 indicating that the pixel point is labeled in none of the contour label maps and 1 indicating that the pixel point is labeled in all of the contour label maps.
As a further improvement of the present invention, training the neural network according to a training set comprises:
determining the loss of the plurality of second feature maps and the loss of the contour probability map for each sample image to obtain a plurality of first loss functions;
determining the loss of the contour detection map and the contour probability map to obtain a second loss function;
adding the plurality of first loss functions and the second loss functions to obtain a target loss function;
performing parameter optimization on the neural network through the target loss function;
wherein the target loss function is:

    L(W) = \sum_{i=1}^{I} \Big( \sum_{k=1}^{K} \ell_{side}\big(X_i^{(k)}; W\big) + \ell_{fuse}\big(X_i^{fuse}; W\big) \Big)

where X_i^{(k)} denotes the feature vector of the k-th second feature map of the i-th sample image, i denotes the serial number of the sample image, I denotes the total number of sample images, k denotes the serial number of the second feature map, K denotes the total number of second feature maps, W denotes the parameters of the neural network, \ell_{side} denotes the first loss function, X_i^{fuse} denotes the feature vector of the contour detection map of the i-th sample image, and \ell_{fuse} denotes the second loss function.
As a further improvement of the present invention, for each sample image, determining the loss of the plurality of second feature maps and the contour probability map to obtain a plurality of first loss functions, including:
determining each positive sample point and each negative sample point in the contour probability map for each sample image, wherein each positive sample point represents a pixel point with a pixel contour probability greater than a threshold η, and each negative sample point represents a pixel point with a pixel contour probability of 0;
for each second feature map, calculating the loss of each pixel point in the second feature map relative to the contour probability map to obtain a first loss function;
wherein the first loss function is:

    \ell\big(X_j; W\big) = \begin{cases} -\alpha \cdot \log\big(1 - P(X_j; W)\big), & y_j = 0 \\ 0, & 0 < y_j \le \eta \\ -\beta \cdot \log P(X_j; W), & \text{otherwise} \end{cases}

    \alpha = \lambda \cdot \frac{|Y^+|}{|Y^+| + |Y^-|}, \qquad \beta = \frac{|Y^-|}{|Y^+| + |Y^-|}

    P(X_j; W) = \sigma\big(X_j\big)

in the formulas, Y^+ denotes the set of positive sample points, Y^- denotes the set of negative sample points, y_j denotes the contour probability of pixel point j in the contour probability map, \lambda denotes a parameter that balances the numbers of positive and negative sample points, X_j denotes the feature vector of pixel point j in the second feature map, \sigma denotes the sigmoid function, and P denotes the prediction probability that the pixel point belongs to the positive sample points.
As a further improvement of the present invention, the method further comprises:
s4, scaling the image to be detected to obtain images to be detected of multiple scales, executing S1-S3 on the image to be detected of each scale to obtain multiple contour detection maps, and performing feature weighting fusion on the multiple contour detection maps to obtain a final contour detection map.
The invention also provides an image contour detection system, comprising:
the multi-scale feature extraction module is used for performing multi-scale feature extraction on an image to be detected to obtain a plurality of first feature maps;
the multilevel feature extraction module is used for carrying out multilevel feature extraction on the plurality of first feature maps to obtain a plurality of second feature maps;
and the contour detection module is used for performing feature weighted fusion on the plurality of second feature maps to obtain contour detection maps, wherein the weights corresponding to the second feature maps are different.
As a further improvement of the present invention, the multi-scale feature extraction module is configured to:
carrying out multi-scale feature extraction on an image to be detected to obtain a plurality of feature maps;
carrying out global feature extraction on the extracted feature maps respectively to obtain a plurality of global feature maps;
and respectively carrying out boundary processing on the global feature maps to obtain the first feature maps.
As a further improvement of the present invention, the multilevel feature extraction module is configured to:
respectively carrying out feature fusion on the first feature maps with the same scale in the plurality of first feature maps to obtain a plurality of fused feature maps;
and respectively carrying out size conversion on the plurality of fused feature maps to restore the fused feature maps to the original image size to obtain a plurality of second feature maps.
As a further improvement of the present invention, the contour detection module is configured to:
determining the weights corresponding to the second feature maps, wherein the weights corresponding to the second feature maps are different;
and respectively carrying out feature weighting fusion processing on the plurality of second feature maps by adopting corresponding weights to obtain the contour detection map.
As a further improvement of the present invention, the system is implemented by a neural network, and the system further comprises: training the neural network according to a training set,
the neural network comprises a plurality of first convolution layers, a plurality of global feature extraction modules, a plurality of boundary refinement modules, a plurality of second convolution layers, a plurality of deconvolution layers and a fusion layer;
the plurality of first convolution layers, the plurality of global feature extraction modules and the plurality of boundary refinement modules are used for performing multi-scale feature extraction on an image to be detected to obtain a plurality of first feature maps;
the second convolution layers and the deconvolution layers are used for carrying out multi-level feature extraction on the first feature maps to obtain second feature maps;
the fusion layer is used for carrying out feature weighted fusion on the plurality of second feature maps to obtain a contour detection map.
As a further improvement of the invention, the training set comprises a plurality of sample images, and each sample image comprises a plurality of contour label maps,
wherein training the neural network according to a training set comprises:
and for each sample image, adding and averaging the feature vectors of the plurality of contour label maps of the sample image to generate a contour probability map of the sample image, wherein the contour probability of each pixel point in the contour probability map ranges from 0 to 1, 0 indicating that the pixel point is labeled in none of the contour label maps and 1 indicating that the pixel point is labeled in all of the contour label maps.
As a further improvement of the present invention, training the neural network according to a training set comprises:
determining the loss of the plurality of second feature maps and the loss of the contour probability map for each sample image to obtain a plurality of first loss functions;
determining the loss of the contour detection map and the contour probability map to obtain a second loss function;
adding the plurality of first loss functions and the second loss functions to obtain a target loss function;
performing parameter optimization on the neural network through the target loss function;
wherein the target loss function is:

    L(W) = \sum_{i=1}^{I} \Big( \sum_{k=1}^{K} \ell_{side}\big(X_i^{(k)}; W\big) + \ell_{fuse}\big(X_i^{fuse}; W\big) \Big)

where X_i^{(k)} denotes the feature vector of the k-th second feature map of the i-th sample image, i denotes the serial number of the sample image, I denotes the total number of sample images, k denotes the serial number of the second feature map, K denotes the total number of second feature maps, W denotes the parameters of the neural network, \ell_{side} denotes the first loss function, X_i^{fuse} denotes the feature vector of the contour detection map of the i-th sample image, and \ell_{fuse} denotes the second loss function.
As a further improvement of the present invention, for each sample image, determining the loss of the plurality of second feature maps and the contour probability map to obtain a plurality of first loss functions, including:
determining each positive sample point and each negative sample point in the contour probability map for each sample image, wherein each positive sample point represents a pixel point with a pixel contour probability greater than a threshold η, and each negative sample point represents a pixel point with a pixel contour probability of 0;
for each second feature map, calculating the loss of each pixel point in the second feature map relative to the contour probability map to obtain a first loss function;
wherein the first loss function is:

    \ell\big(X_j; W\big) = \begin{cases} -\alpha \cdot \log\big(1 - P(X_j; W)\big), & y_j = 0 \\ 0, & 0 < y_j \le \eta \\ -\beta \cdot \log P(X_j; W), & \text{otherwise} \end{cases}

    \alpha = \lambda \cdot \frac{|Y^+|}{|Y^+| + |Y^-|}, \qquad \beta = \frac{|Y^-|}{|Y^+| + |Y^-|}

    P(X_j; W) = \sigma\big(X_j\big)

in the formulas, Y^+ denotes the set of positive sample points, Y^- denotes the set of negative sample points, y_j denotes the contour probability of pixel point j in the contour probability map, \lambda denotes a parameter that balances the numbers of positive and negative sample points, X_j denotes the feature vector of pixel point j in the second feature map, \sigma denotes the sigmoid function, and P denotes the prediction probability that the pixel point belongs to the positive sample points.
As a further improvement of the present invention, the system further comprises:
and the contour detection fusion module is used for scaling the image to be detected to obtain images to be detected of multiple scales, processing the image to be detected of each scale through the multi-scale feature extraction module, the multi-level feature extraction module and the contour detection module to obtain multiple contour detection images, and performing feature weighting fusion on the multiple contour detection images to obtain a final contour detection image.
The invention has the beneficial effects that:
the global features and the boundary features are extracted from the feature maps with different scales, the feature maps with different levels are subjected to feature weighted fusion, the mutual supplement of the shallow features and the deep features is realized, the training is performed through deep supervision, and the precision of image contour detection is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
Fig. 1 is a schematic flowchart of an image contour detection method according to an exemplary embodiment of the present invention;
FIG. 2 is a schematic diagram of a neural network in accordance with an exemplary embodiment of the present invention;
Fig. 3 is a schematic flowchart of the training and testing process of a neural network according to an exemplary embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that, if directional indications (such as up, down, left, right, front, and back … …) are involved in the embodiment of the present invention, the directional indications are only used to explain the relative positional relationship between the components, the movement situation, and the like in a specific posture (as shown in the drawing), and if the specific posture is changed, the directional indications are changed accordingly.
In addition, in the description of the present invention, the terms used are for illustrative purposes only and are not intended to limit the scope of the present invention. The terms "comprises" and/or "comprising" are used to specify the presence of stated elements, steps, operations, and/or components, but do not preclude the presence or addition of one or more other elements, steps, operations, and/or components. The terms "first," "second," and the like may be used to describe various elements, not necessarily order, and not necessarily limit the elements. In addition, in the description of the present invention, "a plurality" means two or more unless otherwise specified. These terms are only used to distinguish one element from another. These and/or other aspects will become apparent to those of ordinary skill in the art in view of the following drawings, and the description of the embodiments of the present invention will be more readily understood by those of ordinary skill in the art. The drawings are only for purposes of illustrating the described embodiments of the invention. One skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated in the present application may be employed without departing from the principles described in the present application.
As shown in fig. 1, an image contour detection method according to an embodiment of the present invention includes:
performing multi-scale feature extraction on an image to be detected to obtain a plurality of first feature maps;
performing multi-level feature extraction on the plurality of first feature maps to obtain a plurality of second feature maps;
and performing feature weighted fusion on the plurality of second feature maps to obtain a contour detection map, wherein weights corresponding to the second feature maps are different.
The method comprises the steps of firstly carrying out multi-scale feature extraction on an image, then fusing feature graphs with the same scale to obtain features of different levels, and then carrying out weighted fusion on the features of different levels to obtain a final detection result. The features of the contour at different levels are greatly different, and the precision of contour detection can be improved by performing weighted fusion on the features at different levels. It can be understood that when fusing the features of different levels, the present invention adopts different weights (the weights corresponding to the second feature maps are different) for the features of each level, so as to better fuse the shallow information and the deep information, realize the mutual supplement of the shallow feature and the deep feature, and improve the detection accuracy.
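By way of a non-limiting illustration, the following sketch summarizes steps S1 to S3 in PyTorch-style code; the framework and the argument names (backbone_stages, side_branches, fuse_weights, fuse_conv) are illustrative assumptions and are not specified by the patent.

```python
import torch
import torch.nn.functional as F

def detect_contours(image, backbone_stages, side_branches, fuse_weights, fuse_conv):
    """Sketch of steps S1-S3; all argument names and interfaces are assumptions.

    image:           (N, 3, H, W) tensor of the image to be detected
    backbone_stages: one convolution stage per scale; each returns (features, first_maps)
    side_branches:   one 1x1-conv head per scale, reducing fused maps to 1 channel
    fuse_weights:    one learnable weight per second feature map (all different)
    fuse_conv:       1x1 convolution producing the single-channel contour map
    """
    _, _, h, w = image.shape

    # S1: multi-scale feature extraction -> several first feature maps per scale
    first_maps_per_scale, x = [], image
    for stage in backbone_stages:
        x, first_maps = stage(x)                 # global feature extraction and boundary
        first_maps_per_scale.append(first_maps)  # refinement assumed inside the stage

    # S2: fuse the first feature maps of the same scale, restore the original size
    second_maps = []
    for maps, branch in zip(first_maps_per_scale, side_branches):
        side = branch(torch.cat(maps, dim=1))    # feature fusion within one scale
        side = F.interpolate(side, size=(h, w), mode="bilinear", align_corners=False)
        second_maps.append(side)

    # S3: feature-weighted fusion with a distinct weight per second feature map
    weighted = [w_k * m for w_k, m in zip(fuse_weights, second_maps)]
    fused = fuse_conv(torch.cat(weighted, dim=1))   # concat + 1x1 convolution
    return torch.sigmoid(fused)                     # contour detection map
```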
An optional embodiment, the performing multi-scale feature extraction on the image to be detected to obtain a plurality of first feature maps includes:
carrying out multi-scale feature extraction on an image to be detected to obtain a plurality of feature maps;
carrying out global feature extraction on the extracted feature maps respectively to obtain a plurality of global feature maps;
and respectively carrying out boundary processing on the global feature maps to obtain the first feature maps.
The extracted features of the feature maps are rough and cannot cover the global features and boundary information of the image, so that the outlines of the feature maps are unclear. The method provided by the invention carries out global feature extraction on the feature map of each scale to obtain global information, and carries out thinning processing on the boundary of the contour to obtain boundary information, so that the contour of the obtained first feature map is clearer.
An optional implementation manner, where performing multi-level feature extraction on the plurality of first feature maps to obtain a plurality of second feature maps includes:
respectively carrying out feature fusion on the first feature maps with the same scale in the plurality of first feature maps to obtain a plurality of fused feature maps;
and respectively carrying out size conversion on the plurality of fused feature maps to restore the fused feature maps to the original image size to obtain a plurality of second feature maps.
Each scale includes a plurality of first feature maps, and the features extracted by each first feature map differ: shallow feature maps carry more detail features, while deep feature maps carry more semantic features. Feature fusion is performed on the first feature maps of the same scale, so that one second feature map is obtained for each scale; this fuses shallow and deep features, and the plurality of second feature maps obtained by fusion improve the detection effect at different scales.
In an optional embodiment, the performing feature weighted fusion on the plurality of second feature maps to obtain a contour detection map includes:
determining the weights corresponding to the second feature maps, wherein the weights corresponding to the second feature maps are different;
and respectively carrying out feature weighting fusion processing on the plurality of second feature maps by adopting corresponding weights to obtain the contour detection map.
Each scale corresponds to one second feature map, and the features of the second feature maps differ: the information represented by a shallow feature map and a deep feature map is different. The second feature maps are therefore fused with different weights, so that shallow and deep information complement each other, which improves the accuracy of image contour detection.
In an alternative embodiment, the method is implemented by a neural network, and the method further comprises: training the neural network according to a training set. The neural network comprises a plurality of first convolution layers, a plurality of global feature extraction modules, a plurality of boundary refinement modules, a plurality of second convolution layers, a plurality of deconvolution layers and a fusion layer;
the plurality of first convolution layers, the plurality of global feature extraction modules and the plurality of boundary refinement modules are used for performing multi-scale feature extraction on an image to be detected to obtain a plurality of first feature maps;
the second convolution layers and the deconvolution layers are used for carrying out multi-level feature extraction on the first feature maps to obtain second feature maps;
the fusion layer is used for carrying out feature weighted fusion on the plurality of second feature maps to obtain a contour detection map.
The first convolution layers are used for carrying out multi-scale feature extraction on an image to be detected to obtain a plurality of feature maps;
each first convolution layer is connected with a global feature extraction module, and the global feature extraction modules are used for respectively carrying out global feature extraction on the extracted feature maps to obtain a plurality of global feature maps;
each global feature extraction module is connected with a boundary refinement module, and the plurality of boundary refinement modules are used for respectively performing boundary processing on the plurality of global feature maps to obtain the plurality of first feature maps.
The global feature extraction module (GCN) may use, for example, a k × k large-kernel convolution, which makes the contour detection more accurate. Alternatively, convolution kernels of 1 × k + k × 1 and k × 1 + 1 × k may be used, which reduces the number of parameters and the amount of computation, e.g., with k = 7. The boundary refinement module (BR) refines the boundary so that the contour of the feature map becomes clearer. For example, a residual branch formed by a 3 × 3 convolution kernel and a ReLU activation function may be used: the coarse global feature map S output by the global feature extraction module is fed into the residual branch, which outputs a feature map S1, and the first feature map finally output by the boundary refinement module is S + S1.
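A non-limiting sketch of the two modules described above follows; PyTorch is assumed, the channel counts and padding are illustrative choices, k = 7 as in the example, and the residual branch is composed exactly as described (a 3 × 3 convolution with a ReLU).

```python
import torch.nn as nn

class GlobalFeatureExtraction(nn.Module):
    """GCN-style block: two separable branches (1xk then kx1, and kx1 then 1xk)
    approximate a kxk large-kernel convolution with fewer parameters."""
    def __init__(self, channels, k=7):
        super().__init__()
        p = k // 2
        self.branch_a = nn.Sequential(
            nn.Conv2d(channels, channels, (1, k), padding=(0, p)),
            nn.Conv2d(channels, channels, (k, 1), padding=(p, 0)),
        )
        self.branch_b = nn.Sequential(
            nn.Conv2d(channels, channels, (k, 1), padding=(p, 0)),
            nn.Conv2d(channels, channels, (1, k), padding=(0, p)),
        )

    def forward(self, x):
        return self.branch_a(x) + self.branch_b(x)   # coarse global feature map S

class BoundaryRefinement(nn.Module):
    """BR block: residual branch of a 3x3 convolution and a ReLU activation;
    the output is S + S1 as described above."""
    def __init__(self, channels):
        super().__init__()
        self.residual = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, s):
        return s + self.residual(s)
```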
The second convolution layers are used for respectively performing feature fusion on the first feature maps with the same scale in the first feature maps to obtain a plurality of fused feature maps;
the multiple deconvolution layers are used for respectively carrying out size conversion on the multiple fused feature maps to restore the multiple fused feature maps to the original image size, and obtaining the multiple second feature maps.
It can be understood that, when the neural network is used for training, a weight (which is a vector) is initialized for each second feature map, each weight is constantly changed in the training process, after the training is finished, the weight corresponding to each second feature map can be determined, and the weights corresponding to the second feature maps are different. When the contour of the image to be detected is detected, the trained weights can be used for carrying out feature weighted fusion processing on the second feature maps so as to obtain a contour detection map.
For example, as shown in fig. 2, the neural network employed in the present invention includes 13 first convolution layers of 3 × 3, which are, from top to bottom, conv1_1, conv1_2, conv2_1, conv2_2, conv3_1, conv3_2, conv3_3, conv4_1, conv4_2, conv4_3, conv5_1, conv5_2, and conv5_3. conv1_1 and conv1_2 correspond to the first scale, and conv1_2 is followed by a pooling layer; conv2_1 and conv2_2 correspond to the second scale, and conv2_2 is followed by a pooling layer; conv3_1, conv3_2 and conv3_3 correspond to the third scale, and conv3_3 is followed by a pooling layer; conv4_1, conv4_2 and conv4_3 correspond to the fourth scale, and conv4_3 is followed by a pooling layer; conv5_1, conv5_2 and conv5_3 correspond to the fifth scale. The output of each first convolution layer is connected to a global feature extraction module and a boundary refinement module, so after the input image passes through the 13 first convolution layers, 13 global feature extraction modules and 13 boundary refinement modules, 13 first feature maps are obtained. The first feature maps of each scale pass through a 1 × 1 second convolution layer, giving 5 single-channel feature maps corresponding to the 5 scales; each of these passes through a deconvolution layer to restore the original image size, giving 5 single-channel second feature maps. The 5 second feature maps are then fused with different weights (i.e., 5 different trained weights, one per second feature map): a concat layer fuses them into 1 feature map with 5 channels, and a 1 × 1 convolution layer outputs a single-channel feature map, namely the contour detection map. The weights are determined by network training.
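The side branches and the weighted fusion head of this example may be sketched as follows; this is a non-limiting illustration in which PyTorch is assumed, bilinear upsampling stands in for a learned deconvolution layer, and the class names are editorial choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SideBranch(nn.Module):
    """1x1 'second convolution layer' followed by upsampling to the original image
    size (bilinear interpolation stands in for a learned deconvolution layer)."""
    def __init__(self, in_channels):
        super().__init__()
        self.reduce = nn.Conv2d(in_channels, 1, kernel_size=1)

    def forward(self, x, out_size):
        return F.interpolate(self.reduce(x), size=out_size,
                             mode="bilinear", align_corners=False)

class WeightedFusionHead(nn.Module):
    """Each of the 5 single-channel second feature maps gets its own learnable
    weight; the weighted maps are concatenated (concat layer) and a 1x1 convolution
    outputs the single-channel contour detection map."""
    def __init__(self, num_scales=5):
        super().__init__()
        self.scale_weights = nn.Parameter(torch.ones(num_scales))  # 5 distinct trained weights
        self.fuse = nn.Conv2d(num_scales, 1, kernel_size=1)

    def forward(self, second_maps):                    # list of (N, 1, H, W) tensors
        weighted = [w * m for w, m in zip(self.scale_weights, second_maps)]
        return self.fuse(torch.cat(weighted, dim=1))
```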
Compared with the traditional VGGNet network, the neural network provided by the invention reduces the fifth pooling layer and all full-connection layers, and adds a global feature extraction module and a boundary refinement module. The fifth pooling layer further reduces the size of the image, and when the feature map is enlarged and restored to the original size, the image is blurred too much, and the detection accuracy is lowered. The reduction of the fifth pooling layer and the full-link layer can obviously reduce the internal memory and time of training and testing, and can also avoid inaccurate detection caused by over-fuzzy images. Meanwhile, the added global feature extraction module and the boundary refinement module can acquire global information and boundary information of the feature map, and the contour detection precision is further improved.
The method provided by the invention trains the neural network to obtain optimal network parameters, so that contour detection can be performed better on the image to be detected. For example, the BSDS500 data set and the NYUD data set are used as the raw data sets. The BSDS500 data set is a data set for image segmentation and object edge detection; it includes 200 training images, 100 validation images and 200 test images, each annotated by 4 to 9 annotators. In use, each training image in the BSDS500 data set is rotated to different angles, the largest rectangle inside the rotated image is cropped out, and the image is flipped at each angle to obtain an augmented data set; the augmented data and the VOC Context data set are mixed to serve as the training data of the training set. The NYUD data set consists of 1449 pairs of densely labeled RGB and depth images captured from indoor scenes, including 381 training images, 414 validation images and 654 test images. In use, each training image in the NYUD data set is randomly flipped, scaled and rotated to obtain an augmented data set, which is added to the NYUD data set as the training data of the training set.
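A non-limiting sketch of the rotate-crop-flip augmentation described above follows; Pillow is assumed, the angle list is an illustrative choice, and center_crop is a simplified stand-in for cropping the largest rectangle inside the rotated image.

```python
from PIL import Image

def center_crop(img, frac=0.7):
    """Simplified stand-in for cropping the largest rectangle inside a rotated
    image: keep a central region covering `frac` of each side."""
    w, h = img.size
    cw, ch = int(w * frac), int(h * frac)
    left, top = (w - cw) // 2, (h - ch) // 2
    return img.crop((left, top, left + cw, top + ch))

def augment_image(path, angles=(0, 45, 90, 135, 180, 225, 270, 315)):
    """Rotate, crop and flip one training image, roughly following the augmentation
    described above; the angle list is an illustrative choice."""
    img = Image.open(path)
    out = []
    for angle in angles:
        rotated = img.rotate(angle, expand=True)
        cropped = center_crop(rotated)
        out.append(cropped)
        out.append(cropped.transpose(Image.FLIP_LEFT_RIGHT))  # flip at each angle
    return out
```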
In an alternative embodiment, the training set comprises a plurality of sample images, and each sample image comprises a plurality of contour label maps,
wherein training the neural network according to a training set comprises:
and for each sample image, adding and averaging the feature vectors of the plurality of contour label maps of the sample image to generate a contour probability map of the sample image, wherein the contour probability of each pixel point in the contour probability map ranges from 0 to 1, 0 indicating that the pixel point is labeled in none of the contour label maps and 1 indicating that the pixel point is labeled in all of the contour label maps.
Since the images in the data set are usually marked by different annotators, different annotators produce different annotation results for the same image. All contour label maps of the same image are processed to obtain a contour probability map of the image, and the contour probability map is used as the label map during training, which reduces the detection error caused by training with a single label map and improves the contour detection accuracy.
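A minimal sketch of how such a contour probability map can be generated from several binary contour label maps; NumPy is assumed.

```python
import numpy as np

def contour_probability_map(label_maps):
    """Average several binary contour label maps (each H x W with values 0/1) into
    one contour probability map with values in [0, 1]."""
    stack = np.stack([np.asarray(m, dtype=np.float32) for m in label_maps], axis=0)
    return stack.mean(axis=0)   # 0: annotated in none of the maps, 1: annotated in all
```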
An alternative embodiment, training the neural network according to a training set, comprising:
determining the loss of the plurality of second feature maps and the loss of the contour probability map for each sample image to obtain a plurality of first loss functions;
determining the loss of the contour detection map and the contour probability map to obtain a second loss function;
adding the plurality of first loss functions and the second loss functions to obtain a target loss function;
performing parameter optimization on the neural network through the target loss function;
wherein the target loss function is:

    L(W) = \sum_{i=1}^{I} \Big( \sum_{k=1}^{K} \ell_{side}\big(X_i^{(k)}; W\big) + \ell_{fuse}\big(X_i^{fuse}; W\big) \Big)

where X_i^{(k)} denotes the feature vector of the k-th second feature map of the i-th sample image, i denotes the serial number of the sample image, I denotes the total number of sample images, k denotes the serial number of the second feature map, K denotes the total number of second feature maps, W denotes the parameters of the neural network, \ell_{side} denotes the first loss function, X_i^{fuse} denotes the feature vector of the contour detection map of the i-th sample image, and \ell_{fuse} denotes the second loss function.
An alternative embodiment, determining the loss of the plurality of second feature maps and the contour probability map for each sample image, resulting in a plurality of first loss functions, comprises:
determining each positive sample point and each negative sample point in the contour probability map for each sample image, wherein each positive sample point represents a pixel point with a pixel contour probability greater than a threshold η, and each negative sample point represents a pixel point with a pixel contour probability of 0;
for each second feature map, calculating the loss of each pixel point in the second feature map relative to the contour probability map to obtain a first loss function;
wherein the first loss function is:

    \ell\big(X_j; W\big) = \begin{cases} -\alpha \cdot \log\big(1 - P(X_j; W)\big), & y_j = 0 \\ 0, & 0 < y_j \le \eta \\ -\beta \cdot \log P(X_j; W), & \text{otherwise} \end{cases}

    \alpha = \lambda \cdot \frac{|Y^+|}{|Y^+| + |Y^-|}, \qquad \beta = \frac{|Y^-|}{|Y^+| + |Y^-|}

    P(X_j; W) = \sigma\big(X_j\big)

in the formulas, Y^+ denotes the set of positive sample points, Y^- denotes the set of negative sample points, y_j denotes the contour probability of pixel point j in the contour probability map, \lambda denotes a parameter that balances the numbers of positive and negative sample points, X_j denotes the feature vector of pixel point j in the second feature map, \sigma denotes the sigmoid function, and P denotes the prediction probability that the pixel point belongs to the positive sample points.
When determining the positive and negative sample points, the method discards pixel points whose contour probability is greater than 0 but not greater than the threshold η, because such pixel points are semantically ambiguous: it cannot be determined whether they are contour points, and treating them as either positive or negative sample points would confuse the network and reduce its detection accuracy. The threshold η may be chosen between 0 and 1, for example η = 0.5; the invention does not specifically limit the value of the threshold.
The method of the invention computes a loss for the output feature maps of different scales and levels as well as for the finally output contour map. For example, as shown in fig. 2, 5 second feature maps are obtained from the 5 deconvolution layer outputs, and a loss is also computed for the contour detection map output after the concat-layer fusion. By this deeply supervised training method, a first loss function is computed for each second feature map, which increases the amount of supervision during neural network training, alleviates problems such as vanishing gradients and slow convergence, prevents overfitting, improves detection accuracy, and makes the contour map (second feature map) obtained at each scale finer and more complete. The minimum of the target loss function may be found, for example, with a stochastic gradient descent algorithm.
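One possible implementation of the described losses is sketched below; this is a non-limiting illustration in which PyTorch is assumed, and the values eta = 0.5 and lam = 1.1 as well as the small epsilon term are illustrative choices rather than values fixed by the patent.

```python
import torch

def side_loss(logits, prob_map, eta=0.5, lam=1.1):
    """Class-balanced loss of one second feature map against the contour probability
    map; pixels with 0 < probability <= eta are discarded, and lam balances the
    numbers of positive and negative sample points (eta and lam values assumed)."""
    pos = prob_map > eta                    # positive sample points
    neg = prob_map == 0                     # negative sample points
    n_pos, n_neg = pos.sum().float(), neg.sum().float()
    alpha = lam * n_pos / (n_pos + n_neg)
    beta = n_neg / (n_pos + n_neg)

    p = torch.sigmoid(logits)               # prediction probability of being a contour point
    eps = 1e-6                              # numerical safeguard (editorial choice)
    loss_neg = -alpha * torch.log(1.0 - p[neg] + eps).sum()
    loss_pos = -beta * torch.log(p[pos] + eps).sum()
    return loss_pos + loss_neg

def target_loss(side_logits_list, fuse_logits, prob_map):
    """Deep supervision: sum of the first losses over all K second feature maps plus
    the second loss computed on the contour detection map."""
    loss = sum(side_loss(s, prob_map) for s in side_logits_list)
    return loss + side_loss(fuse_logits, prob_map)
```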
In an alternative embodiment, the method further comprises:
s4, scaling the image to be detected to obtain images to be detected of multiple scales, executing S1-S3 on the image to be detected of each scale to obtain multiple contour detection maps, and performing feature weighting fusion on the multiple contour detection maps to obtain a final contour detection map.
The method feeds images to be detected at a plurality of scales into the neural network for detection, obtaining a plurality of contour detection maps. The multiple scales are obtained by scaling the image to be detected by different factors, for example acquiring 3 images to be detected (1X, 1.5X, 0.5X), which respectively represent the original image, the image enlarged to 1.5 times, and the image reduced to 0.5 times. The obtained contour detection maps are fused to obtain the final contour detection map; this second, multi-scale feature-weighted fusion further improves the contour detection effect. When the contour detection maps are fused, a different weight coefficient can be set for the feature vector of each contour detection map, or the same weight coefficient can be used for all of them, each weight coefficient being designed according to the scale of the image to be detected.
For example, after network training is finished, for an input image X, the input images at 3 scales (1X, 1.5X, 0.5X) are respectively fed into the network (for example, the network shown in fig. 2); for the input image of each scale, 5 second feature maps (indexed by k, the serial number of the second feature map) and 1 contour detection map Y_fuse are obtained. The 3 contour detection maps Y_fuse obtained from the input images of the 3 scales are then feature-weight-fused with weighting coefficients 1, 1 and 1 to obtain the final contour detection map.
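A non-limiting sketch of this multi-scale test-time fusion follows; PyTorch is assumed, the model is assumed to return the second feature maps together with the contour detection map, and dividing by the sum of the weighting coefficients is an editorial normalization choice.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def multi_scale_detect(model, image, scales=(1.0, 1.5, 0.5), weights=(1.0, 1.0, 1.0)):
    """Run the trained network on rescaled copies of the input and fuse the resulting
    contour detection maps with the given weighting coefficients."""
    _, _, h, w = image.shape
    fused = torch.zeros(image.shape[0], 1, h, w, device=image.device)
    for s, coeff in zip(scales, weights):
        scaled = F.interpolate(image, scale_factor=s, mode="bilinear", align_corners=False)
        _, y_fuse = model(scaled)           # model assumed to return (second maps, Y_fuse)
        y_fuse = F.interpolate(y_fuse, size=(h, w), mode="bilinear", align_corners=False)
        fused += coeff * y_fuse
    return fused / sum(weights)             # normalization is an editorial choice
```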
In the training process of the network, as shown in fig. 3, an input image is fed into the neural network and, after convolution by the plurality of first convolution layers, a plurality of feature maps are output; the feature maps are respectively fed into the global feature extraction modules to extract global features and obtain a plurality of global feature maps; the global feature maps are respectively fed into the boundary refinement modules to adjust the boundary information and obtain a plurality of first feature maps. The first feature maps are convolved by the plurality of second convolution layers, and the first feature maps of the same scale are feature-fused to obtain feature maps of different levels, which are then up-sampled to the original image size to obtain a plurality of second feature maps. The second feature maps are weight-fused and a contour detection map is output. A target loss function is designed by adding the losses of the plurality of second feature maps and the loss of the contour detection map, and the neural network is trained with deep supervision. In each iteration, a batch of images is fed into the network and the loss function is computed, then the next batch is fed in for the next iteration, so that the loss function decreases with each iteration until convergence; the number of iterations can be judged from the descent curve of the loss function. After each round, the network can be validated with the data in the validation set, the loss can be computed, and it can be observed whether overfitting occurs. After training is completed, the network can be tested with the data in the test set: input images of different scales are respectively fed into the trained network, and the plurality of output contour detection maps are fused to obtain the final contour detection map.
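A non-limiting sketch of such a training loop, reusing the target_loss sketch given above; PyTorch is assumed, and the optimizer, learning rate, momentum and number of epochs are illustrative assumptions rather than values from the patent.

```python
import torch

def train(model, train_loader, val_loader, epochs=30, lr=1e-6):
    """Illustrative training loop with stochastic gradient descent and per-epoch
    validation; optimizer settings are assumptions, not values from the patent."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for epoch in range(epochs):
        model.train()
        for images, prob_maps in train_loader:
            side_maps, fuse_map = model(images)     # K second feature maps + fused map
            loss = target_loss(side_maps, fuse_map, prob_maps)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        # validate after each round to watch for overfitting
        model.eval()
        val_loss = 0.0
        with torch.no_grad():
            for images, prob_maps in val_loader:
                side_maps, fuse_map = model(images)
                val_loss += target_loss(side_maps, fuse_map, prob_maps).item()
        print(f"epoch {epoch}: validation loss {val_loss:.4f}")
```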
An image contour detection system according to an embodiment of the present invention includes:
the multi-scale feature extraction module is used for performing multi-scale feature extraction on an image to be detected to obtain a plurality of first feature maps;
the multilevel feature extraction module is used for carrying out multilevel feature extraction on the plurality of first feature maps to obtain a plurality of second feature maps;
and the contour detection module is used for performing feature weighted fusion on the plurality of second feature maps to obtain contour detection maps, wherein the weights corresponding to the second feature maps are different.
The system firstly extracts the multi-scale features of the image through the multi-scale feature extraction module, then fuses feature graphs with the same scale through the multi-level feature extraction module to obtain features of different levels, and finally obtains a final detection result through the weighted fusion of the features of different levels through the outline detection module. The features of the contour at different levels are greatly different, and the accuracy of contour detection can be improved by fusing the features at different levels. It can be understood that when fusing the features of different levels, the present invention adopts different weights (the weights corresponding to the second feature maps are different) for the features of each level, so as to better fuse the shallow information and the deep information, realize the mutual supplement of the shallow feature and the deep feature, and improve the detection accuracy.
An optional embodiment, wherein the multi-scale feature extraction module is configured to:
carrying out multi-scale feature extraction on an image to be detected to obtain a plurality of feature maps;
carrying out global feature extraction on the extracted feature maps respectively to obtain a plurality of global feature maps;
and respectively carrying out boundary processing on the global feature maps to obtain the first feature maps.
The extracted features of the feature maps are rough and cannot cover the global features and boundary information of the image, so that the outlines of the feature maps are unclear. The system provided by the invention performs global feature extraction on the feature map of each scale to obtain global information, and performs thinning processing on the boundary of the contour to obtain boundary information, so that the contour of the obtained first feature map is clearer.
In an alternative embodiment, the multilevel feature extraction module is configured to:
respectively carrying out feature fusion on the first feature maps with the same scale in the plurality of first feature maps to obtain a plurality of fused feature maps;
and respectively carrying out size conversion on the plurality of fused feature maps to restore the fused feature maps to the original image size to obtain a plurality of second feature maps.
Each scale includes a plurality of first feature maps, and the features extracted by each first feature map differ: shallow feature maps carry more detail features, while deep feature maps carry more semantic features. Feature fusion is performed on the first feature maps of the same scale, so that one second feature map is obtained for each scale; this fuses shallow and deep features, and the plurality of second feature maps obtained by fusion improve the detection effect at different scales.
In an alternative embodiment, the contour detection module is configured to:
determining the weights corresponding to the second feature maps, wherein the weights corresponding to the second feature maps are different;
and respectively carrying out feature weighting fusion processing on the plurality of second feature maps by adopting corresponding weights to obtain the contour detection map.
Each scale corresponds to one second feature map, and the features of the second feature maps differ: the information represented by a shallow feature map and a deep feature map is different. The second feature maps are therefore fused with different weights, so that shallow and deep information complement each other, which improves the accuracy of image contour detection.
In an alternative embodiment, the system is implemented by a neural network, the system further comprising: training the neural network according to a training set. The neural network comprises a plurality of first convolution layers, a plurality of global feature extraction modules, a plurality of boundary refinement modules, a plurality of second convolution layers, a plurality of deconvolution layers and a fusion layer;
the plurality of first convolution layers, the plurality of global feature extraction modules and the plurality of boundary refinement modules are used for performing multi-scale feature extraction on an image to be detected to obtain a plurality of first feature maps;
the second convolution layers and the deconvolution layers are used for carrying out multi-level feature extraction on the first feature maps to obtain second feature maps;
the fusion layer is used for carrying out feature weighted fusion on the plurality of second feature maps to obtain a contour detection map.
The first convolution layers are used for carrying out multi-scale feature extraction on an image to be detected to obtain a plurality of feature maps;
each first convolution layer is connected with a global feature extraction module, and the global feature extraction modules are used for respectively carrying out global feature extraction on the extracted feature maps to obtain a plurality of global feature maps;
each global feature extraction module is connected with a boundary refinement module, and the plurality of boundary refinement modules are used for respectively performing boundary processing on the plurality of global feature maps to obtain the plurality of first feature maps.
The global feature extraction module (GCN) may use, for example, a k × k large-kernel convolution, which makes the contour detection more accurate. Alternatively, convolution kernels of 1 × k + k × 1 and k × 1 + 1 × k may be used, which reduces the number of parameters and the amount of computation, e.g., with k = 7. The boundary refinement module (BR) refines the boundary so that the contour of the feature map becomes clearer. For example, a residual branch formed by a 3 × 3 convolution kernel and a ReLU activation function may be used: the coarse global feature map S output by the global feature extraction module is fed into the residual branch, which outputs a feature map S1, and the first feature map finally output by the boundary refinement module is S + S1.
The second convolution layers are used for respectively performing feature fusion on the first feature maps with the same scale in the first feature maps to obtain a plurality of fused feature maps;
the multiple deconvolution layers are used for respectively carrying out size conversion on the multiple fused feature maps to restore the multiple fused feature maps to the original image size, and obtaining the multiple second feature maps.
It can be understood that, when the neural network is used for training, a weight (which is a vector) is initialized for each second feature map, each weight is constantly changed in the training process, after the training is finished, the weight corresponding to each second feature map can be determined, and the weights corresponding to the second feature maps are different. When the contour of the image to be detected is detected, the trained weights can be used for carrying out feature weighted fusion processing on the second feature maps so as to obtain a contour detection map.
For example, as shown in fig. 2, the neural network employed in the present invention includes 13 first convolution layers of 3 × 3, which are, from top to bottom, conv1_1, conv1_2, conv2_1, conv2_2, conv3_1, conv3_2, conv3_3, conv4_1, conv4_2, conv4_3, conv5_1, conv5_2, and conv5_3. conv1_1 and conv1_2 correspond to the first scale, and conv1_2 is followed by a pooling layer; conv2_1 and conv2_2 correspond to the second scale, and conv2_2 is followed by a pooling layer; conv3_1, conv3_2 and conv3_3 correspond to the third scale, and conv3_3 is followed by a pooling layer; conv4_1, conv4_2 and conv4_3 correspond to the fourth scale, and conv4_3 is followed by a pooling layer; conv5_1, conv5_2 and conv5_3 correspond to the fifth scale. The output of each first convolution layer is connected to a global feature extraction module and a boundary refinement module, so after the input image passes through the 13 first convolution layers, 13 global feature extraction modules and 13 boundary refinement modules, 13 first feature maps are obtained. The first feature maps of each scale pass through a 1 × 1 second convolution layer, giving 5 single-channel feature maps corresponding to the 5 scales; each of these passes through a deconvolution layer to restore the original image size, giving 5 single-channel second feature maps. The 5 second feature maps are then fused with different weights (i.e., 5 different trained weights, one per second feature map): a concat layer fuses them into 1 feature map with 5 channels, and a 1 × 1 convolution layer outputs a single-channel feature map, namely the contour detection map. The weights are determined by network training.
Compared with the traditional VGGNet network, the neural network provided by the invention reduces the fifth pooling layer and all full-connection layers, and adds a global feature extraction module and a boundary refinement module. The fifth pooling layer reduces the size of the image, and reduces the detection accuracy by blurring the image when the feature map is enlarged and restored to the original size. The reduction of the fifth pooling layer and the full-link layer can obviously reduce the internal memory and time of training and testing, and can also avoid inaccurate detection caused by over-fuzzy images. Meanwhile, the added global feature extraction module and the boundary refinement module can acquire global information and boundary information of the feature map, and the contour detection precision is further improved.
The system provided by the invention trains the neural network to obtain optimal network parameters, so that contour detection can be performed better on the image to be detected. For example, the BSDS500 data set and the NYUD data set are used as the raw data sets. The BSDS500 data set is a data set for image segmentation and object edge detection; it includes 200 training images, 100 validation images and 200 test images, each annotated by 4 to 9 annotators. In use, each training image in the BSDS500 data set is rotated to different angles, the largest rectangle inside the rotated image is cropped out, and the image is flipped at each angle to obtain an augmented data set; the augmented data and the VOC Context data set are mixed to serve as the training data of the training set. The NYUD data set consists of 1449 pairs of densely labeled RGB and depth images captured from indoor scenes, including 381 training images, 414 validation images and 654 test images. In use, each training image in the NYUD data set is randomly flipped, scaled and rotated to obtain an augmented data set, which is added to the NYUD data set as the training data of the training set.
An alternative embodiment is provided, wherein the training set comprises a plurality of sample images, and each sample image comprises a plurality of contour label maps,
wherein training the neural network according to a training set comprises:
and for each sample image, adding and averaging the feature vectors of the plurality of contour label maps of the sample image to generate a contour probability map of the sample image, wherein the contour probability of each pixel point in the contour probability map ranges from 0 to 1, 0 indicating that the pixel point is labeled in none of the contour label maps and 1 indicating that the pixel point is labeled in all of the contour label maps.
Since the images in the data set are usually marked by different annotators, different annotators produce different annotation maps for the same image. All contour annotation maps of the same image are processed into one contour probability map, and this probability map is used as the label map during training, which reduces the detection error caused by training against a single annotation map and improves contour detection accuracy.
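A minimal sketch of this step, assuming each annotation is stored as a binary array of the same size as the image:

```python
# Average several binary contour annotation maps into one contour probability
# map: 0 where no annotator marked the pixel, 1 where every annotator did.
import numpy as np


def contour_probability_map(annotation_maps):
    stack = np.stack([np.asarray(a, dtype=np.float32) for a in annotation_maps])
    return stack.mean(axis=0)


# Example with three annotators: a pixel marked by two of them gets probability 2/3.
anns = [np.array([[0, 1], [1, 1]]),
        np.array([[0, 1], [0, 1]]),
        np.array([[0, 0], [0, 1]])]
print(contour_probability_map(anns))  # [[0. 0.667], [0.333 1.]]
```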
In an alternative embodiment, training the neural network according to a training set comprises:
determining, for each sample image, the loss of each of the plurality of second feature maps with respect to the contour probability map, to obtain a plurality of first loss functions;
determining the loss of the contour detection map with respect to the contour probability map, to obtain a second loss function;
adding the plurality of first loss functions and the second loss function to obtain a target loss function;
performing parameter optimization on the neural network through the target loss function;
wherein the target loss function is:
$$\mathcal{L}(W)=\sum_{i=1}^{I}\left(\sum_{k=1}^{K}\ell^{(k)}\!\left(X_i^{(k)};W\right)+\ell^{\mathrm{fuse}}\!\left(X_i^{\mathrm{fuse}};W\right)\right)$$

where $X_i^{(k)}$ denotes the feature vector of the $k$-th second feature map of the $i$-th sample image, $i$ denotes the index of a sample image and $I$ the total number of sample images, $k$ denotes the index of a second feature map and $K$ the total number of second feature maps, $W$ denotes the parameters of the neural network, $\ell^{(k)}$ denotes the first loss function, $X_i^{\mathrm{fuse}}$ denotes the feature vector of the contour detection map, and $\ell^{\mathrm{fuse}}$ denotes the second loss function.
In an alternative embodiment, determining, for each sample image, the loss of each of the plurality of second feature maps with respect to the contour probability map to obtain a plurality of first loss functions comprises:
determining, for each sample image, each positive sample point and each negative sample point in the contour probability map, wherein a positive sample point is a pixel point whose pixel contour probability is greater than a threshold η, and a negative sample point is a pixel point whose pixel contour probability is 0;
for each second feature map, calculating the loss of each pixel point in the second feature map relative to the contour probability map to obtain a first loss function;
wherein the first loss function is:
$$\ell\!\left(X_i;W\right)=-\alpha\sum_{j\in Y^{-}}\log\!\left(1-P_j\right)-\beta\sum_{j\in Y^{+}}\log P_j$$

$$\alpha=\lambda\cdot\frac{\lvert Y^{+}\rvert}{\lvert Y^{+}\rvert+\lvert Y^{-}\rvert},\qquad \beta=\frac{\lvert Y^{-}\rvert}{\lvert Y^{+}\rvert+\lvert Y^{-}\rvert}$$

where $Y^{+}$ denotes the set of positive sample points, $Y^{-}$ denotes the set of negative sample points, $\lambda$ denotes a parameter balancing the numbers of positive and negative sample points, $X_i$ denotes the feature vector of the second feature map, and $P_j$ denotes the predicted probability that pixel point $j$ belongs to the positive sample points.
When determining the positive and negative sample points, the invention discards the pixel points whose pixel contour probability is greater than 0 but not greater than the threshold η, because such pixel points are semantically ambiguous: it cannot be determined whether they are contour points, and treating them as either positive or negative sample points would confuse the network and reduce its detection accuracy. The threshold η may be selected from 0 to 1; for example, η is set to 0.5, and the value of the threshold is not particularly limited in the present invention.
The method of the invention computes a loss for each output feature map of each scale and level, and also computes a loss for the finally output contour map. For example, as shown in fig. 2, 5 second feature maps are obtained from the 5 deconvolution layer outputs, and a loss is also computed for the contour detection map output after the concat layer fusion. By this deeply supervised training method, a first loss function is computed for each second feature map, which increases the supervision available during neural network training, alleviates problems such as vanishing gradients and slow convergence, prevents overfitting, improves detection accuracy, and makes the contour map (second feature map) obtained at each scale finer and more complete. When minimizing the target loss function, a stochastic gradient descent algorithm may be used, for example.
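The following is a minimal sketch of such a deeply supervised objective, consistent with the loss formulas given above: a class-balanced cross-entropy is computed for each second feature map (side output) against the contour probability map, pixels with probability in (0, η] are ignored, and the same loss is added for the fused contour detection map. The tensor names, η = 0.5 and λ = 1.1 are illustrative choices, not values fixed by the text.

```python
# Sketch of the deeply supervised objective: a class-balanced cross-entropy
# between each second feature map (side output) and the contour probability
# map, plus the same loss for the fused contour detection map. Pixels with
# probability in (0, eta] are ignored. eta = 0.5 and lam = 1.1 are
# illustrative values, not fixed by the text.
import torch


def balanced_contour_loss(pred, prob_map, eta=0.5, lam=1.1):
    """pred: sigmoid output in (0, 1); prob_map: contour probability map in [0, 1]."""
    pos = prob_map > eta      # confident contour pixels (positive sample points)
    neg = prob_map == 0       # pixels no annotator marked (negative sample points)
    n_pos, n_neg = pos.sum().float(), neg.sum().float()
    alpha = lam * n_pos / (n_pos + n_neg + 1e-8)   # weight applied to negatives
    beta = n_neg / (n_pos + n_neg + 1e-8)          # weight applied to positives
    eps = 1e-8
    loss_pos = -beta * torch.log(pred[pos] + eps).sum()
    loss_neg = -alpha * torch.log(1.0 - pred[neg] + eps).sum()
    return loss_pos + loss_neg


def total_loss(side_outputs, fused, prob_map):
    # First loss functions (one per second feature map) plus the second loss
    # function for the contour detection map.
    loss = sum(balanced_contour_loss(s, prob_map) for s in side_outputs)
    return loss + balanced_contour_loss(fused, prob_map)
```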
In an alternative embodiment, the system further comprises:
the contour detection fusion module is used for scaling the image to be detected to obtain images to be detected of multiple scales, processing the image to be detected of each scale through the multi-scale feature extraction module, the multi-level feature extraction module and the contour detection module to obtain multiple contour detection maps, and performing feature weighted fusion on the multiple contour detection maps to obtain a final contour detection map.
The method inputs images to be detected of multiple scales into the neural network for detection, and multiple contour detection maps are obtained. Multiple scales means that the image to be detected is scaled by different factors; for example, 3 images to be detected of different scales (1X, 1.5X, 0.5X) are acquired, which respectively represent the original image, the image enlarged to 1.5 times, and the image reduced to 0.5 times. The obtained multiple contour detection maps are fused to obtain the final contour detection map; this second multi-scale feature weighted fusion further improves the contour detection effect. When the multiple contour detection maps are fused, a different weight coefficient may be set for the feature vector of each contour detection map, or the same weight coefficient may be set for each of them, each weight coefficient being designed according to the scale of the image to be detected.
For example, after network training is finished, for an input image X, the images of 3 scales (1X, 1.5X, 0.5X) are respectively input into the network; for the input image of each scale, 5 second feature maps $\{X^{(k)}\}_{k=1}^{5}$ and 1 contour detection map $Y_{\mathrm{fuse}}$ are obtained (for example, using the network shown in fig. 2), where $k$ denotes the index of the second feature map. The 3 contour detection maps $Y_{\mathrm{fuse}}$ obtained from the input images of the 3 scales are then feature-weighted and fused with weighting coefficients of 1, 1 and 1 to obtain the final contour detection map.
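A sketch of this multi-scale test procedure, reusing the model interface from the earlier sketch; resizing each fused map back to the input size before weighting, and normalizing by the sum of the weights, are assumptions:

```python
# Sketch of multi-scale testing: run scaled copies of the image through the
# trained network, resize each fused contour map back to the input size, and
# combine them. Resizing back and normalizing by the weight sum are assumptions.
import torch
import torch.nn.functional as F


def multiscale_contour(model, image, scales=(0.5, 1.0, 1.5), weights=(1.0, 1.0, 1.0)):
    """image: 1 x 3 x H x W tensor; returns a 1 x 1 x H x W contour map."""
    h, w = image.shape[-2:]
    fused_maps = []
    with torch.no_grad():
        for s in scales:
            scaled = F.interpolate(image, scale_factor=s, mode="bilinear",
                                   align_corners=False)
            _, fused = model(scaled)                 # side outputs are not used here
            fused_maps.append(F.interpolate(fused, size=(h, w), mode="bilinear",
                                            align_corners=False))
    out = sum(wgt * m for wgt, m in zip(weights, fused_maps))
    return out / sum(weights)
```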
In the network training process, as shown in fig. 3, the system inputs an input image into the neural network and, after convolution by the plurality of first convolution layers, outputs a plurality of feature maps; the plurality of feature maps are respectively input into the global feature extraction modules to extract global features and obtain a plurality of global feature maps; and the global feature maps are respectively input into the boundary refinement modules to adjust the boundary information and obtain a plurality of first feature maps. The plurality of first feature maps are convolved by the plurality of second convolution layers, and the first feature maps of the same scale are feature-fused to obtain feature maps of different levels; these are then respectively up-sampled to the size of the original image to obtain a plurality of second feature maps. The plurality of second feature maps are weighted and fused, and a contour detection map is output. A target loss function is designed by adding the losses of the plurality of second feature maps and the loss of the contour detection map, and the neural network is trained with deep supervision. During training, a batch of images is input into the network for each iteration and the loss function is computed; a new batch is then input for the next iteration, so that the loss function decreases at each iteration until convergence, at which point training ends; the number of iterations can be judged from the descent curve of the loss function. After each epoch, the network can be verified with the data in the validation set by computing the loss and observing whether overfitting occurs. After training is completed, the network can be tested with the data in the test set: during testing, input images of different scales are respectively fed into the trained network, and the multiple output contour detection maps are fused to obtain the final contour detection map.
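For completeness, a minimal training loop matching this description might look as follows; the data loader names and the hyperparameters (learning rate, momentum, weight decay, number of epochs) are hypothetical, and the loss functions and model are those sketched earlier:

```python
# Minimal training loop for the objective sketched above. The data loader
# names and the hyperparameters (learning rate, momentum, weight decay,
# number of epochs) are hypothetical, not values stated in the text.
import torch


def train(model, train_loader, val_loader, epochs=30, lr=1e-6):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9,
                                weight_decay=2e-4)
    for epoch in range(epochs):
        model.train()
        for images, prob_maps in train_loader:        # one iteration per batch
            side_outputs, fused = model(images)
            loss = total_loss(side_outputs, fused, prob_maps)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        # Validate after each epoch to watch for overfitting.
        model.eval()
        with torch.no_grad():
            val_loss = sum(total_loss(*model(im), pm) for im, pm in val_loader)
        print(f"epoch {epoch}: validation loss {float(val_loss):.4f}")
```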
The disclosure also relates to an electronic device comprising a server, a terminal and the like. The electronic device includes: at least one processor; a memory communicatively coupled to the at least one processor; and a communication component communicatively coupled to the storage medium, the communication component receiving and transmitting data under control of the processor; wherein the memory stores instructions executable by the at least one processor to implement the method of the above embodiments.
In an alternative embodiment, the memory is used as a non-volatile computer-readable storage medium for storing non-volatile software programs, non-volatile computer-executable programs, and modules. The processor executes various functional applications of the device and data processing, i.e., implements the method, by executing nonvolatile software programs, instructions, and modules stored in the memory.
The memory may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store a list of options, etc. Further, the memory may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and such remote memory may be connected to the external device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more modules are stored in the memory and, when executed by the one or more processors, perform the methods of any of the method embodiments described above.
The above product can execute the method provided by the embodiments of the present application and has the corresponding functional modules and beneficial effects for executing the method; for technical details not described in detail in this embodiment, reference may be made to the method provided by the embodiments of the present application.
The present disclosure also relates to a computer-readable storage medium for storing a computer-readable program for causing a computer to perform some or all of the above-described method embodiments.
That is, as can be understood by those skilled in the art, all or part of the steps of the methods of the embodiments described above may be implemented by a program instructing related hardware; the program is stored in a storage medium and includes several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Furthermore, those of ordinary skill in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
It will be understood by those skilled in the art that while the present invention has been described with reference to exemplary embodiments, various changes may be made and equivalents may be substituted for elements thereof without departing from the scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from the essential scope thereof. Therefore, it is intended that the invention not be limited to the particular embodiment disclosed, but that the invention will include all embodiments falling within the scope of the appended claims.

Claims (10)

1. An image contour detection method, characterized in that the method comprises:
s1, performing multi-scale feature extraction on the image to be detected to obtain a plurality of first feature maps;
s2, performing multilevel feature extraction on the plurality of first feature maps to obtain a plurality of second feature maps;
and S3, performing feature weighted fusion on the plurality of second feature maps to obtain a contour detection map, wherein the weights corresponding to the second feature maps are different.
2. The method of claim 1, wherein the performing multi-scale feature extraction on the image to be detected to obtain a plurality of first feature maps comprises:
carrying out multi-scale feature extraction on an image to be detected to obtain a plurality of feature maps;
carrying out global feature extraction on the extracted feature maps respectively to obtain a plurality of global feature maps;
and respectively carrying out boundary processing on the global feature maps to obtain the first feature maps.
3. The method of claim 1, wherein the performing multi-level feature extraction on the plurality of first feature maps to obtain a plurality of second feature maps comprises:
respectively carrying out feature fusion on the first feature maps with the same scale in the plurality of first feature maps to obtain a plurality of fused feature maps;
and respectively carrying out size conversion on the plurality of fused feature maps to restore the fused feature maps to the original image size to obtain a plurality of second feature maps.
4. The method of claim 1, wherein the performing feature weighted fusion on the plurality of second feature maps to obtain a contour detection map comprises:
determining weights corresponding to the second feature maps, wherein the weights corresponding to the second feature maps are different;
and performing feature weighted fusion processing on the plurality of second feature maps with the corresponding weights, respectively, to obtain the contour detection map.
5. The method of claim 1, wherein the method is implemented by a neural network, the method further comprising: training the neural network according to a training set,
the neural network comprises a plurality of first convolution layers, a plurality of global feature extraction modules, a plurality of boundary refinement modules, a plurality of second convolution layers, a plurality of deconvolution layers and a fusion layer;
the plurality of first convolution layers, the plurality of global feature extraction modules and the plurality of boundary thinning modules are used for performing multi-scale feature extraction on an image to be detected to obtain a plurality of first feature maps;
the second convolution layers and the deconvolution layers are used for carrying out multi-level feature extraction on the first feature maps to obtain second feature maps;
the fusion layer is used for carrying out feature weighted fusion on the plurality of second feature maps to obtain a contour detection map.
6. The method of claim 5, wherein the training set comprises a plurality of sample images, each sample image comprising a plurality of contour annotation maps,
wherein training the neural network according to a training set comprises:
for each sample image, the feature vectors of the plurality of contour annotation maps of the sample image are added and averaged to generate a contour probability map of the sample image, wherein the pixel contour probability of each pixel point in the contour probability map lies in the range 0-1, 0 indicating that the pixel point is not marked in any of the contour annotation maps and 1 indicating that the pixel point is marked in all of the contour annotation maps.
7. The method of claim 6, wherein training the neural network according to a training set comprises:
determining, for each sample image, the loss of each of the plurality of second feature maps with respect to the contour probability map, to obtain a plurality of first loss functions;
determining the loss of the contour detection map with respect to the contour probability map, to obtain a second loss function;
adding the plurality of first loss functions and the second loss function to obtain a target loss function;
performing parameter optimization on the neural network through the target loss function;
wherein the target loss function is:
$$\mathcal{L}(W)=\sum_{i=1}^{I}\left(\sum_{k=1}^{K}\ell^{(k)}\!\left(X_i^{(k)};W\right)+\ell^{\mathrm{fuse}}\!\left(X_i^{\mathrm{fuse}};W\right)\right)$$

where $X_i^{(k)}$ denotes the feature vector of the $k$-th second feature map of the $i$-th sample image, $i$ denotes the index of a sample image and $I$ the total number of sample images, $k$ denotes the index of a second feature map and $K$ the total number of second feature maps, $W$ denotes the parameters of the neural network, $\ell^{(k)}$ denotes the first loss function, $X_i^{\mathrm{fuse}}$ denotes the feature vector of the contour detection map, and $\ell^{\mathrm{fuse}}$ denotes the second loss function.
8. The method of claim 7, wherein determining, for each sample image, the loss of each of the plurality of second feature maps with respect to the contour probability map to obtain a plurality of first loss functions comprises:
determining, for each sample image, each positive sample point and each negative sample point in the contour probability map, wherein a positive sample point is a pixel point whose pixel contour probability is greater than a threshold η, and a negative sample point is a pixel point whose pixel contour probability is 0;
for each second feature map, calculating the loss of each pixel point in the second feature map relative to the contour probability map to obtain a first loss function;
wherein the first loss function is:
$$\ell\!\left(X_i;W\right)=-\alpha\sum_{j\in Y^{-}}\log\!\left(1-P_j\right)-\beta\sum_{j\in Y^{+}}\log P_j$$

$$\alpha=\lambda\cdot\frac{\lvert Y^{+}\rvert}{\lvert Y^{+}\rvert+\lvert Y^{-}\rvert},\qquad \beta=\frac{\lvert Y^{-}\rvert}{\lvert Y^{+}\rvert+\lvert Y^{-}\rvert}$$

where $Y^{+}$ denotes the set of positive sample points, $Y^{-}$ denotes the set of negative sample points, $\lambda$ denotes a parameter balancing the numbers of positive and negative sample points, $X_i$ denotes the feature vector of the second feature map, and $P_j$ denotes the predicted probability that pixel point $j$ belongs to the positive sample points.
9. The method of claim 1, wherein the method further comprises:
s4, scaling the image to be detected to obtain images to be detected of multiple scales, executing S1-S3 on the image to be detected of each scale to obtain multiple contour detection maps, and performing feature weighted fusion on the multiple contour detection maps to obtain a final contour detection map.
10. An image contour detection system, characterized in that the system comprises:
the multi-scale feature extraction module is used for performing multi-scale feature extraction on an image to be detected to obtain a plurality of first feature maps;
the multilevel feature extraction module is used for carrying out multilevel feature extraction on the plurality of first feature maps to obtain a plurality of second feature maps;
and the contour detection module is used for performing feature weighted fusion on the plurality of second feature maps to obtain a contour detection map, wherein the weights corresponding to the second feature maps are different.
CN202010802590.5A 2020-08-11 2020-08-11 Image contour detection method and system Active CN112001403B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010802590.5A CN112001403B (en) 2020-08-11 2020-08-11 Image contour detection method and system

Publications (2)

Publication Number Publication Date
CN112001403A true CN112001403A (en) 2020-11-27
CN112001403B CN112001403B (en) 2023-12-15

Family

ID=73463787

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010802590.5A Active CN112001403B (en) 2020-08-11 2020-08-11 Image contour detection method and system

Country Status (1)

Country Link
CN (1) CN112001403B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109035251A (en) * 2018-06-06 2018-12-18 杭州电子科技大学 One kind being based on the decoded image outline detection method of Analysis On Multi-scale Features
CN109344821A (en) * 2018-08-30 2019-02-15 西安电子科技大学 Small target detecting method based on Fusion Features and deep learning
CN110119728A (en) * 2019-05-23 2019-08-13 哈尔滨工业大学 Remote sensing images cloud detection method of optic based on Multiscale Fusion semantic segmentation network
CN110263705A (en) * 2019-06-19 2019-09-20 上海交通大学 Towards two phase of remote sensing technology field high-resolution remote sensing image change detecting method
CN111325762A (en) * 2020-01-21 2020-06-23 广西科技大学 Contour detection method based on dense connection decoding network
US20200242422A1 (en) * 2019-01-29 2020-07-30 Boe Technology Group Co., Ltd. Method and electronic device for retrieving an image and computer readable storage medium

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113077484A (en) * 2021-03-30 2021-07-06 中国人民解放军战略支援部队信息工程大学 Image instance segmentation method
CN113095254A (en) * 2021-04-20 2021-07-09 清华大学深圳国际研究生院 Method and system for positioning key points of human body part
CN113095254B (en) * 2021-04-20 2022-05-24 清华大学深圳国际研究生院 Method and system for positioning key points of human body part
CN113298092A (en) * 2021-05-28 2021-08-24 有米科技股份有限公司 Neural network training method and device for extracting multi-level image contour information
CN113674300A (en) * 2021-08-24 2021-11-19 苏州天准软件有限公司 Model training method, measuring method and system, equipment and medium for CNC automatic measurement
CN113963337A (en) * 2021-12-22 2022-01-21 中国科学院自动化研究所 Object image contour primitive extraction method and device

Also Published As

Publication number Publication date
CN112001403B (en) 2023-12-15

Similar Documents

Publication Publication Date Title
CN112001403B (en) Image contour detection method and system
CN112990432B (en) Target recognition model training method and device and electronic equipment
Xie et al. Multilevel cloud detection in remote sensing images based on deep learning
CN107564025B (en) Electric power equipment infrared image semantic segmentation method based on deep neural network
CN111027493B (en) Pedestrian detection method based on deep learning multi-network soft fusion
CN108960211B (en) Multi-target human body posture detection method and system
CN111814902A (en) Target detection model training method, target identification method, device and medium
CN109697441B (en) Target detection method and device and computer equipment
CN111783779B (en) Image processing method, apparatus and computer readable storage medium
CN110969627B (en) Image segmentation model training method, image processing method and device
CN110378837B (en) Target detection method and device based on fish-eye camera and storage medium
CN110889399B (en) High-resolution remote sensing image weak and small target detection method based on deep learning
CN113781164B (en) Virtual fitting model training method, virtual fitting method and related devices
CN112381837A (en) Image processing method and electronic equipment
CN111738270B (en) Model generation method, device, equipment and readable storage medium
CN112836625A (en) Face living body detection method and device and electronic equipment
CN112489063A (en) Image segmentation method, and training method and device of image segmentation model
CN114861842B (en) Few-sample target detection method and device and electronic equipment
CN110472092B (en) Geographical positioning method and system of street view picture
CN111860623A (en) Method and system for counting tree number based on improved SSD neural network
CN111738069A (en) Face detection method and device, electronic equipment and storage medium
CN114359967B (en) Method and device for detecting drowning-preventing human body target in swimming pool, computer equipment and storage medium
CN112699809B (en) Vaccinia category identification method, device, computer equipment and storage medium
CN112801045B (en) Text region detection method, electronic equipment and computer storage medium
CN111179245B (en) Image quality detection method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant