CN113538484B - Deep-refinement multiple-information nested edge detection method - Google Patents

Deep-refinement multiple-information nested edge detection method

Info

Publication number
CN113538484B
Authority
CN
China
Prior art keywords: image, convolution, images, information extraction, combined
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110746455.8A
Other languages
Chinese (zh)
Other versions
CN113538484A (en)
Inventor
林川
王蕤兴
张贞光
陈永亮
谢智星
吴海晨
李福章
潘勇才
韦艳霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangxi University of Science and Technology
Original Assignee
Guangxi University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangxi University of Science and Technology filed Critical Guangxi University of Science and Technology
Priority to CN202110746455.8A priority Critical patent/CN113538484B/en
Publication of CN113538484A publication Critical patent/CN113538484A/en
Application granted granted Critical
Publication of CN113538484B publication Critical patent/CN113538484B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/13 Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10024 Color image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20212 Image combination
    • G06T 2207/20221 Image fusion; Image merging

Abstract

The invention aims to provide a deep-refinement multiple-information nested edge detection method, which comprises the following steps: constructing a deep neural network structure consisting of an encoding network and a decoding network. The encoding network is a VGG16 network from which all fully connected layers and the pool5 pooling layer are removed, retaining only the main body of the VGG16 network. The decoding network is divided into three layers: the first layer comprises a compression module, a reshaping module and an adjustment module; the second layer comprises an information extraction and fusion module a, an information extraction and fusion module b, an information extraction and fusion module c and an information extraction and fusion module d; the third layer is a lateral contour subdivision network module.

Description

Deep-refinement multiple-information nested edge detection method
Technical Field
The invention relates to the field of image processing, and in particular to a deep-refinement multiple-information nested edge detection method.
Background
Contour detection is an important component of image processing and computer vision. Correctly detecting object contours against complex backgrounds is an important and difficult task. Among conventional image processing methods, the Canny operator, active contour models, machine-learning-based contour models and the like are used for contour detection. These methods mainly rely on brightness, color and contrast information in the image and have difficulty distinguishing object contours from other cluttered boundaries. Therefore, when the contrast in an image varies greatly and there is considerable background interference, such methods struggle to obtain satisfactory results. These algorithms also require considerable domain expertise and carefully designed processing pipelines to convert raw image data into suitable representations or feature vectors for constructing a contour classifier or contour model. In recent years, deep learning has become an efficient way to learn feature representations automatically from raw data. With deep learning tools, in particular convolutional neural networks, the contour detection task has seen remarkable performance improvements.
In recent years, research related to deep learning has formed a relatively complete system. HED visualizes the detection results of the five side outputs of the VGG16 network and finds that the contours produced by the shallow layers are poor: they contain a large amount of texture and noise, the error rate increases during propagation, and the experimental results are strongly affected. Conventional deep learning algorithms simply add or fuse the convolution layers directly and lack the theoretical support of a biological visual mechanism, while biomimetic algorithms describe cell responses with mathematical models and are not sufficient to simulate the complex transmission between layers in the visual mechanism.
Disclosure of Invention
The invention aims to provide a deep-refinement multiple-information nested edge detection method which overcomes the above shortcomings of the prior art and makes the detected contours clearer and more accurate.
The technical scheme of the invention is as follows:
the deep-refinement multiple-information nested edge detection method comprises the following steps:
A. constructing a deep neural network structure comprising an encoding network and a decoding network, with the following specific structure:
the encoding network is a VGG16 network from which all fully connected layers and the pool5 pooling layer are removed, retaining only the main body of the VGG16 network; the decoding network is divided into three layers, wherein the first layer comprises a compression module, a reshaping module and an adjustment module; the second layer comprises an information extraction and fusion module a, an information extraction and fusion module b, an information extraction and fusion module c and an information extraction and fusion module d; the third layer is a lateral contour subdivision network module;
B. the original image is processed by the convolution layers of the VGG16 network to obtain the 5 side output images of VGG16, and the 5 side output images of VGG16 are then input into the compression module and the information extraction and fusion module a respectively;
in the information extraction and fusion module a, the 1st to 5th side output images are each convolved again so that the numbers of output channels are consistent, yielding re-convolved images of the 1st to 5th side output images; then, taking the re-convolved image of the 1st side output image as the reference, the resolutions of the re-convolved images of the 2nd to 5th side output images are unified to obtain resolution-adjusted images of the 2nd to 5th side output images; the re-convolved image of the 1st side output image is fused with the resolution-adjusted images of the 2nd to 5th side output images to obtain the information extraction fused image a, which is input into the lateral contour subdivision network module;
C. in the compression module: a secondary convolution is applied to the 1st to 5th side output images, using 3 × 3 convolution for the layer-1 and layer-2 convolution images and 1 × 1 convolution for the layer-3, layer-4 and layer-5 convolution images, so that the number of feature channels is unified; the layer-1 to layer-5 convolution images after the secondary convolution are combined pairwise in sequence to form 4 groups; in each group, the higher-resolution image is max-pooled down to the same resolution as the lower-resolution image and the two are then added, yielding four primary combined images, namely the 1-2, 2-3, 3-4 and 4-5 combined images, which are input into the reshaping module and the information extraction and fusion module b respectively;
in the information extraction and fusion module b, the 1-2, 2-3, 3-4 and 4-5 combined images are each convolved again so that the numbers of output channels are consistent, yielding the 1-2, 2-3, 3-4 and 4-5 re-convolved images; then, taking the 1-2 re-convolved image as the reference, the resolutions of the 2-3, 3-4 and 4-5 re-convolved images are unified to obtain the 2-3, 3-4 and 4-5 resolution-adjusted images; the 1-2 re-convolved image is fused with the 2-3, 3-4 and 4-5 resolution-adjusted images to obtain the information extraction fused image b, which is input into the lateral contour subdivision network module;
D. the reshaping module has two layers, and the processing in the first layer is as follows: three parallel convolutions with kernel sizes 1 × 1, 3 × 3 and 5 × 5 are applied to the 1-2 and 2-3 combined images; the three parallel convolution results of the 1-2 combined image are fused to obtain the fused 1-2 combined image; the three parallel convolution results of the 2-3 combined image are fused to obtain the fused 2-3 combined image; a 1 × 1 convolution is applied to the 3-4 and 4-5 combined images; the fused 1-2 combined image, the fused 2-3 combined image, the convolved 3-4 combined image and the convolved 4-5 combined image are combined pairwise in sequence to form 3 groups; in each group, the higher-resolution image is max-pooled down to the same resolution as the lower-resolution image and the two are added, yielding the 1-3, 2-4 and 3-5 combined images, which are input into the second layer and the information extraction and fusion module c respectively;
the processing in the second layer is as follows: three parallel convolutions with kernel sizes 1 × 1, 3 × 3 and 5 × 5 are applied to the 1-3 and 2-4 combined images; the three parallel convolution results of the 1-3 combined image are fused to obtain the fused 1-3 combined image; the three parallel convolution results of the 2-4 combined image are fused to obtain the fused 2-4 combined image; a 1 × 1 convolution is applied to the 3-5 combined image; the resolutions of the fused 1-3 combined image, the fused 2-4 combined image and the convolved 3-5 combined image are unified by max-pooling the higher-resolution image down to the same resolution as the lower-resolution image; the images are then combined and added to obtain the 1-4 and 2-5 combined images, which are input into the adjustment module;
in the information extraction and fusion module c, the 1-3, 2-4 and 3-5 combined images are each convolved again so that the numbers of output channels are consistent, yielding the 1-3, 2-4 and 3-5 re-convolved images; then, taking the 1-3 re-convolved image as the reference, the resolutions of the 2-4 and 3-5 re-convolved images are unified to obtain the 2-4 and 3-5 resolution-adjusted images; the 1-3 re-convolved image is fused with the 2-4 and 3-5 resolution-adjusted images to obtain the information extraction fused image c, which is input into the lateral contour subdivision network module;
E. in the adjustment module, the resolutions of the 1-4 and 2-5 combined images are unified by converting the lower-resolution image to the same resolution as the higher-resolution image with bilinear interpolation; the images are then combined and added to obtain the 1-5 combined image, which is input into the information extraction and fusion module d;
in the information extraction and fusion module d, the 1-5 combined image is convolved again to obtain the 1-5 re-convolved image, which is input into the lateral contour subdivision network module;
F. in the lateral contour subdivision network module, the following operations are carried out:
F1, the information extraction fused images a, b, c and d are each convolved and activated and then multiplied by adaptive random weights to obtain the primary weight images a, b, c and d; the four images are combined pairwise in sequence to form 3 groups; in each group, the lower-resolution image is upsampled to the same resolution as the higher-resolution image with bilinear interpolation and the two are then added, yielding the primary added weight images a, b and c;
F2, the primary added weight images a, b and c are each convolved and activated and then multiplied by adaptive random weights to obtain the secondary weight images a, b and c; the three images are combined pairwise in sequence to form 2 groups; in each group, the lower-resolution image is upsampled to the same resolution as the higher-resolution image with bilinear interpolation and the two are then added, yielding the secondary added weight images a and b;
F3, the secondary added weight images a and b are each convolved and activated and then multiplied by adaptive random weights to obtain the tertiary weight images a and b; the resolutions of the two images are unified by upsampling the lower-resolution image to the same resolution as the higher-resolution image with bilinear interpolation, and the two are added; finally, a 1 × 1 convolution reduces the number of feature channels to 1, and the output is the final edge image.
The convolution expression involved in each step is m × n-k conv + relu, where m × n denotes the size of the convolution kernel, k denotes the number of output channels, conv denotes the convolution operation, and relu denotes the activation function; m, n and k are preset values; the convolution expression of the final fusion layer is m × n-k conv.
The VGG16 network comprises 5 stages, stage I to stage V, each of which contains more than one convolution layer;
the input of the first convolution layer of stage I is the original image, and the input of every other convolution layer of stage I is the output of the preceding convolution layer of that stage; in stages II-V, except for the first convolution layer of each stage, the input of every convolution layer is the output of the preceding convolution layer; the output of the last convolution layer of stages I to IV is, on the one hand, max-pooled and used as the input of the first convolution layer of the next stage, and, on the other hand, input into the compression module and the information extraction and fusion module a; the output of the last convolution layer of stage V is max-pooled and then input into the compression module and the information extraction and fusion module a;
the convolutions in the VGG16 network are all 3 × 3 convolutions.
The re-convolution in steps B to E is a 1 × 1 convolution.
In the step C, the number of the unified feature channels is 200.
In the steps B-E, the number of the feature channels of the information extraction fusion image a is 64, the number of the feature channels of the information extraction fusion image B is 100, the number of the feature channels of the information extraction fusion image c is 200, and the number of the feature channels of the information extraction fusion image d is 300.
In steps B-E, the resolutions in the information extraction and fusion module a, the information extraction and fusion module b, the information extraction and fusion module c and the information extraction and fusion module d are unified as follows: the lower-resolution output map is converted to the same resolution as the higher-resolution output map using bilinear interpolation.
In steps F1 to F3, the convolution is a 3 × 3 convolution, the activation uses the following ReLU function, and the weight parameters of the adaptive random weights range from 0 to 1;
ReLU(x) = max(0, x), i.e. the output is x for x > 0 and 0 otherwise.
the maximum pooling is 2 x 2 maximum pooling.
The invention provides an edge detection method based on a novel decoding network, which is applicable to most backbone networks and yields good results. On the NYUD-V2 dataset, with VGG16 as the encoding network, an ODS F-score of 0.773 was obtained, which is 1.6% higher than LRCNet. The method provides a new idea for subsequent contour detection research and is further beneficial to other vision tasks.
Drawings
Fig. 1 is a network diagram of VGG16 provided in embodiment 1 of the present invention;
FIG. 2 is a graph comparing the contour detection results of embodiment 1 of the present invention with those of document 1;
in FIG. 1, "3 × 3-64", "3 × 3-128" and the like indicate the parameters of the convolution kernels, where "3 × 3" indicates the kernel size and "-64", "-128" and the like indicate the number of convolution kernels, that is, the number of output feature channels is 64, 128, and so on.
Detailed Description
The present invention will be described in detail with reference to the accompanying drawings and examples.
Example 1
The deep-refinement multiple-information nested edge detection method provided by this embodiment comprises the following steps:
A. constructing a deep neural network structure comprising an encoding network and a decoding network, with the following specific structure:
the encoding network is a VGG16 network from which all fully connected layers and the pool5 pooling layer are removed, retaining only the main body of the VGG16 network; the decoding network is divided into three layers, wherein the first layer comprises a compression module, a reshaping module and an adjustment module; the second layer comprises an information extraction and fusion module a, an information extraction and fusion module b, an information extraction and fusion module c and an information extraction and fusion module d; the third layer is a lateral contour subdivision network module;
B. the original image is processed by the convolution layers of the VGG16 network to obtain the 5 side output images of VGG16, and the 5 side output images of VGG16 are then input into the compression module and the information extraction and fusion module a respectively;
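For illustration only, the encoding network described in steps A-B can be sketched in PyTorch (the framework used in example 2 below). This is a minimal sketch rather than the patented implementation: the class and variable names are invented, the channel widths follow the standard VGG16 configuration, and the pre-pooling feature map of each stage is returned as its side output.

```python
import torch.nn as nn

class VGG16Encoder(nn.Module):
    """VGG16 convolutional body without the fully connected layers and pool5."""
    def __init__(self):
        super().__init__()
        # (in_channels, out_channels, number of 3 x 3 conv layers) per stage I-V
        cfg = [(3, 64, 2), (64, 128, 2), (128, 256, 3), (256, 512, 3), (512, 512, 3)]
        self.stages = nn.ModuleList()
        for in_ch, out_ch, n_convs in cfg:
            layers = []
            for i in range(n_convs):
                layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                           nn.ReLU(inplace=True)]
            self.stages.append(nn.Sequential(*layers))
        self.pool = nn.MaxPool2d(2, 2)  # 2 x 2 max pooling between stages

    def forward(self, x):
        side_outputs = []
        for idx, stage in enumerate(self.stages):
            x = stage(x)
            side_outputs.append(x)      # side output of stage I..V
            if idx < len(self.stages) - 1:
                x = self.pool(x)        # stages I-IV are pooled before the next stage
        return side_outputs             # 5 feature maps at decreasing resolution
```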
in the information extraction and fusion module a, the 1st to 5th side output images are each convolved again so that the numbers of output channels are consistent, yielding re-convolved images of the 1st to 5th side output images; then, taking the re-convolved image of the 1st side output image as the reference, the resolutions of the re-convolved images of the 2nd to 5th side output images are unified to obtain resolution-adjusted images of the 2nd to 5th side output images; the re-convolved image of the 1st side output image and the resolution-adjusted images of the 2nd to 5th side output images are fused through a concat function to obtain the information extraction fused image a, which is input into the lateral contour subdivision network module;
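A hedged PyTorch sketch of the processing in information extraction and fusion module a follows; the intermediate channel count and the class name are illustrative assumptions, while the re-convolution to a common channel count, the bilinear resolution adjustment towards the 1st side output and the concat fusion follow the description above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InfoExtractFusionA(nn.Module):
    def __init__(self, in_channels=(64, 128, 256, 512, 512), mid_channels=16, out_channels=64):
        super().__init__()
        # re-convolution: bring every side output to the same number of channels
        self.reconvs = nn.ModuleList(nn.Conv2d(c, mid_channels, 1) for c in in_channels)
        self.fuse = nn.Conv2d(mid_channels * len(in_channels), out_channels, 1)

    def forward(self, side_outputs):
        ref = self.reconvs[0](side_outputs[0])      # re-convolved 1st side output (reference)
        target_size = ref.shape[2:]
        adjusted = [ref]
        for conv, feat in zip(self.reconvs[1:], side_outputs[1:]):
            y = conv(feat)
            # resolution adjustment: bilinear interpolation to the reference size
            adjusted.append(F.interpolate(y, size=target_size, mode="bilinear", align_corners=False))
        # concat fusion -> information extraction fused image a
        return self.fuse(torch.cat(adjusted, dim=1))
```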
C. in the compression module: a secondary convolution is applied to the 1st to 5th side output images, using 3 × 3 convolution for the layer-1 and layer-2 convolution images and 1 × 1 convolution for the layer-3, layer-4 and layer-5 convolution images, so that the number of feature channels is unified; the layer-1 to layer-5 convolution images after the secondary convolution are combined pairwise in sequence to form 4 groups; in each group, the higher-resolution image is 2 × 2 max-pooled down to the same resolution as the lower-resolution image and the two are then added, yielding four primary combined images, namely the 1-2, 2-3, 3-4 and 4-5 combined images, which are input into the reshaping module and the information extraction and fusion module b respectively;
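The compression module can be sketched as below. The sketch assumes that adjacent side outputs differ in resolution by exactly a factor of 2, so that one 2 x 2 max pooling of the higher-resolution map matches the lower resolution; the class name is an assumption, while the 3 × 3 / 1 × 1 convolution split and the 200-channel unification follow the text.

```python
import torch.nn as nn

class CompressionModule(nn.Module):
    def __init__(self, in_channels=(64, 128, 256, 512, 512), channels=200):
        super().__init__()
        # secondary convolution: 3 x 3 for layers 1-2, 1 x 1 for layers 3-5
        self.convs = nn.ModuleList([
            nn.Conv2d(in_channels[0], channels, 3, padding=1),
            nn.Conv2d(in_channels[1], channels, 3, padding=1),
            nn.Conv2d(in_channels[2], channels, 1),
            nn.Conv2d(in_channels[3], channels, 1),
            nn.Conv2d(in_channels[4], channels, 1),
        ])
        self.pool = nn.MaxPool2d(2, 2)

    def forward(self, side_outputs):
        feats = [conv(x) for conv, x in zip(self.convs, side_outputs)]
        combined = []
        for hi, lo in zip(feats[:-1], feats[1:]):
            # pool the higher-resolution map down to the lower resolution, then add
            combined.append(self.pool(hi) + lo)
        return combined  # 1-2, 2-3, 3-4 and 4-5 combined images
```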
in the information extraction and fusion module b, the 1-2, 2-3, 3-4 and 4-5 combined images are each convolved again so that the numbers of output channels are consistent, yielding the 1-2, 2-3, 3-4 and 4-5 re-convolved images; then, taking the 1-2 re-convolved image as the reference, the resolutions of the 2-3, 3-4 and 4-5 re-convolved images are unified to obtain the 2-3, 3-4 and 4-5 resolution-adjusted images; the 1-2 re-convolved image and the 2-3, 3-4 and 4-5 resolution-adjusted images are fused through a concat function to obtain the information extraction fused image b, which is input into the lateral contour subdivision network module;
D. the reshaping module has two layers, and the processing in the first layer is as follows: three parallel convolutions with kernel sizes 1 × 1, 3 × 3 and 5 × 5 are applied to the 1-2 and 2-3 combined images; the three parallel convolution results of the 1-2 combined image are fused through a concat function to obtain the fused 1-2 combined image; the three parallel convolution results of the 2-3 combined image are fused through a concat function to obtain the fused 2-3 combined image; a 1 × 1 convolution is applied to the 3-4 and 4-5 combined images; the fused 1-2 combined image, the fused 2-3 combined image, the convolved 3-4 combined image and the convolved 4-5 combined image are combined pairwise in sequence to form 3 groups; in each group, the higher-resolution image is 2 × 2 max-pooled down to the same resolution as the lower-resolution image and the two are added, yielding the 1-3, 2-4 and 3-5 combined images, which are input into the second layer and the information extraction and fusion module c respectively;
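The three parallel convolutions with concat fusion used in the reshaping module can be written as a small reusable block, sketched below; the channel counts and the trailing 1 × 1 fusion convolution are assumptions.

```python
import torch
import torch.nn as nn

class ParallelConvFusion(nn.Module):
    """Parallel 1 x 1, 3 x 3 and 5 x 5 convolutions whose outputs are fused by concat."""
    def __init__(self, channels=200):
        super().__init__()
        self.branch1 = nn.Conv2d(channels, channels, 1)
        self.branch3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.branch5 = nn.Conv2d(channels, channels, 5, padding=2)
        self.fuse = nn.Conv2d(channels * 3, channels, 1)  # reduce the concat back to `channels`

    def forward(self, x):
        y = torch.cat([self.branch1(x), self.branch3(x), self.branch5(x)], dim=1)
        return self.fuse(y)
```

In the first layer this block would be applied to the 1-2 and 2-3 combined images, while the 3-4 and 4-5 combined images receive only a 1 × 1 convolution; the pairwise 2 × 2 max-pool-and-add combination then follows the same pattern as in the compression module sketch.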
the processing in the second layer is as follows: three parallel convolutions with kernel sizes 1 × 1, 3 × 3 and 5 × 5 are applied to the 1-3 and 2-4 combined images; the three parallel convolution results of the 1-3 combined image are fused through a concat function to obtain the fused 1-3 combined image; the three parallel convolution results of the 2-4 combined image are fused through a concat function to obtain the fused 2-4 combined image; a 1 × 1 convolution is applied to the 3-5 combined image; the resolutions of the fused 1-3 combined image, the fused 2-4 combined image and the convolved 3-5 combined image are unified by 2 × 2 max-pooling the higher-resolution image down to the same resolution as the lower-resolution image; the images are then combined and added to obtain the 1-4 and 2-5 combined images, which are input into the adjustment module;
in the information extraction and fusion module c, the 1-3, 2-4 and 3-5 combined images are each convolved again so that the numbers of output channels are consistent, yielding the 1-3, 2-4 and 3-5 re-convolved images; then, taking the 1-3 re-convolved image as the reference, the resolutions of the 2-4 and 3-5 re-convolved images are unified to obtain the 2-4 and 3-5 resolution-adjusted images; the 1-3 re-convolved image and the 2-4 and 3-5 resolution-adjusted images are fused through a concat function to obtain the information extraction fused image c, which is input into the lateral contour subdivision network module;
E. in the adjustment module, the resolutions of the 1-4 and 2-5 combined images are unified by converting the lower-resolution image to the same resolution as the higher-resolution image with bilinear interpolation; the images are then combined and added to obtain the 1-5 combined image, which is input into the information extraction and fusion module d;
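A minimal sketch of the adjustment module, assuming the 1-4 combined image is the higher-resolution one of the pair:

```python
import torch.nn.functional as F

def adjustment_module(img_14, img_25):
    # upsample the lower-resolution 2-5 combined image to the size of the 1-4 combined image
    up = F.interpolate(img_25, size=img_14.shape[2:], mode="bilinear", align_corners=False)
    return img_14 + up  # 1-5 combined image
```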
in the information extraction and fusion module d, the 1-5 combined image is convolved again to obtain the 1-5 re-convolved image, which is input into the lateral contour subdivision network module;
F. in the lateral contour subdivision network module, the following operations are carried out:
F1, the information extraction fused images a, b, c and d are each convolved and activated and then multiplied by adaptive random weights to obtain the primary weight images a, b, c and d; the four images are combined pairwise in sequence to form 3 groups; in each group, the lower-resolution image is upsampled to the same resolution as the higher-resolution image with bilinear interpolation and the two are then added, yielding the primary added weight images a, b and c;
F2, the primary added weight images a, b and c are each convolved and activated and then multiplied by adaptive random weights to obtain the secondary weight images a, b and c; the three images are combined pairwise in sequence to form 2 groups; in each group, the lower-resolution image is upsampled to the same resolution as the higher-resolution image with bilinear interpolation and the two are then added, yielding the secondary added weight images a and b;
F3, the secondary added weight images a and b are each convolved and activated and then multiplied by adaptive random weights to obtain the tertiary weight images a and b; the resolutions of the two images are unified by upsampling the lower-resolution image to the same resolution as the higher-resolution image with bilinear interpolation, and the two are added; finally, a 1 × 1 convolution reduces the number of feature channels to 1, and the output is the final edge image.
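A hedged sketch of the lateral contour subdivision network module is given below. The 3 × 3 convolution with ReLU activation, the multiplication by an adaptive random weight initialised in [0, 1], the pairwise bilinear-upsample-and-add merging and the final 1 × 1 convolution to a single channel follow steps F1-F3; the channel-alignment convolutions and the final sigmoid are assumptions added only to make the sketch self-contained.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedBranch(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.weight = nn.Parameter(torch.rand(1))  # adaptive random weight in [0, 1]

    def forward(self, x):
        return F.relu(self.conv(x)) * self.weight

def merge(high_res, low_res):
    # bilinearly upsample the lower-resolution map and add it to the higher-resolution one
    up = F.interpolate(low_res, size=high_res.shape[2:], mode="bilinear", align_corners=False)
    return high_res + up

class LateralContourSubdivision(nn.Module):
    def __init__(self, in_channels=(64, 100, 200, 300), channels=64):
        super().__init__()
        # align the four fused images to a common channel count (an assumption)
        self.align = nn.ModuleList(nn.Conv2d(c, channels, 1) for c in in_channels)
        self.level1 = nn.ModuleList(WeightedBranch(channels) for _ in range(4))
        self.level2 = nn.ModuleList(WeightedBranch(channels) for _ in range(3))
        self.level3 = nn.ModuleList(WeightedBranch(channels) for _ in range(2))
        self.head = nn.Conv2d(channels, 1, 1)  # reduce the feature channels to 1

    def forward(self, fused):  # fused = [a, b, c, d], with a at the highest resolution
        x = [branch(align(f)) for branch, align, f in zip(self.level1, self.align, fused)]
        x = [merge(x[i], x[i + 1]) for i in range(3)]   # F1: primary added weight images a, b, c
        x = [branch(xi) for branch, xi in zip(self.level2, x)]
        x = [merge(x[i], x[i + 1]) for i in range(2)]   # F2: secondary added weight images a, b
        x = [branch(xi) for branch, xi in zip(self.level3, x)]
        x = merge(x[0], x[1])                           # F3: final merged map
        return torch.sigmoid(self.head(x))              # final edge image (sigmoid is an assumption)
```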
The convolution expression involved in each step is m × n-k conv + relu, where m × n denotes the size of the convolution kernel, k denotes the number of output channels, conv denotes the convolution operation, and relu denotes the activation function; m, n and k are preset values; the convolution expression of the final fusion layer is m × n-k conv.
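The m × n-k conv + relu notation used above can be read as the following small helper (a sketch only; "same" padding is assumed):

```python
import torch.nn as nn

def conv_relu(in_channels, k, m=3, n=3):
    """m x n-k conv + relu: an m x n convolution with k output channels followed by ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_channels, k, kernel_size=(m, n), padding=(m // 2, n // 2)),
        nn.ReLU(inplace=True),
    )

# e.g. the "3 x 3-64 conv + relu" block of stage I:
block = conv_relu(in_channels=3, k=64, m=3, n=3)
```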
The VGG16 network comprises 5 stages, stage I to stage V, each of which contains more than one convolution layer;
the input of the first convolution layer of stage I is the original image, and the input of every other convolution layer of stage I is the output of the preceding convolution layer of that stage; in stages II-V, except for the first convolution layer of each stage, the input of every convolution layer is the output of the preceding convolution layer; the output of the last convolution layer of stages I to IV is, on the one hand, 2 × 2 max-pooled and used as the input of the first convolution layer of the next stage, and, on the other hand, input into the compression module and the information extraction and fusion module a; the output of the last convolution layer of stage V is 2 × 2 max-pooled and then input into the compression module and the information extraction and fusion module a;
the convolutions in the VGG16 network are all 3 × 3 convolutions.
The re-convolution in steps B-E is 1 x 1 convolution.
In step C, the number of unified feature channels is 200.
In steps B-E, the number of feature channels of the information extraction fused image a is 64, the number of feature channels of the information extraction fused image b is 100, the number of feature channels of the information extraction fused image c is 200, and the number of feature channels of the information extraction fused image d is 300.
In steps B-E, the resolutions in the information extraction and fusion module a, the information extraction and fusion module b, the information extraction and fusion module c and the information extraction and fusion module d are unified as follows: the lower-resolution output map is converted to the same resolution as the higher-resolution output map using bilinear interpolation.
In steps F1 to F3, the convolution is a 3 × 3 convolution, the activation uses the following ReLU function, and the weight parameters of the adaptive random weights range from 0 to 1;
ReLU(x) = max(0, x), i.e. the output is x for x > 0 and 0 otherwise.
Example 2
The edge detection results of the method of embodiment 1 are compared with the methods of the following documents 1 and 2;
document 1: HED: S. Xie and Z. Tu, "Holistically-nested edge detection," in IEEE International Conference on Computer Vision, 2015, pp. 1395-1403;
document 2: LRCNet: C. Lin, L. Cui, F. Li, and Y. Cao, "Lateral Refinement Network for Contour Detection," Neurocomputing, vol. 409, pp. 361-371, 2020;
Training and edge detection were performed with the neural network model of example 1. Training and testing were done using the published PyTorch framework. The network of the invention is initialized with the VGG16 model pre-trained on ImageNet. During training, the convolution kernels are initialized with a zero-mean Gaussian distribution with a standard deviation of 0.01, and the bias terms are set to 0. For the stochastic gradient descent (SGD) hyper-parameters, the global learning rate is set to 1e-6, and the momentum and weight decay are set to 0.9 and 0.0002, respectively. When the NYUD dataset is employed, the tolerance maxDist is adjusted to 0.011.
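The training setup above corresponds roughly to the following PyTorch sketch; the model is a placeholder, and only the stated hyper-parameters (zero-mean Gaussian initialisation with standard deviation 0.01, zero bias, SGD with learning rate 1e-6, momentum 0.9 and weight decay 0.0002) are taken from the text.

```python
import torch
import torch.nn as nn

def init_weights(m):
    if isinstance(m, nn.Conv2d):
        nn.init.normal_(m.weight, mean=0.0, std=0.01)  # zero-mean Gaussian, std 0.01
        if m.bias is not None:
            nn.init.zeros_(m.bias)                     # bias term set to 0

model = nn.Conv2d(3, 1, 3, padding=1)  # placeholder for the network of example 1
model.apply(init_weights)              # in practice, applied to the newly added (non-pretrained) layers

optimizer = torch.optim.SGD(model.parameters(), lr=1e-6, momentum=0.9, weight_decay=2e-4)
```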
We used Precision-Recall (PR) curves and the harmonic mean F value to evaluate the performance of the contour detection model. The F value is defined as follows:
F=2PR/(P+R)
where P and R denote precision and recall, respectively:
P = TP / (TP + FP), R = TP / (TP + FN)
Here TP, FP and FN denote the number of correctly detected contour pixels, the number of false detections and the number of missed detections, respectively.
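For clarity, the evaluation quantities above can be computed as in the short sketch below (the pixel counts are illustrative only):

```python
def f_measure(tp: int, fp: int, fn: int) -> float:
    p = tp / (tp + fp)          # precision
    r = tp / (tp + fn)          # recall
    return 2 * p * r / (p + r)  # harmonic mean F value

print(f_measure(tp=900, fp=100, fn=150))
```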
Experimental data:
NYUD-V2 dataset. As shown in Table 1, the network of the invention achieves better detection results than the other learning-based networks. In embodiment 1 of the present invention, with VGG16 as the encoding network and combining the HHA and RGB images, the obtained ODS is 0.773, an improvement of 1.6% over LRCNet. From the experimental results in Table 1, the detection method of the invention (DDM) outperforms the detection methods of documents 1 (HED) and 2 (LRCNet).
Table 1 Comparison of F-scores with other networks

Claims (10)

1. A deep-refinement multiple information nested edge detection method is characterized by comprising the following steps:
A. constructing a deep neural network structure, wherein the deep neural network comprises an encoding network and a decoding network, and the specific structure is as follows:
the encoding network is a VGG16 network from which all fully connected layers and the pool5 pooling layer are removed, retaining only the main body of the VGG16 network; the decoding network is divided into three layers, wherein the first layer comprises a compression module, a reshaping module and an adjustment module; the second layer comprises an information extraction and fusion module a, an information extraction and fusion module b, an information extraction and fusion module c and an information extraction and fusion module d; the third layer is a lateral contour subdivision network module;
B. the original image is processed by the convolution layers of the VGG16 network to obtain the 5 side output images of VGG16, and the 5 side output images of VGG16 are then input into the compression module and the information extraction and fusion module a respectively;
in the information extraction and fusion module a, the 1st to 5th side output images are each convolved again so that the numbers of output channels are consistent, yielding re-convolved images of the 1st to 5th side output images; then, taking the re-convolved image of the 1st side output image as the reference, the resolutions of the re-convolved images of the 2nd to 5th side output images are unified to obtain resolution-adjusted images of the 2nd to 5th side output images; the re-convolved image of the 1st side output image is fused with the resolution-adjusted images of the 2nd to 5th side output images to obtain the information extraction fused image a, which is input into the lateral contour subdivision network module;
C. in the compression module: a secondary convolution is applied to the 1st to 5th side output images, using 3 × 3 convolution for the layer-1 and layer-2 convolution images and 1 × 1 convolution for the layer-3, layer-4 and layer-5 convolution images, so that the number of feature channels is unified; the layer-1 to layer-5 convolution images after the secondary convolution are combined pairwise in sequence to form 4 groups; in each group, the higher-resolution image is max-pooled down to the same resolution as the lower-resolution image and the two are then added, yielding four primary combined images, namely the 1-2, 2-3, 3-4 and 4-5 combined images, which are input into the reshaping module and the information extraction and fusion module b respectively;
in the information extraction and fusion module b, the 1-2, 2-3, 3-4 and 4-5 combined images are each convolved again so that the numbers of output channels are consistent, yielding the 1-2, 2-3, 3-4 and 4-5 re-convolved images; then, taking the 1-2 re-convolved image as the reference, the resolutions of the 2-3, 3-4 and 4-5 re-convolved images are unified to obtain the 2-3, 3-4 and 4-5 resolution-adjusted images; the 1-2 re-convolved image is fused with the 2-3, 3-4 and 4-5 resolution-adjusted images to obtain the information extraction fused image b, which is input into the lateral contour subdivision network module;
D. the reshaping module has two layers, and the processing in the first layer is as follows: three parallel convolutions with kernel sizes 1 × 1, 3 × 3 and 5 × 5 are applied to the 1-2 and 2-3 combined images; the three parallel convolution results of the 1-2 combined image are fused to obtain the fused 1-2 combined image; the three parallel convolution results of the 2-3 combined image are fused to obtain the fused 2-3 combined image; a 1 × 1 convolution is applied to the 3-4 and 4-5 combined images; the fused 1-2 combined image, the fused 2-3 combined image, the convolved 3-4 combined image and the convolved 4-5 combined image are combined pairwise in sequence to form 3 groups; in each group, the higher-resolution image is max-pooled down to the same resolution as the lower-resolution image and the two are added, yielding the 1-3, 2-4 and 3-5 combined images, which are input into the second layer and the information extraction and fusion module c respectively;
the processing in the second layer is as follows: three parallel convolutions with kernel sizes 1 × 1, 3 × 3 and 5 × 5 are applied to the 1-3 and 2-4 combined images; the three parallel convolution results of the 1-3 combined image are fused to obtain the fused 1-3 combined image; the three parallel convolution results of the 2-4 combined image are fused to obtain the fused 2-4 combined image; a 1 × 1 convolution is applied to the 3-5 combined image; the resolutions of the fused 1-3 combined image, the fused 2-4 combined image and the convolved 3-5 combined image are unified by max-pooling the higher-resolution image down to the same resolution as the lower-resolution image; the images are then combined and added to obtain the 1-4 and 2-5 combined images, which are input into the adjustment module;
in the information extraction and fusion module c, the 1-3, 2-4 and 3-5 combined images are each convolved again so that the numbers of output channels are consistent, yielding the 1-3, 2-4 and 3-5 re-convolved images; then, taking the 1-3 re-convolved image as the reference, the resolutions of the 2-4 and 3-5 re-convolved images are unified to obtain the 2-4 and 3-5 resolution-adjusted images; the 1-3 re-convolved image is fused with the 2-4 and 3-5 resolution-adjusted images to obtain the information extraction fused image c, which is input into the lateral contour subdivision network module;
E. in the adjustment module, the resolutions of the 1-4 and 2-5 combined images are unified by converting the lower-resolution image to the same resolution as the higher-resolution image with bilinear interpolation; the images are then combined and added to obtain the 1-5 combined image, which is input into the information extraction and fusion module d;
in the information extraction and fusion module d, the 1-5 combined image is convolved again to obtain the 1-5 re-convolved image, which is input into the lateral contour subdivision network module;
F. in the lateral contour subdivision network module, the following operations are carried out:
F1, the information extraction fused images a, b, c and d are each convolved and activated and then multiplied by adaptive random weights to obtain the primary weight images a, b, c and d; the four images are combined pairwise in sequence to form 3 groups; in each group, the lower-resolution image is upsampled to the same resolution as the higher-resolution image with bilinear interpolation and the two are then added, yielding the primary added weight images a, b and c;
F2, the primary added weight images a, b and c are each convolved and activated and then multiplied by adaptive random weights to obtain the secondary weight images a, b and c; the three images are combined pairwise in sequence to form 2 groups; in each group, the lower-resolution image is upsampled to the same resolution as the higher-resolution image with bilinear interpolation and the two are then added, yielding the secondary added weight images a and b;
F3, the secondary added weight images a and b are each convolved and activated and then multiplied by adaptive random weights to obtain the tertiary weight images a and b; the resolutions of the two images are unified by upsampling the lower-resolution image to the same resolution as the higher-resolution image with bilinear interpolation, and the two are added; finally, a 1 × 1 convolution reduces the number of feature channels to 1, and the output is the final edge image.
2. The method of deep-refinement multiple-information nested edge detection as claimed in claim 1, characterized in that: the convolution expression involved in each step is m × n-k conv + relu, where m × n denotes the size of the convolution kernel, k denotes the number of output channels, conv denotes the convolution operation, and relu denotes the activation function; and m, n and k are preset values.
3. The method of deep-refinement multiple-information nested edge detection as claimed in claim 2, characterized in that: the VGG16 network comprises 5 stages, stage I to stage V, each of which contains more than one convolution layer;
the input of the first convolution layer of stage I is the original image, and the input of every other convolution layer of stage I is the output of the preceding convolution layer of that stage; in stages II-V, except for the first convolution layer of each stage, the input of every convolution layer is the output of the preceding convolution layer; the output of the last convolution layer of stages I to IV is, on the one hand, max-pooled and used as the input of the first convolution layer of the next stage, and, on the other hand, input into the compression module and the information extraction and fusion module a; and the output of the last convolution layer of stage V is max-pooled and then input into the compression module and the information extraction and fusion module a.
4. The method of deep-refinement multiple-information nested edge detection as claimed in claim 3, characterized in that:
the convolutions in the VGG16 network are all 3 × 3 convolutions.
5. The method of deep-refinement multiple-information nested edge detection as claimed in claim 1, characterized in that: the re-convolution in steps B to E is a 1 × 1 convolution.
6. The method of deep-refinement multiple-information nested edge detection as claimed in claim 1, characterized in that: in the step C, the number of the unified feature channels is 200.
7. The method of deep-refinement multiple-information nested edge detection as claimed in claim 1, characterized in that: in the steps B-E, the number of the feature channels of the information extraction fusion image a is 64, the number of the feature channels of the information extraction fusion image B is 100, the number of the feature channels of the information extraction fusion image c is 200, and the number of the feature channels of the information extraction fusion image d is 300.
8. The method of deep-refinement multiple-information nested edge detection as claimed in claim 1, characterized in that: in steps B-E, the resolutions in the information extraction and fusion module a, the information extraction and fusion module b, the information extraction and fusion module c and the information extraction and fusion module d are unified as follows: the lower-resolution output map is converted to the same resolution as the higher-resolution output map using bilinear interpolation.
9. The method of deep-refinement multiple-information nested edge detection as claimed in claim 1, characterized in that:
in steps F1 to F3, the convolution is a 3 × 3 convolution, the activation uses the following ReLU function, and the weight parameters of the adaptive random weights range from 0 to 1;
ReLU(x) = max(0, x), i.e. the output is x for x > 0 and 0 otherwise.
10. the method of deep-refinement multiple-information nested edge detection as claimed in claim 8, characterized in that: the maximum pooling is 2 x 2 maximum pooling.
CN202110746455.8A 2021-07-01 2021-07-01 Deep-refinement multiple-information nested edge detection method Active CN113538484B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110746455.8A CN113538484B (en) 2021-07-01 2021-07-01 Deep-refinement multiple-information nested edge detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110746455.8A CN113538484B (en) 2021-07-01 2021-07-01 Deep-refinement multiple-information nested edge detection method

Publications (2)

Publication Number Publication Date
CN113538484A CN113538484A (en) 2021-10-22
CN113538484B true CN113538484B (en) 2022-06-10

Family

ID=78097547

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110746455.8A Active CN113538484B (en) 2021-07-01 2021-07-01 Deep-refinement multiple-information nested edge detection method

Country Status (1)

Country Link
CN (1) CN113538484B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114463360B (en) * 2021-10-27 2024-03-15 广西科技大学 Contour detection method based on bionic characteristic enhancement network

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105740869A (en) * 2016-01-28 2016-07-06 北京工商大学 Square operator edge extraction method and system based on multiple scales and multiple resolutions
CN107610140A (en) * 2017-08-07 2018-01-19 中国科学院自动化研究所 Near edge detection method, device based on depth integration corrective networks
CN110706242A (en) * 2019-08-26 2020-01-17 浙江工业大学 Object-level edge detection method based on depth residual error network
CN111242138A (en) * 2020-01-11 2020-06-05 杭州电子科技大学 RGBD significance detection method based on multi-scale feature fusion
CN111325762A (en) * 2020-01-21 2020-06-23 广西科技大学 Contour detection method based on dense connection decoding network
CN112347859A (en) * 2020-10-15 2021-02-09 北京交通大学 Optical remote sensing image saliency target detection method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8457437B2 (en) * 2010-03-23 2013-06-04 Raytheon Company System and method for enhancing registered images using edge overlays
US10410353B2 (en) * 2017-05-18 2019-09-10 Mitsubishi Electric Research Laboratories, Inc. Multi-label semantic boundary detection system

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105740869A (en) * 2016-01-28 2016-07-06 北京工商大学 Square operator edge extraction method and system based on multiple scales and multiple resolutions
CN107610140A (en) * 2017-08-07 2018-01-19 中国科学院自动化研究所 Near edge detection method, device based on depth integration corrective networks
CN110706242A (en) * 2019-08-26 2020-01-17 浙江工业大学 Object-level edge detection method based on depth residual error network
CN111242138A (en) * 2020-01-11 2020-06-05 杭州电子科技大学 RGBD significance detection method based on multi-scale feature fusion
CN111325762A (en) * 2020-01-21 2020-06-23 广西科技大学 Contour detection method based on dense connection decoding network
CN112347859A (en) * 2020-10-15 2021-02-09 北京交通大学 Optical remote sensing image saliency target detection method

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Fast accurate contours for 3D shape recognition; M. U. Butt et al.; 2015 IEEE Intelligent Vehicles Symposium (IV); 2015-08-27; pp. 832-838 *
Lateral refinement network for contour detection; Chuan Lin et al.; Neurocomputing; 2020-06-24; vol. 409; pp. 361-371 *
A SAR image water-area segmentation algorithm based on dense depthwise separable convolution; Zhang Jinsong; Journal of Radars; 2019-03-07; vol. 8, no. 03; pp. 400-412 *
Building extraction from GF-2 remote sensing images based on a multi-level perception network; Lu Qi et al.; Remote Sensing for Land & Resources; 2021-06-15; vol. 33, no. 02; pp. 75-84 *
Research on multi-scale fusion methods in vision-biomimetic contour detection; Lin Chuan et al.; Computer Simulation; 2019-04-15; vol. 36, no. 04; pp. 362-368 *

Also Published As

Publication number Publication date
CN113538484A (en) 2021-10-22

Similar Documents

Publication Publication Date Title
CN110599409B (en) Convolutional neural network image denoising method based on multi-scale convolutional groups and parallel
CN106875373B (en) Mobile phone screen MURA defect detection method based on convolutional neural network pruning algorithm
CN112116605B (en) Pancreas CT image segmentation method based on integrated depth convolution neural network
CN109754017B (en) Hyperspectral image classification method based on separable three-dimensional residual error network and transfer learning
CN107464217B (en) Image processing method and device
CN112232229B (en) Fine water body extraction method based on U-net neural network
CN112435191B (en) Low-illumination image enhancement method based on fusion of multiple neural network structures
CN111325762B (en) Contour detection method based on dense connection decoding network
CN110827297A (en) Insulator segmentation method for generating countermeasure network based on improved conditions
CN109325513B (en) Image classification network training method based on massive single-class images
CN111105375B (en) Image generation method, model training method and device thereof, and electronic equipment
CN113066025B (en) Image defogging method based on incremental learning and feature and attention transfer
CN113538484B (en) Deep-refinement multiple-information nested edge detection method
CN111882516B (en) Image quality evaluation method based on visual saliency and deep neural network
CN111062432B (en) Semantically multi-modal image generation method
CN111160378A (en) Depth estimation system based on single image multitask enhancement
CN113642445A (en) Hyperspectral image classification method based on full convolution neural network
CN109949334B (en) Contour detection method based on deep reinforced network residual error connection
CN109934835B (en) Contour detection method based on deep strengthening network adjacent connection
CN114742985A (en) Hyperspectral feature extraction method and device and storage medium
CN109102457B (en) Intelligent color changing system and method based on convolutional neural network
CN110599495A (en) Image segmentation method based on semantic information mining
CN111724306B (en) Image reduction method and system based on convolutional neural network
CN110111252A (en) Single image super-resolution method based on projection matrix
CN111767842B (en) Micro-expression type discrimination method based on transfer learning and self-encoder data enhancement

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20211022

Assignee: Liuzhou Wanyou Printing Co.,Ltd.

Assignor: GUANGXI University OF SCIENCE AND TECHNOLOGY

Contract record no.: X2023980054135

Denomination of invention: A Deep Refined Multi information Nested Edge Detection Method

Granted publication date: 20220610

License type: Common License

Record date: 20231225