CN113538485B - Contour detection method for learning biological visual pathway - Google Patents
- Publication number: CN113538485B
- Application number: CN202110784619.6A
- Authority: CN (China)
- Legal status: Active
Classifications
- G06T7/13 — Edge detection
- G06F18/22 — Matching criteria, e.g. proximity measures
- G06F18/253 — Fusion techniques of extracted features
- G06N3/044 — Recurrent networks, e.g. Hopfield networks
- G06N3/045 — Combinations of networks
- G06N3/08 — Learning methods
- G06T2207/10004 — Still image; Photographic image
- G06T2207/20081 — Training; Learning
- G06T2207/20084 — Artificial neural networks [ANN]
- G06T2207/20221 — Image fusion; Image merging
Abstract
The invention provides a contour detection method that learns from the biological visual pathway, comprising the following steps: constructing a deep neural network structure consisting of an encoding network, a decoding network and a feed-forward fusion module, where the encoding network is a structure combining VGG16 and FENet; the original image is processed by the encoding network, the decoding network and the feed-forward fusion module in sequence to obtain the final output contour. The invention enables the encoder to capture richer contour feature information and thereby improves contour detection performance.
Description
Technical Field
The invention relates to the field of image processing, in particular to a contour detection method that learns from the biological visual pathway.
Background
Contour detection aims to extract the boundaries between objects and background in an image. It is usually a key front-end processing step for various mid- and high-level computer vision tasks, and is one of the basic tasks in computer vision research. With the rapid development of deep learning in recent years, researchers have designed contour detection models based on convolutional neural networks (CNNs). Such a model consists of an encoder and a decoder; the encoder commonly adopts VGG16 or ResNet, and the design of the decoder architecture has been the research focus. CNN-based models enable end-to-end contour extraction, and experiments show that these models achieve remarkable results on the Berkeley segmentation data set (BSDS500).
Although CNN-based end-to-end contour detection methods achieve remarkable results, the main innovation of current models lies in the decoder design, and these models lack guidance from visual mechanisms. The role of the decoder is to restore a full-resolution output by fusing the feature maps produced by the encoder.
Disclosure of Invention
The invention aims to provide a contour detection method that learns from the biological visual pathway. Starting from enhancing the feature expression capability of the encoder, and inspired by the biological visual pathway and its related visual mechanisms, a biomimetic contour enhancer is designed. Combining this enhancer with the encoder allows the encoder to obtain richer contour feature information, thereby improving contour detection performance.
The technical scheme of the invention is as follows:
the contour detection method for learning the biological visual pathway comprises the following steps:
A. constructing a deep neural network structure, wherein the deep neural network structure is as follows:
the system comprises an encoding network, a decoding network and a feed-forward fusion module; the encoding network is a network structure combining VGG16 and FENet; FENet, short for feature enhancement network, is a structure created by the inventors;
the VGG16 network takes the pooling layer as a boundary and is divided into stages S1, S2, S3, S4 and S5;
the FENet includes four sub-networks: a single antagonistic feature subnetwork, a dual antagonistic feature subnetwork, a V1 output subnetwork and a V2 output subnetwork; the single antagonistic feature subnetwork simulates the single-opponent receptive field mechanism of the retina/LGN, and the dual antagonistic feature subnetwork simulates the double-opponent receptive field mechanism of V1;
B. inputting an original image into a VGG16 network, and performing convolution processing in S1, S2, S3, S4 and S5 stages in sequence to respectively obtain output results S1, S2, S3, S4 and S5, wherein the output result S1 is sent to a decoding network;
processing an original image by a formula 1 to obtain four inputs of R-G, G-R, B-Y and Y-B;
SO_i = C_m − ω·C_n (1)
wherein i denotes one of R-G, G-R, B-Y, Y-B; m and n denote components among R, G, B, Y; ω is a coefficient with the value 0.7;
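For concreteness, the opponent inputs of formula (1) can be sketched in plain Python as below. The definition of the yellow component Y as the mean of R and G is an assumption (a common choice in opponent-colour models); the patent does not state it.

```python
# Hedged sketch of formula (1): the four single-opponent inputs
# SO_i = C_m - w * C_n for one RGB pixel, with w = 0.7 as stated above.
# ASSUMPTION: the yellow component Y is taken as the mean of R and G,
# a common choice in opponent-colour models; the patent does not define it.

def opponent_channels(r, g, b, w=0.7):
    """Return the four opponent responses R-G, G-R, B-Y, Y-B."""
    y = (r + g) / 2.0  # assumed definition of the yellow component
    return {
        "R-G": r - w * g,
        "G-R": g - w * r,
        "B-Y": b - w * y,
        "Y-B": y - w * b,
    }

# One reddish pixel: the R-G channel responds strongly, G-R goes negative.
so = opponent_channels(0.8, 0.2, 0.4)
```

In a full implementation this map is applied per pixel, yielding the four input planes fed to the single and dual antagonistic subnetworks below.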
inputting R-G, G-R, B-Y and Y-B into a single antagonistic characteristic subnetwork for processing to obtain an output result a, adding the output result a and the output result S2 for fusion to obtain a fusion result a, and inputting the fusion result a into a decoding network;
inputting R-G, G-R, B-Y and Y-B into a dual-antagonistic characteristic subnetwork for processing to obtain an output result B, adding the output result B and the output result S3 for fusion to obtain a fusion result B, and inputting the fusion result B into a decoding network;
the edge response of a V1 area is obtained by an original image through an SCO algorithm, the edge response is input into a V1 output sub-network for processing, an output result c is obtained, the output result c and an output result S4 are added and fused, a fusion result c is obtained, and the fusion result c is input into a decoding network;
the edge response of a V2 area is obtained by an original image through an SED algorithm, the edge response is input into a V2 output sub-network for processing, an output result d is obtained, the output result d and an output result S5 are added and fused, a fusion result d is obtained, and the fusion result d is input into a decoding network;
C. respectively inputting the output result a and the output result b into a feedforward fusion module;
the output result S1, the fusion result a, the fusion result b, the fusion result c and the fusion result d are processed by a decoding network to obtain a decoding output result, the decoding output result is input into a feedforward fusion module, and the loss of the decoding output result is calculated;
D. in the feed-forward fusion module, the output result a and the output result b each pass through a 1x1-1 convolution layer and are then restored to the original resolution by upsampling, and the loss of each restored result is calculated; finally each restored result is multiplied by a weight, the obtained results are added to and fused with the decoding output result to obtain the final output contour, and the loss of the final output contour is calculated.
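The fusion of step D can be sketched with plain Python lists in place of feature tensors. The 1x1-1 convolution is modelled as a weighted sum over channels, upsampling as nearest-neighbour, and the fusion weight is a hypothetical trainable scalar; none of these values come from the patent.

```python
# Minimal sketch of the feed-forward fusion: 1x1-1 conv -> upsample ->
# weight -> add onto the decoder output. Plain lists stand in for tensors.

def conv1x1_to_1(channels, kernel):
    """Collapse a list of HxW channel maps to one map (a 1x1-1 convolution)."""
    h, w = len(channels[0]), len(channels[0][0])
    return [[sum(k * c[i][j] for k, c in zip(kernel, channels))
             for j in range(w)] for i in range(h)]

def upsample_nearest(img, factor):
    """Nearest-neighbour upsampling back toward the original resolution."""
    return [[img[i // factor][j // factor]
             for j in range(len(img[0]) * factor)]
            for i in range(len(img) * factor)]

def fuse(decoded, side, weight):
    """Add a weighted, upsampled side output onto the decoder output."""
    factor = len(decoded) // len(side)
    up = upsample_nearest(side, factor)
    return [[d + weight * u for d, u in zip(dr, ur)]
            for dr, ur in zip(decoded, up)]

# Toy example: a one-channel 2x2 side output fused onto a 4x4 decoder output.
side_a = conv1x1_to_1([[[1.0, 2.0], [3.0, 4.0]]], kernel=[0.5])
decoded = [[0.0] * 4 for _ in range(4)]
out = fuse(decoded, side_a, weight=0.1)
```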
The SCO algorithm is described in the following document: K. Yang, S.-B. Gao, C.-F. Guo, C.-Y. Li, Y.-J. Li, Boundary detection using double-opponency and spatial sparseness constraint, IEEE Transactions on Image Processing, 24 (2015) 2565-2578.
The SED algorithm is described in the following document: A. Akbarinia, C.A. Parraga, Feedback and surround modulated boundary detection, International Journal of Computer Vision, 126 (2018) 1367-1380.
The convolution expression used in each step is m×n-k conv + ReLU, where m×n is the convolution kernel size, k is the number of output channels, conv denotes the convolution operation, and ReLU the activation function; m, n and k are preset values. The convolution expression of the final fusion layer is m×n-k conv, without activation.
The VGG16 network is obtained by the original VGG16 network through the following structural adjustment:
the pooling layer between S4 and S5 is removed, and the three convolutional layers of S5 are changed to dilated ("void") convolutional layers with dilation rates of 2, 4 and 8 in sequence.
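The effect of the dilation rates can be illustrated with a one-dimensional dilated convolution: with a 3-tap kernel and dilation d, the taps span 2d + 1 input positions, which is how rates 2, 4 and 8 enlarge the receptive field of S5 without further pooling. This is an illustrative sketch, not the patent's code.

```python
# 'Valid' 1-D convolution with a dilation ("void") rate: taps are placed
# dilation positions apart, widening the receptive field without pooling.

def dilated_conv1d(x, kernel, dilation):
    """Apply kernel to x with the given dilation rate (no padding)."""
    span = (len(kernel) - 1) * dilation  # extra inputs covered by dilation
    return [sum(k * x[i + t * dilation] for t, k in enumerate(kernel))
            for i in range(len(x) - span)]

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
out = dilated_conv1d(x, [1.0, 1.0, 1.0], dilation=2)  # taps at i, i+2, i+4
```

With dilation 2 each output sums inputs two apart, so a 3-tap kernel already covers five input positions; stacking rates 2, 4 and 8 compounds this growth.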
The sub-network of single antagonistic features comprises: R-G, G-R, B-Y, Y-B four groups of single antagonistic convolution treatment stages, SEM multiscale enhancement module, 3X 3-128 convolutional layer;
the R-G, G-R, B-Y and Y-B single antagonistic convolution treatment stages are the same and respectively pass through a 3x 3-3 convolution layer, a 3x 3-64 convolution layer, a maximum pooling layer and a 3x 3-128 convolution layer in sequence;
the sub-network processing procedure for the single antagonistic feature is as follows:
adding and fusing the features processed in the R-G and G-R single-antagonistic convolution processing stage, and processing by a multi-scale enhancement module to obtain a single-antagonistic enhancement result a; adding and fusing the characteristics processed in the B-Y, Y-B single antagonistic convolution processing stage, and processing by a multi-scale enhancement module to obtain a single antagonistic enhancement result B;
and splicing the single antagonism enhancement result a and the single antagonism enhancement result b, and then matching the number of channels through a 3x 3-128 convolution layer to obtain a fusion result a.
The dual antagonistic feature subnetwork comprises: R-G, G-R, B-Y, Y-B dual-antagonistic convolution processing stages, SEM multi-scale enhancement module, 1 × 1-256 convolution layer;
the R-G, G-R, B-Y, Y-B dual-antagonistic convolution processing stages are the same, the input of each stage is divided into two paths, the two paths in each stage respectively pass through a 9 x 9-3 convolution layer, a 9 x 9-64 convolution layer, a 2 x 2 maximum pooling layer, a 9 x 9-128 convolution layer, a 2 x 2 maximum pooling layer and a 9 x 9-256 convolution layer in sequence, and are subtracted after being multiplied by the trainable weight normalized by a sigmoid function to respectively obtain R-G, G-R and B-Y, Y-B dual-antagonistic convolution processing results;
adding and fusing R-G and G-R dual-antagonism convolution processing results, and processing the results through an SEM multi-scale enhancement module to obtain a dual-antagonism enhancement result a; adding and fusing the results of the B-Y, Y-B dual-antagonistic convolution processing, and processing by an SEM multi-scale enhancement module to obtain a dual-antagonistic enhancement result B;
and splicing the double-antagonism enhancement result a and the double-antagonism enhancement result b, and then matching the number of channels through a 1x 1-256 convolution layer to obtain a fusion result b.
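The two-path combination at the end of each dual-antagonistic stage (multiply one path by a trainable weight normalized by a sigmoid function, then subtract) can be sketched as follows. The weight w is a hypothetical learned scalar, and the convolution/pooling stacks preceding this step are omitted.

```python
# Sketch of the dual-antagonistic combination: the second path is scaled
# by sigmoid(w), where w is a trainable scalar (hypothetical value here),
# and subtracted from the first path.
import math

def sigmoid(w):
    return 1.0 / (1.0 + math.exp(-w))

def double_opponent_combine(path1, path2, w):
    """Subtract the sigmoid-weighted second path from the first path."""
    s = sigmoid(w)
    return [p1 - s * p2 for p1, p2 in zip(path1, path2)]

# With w = 0, sigmoid(w) = 0.5, so half of path2 is subtracted.
resp = double_opponent_combine([1.0, 0.5], [0.4, 0.2], w=0.0)
```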
The V1 output sub-network comprises three groups of 2X 2 maximum pool layers, an SEM multi-scale enhancement module and a 3X 3-512 convolution layer which are connected in sequence;
the edge response of the V1 area is subjected to 2 × 2 maximum pooling for three times in the V1 output sub-network, then the multi-scale features are extracted through an SEM multi-scale enhancement module, and finally the fusion result c is obtained through the number of matching channels of the 3 × 3-512 convolutional layers.
The V2 output sub-network comprises three groups of 2X 2 maximum pool layers, an SEM multi-scale enhancement module and a 3X 3-512 convolution layer which are connected in sequence;
the edge response of the V2 area is subjected to 2 × 2 maximum pooling for three times in the V2 output sub-network, then the multi-scale features are extracted through an SEM multi-scale enhancement module, and finally the fusion result d is obtained through the number of matching channels of 3 × 3-512 convolutional layers.
The decoding network is RDNet, a residual decoder network created by the inventors;
the decoding network has a 4-layer structure composed of unit modules R: the first layer comprises 4 unit modules R, the second layer 3 unit modules R, the third layer 2 unit modules R, and the fourth layer 1 unit module R;
respectively inputting the fusion result d and the fusion result c into a first unit module R of the first layer, and processing by the unit module R to obtain a processing result R1;
the processing result R1 and the fusion result b are respectively input into the second unit module R of the first layer, and are processed by the unit module R to obtain a processing result R2;
inputting the processing result R2 and the fusion result a into the third unit module R of the first layer, and processing by the unit module R to obtain a processing result R3;
the processing result R3 and the output result S1 are respectively inputted into the fourth unit module R of the first layer, and are processed by the unit module R to obtain a processing result R4;
the processing result R1 and the processing result R2 are respectively input into the first unit module R of the second layer, and the processing result R5 is obtained after the processing of the unit module R;
the processing result R5 and the processing result R3 are respectively inputted into the second unit module R of the second layer, and are processed by the unit module R to obtain a processing result R6;
the processing result R6 and the processing result R4 are respectively inputted into the third unit module R of the second layer, and are processed by the unit module R to obtain a processing result R7;
the processing result R5 and the processing result R6 are respectively input into the first unit module R of the third layer, and are processed by the unit module R to obtain a processing result R8;
the processing result R8 and the processing result R7 are respectively inputted into the second unit module R of the third layer, and the processing result R9 is obtained after the processing of the unit module R;
the processing result R8 and the processing result R9 are respectively input into the unit module R of the fourth layer and processed to obtain a processing result R10, and the processing result R10 passes through a 1×1-1 convolution to obtain the decoding output result.
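The 4-3-2-1 wiring described above can be checked with a small dataflow sketch in which each unit module R merely merges the provenance of its two inputs; it confirms that the final module R10 receives information from S1 and all four fusion results. The labels are ours, not the patent's.

```python
# Dataflow sketch of RDNet's 4-3-2-1 layout: each unit module R is
# modelled as a merge of the source labels of its two inputs, so set
# union stands in for the real add/upsample fusion inside a module.

def R(a, b):
    """Stand-in for a unit module R: merge the provenance of two inputs."""
    return a | b

S1, fa, fb, fc, fd = ({"S1"}, {"fuse_a"}, {"fuse_b"}, {"fuse_c"}, {"fuse_d"})

r1 = R(fd, fc)   # first layer (4 modules)
r2 = R(r1, fb)
r3 = R(r2, fa)
r4 = R(r3, S1)
r5 = R(r1, r2)   # second layer (3 modules)
r6 = R(r5, r3)
r7 = R(r6, r4)
r8 = R(r5, r6)   # third layer (2 modules)
r9 = R(r8, r7)
r10 = R(r8, r9)  # fourth layer -> 1x1-1 conv -> decoding output
```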
The unit module R comprises two input channels, wherein a channel 1 inputs an image with a larger size, and a channel 2 inputs an image with a smaller size;
the image in channel 1 is sequentially subjected to 3×3 convolution, ReLU activation, batch normalization and multiplication by a trainable weight normalized by a sigmoid function, giving the channel 1 output result;
the image in channel 2 is sequentially subjected to 3×3 convolution, ReLU activation, batch normalization and multiplication by a trainable weight normalized by a sigmoid function, and the result is then upsampled so that its size matches the channel 1 output, giving the channel 2 output result;
the number of output channels of the 3×3 convolution layers in channel 1 and channel 2 is set to the smaller of the channel counts of the two inputs;
and adding and fusing the output result of the channel 1 and the output result of the channel 2 to obtain the output result of the current unit module R.
In the steps C and D, the formula for calculating the loss is as follows:
the total loss is as follows:
L = Σ_i θ_i · l(P_i, Y) + θ_fuse · l(P_fuse, Y) (2)
wherein θ_i and θ_fuse represent the weights of the losses of the three sub-network outputs and of the final fused prediction respectively, P_i represents the three different outputs, P_fuse represents the final edge prediction, and Y represents the true edge map;
l(P_fuse, Y) is obtained as follows:
for a true edge map Y = (y_j, j = 1, ..., |Y|) with y_j ∈ [0, 1], define Y+ = {y_j | y_j > η} and Y− = {y_j | y_j = 0}, where Y+ and Y− represent the positive and negative sample sets respectively; all other pixels are ignored entirely.
Thus l(P, Y) is calculated as follows:
l(P, Y) = −β Σ_{j∈Y+} log p_j − α Σ_{j∈Y−} log(1 − p_j) (3)
where α = λ·|Y+|/(|Y+| + |Y−|) and β = |Y−|/(|Y+| + |Y−|);
in formula (3), P represents the prediction and p_j the value at pixel j after the sigmoid function; α and β are used to balance the positive and negative samples, and λ is a weight that controls the magnitude of the coefficients.
Inspired by the biological visual pathway and its related visual mechanisms, the method forms a biomimetic contour-enhancing encoder that effectively enhances the encoder's contour features, so that the decoding network obtains richer feature information and contour detection performance is improved.
Drawings
Fig. 1 is an overall structural view of a deep neural network according to embodiment 1 of the present invention;
fig. 2 is an overall structural diagram of a coding network according to embodiment 1 of the present invention;
FIG. 3 is a block diagram of a single antagonistic feature subnetwork of example 1 of the present invention;
FIG. 4 is a block diagram of a dual antagonistic feature subnetwork of example 1 of the present invention;
FIG. 5 is a structural diagram of a V1 export sub-network according to embodiment 1 of the present invention;
FIG. 6 is a structural diagram of a V2 export sub-network according to embodiment 1 of the present invention;
FIG. 7 is a block diagram of a feedforward fusion module according to embodiment 1 of the present invention;
fig. 8 is a structural diagram of a decoding network of embodiment 1 of the present invention;
fig. 9 is a structural diagram of a unit module R in the decoding network according to embodiment 1 of the present invention;
FIG. 10 is a comparison of the contour detection results of embodiment 1 of the present invention with those of reference document 1;
Detailed Description
The present invention will be described in detail with reference to the accompanying drawings and examples.
Example 1
The embodiment provides a contour detection method for learning a biological visual pathway, which comprises the following steps:
the method for detecting the contour of the biological visual pathway comprises the following steps:
A. a deep neural network structure is constructed, and is shown in figures 1-9, and the deep neural network structure is specifically as follows:
the system comprises an encoding network, a decoding network and a feedforward fusion module; the coding network is a network structure combining VGG16 and FENet; the decoding network is an RDNet network;
the VGG16 network takes the pooling layer as a boundary and is divided into stages S1, S2, S3, S4 and S5;
the FENet includes four sub-networks: a single antagonistic feature subnetwork, a dual antagonistic feature subnetwork, a V1 exporter subnetwork, a V2 exporter subnetwork;
B. inputting an original image into a VGG16 network, and performing convolution processing in S1, S2, S3, S4 and S5 stages in sequence to respectively obtain output results S1, S2, S3, S4 and S5, wherein the output result S1 is sent to a decoding network;
processing an original image by a formula 1 to obtain four inputs of R-G, G-R, B-Y and Y-B;
SOi=Cm-ωCn (1)
wherein i represents R-G, G-R, B-Y, Y-B; m and n both represent R, G, B, Y components; omega is a coefficient and takes the value of 0.7;
inputting R-G, G-R, B-Y and Y-B into a single antagonistic characteristic subnetwork for processing to obtain an output result a, adding the output result a and the output result S2 for fusion to obtain a fusion result a, and inputting the fusion result a into a decoding network;
inputting R-G, G-R, B-Y and Y-B into a dual-antagonistic characteristic subnetwork for processing to obtain an output result B, adding the output result B and the output result S3 for fusion to obtain a fusion result B, and inputting the fusion result B into a decoding network;
the edge response of a V1 area is obtained by an original image through an SCO algorithm, the edge response is input into a V1 output sub-network for processing, an output result c is obtained, the output result c and an output result S4 are added and fused, a fusion result c is obtained, and the fusion result c is input into a decoding network;
the edge response of a V2 area is obtained by an original image through an SED algorithm, the edge response is input into a V2 output sub-network for processing, an output result d is obtained, the output result d and an output result S5 are added and fused, a fusion result d is obtained, and the fusion result d is input into a decoding network;
C. respectively inputting the output result a and the output result b into a feedforward fusion module;
the output result S1, the fusion result a, the fusion result b, the fusion result c and the fusion result d are processed by a decoding network to obtain a decoding output result, the decoding output result is input into a feedforward fusion module, and the loss of the decoding output result is calculated;
D. in the feed-forward fusion module, after an output result a and an output result b respectively pass through a 1x1-1 convolution layer, the original resolution is restored through upsampling, the loss of the original resolution is calculated, finally, the original resolution is multiplied by weight, the obtained result and a decoding output result are added and fused to obtain a final output contour, and the loss of the final output contour is calculated.
The SCO algorithm is described in the following documents: K. yang, S. -B.Gao, C. -F.Guo, C. -Y.Li, Y. -J.Li, Boundary detection using double-open and spatial sparse constraint, IEEE Transactions on Image Processing,24(2015) 2565-.
The SED algorithm is described in the following documents: akbania, C.A. Parraga, Feedback and Surround Modulated Boundary Detection, International Journal of Computer Vision,126(2018) 1367-.
The convolution expression related to each step is m × n-k conv + ReLU, wherein m × n represents the size of a convolution kernel, k represents the number of output channels, conv represents a convolution formula, and ReLU represents an activation function; m, n and k are preset values; the convolution expression of the final fusion layer is m x n-k conv.
The VGG16 network is obtained by the original VGG16 network through the following structural adjustment:
the pooling layers between S4 and S5 were removed, and the three convolutional layers of S5 were changed to the void convolutional layers with void rates of 2, 4, and 8 in sequence.
The sub-network of single antagonistic features comprises: R-G, G-R, B-Y, Y-B four groups of single antagonistic convolution treatment stages, SEM multiscale enhancement module, 3X 3-128 convolutional layer;
the R-G, G-R, B-Y and Y-B single antagonistic convolution treatment stages are the same and respectively pass through a 3x 3-3 convolution layer, a 3x 3-64 convolution layer, a maximum pooling layer and a 3x 3-128 convolution layer in sequence;
the sub-network processing procedure for the single antagonistic feature is as follows:
adding and fusing the features processed in the R-G and G-R single-antagonistic convolution processing stage, and processing by a multi-scale enhancement module to obtain a single-antagonistic enhancement result a; adding and fusing the characteristics processed in the B-Y, Y-B single antagonistic convolution processing stage, and processing by a multi-scale enhancement module to obtain a single antagonistic enhancement result B;
and splicing the single antagonism enhancement result a and the single antagonism enhancement result b, and then matching the number of channels through a 3x 3-128 convolution layer to obtain a fusion result a.
The dual antagonistic feature subnetwork comprises: R-G, G-R, B-Y, Y-B dual-antagonistic convolution processing stages, SEM multi-scale enhancement module, 1 × 1-256 convolution layer;
the R-G, G-R, B-Y, Y-B dual-antagonistic convolution processing stages are the same, the input of each stage is divided into two paths, the two paths in each stage respectively pass through a 9 x 9-3 convolution layer, a 9 x 9-64 convolution layer, a 2 x 2 maximum pooling layer, a 9 x 9-128 convolution layer, a 2 x 2 maximum pooling layer and a 9 x 9-256 convolution layer in sequence, and are subtracted after being multiplied by the trainable weight normalized by a sigmoid function to respectively obtain R-G, G-R and B-Y, Y-B dual-antagonistic convolution processing results;
adding and fusing R-G and G-R dual-antagonism convolution processing results, and processing the results through an SEM multi-scale enhancement module to obtain a dual-antagonism enhancement result a; adding and fusing the results of the B-Y, Y-B dual-antagonistic convolution processing, and processing by an SEM multi-scale enhancement module to obtain a dual-antagonistic enhancement result B;
and splicing the double-antagonism enhancement result a and the double-antagonism enhancement result b, and then matching the number of channels through a 1x 1-256 convolution layer to obtain a fusion result b.
The V1 output sub-network comprises three groups of 2X 2 maximum pool layers, an SEM multi-scale enhancement module and a 3X 3-512 convolution layer which are connected in sequence;
the edge response of the V1 area is subjected to 2 × 2 maximum pooling for three times in the V1 output sub-network, then the multi-scale features are extracted through an SEM multi-scale enhancement module, and finally the fusion result c is obtained through the number of matching channels of the 3 × 3-512 convolutional layers.
The V2 output sub-network comprises three groups of 2X 2 maximum pool layers, an SEM multi-scale enhancement module and a 3X 3-512 convolution layer which are connected in sequence;
the edge response of the V2 area is subjected to 2 × 2 maximum pooling for three times in the V2 output sub-network, then the multi-scale features are extracted through an SEM multi-scale enhancement module, and finally the fusion result d is obtained through the number of matching channels of 3 × 3-512 convolutional layers.
The decoding network is a 4-layer structure composed of unit modules R: the first layer comprises 4 unit modules R, the second layer 3 unit modules R, the third layer 2 unit modules R, and the fourth layer 1 unit module R;
the fusion result d and the fusion result c are input into the first unit module R of the first layer and processed to obtain a processing result R1;
the processing result R1 and the fusion result b are input into the second unit module R of the first layer and processed to obtain a processing result R2;
the processing result R2 and the fusion result a are input into the third unit module R of the first layer and processed to obtain a processing result R3;
the processing result R3 and the output result S1 are input into the fourth unit module R of the first layer and processed to obtain a processing result R4;
the processing result R1 and the processing result R2 are input into the first unit module R of the second layer and processed to obtain a processing result R5;
the processing result R5 and the processing result R3 are input into the second unit module R of the second layer and processed to obtain a processing result R6;
the processing result R6 and the processing result R4 are input into the third unit module R of the second layer and processed to obtain a processing result R7;
the processing result R5 and the processing result R6 are input into the first unit module R of the third layer and processed to obtain a processing result R8;
the processing result R8 and the processing result R7 are input into the second unit module R of the third layer and processed to obtain a processing result R9;
the processing result R8 and the processing result R9 are input into the unit module R of the fourth layer and processed to obtain a processing result R10, and the processing result R10 is passed through a 1×1-1 convolution to obtain the decoding output result.
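The wiring above can be summarized as the following dataflow sketch, where R stands for one unit module R, abstracted as a two-input function (the real module performs convolution, weighting and fusion):

```python
def decode(R, d, c, b, a, s1):
    """Dataflow of the 4-layer decoding network.
    R is one unit module R, abstracted as a two-input function;
    d, c, b, a are the fusion results and s1 is the output result S1."""
    r1 = R(d, c)       # layer 1
    r2 = R(r1, b)
    r3 = R(r2, a)
    r4 = R(r3, s1)
    r5 = R(r1, r2)     # layer 2
    r6 = R(r5, r3)
    r7 = R(r6, r4)
    r8 = R(r5, r6)     # layer 3
    r9 = R(r8, r7)
    r10 = R(r8, r9)    # layer 4
    return r10         # followed by a 1x1-1 convolution in the full network
```

With `R` replaced by simple addition, `decode(lambda x, y: x + y, 1, 1, 1, 1, 1)` traces the same connection pattern on scalars, which is a convenient way to verify the wiring.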
The unit module R comprises two input channels: channel 1 receives the larger-sized image and channel 2 the smaller-sized one;
in channel 1, the image is sequentially processed by a 3×3 convolution, ReLU activation and a batch normalization layer, then multiplied by a trainable weight normalized by a sigmoid function, to obtain the channel 1 output result;
in channel 2, the image is sequentially processed by a 3×3 convolution, ReLU activation and a batch normalization layer, then multiplied by a trainable weight normalized by a sigmoid function, and up-sampled so that its size matches the channel 1 output, to obtain the channel 2 output result;
the number of output channels of the 3×3 convolution layers in channel 1 and channel 2 equals the smaller channel count of the two inputs;
the channel 1 output result and the channel 2 output result are added and fused to obtain the output result of the current unit module R.
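A minimal numpy sketch of the fusion performed by unit module R. The 3×3 convolution + ReLU + batch-norm stage is abstracted into a `transform` argument, and nearest-neighbor interpolation stands in for the up-sampling; both are simplifying assumptions, not the trained layers themselves:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def unit_module_r(feat_large, feat_small, w1, w2, transform=lambda x: x):
    """Fuse a larger feature map (channel 1) with a smaller one (channel 2).
    transform abstracts the 3x3 conv + ReLU + batch-norm stage;
    w1, w2 are the trainable scalar weights, normalized by a sigmoid."""
    out1 = transform(feat_large) * sigmoid(w1)   # channel 1
    out2 = transform(feat_small) * sigmoid(w2)   # channel 2
    ry = feat_large.shape[0] // out2.shape[0]    # nearest-neighbor up-sampling
    rx = feat_large.shape[1] // out2.shape[1]
    out2 = np.repeat(np.repeat(out2, ry, axis=0), rx, axis=1)
    return out1 + out2                           # additive fusion
```

With both weights at 0 (so each sigmoid gate is 0.5) and an identity transform, two all-ones inputs fuse back to an all-ones map, which makes the gating easy to check.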
In the steps C and D, the loss is calculated as follows.
The total loss is:

L = Σ_{i=1..3} θ_i · l(P_i, Y) + θ_fuse · l(P_fuse, Y)    (2)

In the above formula, θ_i and θ_fuse represent the weights of the losses of the three sub-network outputs and of the final predicted loss, respectively; P_i represents the three different outputs, P_fuse represents the final edge prediction, and Y represents the true edge map.
l(P_fuse, Y) is defined as follows: for a real edge map Y = (y_j, j = 1, ..., |Y|), y_j ∈ [0, 1], define Y⁺ = {y_j | y_j > η} and Y⁻ = {y_j | y_j = 0}, where Y⁺ and Y⁻ represent the positive and negative sample sets, respectively, and all other pixels are ignored.
Thus l(P_fuse, Y) is calculated as follows:

l(P_fuse, Y) = −α Σ_{j∈Y⁻} log(1 − p_j) − β Σ_{j∈Y⁺} log(p_j)    (3)

α = λ · |Y⁺| / (|Y⁺| + |Y⁻|),  β = |Y⁻| / (|Y⁺| + |Y⁻|)    (4)

In formulas (3) and (4), P represents the prediction and p_j represents the value at pixel j after sigmoid activation; α and β are used to balance the positive and negative samples, and λ is a weight that controls the magnitude of the coefficient α.
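A numpy sketch of the class-balanced cross-entropy l(P, Y). The RCF-style balancing used here (α scaling the negative term and β the positive term, with λ scaling α) is an assumption consistent with the stated roles of α, β and λ; the threshold η and λ values are illustrative:

```python
import numpy as np

def balanced_bce(p, y, eta=0.5, lam=1.1):
    """Class-balanced cross-entropy l(P, Y).
    p: sigmoid-activated prediction; y: ground-truth edge map in [0, 1].
    Pixels with 0 < y_j <= eta are ignored (neither positive nor negative)."""
    pos = y > eta                    # positive sample set Y+
    neg = y == 0                     # negative sample set Y-
    n_pos, n_neg = pos.sum(), neg.sum()
    alpha = lam * n_pos / (n_pos + n_neg)   # weights the negative term
    beta = n_neg / (n_pos + n_neg)          # weights the positive term
    eps = 1e-12                      # numerical safety for log(0)
    return (-alpha * np.log(1.0 - p[neg] + eps).sum()
            - beta * np.log(p[pos] + eps).sum())
```

A perfect prediction gives a loss near zero, while any uncertainty on labeled pixels increases it, which is the behavior the balancing terms preserve regardless of the positive/negative ratio.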
Example 2
The edge detection results of the method of this embodiment are compared with those of the method of the following document 1.
Document 1: S. Xie and Z. Tu, "Holistically-nested edge detection," in IEEE International Conference on Computer Vision, 2015, pp. 1395-1403.
The parameters used for document 1 are those given in the original text, which are reported to be optimal for that model.
For quantitative performance evaluation of the final contour, we used the same performance measure as document 1, the F-measure:

F = 2PR / (P + R)

where P represents precision and R represents recall; the larger the value of F, the better the performance.
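The F-measure is the harmonic mean of precision and recall; a one-line sketch:

```python
def f_measure(precision, recall):
    """F-measure: harmonic mean of precision P and recall R."""
    return 2.0 * precision * recall / (precision + recall)

print(f_measure(0.8, 0.6))  # ~0.6857
```

Because it is a harmonic mean, F is dragged toward the weaker of the two quantities, so a detector cannot score well by maximizing only one of them.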
Fig. 10 shows two natural images selected from the Berkeley Segmentation Data Set (BSDS500), the corresponding ground-truth contours, the contours detected by the method of document 1, and the contours detected by the method of embodiment 1.
The experimental results show that the detection method of embodiment 1 outperforms that of document 1.
Claims (7)
1. A contour detection method for learning biological visual pathways is characterized by comprising the following steps:
A. constructing a deep neural network structure, wherein the deep neural network structure is as follows:
the system comprises an encoding network, a decoding network and a feedforward fusion module; the coding network is a network structure combining VGG16 and FENet;
the VGG16 network takes the pooling layer as a boundary and is divided into stages S1, S2, S3, S4 and S5;
the FENet includes four sub-networks: a single-antagonistic feature sub-network, a dual-antagonistic feature sub-network, a V1 output sub-network, and a V2 output sub-network;
B. inputting an original image into a VGG16 network, and performing convolution processing in S1, S2, S3, S4 and S5 stages in sequence to respectively obtain output results S1, S2, S3, S4 and S5, wherein the output result S1 is sent to a decoding network;
processing an original image by a formula 1 to obtain four inputs of R-G, G-R, B-Y and Y-B;
SO_i = C_m − ω·C_n    (1)
wherein i represents R-G, G-R, B-Y, or Y-B; m and n represent R, G, B, Y components; ω is a coefficient with a value of 0.7;
inputting R-G, G-R, B-Y and Y-B into a single antagonistic characteristic subnetwork for processing to obtain an output result a, adding the output result a and the output result S2 for fusion to obtain a fusion result a, and inputting the fusion result a into a decoding network;
inputting R-G, G-R, B-Y and Y-B into a dual-antagonistic characteristic subnetwork for processing to obtain an output result B, adding the output result B and the output result S3 for fusion to obtain a fusion result B, and inputting the fusion result B into a decoding network;
the edge response of a V1 area is obtained by an original image through an SCO algorithm, the edge response is input into a V1 output sub-network for processing, an output result c is obtained, the output result c and an output result S4 are added and fused, a fusion result c is obtained, and the fusion result c is input into a decoding network;
the edge response of a V2 area is obtained by an original image through an SED algorithm, the edge response is input into a V2 output sub-network for processing, an output result d is obtained, the output result d and an output result S5 are added and fused, a fusion result d is obtained, and the fusion result d is input into a decoding network;
C. respectively inputting the output result a and the output result b into a feedforward fusion module;
the output result S1, the fusion result a, the fusion result b, the fusion result c and the fusion result d are processed by a decoding network to obtain a decoding output result, the decoding output result is input into a feedforward fusion module, and the loss of the decoding output result is calculated;
D. in the feed-forward fusion module, the output result a and the output result b each pass through a 1×1-1 convolution layer and are restored to the original resolution by up-sampling, and their losses are calculated; each is then multiplied by a weight, the weighted results are added to the decoding output result to obtain the final output contour, and the loss of the final output contour is calculated;
the single-antagonistic feature sub-network comprises: four single-antagonistic convolution processing stages (R-G, G-R, B-Y, Y-B), an SEM multi-scale enhancement module, and a 3×3-128 convolution layer;
the R-G, G-R, B-Y and Y-B single-antagonistic convolution processing stages are identical, each passing sequentially through a 3×3-3 convolution layer, a 3×3-64 convolution layer, a max pooling layer and a 3×3-128 convolution layer;
the sub-network processing procedure for the single antagonistic feature is as follows:
adding and fusing the features processed in the R-G and G-R single-antagonistic convolution processing stage, and processing by a multi-scale enhancement module to obtain a single-antagonistic enhancement result a; adding and fusing the characteristics processed in the B-Y, Y-B single antagonistic convolution processing stage, and processing by a multi-scale enhancement module to obtain a single antagonistic enhancement result B;
splicing the single antagonism enhancement result a and the single antagonism enhancement result b, and then matching the number of channels through a 3x 3-128 convolution layer to obtain a fusion result a;
the dual-antagonistic feature sub-network comprises: R-G, G-R, B-Y, Y-B dual-antagonistic convolution processing stages, an SEM multi-scale enhancement module, and a 1×1-256 convolution layer;
the R-G, G-R, B-Y, Y-B dual-antagonistic convolution processing stages are identical; the input of each stage is split into two paths, each path passing sequentially through a 9×9-3 convolution layer, a 9×9-64 convolution layer, a 2×2 max pooling layer, a 9×9-128 convolution layer, a 2×2 max pooling layer and a 9×9-256 convolution layer; the two paths are each multiplied by a trainable weight normalized by a sigmoid function and then subtracted to obtain the R-G, G-R and B-Y, Y-B dual-antagonistic convolution processing results, respectively;
adding and fusing R-G and G-R dual-antagonism convolution processing results, and processing the results through an SEM multi-scale enhancement module to obtain a dual-antagonism enhancement result a; adding and fusing the results of the B-Y, Y-B dual-antagonistic convolution processing, and processing by an SEM multi-scale enhancement module to obtain a dual-antagonistic enhancement result B;
and splicing the double-antagonism enhancement result a and the double-antagonism enhancement result b, and then matching the number of channels through a 1x 1-256 convolution layer to obtain a fusion result b.
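The single-opponent inputs of equation (1) in claim 1 can be sketched as follows. The definition of the yellow component as Y = (R + G) / 2 is an assumption (common in color-opponency models), since the claim does not spell it out:

```python
import numpy as np

def single_opponent_inputs(img, omega=0.7):
    """Equation (1): SO_i = C_m - omega * C_n for i in {R-G, G-R, B-Y, Y-B}.
    img: (H, W, 3) float array with R, G, B channels.
    The yellow channel Y = (R + G) / 2 is an assumed definition."""
    R, G, B = img[..., 0], img[..., 1], img[..., 2]
    Y = (R + G) / 2.0
    return {
        "R-G": R - omega * G,
        "G-R": G - omega * R,
        "B-Y": B - omega * Y,
        "Y-B": Y - omega * B,
    }
```

For a pure-red image (R = 1, G = B = 0) the R-G channel responds fully while B-Y goes negative, illustrating how each opponent pair emphasizes one color against its antagonist.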
2. The method for detecting the contour of a learned biological visual pathway as set forth in claim 1, wherein:
the VGG16 network is obtained by the original VGG16 network through the following structural adjustment:
the pooling layers between S4 and S5 were removed, and the three convolutional layers of S5 were changed to the void convolutional layers with void rates of 2, 4, and 8 in sequence.
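The void (atrous/dilated) convolutions of claim 2 enlarge the receptive field without adding parameters by spacing the kernel taps apart. An illustrative sketch of kernel dilation (the function name is ours, not part of the patent):

```python
import numpy as np

def dilate_kernel(k, rate):
    """Spread an n x n kernel's taps apart with zeros in between:
    a 3x3 kernel at rate 2 covers a 5x5 window, at rate 4 a 9x9 window,
    while keeping the same 9 trainable weights."""
    n = k.shape[0]
    size = (n - 1) * rate + 1
    out = np.zeros((size, size))
    out[::rate, ::rate] = k
    return out

print(dilate_kernel(np.ones((3, 3)), 2).shape)  # (5, 5)
```

At the rates 2, 4 and 8 used in S5, a 3×3 kernel covers 5×5, 9×9 and 17×17 windows respectively, which compensates for the removed pooling layer between S4 and S5.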
3. The method for detecting the contour of a learned biological visual pathway as set forth in claim 1, wherein:
the V1 output sub-network comprises three groups of 2×2 max pooling layers, an SEM multi-scale enhancement module, and a 3×3-512 convolutional layer connected in sequence;
the edge response of the V1 area undergoes three 2×2 max poolings in the V1 output sub-network, multi-scale features are then extracted by an SEM multi-scale enhancement module, and finally a 3×3-512 convolutional layer matches the number of channels to obtain the fusion result c.
4. The method for detecting the contour of a learned biological visual pathway as set forth in claim 1, wherein:
the V2 output sub-network comprises three groups of 2×2 max pooling layers, an SEM multi-scale enhancement module, and a 3×3-512 convolutional layer connected in sequence;
the edge response of the V2 area undergoes three 2×2 max poolings in the V2 output sub-network, multi-scale features are then extracted by an SEM multi-scale enhancement module, and finally a 3×3-512 convolutional layer matches the number of channels to obtain the fusion result d.
5. The method for detecting the contour of a learned biological visual pathway as set forth in claim 1, wherein:
the decoding network is of a 4-layer structure consisting of a plurality of unit modules R, the first layer comprises 4 unit modules R, the second layer comprises 3 unit modules, the third layer comprises 2 unit modules R, and the fourth layer comprises 1 unit module R;
respectively inputting the fusion result d and the fusion result c into a first unit module R of the first layer, and processing by the unit module R to obtain a processing result R1;
inputting the processing result R1 and the fusion result b into the second unit module R of the first layer, and processing by the unit module R to obtain a processing result R2;
inputting the processing result R2 and the fusion result a into the third unit module R of the first layer, and processing by the unit module R to obtain a processing result R3;
the processing result R3 and the output result S1 are respectively inputted into the fourth unit module R of the first layer, and are processed by the unit module R to obtain a processing result R4;
the processing result R1 and the processing result R2 are respectively input into the first unit module R of the second layer, and the processing result R5 is obtained after the processing of the unit module R;
the processing result R5 and the processing result R3 are respectively inputted into the second unit module R of the second layer, and are processed by the unit module R to obtain a processing result R6;
the processing result R6 and the processing result R4 are respectively inputted into the third unit module R of the second layer, and are processed by the unit module R to obtain a processing result R7;
the processing result R5 and the processing result R6 are respectively input into the first unit module R of the third layer, and are processed by the unit module R to obtain a processing result R8;
the processing result R8 and the processing result R7 are respectively inputted into the second unit module R of the third layer, and the processing result R9 is obtained after the processing of the unit module R;
the processing result R8 and the processing result R9 are input into the unit module R of the fourth layer and processed to obtain a processing result R10, and the processing result R10 is passed through a 1×1-1 convolution to obtain the decoding output result.
6. The method for detecting the profile of a learned biological visual pathway as set forth in claim 5, wherein:
the unit module R comprises two input channels;
in channel 1, the image is sequentially processed by a 3×3 convolution, ReLU activation and a batch normalization layer, then multiplied by a trainable weight normalized by a sigmoid function, to obtain the channel 1 output result;
in channel 2, the image is sequentially processed by a 3×3 convolution, ReLU activation and a batch normalization layer, then multiplied by a trainable weight normalized by a sigmoid function, and up-sampled so that its size matches the channel 1 output, to obtain the channel 2 output result;
the number of output channels of the 3×3 convolution layers in channel 1 and channel 2 equals the smaller channel count of the two inputs;
and adding and fusing the output result of the channel 1 and the output result of the channel 2 to obtain the output result of the current unit module R.
7. The method for detecting the profile of a learned biological visual pathway as set forth in claim 5, wherein:
in the steps C and D, the loss is calculated as follows:
the total loss is:

L = Σ_{i=1..3} θ_i · l(P_i, Y) + θ_fuse · l(P_fuse, Y)    (2)

in the above formula, θ_i and θ_fuse represent the weights of the losses of the three sub-network outputs and of the final predicted loss, respectively; P_i represents the three different outputs, P_fuse represents the final edge prediction, and Y represents the true edge map;
l(P_fuse, Y) is defined as follows:
for a real edge map Y = (y_j, j = 1, ..., |Y|), y_j ∈ [0, 1], define Y⁺ = {y_j | y_j > η} and Y⁻ = {y_j | y_j = 0}; Y⁺ and Y⁻ represent the positive and negative sample sets, respectively, and all other pixels are ignored;
thus l(P_fuse, Y) is calculated as follows:

l(P_fuse, Y) = −α Σ_{j∈Y⁻} log(1 − p_j) − β Σ_{j∈Y⁺} log(p_j)    (3)

α = λ · |Y⁺| / (|Y⁺| + |Y⁻|),  β = |Y⁻| / (|Y⁺| + |Y⁻|)    (4)

in the formulas (3) and (4), P represents the prediction and p_j represents the value at pixel j after sigmoid activation; α and β are used to balance the positive and negative samples, and λ is a weight that controls the magnitude of the coefficient α.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110784619.6A CN113538485B (en) | 2021-08-25 | 2021-08-25 | Contour detection method for learning biological visual pathway |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113538485A CN113538485A (en) | 2021-10-22 |
CN113538485B true CN113538485B (en) | 2022-04-22 |
Family
ID=78098542
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110784619.6A Active CN113538485B (en) | 2021-08-25 | 2021-08-25 | Contour detection method for learning biological visual pathway |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113538485B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114463360B (en) * | 2021-10-27 | 2024-03-15 | 广西科技大学 | Contour detection method based on bionic characteristic enhancement network |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2776988A1 (en) * | 2003-02-06 | 2004-08-26 | Dolby Laboratories Licensing Corporation | Conversion of synthesized spectral components for encoding and low-complexity transcoding |
CN109509192A (en) * | 2018-10-18 | 2019-03-22 | 天津大学 | Merge the semantic segmentation network in Analysis On Multi-scale Features space and semantic space |
CN109949334A (en) * | 2019-01-25 | 2019-06-28 | 广西科技大学 | Profile testing method based on the connection of deeply network residual error |
CN110222628A (en) * | 2019-06-03 | 2019-09-10 | 电子科技大学 | A kind of face restorative procedure based on production confrontation network |
CN111325762A (en) * | 2020-01-21 | 2020-06-23 | 广西科技大学 | Contour detection method based on dense connection decoding network |
CN113222328A (en) * | 2021-03-25 | 2021-08-06 | 中国科学技术大学先进技术研究院 | Air quality monitoring equipment point arrangement and site selection method based on road section pollution similarity |
Non-Patent Citations (3)
Title |
---|
Design and Implementation of Viterbi Encoding and Decoding Algorithm on FPGA; M. Irfan; 2005 International Conference on Microelectronics; 2006-02-13; 234-239 *
Fingerprint image binarization method based on multi-channel Gabor filtering; Lin Chuan; Science Technology and Engineering; 2013-08-08 (No. 22); 6487-6491 *
Research on high-precision segmentation of typical targets in high-resolution remote sensing images; Wang Yu; China Doctoral Dissertations Full-text Database, Engineering Science and Technology II; 2020-07-15; C028-9 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109299274B (en) | Natural scene text detection method based on full convolution neural network | |
CN107169421B (en) | Automobile driving scene target detection method based on deep convolutional neural network | |
CN108960261B (en) | Salient object detection method based on attention mechanism | |
CN111340814B (en) | RGB-D image semantic segmentation method based on multi-mode self-adaptive convolution | |
CN110363290B (en) | Image recognition method, device and equipment based on hybrid neural network model | |
CN107180430A (en) | A kind of deep learning network establishing method and system suitable for semantic segmentation | |
CN112070044B (en) | Video object classification method and device | |
CN114937204B (en) | Neural network remote sensing change detection method for lightweight multi-feature aggregation | |
CN111161244B (en) | Industrial product surface defect detection method based on FCN + FC-WXGboost | |
CN113570508A (en) | Image restoration method and device, storage medium and terminal | |
CN110119805B (en) | Convolutional neural network algorithm based on echo state network classification | |
CN112101262B (en) | Multi-feature fusion sign language recognition method and network model | |
Hou et al. | Handwritten digit recognition based on depth neural network | |
CN112036260A (en) | Expression recognition method and system for multi-scale sub-block aggregation in natural environment | |
CN114048822A (en) | Attention mechanism feature fusion segmentation method for image | |
CN113538485B (en) | Contour detection method for learning biological visual pathway | |
CN116052218B (en) | Pedestrian re-identification method | |
US20230316699A1 (en) | Image semantic segmentation algorithm and system based on multi-channel deep weighted aggregation | |
CN109766918A (en) | Conspicuousness object detecting method based on the fusion of multi-level contextual information | |
Xue et al. | Research on edge detection operator of a convolutional neural network | |
CN112927236B (en) | Clothing analysis method and system based on channel attention and self-supervision constraint | |
CN112418070B (en) | Attitude estimation method based on decoupling ladder network | |
CN113807356A (en) | End-to-end low visibility image semantic segmentation method | |
CN107729885A (en) | A kind of face Enhancement Method based on the study of multiple residual error | |
CN113538484B (en) | Deep-refinement multiple-information nested edge detection method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
EE01 | Entry into force of recordation of patent licensing contract |
Application publication date: 20211022 Assignee: GUANGXI YINGTENG EDUCATION TECHNOLOGY Co.,Ltd. Assignor: GUANGXI University OF SCIENCE AND TECHNOLOGY Contract record no.: X2023980053979 Denomination of invention: Contour detection methods for learning biological visual pathways Granted publication date: 20220422 License type: Common License Record date: 20231226 |