CN113538485B - Contour detection method for learning biological visual pathway - Google Patents

Contour detection method for learning biological visual pathway

Info

Publication number
CN113538485B
CN113538485B (application CN202110784619.6A)
Authority
CN
China
Prior art keywords
result
processing
layer
network
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110784619.6A
Other languages
Chinese (zh)
Other versions
CN113538485A (en)
Inventor
林川
张哲一
谢智星
陈永亮
张晓�
张贞光
吴海晨
李福章
潘勇才
韦艳霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangxi University of Science and Technology
Original Assignee
Guangxi University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangxi University of Science and Technology filed Critical Guangxi University of Science and Technology
Priority to CN202110784619.6A priority Critical patent/CN113538485B/en
Publication of CN113538485A publication Critical patent/CN113538485A/en
Application granted granted Critical
Publication of CN113538485B publication Critical patent/CN113538485B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/13Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging

Abstract

The invention provides a contour detection method that learns from the biological visual pathway. The method comprises the following steps: constructing a deep neural network consisting of an encoding network, a decoding network and a feed-forward fusion module, where the encoding network combines VGG16 with FENet; the original image is then processed by the encoding network, the decoding network and the feed-forward fusion module in sequence to obtain the final output contour. The invention enables the encoder to obtain richer contour feature information and improves contour detection performance.

Description

Contour detection method for learning biological visual pathway
Technical Field
The invention relates to the field of image processing, in particular to a contour detection method for learning biological visual pathways.
Background
Contour detection aims to extract the boundaries between objects and background in an image. It usually serves as a key front-end step for various mid- and high-level computer vision tasks and is one of the fundamental problems in computer vision research. With the rapid development of deep learning in recent years, researchers have designed contour detection models based on convolutional neural networks (CNN). Such models consist of an encoder and a decoder; the encoder is commonly VGG16 or ResNet, and decoder design has been the main research focus. CNN-based models enable end-to-end contour extraction, and experiments show that they achieve remarkable results on the Berkeley segmentation data set (BSDS500).
Although end-to-end CNN-based contour detection has achieved significant results, the main innovation of current models lies in decoder design, and these models lack guidance from biological visual mechanisms. The function of the decoder is to restore a map at the original resolution by fusing the output features of the encoder.
Disclosure of Invention
The invention aims to provide a contour detection method that learns from the biological visual pathway. Starting from enhancing the feature expression capability of the encoder, and inspired by the biological visual pathway and its related visual mechanisms, a bionic contour enhancer is designed. The enhancer is combined with the encoder so that the encoder obtains richer contour feature information, thereby improving contour detection performance.
The technical scheme of the invention is as follows:
The contour detection method for learning a biological visual pathway comprises the following steps:
A. constructing a deep neural network structure, wherein the deep neural network structure is as follows:
the system comprises an encoding network, a decoding network and a feed-forward fusion module; the encoding network is a network structure combining VGG16 and FENet; FENet (Feature Enhancement Network) is a structure designed by the inventors;
the VGG16 network takes the pooling layer as a boundary and is divided into stages S1, S2, S3, S4 and S5;
the FENet includes four sub-networks: a single antagonistic (single-opponent) feature subnetwork, a dual antagonistic (double-opponent) feature subnetwork, a V1 output sub-network and a V2 output sub-network; the single antagonistic feature subnetwork simulates the single-opponent receptive field mechanism of the retina/LGN, and the dual antagonistic feature subnetwork simulates the double-opponent receptive field mechanism of V1;
B. inputting an original image into a VGG16 network, and performing convolution processing in S1, S2, S3, S4 and S5 stages in sequence to respectively obtain output results S1, S2, S3, S4 and S5, wherein the output result S1 is sent to a decoding network;
processing an original image by a formula 1 to obtain four inputs of R-G, G-R, B-Y and Y-B;
SO_i = C_m - ω·C_n (1)
wherein i ∈ {R-G, G-R, B-Y, Y-B}; m and n denote the corresponding components among R, G, B and Y; ω is a coefficient with value 0.7;
inputting R-G, G-R, B-Y and Y-B into a single antagonistic characteristic subnetwork for processing to obtain an output result a, adding the output result a and the output result S2 for fusion to obtain a fusion result a, and inputting the fusion result a into a decoding network;
inputting R-G, G-R, B-Y and Y-B into a dual-antagonistic characteristic subnetwork for processing to obtain an output result B, adding the output result B and the output result S3 for fusion to obtain a fusion result B, and inputting the fusion result B into a decoding network;
the edge response of the V1 area is obtained from the original image by the SCO algorithm and is input into the V1 output sub-network for processing to obtain an output result c; the output result c and the output result S4 are added and fused to obtain a fusion result c, and the fusion result c is input into the decoding network;
the edge response of the V2 area is obtained from the original image by the SED algorithm and is input into the V2 output sub-network for processing to obtain an output result d; the output result d and the output result S5 are added and fused to obtain a fusion result d, and the fusion result d is input into the decoding network;
C. respectively inputting the output result a and the output result b into a feedforward fusion module;
the output result S1, the fusion result a, the fusion result b, the fusion result c and the fusion result d are processed by a decoding network to obtain a decoding output result, the decoding output result is input into a feedforward fusion module, and the loss of the decoding output result is calculated;
D. in the feed-forward fusion module, the output result a and the output result b each pass through a 1x1-1 convolution layer and are restored to the original resolution by upsampling, and a loss is calculated on each restored map; the restored maps are then multiplied by weights, and the result is added to and fused with the decoding output result to obtain the final output contour, on which a loss is also calculated.
The SCO algorithm is described in the following document: K.-F. Yang, S.-B. Gao, C.-F. Guo, C.-Y. Li, Y.-J. Li, Boundary detection using double-opponency and spatial sparseness constraint, IEEE Transactions on Image Processing, 24 (2015) 2565-2578.
The SED algorithm is described in the following document: A. Akbarinia, C.A. Parraga, Feedback and surround modulated boundary detection, International Journal of Computer Vision, 126 (2018) 1367-1380.
The convolution notation used in each step is m x n-k conv + ReLU, wherein m x n denotes the convolution kernel size, k denotes the number of output channels, conv denotes the convolution operation, and ReLU denotes the activation function; m, n and k are preset values; the final fusion layer uses m x n-k conv without ReLU.
The VGG16 network is obtained by the original VGG16 network through the following structural adjustment:
the pooling layer between S4 and S5 is removed, and the three convolution layers of S5 are changed to dilated (atrous) convolution layers with dilation rates of 2, 4 and 8 in sequence.
The sub-network of single antagonistic features comprises: R-G, G-R, B-Y, Y-B four groups of single antagonistic convolution treatment stages, SEM multiscale enhancement module, 3X 3-128 convolutional layer;
the R-G, G-R, B-Y and Y-B single antagonistic convolution treatment stages are the same and respectively pass through a 3x 3-3 convolution layer, a 3x 3-64 convolution layer, a maximum pooling layer and a 3x 3-128 convolution layer in sequence;
the sub-network processing procedure for the single antagonistic feature is as follows:
adding and fusing the features processed in the R-G and G-R single-antagonistic convolution processing stage, and processing by a multi-scale enhancement module to obtain a single-antagonistic enhancement result a; adding and fusing the characteristics processed in the B-Y, Y-B single antagonistic convolution processing stage, and processing by a multi-scale enhancement module to obtain a single antagonistic enhancement result B;
and splicing the single antagonism enhancement result a and the single antagonism enhancement result b, and then matching the number of channels through a 3x 3-128 convolution layer to obtain the output result a.
The dual antagonistic feature subnetwork comprises: R-G, G-R, B-Y, Y-B dual-antagonistic convolution processing stages, SEM multi-scale enhancement module, 1 × 1-256 convolution layer;
the R-G, G-R, B-Y, Y-B dual-antagonistic convolution processing stages are the same, the input of each stage is divided into two paths, the two paths in each stage respectively pass through a 9 x 9-3 convolution layer, a 9 x 9-64 convolution layer, a 2 x 2 maximum pooling layer, a 9 x 9-128 convolution layer, a 2 x 2 maximum pooling layer and a 9 x 9-256 convolution layer in sequence, and are subtracted after being multiplied by the trainable weight normalized by a sigmoid function to respectively obtain R-G, G-R and B-Y, Y-B dual-antagonistic convolution processing results;
adding and fusing R-G and G-R dual-antagonism convolution processing results, and processing the results through an SEM multi-scale enhancement module to obtain a dual-antagonism enhancement result a; adding and fusing the results of the B-Y, Y-B dual-antagonistic convolution processing, and processing by an SEM multi-scale enhancement module to obtain a dual-antagonistic enhancement result B;
and splicing the double-antagonism enhancement result a and the double-antagonism enhancement result b, and then matching the number of channels through a 1x 1-256 convolution layer to obtain the output result b.
The V1 output sub-network comprises three groups of 2X 2 maximum pool layers, an SEM multi-scale enhancement module and a 3X 3-512 convolution layer which are connected in sequence;
in the V1 output sub-network, the edge response of the V1 area is subjected to 2 x 2 maximum pooling three times, multi-scale features are then extracted by the SEM multi-scale enhancement module, and finally the number of channels is matched by the 3 x 3-512 convolution layer to obtain the output result c.
The V2 output sub-network comprises three groups of 2X 2 maximum pool layers, an SEM multi-scale enhancement module and a 3X 3-512 convolution layer which are connected in sequence;
in the V2 output sub-network, the edge response of the V2 area is subjected to 2 x 2 maximum pooling three times, multi-scale features are then extracted by the SEM multi-scale enhancement module, and finally the number of channels is matched by the 3 x 3-512 convolution layer to obtain the output result d.
The decoding network is RDNet (Residual Decoder Network), a structure designed by the inventors;
the decoding network has a 4-layer structure consisting of a plurality of unit modules R: the first layer comprises 4 unit modules R, the second layer comprises 3 unit modules R, the third layer comprises 2 unit modules R, and the fourth layer comprises 1 unit module R;
respectively inputting the fusion result d and the fusion result c into a first unit module R of the first layer, and processing by the unit module R to obtain a processing result R1;
the processing result R1 and the fusion result b are respectively input into a second unit module R of the first layer, and are processed by the unit module R to obtain a processing result R2;
inputting the processing result R2 and the fusion result a into the third unit module R of the first layer, and processing by the unit module R to obtain a processing result R3;
the processing result R3 and the output result S1 are respectively inputted into the fourth unit module R of the first layer, and are processed by the unit module R to obtain a processing result R4;
the processing result R1 and the processing result R2 are respectively input into the first unit module R of the second layer, and the processing result R5 is obtained after the processing of the unit module R;
the processing result R5 and the processing result R3 are respectively inputted into the second unit module R of the second layer, and are processed by the unit module R to obtain a processing result R6;
the processing result R6 and the processing result R4 are respectively inputted into the third unit module R of the second layer, and are processed by the unit module R to obtain a processing result R7;
the processing result R5 and the processing result R6 are respectively input into the first unit module R of the third layer, and are processed by the unit module R to obtain a processing result R8;
the processing result R8 and the processing result R7 are respectively inputted into the second unit module R of the third layer, and the processing result R9 is obtained after the processing of the unit module R;
the processing result R8 and the processing result R9 are respectively input into the unit module R of the fourth layer and processed by the unit module R to obtain a processing result R10, and the processing result R10 is passed through a 1 x 1-1 convolution to obtain the decoding output result.
The unit module R comprises two input channels, wherein a channel 1 inputs an image with a larger size, and a channel 2 inputs an image with a smaller size;
sequentially carrying out 3 multiplied by 3 convolution, ReLU function activation, batch normalization layer processing and multiplication by a trainable weight normalized by a sigmoid function in the channel 1 on the image to obtain an output result of the channel 1;
the image in the channel 2 is sequentially subjected to 3 x 3 convolution, ReLU function activation, batch normalization layer processing and multiplication by a trainable weight normalized by a sigmoid function, and the result is up-sampled so that its size is consistent with that of the channel 1 output result, so as to obtain the channel 2 output result;
the number of output channels of the 3x3 convolution layers in the channel 1 and the channel 2 is equal to the smaller of the channel counts of the two inputs;
and adding and fusing the output result of the channel 1 and the output result of the channel 2 to obtain the output result of the current unit module R.
In the steps C and D, the formula for calculating the loss is as follows:
the total loss is as follows:
L = Σ_{i=1}^{3} θ_i · l(P_i, Y) + θ_fuse · l(P_fuse, Y) (2)
in the above formula, θ_i and θ_fuse respectively denote the weights of the losses of the three sub-network outputs and of the final predicted loss, P_i denotes the three different outputs, P_fuse denotes the final edge prediction, and Y denotes the true edge map;
l(P_fuse, Y) is defined as follows:
for a real edge map Y = (y_j, j = 1, ..., |Y|) with y_j ∈ {0, 1}, define Y+ = {y_j | y_j = 1} and Y- = {y_j | y_j = 0}; Y+ and Y- respectively denote the positive and negative sample sets, and all other pixels are ignored entirely.
Thus l(P_fuse, Y) is calculated as follows:
l(P, Y) = -α · Σ_{j∈Y+} log P_j - β · Σ_{j∈Y-} log(1 - P_j) (3)
α = λ · |Y-| / (|Y+| + |Y-|), β = |Y+| / (|Y+| + |Y-|) (4)
in formulas (3) and (4), P denotes the prediction and P_j denotes the value at pixel j after processing by a sigmoid function; α and β are used to balance the positive and negative samples, and λ is a weight that controls the magnitude of the coefficient.
Inspired by the biological visual pathway and its related visual mechanisms, the method forms a bionic contour-enhanced encoder that effectively enhances contour features, so that the decoding network obtains richer feature information and contour detection performance is improved.
Drawings
Fig. 1 is an overall structural view of a deep neural network according to embodiment 1 of the present invention;
fig. 2 is an overall structural diagram of a coding network according to embodiment 1 of the present invention;
FIG. 3 is a block diagram of a single antagonistic feature subnetwork of example 1 of the present invention;
FIG. 4 is a block diagram of a dual antagonistic feature subnetwork of example 1 of the present invention;
FIG. 5 is a structural diagram of a V1 export sub-network according to embodiment 1 of the present invention;
FIG. 6 is a structural diagram of a V2 export sub-network according to embodiment 1 of the present invention;
FIG. 7 is a block diagram of a feedforward fusion module according to embodiment 1 of the present invention;
fig. 8 is a structural diagram of a decoding network of embodiment 1 of the present invention;
fig. 9 is a structural diagram of a unit module R in the decoding network according to embodiment 1 of the present invention;
FIG. 10 is a graph showing the comparison between the contour detection effects of the embodiment 1 of the present invention and that of the reference 1;
Detailed Description
The present invention will be described in detail with reference to the accompanying drawings and examples.
Example 1
The embodiment provides a contour detection method for learning a biological visual pathway, which comprises the following steps:
the method for detecting the contour of the biological visual pathway comprises the following steps:
A. a deep neural network structure is constructed, and is shown in figures 1-9, and the deep neural network structure is specifically as follows:
the system comprises an encoding network, a decoding network and a feedforward fusion module; the coding network is a network structure combining VGG16 and FENet; the decoding network is an RDNet network;
the VGG16 network takes the pooling layer as a boundary and is divided into stages S1, S2, S3, S4 and S5;
the FENet includes four sub-networks: a single antagonistic feature subnetwork, a dual antagonistic feature subnetwork, a V1 output sub-network and a V2 output sub-network;
B. inputting an original image into a VGG16 network, and performing convolution processing in S1, S2, S3, S4 and S5 stages in sequence to respectively obtain output results S1, S2, S3, S4 and S5, wherein the output result S1 is sent to a decoding network;
processing an original image by a formula 1 to obtain four inputs of R-G, G-R, B-Y and Y-B;
SO_i = C_m - ω·C_n (1)
wherein i ∈ {R-G, G-R, B-Y, Y-B}; m and n denote the corresponding components among R, G, B and Y; ω is a coefficient with value 0.7 (see the sketch following step B below);
inputting R-G, G-R, B-Y and Y-B into a single antagonistic characteristic subnetwork for processing to obtain an output result a, adding the output result a and the output result S2 for fusion to obtain a fusion result a, and inputting the fusion result a into a decoding network;
inputting R-G, G-R, B-Y and Y-B into a dual-antagonistic characteristic subnetwork for processing to obtain an output result B, adding the output result B and the output result S3 for fusion to obtain a fusion result B, and inputting the fusion result B into a decoding network;
the edge response of the V1 area is obtained from the original image by the SCO algorithm and is input into the V1 output sub-network for processing to obtain an output result c; the output result c and the output result S4 are added and fused to obtain a fusion result c, and the fusion result c is input into the decoding network;
the edge response of the V2 area is obtained from the original image by the SED algorithm and is input into the V2 output sub-network for processing to obtain an output result d; the output result d and the output result S5 are added and fused to obtain a fusion result d, and the fusion result d is input into the decoding network;
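As a concrete illustration of equation (1), the following Python sketch computes the four opponent inputs of step B. It assumes an RGB tensor scaled to [0, 1] and the common definition Y = (R + G)/2 for the yellow component, which the text does not fix.

```python
import torch

def opponent_inputs(img: torch.Tensor, omega: float = 0.7):
    """img: (N, 3, H, W) RGB tensor; returns the R-G, G-R, B-Y, Y-B maps of equation (1)."""
    r, g, b = img[:, 0:1], img[:, 1:2], img[:, 2:3]
    y = (r + g) / 2.0                # yellow component (assumed definition)
    rg = r - omega * g               # SO_{R-G} = C_R - omega * C_G
    gr = g - omega * r
    by = b - omega * y
    yb = y - omega * b
    return rg, gr, by, yb
```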
C. respectively inputting the output result a and the output result b into a feedforward fusion module;
the output result S1, the fusion result a, the fusion result b, the fusion result c and the fusion result d are processed by a decoding network to obtain a decoding output result, the decoding output result is input into a feedforward fusion module, and the loss of the decoding output result is calculated;
D. in the feed-forward fusion module, the output result a and the output result b each pass through a 1x1-1 convolution layer and are restored to the original resolution by upsampling, and a loss is calculated on each restored map; the restored maps are then multiplied by weights, and the result is added to and fused with the decoding output result to obtain the final output contour, on which a loss is also calculated.
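A minimal sketch of step D's feed-forward fusion is given below; the learnable scalar fusion weights and bilinear up-sampling are assumptions, since the text only states that the up-sampled side maps are multiplied by weights and added to the decoder output.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeedForwardFusion(nn.Module):
    """Sketch of the feed-forward fusion module (step D)."""
    def __init__(self, ch_a: int, ch_b: int):
        super().__init__()
        self.score_a = nn.Conv2d(ch_a, 1, kernel_size=1)   # 1x1-1 convolution on output result a
        self.score_b = nn.Conv2d(ch_b, 1, kernel_size=1)   # 1x1-1 convolution on output result b
        self.w_a = nn.Parameter(torch.ones(1))             # assumed learnable fusion weights
        self.w_b = nn.Parameter(torch.ones(1))

    def forward(self, out_a, out_b, decoded):
        size = decoded.shape[-2:]                          # original resolution from the decoder output
        side_a = F.interpolate(self.score_a(out_a), size=size, mode='bilinear', align_corners=False)
        side_b = F.interpolate(self.score_b(out_b), size=size, mode='bilinear', align_corners=False)
        fused = decoded + self.w_a * side_a + self.w_b * side_b
        # side_a, side_b and decoded are each supervised by their own loss; fused is the final contour
        return fused, side_a, side_b
```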
The SCO algorithm is described in the following document: K.-F. Yang, S.-B. Gao, C.-F. Guo, C.-Y. Li, Y.-J. Li, Boundary detection using double-opponency and spatial sparseness constraint, IEEE Transactions on Image Processing, 24 (2015) 2565-2578.
The SED algorithm is described in the following document: A. Akbarinia, C.A. Parraga, Feedback and surround modulated boundary detection, International Journal of Computer Vision, 126 (2018) 1367-1380.
The convolution notation used in each step is m x n-k conv + ReLU, wherein m x n denotes the convolution kernel size, k denotes the number of output channels, conv denotes the convolution operation, and ReLU denotes the activation function; m, n and k are preset values; the final fusion layer uses m x n-k conv without ReLU.
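Read in PyTorch terms, this notation could be interpreted as in the short sketch below (the input channel count and the "same" padding are assumptions):

```python
import torch.nn as nn

def conv_block(in_ch: int, k: int, m: int = 3, n: int = 3, relu: bool = True) -> nn.Sequential:
    """'m x n - k conv + ReLU': an m x n convolution with k output channels, then ReLU.
    The final fusion layer corresponds to relu=False."""
    layers = [nn.Conv2d(in_ch, k, kernel_size=(m, n), padding=(m // 2, n // 2))]
    if relu:
        layers.append(nn.ReLU(inplace=True))
    return nn.Sequential(*layers)
```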
The VGG16 network is obtained by the original VGG16 network through the following structural adjustment:
the pooling layer between S4 and S5 is removed, and the three convolution layers of S5 are changed to dilated (atrous) convolution layers with dilation rates of 2, 4 and 8 in sequence.
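A sketch of the adjusted S5 stage under these modifications; using padding equal to the dilation rate (which preserves the spatial size) is an assumption.

```python
import torch.nn as nn

def make_s5(in_ch: int = 512) -> nn.Sequential:
    """Three 3x3-512 convolutions with dilation rates 2, 4 and 8; no pooling before this stage."""
    layers, ch = [], in_ch
    for rate in (2, 4, 8):
        layers += [nn.Conv2d(ch, 512, kernel_size=3, padding=rate, dilation=rate),
                   nn.ReLU(inplace=True)]
        ch = 512
    return nn.Sequential(*layers)
```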
The sub-network of single antagonistic features comprises: R-G, G-R, B-Y, Y-B four groups of single antagonistic convolution treatment stages, SEM multiscale enhancement module, 3X 3-128 convolutional layer;
the R-G, G-R, B-Y and Y-B single antagonistic convolution treatment stages are the same and respectively pass through a 3x 3-3 convolution layer, a 3x 3-64 convolution layer, a maximum pooling layer and a 3x 3-128 convolution layer in sequence;
the sub-network processing procedure for the single antagonistic feature is as follows:
adding and fusing the features processed in the R-G and G-R single-antagonistic convolution processing stage, and processing by a multi-scale enhancement module to obtain a single-antagonistic enhancement result a; adding and fusing the characteristics processed in the B-Y, Y-B single antagonistic convolution processing stage, and processing by a multi-scale enhancement module to obtain a single antagonistic enhancement result B;
and splicing the single antagonism enhancement result a and the single antagonism enhancement result b, and then matching the number of channels through a 3x 3-128 convolution layer to obtain the output result a.
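The structure just described could be sketched as follows. The SEM module is kept as a placeholder because its internal layout is not detailed here, and single-channel opponent inputs with ReLU after every convolution are assumptions.

```python
import torch
import torch.nn as nn

class SEM(nn.Module):
    """Placeholder for the SEM multi-scale enhancement module (internals not specified here)."""
    def forward(self, x):
        return x

def single_opponent_stage() -> nn.Sequential:
    # 3x3-3 conv -> 3x3-64 conv -> max pooling -> 3x3-128 conv
    return nn.Sequential(
        nn.Conv2d(1, 3, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
        nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True),
    )

class SingleOpponentSubnet(nn.Module):
    def __init__(self):
        super().__init__()
        self.rg, self.gr, self.by, self.yb = (single_opponent_stage() for _ in range(4))
        self.sem_a, self.sem_b = SEM(), SEM()
        self.match = nn.Conv2d(256, 128, 3, padding=1)        # 3x3-128 channel matching

    def forward(self, rg, gr, by, yb):
        enh_a = self.sem_a(self.rg(rg) + self.gr(gr))         # single antagonism enhancement result a
        enh_b = self.sem_b(self.by(by) + self.yb(yb))         # single antagonism enhancement result b
        return self.match(torch.cat([enh_a, enh_b], dim=1))   # output result a (later added to S2)
```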
The dual antagonistic feature subnetwork comprises: R-G, G-R, B-Y, Y-B dual-antagonistic convolution processing stages, SEM multi-scale enhancement module, 1 × 1-256 convolution layer;
the R-G, G-R, B-Y, Y-B dual-antagonistic convolution processing stages are the same, the input of each stage is divided into two paths, the two paths in each stage respectively pass through a 9 x 9-3 convolution layer, a 9 x 9-64 convolution layer, a 2 x 2 maximum pooling layer, a 9 x 9-128 convolution layer, a 2 x 2 maximum pooling layer and a 9 x 9-256 convolution layer in sequence, and are subtracted after being multiplied by the trainable weight normalized by a sigmoid function to respectively obtain R-G, G-R and B-Y, Y-B dual-antagonistic convolution processing results;
adding and fusing R-G and G-R dual-antagonism convolution processing results, and processing the results through an SEM multi-scale enhancement module to obtain a dual-antagonism enhancement result a; adding and fusing the results of the B-Y, Y-B dual-antagonistic convolution processing, and processing by an SEM multi-scale enhancement module to obtain a dual-antagonistic enhancement result B;
and splicing the double-antagonism enhancement result a and the double-antagonism enhancement result b, and then matching the number of channels through a 1x 1-256 convolution layer to obtain the output result b.
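A hedged sketch of one dual antagonistic processing stage follows; padding values, ReLU placement and zero-initialized weights are assumptions. Four such stages are combined in the same way as in the single antagonistic subnetwork, except that the channel-matching layer is a 1x1-256 convolution.

```python
import torch
import torch.nn as nn

class DoubleOpponentStage(nn.Module):
    """One dual antagonistic stage: two identical paths whose sigmoid-normalized,
    trainably weighted outputs are subtracted."""
    def __init__(self):
        super().__init__()
        def path() -> nn.Sequential:
            return nn.Sequential(
                nn.Conv2d(1, 3, 9, padding=4), nn.ReLU(inplace=True),
                nn.Conv2d(3, 64, 9, padding=4), nn.ReLU(inplace=True),
                nn.MaxPool2d(2),
                nn.Conv2d(64, 128, 9, padding=4), nn.ReLU(inplace=True),
                nn.MaxPool2d(2),
                nn.Conv2d(128, 256, 9, padding=4), nn.ReLU(inplace=True),
            )
        self.path1, self.path2 = path(), path()
        self.w1 = nn.Parameter(torch.zeros(1))
        self.w2 = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        a = torch.sigmoid(self.w1) * self.path1(x)
        b = torch.sigmoid(self.w2) * self.path2(x)
        return a - b
```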
The V1 output sub-network comprises three groups of 2X 2 maximum pool layers, an SEM multi-scale enhancement module and a 3X 3-512 convolution layer which are connected in sequence;
in the V1 output sub-network, the edge response of the V1 area is subjected to 2 x 2 maximum pooling three times, multi-scale features are then extracted by the SEM multi-scale enhancement module, and finally the number of channels is matched by the 3 x 3-512 convolution layer to obtain the output result c.
The V2 output sub-network comprises three groups of 2X 2 maximum pool layers, an SEM multi-scale enhancement module and a 3X 3-512 convolution layer which are connected in sequence;
in the V2 output sub-network, the edge response of the V2 area is subjected to 2 x 2 maximum pooling three times, multi-scale features are then extracted by the SEM multi-scale enhancement module, and finally the number of channels is matched by the 3 x 3-512 convolution layer to obtain the output result d.
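Both output sub-networks share the same layout, which could be sketched as below; the SCO/SED edge response is assumed to be a single-channel map, and nn.Identity stands in for the SEM module.

```python
import torch.nn as nn

def v_area_head(in_ch: int = 1, out_ch: int = 512) -> nn.Sequential:
    """V1/V2 output sub-network: three 2x2 max-poolings, the SEM module,
    then a 3x3-512 channel-matching convolution."""
    return nn.Sequential(
        nn.MaxPool2d(2), nn.MaxPool2d(2), nn.MaxPool2d(2),
        nn.Identity(),                  # stand-in for the SEM multi-scale enhancement module
        nn.Conv2d(in_ch, out_ch, 3, padding=1),
    )
```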
The decoding network is of a 4-layer structure consisting of a plurality of unit modules R: the first layer comprises 4 unit modules R, the second layer comprises 3 unit modules R, the third layer comprises 2 unit modules R, and the fourth layer comprises 1 unit module R;
respectively inputting the fusion result d and the fusion result c into a first unit module R of the first layer, and processing by the unit module R to obtain a processing result R1;
the processing result R1 and the fusion result b are respectively input into a second unit module R of the first layer, and are processed by the unit module R to obtain a processing result R2;
inputting the processing result R2 and the fusion result a into the third unit module R of the first layer, and processing by the unit module R to obtain a processing result R3;
the processing result R3 and the output result S1 are respectively inputted into the fourth unit module R of the first layer, and are processed by the unit module R to obtain a processing result R4;
the processing result R1 and the processing result R2 are respectively input into the first unit module R of the second layer, and the processing result R5 is obtained after the processing of the unit module R;
the processing result R5 and the processing result R3 are respectively inputted into the second unit module R of the second layer, and are processed by the unit module R to obtain a processing result R6;
the processing result R6 and the processing result R4 are respectively inputted into the third unit module R of the second layer, and are processed by the unit module R to obtain a processing result R7;
the processing result R5 and the processing result R6 are respectively input into the first unit module R of the third layer, and are processed by the unit module R to obtain a processing result R8;
the processing result R8 and the processing result R7 are respectively inputted into the second unit module R of the third layer, and the processing result R9 is obtained after the processing of the unit module R;
the processing result R8 and the processing result R9 are respectively input into the unit module R of the fourth layer and processed by the unit module R to obtain a processing result R10, and the processing result R10 is passed through a 1 x 1-1 convolution to obtain the decoding output result.
The unit module R comprises two input channels, wherein a channel 1 inputs an image with a larger size, and a channel 2 inputs an image with a smaller size;
sequentially carrying out 3 multiplied by 3 convolution, ReLU function activation, batch normalization layer processing and multiplication by a trainable weight normalized by a sigmoid function in the channel 1 on the image to obtain an output result of the channel 1;
the image in the channel 2 is sequentially subjected to 3 x 3 convolution, ReLU function activation, batch normalization layer processing and multiplication by a trainable weight normalized by a sigmoid function, and the result is up-sampled so that its size is consistent with that of the channel 1 output result, so as to obtain the channel 2 output result;
the number of output channels of the 3x3 convolution layers in the channel 1 and the channel 2 is equal to the smaller of the channel counts of the two inputs;
and adding and fusing the output result of the channel 1 and the output result of the channel 2 to obtain the output result of the current unit module R.
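A sketch of unit module R under the above description; bilinear up-sampling and zero-initialized weights are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UnitR(nn.Module):
    """Decoder unit R: channel 1 takes the larger input, channel 2 the smaller one;
    each path is 3x3 conv -> ReLU -> batch norm -> scaling by a sigmoid-normalized
    trainable weight, channel 2 is up-sampled to channel 1's size, and the results are added."""
    def __init__(self, ch1_in: int, ch2_in: int):
        super().__init__()
        out_ch = min(ch1_in, ch2_in)                # output channels follow the smaller of the two inputs
        self.conv1 = nn.Conv2d(ch1_in, out_ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch2_in, out_ch, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.w1 = nn.Parameter(torch.zeros(1))
        self.w2 = nn.Parameter(torch.zeros(1))

    def forward(self, x_large, x_small):
        a = torch.sigmoid(self.w1) * self.bn1(F.relu(self.conv1(x_large)))
        b = torch.sigmoid(self.w2) * self.bn2(F.relu(self.conv2(x_small)))
        b = F.interpolate(b, size=a.shape[-2:], mode='bilinear', align_corners=False)
        return a + b
```

The ten modules of RDNet are then chained exactly as listed above (R1 from fusion results d and c, R2 from R1 and fusion result b, and so on), and R10 is reduced to a single channel by a 1x1-1 convolution to give the decoding output result.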
In the steps C and D, the formula for calculating the loss is as follows:
the total loss is as follows:
L = Σ_{i=1}^{3} θ_i · l(P_i, Y) + θ_fuse · l(P_fuse, Y) (2)
in the above formula, θ_i and θ_fuse respectively denote the weights of the losses of the three sub-network outputs and of the final predicted loss, P_i denotes the three different outputs, P_fuse denotes the final edge prediction, and Y denotes the true edge map;
l(P_fuse, Y) is defined as follows:
for a real edge map Y = (y_j, j = 1, ..., |Y|) with y_j ∈ {0, 1}, define Y+ = {y_j | y_j = 1} and Y- = {y_j | y_j = 0}; Y+ and Y- respectively denote the positive and negative sample sets, and all other pixels are ignored entirely.
Thus l(P_fuse, Y) is calculated as follows:
l(P, Y) = -α · Σ_{j∈Y+} log P_j - β · Σ_{j∈Y-} log(1 - P_j) (3)
α = λ · |Y-| / (|Y+| + |Y-|), β = |Y+| / (|Y+| + |Y-|) (4)
in formulas (3) and (4), P denotes the prediction and P_j denotes the value at pixel j after processing by a sigmoid function; α and β are used to balance the positive and negative samples, and λ is a weight that controls the magnitude of the coefficient.
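A sketch of the loss of equations (2)-(4), with the class-balancing coefficients computed per image; the default value of λ and the use of logits as inputs are assumptions.

```python
import torch

def balanced_bce(pred_logits: torch.Tensor, target: torch.Tensor, lam: float = 1.1) -> torch.Tensor:
    """l(P, Y) of equations (3)-(4): class-balanced cross-entropy over positive/negative pixels."""
    pos = (target == 1).float()
    neg = (target == 0).float()                  # pixels outside both sets are ignored
    n_pos, n_neg = pos.sum(), neg.sum()
    alpha = lam * n_neg / (n_pos + n_neg)        # coefficient on positive samples
    beta = n_pos / (n_pos + n_neg)               # coefficient on negative samples
    p = torch.sigmoid(pred_logits)
    eps = 1e-6
    loss = -(alpha * pos * torch.log(p + eps) + beta * neg * torch.log(1.0 - p + eps))
    return loss.sum()

def total_loss(side_preds, fused_pred, target, thetas, theta_fuse):
    """L = sum_i theta_i * l(P_i, Y) + theta_fuse * l(P_fuse, Y), as in equation (2)."""
    loss = theta_fuse * balanced_bce(fused_pred, target)
    for theta_i, p_i in zip(thetas, side_preds):
        loss = loss + theta_i * balanced_bce(p_i, target)
    return loss
```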
Example 2
Comparing the edge detection results of the method of this embodiment with the method of the following document 1;
document 1: S.Xie and Z.Tu, "Hollistically-connected edge detection," in International conference on Computer Vision,2015, pp.1395-1403.
The parameters used for document 1 are those given in the original paper, which are guaranteed to be optimal for that model.
For quantitative performance evaluation of the final contours, we used the same performance measurement criterion as in document 1, namely the F-measure given below.
F = 2PR / (P + R)
wherein P denotes precision and R denotes recall; the larger the value of F, the better the performance.
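For reference, the F-measure computation is simply:

```python
def f_measure(precision: float, recall: float) -> float:
    """F = 2PR / (P + R): harmonic mean of precision and recall; larger is better."""
    return 2.0 * precision * recall / (precision + recall)
```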
Fig. 10 shows two natural images selected from the berkeley segmented data set (BSDS500), corresponding real contours, contours detected by the method of document 1, and contours detected by the method of embodiment 1.
From the experimental effect, the detection method of example 1 is superior to the detection method of document 1.

Claims (7)

1. A contour detection method for learning biological visual pathways is characterized by comprising the following steps:
A. constructing a deep neural network structure, wherein the deep neural network structure is as follows:
the system comprises an encoding network, a decoding network and a feedforward fusion module; the coding network is a network structure combining VGG16 and FENet;
the VGG16 network takes the pooling layer as a boundary and is divided into stages S1, S2, S3, S4 and S5;
the FENet includes four sub-networks: a single antagonistic feature subnetwork, a dual antagonistic feature subnetwork, a V1 output sub-network and a V2 output sub-network;
B. inputting an original image into a VGG16 network, and performing convolution processing in S1, S2, S3, S4 and S5 stages in sequence to respectively obtain output results S1, S2, S3, S4 and S5, wherein the output result S1 is sent to a decoding network;
processing an original image by a formula 1 to obtain four inputs of R-G, G-R, B-Y and Y-B;
SO_i = C_m - ω·C_n (1)
wherein i ∈ {R-G, G-R, B-Y, Y-B}; m and n denote the corresponding components among R, G, B and Y; ω is a coefficient with value 0.7;
inputting R-G, G-R, B-Y and Y-B into a single antagonistic characteristic subnetwork for processing to obtain an output result a, adding the output result a and the output result S2 for fusion to obtain a fusion result a, and inputting the fusion result a into a decoding network;
inputting R-G, G-R, B-Y and Y-B into a dual-antagonistic characteristic subnetwork for processing to obtain an output result B, adding the output result B and the output result S3 for fusion to obtain a fusion result B, and inputting the fusion result B into a decoding network;
the edge response of a V1 area is obtained by an original image through an SCO algorithm, the edge response is input into a V1 output sub-network for processing, an output result c is obtained, the output result c and an output result S4 are added and fused, a fusion result c is obtained, and the fusion result c is input into a decoding network;
the edge response of a V2 area is obtained by an original image through an SED algorithm, the edge response is input into a V2 output sub-network for processing, an output result d is obtained, the output result d and an output result S5 are added and fused, a fusion result d is obtained, and the fusion result d is input into a decoding network;
C. respectively inputting the output result a and the output result b into a feedforward fusion module;
the output result S1, the fusion result a, the fusion result b, the fusion result c and the fusion result d are processed by a decoding network to obtain a decoding output result, the decoding output result is input into a feedforward fusion module, and the loss of the decoding output result is calculated;
D. in the feed-forward fusion module, the output result a and the output result b each pass through a 1x1-1 convolution layer and are restored to the original resolution by upsampling, and a loss is calculated on each restored map; the restored maps are then multiplied by weights, and the result is added to and fused with the decoding output result to obtain the final output contour, on which a loss is also calculated;
the sub-network of single antagonistic features comprises: R-G, G-R, B-Y, Y-B four groups of single antagonistic convolution treatment stages, SEM multiscale enhancement module, 3X 3-128 convolutional layer;
the R-G, G-R, B-Y and Y-B single antagonistic convolution processing stages are the same and respectively pass through a 3x 3-3 convolution layer, a 3x 3-64 convolution layer, a maximum pooling layer and a 3x 3-128 convolution layer in sequence;
the sub-network processing procedure for the single antagonistic feature is as follows:
adding and fusing the features processed in the R-G and G-R single-antagonistic convolution processing stage, and processing by a multi-scale enhancement module to obtain a single-antagonistic enhancement result a; adding and fusing the characteristics processed in the B-Y, Y-B single antagonistic convolution processing stage, and processing by a multi-scale enhancement module to obtain a single antagonistic enhancement result B;
splicing the single antagonism enhancement result a and the single antagonism enhancement result b, and then matching the number of channels through a 3x 3-128 convolution layer to obtain the output result a;
the dual antagonistic feature subnetwork comprises: R-G, G-R, B-Y, Y-B dual-antagonistic convolution processing stages, SEM multi-scale enhancement module, 1 × 1-256 convolution layer;
the R-G, G-R, B-Y, Y-B dual-antagonistic convolution processing stages are the same, the input of each stage is divided into two paths, the two paths in each stage respectively pass through a 9 x 9-3 convolution layer, a 9 x 9-64 convolution layer, a 2 x 2 maximum pooling layer, a 9 x 9-128 convolution layer, a 2 x 2 maximum pooling layer and a 9 x 9-256 convolution layer in sequence, and are subtracted after being multiplied by the trainable weight normalized by a sigmoid function to respectively obtain R-G, G-R and B-Y, Y-B dual-antagonistic convolution processing results;
adding and fusing R-G and G-R dual-antagonism convolution processing results, and processing the results through an SEM multi-scale enhancement module to obtain a dual-antagonism enhancement result a; adding and fusing the results of the B-Y, Y-B dual-antagonistic convolution processing, and processing by an SEM multi-scale enhancement module to obtain a dual-antagonistic enhancement result B;
and splicing the double-antagonism enhancement result a and the double-antagonism enhancement result b, and then matching the number of channels through a 1x 1-256 convolution layer to obtain the output result b.
2. The method for detecting the contour of a learned biological visual pathway as set forth in claim 1, wherein:
the VGG16 network is obtained by the original VGG16 network through the following structural adjustment:
the pooling layer between S4 and S5 is removed, and the three convolution layers of S5 are changed to dilated (atrous) convolution layers with dilation rates of 2, 4 and 8 in sequence.
3. The method for detecting the contour of a learned biological visual pathway as set forth in claim 1, wherein:
the V1 output sub-network comprises three groups of 2X 2 maximum pool layers, an SEM multi-scale enhancement module and a 3X 3-512 convolution layer which are connected in sequence;
in the V1 output sub-network, the edge response of the V1 area is subjected to 2 x 2 maximum pooling three times, multi-scale features are then extracted by the SEM multi-scale enhancement module, and finally the number of channels is matched by the 3 x 3-512 convolution layer to obtain the output result c.
4. The method for detecting the contour of a learned biological visual pathway as set forth in claim 1, wherein:
the V2 output sub-network comprises three groups of 2X 2 maximum pool layers, an SEM multi-scale enhancement module and a 3X 3-512 convolution layer which are connected in sequence;
in the V2 output sub-network, the edge response of the V2 area is subjected to 2 x 2 maximum pooling three times, multi-scale features are then extracted by the SEM multi-scale enhancement module, and finally the number of channels is matched by the 3 x 3-512 convolution layer to obtain the output result d.
5. The method for detecting the contour of a learned biological visual pathway as set forth in claim 1, wherein:
the decoding network is of a 4-layer structure consisting of a plurality of unit modules R: the first layer comprises 4 unit modules R, the second layer comprises 3 unit modules R, the third layer comprises 2 unit modules R, and the fourth layer comprises 1 unit module R;
respectively inputting the fusion result d and the fusion result c into a first unit module R of the first layer, and processing by the unit module R to obtain a processing result R1;
inputting the processing result R1 and the fusion result b into the second unit module R of the first layer, and processing by the unit module R to obtain a processing result R2;
inputting the processing result R2 and the fusion result a into the third unit module R of the first layer, and processing by the unit module R to obtain a processing result R3;
the processing result R3 and the output result S1 are respectively inputted into the fourth unit module R of the first layer, and are processed by the unit module R to obtain a processing result R4;
the processing result R1 and the processing result R2 are respectively input into the first unit module R of the second layer, and the processing result R5 is obtained after the processing of the unit module R;
the processing result R5 and the processing result R3 are respectively inputted into the second unit module R of the second layer, and are processed by the unit module R to obtain a processing result R6;
the processing result R6 and the processing result R4 are respectively inputted into the third unit module R of the second layer, and are processed by the unit module R to obtain a processing result R7;
the processing result R5 and the processing result R6 are respectively input into the first unit module R of the third layer, and are processed by the unit module R to obtain a processing result R8;
the processing result R8 and the processing result R7 are respectively inputted into the second unit module R of the third layer, and the processing result R9 is obtained after the processing of the unit module R;
the processing result R8 and the processing result R9 are respectively input into the unit module R of the fourth layer and processed by the unit module R to obtain a processing result R10, and the processing result R10 is passed through a 1 x 1-1 convolution to obtain the decoding output result.
6. The method for detecting the profile of a learned biological visual pathway as set forth in claim 5, wherein:
the unit module R comprises two input channels;
sequentially carrying out 3 multiplied by 3 convolution, ReLU function activation, batch normalization layer processing and multiplication by a trainable weight normalized by a sigmoid function in the channel 1 on the image to obtain an output result of the channel 1;
the image in the channel 2 is sequentially subjected to 3 x 3 convolution, ReLU function activation, batch normalization layer processing and multiplication by a trainable weight normalized by a sigmoid function, and the result is up-sampled so that its size is consistent with that of the channel 1 output result, so as to obtain the channel 2 output result;
the number of output channels of the 3x3 convolution layers in the channel 1 and the channel 2 is equal to the smaller of the channel counts of the two inputs;
and adding and fusing the output result of the channel 1 and the output result of the channel 2 to obtain the output result of the current unit module R.
7. The method for detecting the profile of a learned biological visual pathway as set forth in claim 5, wherein:
in the steps C and D, the formula for calculating the loss is as follows:
the total loss is as follows:
L = Σ_{i=1}^{3} θ_i · l(P_i, Y) + θ_fuse · l(P_fuse, Y) (2)
in the above formula, θ_i and θ_fuse respectively denote the weights of the losses of the three sub-network outputs and of the final predicted loss, P_i denotes the three different outputs, P_fuse denotes the final edge prediction, and Y denotes the true edge map;
l(P_fuse, Y) is defined as follows:
for a real edge map Y = (y_j, j = 1, ..., |Y|) with y_j ∈ {0, 1}, define Y+ = {y_j | y_j = 1} and Y- = {y_j | y_j = 0}; Y+ and Y- respectively denote the positive and negative sample sets, and all other pixels are ignored entirely;
thus l(P_fuse, Y) is calculated as follows:
l(P, Y) = -α · Σ_{j∈Y+} log P_j - β · Σ_{j∈Y-} log(1 - P_j) (3)
α = λ · |Y-| / (|Y+| + |Y-|), β = |Y+| / (|Y+| + |Y-|) (4)
in the formulae (3) and (4), P denotes the prediction and P_j denotes the value at pixel j after processing by a sigmoid function; α and β are used to balance the positive and negative samples, and λ is the weight that controls the magnitude of the coefficient.
CN202110784619.6A 2021-08-25 2021-08-25 Contour detection method for learning biological visual pathway Active CN113538485B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110784619.6A CN113538485B (en) 2021-08-25 2021-08-25 Contour detection method for learning biological visual pathway

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110784619.6A CN113538485B (en) 2021-08-25 2021-08-25 Contour detection method for learning biological visual pathway

Publications (2)

Publication Number Publication Date
CN113538485A CN113538485A (en) 2021-10-22
CN113538485B true CN113538485B (en) 2022-04-22

Family

ID=78098542

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110784619.6A Active CN113538485B (en) 2021-08-25 2021-08-25 Contour detection method for learning biological visual pathway

Country Status (1)

Country Link
CN (1) CN113538485B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114463360B (en) * 2021-10-27 2024-03-15 广西科技大学 Contour detection method based on bionic characteristic enhancement network

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2776988A1 (en) * 2003-02-06 2004-08-26 Dolby Laboratories Licensing Corporation Conversion of synthesized spectral components for encoding and low-complexity transcoding
CN109509192A (en) * 2018-10-18 2019-03-22 天津大学 Merge the semantic segmentation network in Analysis On Multi-scale Features space and semantic space
CN109949334A (en) * 2019-01-25 2019-06-28 广西科技大学 Profile testing method based on the connection of deeply network residual error
CN110222628A (en) * 2019-06-03 2019-09-10 电子科技大学 A kind of face restorative procedure based on production confrontation network
CN111325762A (en) * 2020-01-21 2020-06-23 广西科技大学 Contour detection method based on dense connection decoding network
CN113222328A (en) * 2021-03-25 2021-08-06 中国科学技术大学先进技术研究院 Air quality monitoring equipment point arrangement and site selection method based on road section pollution similarity

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Design and Implementation of Viterbi Encoding and Decoding Algorithm on FPGA; M. Irfan; 2005 International Conference on Microelectronics; 2006-02-13; 234-239 *
Fingerprint image binarization method based on multi-channel Gabor filtering; Lin Chuan; Science Technology and Engineering; 2013-08-08 (No. 22); 6487-6491 *
Research on high-precision segmentation of typical targets in high-resolution remote sensing images; Wang Yu; China Excellent Doctoral and Master's Dissertations Full-text Database (Doctoral), Engineering Science and Technology II; 2020-07-15; C028-9 *

Also Published As

Publication number Publication date
CN113538485A (en) 2021-10-22

Similar Documents

Publication Publication Date Title
CN109299274B (en) Natural scene text detection method based on full convolution neural network
CN107169421B (en) Automobile driving scene target detection method based on deep convolutional neural network
CN108960261B (en) Salient object detection method based on attention mechanism
CN111340814B (en) RGB-D image semantic segmentation method based on multi-mode self-adaptive convolution
CN110363290B (en) Image recognition method, device and equipment based on hybrid neural network model
CN107180430A (en) A kind of deep learning network establishing method and system suitable for semantic segmentation
CN112070044B (en) Video object classification method and device
CN114937204B (en) Neural network remote sensing change detection method for lightweight multi-feature aggregation
CN111161244B (en) Industrial product surface defect detection method based on FCN + FC-WXGboost
CN113570508A (en) Image restoration method and device, storage medium and terminal
CN110119805B (en) Convolutional neural network algorithm based on echo state network classification
CN112101262B (en) Multi-feature fusion sign language recognition method and network model
Hou et al. Handwritten digit recognition based on depth neural network
CN112036260A (en) Expression recognition method and system for multi-scale sub-block aggregation in natural environment
CN114048822A (en) Attention mechanism feature fusion segmentation method for image
CN113538485B (en) Contour detection method for learning biological visual pathway
CN116052218B (en) Pedestrian re-identification method
US20230316699A1 (en) Image semantic segmentation algorithm and system based on multi-channel deep weighted aggregation
CN109766918A (en) Conspicuousness object detecting method based on the fusion of multi-level contextual information
Xue et al. Research on edge detection operator of a convolutional neural network
CN112927236B (en) Clothing analysis method and system based on channel attention and self-supervision constraint
CN112418070B (en) Attitude estimation method based on decoupling ladder network
CN113807356A (en) End-to-end low visibility image semantic segmentation method
CN107729885A (en) A kind of face Enhancement Method based on the study of multiple residual error
CN113538484B (en) Deep-refinement multiple-information nested edge detection method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20211022

Assignee: GUANGXI YINGTENG EDUCATION TECHNOLOGY Co.,Ltd.

Assignor: GUANGXI University OF SCIENCE AND TECHNOLOGY

Contract record no.: X2023980053979

Denomination of invention: Contour detection methods for learning biological visual pathways

Granted publication date: 20220422

License type: Common License

Record date: 20231226

EE01 Entry into force of recordation of patent licensing contract