CN113421269A - Real-time semantic segmentation method based on double-branch deep convolutional neural network - Google Patents
- Publication number: CN113421269A (application CN202110640607.6A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T 7/10 — Image analysis: Segmentation; Edge detection
- G06N 3/045 — Neural networks: Combinations of networks
- G06N 3/08 — Neural networks: Learning methods
- G06T 2207/10016 — Image acquisition modality: Video; Image sequence
- G06T 2207/20081 — Special algorithmic details: Training; Learning
- G06T 2207/20084 — Special algorithmic details: Artificial neural networks [ANN]
- G06T 2207/30252 — Subject of image: Vehicle exterior; Vicinity of vehicle
Abstract
The invention discloses a real-time semantic segmentation method based on a double-branch deep convolutional neural network. The method comprises the following steps: preprocessing an urban-scene semantic segmentation data set; retraining a deep convolutional neural network ResNet on the data set and extracting deep semantic features; designing a global branch formed by normalized convolutional layers, which performs normalized convolution on the feature maps of the different ResNet stages to obtain feature maps of the same dimension and combines them along the channel dimension; using a shared feature layer and pooling layers to share feature information of the different stages of the ResNet residual network and construct a local branch rich in detail information; designing a feature merging module that fuses the feature maps of the global and local branches, integrating feature information of different scales to obtain the final prediction map; using an up-sampling operation to realize the mapping transformation from the prediction map to the resolution of the original image; and using a Softmax classification layer to carry out classification prediction on each pixel of the One-Hot coded prediction map, finally obtaining the image segmentation result. The invention improves the speed of segmentation prediction of deep convolutional networks on high-resolution images and upgrades both semantic segmentation accuracy and segmentation speed.
Description
Technical Field
The invention relates to the field of deep learning of computer vision, in particular to a real-time semantic segmentation method based on a double-branch deep convolutional neural network.
Background
Computer-based processing and analysis of images is a major goal of machine vision and a very challenging task. The human visual system can rapidly analyze the image information captured by the eyes and, with many layers of neurons in the brain, understand an entire scene. By combining computers with the Deep Convolutional Neural Networks (DCNN) that have developed rapidly in recent years, the extraction and analysis of image semantic information can likewise be realized: a pixel-by-pixel mapping from feature maps back to the original image yields boundary segmentation of the different region blocks in the image, and ultimately an analysis of the whole scene. This has great research significance in fields such as medical image analysis, geographical remote-sensing image analysis, and automatic driving.
From a methodological point of view, semantic segmentation falls into two categories: the first is classical image segmentation algorithms based on traditional image processing; the second is deep learning algorithms based on convolutional neural networks. In the 1960s and 1970s, image segmentation remained at the traditional stage, relying on simple image features. Prewitt et al. computed one or more gray thresholds from the gray-level features of an image, compared the gray value of each pixel against the thresholds, and assigned each pixel to a category based on the comparison. Boykov and Rother et al. proposed GraphCut and GrabCut respectively, graph-theory-based image segmentation methods that relate the segmentation problem to the min-cut problem of a graph. The essence of graph-theory-based segmentation is to remove specific edges and divide a graph into several subgraphs, thereby realizing segmentation. These manually designed operators usually extract only a single feature and therefore cannot fully represent the features possessed by an object.
Deep learning has earned its place in machine vision through its excellent feature characterization performance: by constructing convolutional neural networks with deep structures, it obtains deep semantic features and stronger generalization ability. In 2015, Badrinarayanan et al. proposed the real-time semantic segmentation network SegNet, a typical Encoder-Decoder architecture. SegNet builds its encoder from the VGG-16 convolutional network with the fully connected layers removed, generating low-resolution image representations (feature maps), which a decoder with a symmetric structure then maps to pixel-level predictions. The decoder consists of a series of up-sampling and convolutional layers and is finally connected to a Softmax classifier that predicts labels at the pixel level as output, at the same resolution as the input image. SegNet has a clearly symmetric structure: each encoder layer has a corresponding decoder layer. Furthermore, unlike the pooling of a fully convolutional network, the pooling in the encoder layers records locations. Since the values discarded by max-pooling are irretrievable afterwards, SegNet records the positions of the maximum values when the encoder layer performs max-pooling, and the decoder layer uses the corresponding max-pooling indices to realize non-linear up-sampling. This avoids the heavy computation of deconvolution-based up-sampling in fully convolutional networks and removes the need to store whole feature maps during encoding, improving computational efficiency. Although the model has the advantage of real-time inference, there is still room for improvement in semantic segmentation accuracy.
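The pooling-index mechanism described above can be sketched in a few lines. This is an illustrative PyTorch example (the framework choice is an assumption; the patent specifies none), not SegNet itself: the encoder's max-pooling records argmax positions, and the decoder uses them for non-linear up-sampling.

```python
# Illustrative sketch of SegNet-style pooling indices (PyTorch assumed;
# this is not the patent's network, just the mechanism described above).
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

x = torch.randn(1, 64, 8, 8)        # a feature map from an encoder layer
y, idx = pool(x)                    # encoder: pool and record argmax positions
up = unpool(y, idx)                 # decoder: non-linear up-sampling via indices

# Only the recorded maxima are restored; all other positions become zero.
```

Because only indices are stored rather than whole feature maps, the decoder needs no deconvolution, which is the efficiency argument made above.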
Disclosure of Invention
The invention aims to provide a real-time semantic segmentation method based on a double-branch deep convolutional neural network that is both fast and accurate.
The technical solution for realizing the purpose of the invention is as follows: a real-time semantic segmentation method based on a double-branch deep convolutional neural network comprises the following steps:
step 1, preprocessing a city landscape semantic segmentation data set to obtain an original image in the data set;
step 2, retraining a deep convolutional neural network ResNet on the data set, and extracting deep semantic features;
step 3, designing a global branch formed by the normalized convolutional layer, and respectively carrying out normalized convolution operation on the feature maps of different stages of ResNet to obtain feature maps with the same dimensionality for channel dimensionality combination;
step 4, sharing feature information of different stages in the ResNet residual error network by utilizing the shared feature layer and the pooling layer, and constructing a local branch with rich detail information;
step 5, designing a feature merging module, fusing feature mapping graphs of the global branch and the local branch, integrating feature information of different scales, and obtaining a final prediction graph;
step 6, utilizing an up-sampling operation to realize the mapping transformation from the prediction image to the resolution of the original image;
step 7, utilizing a Softmax classification layer to carry out classification prediction on each pixel of the One-Hot coded prediction image, finally obtaining the image segmentation result.
further, step 2 retrains the deep convolutional neural network ResNet on the data set, and extracts deep semantic features, which is specifically as follows:
training a ResNet-18 residual neural network model on the preprocessed large-scale, high-resolution Cityscapes urban-scene semantic segmentation data set, using the model as an extractor of deep semantic features, performing class prediction on each pixel, calculating the cross-entropy loss, and training with the back-propagation algorithm, wherein the loss function for each pixel is:

$$pixel\_loss = -\sum_{c \in classes} y_{true,c} \log(y_{pred,c})$$

wherein pixel_loss represents the loss of each pixel after calculation by the convolutional neural network, classes represents all prediction categories of the semantic segmentation model, y_true represents a One-Hot matrix in which each element belongs to a One-Hot vector; the elements take only the values 0 and 1 (1 if the category is the same as the sample category, 0 otherwise), and y_pred represents the predicted probability that the sample belongs to the current class;

$$bp\_loss = \sum_{i=1}^{h} \sum_{j=1}^{w} pixel\_loss_{ij}$$

wherein bp_loss represents the total back-propagated loss of the whole image, w and h represent the width and height of the image respectively, and pixel_loss_ij indicates the loss of the pixel in the i-th row and j-th column;
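The two loss terms above map directly onto a framework's cross-entropy routine; a minimal PyTorch sketch (the framework and the 19-class Cityscapes label set are assumptions):

```python
import torch
import torch.nn.functional as F

# logits: (N, classes, H, W) network output; target: (N, H, W) class indices
logits = torch.randn(1, 19, 4, 4)            # 19 classes, as in Cityscapes
target = torch.randint(0, 19, (1, 4, 4))

# pixel_loss: per-pixel cross-entropy (reduction='none' keeps the H x W map)
pixel_loss = F.cross_entropy(logits, target, reduction='none')
# bp_loss: total back-propagated loss, summed over the whole image
bp_loss = pixel_loss.sum()
```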
further, step 3 is to design a global branch formed by the normalized convolutional layers, and perform normalized convolution operation on the feature maps of different stages of the ResNet respectively to obtain feature maps of the same dimension for channel dimension combination, specifically as follows:
the deep convolutional neural network is realized by utilizing the residual block in the residual network ResNet, and meanwhile, the overfitting phenomenon caused by deepening of a network layer can be avoided, wherein the characteristic mapping realized by the residual block is as follows:
wherein x is the input feature map of the residual block, F (x) represents the feature mapping function implemented by the residual block,representing the output signature after passing through a residual block, which allows the network to converge more quickly.
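A ResNet-18 basic block realizing y = F(x) + x might look as follows; this is a hedged sketch of the standard block, not code from the patent:

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """Sketch of a ResNet-18 basic block: output = ReLU(F(x) + x)."""
    def __init__(self, channels: int):
        super().__init__()
        self.f = nn.Sequential(                       # the residual mapping F(x)
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.f(x) + x)               # identity shortcut

block = BasicBlock(64)
x = torch.randn(1, 64, 16, 16)
y = block(x)
```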
Normalized convolutional layers are attached at different stages of the ResNet residual network; convolution operations normalize feature maps with different channel dimensions and different spatial dimensions to feature maps of exactly the same size, realizing feature fusion of high-dimensional and low-dimensional feature maps, wherein the normalized convolution operation is defined as:

$$y_{c,i,j} = \sum_{k_c} \sum_{k_i} \sum_{k_j} w(k_c, k_i, k_j) \, x(k_c, i + k_i, j + k_j) + b_c$$

wherein c, i, j index the c-th channel, i-th row and j-th column of the feature map at the k-th layer, y_{c,i,j} is the feature value of the pixel at the corresponding position of the output feature map, w(k_c, k_i, k_j) represents the weight parameters of the convolution kernel, x(k_c, i+k_i, j+k_j) represents the feature values of the input feature map covered by the convolution kernel, and b_c indicates the bias parameter of the c-th channel at the k-th network layer.
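One plausible reading of this normalization step is a 1×1 convolution plus batch normalization that unifies the channel dimension, followed by interpolation to a common spatial size so the stage outputs can be concatenated along the channel axis; the channel counts and sizes below are illustrative assumptions, not values from the patent:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def to_common(x, conv, size=(64, 64)):
    """Map a stage's feature map to a common channel count and spatial size."""
    x = conv(x)                                   # 1x1 conv + BN: unify channels
    return F.interpolate(x, size=size, mode='bilinear', align_corners=False)

# Feature maps from three hypothetical ResNet stages (shapes are assumptions).
stages = [torch.randn(1, c, s, s) for c, s in [(64, 64), (128, 32), (256, 16)]]
convs = [nn.Sequential(nn.Conv2d(c, 128, 1, bias=False), nn.BatchNorm2d(128))
         for c in (64, 128, 256)]

normalized = [to_common(x, conv) for x, conv in zip(stages, convs)]
merged = torch.cat(normalized, dim=1)             # channel-dimension combination
```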
Further, in step 4, the shared feature layer and pooling layers are used to share feature information of the different stages of the ResNet residual network and construct a local branch rich in detail information, specifically as follows:
Feature maps are extracted at different stages of the residual network and learned with network layers such as pooling layers and up-sampling layers, extracting rich image detail information:

$$f_{i,j}(s)_{max} = \max_{0 \le m,n < k} s_{i+m,\, j+n}, \qquad f_{i,j}(s)_{avg} = \frac{1}{k^2} \sum_{0 \le m,n < k} s_{i+m,\, j+n}$$

wherein f_{i,j}(s)_max represents the max-pooling output value; f_{i,j}(s)_avg represents the average-pooling output value; k represents the size of the pooling kernel; i, j index the row and column at which the kernel computes its value; max and average denote the maximum and averaging operations respectively.
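On a tiny feature map the two pooling formulas give the following (PyTorch assumed as the illustration framework):

```python
import torch
import torch.nn.functional as F

s = torch.arange(16.0).reshape(1, 1, 4, 4)   # toy feature map, window k = 2
f_max = F.max_pool2d(s, kernel_size=2)       # f_{i,j}(s)_max per 2x2 window
f_avg = F.avg_pool2d(s, kernel_size=2)       # f_{i,j}(s)_avg per 2x2 window
# f_max -> [[5., 7.], [13., 15.]]; f_avg -> [[2.5, 4.5], [10.5, 12.5]]
```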
Further, the up-sampling operation in step 6 realizes the mapping transformation from the prediction map to the resolution of the original image by bilinear interpolation:

$$f(x, y) = f(0,0)(1-x)(1-y) + f(1,0)\,x(1-y) + f(0,1)(1-x)\,y + f(1,1)\,xy$$

wherein x and y represent the horizontal and vertical coordinates of the interpolated point in the local coordinate system; f(0,0), f(0,1), f(1,0), f(1,1) represent the values at the four known coordinate points of the bilinear interpolation.
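In practice the bilinear formula is applied over the whole prediction map by an interpolation routine; a PyTorch sketch (framework and the small sizes below are assumptions — the full Cityscapes resolution is 1024×2048):

```python
import torch
import torch.nn.functional as F

pred = torch.randn(1, 19, 8, 16)              # low-resolution prediction map
full = F.interpolate(pred, size=(128, 256),   # bilinear mapping to image size
                     mode='bilinear', align_corners=False)
```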
Further, step 7 performs classification prediction on each pixel of the One-Hot coded prediction map with the Softmax classification layer, finally obtaining the image segmentation result:

$$P_i = \frac{e^{a_i}}{\sum_{j=1}^{k} e^{a_j}}$$

wherein P_i represents the probability value of the i-th target, j is the summation index over the prediction categories, k represents the number of prediction categories of the semantic segmentation model, and a_i represents the feature value of the i-th target.
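Applying the Softmax over the class dimension and taking the per-pixel argmax yields the segmentation map; a minimal PyTorch sketch (framework assumed):

```python
import torch

a = torch.randn(1, 19, 4, 4)       # feature values a_i per class, per pixel
P = torch.softmax(a, dim=1)        # P_i = exp(a_i) / sum_j exp(a_j)
seg = P.argmax(dim=1)              # class index per pixel: the segmentation map
```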
Compared with the prior art, the invention has the following remarkable advantages: (1) an easily trained ResNet-18 residual neural network is used as the feature extractor, improving the model's ability to represent images, making convergence easier, and raising the accuracy of the image segmentation algorithm; (2) by combining the corresponding convolutional network layers, a unique double-branch network architecture, and a feature merging strategy, the algorithm maintains segmentation accuracy while improving segmentation prediction speed; (3) the method can realize semantic segmentation and scene analysis in real time on high-resolution video material, and can be applied to fields such as automatic driving.
Drawings
FIG. 1 is a flow chart of a real-time semantic segmentation method based on a dual-branch deep convolutional neural network according to the present invention.
Fig. 2 shows the experimental effect of real-time semantic segmentation on the Cityscapes urban-scene data set, where (a) is an original image in the data set, (b) is the label map corresponding to the ground-truth segmentation of the original image, and (c) is the predicted segmentation map produced by the semantic segmentation model for the original image.
Detailed Description
The invention discloses a real-time semantic segmentation method based on a double-branch deep convolutional neural network. Firstly, the original label files and images of the Cityscapes data set are processed to produce the corresponding training label image files and synthesize a training data set; a ResNet deep convolutional neural network is then retrained on this data set to extract deep semantic features; a global branch formed by normalized convolutional layers performs normalized convolution on the feature maps of the different ResNet stages to obtain feature maps of the same dimension for channel-dimension combination; a shared feature layer and pooling layers share feature information of the different stages of the ResNet residual network to construct a local branch rich in detail information; a feature merging module fuses the feature maps of the global and local branches, integrating feature information of different scales to obtain the final prediction map; an up-sampling operation realizes the mapping transformation from the prediction map to the resolution of the original image; and a Softmax classification layer performs classification prediction on each pixel of the One-Hot coded prediction map, finally obtaining the image segmentation result. The specific steps and formulas are as set forth in the Disclosure of Invention above.
The invention is described in further detail below with reference to the figures and the embodiments.
Examples
The invention discloses a real-time semantic segmentation method based on a double-branch deep convolutional neural network, which mainly comprises three components: the first part is a global branch constructed by ResNet18 as an infrastructure and a convolutional layer; the second part is a local branch constructed by a characteristic diagram, a pooling layer and the like of different stages of a shared ResNet18 model; the third part is a feature merging module, which fuses prediction graphs of global branches and local branches, and the detailed steps are as follows in combination with fig. 1:
step 1, preprocessing a city landscape semantic segmentation data set to obtain an original image in the data set;
step 2, retraining a deep convolutional neural network ResNet on the data set, and extracting deep semantic features;
step 3, designing a global branch formed by the normalized convolutional layer, and respectively carrying out normalized convolution operation on the feature maps of different stages of ResNet to obtain feature maps with the same dimensionality for channel dimensionality combination;
step 4, sharing feature information of different stages in the ResNet residual error network by utilizing the shared feature layer and the pooling layer, and constructing a local branch with rich detail information;
step 5, designing a feature merging module, fusing feature mapping graphs of the global branch and the local branch, integrating feature information of different scales, and obtaining a final prediction graph;
step 6, utilizing an up-sampling operation to realize the mapping transformation from the prediction image to the resolution of the original image;
further, step 2 retrains the deep convolutional neural network ResNet on the data set, and extracts deep semantic features, which is specifically as follows:
training a ResNet-18 residual neural network model on a preprocessed large-scale high-resolution city landscape City semantic segmentation data set, using the model as an extractor of deep semantic features, performing class prediction on each pixel, calculating cross entropy loss, and training by combining a back propagation algorithm, wherein a loss function corresponding to each pixel is as follows:
wherein pixel _ loss represents the loss of each pixel after being calculated by a convolutional neural network, classes represents all the prediction categories of the semantic segmentation model, and ytrueRepresenting an One-Hot matrix, each element corresponding to One of the matrixThe elements of the One-Hot vector only have two values of 0 and 1, if the category is the same as the category of the sample, the category is 1, and if the category is not consistent with the sample, the category is 0, ypredRepresenting the probability of the prediction sample belonging to the current class;
wherein bp _ loss represents the total loss of the back propagation of the whole image, w and h respectively represent the corresponding width and height of the whole image, and pixel _ lossijIndicating the loss of the pixel corresponding to the ith row and the jth column;
further, step 3 is to design a global branch formed by the normalized convolutional layers, and perform normalized convolution operation on the feature maps of different stages of the ResNet respectively to obtain feature maps of the same dimension for channel dimension combination, specifically as follows:
the deep convolutional neural network is realized by utilizing the residual block in the residual network ResNet, and meanwhile, the overfitting phenomenon caused by deepening of a network layer can be avoided, wherein the characteristic mapping realized by the residual block is as follows:
wherein x is the input feature map of the residual block, F (x) represents the feature mapping function implemented by the residual block,representing the output signature after passing through a residual block, which allows the network to converge more quickly.
Normalizing convolutional layers at different stages of the ResNet residual error network, normalizing feature maps with different channel dimensions and different space dimensions to completely same size feature maps by utilizing convolution operation, and realizing feature fusion of a high-dimensional feature map and a low-dimensional feature map, wherein the normalization convolution operation is defined as:
wherein k, c, i, j are respectively a characteristic diagram c channel, an ith row, a jth column and y corresponding to the kth layerc,i,jFor outputting the characteristic value of the pixel at the corresponding position of the characteristic map, w (k)c,0,ki,kj) Weight parameter, x (k), representing the convolution kernel in a convolution operationc,i+ki,j+kj) A feature value representing the size of the convolution kernel corresponding to the input feature map in the convolution operation,indicating the bias parameters for the c-th channel at the k-th layer network layer.
Further, in step 4, the shared feature layer and the pooling layer are used to share feature information of different stages in the ResNet residual error network, and a local branch with rich detail information is constructed, specifically as follows:
feature maps are extracted at different stages of the residual network and learned with network layers such as pooling and upsampling layers to extract rich image detail information:

f_{i,j}(s)_max = max_{0≤m,n<k} x_{i+m,j+n},    f_{i,j}(s)_avg = (1/k²) Σ_{0≤m,n<k} x_{i+m,j+n}

wherein f_{i,j}(s)_max denotes the max-pooling output value and f_{i,j}(s)_avg denotes the average-pooling output value; k denotes the size of the convolution kernel; i, j denote the row and column at which the kernel computes its output value; and max and average denote the maximum and averaging operations respectively.
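A compact sketch of the two pooling operations above, assuming non-overlapping k × k windows (the patent does not state the window stride, so stride = k is an assumption here):

```python
import numpy as np

def pool2d(x, k, mode="max"):
    """Non-overlapping k x k pooling, matching f_max / f_avg above.

    x : 2-D feature map whose sides are divisible by k (for brevity).
    """
    H, W = x.shape
    # split into (row-blocks, k, col-blocks, k) tiles, reduce each tile
    tiles = x.reshape(H // k, k, W // k, k)
    if mode == "max":
        return tiles.max(axis=(1, 3))
    return tiles.mean(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4)
m = pool2d(x, 2, "max")   # [[ 5.,  7.], [13., 15.]]
a = pool2d(x, 2, "avg")   # [[ 2.5,  4.5], [10.5, 12.5]]
```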
Further, the up-sampling operation in step 6 realizes the mapping transformation from the prediction map to the resolution of the original image via bilinear interpolation:

f(x,y) = f(0,0)(1−x)(1−y) + f(1,0)x(1−y) + f(0,1)(1−x)y + f(1,1)xy

wherein x and y respectively denote the abscissa and ordinate of the interpolated point; f(0,0), f(0,1), f(1,0), f(1,1) denote the values at the four known points of the bilinear interpolation operation.
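The bilinear interpolation step can be written out directly on the unit square. This is a sketch of one interpolated sample, not of the full image-resizing loop:

```python
def bilerp(f00, f01, f10, f11, x, y):
    """Bilinear interpolation on the unit square.

    f00..f11 are the known values at (0,0), (0,1), (1,0), (1,1);
    (x, y) is the query point with 0 <= x, y <= 1.
    """
    return (f00 * (1 - x) * (1 - y)
            + f10 * x * (1 - y)
            + f01 * (1 - x) * y
            + f11 * x * y)

# The centre of the unit square is the average of the four corner values.
v = bilerp(0.0, 2.0, 4.0, 6.0, 0.5, 0.5)  # (0 + 2 + 4 + 6) / 4 = 3.0
```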
Further, in step 7 the Softmax classification layer performs classification prediction on each pixel of the One-Hot-encoded prediction map, finally obtaining the image segmentation result:

P_i = e^{a_i} / Σ_{k=1}^{K} e^{a_k}

wherein P_i denotes the probability value of the i-th class, k is the index running over the prediction categories, K denotes the number of prediction categories of the semantic segmentation model, and a_i denotes the feature value of the i-th class.
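A minimal sketch of the per-pixel Softmax above, applied to one pixel's class logits (the max-subtraction is a standard numerical-stability device, not part of the formula):

```python
import numpy as np

def softmax(a):
    """Softmax over the class axis: P_i = exp(a_i) / sum_k exp(a_k)."""
    e = np.exp(a - a.max())  # subtract max for numerical stability
    return e / e.sum()

# One pixel with three class logits; probabilities sum to one and the
# largest logit receives the largest probability (here, class index 2).
p = softmax(np.array([1.0, 2.0, 3.0]))
```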
The invention also discloses effect graphs of semantic segmentation and scene parsing for images in an urban landscape data set: FIG. 2(a) is an image selected from the Cityscapes data set whose prediction categories include pedestrians, vehicles and roads; FIG. 2(b) is the ground-truth segmentation map corresponding to the original image, in which different region blocks are distinguished by different colors; and FIG. 2(c) is the segmentation prediction map of the image after passing through the real-time semantic segmentation model.
Claims (6)
1. A real-time semantic segmentation method based on a double-branch deep convolutional neural network is characterized by comprising the following steps:
step 1, preprocessing a city landscape semantic segmentation data set to obtain an original image in the data set;
step 2, retraining a deep convolutional neural network ResNet on the data set, and extracting deep semantic features;
step 3, designing a global branch formed by normalized convolutional layers, and respectively carrying out normalized convolution operations on the feature maps of different stages of ResNet, obtaining feature maps with the same dimensionality and merging them along the channel dimension;
step 4, sharing feature information from different stages of the ResNet residual network by utilizing the shared feature layer and the pooling layer, and constructing a local branch rich in detail information;
step 5, designing a feature merging module, fusing feature mapping graphs of the global branches and the local branches, integrating feature information of different scales, and obtaining a final prediction graph;
step 6, utilizing an up-sampling operation to realize the mapping transformation from the prediction image to the resolution of the original image;
step 7, performing classification prediction on each pixel of the One-Hot-encoded prediction map by utilizing a Softmax classification layer, finally obtaining the image segmentation result.
2. the method for real-time semantic segmentation based on the double-branch deep convolutional neural network according to claim 1, wherein step 2 retrains the deep convolutional neural network ResNet on the data set to extract deep semantic features, specifically as follows:
a ResNet-18 residual neural network model is trained on the preprocessed large-scale high-resolution Cityscapes urban landscape semantic segmentation data set and used as the extractor of deep semantic features; class prediction is performed on each pixel, the cross-entropy loss is calculated, and training is carried out in combination with the back-propagation algorithm, wherein the loss function corresponding to each pixel is:

pixel_loss = − Σ_{c ∈ classes} y_true^(c) · log(y_pred^(c))

wherein pixel_loss represents the loss of each pixel after calculation by the convolutional neural network; classes represents all prediction categories of the semantic segmentation model; y_true represents a One-Hot matrix in which each element corresponds to a One-Hot vector whose entries take only the values 0 and 1 — 1 if the category is the same as the sample category and 0 otherwise; and y_pred represents the predicted probability that the sample belongs to the current category;
bp_loss = Σ_{i=1}^{h} Σ_{j=1}^{w} pixel_loss_{ij}

wherein bp_loss represents the total back-propagated loss of the whole image, w and h respectively represent the width and height of the image, and pixel_loss_{ij} represents the loss of the pixel in the i-th row and j-th column;
3. The real-time semantic segmentation method based on the double-branch deep convolutional neural network according to claim 1, wherein in step 3 the global branch formed by the designed normalized convolutional layers performs normalized convolution operations respectively on the feature maps at different stages of ResNet, obtaining feature maps of the same dimension and merging them along the channel dimension, specifically as follows:
the deep convolutional neural network is realized by utilizing the residual blocks in the residual network ResNet, which also avoids the overfitting phenomenon caused by deepening the network; the feature mapping realized by a residual block is:

y = F(x) + x

wherein x is the input feature map of the residual block, F(x) represents the residual feature-mapping function implemented by the block, and y represents the output feature map after passing through the residual block; this identity shortcut allows the network to converge more quickly.
Normalized convolutional layers are inserted at different stages of the ResNet residual network; using convolution operations, feature maps with different channel dimensions and different spatial dimensions are normalized to feature maps of exactly the same size, realizing feature fusion of high-dimensional and low-dimensional feature maps. The normalized convolution operation is defined as:

y_{c,i,j} = Σ_{ki} Σ_{kj} w(k_{c,0,ki,kj}) · x(k_{c,i+ki,j+kj}) + b_c^k

wherein k, c, i, j respectively denote the k-th layer, the c-th channel of the feature map, the i-th row and the j-th column; y_{c,i,j} is the pixel value at the corresponding position of the output feature map; w(k_{c,0,ki,kj}) denotes the weight parameters of the convolution kernel in the convolution operation; x(k_{c,i+ki,j+kj}) denotes the input feature values covered by the convolution kernel; and b_c^k denotes the bias parameter of the c-th channel at the k-th network layer.
4. The real-time semantic segmentation method based on the dual-branch deep convolutional neural network according to claim 1, wherein in step 4 the shared feature layer and the pooling layer are used to share feature information from different stages of the ResNet residual network, so as to construct a local branch rich in detail information, specifically as follows:
feature maps are extracted at different stages of the residual network and learned with network layers such as pooling and upsampling layers to extract rich image detail information:

f_{i,j}(s)_max = max_{0≤m,n<k} x_{i+m,j+n},    f_{i,j}(s)_avg = (1/k²) Σ_{0≤m,n<k} x_{i+m,j+n}

wherein f_{i,j}(s)_max denotes the max-pooling output value and f_{i,j}(s)_avg denotes the average-pooling output value; k denotes the size of the convolution kernel; i, j denote the row and column at which the kernel computes its output value; and max and average denote the maximum and averaging operations respectively.
5. The method for real-time semantic segmentation based on the dual-branch deep convolutional neural network according to claim 1, wherein step 6 uses a bilinear-interpolation upsampling operation to implement the mapping transformation from the prediction map to the resolution of the original image:

f(x,y) = f(0,0)(1−x)(1−y) + f(1,0)x(1−y) + f(0,1)(1−x)y + f(1,1)xy

wherein x and y respectively denote the abscissa and ordinate of the interpolated point; f(0,0), f(0,1), f(1,0), f(1,1) denote the values at the four known points of the bilinear interpolation operation.
6. The real-time semantic segmentation method based on the double-branch deep convolutional neural network according to claim 1, wherein in step 7 the Softmax classification layer performs classification prediction on each pixel of the One-Hot-encoded prediction map, finally obtaining the image segmentation result:

P_i = e^{a_i} / Σ_{k=1}^{K} e^{a_k}

wherein P_i denotes the probability value of the i-th class, k is the index running over the prediction categories, K denotes the number of prediction categories of the semantic segmentation model, and a_i denotes the feature value of the i-th class.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110640607.6A CN113421269A (en) | 2021-06-09 | 2021-06-09 | Real-time semantic segmentation method based on double-branch deep convolutional neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110640607.6A CN113421269A (en) | 2021-06-09 | 2021-06-09 | Real-time semantic segmentation method based on double-branch deep convolutional neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113421269A true CN113421269A (en) | 2021-09-21 |
Family
ID=77788042
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110640607.6A Pending CN113421269A (en) | 2021-06-09 | 2021-06-09 | Real-time semantic segmentation method based on double-branch deep convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113421269A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106952220A (en) * | 2017-03-14 | 2017-07-14 | 长沙全度影像科技有限公司 | A kind of panoramic picture fusion method based on deep learning |
CN109711413A (en) * | 2018-12-30 | 2019-05-03 | 陕西师范大学 | Image, semantic dividing method based on deep learning |
CN109919869A (en) * | 2019-02-28 | 2019-06-21 | 腾讯科技(深圳)有限公司 | A kind of image enchancing method, device and storage medium |
US20200160065A1 (en) * | 2018-08-10 | 2020-05-21 | Naver Corporation | Method for training a convolutional recurrent neural network and for semantic segmentation of inputted video using the trained convolutional recurrent neural network |
CN111768415A (en) * | 2020-06-15 | 2020-10-13 | 哈尔滨工程大学 | Image instance segmentation method without quantization pooling |
AU2020103901A4 (en) * | 2020-12-04 | 2021-02-11 | Chongqing Normal University | Image Semantic Segmentation Method Based on Deep Full Convolutional Network and Conditional Random Field |
- 2021-06-09: CN application CN202110640607.6A, patent CN113421269A/en, status: active, Pending
Non-Patent Citations (4)
Title |
---|
YUE LIU; ZHICHAO LIAN: "PSDNet: A Balanced Architecture of Accuracy and Parameters for Semantic Segmentation", 《2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR)》 * |
刘悦: "Research on Road Anomaly Detection Based on Semantic Segmentation", 《China Master's Theses Full-text Database, Engineering Science & Technology II》 * |
霍雨佳: "Research on Visual Target Tracking and 3D Reconstruction Methods for a Robotic-Arm End Effector", 《China Master's Theses Full-text Database, Information Science & Technology》 * |
马天浩; 谭海; 李天琪; 吴雅男; 刘祺: "Road Extraction from GF-1 Imagery Using a Dilated-Convolution Residual Network with Multi-Scale Feature Fusion", 《Laser & Optoelectronics Progress》 * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113688836A (en) * | 2021-09-28 | 2021-11-23 | 四川大学 | Real-time road image semantic segmentation method and system based on deep learning |
CN114224354A (en) * | 2021-11-15 | 2022-03-25 | 吉林大学 | Arrhythmia classification method, device and readable storage medium |
CN114224354B (en) * | 2021-11-15 | 2024-01-30 | 吉林大学 | Arrhythmia classification method, arrhythmia classification device, and readable storage medium |
CN114399519A (en) * | 2021-11-30 | 2022-04-26 | 西安交通大学 | MR image 3D semantic segmentation method and system based on multi-modal fusion |
CN114399519B (en) * | 2021-11-30 | 2023-08-22 | 西安交通大学 | MR image 3D semantic segmentation method and system based on multi-modal fusion |
CN114332715A (en) * | 2021-12-30 | 2022-04-12 | 武汉华信联创技术工程有限公司 | Method, device and equipment for identifying snow through automatic meteorological observation and storage medium |
CN114795258A (en) * | 2022-04-18 | 2022-07-29 | 浙江大学 | Child hip joint dysplasia diagnosis system |
CN114943963A (en) * | 2022-04-29 | 2022-08-26 | 南京信息工程大学 | Remote sensing image cloud and cloud shadow segmentation method based on double-branch fusion network |
CN115640418A (en) * | 2022-12-26 | 2023-01-24 | 天津师范大学 | Cross-domain multi-view target website retrieval method and device based on residual semantic consistency |
CN116052110A (en) * | 2023-03-28 | 2023-05-02 | 四川公路桥梁建设集团有限公司 | Intelligent positioning method and system for pavement marking defects |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113421269A (en) | Real-time semantic segmentation method based on double-branch deep convolutional neural network | |
CN112541503B (en) | Real-time semantic segmentation method based on context attention mechanism and information fusion | |
CN112101175A (en) | Expressway vehicle detection and multi-attribute feature extraction method based on local images | |
CN112132156B (en) | Image saliency target detection method and system based on multi-depth feature fusion | |
CN111612008B (en) | Image segmentation method based on convolution network | |
CN108280397B (en) | Human body image hair detection method based on deep convolutional neural network | |
CN111191583B (en) | Space target recognition system and method based on convolutional neural network | |
CN112308860A (en) | Earth observation image semantic segmentation method based on self-supervision learning | |
CN110706239B (en) | Scene segmentation method fusing full convolution neural network and improved ASPP module | |
CN111291809A (en) | Processing device, method and storage medium | |
Özkanoğlu et al. | InfraGAN: A GAN architecture to transfer visible images to infrared domain | |
CN111696110B (en) | Scene segmentation method and system | |
CN110717921B (en) | Full convolution neural network semantic segmentation method of improved coding and decoding structure | |
CN110969171A (en) | Image classification model, method and application based on improved convolutional neural network | |
WO2023030182A1 (en) | Image generation method and apparatus | |
CN111179193B (en) | Dermatoscope image enhancement and classification method based on DCNNs and GANs | |
CN111476133B (en) | Unmanned driving-oriented foreground and background codec network target extraction method | |
CN110490155B (en) | Method for detecting unmanned aerial vehicle in no-fly airspace | |
CN113269133A (en) | Unmanned aerial vehicle visual angle video semantic segmentation method based on deep learning | |
CN113269224A (en) | Scene image classification method, system and storage medium | |
CN113592894A (en) | Image segmentation method based on bounding box and co-occurrence feature prediction | |
CN113298817A (en) | High-accuracy semantic segmentation method for remote sensing image | |
CN113592893A (en) | Image foreground segmentation method combining determined main body and refined edge | |
CN111368776B (en) | High-resolution remote sensing image classification method based on deep ensemble learning | |
CN112132207A (en) | Target detection neural network construction method based on multi-branch feature mapping |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||