CN113421269A - Real-time semantic segmentation method based on double-branch deep convolutional neural network - Google Patents

Real-time semantic segmentation method based on double-branch deep convolutional neural network Download PDF

Info

Publication number
CN113421269A
Authority
CN
China
Prior art keywords
feature
layer
prediction
image
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110640607.6A
Other languages
Chinese (zh)
Inventor
刘悦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Ruiyi Intelligent Technology Co ltd
Original Assignee
Nanjing Ruiyi Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Ruiyi Intelligent Technology Co ltd filed Critical Nanjing Ruiyi Intelligent Technology Co ltd
Priority to CN202110640607.6A priority Critical patent/CN113421269A/en
Publication of CN113421269A publication Critical patent/CN113421269A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30248 Vehicle exterior or interior
    • G06T 2207/30252 Vehicle exterior; Vicinity of vehicle

Abstract

The invention discloses a real-time semantic segmentation method based on a double-branch deep convolutional neural network. The method comprises the following steps: preprocessing the Cityscapes urban-scene semantic segmentation dataset; retraining the deep convolutional neural network ResNet on the dataset and extracting deep semantic features; designing a global branch composed of normalization convolutional layers, applying a normalization convolution to the feature maps from different ResNet stages to obtain feature maps of identical dimensions, and combining them along the channel dimension; sharing the feature information of different stages of the ResNet residual network through shared feature layers and pooling layers to construct a local branch rich in detail information; designing a feature merging module that fuses the feature maps of the global branch and the local branch and integrates feature information at different scales to obtain the final prediction map; mapping the prediction map back to the resolution of the original image with an upsampling operation; and classifying each pixel of the One-Hot-encoded prediction map with a Softmax classification layer to obtain the final image segmentation result. The invention increases the speed of segmentation prediction of deep convolutional networks on high-resolution images and improves both semantic segmentation accuracy and segmentation speed.

Description

Real-time semantic segmentation method based on double-branch deep convolutional neural network
Technical Field
The invention relates to the field of deep learning of computer vision, in particular to a real-time semantic segmentation method based on a double-branch deep convolutional neural network.
Background
Processing and analyzing images with computers is a central goal of machine vision and a highly challenging task. The human visual system rapidly parses the information captured by the eyes and, with many layers of cortical neurons, interprets the whole scene. By combining computers with the Deep Convolutional Neural Networks (DCNN) that have developed rapidly in recent years, image semantic information can likewise be extracted and analyzed: a pixel-wise mapping from the feature map back to the original image is learned, the boundaries between different region blocks in the image are segmented, and the whole scene is finally parsed. This capability has great research significance in fields such as medical image analysis, geographic remote-sensing image analysis, and autonomous driving.
From a methodological point of view, semantic segmentation falls into two categories: the first comprises classical image segmentation algorithms based on traditional image processing; the second comprises deep learning algorithms based on convolutional neural networks. In the 1960s and 1970s, image segmentation remained at the traditional stage, relying on simple image features. Prewitt et al. computed one or more gray-level thresholds from the gray-level features of an image, compared the gray value of each pixel against the thresholds, and assigned each pixel to the appropriate category according to the comparison. Boykov and Rother et al. proposed GraphCut and GrabCut, respectively, graph-theoretic image segmentation methods that relate the segmentation problem to the min-cut problem on a graph. The essence of graph-based segmentation is to remove specific edges so that the graph splits into several subgraphs, thereby realizing segmentation. Such manually designed operators usually extract only a single feature and therefore cannot fully represent the characteristics of an object.
Deep learning has earned its place in machine vision through its excellent feature representation. By constructing convolutional neural networks with deep structures, it obtains deep semantic features and generalizes more strongly. In 2015, Badrinarayanan et al. proposed the real-time semantic segmentation network SegNet, a typical Encoder-Decoder architecture. SegNet builds its encoder from the VGG-16 convolutional architecture with the fully connected layers removed, producing low-resolution feature maps that a decoder with a symmetric structure then maps to pixel-level predictions. The decoder consists of a series of upsampling and convolutional layers and ends in a Softmax classifier that predicts a label for every pixel, so the output reaches the same resolution as the input image. SegNet has a clearly symmetric structure, with one decoder layer for each encoder layer. Unlike the pooling in a fully convolutional network, the pooling in SegNet's encoder additionally records locations: because the values discarded by max-pooling cannot be recovered afterwards, SegNet stores the position of the maximum in each pooling window during encoding and uses the corresponding max-pooling indices in the decoder to realize non-linear upsampling. This avoids the heavy computation of deconvolution-based upsampling in fully convolutional networks and removes the need to store whole feature maps from the encoding stage, improving computational efficiency. Although the model has the advantage of real-time inference, its semantic segmentation accuracy still leaves room for improvement.
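For illustration, the index-based unpooling that SegNet relies on can be sketched in a few lines of PyTorch (a hypothetical minimal example; the tensor shape is an assumption, and this is not the patent's own code):

```python
import torch
import torch.nn.functional as F

# Hypothetical encoder feature map (batch, channels, height, width).
x = torch.randn(1, 64, 128, 256)

# Encoder side: max-pool while recording the argmax locations.
pooled, indices = F.max_pool2d(x, kernel_size=2, stride=2, return_indices=True)

# Decoder side: non-linear upsampling that routes each pooled value back
# to its recorded location; no deconvolution weights are required.
unpooled = F.max_unpool2d(pooled, indices, kernel_size=2, stride=2)
print(pooled.shape, unpooled.shape)  # (1, 64, 64, 128) then (1, 64, 128, 256)
```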
Disclosure of Invention
The invention aims to provide a real-time semantic segmentation method based on a double-branch deep convolutional neural network that is both fast and accurate.
The technical solution for realizing the purpose of the invention is as follows: a real-time semantic segmentation method based on a double-branch deep convolutional neural network comprises the following steps:
step 1, preprocessing the Cityscapes urban-scene semantic segmentation dataset to obtain the original images in the dataset;
step 2, retraining the deep convolutional neural network ResNet on the dataset and extracting deep semantic features;
step 3, designing a global branch composed of normalization convolutional layers, and applying a normalization convolution to the feature maps from different ResNet stages to obtain feature maps of identical dimensions for channel-dimension combination;
step 4, sharing the feature information of different stages of the ResNet residual network through shared feature layers and pooling layers, and constructing a local branch rich in detail information;
step 5, designing a feature merging module that fuses the feature maps of the global branch and the local branch and integrates feature information at different scales to obtain the final prediction map;
step 6, using an upsampling operation to map the prediction map back to the resolution of the original image;
step 7, using a Softmax classification layer to classify each pixel of the One-Hot-encoded prediction map, finally obtaining the image segmentation result.
further, step 2 retrains the deep convolutional neural network ResNet on the data set, and extracts deep semantic features, which is specifically as follows:
training a ResNet-18 residual neural network model on a preprocessed large-scale high-resolution city landscape City semantic segmentation data set, using the model as an extractor of deep semantic features, performing class prediction on each pixel, calculating cross entropy loss, and training by combining a back propagation algorithm, wherein a loss function corresponding to each pixel is as follows:
Figure BDA0003107458860000021
wherein pixel _ loss represents the loss of each pixel after being calculated by a convolutional neural network, classes represents all the prediction categories of the semantic segmentation model, and ytrueRepresenting a One-Hot matrix, wherein each element corresponds to One-Hot vector in the matrix, the elements only have two values of 0 and 1, if the category is the same as the sample category, the value is 1, and if the category is not consistent with the sample, the value is 0, ypredRepresenting the probability of the prediction sample belonging to the current class;
Figure BDA0003107458860000031
wherein bp _ loss represents the total loss of the back propagation of the whole image, w and h respectively represent the corresponding width and height of the whole image, and pixel _ lossijIndicating the loss of the pixel corresponding to the ith row and the jth column;
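As an illustrative sketch of these two formulas (assuming a PyTorch setting; the tensor names and shapes are hypothetical, not taken from the patent):

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes: logits (classes, h, w); one-hot labels (classes, h, w).
classes, h, w = 19, 64, 128
logits = torch.randn(classes, h, w, requires_grad=True)
labels = torch.randint(classes, (h, w))
y_true = F.one_hot(labels, classes).permute(2, 0, 1).float()

y_pred = torch.softmax(logits, dim=0)             # predicted probability per class
pixel_loss = -(y_true * y_pred.log()).sum(dim=0)  # loss of each pixel, shape (h, w)
bp_loss = pixel_loss.sum()                        # total back-propagated loss
bp_loss.backward()                                # training step via back-propagation
```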
further, step 3 is to design a global branch formed by the normalized convolutional layers, and perform normalized convolution operation on the feature maps of different stages of the ResNet respectively to obtain feature maps of the same dimension for channel dimension combination, specifically as follows:
the deep convolutional neural network is realized by utilizing the residual block in the residual network ResNet, and meanwhile, the overfitting phenomenon caused by deepening of a network layer can be avoided, wherein the characteristic mapping realized by the residual block is as follows:
Figure BDA0003107458860000032
wherein x is the input feature map of the residual block, F (x) represents the feature mapping function implemented by the residual block,
Figure BDA0003107458860000033
representing the output signature after passing through a residual block, which allows the network to converge more quickly.
Normalization convolutional layers are placed at different stages of the ResNet residual network; using the convolution operation, feature maps with different channel dimensions and different spatial dimensions are normalized to feature maps of exactly the same size, realizing the feature fusion of high-dimensional and low-dimensional feature maps, where the normalization convolution is defined as:

y_{c,i,j} = \sum_{k_i} \sum_{k_j} w(k_c, 0, k_i, k_j) \, x(k_c, i+k_i, j+k_j) + b_{k,c}

where k, c, i, j denote the kth network layer and the cth channel, ith row, and jth column of the corresponding feature map; y_{c,i,j} is the feature value of the pixel at the corresponding position of the output feature map; w(k_c, 0, k_i, k_j) denotes the weight parameters of the convolution kernel in the convolution operation; x(k_c, i+k_i, j+k_j) denotes the feature values of the input feature map covered by the convolution kernel; and b_{k,c} denotes the bias parameter of the cth channel at the kth network layer.
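The normalization of stage features to a common dimensionality followed by channel-dimension combination could be sketched as follows (hypothetical; the target size of 64x128 and the common channel count of 128 are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical ResNet stage outputs at decreasing resolutions.
stages = [torch.randn(1, c, s, 2 * s) for c, s in [(64, 64), (128, 32), (256, 16), (512, 8)]]

# One "normalization" convolution per stage maps every feature map to the
# same channel dimension; bilinear resizing unifies the spatial dimensions.
convs = nn.ModuleList(nn.Conv2d(c, 128, kernel_size=1) for c in (64, 128, 256, 512))
normalized = [
    F.interpolate(conv(x), size=(64, 128), mode="bilinear", align_corners=False)
    for conv, x in zip(convs, stages)
]
merged = torch.cat(normalized, dim=1)  # channel-dimension combination -> (1, 512, 64, 128)
```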
Further, in step 4 the shared feature layers and pooling layers are used to share the feature information of different stages of the ResNet residual network and to construct a local branch rich in detail information, specifically as follows:
feature maps are extracted at different stages of the residual network and learned with network layers such as pooling layers and upsampling layers, extracting rich image detail information as follows:

f_{i,j}(s)_{max} = \max_{0 \le m,n < K} s_{i+m, j+n}

f_{i,j}(s)_{avg} = \frac{1}{K^2} \sum_{m=0}^{K-1} \sum_{n=0}^{K-1} s_{i+m, j+n}

where f_{i,j}(s)_max denotes the max-pooling feature value; f_{i,j}(s)_avg denotes the average-pooling feature value; K denotes the size of the pooling kernel; i, j index the feature value computed at the ith row and jth column for the corresponding kernel position; and max and average denote the maximum and averaging operations, respectively.
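The pooling operations of the local branch can be sketched as follows (illustrative; the window size K = 2 and the additive combination of the two pooled maps are assumptions, not the patent's exact design):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 128, 64, 128)                  # a shared stage feature map
f_max = F.max_pool2d(x, kernel_size=2, stride=2)  # f_{i,j}(s)_max per window
f_avg = F.avg_pool2d(x, kernel_size=2, stride=2)  # f_{i,j}(s)_avg per window
# One possible way to recover the spatial size with an upsampling layer.
detail = F.interpolate(f_max + f_avg, scale_factor=2, mode="bilinear",
                       align_corners=False)
```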
Further, the upsampling operation of step 6 realizes the mapping from the prediction map back to the resolution of the original image:

f(x, y) \approx f(0,0)(1-x)(1-y) + f(1,0)\,x(1-y) + f(0,1)(1-x)\,y + f(1,1)\,xy

where x and y denote the abscissa and ordinate of the point in the (unit-square) coordinate system, and f(0,0), f(0,1), f(1,0), f(1,1) denote the values at the four known coordinate points used by the bilinear interpolation.
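In code, this mapping back to the original resolution reduces to a single bilinear interpolation call (illustrative; the 2048x1024 Cityscapes resolution and the 19-class output are assumptions):

```python
import torch
import torch.nn.functional as F

pred = torch.randn(1, 19, 128, 256)  # low-resolution prediction map, 19 classes
full = F.interpolate(pred, size=(1024, 2048), mode="bilinear",
                     align_corners=False)  # back to original image resolution
```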
Further, step 7 uses a Softmax classification layer to classify each pixel of the One-Hot-encoded prediction map, finally obtaining the image segmentation result:

P_i = \frac{e^{a_i}}{\sum_{k=1}^{K} e^{a_k}}

where P_i denotes the probability value of the ith target, k denotes the index of the current prediction category, K denotes the number of prediction categories of the semantic segmentation model, and a_i denotes the feature value of the ith target.
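The final per-pixel classification can be sketched as follows (illustrative):

```python
import torch

full = torch.randn(1, 19, 1024, 2048)  # upsampled prediction map (logits a_i)
probs = torch.softmax(full, dim=1)     # P_i = exp(a_i) / sum_k exp(a_k)
segmentation = probs.argmax(dim=1)     # class index per pixel -> (1, 1024, 2048)
```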
Compared with the prior art, the invention has the following notable advantages: (1) the easily trained ResNet-18 residual neural network is used as the feature extractor, which improves the model's ability to represent images, eases convergence, and raises the accuracy of the image segmentation algorithm; (2) through the corresponding convolutional network layers, the distinctive double-branch network architecture, and the feature merging strategy, the algorithm maintains segmentation accuracy while increasing segmentation prediction speed; (3) the method realizes real-time semantic segmentation and scene parsing of high-resolution video material and can be applied in fields such as autonomous driving.
Drawings
FIG. 1 is a flow chart of a real-time semantic segmentation method based on a dual-branch deep convolutional neural network according to the present invention.
Fig. 2 shows the results of the real-time semantic segmentation experiment on the Cityscapes urban-scene dataset, where (a) is an original image from the dataset, (b) is the ground-truth label map corresponding to the original image, and (c) is the segmentation map predicted for the original image by the semantic segmentation model.
Detailed Description
The invention discloses a real-time semantic segmentation method based on a double-branch deep convolutional neural network. First, the original label files and images of the Cityscapes dataset are processed to produce the corresponding training label images and to assemble the training dataset; a ResNet deep convolutional neural network is then retrained on this dataset to extract deep semantic features; a global branch composed of normalization convolutional layers is designed, and a normalization convolution is applied to the feature maps from the different ResNet stages to obtain feature maps of identical dimensions for channel-dimension combination; shared feature layers and pooling layers share the feature information of the different stages of the ResNet residual network to construct a local branch rich in detail information; a feature merging module fuses the feature maps of the global and local branches, integrating feature information at different scales to obtain the final prediction map; an upsampling operation maps the prediction map back to the resolution of the original image; and a Softmax classification layer classifies each pixel of the One-Hot-encoded prediction map to obtain the final image segmentation result.
The invention is described in further detail below with reference to the figures and the embodiments.
Examples
The real-time semantic segmentation method based on a double-branch deep convolutional neural network mainly comprises three components: the first is a global branch constructed from ResNet-18 as the backbone together with convolutional layers; the second is a local branch constructed from the feature maps of the different stages of the shared ResNet-18 model together with pooling layers; the third is a feature merging module that fuses the prediction maps of the global branch and the local branch. With reference to Fig. 1, the detailed steps are as follows (an end-to-end code sketch follows the step list):
step 1, preprocessing the Cityscapes urban-scene semantic segmentation dataset to obtain the original images in the dataset;
step 2, retraining the deep convolutional neural network ResNet on the dataset and extracting deep semantic features;
step 3, designing a global branch composed of normalization convolutional layers, and applying a normalization convolution to the feature maps from different ResNet stages to obtain feature maps of identical dimensions for channel-dimension combination;
step 4, sharing the feature information of different stages of the ResNet residual network through shared feature layers and pooling layers, and constructing a local branch rich in detail information;
step 5, designing a feature merging module that fuses the feature maps of the global branch and the local branch and integrates feature information at different scales to obtain the final prediction map;
step 6, using an upsampling operation to map the prediction map back to the resolution of the original image;
step 7, using a Softmax classification layer to classify each pixel of the One-Hot-encoded prediction map, finally obtaining the image segmentation result.
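Under the assumptions used in the sketches above, the three components might be wired together as follows (a minimal hypothetical pipeline in PyTorch; the channel counts, the pooling in the local branch, and the merge convolution are illustrative choices, not the patent's exact configuration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18

class DoubleBranchSegNet(nn.Module):
    """Sketch: ResNet-18 backbone shared by a global and a local branch."""
    def __init__(self, num_classes: int = 19):
        super().__init__()
        r = resnet18()
        self.stem = nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool)
        self.stages = nn.ModuleList([r.layer1, r.layer2, r.layer3, r.layer4])
        # Global branch: one normalization conv per backbone stage.
        self.norm_convs = nn.ModuleList(
            nn.Conv2d(c, 128, 1) for c in (64, 128, 256, 512))
        # Feature merging module (assumed: one conv after concatenation).
        self.merge = nn.Conv2d(128 * 4 + 64, num_classes, 3, padding=1)

    def forward(self, x):
        h, w = x.shape[2:]
        f = self.stem(x)
        feats = []
        for stage in self.stages:       # shared feature layers
            f = stage(f)
            feats.append(f)
        size = feats[0].shape[2:]
        # Global branch: normalize all stages to one shape, concat on channels.
        g = torch.cat([F.interpolate(conv(z), size=size, mode="bilinear",
                                     align_corners=False)
                       for conv, z in zip(self.norm_convs, feats)], dim=1)
        # Local branch: pooling over the shared shallow stage for detail.
        l = F.max_pool2d(feats[0], 3, stride=1, padding=1)
        pred = self.merge(torch.cat([g, l], dim=1))
        # Bilinear upsampling to input resolution, then per-pixel Softmax.
        return F.interpolate(pred, size=(h, w), mode="bilinear",
                             align_corners=False).softmax(dim=1)

out = DoubleBranchSegNet()(torch.randn(1, 3, 512, 1024))
```

The sketch mirrors the flow of Fig. 1: shared backbone features feed both branches, the merged prediction map is upsampled bilinearly, and the Softmax layer yields per-pixel class probabilities.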
further, step 2 retrains the deep convolutional neural network ResNet on the data set, and extracts deep semantic features, which is specifically as follows:
training a ResNet-18 residual neural network model on a preprocessed large-scale high-resolution city landscape City semantic segmentation data set, using the model as an extractor of deep semantic features, performing class prediction on each pixel, calculating cross entropy loss, and training by combining a back propagation algorithm, wherein a loss function corresponding to each pixel is as follows:
Figure BDA0003107458860000081
wherein pixel _ loss represents the loss of each pixel after being calculated by a convolutional neural network, classes represents all the prediction categories of the semantic segmentation model, and ytrueRepresenting an One-Hot matrix, each element corresponding to One of the matrixThe elements of the One-Hot vector only have two values of 0 and 1, if the category is the same as the category of the sample, the category is 1, and if the category is not consistent with the sample, the category is 0, ypredRepresenting the probability of the prediction sample belonging to the current class;
Figure BDA0003107458860000082
wherein bp _ loss represents the total loss of the back propagation of the whole image, w and h respectively represent the corresponding width and height of the whole image, and pixel _ lossijIndicating the loss of the pixel corresponding to the ith row and the jth column;
further, step 3 is to design a global branch formed by the normalized convolutional layers, and perform normalized convolution operation on the feature maps of different stages of the ResNet respectively to obtain feature maps of the same dimension for channel dimension combination, specifically as follows:
the deep convolutional neural network is realized by utilizing the residual block in the residual network ResNet, and meanwhile, the overfitting phenomenon caused by deepening of a network layer can be avoided, wherein the characteristic mapping realized by the residual block is as follows:
Figure BDA0003107458860000083
wherein x is the input feature map of the residual block, F (x) represents the feature mapping function implemented by the residual block,
Figure BDA0003107458860000084
representing the output signature after passing through a residual block, which allows the network to converge more quickly.
Normalizing convolutional layers at different stages of the ResNet residual error network, normalizing feature maps with different channel dimensions and different space dimensions to completely same size feature maps by utilizing convolution operation, and realizing feature fusion of a high-dimensional feature map and a low-dimensional feature map, wherein the normalization convolution operation is defined as:
Figure BDA0003107458860000091
wherein k, c, i, j are respectively a characteristic diagram c channel, an ith row, a jth column and y corresponding to the kth layerc,i,jFor outputting the characteristic value of the pixel at the corresponding position of the characteristic map, w (k)c,0,ki,kj) Weight parameter, x (k), representing the convolution kernel in a convolution operationc,i+ki,j+kj) A feature value representing the size of the convolution kernel corresponding to the input feature map in the convolution operation,
Figure BDA0003107458860000092
indicating the bias parameters for the c-th channel at the k-th layer network layer.
Further, in step 4, the shared feature layer and the pooling layer are used to share feature information of different stages in the ResNet residual error network, and a local branch with rich detail information is constructed, specifically as follows:
extracting feature maps at different stages of a residual error network, learning the feature maps by utilizing network layers such as a pooling layer and an upsampling layer, and extracting rich image detail information as follows:
Figure BDA0003107458860000093
Figure BDA0003107458860000094
wherein f isi,j(s)maxRepresenting a maximum pooling operation characteristic value; f. ofi,j(s)avgRepresenting an average pooling operation characteristic value; k represents the size of the convolution kernel; i, j represents the calculation characteristic value of the ith row and the jth column of the corresponding convolution kernel; max, average denote the maximum and average operations, respectively.
Further, the up-sampling operation in step 6 is used to implement the mapping transformation from the prediction image to the resolution of the original image:
Figure BDA0003107458860000095
wherein x and y respectively represent the abscissa and the ordinate of the midpoint of the coordinate system; f (0,0), f (0,1), f (1,0), f (1,1) represent the coordinates of four known coordinate points of the bilinear interpolation operation.
Further, the step 7 of performing classification prediction on each pixel in the prediction map of One-Hot encoding by using the Softmax classification layer, and finally obtaining an image segmentation result:
Figure BDA0003107458860000101
wherein, PiRepresenting the probability value of the ith target, k representing the index value of the current prediction category, k representing the number of prediction categories of the semantic segmentation model, aiRepresenting the eigenvalues of the ith target.
Fig. 2 shows the effect of semantic segmentation and scene parsing on an image from the urban-scene dataset: Fig. 2(a) is an image selected from the Cityscapes dataset whose prediction categories are pedestrians, vehicles, and roads; Fig. 2(b) is the ground-truth segmentation map corresponding to the original image, where different region blocks are distinguished by different colors; and Fig. 2(c) is the segmentation prediction map of the image after passing through the real-time semantic segmentation model.

Claims (6)

1. A real-time semantic segmentation method based on a double-branch deep convolutional neural network, characterized by comprising the following steps:
step 1, preprocessing an urban-scene semantic segmentation dataset to obtain the original images in the dataset;
step 2, retraining the deep convolutional neural network ResNet on the dataset and extracting deep semantic features;
step 3, designing a global branch composed of normalization convolutional layers, and applying a normalization convolution to the feature maps from different ResNet stages to obtain feature maps of identical dimensions for channel-dimension combination;
step 4, sharing the feature information of different stages of the ResNet residual network through shared feature layers and pooling layers, and constructing a local branch rich in detail information;
step 5, designing a feature merging module that fuses the feature maps of the global branch and the local branch and integrates feature information at different scales to obtain the final prediction map;
step 6, using an upsampling operation to map the prediction map back to the resolution of the original image;
step 7, using a Softmax classification layer to classify each pixel of the One-Hot-encoded prediction map, finally obtaining the image segmentation result.
2. The real-time semantic segmentation method based on the double-branch deep convolutional neural network according to claim 1, characterized in that step 2 retrains the deep convolutional neural network ResNet on the dataset and extracts deep semantic features, specifically as follows:
a ResNet-18 residual neural network model is trained on the preprocessed large-scale, high-resolution Cityscapes urban-scene semantic segmentation dataset and used as the extractor of deep semantic features; class prediction is performed for each pixel, the cross-entropy loss is calculated, and training is carried out with the back-propagation algorithm, the loss corresponding to each pixel being:

pixel_loss = -\sum_{c=1}^{classes} y_{true,c} \log(y_{pred,c})

where pixel_loss denotes the loss of each pixel after computation by the convolutional neural network, classes denotes all prediction categories of the semantic segmentation model, y_true denotes a One-Hot matrix in which each element belongs to a One-Hot vector and takes only the values 0 and 1 (1 if the category is the same as the sample category, 0 otherwise), and y_pred denotes the predicted probability that the sample belongs to the current category;

bp_loss = \sum_{i=1}^{h} \sum_{j=1}^{w} pixel_loss_{i,j}

where bp_loss denotes the total back-propagated loss of the whole image, w and h denote the width and height of the image, and pixel_loss_{i,j} denotes the loss of the pixel in the ith row and jth column.
3. The real-time semantic segmentation method based on the double-branch deep convolutional neural network according to claim 1, characterized in that in step 3 the designed global branch composed of normalization convolutional layers applies a normalization convolution to the feature maps from different ResNet stages to obtain feature maps of identical dimensions for channel-dimension combination, specifically as follows:
the deep convolutional neural network is built from the residual blocks of the residual network ResNet, which also avoids the overfitting caused by deepening the network, where the feature mapping realized by a residual block is:

y = F(x) + x

where x is the input feature map of the residual block, F(x) denotes the feature mapping function implemented by the residual block, and y denotes the output feature map after the residual block; this residual connection structure makes the network converge faster;
normalization convolutional layers at different stages of the ResNet residual network use the convolution operation to normalize feature maps with different channel dimensions and different spatial dimensions to feature maps of exactly the same size, realizing the feature fusion of high-dimensional and low-dimensional feature maps, where the normalization convolution is defined as:

y_{c,i,j} = \sum_{k_i} \sum_{k_j} w(k_c, 0, k_i, k_j) \, x(k_c, i+k_i, j+k_j) + b_{k,c}

where k, c, i, j denote the kth network layer and the cth channel, ith row, and jth column of the corresponding feature map; y_{c,i,j} is the feature value of the pixel at the corresponding position of the output feature map; w(k_c, 0, k_i, k_j) denotes the weight parameters of the convolution kernel in the convolution operation; x(k_c, i+k_i, j+k_j) denotes the feature values of the input feature map covered by the convolution kernel; and b_{k,c} denotes the bias parameter of the cth channel at the kth network layer.
4. The real-time semantic segmentation method based on the double-branch deep convolutional neural network according to claim 1, characterized in that in step 4 the shared feature layers and pooling layers are used to share the feature information of different stages of the ResNet residual network and to construct a local branch rich in detail information, specifically as follows:
feature maps are extracted at different stages of the residual network and learned with network layers such as pooling layers and upsampling layers, extracting rich image detail information as follows:

f_{i,j}(s)_{max} = \max_{0 \le m,n < K} s_{i+m, j+n}

f_{i,j}(s)_{avg} = \frac{1}{K^2} \sum_{m=0}^{K-1} \sum_{n=0}^{K-1} s_{i+m, j+n}

where f_{i,j}(s)_max denotes the max-pooling feature value; f_{i,j}(s)_avg denotes the average-pooling feature value; K denotes the size of the pooling kernel; i, j index the feature value computed at the ith row and jth column for the corresponding kernel position; and max and average denote the maximum and averaging operations, respectively.
5. The real-time semantic segmentation method based on the double-branch deep convolutional neural network according to claim 1, characterized in that step 6 uses an upsampling operation to map the prediction map back to the resolution of the original image:

f(x, y) \approx f(0,0)(1-x)(1-y) + f(1,0)\,x(1-y) + f(0,1)(1-x)\,y + f(1,1)\,xy

where x and y denote the abscissa and ordinate of the point in the coordinate system, and f(0,0), f(0,1), f(1,0), f(1,1) denote the values at the four known coordinate points used by the bilinear interpolation.
6. The real-time semantic segmentation method based on the double-branch deep convolutional neural network according to claim 1, characterized in that step 7 uses a Softmax classification layer to classify each pixel of the One-Hot-encoded prediction map, finally obtaining the image segmentation result:

P_i = \frac{e^{a_i}}{\sum_{k=1}^{K} e^{a_k}}

where P_i denotes the probability value of the ith target, k denotes the index of the current prediction category, K denotes the number of prediction categories of the semantic segmentation model, and a_i denotes the feature value of the ith target.
CN202110640607.6A 2021-06-09 2021-06-09 Real-time semantic segmentation method based on double-branch deep convolutional neural network Pending CN113421269A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110640607.6A CN113421269A (en) 2021-06-09 2021-06-09 Real-time semantic segmentation method based on double-branch deep convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110640607.6A CN113421269A (en) 2021-06-09 2021-06-09 Real-time semantic segmentation method based on double-branch deep convolutional neural network

Publications (1)

Publication Number Publication Date
CN113421269A true CN113421269A (en) 2021-09-21

Family

ID=77788042

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110640607.6A Pending CN113421269A (en) 2021-06-09 2021-06-09 Real-time semantic segmentation method based on double-branch deep convolutional neural network

Country Status (1)

Country Link
CN (1) CN113421269A (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106952220A (en) * 2017-03-14 2017-07-14 长沙全度影像科技有限公司 A kind of panoramic picture fusion method based on deep learning
US20200160065A1 (en) * 2018-08-10 2020-05-21 Naver Corporation Method for training a convolutional recurrent neural network and for semantic segmentation of inputted video using the trained convolutional recurrent neural network
CN109711413A (en) * 2018-12-30 2019-05-03 陕西师范大学 Image, semantic dividing method based on deep learning
CN109919869A (en) * 2019-02-28 2019-06-21 腾讯科技(深圳)有限公司 A kind of image enchancing method, device and storage medium
CN111768415A (en) * 2020-06-15 2020-10-13 哈尔滨工程大学 Image instance segmentation method without quantization pooling
AU2020103901A4 (en) * 2020-12-04 2021-02-11 Chongqing Normal University Image Semantic Segmentation Method Based on Deep Full Convolutional Network and Conditional Random Field

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
YUE LIU; ZHICHAO LIAN: "PSDNet: A Balanced Architecture of Accuracy and Parameters for Semantic Segmentation", 2020 25th International Conference on Pattern Recognition (ICPR) *
刘悦 (Liu Yue): "Research on Road Anomaly Detection Based on Semantic Segmentation", China Master's Theses Full-text Database, Engineering Science and Technology II *
霍雨佳 (Huo Yujia): "Research on Visual Target Tracking and 3D Reconstruction Methods at the End Effector of a Robotic Arm", China Master's Theses Full-text Database, Information Science and Technology *
马天浩; 谭海; 李天琪; 吴雅男; 刘祺 (Ma Tianhao; Tan Hai; Li Tianqi; Wu Yanan; Liu Qi): "Road Extraction from GF-1 Imagery Using a Dilated-Convolution Residual Network with Multi-scale Feature Fusion", Laser & Optoelectronics Progress *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113688836A (en) * 2021-09-28 2021-11-23 四川大学 Real-time road image semantic segmentation method and system based on deep learning
CN114224354A (en) * 2021-11-15 2022-03-25 吉林大学 Arrhythmia classification method, device and readable storage medium
CN114224354B (en) * 2021-11-15 2024-01-30 吉林大学 Arrhythmia classification method, arrhythmia classification device, and readable storage medium
CN114399519A (en) * 2021-11-30 2022-04-26 西安交通大学 MR image 3D semantic segmentation method and system based on multi-modal fusion
CN114399519B (en) * 2021-11-30 2023-08-22 西安交通大学 MR image 3D semantic segmentation method and system based on multi-modal fusion
CN114332715A (en) * 2021-12-30 2022-04-12 武汉华信联创技术工程有限公司 Method, device and equipment for identifying snow through automatic meteorological observation and storage medium
CN114795258A (en) * 2022-04-18 2022-07-29 浙江大学 Child hip joint dysplasia diagnosis system
CN114943963A (en) * 2022-04-29 2022-08-26 南京信息工程大学 Remote sensing image cloud and cloud shadow segmentation method based on double-branch fusion network
CN115640418A (en) * 2022-12-26 2023-01-24 天津师范大学 Cross-domain multi-view target website retrieval method and device based on residual semantic consistency
CN116052110A (en) * 2023-03-28 2023-05-02 四川公路桥梁建设集团有限公司 Intelligent positioning method and system for pavement marking defects

Similar Documents

Publication Publication Date Title
CN113421269A (en) Real-time semantic segmentation method based on double-branch deep convolutional neural network
CN112541503B (en) Real-time semantic segmentation method based on context attention mechanism and information fusion
CN112101175A (en) Expressway vehicle detection and multi-attribute feature extraction method based on local images
CN112132156B (en) Image saliency target detection method and system based on multi-depth feature fusion
CN111612008B (en) Image segmentation method based on convolution network
CN108280397B (en) Human body image hair detection method based on deep convolutional neural network
CN111191583B (en) Space target recognition system and method based on convolutional neural network
CN112308860A (en) Earth observation image semantic segmentation method based on self-supervision learning
CN110706239B (en) Scene segmentation method fusing full convolution neural network and improved ASPP module
CN111291809A (en) Processing device, method and storage medium
Özkanoğlu et al. InfraGAN: A GAN architecture to transfer visible images to infrared domain
CN111696110B (en) Scene segmentation method and system
CN110717921B (en) Full convolution neural network semantic segmentation method of improved coding and decoding structure
CN110969171A (en) Image classification model, method and application based on improved convolutional neural network
WO2023030182A1 (en) Image generation method and apparatus
CN111179193B (en) Dermatoscope image enhancement and classification method based on DCNNs and GANs
CN111476133B (en) Unmanned driving-oriented foreground and background codec network target extraction method
CN110490155B (en) Method for detecting unmanned aerial vehicle in no-fly airspace
CN113269133A (en) Unmanned aerial vehicle visual angle video semantic segmentation method based on deep learning
CN113269224A (en) Scene image classification method, system and storage medium
CN113592894A (en) Image segmentation method based on bounding box and co-occurrence feature prediction
CN113298817A (en) High-accuracy semantic segmentation method for remote sensing image
CN113592893A (en) Image foreground segmentation method combining determined main body and refined edge
CN111368776B (en) High-resolution remote sensing image classification method based on deep ensemble learning
CN112132207A (en) Target detection neural network construction method based on multi-branch feature mapping

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination