CN113902925A - Semantic segmentation method and system based on deep convolutional neural network - Google Patents

Publication number
CN113902925A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111245617.6A
Other languages
Chinese (zh)
Inventor
汪春梅
李康
袁非牛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Normal University
University of Shanghai for Science and Technology
Original Assignee
Shanghai Normal University
Application filed by Shanghai Normal University
Priority to CN202111245617.6A
Publication of CN113902925A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The invention relates to a semantic segmentation method and system based on a deep convolutional neural network. The method comprises model training and model application, wherein the model training comprises the following steps: acquiring a semantic segmentation image data set and preprocessing the images in it; building a semantic segmentation network model that takes a variant of the ResNet50 backbone network as the encoder and a multi-scale mixed pooling structure together with a feature attention fusion module as the decoder; and training the semantic segmentation network model and setting its network parameters using the preprocessed data set to obtain the trained model. Compared with the prior art, the method effectively links encoding and decoding, fully extracts related information between different stages and within the same stage, effectively fuses low-level and high-level features, captures long-range dependencies and rich context information, and segments the relevant images efficiently and accurately.

Description

Semantic segmentation method and system based on deep convolutional neural network
Technical Field
The invention relates to the technical field of image processing and neural networks, in particular to a semantic segmentation method and a semantic segmentation system based on a deep convolutional neural network.
Background
Semantic segmentation is an important research field of computer vision and one of the key technologies for scene understanding. It aims to assign a semantic category label to every pixel in an image and to parse a scene image into regions associated with semantic categories. It is widely used in autonomous driving, medical image analysis and object detection. Semantic segmentation is challenging because it must combine dense pixel-level accuracy with multi-scale contextual reasoning; it must also cope with large variations in content, shape and scale within the same object class, high similarity between objects of different classes, and objects whose boundary regions are easily confused.
In recent years, most state-of-the-art semantic segmentation methods have relied on deep convolutional neural networks (CNNs), which show impressive capability on a variety of complex tasks and achieve end-to-end segmentation of full images with accuracy superior to traditional methods. A deep CNN extracts dense semantic representations from the input image and predicts pixel-level labels; with proper training, it acquires rich scene information through stacked convolutions, pooling and nonlinear activation functions. However, because convolution is local, convolutional features typically have a limited receptive field. Even when a method uses a large receptive field, the extracted features mainly describe the core region of an object and largely ignore the context around its boundary. Furthermore, objects of different classes may have similar local features; for example, tables and chairs may share similar local textures. Such detail makes learning difficult for deep networks and in turn leads to poor segmentation results.
Disclosure of Invention
The object of the invention is to overcome the above defects of the prior art by providing a semantic segmentation method and a semantic segmentation system based on a deep convolutional neural network.
The object of the invention is achieved by the following technical solution:
A semantic segmentation method based on a deep convolutional neural network comprises model training and model application, wherein the model training comprises the following steps:
acquiring a semantic segmentation image data set and preprocessing the images in the data set; building a semantic segmentation network model that takes a variant of the ResNet50 backbone network as the encoder and a multi-scale mixed pooling structure together with a feature attention fusion module as the decoder, wherein the encoder extracts features from an input image to obtain a feature map and the decoder obtains a segmentation result map from the feature map; and training the semantic segmentation network model and setting its network parameters using the preprocessed data set to obtain a trained semantic segmentation network model;
the model application specifically comprises: performing semantic segmentation on an image using the trained semantic segmentation network model to obtain a segmentation result map.
Further, the semantic segmentation network model takes a variant of the ResNet50 backbone network as the encoder. The ResNet50 backbone network comprises four layers; the input image is fed to the first layer and the output of each layer is fed to the next. The variant adds a channel attention module and uses dilated (atrous) convolution in the third and fourth layers. The outputs of the second and third layers of the backbone network are fed to the channel attention module; the output of the channel attention module is combined with the output of the third layer by element-wise summation and the result is fed to the fourth layer. The output of the fourth layer is the feature map extracted by the encoder.
Further, the semantic segmentation network model uses a multi-scale mixed pooling structure and a feature attention fusion module as the decoder. The input of the multi-scale mixed pooling structure is the output of the fourth layer of the backbone network, and the output of the multi-scale mixed pooling structure is fed, together with the output of the third layer of the backbone network, to the feature attention fusion module. The multi-scale mixed pooling structure comprises a conventional pooling module and an unconventional pooling module, each suited to the characteristics of different objects, which improves the segmentation of irregularly shaped objects.
Further, the conventional pooling module comprises m adaptive average pooling blocks of size k × k, where k > 0 and may be, for example, 1, 2, …, 6; the feature map output by the fourth layer of the backbone network is reduced in dimension and then fed to the m adaptive average pooling blocks. The unconventional pooling module comprises an s × 1 pooling block and a 1 × s pooling block; the feature map output by the fourth layer of the backbone network is reduced in dimension and then fed to both blocks, and the output of the unconventional pooling module is restored to size s × s by matrix multiplication and convolution. Finally, the feature maps output by the conventional and unconventional pooling modules are upsampled to the size of the input feature map, merged and reduced in dimension to give the output of the multi-scale mixed pooling structure, namely the mixed-pooled feature map.
Further, the feature attention fusion module is described as follows: the output of the multi-scale mixed pooling structure is a feature map X1, and the output of the third layer of the backbone network is a feature map X2. In the feature attention fusion module, X2 passes through global average pooling, convolution, batch normalization and a Sigmoid activation function to generate attention coefficients, which are used to reweight each channel of X1. In other words, the module applies a squeeze-and-excitation operation to the feature map output by the third layer of the backbone network, and the resulting attention coefficients weight the feature map output by the multi-scale mixed pooling structure, which further improves the segmentation accuracy of the semantic segmentation network model.
Furthermore, the semantic segmentation network model also comprises a context embedding block, a global average pooling layer and a classification module. The feature map output by the encoder is fed to the context embedding block and then passes through the global average pooling layer and the classification module in turn. The classification module comprises a fully connected layer and a Sigmoid function; during network training, the fully connected layer learns global information about the corresponding features so as to emphasize useful features and suppress useless ones. The output of the classification module is multiplied element-wise with the output of the feature attention fusion module, which adjusts the relevant parameters of the semantic segmentation network model, optimizes network performance and refines the segmentation of the image to yield the segmentation result map.
Furthermore, two auxiliary losses are introduced. A classification auxiliary loss is computed between the output of the classification module and the classification layer (classification labels) to further refine the segmentation of the image, and a segmentation auxiliary loss is computed between the output of the semantic segmentation network model, namely the segmentation result map, and the segmentation layer (semantic labels) to adjust the training parameters of the semantic segmentation network model.
Further, preprocessing the images in the semantic segmentation data set specifically comprises: cropping, flipping, translating and scaling each image together with its mask, thereby expanding the semantic segmentation data set.
Further, training the semantic segmentation network model and setting its network parameters using the preprocessed semantic segmentation image data set specifically comprises:
taking the images in the preprocessed data set as input images and resizing them to a uniform size, wherein the weights of the semantic segmentation network model are initialized with Kaiming initialization, and the model is trained by stochastic gradient descent with momentum for 30,000 iterations with a weight decay of 1e-5, a momentum of 0.9, a batch size of 4, an initial learning rate of 0.001 and a Poly learning-rate schedule.
A semantic segmentation system based on a deep convolutional neural network comprises:
an image acquisition module, configured to acquire a scene image and preprocess it;
a semantic segmentation module, configured to perform semantic segmentation on the scene image using a trained semantic segmentation network model and to output a segmentation result map;
wherein the training process of the semantic segmentation network model specifically comprises the following steps:
acquiring a semantic segmentation image data set and preprocessing the images in the data set; building a semantic segmentation network model that takes a variant of the ResNet50 backbone network as the encoder and a multi-scale mixed pooling structure together with a feature attention fusion module as the decoder, wherein the encoder extracts features from an input image to obtain a feature map and the decoder obtains a segmentation result map from the feature map; and training the semantic segmentation network model and setting its network parameters using the preprocessed data set to obtain the trained semantic segmentation network model.
Further, the semantic segmentation network model takes a variant of the ResNet50 backbone network as the encoder. The variant adds a channel attention module and uses dilated (atrous) convolution in the third and fourth layers. The outputs of the second and third layers of the backbone network are fed to the channel attention module; the output of the channel attention module is combined with the output of the third layer by element-wise summation and the result is fed to the fourth layer. The output of the fourth layer is the feature map extracted by the encoder.
Compared with the prior art, the invention improves the encoder-decoder structure and designs dedicated modules that fully extract related information between features and within features; together these modules extract rich multi-scale and global information from different images, thereby achieving dense classification of every pixel. Specifically, the two-input channel attention module captures internal correlations within the encoding stage; the multi-scale mixed pooling structure effectively captures multi-scale features of different images; the feature attention fusion module captures external dependencies between the encoding and decoding stages; and the classification module further optimizes network performance and refines the segmentation of the image. The invention can therefore perform semantic segmentation of images efficiently and accurately.
Drawings
FIG. 1 is a schematic diagram of a semantic segmentation network model;
FIG. 2 is a schematic diagram of the multi-scale mixed pooling structure;
FIG. 3 is a flow chart of training of a semantic segmentation network model;
FIG. 4 is a graph of segmentation results obtained using a semantic segmentation network model;
FIG. 5 is a graph of segmentation results obtained using a semantic segmentation network model;
FIG. 6 is a graph of segmentation results obtained using a semantic segmentation network model;
FIG. 7 is a graph of segmentation results obtained using a semantic segmentation network model;
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. The present embodiment is implemented on the premise of the technical solution of the present invention, and a detailed implementation manner and a specific operation process are given, but the scope of the present invention is not limited to the following embodiments.
Example 1:
A semantic segmentation method based on a deep convolutional neural network comprises model training and model application. The model training is shown in FIG. 3. This embodiment is based on the Python language and uses the open-source PyTorch as the neural network framework to build the semantic segmentation network model; the model is trained on a semantic segmentation image data set to find the optimal model parameters. The method comprises the following steps:
acquiring a semantic segmentation image data set and preprocessing the images in the data set; building a semantic segmentation network model, as shown in FIG. 1, that takes a variant of the ResNet50 backbone network as the encoder and a multi-scale mixed pooling structure together with a feature attention fusion module as the decoder, wherein the encoder extracts features from an input image to obtain a feature map and the decoder obtains a segmentation result map from the feature map; and training the semantic segmentation network model and setting its network parameters using the preprocessed data set to obtain a trained semantic segmentation network model;
the model application specifically comprises: performing semantic segmentation on an image using the trained semantic segmentation network model to obtain a segmentation result map.
Preprocessing the images in the semantic segmentation data set specifically comprises: cropping, flipping, translating and scaling the images and their masks, for example scaling the length and width by a preset ratio. This expands the semantic segmentation data set and can improve the generalization ability of the semantic segmentation network model.
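A minimal sketch of the joint augmentation idea, assuming images and masks are plain 2-D Python lists of equal size. It only illustrates that each geometric transform (here a horizontal flip and a crop) has to be applied with identical parameters to an image and its mask; the function name `random_flip_and_crop`, its parameters and the 0.5 flip probability are illustrative and do not come from the patent.

```python
import random


def random_flip_and_crop(image, mask, crop_h, crop_w, rng=random):
    """Apply the same random horizontal flip and crop to an image and its mask.

    image, mask: 2-D lists (rows of pixel values) with identical height and width.
    rng: any object offering random() and randint(), e.g. random.Random(seed).
    """
    height, width = len(image), len(image[0])

    # Flip both arrays together so labels stay aligned with pixels.
    if rng.random() < 0.5:
        image = [row[::-1] for row in image]
        mask = [row[::-1] for row in mask]

    # Draw one crop window and reuse it for both arrays.
    top = rng.randint(0, height - crop_h)
    left = rng.randint(0, width - crop_w)
    image = [row[left:left + crop_w] for row in image[top:top + crop_h]]
    mask = [row[left:left + crop_w] for row in mask[top:top + crop_h]]
    return image, mask
```

Translation and scaling would follow the same pattern: sample the parameters once, then apply them to both arrays (with nearest-neighbour resampling for the mask so that label values are not blended).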
The constructed semantic segmentation network model is shown in FIG. 1:
1) A variant of the ResNet50 backbone network serves as the encoder. The ResNet50 backbone network comprises four layers; the input image is fed to the first layer and the output of each layer is fed to the next. The variant adds a channel attention module and uses dilated (atrous) convolution in the third and fourth layers. The outputs of the second and third layers of the backbone network are fed to the channel attention module; the output of the channel attention module is combined with the output of the third layer by element-wise summation and the result is fed to the fourth layer. The output of the fourth layer is the feature map extracted by the encoder.
2) A multi-scale mixed pooling structure and a feature attention fusion module serve as the decoder. The input of the multi-scale mixed pooling structure is the output of the fourth layer of the backbone network, and the output of the multi-scale mixed pooling structure is fed, together with the output of the third layer of the backbone network, to the feature attention fusion module. The multi-scale mixed pooling structure comprises a conventional pooling module and an unconventional pooling module, each suited to the characteristics of different objects, which improves the segmentation of irregularly shaped objects.
The multi-scale mixed pooling structure is shown in FIG. 2; the feature map output by the encoder is fed into it. The conventional pooling module comprises m adaptive average pooling blocks of size k × k, where k > 0 and may be, for example, 1, 2, …, 6; the feature map output by the fourth layer of the backbone network is reduced in dimension and then fed to the m adaptive average pooling blocks. The unconventional pooling module comprises an s × 1 pooling block and a 1 × s pooling block; the feature map output by the fourth layer of the backbone network is reduced in dimension and then fed to both blocks, and the output of the unconventional pooling module is restored to size s × s by matrix multiplication and convolution. Finally, the feature maps output by the conventional and unconventional pooling modules are upsampled to the size of the input feature map, that is, the size of the feature map output by the encoder, and are then merged and reduced in dimension to give the output of the multi-scale mixed pooling structure, namely the mixed-pooled feature map, which is fed to the feature attention fusion module.
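The data flow described above can be sketched in PyTorch as follows. This is a rough illustration under several assumptions that the patent does not fix: dimensionality reduction is taken to be a 1 × 1 convolution, the bin sizes `(1, 2, 3, 6)` and strip length `s=8` are arbitrary, upsampling is bilinear, and "merging" is taken to be channel concatenation. The class name `MixedPooling` and all parameter names are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MixedPooling(nn.Module):
    """Sketch of a multi-scale mixed pooling block (square bins plus strips)."""

    def __init__(self, in_ch, mid_ch, out_ch, bins=(1, 2, 3, 6), s=8):
        super().__init__()
        # Assumed: dimensionality reduction is a 1x1 convolution.
        self.reduce = nn.Conv2d(in_ch, mid_ch, kernel_size=1, bias=False)
        # Conventional branch: one k x k adaptive average pooling per bin size.
        self.square_pools = nn.ModuleList([nn.AdaptiveAvgPool2d(k) for k in bins])
        # Unconventional branch: s x 1 and 1 x s strip pooling.
        self.col_pool = nn.AdaptiveAvgPool2d((s, 1))
        self.row_pool = nn.AdaptiveAvgPool2d((1, s))
        self.strip_conv = nn.Conv2d(mid_ch, mid_ch, kernel_size=3, padding=1, bias=False)
        # Final dimensionality reduction after merging all branches.
        self.project = nn.Conv2d(mid_ch * (len(bins) + 1), out_ch, kernel_size=1, bias=False)

    def forward(self, x):
        size = x.shape[-2:]
        r = self.reduce(x)
        feats = []
        for pool in self.square_pools:
            pooled = pool(r)
            feats.append(F.interpolate(pooled, size=size, mode="bilinear", align_corners=False))
        # (B, C, s, 1) @ (B, C, 1, s) -> (B, C, s, s), followed by a convolution.
        strip = torch.matmul(self.col_pool(r), self.row_pool(r))
        strip = self.strip_conv(strip)
        feats.append(F.interpolate(strip, size=size, mode="bilinear", align_corners=False))
        # Assumed: branches are merged by channel concatenation.
        return self.project(torch.cat(feats, dim=1))
```

The output keeps the spatial size of the input feature map, so it can be passed directly to a fusion step that expects encoder-resolution features.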
The feature attention fusion module is described in detail as follows: the output of the multi-scale mixed pooling structure is a feature map X1, and the output of the third layer of the backbone network is a feature map X2. In the feature attention fusion module, X2 passes through global average pooling, convolution, batch normalization and a Sigmoid activation function to generate attention coefficients, which are used to reweight each channel of X1. In other words, the module applies a squeeze-and-excitation operation to the feature map output by the third layer of the backbone network, and the resulting attention coefficients weight the feature map output by the multi-scale mixed pooling structure, which further improves the segmentation accuracy of the semantic segmentation network model.
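A minimal PyTorch sketch of this gating step, following the order of operations given in the text (global average pooling, convolution, batch normalization, Sigmoid). The 1 × 1 kernel size, the class name `FeatureAttentionFusion` and the channel arguments are assumptions made for illustration.

```python
import torch
import torch.nn as nn


class FeatureAttentionFusion(nn.Module):
    """Reweight each channel of X1 with coefficients computed from X2."""

    def __init__(self, x2_channels, x1_channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),  # global average pooling -> (B, C2, 1, 1)
            nn.Conv2d(x2_channels, x1_channels, kernel_size=1, bias=False),  # assumed 1x1
            nn.BatchNorm2d(x1_channels),
            nn.Sigmoid(),  # coefficients in (0, 1), one per channel of X1
        )

    def forward(self, x1, x2):
        # The (B, C1, 1, 1) coefficients broadcast over the spatial dimensions of X1.
        return x1 * self.gate(x2)
```

Because the batch normalization here sees a 1 × 1 spatial map, PyTorch needs more than one sample per batch in training mode; the batch size of 4 stated in the training settings satisfies this.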
3) The semantic segmentation network model also comprises a context embedding block, a global average pooling layer and a classification module. The feature map output by the encoder is fed to the context embedding block and then passes through the global average pooling layer and the classification module in turn. The classification module comprises a fully connected layer and a Sigmoid function; during network training, the fully connected layer learns global information about the corresponding features so as to emphasize useful features and suppress useless ones. The output of the classification module and the output of the feature attention fusion module are combined by element-wise product, which adjusts the relevant parameters of the semantic segmentation network model, optimizes network performance and refines the segmentation of the image. The resulting segmentation result map is then restored to the size of the input image.
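The classification branch can be sketched as below. The patent does not specify the internals of the context embedding block, so a single 3 × 3 convolution with batch normalization and ReLU stands in for it here. The sketch also assumes that the decoder output has already been projected to one score map per class, so that the per-class Sigmoid outputs can be broadcast over it; the name `ClassificationGate` and its arguments are hypothetical.

```python
import torch
import torch.nn as nn


class ClassificationGate(nn.Module):
    """Predict per-class presence from encoder features and gate the score maps."""

    def __init__(self, enc_channels, mid_channels, num_classes):
        super().__init__()
        # Hypothetical stand-in for the context embedding block.
        self.context = nn.Sequential(
            nn.Conv2d(enc_channels, mid_channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)  # global average pooling
        self.fc = nn.Linear(mid_channels, num_classes)  # fully connected layer

    def forward(self, enc_feat, seg_scores):
        pooled = self.pool(self.context(enc_feat)).flatten(1)  # (B, mid_channels)
        presence = torch.sigmoid(self.fc(pooled))  # (B, num_classes)
        # Element-wise product, broadcast over height and width.
        gated = seg_scores * presence[:, :, None, None]
        return gated, presence
```

Returning `presence` alongside the gated scores keeps it available for a classification loss during training.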
Two auxiliary losses are introduced, where the classification layer holds the classification labels and the segmentation layer holds the semantic labels. A classification auxiliary loss is computed between the output of the classification module and the classification layer to further refine the segmentation of the image, and a segmentation auxiliary loss is computed between the output of the semantic segmentation network model, namely the segmentation result map, and the segmentation layer to adjust the training parameters of the semantic segmentation network model.
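One plausible way to combine a classification term and a segmentation term is sketched below. The patent does not state the loss functions or their weights, so the choices here are assumptions: cross-entropy for the segmentation term, binary cross-entropy for the classification term, classification labels derived from which classes occur in the mask, a weight `cls_weight=0.4` and an ignore label of 255.

```python
import torch
import torch.nn.functional as F


def class_presence(mask, num_classes, ignore_index=255):
    """Build (B, num_classes) 0/1 targets marking which classes occur in each mask."""
    target = torch.zeros(mask.size(0), num_classes, device=mask.device)
    for i in range(mask.size(0)):
        labels = mask[i].unique()
        labels = labels[labels != ignore_index]
        target[i, labels] = 1.0
    return target


def total_loss(seg_logits, presence_prob, mask, cls_weight=0.4, ignore_index=255):
    """Segmentation loss plus a weighted classification loss (weights are assumed).

    seg_logits: (B, num_classes, H, W) raw scores.
    presence_prob: (B, num_classes) Sigmoid outputs of the classification module.
    mask: (B, H, W) integer class labels.
    """
    seg_term = F.cross_entropy(seg_logits, mask, ignore_index=ignore_index)
    cls_target = class_presence(mask, seg_logits.size(1), ignore_index)
    cls_term = F.binary_cross_entropy(presence_prob, cls_target)
    return seg_term + cls_weight * cls_term
```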
As noted above, preprocessing consists of cropping, flipping, translating and scaling each image together with its mask, thereby expanding the semantic segmentation data set.
Training the semantic segmentation network model and setting its network parameters using the preprocessed semantic segmentation image data set specifically comprises:
taking the images in the preprocessed data set as input images and resizing them to a uniform size, wherein the weights of the semantic segmentation network model are initialized with Kaiming initialization, and the model is trained by stochastic gradient descent with momentum for 30,000 iterations with a weight decay of 1e-5, a momentum of 0.9, a batch size of 4, an initial learning rate of 0.001 and a Poly learning-rate schedule.
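The Poly schedule is commonly written as lr = base_lr × (1 − iter / max_iter)^power. The sketch below uses the initial learning rate and iteration count given above; the exponent `power=0.9` is a widely used default and an assumption here, since the patent does not state it.

```python
def poly_lr(base_lr, iteration, max_iter, power=0.9):
    """Polynomial ("Poly") learning-rate decay.

    base_lr: initial learning rate (0.001 in the settings above).
    iteration: current iteration, 0 <= iteration <= max_iter.
    max_iter: total number of iterations (30000 in the settings above).
    power: decay exponent; 0.9 is assumed, not taken from the patent.
    """
    return base_lr * (1.0 - iteration / max_iter) ** power


# Learning rate at the start, halfway through, and at the end of training.
schedule = [poly_lr(0.001, it, 30000) for it in (0, 15000, 30000)]
```

In PyTorch, the remaining settings would typically map onto `torch.optim.SGD(params, lr=0.001, momentum=0.9, weight_decay=1e-5)`, with the learning rate of each parameter group updated from `poly_lr` at every iteration.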
A semantic segmentation system based on a deep convolutional neural network comprises:
an image acquisition module, configured to acquire a scene image and preprocess it;
a semantic segmentation module, configured to perform semantic segmentation on the scene image using a trained semantic segmentation network model and to output a segmentation result map;
wherein the training process of the semantic segmentation network model specifically comprises the following steps:
acquiring a semantic segmentation image data set and preprocessing the images in the data set; building a semantic segmentation network model that takes a variant of the ResNet50 backbone network as the encoder and a multi-scale mixed pooling structure together with a feature attention fusion module as the decoder, wherein the encoder extracts features from an input image to obtain a feature map and the decoder obtains a segmentation result map from the feature map; and training the semantic segmentation network model and setting its network parameters using the preprocessed data set to obtain the trained semantic segmentation network model.
Taking real-time semantic segmentation of video as an example, the camera function of the OpenCV (Open Source Computer Vision Library) library is called to read a live picture; the video is then processed frame by frame, the image format is converted, and the preprocessed frames are fed one by one to the trained semantic segmentation network model, which outputs the corresponding segmentation masks in real time. FIGS. 4, 5, 6 and 7 show the segmentation results for different street-view images.
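A sketch of such a frame-by-frame loop is given below. It assumes that the trained model maps a 1 × 3 × H × W float tensor to 1 × num_classes × H × W scores and that scaling pixel values to [0, 1] is sufficient preprocessing; the actual normalization and input size used in the embodiment are not stated. The helper names `frame_to_tensor`, `segment_frame` and `run_camera` are illustrative.

```python
import numpy as np
import torch


def frame_to_tensor(frame_rgb):
    """Convert an H x W x 3 uint8 RGB frame to a 1 x 3 x H x W float tensor in [0, 1]."""
    array = np.ascontiguousarray(frame_rgb)
    tensor = torch.from_numpy(array).permute(2, 0, 1).float() / 255.0
    return tensor.unsqueeze(0)


@torch.no_grad()
def segment_frame(model, frame_rgb):
    """Return an H x W uint8 mask holding the highest-scoring class per pixel."""
    scores = model(frame_to_tensor(frame_rgb))
    return scores.argmax(dim=1)[0].to(torch.uint8).cpu().numpy()


def run_camera(model, camera_index=0):
    """Yield (frame, mask) pairs from a camera until no more frames can be read."""
    import cv2  # imported here so the helpers above can be used without OpenCV

    capture = cv2.VideoCapture(camera_index)
    try:
        while True:
            ok, frame_bgr = capture.read()
            if not ok:
                break
            # OpenCV delivers BGR frames; convert to RGB before inference.
            frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
            yield frame_rgb, segment_frame(model, frame_rgb)
    finally:
        capture.release()
```

Calling `model.eval()` before the loop keeps batch normalization layers in inference mode.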
The invention provides a segmentation method based on a variant of the ResNet50 backbone network and mixed pooling. During training, the preprocessed semantic segmentation images are fed to a deeper encoder so that the network can learn more image feature information. In addition, a channel attention module is added, and transfer learning is used to initialize the network architecture, which speeds up the learning of image features and hence model convergence.
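The encoder-side wiring described in the embodiment (second- and third-layer outputs into a channel attention module, then an element-wise sum with the third-layer output) could look roughly like the sketch below. The patent does not give the internals of the channel attention module, so a squeeze-and-excitation style design over the concatenated channel descriptors is assumed; the class name `TwoInputChannelAttention` and the `reduction` factor are hypothetical. The channel counts 512 and 1024 are those of the second and third stages of a standard ResNet50.

```python
import torch
import torch.nn as nn


class TwoInputChannelAttention(nn.Module):
    """Reweight third-layer channels using descriptors from layers two and three."""

    def __init__(self, c2, c3, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Assumed design: bottleneck MLP over the concatenated channel descriptors.
        self.fc = nn.Sequential(
            nn.Linear(c2 + c3, c3 // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(c3 // reduction, c3),
            nn.Sigmoid(),
        )

    def forward(self, f2, f3):
        descriptor = torch.cat(
            [self.pool(f2).flatten(1), self.pool(f3).flatten(1)], dim=1
        )
        weights = self.fc(descriptor).view(f3.size(0), -1, 1, 1)
        return f3 * weights


# Stand-ins for the second- and third-layer outputs of the backbone.
f2 = torch.randn(2, 512, 16, 16)
f3 = torch.randn(2, 1024, 16, 16)
attention = TwoInputChannelAttention(512, 1024)
layer4_input = attention(f2, f3) + f3  # element-wise sum before the fourth layer
```

With torchvision, a dilated backbone of this kind can be obtained with `torchvision.models.resnet50(replace_stride_with_dilation=[False, True, True])`, whose `layer3` and `layer4` then use dilation instead of striding.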
In the decoding stage, a multi-scale mixed pooling structure is designed. It comprises a conventional pooling module and an unconventional pooling module, and connects multi-scale mixed pooling (MMP) modules of different scales in parallel to aggregate multi-scale feature maps efficiently. A feature attention fusion module is built after the multi-scale mixed pooling structure: it applies a squeeze-and-excitation operation to the feature map output by the third layer of the backbone network in the encoding stage and generates attention coefficients that weight the feature map output by the multi-scale mixed pooling structure in the decoding stage, further improving the segmentation accuracy of the network.
In addition, a context embedding block, a global average pooling layer and a classification module are arranged in parallel with the decoder. The output of the classification module is multiplied element-wise with the output of the feature attention fusion module to obtain the segmentation result map, which enhances the segmentation effect.
Overall, although objects of different classes may have similar local features and the associated detail makes segmentation harder, the method effectively links encoding and decoding, fully extracts related information between different stages and within the same stage, effectively fuses low-level and high-level features, and captures long-range dependencies and rich context information, so that it can carry out the semantic segmentation of the relevant images efficiently and accurately.
The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions available to those skilled in the art through logic analysis, reasoning and limited experiments based on the prior art according to the concept of the present invention should be within the scope of protection defined by the claims.

Claims (10)

1. A semantic segmentation method based on a deep convolutional neural network is characterized by comprising model training and model application, wherein the model training comprises the following steps:
acquiring a semantic segmentation image data set, and preprocessing images in the semantic segmentation data set; building a semantic segmentation network model, taking a modified ResNet50 backbone network as an encoder, and taking a multi-scale mixed pool structure and a feature attention fusion module as a decoder, wherein the encoder performs feature extraction on an input image to obtain a feature map, and the decoder obtains a segmentation result map based on the feature map; performing model training and network parameter setting of the semantic segmentation network model by using the preprocessed semantic segmentation image data set to obtain a trained semantic segmentation network model;
the model application specifically comprises: performing semantic segmentation on an image by using the trained semantic segmentation network model to obtain a segmentation result map.
2. The semantic segmentation method based on the deep convolutional neural network of claim 1, wherein the constructed semantic segmentation network model takes the modified ResNet50 backbone network as the encoder; the modified ResNet50 backbone network is a ResNet50 backbone network that includes a channel attention module and uses dilated (atrous) convolution in its third and fourth layers; the outputs of the second and third layers of the backbone network are fed to the channel attention module; the output of the channel attention module and the output of the third layer of the backbone network are combined by element-wise sum and fed to the fourth layer of the backbone network; and the output of the fourth layer of the backbone network is the feature map extracted by the encoder.
3. The deep convolutional neural network-based semantic segmentation method according to claim 2, wherein the constructed semantic segmentation network model uses a multi-scale mixed pool structure and a feature attention fusion module as the decoder, the multi-scale mixed pool structure comprises a conventional pooling module and an unconventional pooling module, the input of the multi-scale mixed pool structure is the output of the fourth layer of the backbone network, and the output of the multi-scale mixed pool structure and the output of the third layer of the backbone network are fed together to the feature attention fusion module.
4. The deep convolutional neural network-based semantic segmentation method as claimed in claim 3, wherein the conventional pooling module comprises m adaptive average pooling operations of size k × k, where k > 0, and the unconventional pooling module uses s × 1 and 1 × s pooling.
5. The semantic segmentation method based on the deep convolutional neural network as claimed in claim 3, wherein the feature attention fusion module operates as follows: the output of the multi-scale mixed pool structure is a feature map X1, the output of the third layer of the backbone network is a feature map X2, and, in the feature attention fusion module, the feature map X2 passes through global average pooling, convolution, batch normalization and a Sigmoid activation function to generate attention coefficients, which are used to adjust each channel of the feature map X1.
6. The semantic segmentation method based on the deep convolutional neural network as claimed in claim 1, wherein the constructed semantic segmentation network model further comprises a context embedding block, a global average pooling layer and a classification module; the feature map output by the encoder is fed into the context embedding block and then passes through the global average pooling layer and the classification module in sequence, wherein the classification module comprises a fully connected layer and a Sigmoid function; and the output of the classification module is multiplied element-wise with the output of the feature attention fusion module to obtain the segmentation result map.
7. The semantic segmentation method based on the deep convolutional neural network as claimed in claim 1, wherein the preprocessing of the images in the semantic segmentation data set specifically comprises: performing cropping, flipping, translation and scaling operations on the images and their masks in the semantic segmentation data set, thereby expanding the semantic segmentation data set.
8. The semantic segmentation method based on the deep convolutional neural network as claimed in claim 1, wherein the model training and network parameter setting of the semantic segmentation network model using the preprocessed semantic segmentation image dataset specifically comprise:
taking the images in the preprocessed semantic segmentation image data set as input images and resizing the input images to a uniform size, wherein the weights of the semantic segmentation network model are initialized with Kaiming initialization, the semantic segmentation network model is trained using stochastic gradient descent with momentum, the number of iterations is 30000, the weight decay is 1e-5, the momentum is 0.9, the batch size is 4, the initial learning rate is 0.001, and the learning rate schedule is Poly.
9. A semantic segmentation system based on a deep convolutional neural network, which is based on the semantic segmentation method based on the deep convolutional neural network as claimed in any one of claims 1 to 8, and comprises:
the image acquisition module is used for acquiring a scene image and preprocessing the scene image;
the semantic segmentation module is used for performing semantic segmentation on the scene image by using the trained semantic segmentation network model and outputting a segmentation result map;
the training process of the semantic segmentation network model specifically comprises the following steps:
acquiring a semantic segmentation image data set, and preprocessing images in the semantic segmentation data set; building a semantic segmentation network model, taking a modified ResNet50 backbone network as an encoder, and taking a multi-scale mixed pool structure and a feature attention fusion module as a decoder, wherein the encoder performs feature extraction on an input image to obtain a feature map, and the decoder obtains a segmentation result map based on the feature map; and performing model training and network parameter setting of the semantic segmentation network model by using the preprocessed semantic segmentation image data set to obtain the trained semantic segmentation network model.
10. The semantic segmentation system based on the deep convolutional neural network of claim 9, wherein the constructed semantic segmentation network model takes the modified ResNet50 backbone network as the encoder; the modified ResNet50 backbone network is a ResNet50 backbone network that includes a channel attention module and uses dilated (atrous) convolution in its third and fourth layers; the outputs of the second and third layers of the backbone network are fed to the channel attention module; the output of the channel attention module and the output of the third layer of the backbone network are combined by element-wise sum and fed to the fourth layer of the backbone network; and the output of the fourth layer of the backbone network is the feature map extracted by the encoder.
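
The training settings recited in claim 8 can be collected into a small configuration, and the Poly learning-rate strategy can be sketched as a function of the iteration index. This is a hedged sketch: the claim names the Poly strategy but does not give its exponent, so `power=0.9` below is an assumption (a commonly used value), and the dictionary keys and the function name `poly_lr` are illustrative rather than taken from the patent.

```python
# Hyperparameters as stated in claim 8; key names are illustrative.
TRAIN_CONFIG = {
    "weight_init": "kaiming",
    "optimizer": "sgd_with_momentum",
    "max_iterations": 30000,
    "weight_decay": 1e-5,
    "momentum": 0.9,
    "batch_size": 4,
    "base_lr": 0.001,
    "lr_policy": "poly",
}


def poly_lr(base_lr, step, max_steps, power=0.9):
    """Poly decay: base_lr * (1 - step / max_steps) ** power.

    `power` is not specified in the source; 0.9 is an assumed default.
    """
    if max_steps <= 0:
        raise ValueError("max_steps must be positive")
    if not 0 <= step <= max_steps:
        raise ValueError("step must lie in [0, max_steps]")
    return base_lr * (1.0 - step / max_steps) ** power


# Learning rate at a few checkpoints of the 30000-iteration schedule.
checkpoints = [0, 10000, 20000, 30000]
schedule = [
    poly_lr(TRAIN_CONFIG["base_lr"], s, TRAIN_CONFIG["max_iterations"])
    for s in checkpoints
]
```

Under this schedule the learning rate starts at the initial value of 0.001 and decays smoothly to zero at the final iteration; a larger exponent would make the decay steeper early in training.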
CN202111245617.6A 2021-10-26 2021-10-26 Semantic segmentation method and system based on deep convolutional neural network Pending CN113902925A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111245617.6A CN113902925A (en) 2021-10-26 2021-10-26 Semantic segmentation method and system based on deep convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111245617.6A CN113902925A (en) 2021-10-26 2021-10-26 Semantic segmentation method and system based on deep convolutional neural network

Publications (1)

Publication Number Publication Date
CN113902925A true CN113902925A (en) 2022-01-07

Family

ID=79026154

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111245617.6A Pending CN113902925A (en) 2021-10-26 2021-10-26 Semantic segmentation method and system based on deep convolutional neural network

Country Status (1)

Country Link
CN (1) CN113902925A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114677514A (en) * 2022-04-19 2022-06-28 苑永起 Underwater image semantic segmentation model based on deep learning
CN115995002A (en) * 2023-03-24 2023-04-21 南京信息工程大学 Network construction method and urban scene real-time semantic segmentation method
CN117764995A (en) * 2024-02-22 2024-03-26 浙江首鼎视介科技有限公司 biliary pancreas imaging system and method based on deep neural network algorithm
CN117764995B (en) * 2024-02-22 2024-05-07 浙江首鼎视介科技有限公司 Biliary pancreas imaging system and method based on deep neural network algorithm

Similar Documents

Publication Publication Date Title
CN111210443B (en) Deformable convolution mixing task cascading semantic segmentation method based on embedding balance
CN109410239B (en) Text image super-resolution reconstruction method based on condition generation countermeasure network
CN109711463B (en) Attention-based important object detection method
CN113902925A (en) Semantic segmentation method and system based on deep convolutional neural network
Li et al. Single image snow removal via composition generative adversarial networks
CN114723760B (en) Portrait segmentation model training method and device and portrait segmentation method and device
CN115393396B (en) Unmanned aerial vehicle target tracking method based on mask pre-training
CN113066089B (en) Real-time image semantic segmentation method based on attention guide mechanism
CN111414860A (en) Real-time portrait tracking and segmenting method
CN110852199A (en) Foreground extraction method based on double-frame coding and decoding model
CN114333062A (en) Pedestrian re-recognition model training method based on heterogeneous dual networks and feature consistency
CN111723934B (en) Image processing method and system, electronic device and storage medium
CN117152438A (en) Lightweight street view image semantic segmentation method based on improved deep LabV3+ network
CN117097853A (en) Real-time image matting method and system based on deep learning
CN116246109A (en) Multi-scale hole neighborhood attention computing backbone network model and application thereof
CN114821061A (en) Context aggregation network and image real-time semantic segmentation method based on same
CN112164078B (en) RGB-D multi-scale semantic segmentation method based on encoder-decoder
CN114463734A (en) Character recognition method and device, electronic equipment and storage medium
Zhang et al. Research on rainy day traffic sign recognition algorithm based on PMRNet
Ma et al. Rtsnet: Real-time semantic segmentation network for outdoor scenes
Zamanian et al. Improvement in accuracy and speed of image semantic segmentation via convolution neural network encoder-decoder
CN116645399B (en) Residual network target tracking method and system based on attention mechanism
CN115171020A (en) Real-time video instance segmentation method for complete convolution
Ren et al. Complex Scene Segmentation Network Based on Multi-scale Encoding-decoding Architecture
Wang et al. 3M: An Affinity Processing Method for Weakly Supervised Semantic Segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination