CN112819000A - Streetscape image semantic segmentation system, streetscape image semantic segmentation method, electronic equipment and computer readable medium

Info

Publication number
CN112819000A
Authority
CN
China
Prior art keywords
semantic segmentation
convolution
street view
output
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110208934.4A
Other languages
Chinese (zh)
Inventor
梁超
王小瑀
宋宇
程超
姜长泓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changchun University of Technology
Original Assignee
Changchun University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changchun University of Technology
Priority to CN202110208934.4A
Publication of CN112819000A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G06V20/176 Urban or other man-made structures

Abstract

The invention discloses a streetscape image semantic segmentation system, a segmentation method, electronic equipment and a computer readable medium. The segmentation method comprises the following steps: step 1, street view images are collected, preprocessed and augmented; step 2, an encoder encodes the street view image into output feature maps; step 3, a multi-level feature joint upsampling module collects the features of three of the output feature maps and fuses them to obtain a second output feature map; step 4, the second output feature map is converted into a third output feature map; step 5, the third output feature map is input into a convolution classifier to obtain semantic segmentation feature values; step 6, end-to-end training is performed with a back propagation algorithm to obtain a streetscape image semantic segmentation model; step 7, the street view image is semantically segmented with the streetscape image semantic segmentation model. The invention accelerates network segmentation and enhances real-time response capability in application without reducing semantic segmentation precision.

Description

Streetscape image semantic segmentation system, streetscape image semantic segmentation method, electronic equipment and computer readable medium
Technical Field
The invention belongs to the technical field of image semantic segmentation, and particularly relates to a streetscape image semantic segmentation system, a streetscape image semantic segmentation method, electronic equipment and a computer readable medium.
Background
Semantic segmentation is one of basic tasks of computer vision, and aims to allocate a semantic label to each pixel in an image so as to obtain a pixel-level segmentation result.
FCN, the original fully convolutional neural network, was adapted from convolutional neural networks designed for image classification, and since FCN, semantic segmentation has made great progress thanks to deep learning. Semantic segmentation algorithms applied to autonomous driving generally fall into two main categories. The first category comprises networks based on an encoder-decoder structure, such as U-Net and SegNet; when such a network performs a segmentation task with few categories it is fast and accurate, but as the number of categories increases, both segmentation speed and segmentation accuracy drop sharply. The second category comprises networks based on context information, such as PSPNet and DeepLab v3+; these improve the scene parsing capability of the network by introducing more context information, keep the receptive field unchanged by introducing dilated (atrous) convolution, and apply atrous pyramid pooling on top of the final feature map, thereby avoiding down-sampling operations and capturing a large amount of receptive-field information. However, dilated convolution increases the computational complexity and memory footprint of the network, so these networks are seriously deficient in segmentation speed.
Existing semantic segmentation networks usually have a large number of parameters and consume a large amount of running time; they consider only segmentation precision and not the real-time performance of the network. The autonomous driving field requires accuracy from the semantic segmentation network and is also very sensitive to the real-time performance of the algorithm, so the semantic segmentation algorithm must offer real-time processing speed and fast interaction and response. Such networks are therefore not suitable for autonomous driving.
Disclosure of Invention
The invention aims to provide a streetscape image semantic segmentation system that uses a multi-level feature joint upsampling module and a pyramid pooling module to extract deep and shallow features from streetscape images. The collected features represent each segmented object comprehensively, which raises semantic segmentation precision, while a low-resolution feature map is used to approximate a high-resolution feature map, which speeds up the network and improves response capability in application.
The invention also aims to provide a street view image semantic segmentation method that greatly improves the real-time performance of semantic segmentation while maintaining segmentation precision. When used for autonomous driving, it can segment street view images quickly, respond in real time and improve driving safety.
It is also an object of the present invention to provide an electronic device and a computer readable medium for storing and performing semantic segmentation of street view images.
The technical scheme adopted by the invention is that the streetscape image semantic segmentation system comprises:
the preprocessing module is used for scaling, randomly cropping, randomly flipping and normalizing the finely annotated street view images;
the encoder is used for encoding the preprocessed street view image into five output feature maps of gradually decreasing size and resolution, and inputting the last three output feature maps into the multi-level feature joint upsampling module;
the multi-level feature joint upsampling module is used for extracting the features and context information in the last three output feature maps and fusing them to obtain a second output feature map;
the pyramid pooling module is used for performing convolution processing on the second output feature map and converting it into a low-resolution third output feature map;
and the convolution classifier is used for dividing the third output feature map into different objects to realize image semantic segmentation.
The street view image semantic segmentation method comprises the following steps:
step 1, finely annotated street view images are obtained, divided into a training set, a test set and a validation set, and input into the preprocessing module for preprocessing and data augmentation;
step 2, the preprocessing module inputs the processed training-set street view images into the encoder, the encoder performs convolution and maximum pooling operations on the input street view images to obtain five output feature maps from the Conv1 layer to the Conv5 layer, and the last three output feature maps are input into the multi-level feature joint upsampling module;
step 3, the multi-level feature joint upsampling module collects the features and context information in the three output feature maps respectively, and fuses the collected results to obtain a second output feature map;
step 4, the pyramid pooling module takes the second output feature map as input and performs convolution operations on it to convert it into a low-resolution third output feature map;
step 5, the third output feature map is input into the convolution classifier to obtain semantic segmentation feature values;
step 6, the semantic segmentation feature values are compared with the fine annotations, and end-to-end training is performed with a back propagation algorithm to obtain the streetscape image semantic segmentation model;
and step 7, the street view image to be tested is preprocessed and input into the streetscape image semantic segmentation model to obtain semantic segmentation feature values, which are up-sampled to obtain the semantic segmentation image.
Further, the preprocessing and data augmentation in step 1 include: scaling, randomly cropping, randomly flipping and normalizing the training set images, and scaling and normalizing the test set images and the validation set images.
Further, the encoder in step 2 is a lightweight network FCN8s, which sequentially comprises 2 groups each consisting of two 3 × 3 convolution operations and a maximum pooling operation, followed by 3 groups each consisting of three 3 × 3 convolution operations and a maximum pooling operation;
the five output feature maps are as follows: the Conv1 layer output feature map is one half of the size of the original image and has 64 channels; the Conv2 layer output feature map is one fourth of the size of the original image and has 64 channels; the Conv3 layer output feature map is one eighth of the size of the original image and has 128 channels; the Conv4 layer output feature map is one sixteenth of the size of the original image and has 256 channels; the Conv5 layer output feature map is one thirty-second of the size of the original image and has 512 channels.
Further, the step 3 specifically includes the following steps:
step 31, convolution processing is performed on the three input feature maps to generate three first intermediate feature maps, which are up-sampled and concatenated to obtain a first output feature map;
and step 32, the first output feature map is processed by four depthwise separable convolutions with different dilation rates to obtain four second intermediate feature maps, which are input into a convolution layer that stacks and compresses them to obtain a second output feature map, wherein the dilation rates of the depthwise separable convolutions are 1, 2, 4 and 8 respectively.
Further, the specific operation of step 4 is to perform strided convolution on the input second output feature map, delete the odd-indexed elements to obtain a third intermediate feature map, and perform several ordinary convolutions on the third intermediate feature map to obtain a third output feature map.
Further, in step 5, the convolution classifier uses a conv2d operation, the number of filters (output channels) equals the number of street view segmentation classes, the convolution kernel size is 1, the padding mode is 'same', and the activation function is softmax.
Further, the back propagation algorithm in step 6 uses the Adam optimizer, the loss function is sparse_categorical_crossentropy, the initial learning rate is 0.001, the learning rate schedule is inverse time decay, and weight decay uses L2 regularization, where decay_steps is 74300 and decay_rate is 0.5.
An electronic device comprising a processor and a memory, the processor and memory in communication with each other;
a memory for storing a computer program;
and the processor is used for realizing the steps of the method when executing the program stored in the memory.
A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the above-mentioned method steps.
The invention has the following beneficial effects: building on existing semantic segmentation networks, the embodiment of the invention provides a semantic segmentation method with higher efficiency and better real-time performance. The lightweight network FCN8s is used as an encoder to output multi-scale feature maps; multi-level feature joint upsampling then extracts the features and context information in the multi-scale feature maps; strided convolution and ordinary convolution then extract further features to obtain more comprehensive feature information, so that the trained semantic segmentation model achieves higher segmentation precision. At the same time, a low-resolution feature map is used to approximate a high-resolution feature map, which greatly reduces the computation of the semantic segmentation network, increases its segmentation speed, and further enhances its real-time response capability in application.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flow chart of an implementation of an embodiment of the present invention.
Fig. 2 is a network configuration diagram of an embodiment of the present invention.
Fig. 3 is a block diagram of a multi-level feature joint upsampling module.
FIG. 4 is the semantic segmentation effect of different algorithms on the Cityscapes dataset.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The streetscape image semantic segmentation system comprises a preprocessing module, an encoder, a multi-level feature joint upsampling module, a pyramid pooling module and a convolution classifier connected in sequence. The preprocessing module is used for scaling, randomly cropping, randomly flipping and normalizing the images in the data set. The encoder is a lightweight network FCN8s, which encodes image features to obtain the output feature maps of the Conv1, Conv2, Conv3, Conv4 and Conv5 layers. The multi-level feature joint upsampling module is used for extracting the features and context information in the output feature maps of the Conv3, Conv4 and Conv5 layers and fusing the extracted feature information to obtain a second output feature map. The pyramid pooling module is used for performing convolution processing on the second output feature map, converting the high-resolution second output feature map into a low-resolution third output feature map. The convolution classifier is used for dividing the feature map into different objects to realize image semantic segmentation.
Examples
As shown in fig. 1, the streetscape image semantic segmentation method includes the following steps:
step 1, finely annotated street view images for autonomous driving are acquired and divided into a training set, a validation set and a test set;
the Cityscapes dataset released by Mercedes-Benz is selected as the source of street view images; it contains street view images of 50 cities in different scenes, backgrounds and seasons, including 5000 finely annotated images with a resolution of 1024 × 2048, which are divided into 2975 training images, 500 validation images and 1525 test images;
the following 34 classes of objects are used as segmentation objects: unlabeled, ego vehicle, rectification border, out of roi, static, dynamic, ground, road, sidewalk, parking, rail track, building, wall, fence, guard rail, bridge, tunnel, pole, polegroup, traffic light, traffic sign, vegetation, terrain, sky, person, rider, car, truck, bus, caravan, trailer, train, motorcycle, bicycle, license plate;
the finely annotated street view images are preprocessed and augmented;
the finely annotated street view images in the Cityscapes dataset have a high resolution, and segmenting them directly would seriously slow down the semantic segmentation network, so the street view images in the training set are scaled to 512 × 1024, randomly cropped to 512 × 512, randomly flipped and normalized, and the street view images in the test set and the validation set are scaled to 512 × 512 and normalized;
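The training-set preprocessing described above can be sketched in NumPy as follows. This is a minimal illustration, not the patent's implementation: the function name `preprocess_train`, the striding-based scaling, the flip probability of 0.5 and the division by 255 are all assumptions, since the text does not name the interpolation method or the normalization constants.

```python
import numpy as np

def preprocess_train(image, label, rng):
    """Scale 1024x2048 -> 512x1024, random-crop to 512x512, random flip, normalize."""
    # Scale by a factor of 2 with simple striding (nearest neighbour).
    # The interpolation method is an assumption, not taken from the patent.
    image = image[::2, ::2]
    label = label[::2, ::2]
    # Random 512x512 crop, applied identically to the image and its label map.
    h, w = label.shape
    top = rng.integers(0, h - 512 + 1)
    left = rng.integers(0, w - 512 + 1)
    image = image[top:top + 512, left:left + 512]
    label = label[top:top + 512, left:left + 512]
    # Random horizontal flip (probability 0.5 is an assumption).
    if rng.random() < 0.5:
        image = image[:, ::-1]
        label = label[:, ::-1]
    # Normalize pixel values to [0, 1] (assumed normalization).
    return image.astype(np.float32) / 255.0, label

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(1024, 2048, 3), dtype=np.uint8)  # dummy image
label = rng.integers(0, 34, size=(1024, 2048), dtype=np.uint8)      # dummy labels
x, y = preprocess_train(image, label, rng)
print(x.shape, y.shape)
```

The crop and the flip are applied to the image and its label map together, so each pixel keeps its annotation.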
step 2, as shown in fig. 2, a lightweight network FCN8s is used as an encoder of the semantic segmentation network to encode street view images of the training set;
the lightweight network FCN8s encodes semantic information accurately with a small amount of computation, so using it as the encoder reduces the time the algorithm spends in the feature encoding stage; the lightweight network FCN8s sequentially comprises 2 groups each consisting of two 3 × 3 convolution operations and a maximum pooling operation, followed by 3 groups each consisting of three 3 × 3 convolution operations and a maximum pooling operation;
the input street view image has the format H × W × 3, and each maximum pooling operation halves the height and width of its input; the Conv1 layer output feature map generated by the encoder is one half of the size of the original image and has 64 channels, the Conv2 layer output feature map is one fourth of the size of the original image and has 64 channels, the Conv3 layer output feature map is one eighth of the size of the original image and has 128 channels, the Conv4 layer output feature map is one sixteenth of the size of the original image and has 256 channels, and the Conv5 layer output feature map is one thirty-second of the size of the original image and has 512 channels;
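The size arithmetic above can be checked with a few lines of Python. The sketch reads the per-layer counts 64, 64, 128, 256 and 512 as channel counts, which is an interpretation of the text, and the helper name `encoder_output_shapes` is hypothetical.

```python
def encoder_output_shapes(height, width):
    """Return (name, height, width, channels) for Conv1..Conv5."""
    channels = [64, 64, 128, 256, 512]  # per-layer counts quoted in the text
    shapes = []
    for i, c in enumerate(channels, start=1):
        # Each max-pooling step halves both the height and the width.
        height, width = height // 2, width // 2
        shapes.append((f"Conv{i}", height, width, c))
    return shapes

shapes = encoder_output_shapes(512, 512)
for shape in shapes:
    print(shape)
```

For a 512 × 512 training crop, the three maps passed to the upsampling module (Conv3 to Conv5) are therefore 64 × 64, 32 × 32 and 16 × 16.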
step 3, a multi-level feature joint upsampling module collects the context information and features of the Conv3, Conv4 and Conv5 layer output feature maps respectively, and fuses the collected results to obtain a high-resolution second output feature map;
as shown in fig. 3, the multi-level feature joint upsampling module takes the last three feature maps (Conv3-Conv5) of the encoder network FCN8s as its input, performs convolution processing on each of them to generate three first intermediate feature maps, which projects them into a common lower-dimensional space, and then up-samples and concatenates the three first intermediate feature maps to obtain a first output feature map, so that the context information of the multi-level feature maps is fused better and the computational cost of the first output feature map is reduced;
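A rough NumPy sketch of this projection, upsampling and concatenation follows. The 1 × 1 projection, the common width of 64 channels, the nearest-neighbour upsampling and the random weights are all assumptions chosen for illustration; the text only says that each map is convolved, up-sampled and concatenated.

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of an (H, W, C) map by an integer factor."""
    return x.repeat(factor, axis=0).repeat(factor, axis=1)

def project(x, out_channels, rng):
    """1x1 convolution with random weights: maps C channels to out_channels."""
    w = rng.standard_normal((x.shape[-1], out_channels)).astype(np.float32)
    return x @ w

rng = np.random.default_rng(0)
# Dummy encoder outputs for a 512x512 input (1/8, 1/16 and 1/32 resolution).
conv3 = rng.standard_normal((64, 64, 128)).astype(np.float32)
conv4 = rng.standard_normal((32, 32, 256)).astype(np.float32)
conv5 = rng.standard_normal((16, 16, 512)).astype(np.float32)

mid = 64  # hypothetical common channel width
f3 = project(conv3, mid, rng)
f4 = upsample_nearest(project(conv4, mid, rng), 2)  # 32x32 -> 64x64
f5 = upsample_nearest(project(conv5, mid, rng), 4)  # 16x16 -> 64x64
first_output = np.concatenate([f3, f4, f5], axis=-1)
print(first_output.shape)  # (64, 64, 192)
```

Projecting before upsampling keeps the expensive interpolation on narrow maps, which matches the stated goal of reducing computational cost.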
then, four depthwise separable convolutions extract the deep and shallow features in the first output feature map to obtain four second intermediate feature maps, and a convolution layer stacks the channels of the four second intermediate feature maps and compresses them into a high-resolution second output feature map with a normal channel count; the dilation rates of the four depthwise separable convolutions are 1, 2, 4 and 8 respectively, where the depthwise separable convolution with dilation rate 1 captures the relationship between the first output feature map and the separated feature maps, and the depthwise separable convolutions with dilation rates 2, 4 and 8 learn the mapping that converts the feature maps separated from the first output feature map into the second output feature map;
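The four parallel branches can be sketched in NumPy as below. The 3 × 3 kernel size, the branch and output channel widths (64 and 128), the zero padding and the random weights are assumptions; only the dilation rates 1, 2, 4 and 8 and the stack-then-compress structure come from the text.

```python
import numpy as np

def depthwise_dilated_conv(x, kernels, rate):
    """Depthwise 3x3 convolution with dilation `rate` and zero 'same' padding.

    x: (H, W, C); kernels: (3, 3, C), one 3x3 filter per channel.
    """
    h, w, _ = x.shape
    padded = np.pad(x, ((rate, rate), (rate, rate), (0, 0)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            # Kernel taps sit `rate` pixels apart, which widens the receptive field.
            window = padded[i * rate:i * rate + h, j * rate:j * rate + w]
            out += window * kernels[i, j]
    return out

def separable_branch(x, kernels, pointwise, rate):
    """Depthwise dilated convolution followed by a 1x1 pointwise convolution."""
    return depthwise_dilated_conv(x, kernels, rate) @ pointwise

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 64, 192))        # stands in for the first output feature map
branches = []
for rate in (1, 2, 4, 8):
    dw = rng.standard_normal((3, 3, 192))
    pw = rng.standard_normal((192, 64))       # 64 channels per branch is hypothetical
    branches.append(separable_branch(x, dw, pw, rate))
stacked = np.concatenate(branches, axis=-1)   # stack the four branches channel-wise
fuse = rng.standard_normal((256, 128))        # hypothetical 1x1 compression layer
second_output = stacked @ fuse
print(stacked.shape, second_output.shape)
```

With a 3 × 3 kernel, a dilation rate d covers a (2d + 1) × (2d + 1) window, so the four branches see 3, 5, 9 and 17 pixel wide neighbourhoods at the same parameter count.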
in this embodiment, the multi-level feature joint upsampling module avoids the convolution computations of an atrous pyramid pooling network with a huge number of parameters on high-resolution output feature maps, which would greatly reduce segmentation speed, and it can still extract multi-scale context information from the multi-level feature maps, so better performance is obtained;
step 4, the second output feature map is input into the pyramid pooling module, which converts the high-resolution second output feature map into a low-resolution third output feature map through convolution processing, so as to further extract the multi-scale information of the second output feature map and improve the network's ability to segment targets of different scales;
the pyramid pooling module comprises a strided convolution and several ordinary convolutions: the second output feature map is input into the strided convolution for convolution processing, the elements with odd indexes are then deleted to obtain a third intermediate feature map, and the third intermediate feature map passes through the ordinary convolutions to obtain a third output feature map with lower spatial resolution;
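One way to read "convolve, then delete the odd-indexed elements" is that a stride-2 convolution equals a stride-1 convolution whose odd-indexed rows and columns are dropped. The single-channel NumPy sketch below checks that equivalence; the 3 × 3 kernel, the padding of 1 and the helper name `conv2d_pad1` are assumptions for illustration.

```python
import numpy as np

def conv2d_pad1(x, kernel, stride=1):
    """Single-channel 3x3 convolution (cross-correlation) with zero padding of 1."""
    h, w = x.shape
    padded = np.pad(x, 1)
    out_h = (h + stride - 1) // stride
    out_w = (w + stride - 1) // stride
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            patch = padded[r * stride:r * stride + 3, c * stride:c * stride + 3]
            out[r, c] = np.sum(patch * kernel)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
kernel = rng.standard_normal((3, 3))

dense = conv2d_pad1(x, kernel, stride=1)    # full-resolution result
strided = conv2d_pad1(x, kernel, stride=2)  # stride-2 result
# Keeping only even-indexed rows and columns of the dense result
# (deleting the odd-indexed ones) reproduces the strided convolution.
print(np.allclose(strided, dense[::2, ::2]))  # True
```

Computing the strided result directly skips three quarters of the output positions, which is where the speed gain of working at low resolution comes from.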
as the number of ordinary convolutions increases, the resulting feature map contains more abstract information, carries stronger semantic information and has a larger receptive field, but its resolution drops and its perception of detail worsens; with fewer convolutions, the feature map has a higher resolution and contains more position and detail information, but its semantics are weaker and it is noisier; this embodiment performs 5 ordinary convolutions;
step 5, inputting the third output feature map into a convolution classifier to obtain a semantic segmentation feature value;
the convolution classifier is configured as a conv2d operation with filters = 34, kernel_size = 1, padding = 'same' and activation = 'softmax', where filters is the number of filters, kernel_size is the convolution kernel size, padding is the convolution padding mode, and activation is the activation function;
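A convolution with kernel size 1 is a per-pixel linear map over channels, so the classifier can be sketched in NumPy as a matrix product followed by a softmax. The input shape (32 × 32 × 128) and the random weights are hypothetical; only the 34 filters, the kernel size of 1 and the softmax activation come from the text.

```python
import numpy as np

def conv1x1_softmax(features, weights, bias):
    """1x1 convolution over channels followed by a per-pixel softmax."""
    logits = features @ weights + bias
    # Subtract the per-pixel maximum for numerical stability.
    logits = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(logits)
    return exp / exp.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
third_output = rng.standard_normal((32, 32, 128))  # hypothetical feature map shape
weights = rng.standard_normal((128, 34)) * 0.1     # 34 filters, one per class
bias = np.zeros(34)
probs = conv1x1_softmax(third_output, weights, bias)
print(probs.shape)  # (32, 32, 34)
```

Each pixel ends up with a probability distribution over the 34 classes; with a kernel size of 1 the 'same' padding mode adds no padding.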
step 6, comparing the semantic segmentation characteristic value with the fine label, and performing end-to-end training by using a back propagation algorithm to obtain a streetscape image semantic segmentation model;
during end-to-end training, only random flipping and random cropping are used as data augmentation; the back propagation algorithm uses the Adam optimizer, the loss function is sparse_categorical_crossentropy, the initial learning rate is 0.001, the learning rate schedule is inverse time decay, and weight decay uses L2 regularization, where decay_steps is 74300 and decay_rate is 0.5, meaning that the learning rate decays to two thirds after every 100 epochs;
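The stated hyperparameters are consistent with the common (non-staircase) inverse time decay formula lr = lr0 / (1 + decay_rate × step / decay_steps). The sketch below assumes that formula, which the patent does not spell out, and shows that it yields two thirds of the initial rate after decay_steps steps.

```python
def inverse_time_decay(step, initial_lr=0.001, decay_steps=74300, decay_rate=0.5):
    """Learning rate after `step` optimizer steps under inverse time decay."""
    return initial_lr / (1.0 + decay_rate * step / decay_steps)

lr_start = inverse_time_decay(0)          # 0.001
lr_after = inverse_time_decay(74300)      # 0.001 / 1.5, about 0.000667
print(lr_start, lr_after, lr_after / lr_start)
```

If 74300 steps correspond to 100 epochs as stated, one epoch is 743 steps. Under this formula the rate reaches two thirds of its initial value at the first 100-epoch mark and then decays more slowly, for example to one half after 200 epochs.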
and step 7, the street view image to be tested is down-sampled to 512 × 512 and input into the semantic segmentation model to obtain semantic segmentation feature values, which are up-sampled by bilinear interpolation and restored into the semantic segmentation image of the street view image.
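The final upsampling step can be sketched as a bilinear resize of the per-class scores followed by a per-pixel argmax. The 32 × 32 model output size, the align-corners sampling convention and the argmax are assumptions; the text only states that bilinear interpolation restores the segmentation image.

```python
import numpy as np

def bilinear_resize(x, out_h, out_w):
    """Bilinear interpolation of an (H, W, C) array (align-corners convention)."""
    in_h, in_w, _ = x.shape
    rows = np.linspace(0, in_h - 1, out_h)
    cols = np.linspace(0, in_w - 1, out_w)
    r0 = np.floor(rows).astype(int)
    c0 = np.floor(cols).astype(int)
    r1 = np.minimum(r0 + 1, in_h - 1)
    c1 = np.minimum(c0 + 1, in_w - 1)
    wr = (rows - r0)[:, None, None]   # vertical interpolation weights
    wc = (cols - c0)[None, :, None]   # horizontal interpolation weights
    top = x[r0][:, c0] * (1 - wc) + x[r0][:, c1] * wc
    bottom = x[r1][:, c0] * (1 - wc) + x[r1][:, c1] * wc
    return top * (1 - wr) + bottom * wr

rng = np.random.default_rng(0)
logits = rng.standard_normal((32, 32, 34))   # hypothetical model output size
exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs = exp / exp.sum(axis=-1, keepdims=True)

upsampled = bilinear_resize(probs, 512, 512)  # back to the 512x512 input size
segmentation = upsampled.argmax(axis=-1)      # one class index per pixel
print(upsampled.shape, segmentation.shape)
```

Interpolating the class probabilities before taking the argmax gives smoother object boundaries than resizing the label map itself.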
Several existing semantic segmentation algorithms and this embodiment were each used to segment the Cityscapes dataset, and the resulting evaluation indexes are shown in Table 1. The indexes Pix Acc and mIoU evaluate the semantic segmentation accuracy of an algorithm, and the index FPS evaluates its segmentation speed. The data in Table 1 show that the method greatly increases the speed of semantic segmentation without losing segmentation accuracy, and that it can greatly improve real-time response capability and driving safety when used for autonomous driving.
TABLE 1 Comparison of evaluation indexes of different algorithms on the Cityscapes dataset
Algorithm       Backbone network   Pix Acc (%)   mIoU (%)   FPS (frame/s)
Unet            VGG16              87.07         37.06      16.6
SegNet          VGG16              85.48         33.75      24.7
Enet            From scratch       85.75         30.46      37.8
PSPNet          Resnet101          89.24         41.65      11.2
EncNet          Resnet101          92.68         45.65      13.6
Deeplab v3+     Resnet101          93.24         44.76      14.3
This example    FCN8s              91.85         43.78      32.3
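The patent does not give formulas for Pix Acc and mIoU. The sketch below uses their conventional definitions, computed from a confusion matrix, as an assumption about how such figures are usually obtained; the tiny label arrays are made-up illustration data, not results from the patent.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are ground-truth classes, columns are predicted classes."""
    idx = y_true.ravel() * num_classes + y_pred.ravel()
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def pixel_accuracy(cm):
    """Fraction of pixels whose predicted class equals the ground truth."""
    return np.trace(cm) / cm.sum()

def mean_iou(cm):
    """Mean over classes of intersection / union, skipping absent classes."""
    tp = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    valid = union > 0
    return np.mean(tp[valid] / union[valid])

# Toy 2x3 label maps with 3 classes (illustration only).
y_true = np.array([[0, 0, 1], [1, 2, 2]])
y_pred = np.array([[0, 1, 1], [1, 2, 0]])
cm = confusion_matrix(y_true, y_pred, 3)
print(pixel_accuracy(cm))  # 4 of 6 pixels correct
print(mean_iou(cm))        # (1/3 + 2/3 + 1/2) / 3 = 0.5
```

Pixel accuracy is dominated by large classes such as road and sky, while mIoU weights every class equally, which is why the two columns in Table 1 differ so much.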
The semantic segmentation results of several existing algorithms and of this embodiment on the Cityscapes dataset are shown in fig. 4. As fig. 4 shows, the segmentation result obtained by the invention is closest to the ground-truth labels, contains no mislabeled segments, and has clearer contour lines for each classified object.
The invention also comprises an electronic device comprising a memory and a processor. The memory stores the collected street view images and the computer program instructions for preprocessing, encoding, feature extraction, up-sampling and similar operations on the street view images, and the processor executes the computer program instructions to complete all or part of the steps so as to semantically segment the street view images to be processed. The electronic device may communicate with one or more external devices, with one or more devices that enable user interaction with the electronic device, and/or with any device that enables the electronic device to communicate with one or more other computing devices; it may also communicate with one or more networks (e.g., local area networks, wide area networks, and/or public networks) through a network adapter. The present invention also includes a computer-readable medium storing a computer program executable by a processor to implement street view image semantic segmentation. The computer-readable medium can include, but is not limited to, magnetic storage devices, optical disks, digital versatile disks, smart cards, and flash memory devices. The readable storage medium of the present invention can represent one or more devices for storing information and/or other machine-readable media; the term "machine-readable medium" includes, but is not limited to, wireless channels and various other media (and/or storage media) capable of storing, containing, and/or carrying code and/or instructions and/or data.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A street view image semantic segmentation system, characterized by comprising:
a preprocessing module for scaling, randomly cropping, randomly flipping, and normalizing the finely annotated street view images;
an encoder for encoding the preprocessed street view image into five output feature maps of progressively decreasing size and resolution, and inputting the last three output feature maps into the multi-level feature combined up-sampling module;
a multi-level feature combined up-sampling module for extracting the features and context information in the last three output feature maps and fusing them to obtain a second output feature map;
a pyramid pooling module for performing convolution on the second output feature map to convert it into a low-resolution third output feature map;
and a convolution classifier for classifying the third output feature map into different objects to achieve image semantic segmentation.
2. A street view image semantic segmentation method using the street view image semantic segmentation system according to claim 1, characterized by comprising the steps of:
step 1, obtaining finely annotated street view images, dividing them into a training set, a test set and a validation set, and inputting them into the preprocessing module for preprocessing and data augmentation;
step 2, the preprocessing module inputs the processed training set street view images into the encoder, the encoder performs convolution and max-pooling operations on the input street view images to obtain five output feature maps, from the Conv1 layer to the Conv5 layer, and the last three output feature maps are input into the multi-level feature combined up-sampling module;
step 3, the multi-level feature combined up-sampling module collects the features and context information in each of the three output feature maps and fuses the collected results to obtain a second output feature map;
step 4, the pyramid pooling module takes the second output feature map as input and performs convolution on it to convert it into a low-resolution third output feature map;
step 5, inputting the third output feature map into the convolution classifier to obtain semantic segmentation feature values;
step 6, comparing the semantic segmentation feature values with the fine annotations, and performing end-to-end training with the back propagation algorithm to obtain a street view image semantic segmentation model;
step 7, preprocessing the street view image to be tested, inputting it into the street view image semantic segmentation model to obtain semantic segmentation feature values, and up-sampling the semantic segmentation feature values to obtain the semantic segmentation image.
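As an illustration of the final step only, the following minimal sketch converts low-resolution per-class scores into a full-size label image. It is not taken from the patent: the function names `to_label_map` and `upsample_nearest`, the toy scores, the up-sampling factor of 2, and the choice of nearest-neighbour interpolation are all assumptions, since the claims do not state which interpolation is used. With nearest-neighbour interpolation, taking the per-pixel argmax before or after up-sampling gives the same result, so the sketch takes it first.

```python
def to_label_map(scores):
    """scores[y][x] is a list of per-class scores; return the winning class index per pixel."""
    return [[max(range(len(px)), key=px.__getitem__) for px in row] for row in scores]


def upsample_nearest(grid, factor):
    """Nearest-neighbour up-sampling: repeat each element `factor` times along both axes."""
    out = []
    for row in grid:
        wide = [value for value in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


# Hypothetical 2 x 2 output of the classifier with two classes per pixel.
coarse_scores = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.6, 0.4], [0.3, 0.7]],
]
coarse_labels = to_label_map(coarse_scores)
full_labels = upsample_nearest(coarse_labels, 2)
```

A production system would more likely use bilinear interpolation of the scores followed by an argmax, which smooths object boundaries at the cost of extra computation.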
3. The streetscape image semantic segmentation method according to claim 2, wherein the preprocessing and data augmentation in step 1 comprise: scaling, randomly cropping, randomly flipping, and normalizing the training set images, and scaling and normalizing the test set images and the validation set images.
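The split between training-time and evaluation-time preprocessing in claim 3 can be sketched as follows. This is a simplified illustration on single-channel images stored as nested lists, not the patented implementation: the function names, nearest-neighbour scaling, horizontal flipping with probability 0.5, and normalization by division by 255 are all assumptions, because the claim does not specify them. The label map receives the same geometric transforms as the image but is never normalized.

```python
import random


def scale_nearest(img, out_h, out_w):
    """Resize a 2-D grid to (out_h, out_w) by nearest-neighbour sampling."""
    in_h, in_w = len(img), len(img[0])
    return [[img[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def random_crop(img, label, crop_h, crop_w, rng):
    """Cut the same randomly placed window out of the image and its label map."""
    top = rng.randint(0, len(img) - crop_h)
    left = rng.randint(0, len(img[0]) - crop_w)

    def cut(grid):
        return [row[left:left + crop_w] for row in grid[top:top + crop_h]]

    return cut(img), cut(label)


def random_flip(img, label, rng):
    """Mirror image and label left-to-right with probability 0.5 (assumed)."""
    if rng.random() < 0.5:
        return [row[::-1] for row in img], [row[::-1] for row in label]
    return img, label


def normalize(img):
    """Map 8-bit pixel values to the range [0, 1] (assumed normalization)."""
    return [[v / 255.0 for v in row] for row in img]


def preprocess_train(img, label, size, crop, rng):
    """Training path: scale, random crop, random flip, normalize."""
    img, label = scale_nearest(img, *size), scale_nearest(label, *size)
    img, label = random_crop(img, label, crop[0], crop[1], rng)
    img, label = random_flip(img, label, rng)
    return normalize(img), label


def preprocess_eval(img, label, size):
    """Test/validation path: scale and normalize only."""
    return normalize(scale_nearest(img, *size)), scale_nearest(label, *size)


rng = random.Random(0)
image = [[(x + y) % 256 for x in range(8)] for y in range(4)]
labels = [[x // 4 for x in range(8)] for y in range(4)]
train_x, train_y = preprocess_train(image, labels, size=(8, 16), crop=(4, 4), rng=rng)
eval_x, eval_y = preprocess_eval(image, labels, size=(8, 16))
```

Keeping the random operations out of the evaluation path makes test and validation results reproducible from run to run.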
4. The streetscape image semantic segmentation method according to claim 2, wherein the encoder in step 2 is the lightweight network FCN8s, which consists, in sequence, of 2 groups each comprising two 3 × 3 convolution operations followed by a max-pooling operation, and 3 groups each comprising three 3 × 3 convolution operations followed by a max-pooling operation;
the five output feature maps are as follows: the Conv1 layer output feature map is one half the size of the original image and has 64 channels; the Conv2 layer output feature map is one fourth the size of the original image and has 64 channels; the Conv3 layer output feature map is one eighth the size of the original image and has 128 channels; the Conv4 layer output feature map is one sixteenth the size of the original image and has 256 channels; the Conv5 layer output feature map is one thirty-second the size of the original image and has 512 channels.
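The size relationships listed in claim 4 can be checked with a few lines of arithmetic. The sketch below is illustrative only: it reads the per-layer counts in the claim (64, 64, 128, 256, 512) as channel counts, which is an assumption about the translation, and the 512 × 1024 input size and the name `feature_map_shapes` are hypothetical. Floor division is used, so the input height and width are assumed to be multiples of 32.

```python
# (layer name, down-sampling factor relative to the input, assumed channel count)
STAGES = [
    ("Conv1", 2, 64),
    ("Conv2", 4, 64),
    ("Conv3", 8, 128),
    ("Conv4", 16, 256),
    ("Conv5", 32, 512),
]


def feature_map_shapes(height, width):
    """Return {layer: (height, width, channels)} for an input of the given size."""
    return {name: (height // factor, width // factor, channels)
            for name, factor, channels in STAGES}


shapes = feature_map_shapes(512, 1024)
# The last three maps are the ones passed to the multi-level up-sampling module.
last_three = [shapes[name] for name in ("Conv3", "Conv4", "Conv5")]
```

For the hypothetical 512 × 1024 input, the three maps handed to the up-sampling module are 64 × 128, 32 × 64, and 16 × 32, which is why they must be up-sampled to a common size before they can be concatenated.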
5. The streetscape image semantic segmentation method according to claim 2, wherein the step 3 specifically comprises the following steps:
step 31, performing convolution on the three input feature maps to generate three first intermediate feature maps, then up-sampling and concatenating the three first intermediate feature maps to obtain a first output feature map;
step 32, processing the first output feature map with four depthwise separable convolutions of different dilation rates, namely 1, 2, 4 and 8, to obtain four second intermediate feature maps, and inputting the four second intermediate feature maps into a convolution layer that stacks and compresses them to obtain the second output feature map.
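To indicate why four different rates are used in step 32, the sketch below computes the spatial extent covered by one convolution kernel at each rate, and compares the weight count of a depthwise separable convolution with that of an ordinary convolution. It is illustrative only: the 3 × 3 kernel size and the 256-channel width are assumptions not stated in claim 5, bias terms are ignored, and the function names are hypothetical.

```python
def effective_kernel(kernel, rate):
    """Side length covered by a kernel whose taps are spaced `rate` pixels apart."""
    return kernel + (kernel - 1) * (rate - 1)


def separable_params(kernel, in_ch, out_ch):
    """Weights of a depthwise (k x k per channel) plus pointwise (1 x 1) convolution."""
    return kernel * kernel * in_ch + in_ch * out_ch


def standard_params(kernel, in_ch, out_ch):
    """Weights of an ordinary k x k convolution."""
    return kernel * kernel * in_ch * out_ch


RATES = (1, 2, 4, 8)
extents = {rate: effective_kernel(3, rate) for rate in RATES}
separable = separable_params(3, 256, 256)
standard = standard_params(3, 256, 256)
```

Under these assumptions the four branches see windows of 3, 5, 9, and 17 pixels, so the concatenated result mixes local detail with wider context, while each separable branch needs roughly one ninth of the weights of an ordinary 3 × 3 convolution.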
6. The streetscape image semantic segmentation method according to claim 2, wherein step 4 specifically comprises: performing strided convolution on the input second output feature map, deleting the odd-numbered elements to obtain a third intermediate feature map, and then applying several ordinary convolutions to the third intermediate feature map to obtain the third output feature map.
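One way to read the operation in claim 6 is as a stride-2 convolution, which is equivalent to an ordinary stride-1 convolution followed by discarding every second output. The one-dimensional sketch below demonstrates that equivalence. It is an interpretation rather than the patented implementation: the stride of 2, the zero-based indexing (so that the discarded outputs are the odd-indexed ones), the sample signal, and the kernel are all assumptions.

```python
def conv1d(signal, kernel, stride=1):
    """Valid (unpadded) 1-D cross-correlation evaluated every `stride` positions."""
    span = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(span))
            for i in range(0, len(signal) - span + 1, stride)]


signal = [1, 2, 3, 4, 5, 6, 7, 8]
kernel = [1, 2, 1]

dense = conv1d(signal, kernel, stride=1)    # every output position
strided = conv1d(signal, kernel, stride=2)  # stride-2 convolution
kept_even = dense[::2]                      # dense result with odd-indexed outputs removed
```

Computing the strided version directly avoids evaluating the outputs that would be thrown away, which is where the saving in computation comes from.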
7. The streetscape image semantic segmentation method according to claim 2, wherein the convolution classifier in step 5 is implemented with a conv2d operation, in which the number of input channels is the number of street view image segmentation objects, the convolution kernel size is 1, the padding mode is same, and the activation function is softmax.
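A convolution with kernel size 1 followed by softmax acts as an independent linear classifier at every pixel. The pure-Python sketch below shows that behaviour; it is not the conv2d call used by the inventors. The function names, weights, and toy feature map are hypothetical, and the sketch assumes that the channel count named in claim 7 sets the number of output classes of the layer.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pointwise_classifier(feature_map, weights, biases):
    """1 x 1 convolution plus softmax; weights[c][k] links input channel k to class c."""
    out = []
    for row in feature_map:
        out_row = []
        for pixel in row:
            logits = [sum(w * v for w, v in zip(weights[c], pixel)) + biases[c]
                      for c in range(len(weights))]
            out_row.append(softmax(logits))
        out.append(out_row)
    return out


# Hypothetical 1 x 2 feature map with two channels, classified into three classes.
feature_map = [[[1.0, 0.0], [0.0, 2.0]]]
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
biases = [0.0, 0.0, 0.0]
probabilities = pointwise_classifier(feature_map, weights, biases)
predicted = [[max(range(len(p)), key=p.__getitem__) for p in row] for row in probabilities]
```

Because the kernel covers a single pixel, the same padding mode adds no border and the output keeps the spatial size of the input.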
8. The streetscape image semantic segmentation method according to claim 2, wherein the back propagation algorithm in step 6 uses the Adam optimizer; the loss function is sparse_categorical_crossentropy; the initial learning rate is 0.001; the learning rate schedule is inverse time decay, with decay_steps of 74300 and decay_rate of 0.5; and weight decay uses L2 regularization.
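The training hyperparameters in claim 8 can be made concrete with the sketch below, which evaluates the learning rate at a few steps and the loss for one pixel. It is an illustration under stated assumptions: the schedule is taken to be the continuous (non-staircase) inverse time decay formula, lr = initial / (1 + decay_rate × step / decay_steps), and the loss is taken to be sparse categorical cross-entropy, that is, the negative log-probability of the integer class label. The function names and sample probabilities are hypothetical.

```python
import math


def inverse_time_decay(step, initial_lr=0.001, decay_steps=74300, decay_rate=0.5):
    """Learning rate after `step` optimizer updates (continuous inverse time decay)."""
    return initial_lr / (1.0 + decay_rate * step / decay_steps)


def sparse_categorical_crossentropy(probabilities, label):
    """Loss for one pixel: negative log of the probability given to the true class."""
    return -math.log(probabilities[label])


def mean_loss(prob_map, label_map):
    """Average the per-pixel loss over a 2-D map of probability vectors."""
    losses = [sparse_categorical_crossentropy(p, y)
              for prob_row, label_row in zip(prob_map, label_map)
              for p, y in zip(prob_row, label_row)]
    return sum(losses) / len(losses)


lr_start = inverse_time_decay(0)
lr_after_one_period = inverse_time_decay(74300)
lr_after_two_periods = inverse_time_decay(148600)
pixel_loss = sparse_categorical_crossentropy([0.5, 0.25, 0.25], 0)
```

With these values the learning rate falls to two thirds of its initial value after 74300 steps and to one half after 148600 steps, a much gentler decline than exponential decay with the same rate.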
9. An electronic device comprising a processor and a memory, the processor and memory in communication with each other;
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 2 to 8 when executing a program stored in the memory.
10. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of the claims 2-8.
CN202110208934.4A 2021-02-24 2021-02-24 Streetscape image semantic segmentation system, streetscape image semantic segmentation method, electronic equipment and computer readable medium Pending CN112819000A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110208934.4A CN112819000A (en) 2021-02-24 2021-02-24 Streetscape image semantic segmentation system, streetscape image semantic segmentation method, electronic equipment and computer readable medium

Publications (1)

Publication Number Publication Date
CN112819000A (en) 2021-05-18

Family

ID=75865521

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110208934.4A Pending CN112819000A (en) 2021-02-24 2021-02-24 Streetscape image semantic segmentation system, streetscape image semantic segmentation method, electronic equipment and computer readable medium

Country Status (1)

Country Link
CN (1) CN112819000A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109145983A (en) * 2018-08-21 2019-01-04 电子科技大学 A kind of real-time scene image, semantic dividing method based on lightweight network
CN109741340A (en) * 2018-12-16 2019-05-10 北京工业大学 Ice sheet radar image ice sheet based on FCN-ASPP network refines dividing method
CN110175613A (en) * 2019-06-03 2019-08-27 常熟理工学院 Street view image semantic segmentation method based on Analysis On Multi-scale Features and codec models
CN111179272A (en) * 2019-12-10 2020-05-19 中国科学院深圳先进技术研究院 Rapid semantic segmentation method for road scene

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
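T. EMARA et al.: "LiteSeg: A Novel Lightweight ConvNet for Semantic Segmentation", 2019 Digital Image Computing: Techniques and Applications (DICTA) *
WU H et al.: "FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation", arXiv preprint arXiv:1903.11816 *
SONG Yu et al.: "Real-time semantic segmentation based on joint up-sampling of multi-level feature maps", Computer Technology and Development (计算机技术与发展) *
MA Shuhao et al.: "Real-time image semantic segmentation algorithm based on improved DeepLabv2", Computer Engineering and Applications (计算机工程与应用) *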

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113284155A (en) * 2021-06-08 2021-08-20 京东数科海益信息科技有限公司 Video object segmentation method and device, storage medium and electronic equipment
CN113284155B (en) * 2021-06-08 2023-11-07 京东科技信息技术有限公司 Video object segmentation method and device, storage medium and electronic equipment
CN113469019A (en) * 2021-06-29 2021-10-01 广州市城市规划勘测设计研究院 Landscape image characteristic value calculation method, device, equipment and storage medium
CN113628349A (en) * 2021-08-06 2021-11-09 西安电子科技大学 Scene content self-adaptive AR navigation method and device and readable storage medium
CN113628349B (en) * 2021-08-06 2024-02-02 西安电子科技大学 AR navigation method, device and readable storage medium based on scene content adaptation
CN114299331B (en) * 2021-12-20 2023-06-27 中国地质大学(武汉) Urban bicycle lane type detection method and system based on street view picture

Similar Documents

Publication Publication Date Title
WO2022083784A1 (en) Road detection method based on internet of vehicles
CN112819000A (en) Streetscape image semantic segmentation system, streetscape image semantic segmentation method, electronic equipment and computer readable medium
CN111915592B (en) Remote sensing image cloud detection method based on deep learning
CN110147794A (en) A kind of unmanned vehicle outdoor scene real time method for segmenting based on deep learning
CN111259905A (en) Feature fusion remote sensing image semantic segmentation method based on downsampling
CN114092917B (en) MR-SSD-based shielded traffic sign detection method and system
CN110717921B (en) Full convolution neural network semantic segmentation method of improved coding and decoding structure
CN111582029A (en) Traffic sign identification method based on dense connection and attention mechanism
CN113888550A (en) Remote sensing image road segmentation method combining super-resolution and attention mechanism
CN112508960A (en) Low-precision image semantic segmentation method based on improved attention mechanism
CN114693924A (en) Road scene semantic segmentation method based on multi-model fusion
CN110853057A (en) Aerial image segmentation method based on global and multi-scale full-convolution network
CN114187520B (en) Building extraction model construction and application method
CN116912257B (en) Concrete pavement crack identification method based on deep learning and storage medium
CN115035298A (en) City streetscape semantic segmentation enhancement method based on multi-dimensional attention mechanism
CN115346071A (en) Image classification method and system for high-confidence local feature and global feature learning
CN113139551A (en) Improved semantic segmentation method based on deep Labv3+
CN115272677A (en) Multi-scale feature fusion semantic segmentation method, equipment and storage medium
CN109508639B (en) Road scene semantic segmentation method based on multi-scale porous convolutional neural network
CN113436210B (en) Road image segmentation method fusing context progressive sampling
CN111462090A (en) Multi-scale image target detection method
CN117197763A (en) Road crack detection method and system based on cross attention guide feature alignment network
CN116597270A (en) Road damage target detection method based on attention mechanism integrated learning network
CN114782949B (en) Traffic scene semantic segmentation method for boundary guide context aggregation
CN111612803A (en) Vehicle image semantic segmentation method based on image definition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210518