CN113255574B - Urban street semantic segmentation method and automatic driving method - Google Patents

Urban street semantic segmentation method and automatic driving method

Info

Publication number
CN113255574B
CN113255574B (application number CN202110670967.0A)
Authority
CN
China
Prior art keywords
attention
semantic segmentation
network
pixel
characteristic diagram
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110670967.0A
Other languages
Chinese (zh)
Other versions
CN113255574A (en)
Inventor
瞿绍军
欧阳柳
刘义亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Normal University
Original Assignee
Hunan Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan Normal University filed Critical Hunan Normal University
Priority to CN202110670967.0A priority Critical patent/CN113255574B/en
Publication of CN113255574A publication Critical patent/CN113255574A/en
Application granted granted Critical
Publication of CN113255574B publication Critical patent/CN113255574B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/50 - Context or environment of the image
    • G06V20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/048 - Activation functions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G06T7/11 - Region-based segmentation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/28 - Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449 - Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20084 - Artificial neural networks [ANN]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/30 - Subject of image; Context of image processing
    • G06T2207/30248 - Vehicle exterior or interior
    • G06T2207/30252 - Vehicle exterior; Vicinity of vehicle

Abstract

The invention discloses a semantic segmentation method for urban streets, which comprises: obtaining an original training data set; constructing a basic semantic segmentation network, and adding a pixel-based attention module and an attention module based on different image levels to obtain a basic semantic segmentation network based on the attention of different image levels and the attention of pixels; training this network to obtain the semantic segmentation network based on the attention of different image levels and the attention of pixels; and performing real-time semantic segmentation of urban streets with the trained semantic segmentation network. The invention also discloses an automatic driving method comprising the urban street semantic segmentation method. The method of the invention makes full use of the information of the high-level feature map and the information of the low-level feature map, and has high accuracy, good reliability and good real-time performance.

Description

Urban street semantic segmentation method and automatic driving method
Technical Field
The invention belongs to the field of computer vision and image processing, and particularly relates to a semantic segmentation method and an automatic driving method for urban streets.
Background
With the development of the economy and technology and the improvement of people's living standards, computer vision technology has gradually been applied to production and daily life, bringing great convenience.
Semantic segmentation is one of the core research topics of computer vision. It aims to divide an image into regions carrying semantic information, assign a semantic label to each region, and finally obtain a segmented image in which every pixel is semantically labeled.
Existing semantic segmentation approaches fall into two main categories: traditional image semantic segmentation methods and deep-learning-based image semantic segmentation methods. Deep-learning-based methods learn richer features, have stronger representational power and greatly improve segmentation accuracy, so they have become the focus of research. The fully convolutional network adapts classification networks to segmentation: it replaces the fully connected layers of a conventional convolutional neural network with convolutional layers, combines the feature maps produced by intermediate convolutional layers through skip connections, and then applies transposed convolution. However, this approach has two problems: 1. as convolution and pooling proceed, the resolution keeps decreasing and some pixels are lost; 2. the original context information of the feature map is not taken into account. Many researchers have subsequently proposed improvements based on the fully convolutional network. In the pyramid pooling network, the pyramid pooling module fuses multi-scale context information and uses it effectively, which makes the segmentation result more detailed, but part of the boundary information of the segmentation target is lost. The U-shaped neural network is an encoder-decoder model comprising a contracting path and an expanding path, in which the contracting path extracts context information and the expanding path gradually recovers object details and image resolution; its drawback is that the network has too many training parameters and too large a computational load to meet real-time requirements. OCNet forms a target context feature map by computing the similarity of each pixel with all other pixels and then represents a pixel by aggregating the features of all pixels, but part of the pixels are lost in the process. DeepLab-v3 combines atrous (dilated) convolution with pyramid pooling to construct an atrous spatial pyramid pooling module and captures multi-scale context information with convolutions of different dilation rates, which effectively enlarges the receptive field and improves the spatial accuracy of the segmentation result, but the dependencies among pixels are lost.
Disclosure of Invention
The invention aims to provide a semantic segmentation method for urban streets, which has high accuracy, good reliability and good real-time performance.
The invention also aims to provide an automatic driving method comprising the urban street semantic segmentation method.
The invention provides a semantic segmentation method for urban streets, which comprises the following steps:
s1, acquiring an original training data set;
s2, constructing a basic semantic segmentation network;
s3, adding a pixel-based attention module in the basic semantic segmentation network constructed in the step S2 to obtain a pixel-attention-based basic semantic segmentation network;
s4, adding attention modules based on different image levels to the semantic segmentation network based on pixel attention obtained in the step S3, so as to obtain a basic semantic segmentation network based on the attention of different image levels and the attention of pixels;
s5, training the basic semantic segmentation network based on the attention of different levels of the image and the attention of the pixels obtained in the step S4 by adopting the original training data set obtained in the step S1, so as to obtain the semantic segmentation network based on the attention of different levels of the image and the attention of the pixels;
and S6, performing semantic segmentation on the city streets in real time by adopting the semantic segmentation network based on the attention of different levels and the attention of pixels of the image obtained in the step S5.
In step S2, the basic semantic segmentation network is constructed; specifically, a Resnet101 network is used as the basic semantic segmentation network.
In step S3, a pixel-based attention module is added to the basic semantic segmentation network constructed in step S2; specifically, the pixel-based attention module is connected in series at the output of the fourth block of the Resnet101 network.
The pixel-based attention module specifically comprises the following steps:
A. features to the fourth block of the Resnet101 networkSign graphX 4The outer side of (a) is filled with all 1 s of dimension 1;
B. the padded feature map obtained in step A is processed with the following formula to obtain a preprocessed feature map X_pre:
X_pre = BN(f_3×3,d=1(X_4^pad1))
in the formula, X_4^pad1 is the feature map of the fourth block of the Resnet101 network padded in step A, f_3×3,d=1() is a standard convolution operation with a 3 × 3 convolution kernel, a sampling step size of 1 and a dilation rate d of 1, and BN() is a batch normalization operation;
C. the preprocessed feature map X_pre obtained in step B is processed with the following formula to obtain a pixel relation matrix X_wmap:
X_wmap = sigmoid(R(X_pre) × R(X_pre)^T)
in the formula, R() is a reshape operation, × is a matrix multiplication operation, and sigmoid() is the sigmoid activation function;
D. the pixel relation matrix X_wmap obtained in step C is processed with the following formula to obtain a depth feature map X_proc: X_proc = R(X_wmap × R(X_pre));
E. the feature map X_4 of the fourth block of the Resnet101 network is padded on its outer border with all 0s, with a padding width of 1;
F. the padded feature map obtained in step E is processed with the following formulas to obtain the feature maps F_1~F_4 produced by convolutions with different dilation rates:
F_1 = f_1×1,d=1(X_4^pad0)
F_2 = f_3×3,d=12(X_4^pad0)
F_3 = f_3×3,d=24(X_4^pad0)
F_4 = f_3×3,d=36(X_4^pad0)
in the formulas, X_4^pad0 is the feature map of the fourth block of the Resnet101 network padded in step E; f_1×1,d=1() is a standard convolution operation with a 1 × 1 convolution kernel, a sampling step size of 1 and a dilation rate d of 1; f_3×3,d=12(), f_3×3,d=24() and f_3×3,d=36() are standard convolution operations with a 3 × 3 convolution kernel, a sampling step size of 1 and dilation rates d of 12, 24 and 36, respectively;
G. the obtained feature maps X_proc, F_1, F_2, F_3 and F_4 are processed on the channel dimension with the following formula, which completes the processing of the pixel-based attention module:
F_m = concat(X_proc, F_1, F_2, F_3, F_4)
in the formula, F_m is the feature map output after processing by the pixel-based attention module, and concat() is the splicing operation in the channel dimension. An illustrative implementation sketch of this module is given below.
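For illustration, a minimal PyTorch sketch of the pixel-based attention module described in steps A to G follows. Several details are assumptions made only so the example runs end to end and are not part of the patented method: the pixel relation matrix is taken as sigmoid(R(X_pre) × R(X_pre)^T) over flattened spatial positions, the explicit border padding of steps A and E is folded into the convolution padding so all branches keep the same spatial size, a 1 × 1 fusion convolution is added after the concatenation of step G, and all class names and channel sizes are illustrative.

```python
# Minimal sketch of the pixel-based attention module (ASPP AM), steps A-G.
# Assumed details: sigmoid(R(X_pre) x R(X_pre)^T) for the pixel relation
# matrix, border padding folded into the convolutions, and a 1x1 fusion
# convolution after the channel-wise concatenation of step G.
import torch
import torch.nn as nn


class PixelAttentionModule(nn.Module):
    def __init__(self, in_ch=2048, mid_ch=256):
        super().__init__()
        # Step B: 3x3 convolution (stride 1, dilation 1) followed by batch norm.
        self.pre_conv = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 3, stride=1, padding=1, dilation=1),
            nn.BatchNorm2d(mid_ch),
        )
        # Step F: convolutions with dilation rates 1, 12, 24 and 36.
        self.branch1 = nn.Conv2d(in_ch, mid_ch, 1, stride=1)
        self.branch2 = nn.Conv2d(in_ch, mid_ch, 3, stride=1, padding=12, dilation=12)
        self.branch3 = nn.Conv2d(in_ch, mid_ch, 3, stride=1, padding=24, dilation=24)
        self.branch4 = nn.Conv2d(in_ch, mid_ch, 3, stride=1, padding=36, dilation=36)
        # Step G: fuse the five concatenated maps (1x1 fusion is an assumption).
        self.fuse = nn.Conv2d(mid_ch * 5, mid_ch, 1)

    def forward(self, x4):
        # Steps A-B: preprocessed feature map X_pre from the block-4 output X_4.
        x_pre = self.pre_conv(x4)
        n, c, h, w = x_pre.shape
        flat = x_pre.view(n, c, h * w)                      # R(X_pre)
        # Step C: pixel relation matrix X_wmap (HW x HW), sigmoid-activated.
        x_wmap = torch.sigmoid(torch.bmm(flat.transpose(1, 2), flat))
        # Step D: depth feature map X_proc = R(X_wmap x R(X_pre)).
        x_proc = torch.bmm(flat, x_wmap).view(n, c, h, w)
        # Step F: atrous branches F_1..F_4 on the block-4 output.
        f1 = self.branch1(x4)
        f2 = self.branch2(x4)
        f3 = self.branch3(x4)
        f4 = self.branch4(x4)
        # Step G: channel-wise concatenation and fusion into F_m.
        return self.fuse(torch.cat([x_proc, f1, f2, f3, f4], dim=1))
```

The sketch keeps the spatial size of the block-4 feature map unchanged throughout, consistent with the later statement that the size of the feature map is unchanged after this processing.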
In step S4, an attention module based on different image levels is added to the pixel-attention-based semantic segmentation network obtained in step S3; specifically, the attention module based on different image levels is connected in parallel at the output of the second block of the Resnet101 network.
The attention module based on different image levels specifically comprises the following steps:
a. the feature map X_2 output by the second block of the Resnet101 network is padded on its outer border with all 0s, with a padding width of 3;
b. global average pooling is applied to the padded feature map obtained in step a to obtain a result X_avg;
c. maximum pooling is applied to the padded feature map obtained in step a to obtain a result X_max;
d. the results X_avg and X_max obtained in steps b and c are concatenated by a concat operation to obtain a first feature map X_f;
e. the first feature map X_f obtained in step d is processed with the following formula to obtain the attention feature map F_N:
F_N = f_7×7,d=1(X_f) ⊙ X_2
in the formula, f_7×7,d=1() is a standard convolution operation with a 7 × 7 convolution kernel, a sampling step size of 1 and a dilation rate d of 1, and ⊙ is the Hadamard product;
f. the attention feature map F_N obtained in step e is fused with F_m, the feature map output after processing by the pixel-based attention module, to obtain the final feature map F_pam output by the attention module based on different image levels. An illustrative implementation sketch of this module is given below.
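For illustration, a minimal PyTorch sketch of the attention module based on different image levels (steps a to f) follows. Several points are assumptions rather than patent text: the average and maximum pooling of steps b and c are interpreted as CBAM-style pooling along the channel axis (which makes the 7 × 7 convolution dimensionally consistent with the padding width of 3), a sigmoid gate is inserted before the Hadamard product, the Hadamard product is taken with X_2, and the fusion with F_m in step f is realised as element-wise addition after a 1 × 1 channel projection; names and channel sizes are illustrative.

```python
# Minimal sketch of the attention module based on different image levels (PAM),
# steps a-f. Assumed details: channel-axis pooling in steps b-c, a sigmoid gate
# and Hadamard product with X_2 in step e, and fusion with F_m by 1x1
# projection plus element-wise addition in step f.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ImageLevelAttentionModule(nn.Module):
    def __init__(self, low_ch=512, high_ch=256):
        super().__init__()
        # Step e: 7x7 convolution (stride 1, dilation 1) over the two pooled maps.
        self.conv7 = nn.Conv2d(2, 1, kernel_size=7, stride=1, dilation=1)
        # Channel projection so that F_N and F_m can be added (an assumption).
        self.align = nn.Conv2d(low_ch, high_ch, kernel_size=1)

    def forward(self, x2, f_m):
        # Step a: pad the block-2 feature map X_2 with zeros, width 3
        # (this also acts as the padding needed by the 7x7 convolution).
        x_pad = F.pad(x2, (3, 3, 3, 3), mode="constant", value=0.0)
        # Steps b-c: average and maximum pooling along the channel dimension.
        avg_map = torch.mean(x_pad, dim=1, keepdim=True)
        max_map, _ = torch.max(x_pad, dim=1, keepdim=True)
        # Step d: concatenate the two pooled maps into the first feature map.
        first = torch.cat([avg_map, max_map], dim=1)
        # Step e: 7x7 convolution, sigmoid gate, Hadamard product with X_2.
        attn = torch.sigmoid(self.conv7(first))      # back to X_2's spatial size
        f_n = x2 * attn
        # Step f: fuse with the pixel-attention output F_m (fusion is assumed).
        f_m = F.interpolate(f_m, size=f_n.shape[2:], mode="bilinear",
                            align_corners=False)
        return self.align(f_n) + f_m
```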
The invention also provides an automatic driving method comprising the urban street semantic segmentation method.
The urban street semantic segmentation method and the automatic driving method provided by the invention use the relations among pixels of the high-level feature map to obtain global information and strengthen the correlation among pixels, and further extract image feature information through the pixel-based attention module. To address the loss of pixels in the high-level feature map, an attention module based on different image levels is provided: information in the high-level feature map is used as guidance to mine information hidden in the low-level feature map, which is then fused with the high-level feature map. The method of the invention therefore makes full use of the information of the high-level feature map and of the low-level feature map, and has high accuracy, good reliability and good real-time performance.
Drawings
FIG. 1 is a schematic flow diagram of the method of the present invention.
FIG. 2 is a schematic diagram of a model structure of the method of the present invention.
FIG. 3 is a schematic diagram of a pixel-based attention module in a model structure of the method of the present invention.
FIG. 4 is a schematic diagram of a PSAM module in a pixel-based attention module in a model structure of the method of the present invention.
FIG. 5 is a schematic diagram of the structure of the attention module based on different image levels in the model structure of the method of the present invention.
FIG. 6 is a diagram comparing the effect of the present invention with that of the prior art on the same set of images.
Detailed Description
FIG. 1 is a schematic flow chart of the method of the present invention. The urban street semantic segmentation method provided by the invention comprises the following steps (a schematic structural diagram of the network provided by the invention is shown in FIG. 2):
s1, acquiring an original training data set;
s2, constructing a basic semantic segmentation network; particularly, a Resnet101 network is used as a basic semantic segmentation network;
s3, adding a pixel-based attention module to the basic semantic segmentation network constructed in the step S2 to obtain a pixel-attention-based basic semantic segmentation network; in particular, a pixel-based attention module (the ASPP AM module in FIG. 2) is connected in series at the output of the fourth block of the Resnet101 network;
In specific implementation, the structure of the pixel-based attention module is shown in FIG. 3; it specifically comprises the following steps:
A. the feature map X_4 of the fourth block of the Resnet101 network is padded on its outer border with all 1s, with a padding width of 1;
B. the padded feature map obtained in step A is processed with the following formula to obtain a preprocessed feature map X_pre:
X_pre = BN(f_3×3,d=1(X_4^pad1))
in the formula, X_4^pad1 is the feature map of the fourth block of the Resnet101 network padded in step A, f_3×3,d=1() is a standard convolution operation with a 3 × 3 convolution kernel, a sampling step size of 1 and a dilation rate d of 1, and BN() is a batch normalization operation;
C. the preprocessed feature map X_pre obtained in step B is processed with the following formula to obtain a pixel relation matrix X_wmap:
X_wmap = sigmoid(R(X_pre) × R(X_pre)^T)
in the formula, R() is a reshape operation, × is a matrix multiplication operation, and sigmoid() is the sigmoid activation function;
D. the pixel relation matrix X_wmap obtained in step C is processed with the following formula to obtain a depth feature map X_proc: X_proc = R(X_wmap × R(X_pre));
The depth feature map allows the pixel category information to receive more attention, while the detail information is further highlighted;
E. the feature map X_4 of the fourth block of the Resnet101 network is padded on its outer border with all 0s, with a padding width of 1;
The processing and calculation performed in steps B to E constitute the PSAM module in FIG. 3, and its schematic network structure is shown in FIG. 4;
F. the padded feature map obtained in step E is processed with the following formulas to obtain the feature maps F_1~F_4 produced by convolutions with different dilation rates:
F_1 = f_1×1,d=1(X_4^pad0)
F_2 = f_3×3,d=12(X_4^pad0)
F_3 = f_3×3,d=24(X_4^pad0)
F_4 = f_3×3,d=36(X_4^pad0)
in the formulas, X_4^pad0 is the feature map of the fourth block of the Resnet101 network padded in step E; f_1×1,d=1() is a standard convolution operation with a 1 × 1 convolution kernel, a sampling step size of 1 and a dilation rate d of 1; f_3×3,d=12(), f_3×3,d=24() and f_3×3,d=36() are standard convolution operations with a 3 × 3 convolution kernel, a sampling step size of 1 and dilation rates d of 12, 24 and 36, respectively;
G. the obtained feature maps X_proc, F_1, F_2, F_3 and F_4 are processed on the channel dimension with the following formula, which completes the processing of the pixel-based attention module:
F_m = concat(X_proc, F_1, F_2, F_3, F_4)
in the formula, F_m is the feature map output after processing by the pixel-based attention module, and concat() is the splicing operation in the channel dimension;
after the processing, the detail information of the feature map is weighted and refined, the context information is richer, and the size of the feature map is unchanged;
s4, adding an attention module based on different image levels to the pixel-attention-based semantic segmentation network obtained in the step S3, so as to obtain a basic semantic segmentation network based on the attention of different image levels and the attention of pixels; specifically, the attention module based on different image levels (the PAM module in FIG. 2) is connected in parallel at the output of the second block of the Resnet101 network;
In specific implementation, a schematic diagram of the network structure of the attention module based on different image levels is shown in FIG. 5; it specifically comprises the following steps:
a. the feature map X_2 output by the second block of the Resnet101 network is padded on its outer border with all 0s, with a padding width of 3;
b. global average pooling is applied to the padded feature map obtained in step a to obtain a result X_avg;
c. maximum pooling is applied to the padded feature map obtained in step a to obtain a result X_max;
d. the results X_avg and X_max obtained in steps b and c are concatenated by a concat operation to obtain a first feature map X_f;
e. the first feature map X_f obtained in step d is processed with the following formula to obtain the attention feature map F_N:
F_N = f_7×7,d=1(X_f) ⊙ X_2
in the formula, f_7×7,d=1() is a standard convolution operation with a 7 × 7 convolution kernel, a sampling step size of 1 and a dilation rate d of 1, and ⊙ is the Hadamard product;
f. the attention feature map F_N obtained in step e is fused with F_m, the feature map output after processing by the pixel-based attention module, to obtain the final feature map F_pam output by the attention module based on different image levels.
If the feature map were processed by directly connecting the global average pooling and maximum pooling results, the two pooling operations would share the convolution weights in the same proportion, yet in regions of highlighted information they do not necessarily contribute equally to recovering edge details; the convolution can therefore be seen as assigning different weights to the two pooling results, so that the network learns edge details better;
s5, training the basic semantic segmentation network based on the attention of different levels of the image and the attention of the pixels obtained in the step S4 by adopting the original training data set obtained in the step S1, so as to obtain the semantic segmentation network based on the attention of different levels of the image and the attention of the pixels;
and S6, performing semantic segmentation on the city streets in real time by adopting the semantic segmentation network based on the attention of different levels and the attention of pixels of the image obtained in the step S5.
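To show how the pieces of FIG. 2 could fit together, the sketch below assembles a dilated ResNet-101 backbone (down-sampling rate 8, as stated in the experiments), the pixel-based attention module on the output of the fourth block and the image-level attention module on the output of the second block, reusing the two module sketches given earlier. The classifier head, the use of torchvision's ResNet-101, the channel sizes and the bilinear upsampling are assumptions for illustration only.

```python
# Illustrative assembly following FIG. 2: dilated ResNet-101 (output stride 8),
# the ASPP AM module on the block-4 output, the PAM module on the block-2
# output, and a simple 1x1 classifier head (head and upsampling are assumed).
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet101


class StreetSegNet(nn.Module):
    def __init__(self, num_classes=19):        # 19 evaluation classes on Cityscapes
        super().__init__()
        backbone = resnet101(pretrained=False,
                             replace_stride_with_dilation=[False, True, True])
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1,
                                  backbone.relu, backbone.maxpool)
        self.layer1, self.layer2 = backbone.layer1, backbone.layer2
        self.layer3, self.layer4 = backbone.layer3, backbone.layer4
        self.aspp_am = PixelAttentionModule(in_ch=2048, mid_ch=256)    # sketch above
        self.pam = ImageLevelAttentionModule(low_ch=512, high_ch=256)  # sketch above
        self.classifier = nn.Conv2d(256, num_classes, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[2:]
        x = self.stem(x)
        x1 = self.layer1(x)
        x2 = self.layer2(x1)      # second block output -> PAM
        x3 = self.layer3(x2)
        x4 = self.layer4(x3)      # fourth block output -> ASPP AM
        f_m = self.aspp_am(x4)
        f_pam = self.pam(x2, f_m)
        logits = self.classifier(f_pam)
        return F.interpolate(logits, size=(h, w), mode="bilinear",
                             align_corners=False)
```

In such a setup the network would typically be trained with a per-pixel cross-entropy loss on the Cityscapes training images, although the patent text does not specify the loss function.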
The process of the invention is further illustrated below with reference to specific examples:
The method disclosed by the invention was evaluated on the Cityscapes data set, with PyTorch 1.4 as the framework and the common semantic segmentation metric mIoU (mean Intersection over Union) as the evaluation index. The pixel-based attention module and the attention module based on different image levels were embedded into an FCN, and the dependency relationships among pixels were calculated; the results of the two modules on the Cityscapes validation set are shown in Table 1.
TABLE 1 schematic representation of the impact of two modules on network performance
To verify the performance of the attention modules, ablation experiments were carried out on the two modules. The mIoU of the Resnet-baseline is 68.1%. Adding the pixel-based attention module on top of the Resnet-baseline gives an mIoU of 73.8%, an improvement of 5.6% over the baseline. The attention module based on different image levels aims to refine edges and details, so the segmentation performance is not markedly improved: adding it on top of the Resnet-baseline gives an mIoU of 69.3%, an improvement of 1.2%. Experiments were also performed with an unmodified atrous spatial pyramid pooling module, whose mIoU is 70.7%. The experimental results show that the pixel-based attention module is very helpful for scene segmentation. In view of the computational cost, Resnet-101 with a down-sampling rate of 8 is finally used as the backbone network, and the results in Table 1 are calculated with the official toolkit.
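For reference, the mIoU metric cited above can be computed from a per-class confusion matrix as in the generic sketch below; the reported figures come from the official Cityscapes toolkit, and this snippet (including the ignore label 255 used by Cityscapes) is only an illustration of the metric, not the evaluation code used in the experiments.

```python
# Generic mean Intersection over Union (mIoU) over flattened prediction and
# ground-truth label arrays; 255 is the Cityscapes "ignore" label.
import numpy as np


def mean_iou(pred, label, num_classes, ignore_index=255):
    mask = label != ignore_index
    hist = np.bincount(
        num_classes * label[mask].astype(int) + pred[mask].astype(int),
        minlength=num_classes ** 2,
    ).reshape(num_classes, num_classes)          # confusion matrix
    inter = np.diag(hist)
    union = hist.sum(axis=0) + hist.sum(axis=1) - inter
    valid = union > 0                            # skip classes absent from both
    return (inter[valid] / union[valid]).mean()
```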
The method of the invention is also compared with current state-of-the-art networks on the Cityscapes test set: segmentation maps are predicted by the network from the official test images and evaluated by the official test server, with the results shown in Table 2:
TABLE 2: Comparison of the method of the invention with various state-of-the-art networks
In Table 2, the proposed attention mechanism achieves an mIoU of 69.3%, significantly improving on the earlier FCN network. On the validation set of the data set, the ASPP AM module improves on the reference network by 5.6%; compared with currently popular networks, it improves on the original dilated FCN-16 by nearly 22%, on DeepLabv3 (which contains ASPP) by 5%, and on the recent bilateral attention network BiANet by 3%. The mIoU of the model reaches 69.3%. The two modules of the network emphasize the dependencies between pixels and the details of the low-level space, so the method obtains better performance on the Cityscapes test set.
Finally, a visualization of the effect of the two proposed modules on network performance is shown in FIG. 6. In FIG. 6, from left to right, are the original picture (a), the ground-truth label (b), the result of the baseline method (c), the result of the ASPP AM method (d), the result of the PAM method (e), the result of the ASPP method (f), and the result of the method of the present invention (g). As can be seen from FIG. 6, some regions are misclassified by the Resnet-baseline and some edge details are not segmented coherently, for example the green belt is confused with the sidewalk, plants appear in the sky, and vehicles are confused with the background; after the ASPP AM module is added, such mis-segmentation is reduced because the dependency information among pixels is strengthened. After the PAM module is added, details are selectively attended to, improving the segmentation of fine objects such as traffic signs, as can be visually compared in the segmentation maps.
Finally, the invention also provides an automatic driving method comprising the urban street semantic segmentation method; in specific implementation, semantic segmentation of the urban street scene is carried out with the urban street semantic segmentation method described above, and automatic driving control is then performed according to the semantic segmentation results.
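As a hedged illustration of how step S6 might sit inside a driving loop, the sketch below normalizes a camera frame, runs the trained segmentation network, and hands the per-pixel class map to a downstream planning/control stage. The normalization constants (ImageNet statistics) and the plan_and_control hook are hypothetical and not part of the patent.

```python
# Hypothetical per-frame inference step for the driving use case: normalize the
# camera frame, segment it, and pass the label map to a downstream controller.
# MEAN/STD (ImageNet statistics) and plan_and_control are assumptions.
import torch

MEAN = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
STD = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)


@torch.no_grad()
def drive_step(model, frame, plan_and_control):
    """frame: float tensor of shape (3, H, W) with values in [0, 1]."""
    model.eval()
    x = (frame.unsqueeze(0) - MEAN) / STD
    logits = model(x)                      # (1, num_classes, H, W)
    seg_map = logits.argmax(dim=1)[0]      # per-pixel class labels
    plan_and_control(seg_map)              # downstream driving decision (stub)
    return seg_map
```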

Claims (4)

1. A method for segmenting urban street semantics is characterized by comprising the following steps:
s1, acquiring an original training data set;
s2, constructing a basic semantic segmentation network; particularly, a Resnet101 network is used as a basic semantic segmentation network;
s3, adding a pixel-based attention module in the basic semantic segmentation network constructed in the step S2 to obtain a pixel-attention-based basic semantic segmentation network; specifically, at the output end of the fourth block of the Resnet101 network, a pixel-based attention module is connected in series; the pixel-based attention module specifically comprises the following steps:
A. the feature map X_4 of the fourth block of the Resnet101 network is padded on its outer border with all 1s, with a padding width of 1;
B. the padded feature map obtained in step A is processed with the following formula to obtain a preprocessed feature map X_pre:
X_pre = BN(f_3×3,d=1(X_4^pad1))
in the formula, X_4^pad1 is the feature map of the fourth block of the Resnet101 network padded in step A, f_3×3,d=1() is a standard convolution operation with a 3 × 3 convolution kernel, a sampling step size of 1 and a dilation rate d of 1, and BN() is a batch normalization operation;
C. the preprocessed feature map X_pre obtained in step B is processed with the following formula to obtain a pixel relation matrix X_wmap:
X_wmap = sigmoid(R(X_pre) × R(X_pre)^T)
in the formula, R() is a reshape operation, × is a matrix multiplication operation, and sigmoid() is the sigmoid activation function;
D. the pixel relation matrix X_wmap obtained in step C is processed with the following formula to obtain a depth feature map X_proc: X_proc = R(X_wmap × R(X_pre));
E. the feature map X_4 of the fourth block of the Resnet101 network is padded on its outer border with all 0s, with a padding width of 1;
F. the padded feature map obtained in step E is processed with the following formulas to obtain the feature maps F_1~F_4 produced by convolutions with different dilation rates:
F_1 = f_1×1,d=1(X_4^pad0)
F_2 = f_3×3,d=12(X_4^pad0)
F_3 = f_3×3,d=24(X_4^pad0)
F_4 = f_3×3,d=36(X_4^pad0)
in the formulas, X_4^pad0 is the feature map of the fourth block of the Resnet101 network padded in step E; f_1×1,d=1() is a standard convolution operation with a 1 × 1 convolution kernel, a sampling step size of 1 and a dilation rate d of 1; f_3×3,d=12(), f_3×3,d=24() and f_3×3,d=36() are standard convolution operations with a 3 × 3 convolution kernel, a sampling step size of 1 and dilation rates d of 12, 24 and 36, respectively;
G. the obtained feature maps X_proc, F_1, F_2, F_3 and F_4 are processed on the channel dimension with the following formula, which completes the processing of the pixel-based attention module:
F_m = concat(X_proc, F_1, F_2, F_3, F_4)
in the formula, F_m is the feature map output after processing by the pixel-based attention module, and concat() is the splicing operation in the channel dimension;
s4, adding attention modules based on different image levels to the semantic segmentation network based on pixel attention obtained in the step S3, so as to obtain a basic semantic segmentation network based on the attention of different image levels and the attention of pixels;
s5, training the basic semantic segmentation network based on the attention of different levels of the image and the attention of the pixels obtained in the step S4 by adopting the original training data set obtained in the step S1, so as to obtain the semantic segmentation network based on the attention of different levels of the image and the attention of the pixels;
and S6, performing semantic segmentation on the city streets in real time by adopting the semantic segmentation network based on the attention of different levels and the attention of pixels of the image obtained in the step S5.
2. The urban street semantic segmentation method according to claim 1, wherein in step S4 an attention module based on different image levels is added to the pixel-attention-based semantic segmentation network obtained in step S3; specifically, the attention module based on different image levels is connected in parallel at the output of the second block of the Resnet101 network.
3. The method according to claim 2, wherein the attention module based on different image levels comprises the following steps:
a. the feature map X_2 output by the second block of the Resnet101 network is padded on its outer border with all 0s, with a padding width of 3;
b. global average pooling is applied to the padded feature map obtained in step a to obtain a result X_avg;
c. maximum pooling is applied to the padded feature map obtained in step a to obtain a result X_max;
d. the results X_avg and X_max obtained in steps b and c are concatenated by a concat operation to obtain a first feature map X_f;
e. the first feature map X_f obtained in step d is processed with the following formula to obtain the attention feature map F_N:
F_N = f_7×7,d=1(X_f) ⊙ X_2
in the formula, f_7×7,d=1() is a standard convolution operation with a 7 × 7 convolution kernel, a sampling step size of 1 and a dilation rate d of 1, and ⊙ is the Hadamard product;
f. the attention feature map F_N obtained in step e is fused with F_m, the feature map output after processing by the pixel-based attention module, to obtain the final feature map F_pam output by the attention module based on different image levels.
4. An automatic driving method, characterized by comprising the urban street semantic segmentation method as claimed in any one of claims 1 to 3.
CN202110670967.0A 2021-06-17 2021-06-17 Urban street semantic segmentation method and automatic driving method Active CN113255574B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110670967.0A CN113255574B (en) 2021-06-17 2021-06-17 Urban street semantic segmentation method and automatic driving method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110670967.0A CN113255574B (en) 2021-06-17 2021-06-17 Urban street semantic segmentation method and automatic driving method

Publications (2)

Publication Number Publication Date
CN113255574A CN113255574A (en) 2021-08-13
CN113255574B true CN113255574B (en) 2021-09-14

Family

ID=77188423

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110670967.0A Active CN113255574B (en) 2021-06-17 2021-06-17 Urban street semantic segmentation method and automatic driving method

Country Status (1)

Country Link
CN (1) CN113255574B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115035299B (en) * 2022-06-20 2023-06-13 河南大学 Improved city street image segmentation method based on deep learning

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109784386A (en) * 2018-12-29 2019-05-21 天津大学 A method of it is detected with semantic segmentation helpers
CN110163875A (en) * 2019-05-23 2019-08-23 南京信息工程大学 One kind paying attention to pyramidal semi-supervised video object dividing method based on modulating network and feature
CN111914935A (en) * 2020-08-03 2020-11-10 哈尔滨工程大学 Ship image target detection method based on deep learning
CN112418027A (en) * 2020-11-11 2021-02-26 青岛科技大学 Remote sensing image road extraction method for improving U-Net network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10929715B2 (en) * 2018-12-31 2021-02-23 Robert Bosch Gmbh Semantic segmentation using driver attention information

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109784386A (en) * 2018-12-29 2019-05-21 天津大学 A method of it is detected with semantic segmentation helpers
CN110163875A (en) * 2019-05-23 2019-08-23 南京信息工程大学 One kind paying attention to pyramidal semi-supervised video object dividing method based on modulating network and feature
CN111914935A (en) * 2020-08-03 2020-11-10 哈尔滨工程大学 Ship image target detection method based on deep learning
CN112418027A (en) * 2020-11-11 2021-02-26 青岛科技大学 Remote sensing image road extraction method for improving U-Net network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Multi-scale fusion semantic segmentation of aerial images based on an attention mechanism; 郑顾平 et al.; Journal of Graphics; 2018-12-31; abstract on p. 1069, sections 1.2-1.3 on pp. 1071-1073 *
Research on semantic segmentation algorithms for road images based on deep learning; 张学涛; China Master's Theses Full-text Database, Information Science and Technology; 2019-09-15; chapter 3, pp. 19-26 of the description *

Also Published As

Publication number Publication date
CN113255574A (en) 2021-08-13

Similar Documents

Publication Publication Date Title
CN111898439B (en) Deep learning-based traffic scene joint target detection and semantic segmentation method
CN111563909B (en) Semantic segmentation method for complex street view image
CN113642390B (en) Street view image semantic segmentation method based on local attention network
CN113688836A (en) Real-time road image semantic segmentation method and system based on deep learning
CN112489054A (en) Remote sensing image semantic segmentation method based on deep learning
CN113256649B (en) Remote sensing image station selection and line selection semantic segmentation method based on deep learning
CN113034506B (en) Remote sensing image semantic segmentation method and device, computer equipment and storage medium
CN114693924A (en) Road scene semantic segmentation method based on multi-model fusion
CN113486886B (en) License plate recognition method and device in natural scene
CN111882620A (en) Road drivable area segmentation method based on multi-scale information
CN113255574B (en) Urban street semantic segmentation method and automatic driving method
CN113298817A (en) High-accuracy semantic segmentation method for remote sensing image
CN116051977A (en) Multi-branch fusion-based lightweight foggy weather street view semantic segmentation algorithm
CN116363358A (en) Road scene image real-time semantic segmentation method based on improved U-Net
CN114973199A (en) Rail transit train obstacle detection method based on convolutional neural network
CN116704194A (en) Street view image segmentation algorithm based on BiSeNet network and attention mechanism
CN116596966A (en) Segmentation and tracking method based on attention and feature fusion
CN111612803A (en) Vehicle image semantic segmentation method based on image definition
CN115019039B (en) Instance segmentation method and system combining self-supervision and global information enhancement
CN116778318A (en) Convolutional neural network remote sensing image road extraction model and method
CN112634289B (en) Rapid feasible domain segmentation method based on asymmetric void convolution
CN113223006B (en) Lightweight target semantic segmentation method based on deep learning
CN115171092B (en) End-to-end license plate detection method based on semantic enhancement
CN116704196B (en) Method for training image semantic segmentation model
CN114529878B (en) Cross-domain road scene semantic segmentation method based on semantic perception

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20210813

Assignee: Hunan Yimo Information Technology Co.,Ltd.

Assignor: HUNAN NORMAL University

Contract record no.: X2023980033719

Denomination of invention: Semantic Segmentation Method and Automatic Driving Method for Urban Streets

Granted publication date: 20210914

License type: Common License

Record date: 20230317