CN113255574B - Urban street semantic segmentation method and automatic driving method - Google Patents
Urban street semantic segmentation method and automatic driving method
- Publication number: CN113255574B (application CN202110670967.0A)
- Authority: CN (China)
- Prior art keywords: attention, semantic segmentation, network, pixel, feature map
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06N3/045—Combinations of networks
- G06N3/048—Activation functions
- G06N3/08—Learning methods
- G06T7/11—Region-based segmentation
- G06V10/28—Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06T2207/20081—Training; Learning
- G06T2207/20084—Artificial neural networks [ANN]
- G06T2207/30252—Vehicle exterior; Vicinity of vehicle
Abstract
The invention discloses an urban street semantic segmentation method, which comprises: acquiring an original training data set; constructing a basic semantic segmentation network; adding a pixel-based attention module and an attention module based on different image levels to obtain a basic semantic segmentation network based on different-image-level attention and pixel attention, and training it to obtain the semantic segmentation network based on different-image-level attention and pixel attention; and performing real-time semantic segmentation of urban streets with this trained network. The invention also discloses an automatic driving method comprising the urban street semantic segmentation method. The method of the invention makes full use of the information in both the high-level and the low-level feature maps, and offers high accuracy, good reliability, and good real-time performance.
Description
Technical Field
The invention belongs to the field of computer vision and image processing, and particularly relates to a semantic segmentation method and an automatic driving method for urban streets.
Background
With economic and technological development and rising living standards, computer vision technology has gradually been applied in people's production and daily life, bringing great convenience.
Semantic segmentation is one of the core research topics in computer vision. It aims to divide an image into regions carrying semantic information, assign a semantic label to each region, and finally obtain a segmented image in which every pixel is semantically labeled.
Existing semantic segmentation methods fall into two main categories: traditional image semantic segmentation methods and deep-learning-based methods. Deep-learning-based methods learn richer features, have stronger expressive power, and greatly improve segmentation accuracy, so they have become the focus of research. The fully convolutional network (FCN) adapts classification networks to segmentation: it replaces the fully connected layers of a conventional convolutional neural network with convolutional layers, combines the feature maps produced by intermediate convolutional layers via skip connections, and then applies transposed convolution. However, this approach has two problems: 1. as convolution and pooling proceed, the resolution keeps decreasing and some pixel information is lost; 2. the original context information of the feature map is not considered. Many researchers have since proposed improvements based on the fully convolutional network. In the pyramid scene parsing network, the pyramid pooling module fuses multi-scale context information and exploits context effectively, making the segmentation result more detailed; its disadvantage is that part of the boundary information of the segmented target is lost. The U-shaped neural network (U-Net) is an encoder-decoder model comprising a contracting path and an expanding path: the contracting path extracts context information, and the expanding path gradually restores object details and image resolution. Its drawbacks are that the network has too many training parameters and a large computational load, so it cannot meet real-time requirements.
OCNet forms a target context feature map by computing the similarity of each pixel to all pixels, and then represents each pixel by aggregating the features of all pixels; its disadvantage is that some pixel information is lost in the process. DeepLab-v3 combines dilated (atrous) convolution with pyramid pooling to build an atrous spatial pyramid pooling module, capturing multi-scale context information with convolutions of different dilation rates. This effectively enlarges the receptive field and improves the spatial accuracy of the segmentation result, but the dependency among pixels is lost.
Disclosure of Invention
The invention aims to provide an urban street semantic segmentation method with high accuracy, good reliability, and good real-time performance.
The invention also aims to provide an automatic driving method comprising the urban street semantic segmentation method.
The invention provides a semantic segmentation method for urban streets, which comprises the following steps:
s1, acquiring an original training data set;
s2, constructing a basic semantic segmentation network;
s3, adding a pixel-based attention module in the basic semantic segmentation network constructed in the step S2 to obtain a pixel-attention-based basic semantic segmentation network;
s4, adding attention modules based on different image levels to the semantic segmentation network based on pixel attention obtained in the step S3, so as to obtain a basic semantic segmentation network based on the attention of different image levels and the attention of pixels;
s5, training the basic semantic segmentation network based on the attention of different levels of the image and the attention of the pixels obtained in the step S4 by adopting the original training data set obtained in the step S1, so as to obtain the semantic segmentation network based on the attention of different levels of the image and the attention of the pixels;
and S6, performing semantic segmentation on the city streets in real time by adopting the semantic segmentation network based on the attention of different levels and the attention of pixels of the image obtained in the step S5.
Step S2 constructs the basic semantic segmentation network; specifically, a Resnet101 network is used as the basic semantic segmentation network.
In step S3, a pixel-based attention module is added to the basic semantic segmentation network constructed in step S2; specifically, the pixel-based attention module is connected in series at the output end of the fourth block of the Resnet101 network.
The pixel-based attention module specifically performs the following steps:
A. The outer border of the feature map X4 of the fourth block of the Resnet101 network is padded with all 1s of width 1;
B. The padded feature map obtained in step A is operated on with the following formula, obtaining the preprocessed feature map X_pre:
X_pre = BN(f3x3(X'4))
where X'4 is the feature map of the fourth block of the Resnet101 network padded in step A; f3x3 is a standard convolution operation with a 3 x 3 convolution kernel, a sampling stride of 1, and a dilation rate d of 1; BN() is a batch normalization operation;
C. The preprocessed feature map X_pre obtained in step B is processed with the following formula, obtaining the pixel relation matrix X_wmap:
X_wmap = σ(R(X_pre) × R(X_pre)^T)
where R() is a reshape operation; × is matrix multiplication; σ is the sigmoid activation function;
D. The pixel relation matrix X_wmap obtained in step C is processed with the following formula, obtaining the depth feature map X_proc: X_proc = R(X_wmap × R(X_pre));
E. The outer border of the feature map X4 of the fourth block of the Resnet101 network is padded with all 0s of width 1;
F. The padded feature map obtained in step E is operated on with the following formulas, obtaining the feature maps F1 to F4 after convolution with different dilation rates:
F1 = f1x1(X''4), F2 = f3x3,d=12(X''4), F3 = f3x3,d=24(X''4), F4 = f3x3,d=36(X''4)
where X''4 is the feature map of the fourth block of the Resnet101 network padded in step E; f1x1 is a standard convolution operation with a 1 x 1 convolution kernel, a sampling stride of 1, and a dilation rate d of 1; f3x3,d=12, f3x3,d=24, and f3x3,d=36 are standard convolution operations with 3 x 3 convolution kernels, a sampling stride of 1, and dilation rates d of 12, 24, and 36 respectively;
G. The obtained feature maps X_proc, F1, F2, F3, and F4 are processed along the channel dimension with the following formula, completing the processing of the pixel-based attention module:
F_m = concat(X_proc, F1, F2, F3, F4)
where F_m is the feature map output after processing by the pixel-based attention module; concat() is a concatenation operation in the channel dimension.
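The core of steps C and D, computing a pixel-to-pixel relation matrix and re-weighting the features with it, can be sketched as follows. This is a simplified NumPy illustration, not the patented implementation: the padding of step A, the 3 x 3 convolution and batch normalization of step B, and the ASPP branch of steps E to G are omitted, and the function names are illustrative only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pixel_relation(x_pre):
    # x_pre: feature map of shape (C, H, W)
    c, h, w = x_pre.shape
    flat = x_pre.reshape(c, h * w).T              # R(X_pre): (HW, C)
    x_wmap = sigmoid(flat @ flat.T)               # pixel relation matrix, (HW, HW)
    x_proc = (x_wmap @ flat).T.reshape(c, h, w)   # R(X_wmap x R(X_pre))
    return x_proc

rng = np.random.default_rng(0)
feat = rng.random((4, 8, 8))                      # toy stand-in for the X4 feature map
out = pixel_relation(feat)
print(out.shape)                                  # spatial size is unchanged
```

Because every pixel is compared against every other pixel, the relation matrix grows as (HW)^2, which is why the module is attached only to the low-resolution fourth block.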
In step S4, the attention module based on different image levels is added to the pixel-attention-based semantic segmentation network obtained in step S3; specifically, the attention module based on different image levels is connected in parallel at the output end of the second block of the Resnet101 network.
The attention module based on different image levels specifically performs the following steps:
a. The outer border of the feature map X2 output by the second block of the Resnet101 network is padded with all 0s of width 3;
b. Global average pooling is performed on the padded feature map obtained in step a, obtaining the average-pooled result;
c. Maximum pooling is performed on the padded feature map obtained in step a, obtaining the max-pooled result;
d. A concat operation is performed on the results obtained in steps b and c, obtaining the first feature map;
e. The first feature map obtained in step d is operated on with the following formula, obtaining the attention feature map F_N:
F_N = f7x7(F1st) ⊙ X'2
where F1st is the first feature map obtained in step d; f7x7 is a standard convolution operation with a 7 x 7 convolution kernel, a sampling stride of 1, and a dilation rate d of 1; ⊙ is the Hadamard product; X'2 is the padded feature map obtained in step a;
f. The attention feature map F_N obtained in step e is fused with the feature map F_m output after processing by the pixel-based attention module, obtaining the final feature map F_pam output by the attention module based on different image levels.
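The pooling-and-convolution gating of steps a to e resembles CBAM-style spatial attention and can be sketched as below. This is a hedged NumPy illustration under assumptions not stated in the source: the two poolings are interpreted as channel-wise (so that a 7 x 7 convolution can act spatially), a sigmoid is assumed to form the gate, and the learned two-channel-to-one convolution is approximated by one shared kernel applied to each pooled map; all names are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv2d_same(img, kernel):
    # naive single-channel 2-D convolution, zero-padded so output size == input size
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.empty_like(img, dtype=np.float64)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def image_level_attention(x2, kernel):
    avg = x2.mean(axis=0)      # channel-wise average pooling (assumption, see above)
    mx = x2.max(axis=0)        # channel-wise max pooling
    # stand-in for the learned 7x7 convolution over the two stacked pooled maps
    gate = sigmoid(conv2d_same(avg, kernel) + conv2d_same(mx, kernel))
    return x2 * gate[None, :, :]   # Hadamard product with every channel

rng = np.random.default_rng(1)
x2 = rng.random((4, 16, 16))                  # toy stand-in for the padded X2
k = np.full((7, 7), 1.0 / 49.0)               # toy 7x7 kernel; learned in practice
f_n = image_level_attention(x2, k)
print(f_n.shape)
```

The gate is a single spatial map in (0, 1) broadcast over all channels, which is what lets the module sharpen edges without changing the feature-map size.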
The invention also provides an automatic driving method comprising the urban street semantic segmentation method.
The urban street semantic segmentation method and automatic driving method provided by the invention use the relations among the pixels of the high-level feature map to obtain global information and strengthen the correlation among pixels, further extracting feature information from the image through the pixel-based attention module. To address the loss of pixel information in the high-level feature map, an attention module based on different image levels is provided: information in the high-level feature map guides the mining of information hidden in the low-level feature map, which is then fused with the high-level feature map. The method of the invention therefore makes full use of the information in both the high-level and the low-level feature maps, and offers high accuracy, good reliability, and good real-time performance.
Drawings
FIG. 1 is a schematic process flow diagram of the process of the present invention.
FIG. 2 is a schematic diagram of a model structure of the method of the present invention.
FIG. 3 is a schematic diagram of a pixel-based attention module in a model structure of the method of the present invention.
FIG. 4 is a schematic diagram of a PSAM module in a pixel-based attention module in a model structure of the method of the present invention.
FIG. 5 is a schematic diagram of the structure of the attention module based on different image levels in the model structure of the method of the present invention.
FIG. 6 is a diagram comparing the effect of the present invention with that of the prior art on the same set of images.
Detailed Description
FIG. 1 is a schematic flow chart of the method of the present invention. The urban street semantic segmentation method provided by the invention comprises the following steps (a schematic diagram of the structure of the proposed network is shown in FIG. 2):
s1, acquiring an original training data set;
s2, constructing a basic semantic segmentation network; particularly, a Resnet101 network is used as a basic semantic segmentation network;
s3, adding a pixel-based attention module in the basic semantic segmentation network constructed in the step S2 to obtain a pixel-attention-based basic semantic segmentation network; in particular, at the output of the fourth block of the Resnet101 network, a pixel-based attention module (ASPP AM module in fig. 2) is concatenated;
In specific implementation, the structure of the pixel-based attention module is shown in FIG. 3; it performs the following steps:
A. The outer border of the feature map X4 of the fourth block of the Resnet101 network is padded with all 1s of width 1;
B. The padded feature map obtained in step A is operated on with the following formula, obtaining the preprocessed feature map X_pre:
X_pre = BN(f3x3(X'4))
where X'4 is the feature map of the fourth block of the Resnet101 network padded in step A; f3x3 is a standard convolution operation with a 3 x 3 convolution kernel, a sampling stride of 1, and a dilation rate d of 1; BN() is a batch normalization operation;
C. The preprocessed feature map X_pre obtained in step B is processed with the following formula, obtaining the pixel relation matrix X_wmap:
X_wmap = σ(R(X_pre) × R(X_pre)^T)
where R() is a reshape operation; × is matrix multiplication; σ is the sigmoid activation function;
D. The pixel relation matrix X_wmap obtained in step C is processed with the following formula, obtaining the depth feature map X_proc: X_proc = R(X_wmap × R(X_pre));
The depth feature map allows the pixel category information to receive more attention, while detail information is highlighted more strongly;
E. The outer border of the feature map X4 of the fourth block of the Resnet101 network is padded with all 0s of width 1;
The processing and calculation performed in steps B to E form the PSAM module in FIG. 3; a schematic diagram of its network structure is shown in FIG. 4;
F. The padded feature map obtained in step E is operated on with the following formulas, obtaining the feature maps F1 to F4 after convolution with different dilation rates:
F1 = f1x1(X''4), F2 = f3x3,d=12(X''4), F3 = f3x3,d=24(X''4), F4 = f3x3,d=36(X''4)
where X''4 is the feature map of the fourth block of the Resnet101 network padded in step E; f1x1 is a standard convolution operation with a 1 x 1 convolution kernel, a sampling stride of 1, and a dilation rate d of 1; f3x3,d=12, f3x3,d=24, and f3x3,d=36 are standard convolution operations with 3 x 3 convolution kernels, a sampling stride of 1, and dilation rates d of 12, 24, and 36 respectively;
G. The obtained feature maps X_proc, F1, F2, F3, and F4 are processed along the channel dimension with the following formula, completing the processing of the pixel-based attention module:
F_m = concat(X_proc, F1, F2, F3, F4)
where F_m is the feature map output after processing by the pixel-based attention module; concat() is a concatenation operation in the channel dimension;
After this processing, the detail information of the feature map is weighted and refined, the context information is richer, and the size of the feature map is unchanged;
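The multi-dilation branch of step F relies on the fact that a 3 x 3 convolution with dilation rate d preserves the feature-map size when the input is padded by d on each side, which is why the module leaves the feature-map size unchanged. A minimal single-channel NumPy sketch (illustrative names; the real module applies learned kernels to multi-channel maps):

```python
import numpy as np

def dilated_conv2d_same(img, kernel, d):
    # naive dilated 2-D convolution; padding d*(k-1)//2 preserves the input size
    k = kernel.shape[0]
    pad = d * (k - 1) // 2
    padded = np.pad(img, pad)
    span = d * (k - 1) + 1                          # effective receptive span
    out = np.empty_like(img, dtype=np.float64)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            patch = padded[i:i + span:d, j:j + span:d]  # sample at dilated offsets
            out[i, j] = np.sum(patch * kernel)
    return out

rng = np.random.default_rng(2)
img = rng.random((32, 32))
k3 = rng.random((3, 3))
# the three dilated 3x3 branches of step F (the 1x1 branch is trivial)
branches = [dilated_conv2d_same(img, k3, d) for d in (12, 24, 36)]
print([b.shape for b in branches])
```

With dilation 36 the 3 x 3 kernel spans 73 pixels, so the four branches together see context at very different scales while producing maps of identical size, ready for the channel concatenation of step G.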
s4, adding attention modules based on different image levels to the semantic segmentation network based on pixel attention obtained in the step S3, so as to obtain a basic semantic segmentation network based on the attention of different image levels and the attention of pixels; specifically, at the output end of the second block of the Resnet101 network, attention modules (PAM module in fig. 2) based on different levels of the image are connected in parallel;
In specific implementation, a schematic diagram of the network structure of the attention module based on different image levels is shown in FIG. 5; it performs the following steps:
a. The outer border of the feature map X2 output by the second block of the Resnet101 network is padded with all 0s of width 3;
b. Global average pooling is performed on the padded feature map obtained in step a, obtaining the average-pooled result;
c. Maximum pooling is performed on the padded feature map obtained in step a, obtaining the max-pooled result;
d. A concat operation is performed on the results obtained in steps b and c, obtaining the first feature map;
e. The first feature map obtained in step d is operated on with the following formula, obtaining the attention feature map F_N:
F_N = f7x7(F1st) ⊙ X'2
where F1st is the first feature map obtained in step d; f7x7 is a standard convolution operation with a 7 x 7 convolution kernel, a sampling stride of 1, and a dilation rate d of 1; ⊙ is the Hadamard product; X'2 is the padded feature map obtained in step a;
f. The attention feature map F_N obtained in step e is fused with the feature map F_m output after processing by the pixel-based attention module, obtaining the final feature map F_pam output by the attention module based on different image levels.
If the feature map were passed to the global average pooling and maximum pooling through a direct connection alone, the convolution would share its weights equally between the two pooling results; yet in the highlighted information regions the two poolings do not necessarily contribute equally to the task of compensating for edge details. The convolution can therefore be seen as assigning different weights to the two pooling operations, so that the network learns edge details better;
s5, training the basic semantic segmentation network based on the attention of different levels of the image and the attention of the pixels obtained in the step S4 by adopting the original training data set obtained in the step S1, so as to obtain the semantic segmentation network based on the attention of different levels of the image and the attention of the pixels;
and S6, performing semantic segmentation on the city streets in real time by adopting the semantic segmentation network based on the attention of different levels and the attention of pixels of the image obtained in the step S5.
The process of the invention is further illustrated below with reference to specific examples:
The method of the invention was tested on the Cityscapes data set. The framework is PyTorch 1.4, and the evaluation index is the common semantic segmentation metric mIoU (mean Intersection over Union). The pixel-based attention module and the attention module based on different image levels were embedded into an FCN, and the dependency among pixels was calculated. The results of the two modules on the Cityscapes validation set are shown in Table 1.
TABLE 1 Impact of the two modules on network performance
To verify the performance of the attention modules, ablation experiments were carried out on the two modules. The mIoU of the Resnet baseline is 68.1%. Adding the pixel-based attention module on top of the Resnet baseline raises the mIoU to 73.8%, an improvement of 5.6%. The attention module based on different image levels aims to refine edges and details, so its improvement in segmentation performance is less pronounced: adding it on top of the Resnet baseline gives an mIoU of 69.3%, an improvement of 1.2%. An experiment was also performed with the plain atrous spatial pyramid pooling module without any improvement; its mIoU is 70.7%. The experimental results show that the pixel-based attention module is very helpful for scene segmentation. In view of computational cost, the backbone network finally used is Resnet-101 with a down-sampling rate of 8, and the results in Table 1 were computed with the official toolkit.
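The mIoU metric used in these experiments can be computed from a per-class confusion matrix, where IoU for class c is TP / (TP + FP + FN). A minimal NumPy sketch (illustrative; the official Cityscapes toolkit additionally handles ignore labels and void regions, which are omitted here):

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    # build the confusion matrix: rows = ground truth, columns = prediction
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(target.ravel(), pred.ravel()):
        cm[t, p] += 1
    tp = np.diag(cm).astype(np.float64)
    denom = cm.sum(axis=0) + cm.sum(axis=1) - tp  # TP + FP + FN per class
    iou = tp / np.maximum(denom, 1)               # guard division by zero
    return iou.mean()

pred   = np.array([[0, 0, 1], [1, 1, 2], [2, 2, 2]])
target = np.array([[0, 0, 1], [1, 1, 1], [2, 2, 2]])
print(round(mean_iou(pred, target, 3), 4))        # per-class IoU 1.0, 0.75, 0.75
```

Averaging over classes rather than pixels is what makes mIoU sensitive to small classes such as traffic signs, which is exactly the kind of detail the PAM module targets.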
The method of the invention was also compared with current advanced networks on the Cityscapes test set: segmentation maps were predicted by the network from the officially provided test images and evaluated by the official server, with the results shown in Table 2:
TABLE 2 Comparison of the method of the invention with various advanced networks
In Table 2, the proposed attention mechanism achieves an mIoU of 69.3%, significantly improving the performance of the earlier FCN. On the validation set of the data set, the ASPP AM module improves on the reference network by 5.6%; compared with currently popular networks, it improves on the original dilated FCN-16 by nearly 22%, outperforms DeepLab-v3 (which contains ASPP) by 5%, and improves on the recent bilateral attention network BiANet by 3%. The mIoU accuracy of the model reaches 69.3%. The two modules of the network emphasize the dependency among pixels and the details of the low-level space, so the method achieves better performance on the Cityscapes test set.
Finally, a visualization of the effect of the two proposed modules on network performance is shown in FIG. 6. In FIG. 6, from left to right are the original picture (a), the ground-truth label (b), the result of the baseline method (c), the result of the ASPP AM method (d), the result of the PAM method (e), the result of the ASPP method (f), and the result of the method of the invention (g). As can be seen from FIG. 6, the Resnet baseline misclassifies some regions, and some edge details are segmented incoherently; for example, the green belt is confused with the sidewalk, plants appear in the sky, and the vehicle is confused with the background. After the ASPP AM module is added, misclassification is reduced because the dependency information among pixels is strengthened. After the PAM module is added, selective details are attended to and the segmentation of fine objects such as traffic signs is improved, as can be compared visually in the segmentation maps.
Finally, the invention also provides an automatic driving method comprising the urban street semantic segmentation method: in specific implementation, urban streets are semantically segmented using the urban street semantic segmentation method described above, and automatic driving control is then performed according to the semantic segmentation results.
Claims (4)
1. A method for segmenting urban street semantics is characterized by comprising the following steps:
s1, acquiring an original training data set;
s2, constructing a basic semantic segmentation network; particularly, a Resnet101 network is used as a basic semantic segmentation network;
s3, adding a pixel-based attention module in the basic semantic segmentation network constructed in the step S2 to obtain a pixel-attention-based basic semantic segmentation network; specifically, at the output end of the fourth block of the Resnet101 network, a pixel-based attention module is connected in series; the pixel-based attention module specifically comprises the following steps:
A. The outer border of the feature map X4 of the fourth block of the Resnet101 network is padded with all 1s of width 1;
B. The padded feature map obtained in step A is operated on with the following formula, obtaining the preprocessed feature map X_pre:
X_pre = BN(f3x3(X'4))
where X'4 is the feature map of the fourth block of the Resnet101 network padded in step A; f3x3 is a standard convolution operation with a 3 x 3 convolution kernel, a sampling stride of 1, and a dilation rate d of 1; BN() is a batch normalization operation;
C. The preprocessed feature map X_pre obtained in step B is processed with the following formula, obtaining the pixel relation matrix X_wmap:
X_wmap = σ(R(X_pre) × R(X_pre)^T)
where R() is a reshape operation; × is matrix multiplication; σ is the sigmoid activation function;
D. The pixel relation matrix X_wmap obtained in step C is processed with the following formula, obtaining the depth feature map X_proc: X_proc = R(X_wmap × R(X_pre));
E. The outer border of the feature map X4 of the fourth block of the Resnet101 network is padded with all 0s of width 1;
F. The padded feature map obtained in step E is operated on with the following formulas, obtaining the feature maps F1 to F4 after convolution with different dilation rates:
F1 = f1x1(X''4), F2 = f3x3,d=12(X''4), F3 = f3x3,d=24(X''4), F4 = f3x3,d=36(X''4)
where X''4 is the feature map of the fourth block of the Resnet101 network padded in step E; f1x1 is a standard convolution operation with a 1 x 1 convolution kernel, a sampling stride of 1, and a dilation rate d of 1; f3x3,d=12, f3x3,d=24, and f3x3,d=36 are standard convolution operations with 3 x 3 convolution kernels, a sampling stride of 1, and dilation rates d of 12, 24, and 36 respectively;
G. The obtained feature maps X_proc, F1, F2, F3, and F4 are processed along the channel dimension with the following formula, completing the processing of the pixel-based attention module:
F_m = concat(X_proc, F1, F2, F3, F4)
where F_m is the feature map output after processing by the pixel-based attention module; concat() is a concatenation operation in the channel dimension;
s4, adding attention modules based on different image levels to the semantic segmentation network based on pixel attention obtained in the step S3, so as to obtain a basic semantic segmentation network based on the attention of different image levels and the attention of pixels;
s5, training the basic semantic segmentation network based on the attention of different levels of the image and the attention of the pixels obtained in the step S4 by adopting the original training data set obtained in the step S1, so as to obtain the semantic segmentation network based on the attention of different levels of the image and the attention of the pixels;
and S6, performing semantic segmentation on the city streets in real time by adopting the semantic segmentation network based on the attention of different levels and the attention of pixels of the image obtained in the step S5.
2. The urban street semantic segmentation method according to claim 1, wherein step S4 adds the attention module based on different image levels to the pixel-attention-based semantic segmentation network obtained in step S3; specifically, the attention module based on different image levels is connected in parallel at the output end of the second block of the Resnet101 network.
3. The method according to claim 2, wherein the processing of the attention module based on different image levels comprises the following steps:
a. padding the outer border of the feature map X2output by the second block of the Resnet101 network with zeros to a width of 3;
b. performing global average pooling on the padded feature map obtained in step a to obtain the corresponding result;
c. performing maximum pooling on the padded feature map obtained in step a to obtain the corresponding result;
d. performing aconcatoperation on the results obtained in step b and step c to obtain a first feature map;
e. processing the first feature map obtained in step d using the following formula to obtain the attention feature mapF N :
In the formula, the convolution is a standard convolution operation with a 7 x 7 kernel, a stride of 1 and a dilation rate d of 1; the product operation is the Hadamard product;
f. processing the attention feature mapF N obtained in step e using the following formula, thereby obtaining the final feature mapF pam output by the attention module based on different image levels:
In the formula,F m is the feature map output after processing by the pixel-based attention module.
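Steps a through f can be sketched in NumPy for a multi-channel input. Here the pooling is taken across channels (so the two pooled maps can be concatenated and convolved with a 7 x 7 kernel), and the attention map is squashed to (0, 1) with a sigmoid; both choices, and the per-map kernels, are assumptions not stated explicitly in the claims:

```python
import numpy as np

def valid_conv2d(x, k):
    """Valid (no-padding) 2-D correlation of a single-channel map with k."""
    kh, kw = k.shape
    h, w = x.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def image_level_attention(x2, f_m, k_avg, k_max):
    # a. zero-pad the spatial border of X2 to a width of 3
    xp = np.pad(x2, ((0, 0), (3, 3), (3, 3)))
    # b. / c. average pooling and max pooling (here across the channel axis)
    avg = xp.mean(axis=0)
    mx = xp.max(axis=0)
    # d. / e. concat the two pooled maps (as a 2-channel input) and apply a
    # 7x7 convolution; the width-3 padding plus a valid 7x7 conv restores
    # the original H x W size. The sigmoid squashing is an assumption.
    f_n = 1.0 / (1.0 + np.exp(-(valid_conv2d(avg, k_avg)
                                + valid_conv2d(mx, k_max))))
    # f. Hadamard product of the attention map F_N with the
    # pixel-attention output F_m gives F_pam
    return f_n * f_m

rng = np.random.default_rng(1)
x2 = rng.standard_normal((4, 12, 12))  # block-2 features, 4 channels
f_m = rng.standard_normal((12, 12))    # pixel-attention output (1 channel)
k = np.full((7, 7), 1 / 49.0)          # illustrative averaging kernel
f_pam = image_level_attention(x2, f_m, k, k)
```

The width-3 border padding in step a exists precisely so that the subsequent 7 x 7 convolution keeps the spatial size, letting F N multiply element-wise with F m.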
4. An automatic driving method, characterized by comprising the urban street semantic segmentation method as claimed in any one of claims 1 to 3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110670967.0A CN113255574B (en) | 2021-06-17 | 2021-06-17 | Urban street semantic segmentation method and automatic driving method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110670967.0A CN113255574B (en) | 2021-06-17 | 2021-06-17 | Urban street semantic segmentation method and automatic driving method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113255574A CN113255574A (en) | 2021-08-13 |
CN113255574B true CN113255574B (en) | 2021-09-14 |
Family
ID=77188423
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110670967.0A Active CN113255574B (en) | 2021-06-17 | 2021-06-17 | Urban street semantic segmentation method and automatic driving method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113255574B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115035299B (en) * | 2022-06-20 | 2023-06-13 | 河南大学 | Improved city street image segmentation method based on deep learning |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109784386A (en) * | 2018-12-29 | 2019-05-21 | 天津大学 | A method of it is detected with semantic segmentation helpers |
CN110163875A (en) * | 2019-05-23 | 2019-08-23 | 南京信息工程大学 | One kind paying attention to pyramidal semi-supervised video object dividing method based on modulating network and feature |
CN111914935A (en) * | 2020-08-03 | 2020-11-10 | 哈尔滨工程大学 | Ship image target detection method based on deep learning |
CN112418027A (en) * | 2020-11-11 | 2021-02-26 | 青岛科技大学 | Remote sensing image road extraction method for improving U-Net network |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10929715B2 (en) * | 2018-12-31 | 2021-02-23 | Robert Bosch Gmbh | Semantic segmentation using driver attention information |
- 2021-06-17 | CN | CN202110670967.0A | patent CN113255574B/en | Active |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109784386A (en) * | 2018-12-29 | 2019-05-21 | 天津大学 | A method of it is detected with semantic segmentation helpers |
CN110163875A (en) * | 2019-05-23 | 2019-08-23 | 南京信息工程大学 | One kind paying attention to pyramidal semi-supervised video object dividing method based on modulating network and feature |
CN111914935A (en) * | 2020-08-03 | 2020-11-10 | 哈尔滨工程大学 | Ship image target detection method based on deep learning |
CN112418027A (en) * | 2020-11-11 | 2021-02-26 | 青岛科技大学 | Remote sensing image road extraction method for improving U-Net network |
Non-Patent Citations (2)
Title |
---|
Multi-scale fusion semantic segmentation of aerial images based on attention mechanism; Zheng Guping et al.; Journal of Graphics; 2018-12-31; abstract on p. 1069, sections 1.2-1.3 on pp. 1071-1073 * |
Research on semantic segmentation algorithms for road images based on deep learning; Zhang Xuetao; China Master's Theses Full-text Database, Information Science and Technology; 2019-09-15; Chapter 3, pp. 19-26 of the specification * |
Also Published As
Publication number | Publication date |
---|---|
CN113255574A (en) | 2021-08-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111898439B (en) | Deep learning-based traffic scene joint target detection and semantic segmentation method | |
CN111563909B (en) | Semantic segmentation method for complex street view image | |
CN113642390B (en) | Street view image semantic segmentation method based on local attention network | |
CN113688836A (en) | Real-time road image semantic segmentation method and system based on deep learning | |
CN112489054A (en) | Remote sensing image semantic segmentation method based on deep learning | |
CN113256649B (en) | Remote sensing image station selection and line selection semantic segmentation method based on deep learning | |
CN113034506B (en) | Remote sensing image semantic segmentation method and device, computer equipment and storage medium | |
CN114693924A (en) | Road scene semantic segmentation method based on multi-model fusion | |
CN113486886B (en) | License plate recognition method and device in natural scene | |
CN111882620A (en) | Road drivable area segmentation method based on multi-scale information | |
CN113255574B (en) | Urban street semantic segmentation method and automatic driving method | |
CN113298817A (en) | High-accuracy semantic segmentation method for remote sensing image | |
CN116051977A (en) | Multi-branch fusion-based lightweight foggy weather street view semantic segmentation algorithm | |
CN116363358A (en) | Road scene image real-time semantic segmentation method based on improved U-Net | |
CN114973199A (en) | Rail transit train obstacle detection method based on convolutional neural network | |
CN116704194A (en) | Street view image segmentation algorithm based on BiSeNet network and attention mechanism | |
CN116596966A (en) | Segmentation and tracking method based on attention and feature fusion | |
CN111612803A (en) | Vehicle image semantic segmentation method based on image definition | |
CN115019039B (en) | Instance segmentation method and system combining self-supervision and global information enhancement | |
CN116778318A (en) | Convolutional neural network remote sensing image road extraction model and method | |
CN112634289B (en) | Rapid feasible domain segmentation method based on asymmetric void convolution | |
CN113223006B (en) | Lightweight target semantic segmentation method based on deep learning | |
CN115171092B (en) | End-to-end license plate detection method based on semantic enhancement | |
CN116704196B (en) | Method for training image semantic segmentation model | |
CN114529878B (en) | Cross-domain road scene semantic segmentation method based on semantic perception |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
EE01 | Entry into force of recordation of patent licensing contract | ||
Application publication date: 2021-08-13; Assignee: Hunan Yimo Information Technology Co.,Ltd.; Assignor: HUNAN NORMAL University; Contract record no.: X2023980033719; Denomination of invention: Semantic Segmentation Method and Automatic Driving Method for Urban Streets; Granted publication date: 2021-09-14; License type: Common License; Record date: 2023-03-17 |