CN116630702A - Pavement adhesion coefficient prediction method based on semantic segmentation network - Google Patents
Pavement adhesion coefficient prediction method based on semantic segmentation network
- Publication number: CN116630702A (application CN202310580153.7A)
- Authority: CN (China)
- Prior art keywords: network, road surface, image, training, feature
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V10/764 — Image or video recognition or understanding using machine-learning classification, e.g. of video objects
- G06V10/765 — Classification using rules for classification or partitioning of the feature space
- G06N3/045 — Neural network architectures; combinations of networks
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
- G06N3/08 — Learning methods
- G06V10/26 — Segmentation of patterns in the image field; detection of occlusion
- G06V10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
- G06V10/82 — Image or video recognition or understanding using neural networks
- G06V20/41 — Higher-level, semantic clustering, classification or understanding of video scenes
- G06V20/46 — Extracting features or characteristics from video content, e.g. representative shots or key frames
- Y02T10/40 — Engine management systems (climate change mitigation technologies related to transportation)
Abstract
The invention discloses a pavement adhesion coefficient prediction method based on a semantic segmentation network. The method first builds a semantic segmentation network based on a multi-scale spatial attention mechanism; it then pre-trains the segmentation network on a public data set, enriches the semantic segmentation data set, and performs task-specific training of the segmentation network. The segmentation network is then used to extract the road surface region and produce a data set for a road surface classification network; a road surface type classification network is constructed and trained; finally, a mapping rule is established to obtain road adhesion coefficient information. The method strengthens the algorithm's generalization to rain and snow driving scenes and improves the accuracy, real-time performance, and robustness of driving road surface extraction. By combining a lightweight road surface recognition network with the segmentation network in a serial structure of semantic segmentation, road surface extraction, and road surface recognition, fast and accurate prediction of road surface adhesion information can be achieved across rich driving scenes.
Description
Technical Field
The invention belongs to the technical field of intelligent automobiles, relates to an adhesion coefficient prediction method based on computer vision, and more particularly relates to a pavement adhesion coefficient prediction method based on a semantic segmentation network.
Background
With the growth of vehicle ownership, the convenience of automobiles has been accompanied by traffic safety problems, and traffic accidents have become one of the main causes of casualties at home and abroad. Under low-adhesion conditions in particular, vehicle instabilities such as sideslip, drifting, and collision occur more easily, and the resulting loss of life and property is more serious. Acquiring accurate road adhesion coefficient information in advance therefore provides a reference for drivers and helps improve driving safety. At the same time, advanced automotive active safety systems need an accurate road adhesion coefficient as support: accurately obtaining the coefficient expands the range of operating conditions the active safety system can cover, and lets the system adjust its control strategy in time by perceiving changes in the road state ahead in advance. Accurate identification and acquisition of the road surface type and adhesion coefficient of the driving road are therefore key to improving driving safety and comfort.
At present, methods for obtaining the road adhesion coefficient mainly comprise estimation based on dynamic response and image-based prediction using neural networks. The accuracy of a dynamics-model-based estimator is limited by the fidelity of the vehicle and tire models, and the adhesion coefficient estimate is difficult to obtain in advance, so a certain lag exists. The rapid development of neural networks and the upgrading of intelligent-vehicle hardware such as on-board cameras have made vision-based road adhesion coefficient identification more reliable. A vision-based prediction method mitigates the lag of the dynamics estimator, can perceive the road surface condition ahead in advance, and improves the response to dangerous operating conditions. In addition, embedding a semantic segmentation network in the visual prediction algorithm lets the algorithm concentrate on road information, eliminates interference from redundant information, improves road surface recognition accuracy, and yields more accurate road surface adhesion coefficient predictions.
Disclosure of Invention
Aiming at the problems in the prior art, and in order to improve the accuracy, real-time performance, and robustness of pavement adhesion coefficient prediction, the invention provides a pavement adhesion coefficient prediction method based on a semantic segmentation network. The method builds an attention-based semantic segmentation network to extract the road surface region of the road ahead, improves the accuracy and robustness of the network as far as possible through pre-training and task-specific training, then builds and trains a road surface classification network based on channel attention, and combines the classification result with a mapping rule to obtain the predicted road surface adhesion coefficient.
the invention is realized by adopting the following technical scheme:
a pavement adhesion coefficient prediction method based on a semantic segmentation network predicts the road type classification and pavement adhesion coefficient of a vehicle driving area under urban working conditions, and specifically comprises the following steps:
step one, building a semantic segmentation network based on a multiscale space attention mechanism:
the vehicle-mounted camera equipped with the intelligent driving automobile sensing system is used for collecting video data of a road ahead in the running process of the automobile, and the exercisable road surface area is extracted through the semantic segmentation network:
Firstly, the network environment is configured: a Linux operating system is selected as the environment for image processing and for network construction and training, program code is written in Python, and PyTorch, developed by the Meta AI team, is selected as the deep learning framework; the environment manager Anaconda is used to create a virtual environment, in which Python 3.10.8, PyTorch 1.11.0, and OpenCV 4.6.0.66 are installed;
secondly, the semantic segmentation network is constructed on a lightweight Encoder-Decoder architecture, which preserves segmentation accuracy while meeting the real-time requirement of the algorithm;
then, under this Encoder-Decoder framework, a hierarchical attention-based encoder structure is built to generate rich multi-scale semantic features: the input image is first scaled to a 1024×1024×3 tensor and fed into the semantic segmentation network; a 3×3 convolutional layer, a BN layer, and GeLU activation then perform preliminary feature extraction and downsampling of the input picture;
the resulting feature map is then used as the input of the feature extraction module, which is implemented as follows: a 1×1 convolution first adjusts the number of input channels; after a GeLU activation, the features enter a spatial attention module composed of a 5×5 main-branch convolution followed by 7×7 and 13×13 branch convolutions, and a 1×1 convolution outputs a weight parameter representing spatial attention; the multi-scale spatial attention mechanism is given by formula (1):

Atten = Conv_1×1(Atten_0 + Σ_i Atten_i)  (1)

where Atten is the output weight parameter representing spatial attention, Atten_0 is the weight output of the main branch of the spatial attention module, Atten_i is the weight output of the i-th branch convolution, and i indexes the branch convolutions;
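The branch structure above can be sketched in PyTorch. This is a hedged illustration, not the patented implementation: the use of depthwise convolutions, the exact channel counts, and the final multiplication of the attention weights with the input are assumptions not fixed by the text.

```python
import torch
import torch.nn as nn

class MultiScaleSpatialAttention(nn.Module):
    """Sketch of a 5x5 main branch plus 7x7 / 13x13 branch convolutions,
    summed as in formula (1) and projected by a 1x1 convolution."""
    def __init__(self, channels: int):
        super().__init__()
        # Atten_0: main 5x5 branch (depthwise, an assumption for lightness)
        self.main = nn.Conv2d(channels, channels, 5, padding=2, groups=channels)
        # Atten_i: larger-kernel branch convolutions
        self.branch7 = nn.Conv2d(channels, channels, 7, padding=3, groups=channels)
        self.branch13 = nn.Conv2d(channels, channels, 13, padding=6, groups=channels)
        self.proj = nn.Conv2d(channels, channels, 1)  # 1x1 output convolution

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a0 = self.main(x)
        atten = self.proj(a0 + self.branch7(a0) + self.branch13(a0))
        return atten * x  # spatially re-weight the input features

feat = torch.randn(1, 32, 64, 64)
out = MultiScaleSpatialAttention(32)(feat)  # same shape as the input
```

Because every convolution preserves spatial size, the module can be dropped into the encoder without changing feature-map dimensions.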
the number of channels of the feature map is then expanded by a 1×1 convolution, a 3×3 convolution performs finer feature extraction on the spatial-attention-weighted feature map, the result is activated by a GeLU function, and a final 1×1 convolution outputs the result of the feature extraction module, as in formula (2):

Output = Conv2D(Gelu(Conv2D(Conv2D(F))))  (2)

where Output is the output of the bottleneck module, F is the input of the bottleneck module, Conv2D(·) denotes a convolution applied to the feature map, and Gelu(·) denotes further processing of the feature map with the activation function;
the constructed encoder bottleneck modules perform four serial stages of feature extraction on the network input tensor, producing four feature maps that represent different semantic levels: stage one uses three serial feature extraction modules and outputs a 256×256×32 feature map; a downsampling layer adjusts the number of convolution layers, in-layer convolutions, and channels, after which three serial feature extraction modules produce the 128×128×64 stage-two feature map; a further downsampling layer and five serial feature processing modules produce the 64×64×160 stage-three feature map; finally, one more downsampling step and two feature extraction modules produce the 32×32×256 stage-four feature map;
Finally, a lightweight decoder is built on the Encoder-Decoder framework: the feature maps of the different encoder stages are processed in turn by upsampling, channel concatenation, a lightweight ham_head decoder module, and a fully connected classifier, fusing semantic information from different levels. To collect multi-scale semantic information and enlarge the receptive field, stage one, which carries mostly low-level semantics, is discarded; the feature maps of stages two, three, and four are brought to the same size by bilinear-interpolation upsampling and concatenated along the channel dimension (concat), and the resulting 128×128×480 feature map is fed into the ham_head decoder module. The ham_head decoder module works as follows: a 1×1 convolution layer, a 32-group GroupNorm layer, and a ReLU activation compress the 480 channels of the concatenated feature map to 256, and the non-negative matrix factorization (NMF) shown in formula (3) is applied:
V_{m×n} = P_{m×r} × Q_{r×n}  (3)

the feature matrix V_{m×n} is decomposed into the low-rank matrices P_{m×r} and Q_{r×n}, where P_{m×r} is the feature basis matrix that reflects the main characteristics of the data and Q_{r×n} is the feature coefficient matrix that describes the distribution of the data features; replacing the original feature matrix with this factorization avoids interference from redundant matrices, accelerates image processing, and improves the real-time performance of the algorithm; finally, a linearization layer formed by a 1×1 convolution layer, a 32-group GroupNorm layer, and a ReLU activation outputs the processing result, giving the output of the ham_head decoder module;
the processed feature map then undergoes a fully connected operation through a 1×1 convolution layer, and a softmax function converts the category scores into a probability distribution, yielding a 256×256×nls feature map in which the value of each pixel represents the probability of belonging to each semantic category, nls being the number of semantic categories to classify; the feature map is then upsampled by bilinear interpolation, and an ArgMax function assigns each pixel the semantic category with the maximum probability, giving a 1024×1024 predicted segmentation map of the same size as the input picture, i.e. the output of the decoder;
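The softmax-then-ArgMax step can be illustrated with a tiny NumPy example (the 4×4 spatial size is hypothetical, chosen only to keep the sketch small):

```python
import numpy as np

rng = np.random.default_rng(0)
nls, h, w = 19, 4, 4                          # 19 classes, tiny spatial grid
logits = rng.normal(size=(nls, h, w))         # per-pixel class scores from the 1x1 conv

# numerically stable softmax over the class dimension
exp = np.exp(logits - logits.max(axis=0, keepdims=True))
probs = exp / exp.sum(axis=0, keepdims=True)  # probability distribution per pixel

pred = probs.argmax(axis=0)                   # ArgMax -> per-pixel semantic class map
```

Each spatial location of `probs` sums to 1, and `pred` holds one class index per pixel, which is what the bilinear upsampling then scales back to the input resolution.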
step two, pre-training the built segmentation network on a public data set:
the method comprises the steps of selecting a Cityscapes data set to conduct pre-training of a semantic segmentation network, wherein the data set is pushed by a Benz company to issue street view image segmentation tasks which can be used for city street view driving scene images of 50 cities, and the street view driving scene images have accurate labels covering 19 semantic information of road surfaces, buildings, pedestrians, vehicles, buildings and the like; meanwhile, the image sample is similar to the driving form of the daily urban road condition, so that the pre-trained semantic segmentation network has certain generalization capability; the number of the accurate annotation image samples participating in the training and verification of the network model is 5000, wherein 2975 training set pictures are 500 verification set pictures, and 1525 test set pictures are measured; the specific pre-training process is as follows: firstly, loading images from a data set catalog, then loading corresponding annotation information, and then carrying out data augmentation on the trained images and annotations, wherein the augmentation mode comprises random scaling, random clipping of picture tensors and random left-right overturn, then normalizing each pixel point of an input image through a normalization function, and finally designing a cross entropy loss function shown as a formula (4):
L = −Σ_{c=1}^{nls} y_c log(P_c)  (4)

where nls is the number of semantic categories to classify (nls = 19 when training on the Cityscapes data set), y_c is a vector taking the value 0 or 1 whose elements indicate whether category c matches the sample's true category, and P_c is the predicted probability that the sample belongs to category c, c ∈ (1, nls);
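A worked NumPy check of the cross-entropy loss in formula (4): with a one-hot indicator y_c, only the probability assigned to the true class contributes to the sum.

```python
import numpy as np

def cross_entropy(probs, true_class):
    """Formula (4): L = -sum_c y_c * log(P_c) with one-hot y."""
    y = np.zeros_like(probs)
    y[true_class] = 1.0                       # y_c is 1 only for the true class
    return -np.sum(y * np.log(probs))

nls = 19                                      # Cityscapes class count
uniform = np.full(nls, 1.0 / nls)             # maximally uncertain prediction
loss = cross_entropy(uniform, true_class=3)   # equals log(19), about 2.944
```

A uniform prediction gives loss log(nls), the worst "honest" value; a confident correct prediction drives the loss toward zero, which is what the optimizer exploits.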
A Poly learning-rate rule is selected to train the segmentation network; the Poly learning-rate decay is expressed as:

LR(iter) = LR_initial × (1 − iter / max_iter)^power  (5)

where LR_initial is the initial learning rate of network training, set to 0.002 during pre-training; iter is the current iteration step of network training; max_iter is the maximum number of training steps, set to 40K; and power is the decay coefficient controlling the shape of the learning-rate curve, set to 0.9. LR(iter) is the learning rate computed and updated for each specific step of the training process; an Adam optimizer dynamically adjusts the learning rate of each parameter using first- and second-moment estimates of the gradient. The batch size is set to 32 according to the hardware performance of the computer, model parameters are saved every 4K steps, and the network is evaluated on the validation set at the same time;
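The Poly decay of formula (5) with the stated hyperparameters (initial LR 0.002, 40K maximum steps, power 0.9) is a one-liner:

```python
def poly_lr(step, lr_initial=0.002, max_iter=40_000, power=0.9):
    """Formula (5): LR(iter) = LR_initial * (1 - iter/max_iter) ** power."""
    return lr_initial * (1.0 - step / max_iter) ** power

# the learning rate decays monotonically from 0.002 to 0 over 40K steps
schedule = [poly_lr(s) for s in (0, 10_000, 20_000, 40_000)]
```

With power < 1 the curve decays slowly early in training and faster near max_iter, which keeps updates large while the network is far from convergence.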
For the semantic segmentation task, the average pixel accuracy Acc and the mean intersection-over-union MIoU, both based on the confusion matrix, are selected to evaluate the pre-training result. For the two-class confusion matrix shown in Table 1, each row represents the true class of the data, each column represents the predicted class, and each element of the matrix is the number of samples predicted as a given class;
Table 1: schematic table of two-class Confusion Matrix fusion Matrix
The accuracy Acc is the percentage of correctly predicted pixels in the total number of pixels, as in formula (6):

Acc = (TP + TN) / (TP + TN + FP + FN)  (6)
MIoU is the ratio of intersection to union between each class's predicted and true pixels, summed over classes and averaged, as in formula (7):

MIoU = (1/nls) Σ_c TP_c / (TP_c + FP_c + FN_c)  (7)
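Both metrics fall directly out of the confusion matrix. A NumPy sketch with a small hypothetical two-class matrix (rows = true class, columns = predicted class, matching Table 1's convention):

```python
import numpy as np

def pixel_accuracy(cm):
    """Formula (6): correctly classified pixels / total pixels."""
    return np.trace(cm) / cm.sum()

def mean_iou(cm):
    """Formula (7): per-class TP / (TP + FP + FN), averaged over classes."""
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp                  # predicted as class c but wrong
    fn = cm.sum(axis=1) - tp                  # true class c but missed
    return np.mean(tp / (tp + fp + fn))

cm = np.array([[50, 10],
               [5, 35]], dtype=float)         # hypothetical two-class counts
acc = pixel_accuracy(cm)                      # 85/100 = 0.85
miou = mean_iou(cm)                           # (50/65 + 35/50) / 2
```

Unlike Acc, MIoU penalizes false positives per class, so it remains informative when classes (such as road pixels versus background) are imbalanced.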
training data are stored in a log file using the logging tool in PyTorch, and the training-parameter curves are analyzed; the built semantic segmentation network shows excellent performance in urban driving scenes in good weather;
Step three, enriching the semantic segmentation data set and performing specific training on the segmentation network:
To address the segmentation network's susceptibility to interference and distortion under rain and snow conditions, a semantic segmentation data set is produced that contains six road surface types: dry asphalt, wet asphalt, dry cement, wet cement, snow-covered road surface, and ice-covered road surface. The pre-trained semantic segmentation network is further trained and evaluated on this data set, which improves the generalization capability of the built network and makes it better suited to daily urban driving scenes. The specific procedure is as follows:
firstly, driving scene data under different weather conditions are collected with the vehicle-mounted camera equipment, and the captured video files are split at a fixed frame rate to obtain an image sample library covering the six road surface categories. Image samples with rich variation and distinct features are then screened from the library for fine labelling, as follows: the EISeg interactive annotation tool is installed in Anaconda and launched; the driving scene pictures to be annotated are imported and uploaded to the EISeg platform; an annotation task is created, the semantic segmentation task type is selected, and the Cityscapes annotation format is chosen; the annotation function is then selected, and the imported pictures are segmented and labelled in the interactive annotation interface using the annotation tools; after labelling, the export function is used, the Cityscapes export format is selected, and an output path is designated. The resulting data set is organized in the Cityscapes format, with 100 accurately labelled pictures prepared per road surface type, 600 samples in total; 20% are randomly extracted as the validation set and the remaining 80% serve as the training set. The data set is expanded by mirroring, translation, and brightness adjustment; the training strategy and optimization method of pre-training are retained, and the trained network is verified on the validation set. Finally, model inference is performed with the trained semantic segmentation network: an input driving scene picture is processed and a visualized pixel-level semantic segmentation result is output;
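The 80/20 train/validation split described above can be sketched in a few lines (the filenames below are hypothetical placeholders, not from the patent):

```python
import random

# 600 labelled pictures: 100 per road surface type (filenames are illustrative)
samples = [f"road_{i:03d}.png" for i in range(600)]

random.seed(0)                                # fixed seed for a reproducible split
random.shuffle(samples)
n_val = int(0.2 * len(samples))               # 20% -> validation set
val_set, train_set = samples[:n_val], samples[n_val:]
```

Shuffling before slicing ensures every road surface type is represented in both splits in expectation; stratifying per class, as the text's per-type extraction implies, would guarantee it exactly.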
Step four, extracting a pavement area by using the segmentation network to manufacture a pavement classification network data set:
the image sample library containing the six road surface types obtained in step three is fed, by road surface type, into the semantic segmentation network for prediction, a process similar to online identification in a daily driving environment; the prediction result and the original image are then sent to a processor to extract the road surface region. The extraction works as follows: the semantic segmentation prediction map is first split by color channel to obtain the two-dimensional matrices of the three color channels; the BGR value corresponding to 'road', (128, 64, 128), is looked up; masks for the three BGR color channels are produced with OpenCV, keeping only the pixels whose BGR values equal those of 'road', which ensures that only the road surface region is extracted; finally, the three channel masks are combined and a matrix dot product is taken with the corresponding channels of the original image, yielding a picture containing only the road surface region, with the non-road region set to black as background. The extracted road-surface-only image samples are stored in separate folders according to the six road surface types; the correspondence between specific road surface types and their 5-bit binary label information is shown in Table 2:
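A minimal NumPy sketch of the road-region extraction: keep the pixels whose predicted BGR value equals the 'road' colour (128, 64, 128) and black out everything else. OpenCV's per-channel split/merge described in the text achieves the same result; NumPy is used here so the sketch is self-contained.

```python
import numpy as np

ROAD_BGR = (128, 64, 128)                     # Cityscapes 'road' colour (BGR)

def extract_road(image_bgr, pred_bgr):
    """Return the original image with non-road pixels set to black."""
    mask = np.all(pred_bgr == np.array(ROAD_BGR), axis=-1)  # H x W boolean
    out = np.zeros_like(image_bgr)            # non-road -> black background
    out[mask] = image_bgr[mask]               # keep original pixels on the road
    return out

# tiny 1x2 example: left pixel is road, right pixel is not
pred = np.array([[[128, 64, 128], [0, 0, 0]]], dtype=np.uint8)
img = np.array([[[10, 20, 30], [40, 50, 60]]], dtype=np.uint8)
road_only = extract_road(img, pred)
```

Requiring all three channels to match, as `np.all(..., axis=-1)` does, is equivalent to intersecting the three per-channel masks the text describes.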
Table 2: pavement type and label information correspondence table
The produced data set is screened by image quality; the final data set contains 6,000 image samples, 1,000 per road surface category. Finally, the road surface image data set is shuffled, 20% of the pictures of each category are randomly extracted as the validation set, and the remainder serves as the training set;
step five, constructing and training a road surface image type classification network:
The road surface image type classification network is constructed as follows: the network is built in the Anaconda environment; considering the complexity of the road surface classification task and its real-time requirement, a lightweight convolutional neural network is constructed as the backbone of the classifier and a channel attention mechanism is introduced, reducing network parameters and floating-point operations as much as possible while preserving recognition accuracy and improving the real-time performance of the classification network. The specific structure of the road surface image classification network is shown in Table 3:
table 3: road surface classification network structure table
Firstly, the input layer of the network adjusts the road surface image to be identified into a tensor of 288 × 288 × 3; a convolution layer of 3 × 3 kernels with stride 2, with batch normalization BN and Swish activation, then performs preliminary feature extraction, and the resulting feature map is taken as the input of the constructed bottleneck module, which performs feature extraction on it; the specific process of building the network bottleneck module based on the channel attention mechanism is as follows: firstly, a 1 × 1 convolution changes the feature channel dimension according to the expansion ratio, followed by batch normalization BN and Swish activation; the feature map is input into a 3 × 3 depthwise separable convolution DWConv layer for feature extraction, and the obtained feature map enters the channel attention SE module, which applies global average pooling to the input feature map and Swish activation to aggregate its global features, applies a full-connection operation to the obtained global features through 1 × 1 convolution layers with a certain activation rate, and obtains the attention weights of the different channels by a sigmoid activation function; the obtained weights are dot-multiplied with the input feature map, retaining the main features of the input, removing noise interference, reducing parameter computation, improving the real-time performance of the network, and achieving the purpose of attending to the more important information channels; finally, the output of the SE module passes through a 1 × 1 convolution to give the output of the whole bottleneck module;
As shown in Table 3, the feature map extracted by the stride-2, 3 × 3 convolution layer is input into two serial bottleneck modules with expansion rate 1 for feature extraction, giving a 144 × 144 × 24 feature map; the first three layers of the bottleneck-module feature extraction network are designed so that each pass through a bottleneck layer halves the size of the output feature map while doubling the number of output channels, giving a 36 × 36 × 64 feature map after the first three layers; the last three feature extraction layers adopt channel attention with activation proportions of 0.35, 0.30 and 0.25 in turn, attending more to the content-rich feature channels, and finally produce a 9 × 9 × 256 feature map group containing high-level semantic information; its channel dimension is raised by a 1 × 1 convolution layer, and the raised 9 × 9 × 1280 feature map group is pooled by a 9 × 9 global average pooling layer to give a 1 × 1 × 1280 feature sequence; finally, through the fully connected layer together with the Softmax function, the probability values of the road surface categories are output, and the network classification result is determined from the maximum probability value by the Argmax function;
The built road surface image type classification network is trained on the self-built data set, adopting a cross entropy loss function and the Adam adaptive gradient descent optimization algorithm with a base learning rate of 0.0001; the model and training results are saved according to the number of iterations (epochs);
step six, establishing a mapping rule to obtain road adhesion coefficient information:
finally, the corresponding road surface adhesion coefficient is obtained from the road surface image type classification result of the preceding steps; to obtain the adhesion coefficient of the identified road surface, the reference value tables of automobile longitudinal sliding adhesion coefficients (including the table for ice and snow road surfaces) in GA/T 643-2006 'Technical identification of vehicle running speed in typical traffic accident forms' are consulted, and, considering the influence of running speed on the adhesion coefficient, a mapping rule between road surface type and road surface adhesion coefficient is defined as shown in Table 4 for the low-speed urban condition in which the vehicle speed is around 48 km/h; according to the vehicle speed and the road surface type identification result, the adhesion coefficient range of the current road surface is obtained from the table, and the midpoint of the upper and lower limits of that range is taken as the adhesion coefficient of the current road, i.e. the output of the whole road surface adhesion coefficient prediction algorithm;
Table 4: road surface type and attachment coefficient mapping table.
Compared with the prior art, the invention has the beneficial effects that:
the invention discloses a pavement adhesion coefficient prediction method based on semantic segmentation, which can provide key pavement adhesion coefficient information for an active safety control system and an auxiliary driving system of an automobile by processing road images of a front driving pavement to obtain a prediction result of the adhesion coefficient of the pavement in advance; according to the invention, a lightweight semantic segmentation network based on a multiscale space attention mechanism is built, and a self-built data set is used for specific training, so that the generalization capability of an algorithm on a rain and snow driving scene is enhanced, and the accuracy, the instantaneity and the robustness of driving road surface extraction are further improved; meanwhile, a semantic segmentation network, a pavement extraction and a pavement identification network serial algorithm structure are designed by combining a lightweight pavement identification network, so that rapid and accurate prediction of pavement attachment information in rich driving scenes can be realized;
drawings
Fig. 1 is a flow chart of a pavement adhesion coefficient prediction method based on a semantic segmentation network.
Fig. 2 is a schematic diagram of the semantic segmentation network constructed in the first step.
Fig. 3 is a schematic diagram of a bottleneck module structure for extracting features of the semantic segmentation network constructed in the first step of the method.
Fig. 4 is a schematic diagram of a bottleneck module structure for extracting features of the road surface classification network constructed in the fifth step of the method.
Detailed Description
The invention provides a pavement adhesion coefficient prediction method based on a semantic segmentation network, which predicts the road type classification and pavement adhesion coefficient of a vehicle driving area under urban working conditions, and specifically comprises the following steps:
step one, building a semantic segmentation network based on a multiscale space attention mechanism:
the vehicle-mounted camera of the intelligent driving automobile sensing system collects video data of the road ahead while the vehicle is running, and the drivable road surface area is extracted by the semantic segmentation network; since the performance of the semantic segmentation network largely determines the overall performance of the road surface adhesion coefficient prediction algorithm, constructing a semantic segmentation network with high precision, good real-time performance and strong robustness is critical, and it is specifically designed as follows:
firstly, configuring the network building environment: the Linux operating system is selected as the operating environment for image processing and network construction and training, with the following hardware configuration: an Intel(R) Core(TM) i9-9900K CPU, 16 GB of RAM, an NVIDIA RTX A6000 graphics card with CUDA 11.4, and storage consisting of a 256 GB solid state drive plus a 2 TB mechanical hard disk; the program code is written in the Python language, and PyTorch, developed by the Meta AI team, is selected as the deep learning framework; the software configuration on this hardware is as follows: a virtual environment is created with the environment manager Anaconda, and Python 3.10.8, PyTorch 1.11.0 and OpenCV 4.6.0.66 are installed in the new environment;
Secondly, the semantic segmentation network is constructed using an advanced lightweight Encoder-Decoder framework, meeting the real-time requirement of the algorithm while guaranteeing the precision of the semantic segmentation network; the specific network structure is shown in figure 2;
then, the hierarchical attention-based Encoder of the semantic segmentation network is built under the Encoder-Decoder framework to generate rich multi-scale semantic features: firstly, the input image is scaled to a tensor of size 1024 × 1024 × 3 and fed into the semantic segmentation network; a 3 × 3 convolution layer, a BN layer and GeLU activation then perform preliminary feature extraction and downsampling of the input picture;
the resulting feature map is then taken as the input of the feature extraction module, whose specific implementation is as follows: firstly, a 1 × 1 convolution adjusts the number of input channels; after a GeLU activation function, the result is fed into a spatial attention module formed by a 5 × 5 main branch convolution and subsequent 7 × 7 and 13 × 13 branch convolutions, and a weight parameter representing the spatial attention is output through a 1 × 1 convolution; the multi-scale spatial attention mechanism is given by the following formula:
Atten = Conv_{1×1}(Atten_0 + Σ_i Atten_i) (1)
where Atten in formula (1) is the output weight parameter representing the spatial attention, Atten_0 is the weight output of the main branch of the spatial attention module, Atten_i is the weight output of the i-th branch convolution in the module, with i running over the branch convolutions;
the number of channels of the feature map is then expanded by a 1 × 1 convolution, a 3 × 3 convolution performs finer feature extraction on the spatially attention-weighted feature map, the extraction result is activated by a GeLU function, and the processing result of the feature extraction module is finally output through a 1 × 1 convolution;
Output = Conv2D_{1×1}(Gelu(Conv2D_{3×3}(Conv2D_{1×1}(F)))) (2)
wherein Output is the output of the bottleneck module, F is the input of the bottleneck module, Conv2D(·) denotes a convolution processing the input feature map, and Gelu(·) denotes further processing of the feature map with the activation function;
the constructed encoder bottleneck module performs four serial stages of feature extraction on the network input tensor, yielding four stages of feature maps representing different semantic levels; the first stage uses three serial feature extraction modules and outputs a 256 × 256 × 32 feature map; the downsampling layer architecture adjusts the number of convolution layers, in-layer convolutions and channels, and after downsampling, three serial feature extraction modules produce the second-stage feature map of size 128 × 128 × 64; a downsampling layer and five serial feature processing modules then produce the 64 × 64 × 160 stage-three feature map; finally, one more downsampling and two feature extraction modules produce the 32 × 32 × 256 fourth-stage feature map;
Finally, a lightweight Decoder is built under the Encoder-Decoder framework: the feature maps of the different stages extracted by the encoder are processed in turn by upsampling, channel splicing, a lightweight ham_head decoder module and a fully connected classifier, fusing semantic information of different levels; to collect multi-scale semantic information and enlarge the receptive field, stage one, which carries mostly low-level semantic information, is discarded; the feature maps of stages two, three and four are brought to the same size by bilinear-interpolation upsampling, channel-spliced by a concat operation, and input into the ham_head decoder module as a fused feature map of size 128 × 128 × 480; the specific implementation of the ham_head decoder module is as follows: the 480 channels of the feature map obtained by the concat operation are compressed to 256 by a 1 × 1 convolution layer, a 32-group GroupNorm layer and a ReLU activation layer, and the Non-negative Matrix Factorization (NMF) algorithm of formula (3) is applied:
V_{m×n} = P_{m×r} × Q_{r×n} (3)
the feature matrix V_{m×n} is decomposed into the low-rank matrices P_{m×r} and Q_{r×n}, where P_{m×r} is the feature basis matrix reflecting the main characteristics of the data, and Q_{r×n} is the feature coefficient matrix representing the distribution of the data features; the algorithm replaces the original feature matrix with this low-rank approximation, mapping high-dimensional data to low dimensions while preserving its important features, so that the interference of redundant components is discarded, the generalization ability of the model is enhanced, the image processing is accelerated, and the real-time performance of the algorithm is improved; finally, the processing result is output through a linearization layer formed by a 1 × 1 convolution layer, a 32-group GroupNorm layer and a ReLU activation layer, giving the output of the ham_head decoder module;
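The NMF decomposition of formula (3) can be sketched with the classical multiplicative update rule; this is an illustrative stand-in (the text does not specify which NMF solver ham_head uses), with toy matrix sizes:

```python
import numpy as np

def nmf(V, r, iters=500, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (m x n) into P (m x r) @ Q (r x n),
    as in formula (3), using Lee-Seung multiplicative updates."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    P = rng.random((m, r)) + eps   # feature basis matrix
    Q = rng.random((r, n)) + eps   # feature coefficient matrix
    for _ in range(iters):
        # multiplicative updates keep both factors non-negative
        Q *= (P.T @ V) / (P.T @ P @ Q + eps)
        P *= (V @ Q.T) / (P @ Q @ Q.T + eps)
    return P, Q

# a genuinely rank-4 non-negative matrix should be reconstructed closely
rng = np.random.default_rng(1)
V = rng.random((8, 4)) @ rng.random((4, 10))
P, Q = nmf(V, r=4)
err = np.linalg.norm(V - P @ Q) / np.linalg.norm(V)
```

Because the rank r is far smaller than the channel count, the low-rank product P @ Q acts as the "redundancy-discarding" approximation described above.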
the processed feature map is then given a full-connection operation by a 1 × 1 convolution layer, and the class scores are converted into a probability distribution by a softmax function, giving a 256 × 256 × nls feature map in which the value of each pixel represents its probability of belonging to each semantic category, nls being the number of semantic categories to be classified; the feature map is then upsampled by bilinear interpolation, and the ArgMax function assigns each pixel the semantic class with the maximum probability, producing a 1024 × 1024 prediction segmentation map of the same size as the input picture, namely the output of the decoder;
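The softmax-plus-ArgMax decoding step described above can be sketched in NumPy (the shapes and score values below are toy assumptions, not the network's real outputs):

```python
import numpy as np

def decode_segmentation(logits):
    """Turn an (H, W, nls) class-score map into per-pixel probabilities
    (softmax) and a per-pixel semantic label map (argmax)."""
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))  # stable softmax
    probs = e / e.sum(axis=-1, keepdims=True)
    labels = probs.argmax(axis=-1)   # class with maximum probability per pixel
    return probs, labels

# toy 2 x 2 map with nls = 3 classes; pixel (0, 0) strongly prefers class 2
logits = np.zeros((2, 2, 3))
logits[0, 0, 2] = 5.0
probs, labels = decode_segmentation(logits)
```

In the real pipeline the label map would then be colored with the class palette (e.g. BGR (128, 64, 128) for road) before the mask extraction of step four.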
Step two, pre-training the built segmentation network on a public data set:
the Cityscapes data set is selected for pre-training the semantic segmentation network; the data set, released by Daimler (Mercedes-Benz) for street-view image segmentation tasks, contains urban street-view driving scene images from 50 cities with accurate labels covering 19 kinds of semantic information such as road surface, buildings, pedestrians and vehicles; since the image samples resemble daily urban driving conditions, the pre-trained semantic segmentation network acquires a certain generalization ability; 5000 accurately annotated image samples take part in the training and verification of the network model, of which 2975 are training set pictures, 500 are verification set pictures, and 1525 are test set pictures; the specific pre-training process is as follows: firstly, images are loaded from the data set directory together with the corresponding annotation information, and data augmentation is applied to the training images and annotations, comprising random scaling, random cropping of the picture tensors and random horizontal flipping; each pixel of the input image is then normalized by a normalization function; finally, the cross entropy loss function of formula (4) is designed:
Loss = −Σ_{c=1}^{nls} y_c · log(P_c) (4)
wherein nls is the number of semantic categories to be classified, taking the value 19 when training on the Cityscapes data set; y_c is a vector taking the value 0 or 1, its elements indicating whether class c is the same as the true sample category; and P_c is the predicted probability that the sample belongs to class c, c ∈ (1, nls);
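A minimal numerical sketch of the cross entropy loss of formula (4), assuming a one-hot ground-truth vector y and a hypothetical predicted distribution over the 19 Cityscapes classes:

```python
import numpy as np

def cross_entropy(probs, label, nls=19):
    """Formula (4): Loss = -sum_c y_c * log(P_c), where y is the
    one-hot encoding of the true class label."""
    y = np.zeros(nls)
    y[label] = 1.0                 # y_c is 1 only for the true class
    return -np.sum(y * np.log(probs))

# hypothetical per-pixel distribution: class 3 gets most of the mass
probs = np.full(19, 0.01)
probs[3] = 0.82
loss = cross_entropy(probs, label=3)
```

Only the probability assigned to the true class contributes, so the loss reduces to −log(P_true) and shrinks as the network grows more confident in the correct class.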
and selecting a Poly learning rate rule to train the segmentation network, wherein the Poly learning rate attenuation expression is as follows:
LR(iter) = LR_initial × (1 − iter / max_iter)^power (5)
in formula (5), LR_initial is the initial learning rate of the network training, set to 0.002 in the pre-training process; iter is the iteration step of the training, max_iter is the maximum number of training steps, set to 40K; and power is the decay coefficient controlling the shape of the learning-rate curve, set to 0.9; LR(iter) is the learning rate computed and updated for each specific step of the training process, while the Adam optimization algorithm dynamically adjusts the learning rate of each parameter using the first- and second-moment estimates of the gradient; the batch size is set to 16 according to the computer hardware performance, model parameters are saved every 4K steps, and the verification set is used to evaluate the network performance;
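The Poly decay rule of formula (5) with the stated settings (LR_initial = 0.002, max_iter = 40K, power = 0.9) can be checked with a few lines of Python:

```python
def poly_lr(step, lr_initial=0.002, max_iter=40_000, power=0.9):
    """Poly learning-rate decay of formula (5):
    LR(iter) = LR_initial * (1 - iter / max_iter) ** power."""
    return lr_initial * (1.0 - step / max_iter) ** power

start = poly_lr(0)        # full learning rate at step 0
mid = poly_lr(20_000)     # halfway: lr_initial * 0.5 ** 0.9
end = poly_lr(40_000)     # decays to zero at the final step
```

With power below 1 the curve decays slightly slower than linearly early on, keeping the learning rate useful for most of the schedule.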
for the semantic segmentation task, the network pre-training result is evaluated with the mean pixel accuracy Acc and the mean Intersection-over-Union MIoU, both based on the confusion matrix; the specific implementation is as follows: in the two-class confusion matrix shown in Table 1, each row represents the true class of the data, each column represents the predicted class, and each element value in the matrix is the number of samples predicted as the corresponding class;
Table 1: schematic table of two-class Confusion Matrix fusion Matrix
The accuracy Acc indicates the percentage of correctly predicted pixels among the total number of pixels, as in formula (6):
Acc = (TP + TN) / (TP + TN + FP + FN) (6)
MIoU is the ratio of the intersection to the union of the predicted and true values for each class, summed over classes and then averaged, as in formula (7):
MIoU = (1 / nls) × Σ_{c=1}^{nls} TP_c / (TP_c + FP_c + FN_c) (7)
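Both metrics can be computed directly from a confusion matrix; the sketch below uses a toy two-class matrix whose counts are made up for illustration:

```python
import numpy as np

def accuracy_and_miou(cm):
    """cm[i, j] = number of pixels of true class i predicted as class j.
    Acc (formula 6) is correct pixels over all pixels; MIoU (formula 7)
    averages TP / (TP + FP + FN) over the classes."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    acc = tp.sum() / cm.sum()
    # column sum = TP + FP, row sum = TP + FN, so union = TP + FP + FN
    iou = tp / (cm.sum(axis=0) + cm.sum(axis=1) - tp)
    return acc, iou.mean()

# toy two-class confusion matrix: rows = true class, columns = predicted
cm = [[50, 10],
      [5, 35]]
acc, miou = accuracy_and_miou(cm)
```

The same function applies unchanged to the 19-class Cityscapes evaluation, where cm is 19 × 19.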
using a logging recording tool in PyTorch to store training data into a log file, and analyzing a training parameter curve, wherein the built semantic segmentation network shows excellent performance in urban driving scenes in good weather;
in this embodiment, the Acc of the road class is 0.9909 and its IoU is 0.9835, while the MIoU over all semantic classes is 0.7959, indicating that the built semantic segmentation network has high segmentation precision; the prediction time for a single picture is 0.02 s, meeting the real-time requirement of the algorithm during vehicle driving;
step three, enriching the semantic segmentation data set and performing specific training on the segmentation network:
In the experimental process, the semantic segmentation network built and trained only through the above steps was found to perform poorly, and even to be seriously distorted, on the task of segmenting waterlogged or snow-covered pavement in rainy and snowy weather; the training data of the Cityscapes data set were collected in urban driving scenes under good weather conditions and include no driving scenes in rain or snow, so a merely pre-trained semantic segmentation network is easily disturbed into misclassification by bad weather and special road conditions, affecting its overall performance;
In order to solve this problem and apply the method better to daily driving scenes, and in view of the segmentation network's susceptibility to disturbance and distortion in rainy and snowy weather, a semantic segmentation data set containing six road surface types (dry asphalt, wet asphalt, dry cement, wet cement, snow-covered and ice-covered road surfaces) is manufactured, and the pre-trained semantic segmentation network is further trained and evaluated on this data set, so that the generalization ability of the built network is further improved and it is better applied to daily urban driving scenes; the specific implementation process is as follows:
firstly, selecting proper vehicle-mounted camera equipment to collect driving scene data under different weather conditions, and disassembling the collected video files according to a fixed frame rate to obtain an image sample information base covering six road surface categories:
in this embodiment, the selected image acquisition equipment is an AXIS F1005-E Sensor Unit high-definition image sensor, with a working temperature of -30 to 55 °C, an adjustable focal length range of 28 to 120 mm, a captured video resolution of 1920 × 1200 at a frame rate of 60 fps, and a wide-angle lens field of view (FOV) of 113°;
in this embodiment, the image capturing device is installed on an image capturing experiment vehicle, and after long road surface image capturing experiments, video files are collected covering different weather conditions such as sunny, cloudy, rainy and snowy days and different capturing periods such as daytime, evening and night; the video files are decomposed into pictures at fixed frame intervals, sorted by category with a unified naming scheme, and images of the same road surface type are placed in the same folder; finally, 6 road-surface-type image sample libraries are obtained;
Image samples with rich variety and obvious features are then screened from the image database for fine labeling, the specific labeling process being as follows: firstly, the EISeg interactive labeling tool is installed in Anaconda and the platform is started; the driving scene pictures to be labeled are imported into the EISeg platform, a labeling task is created with the semantic segmentation task type and the Cityscapes annotation format selected, and the imported pictures are segmented and labeled in the interactive labeling interface using the labeling tools; after labeling is finished, the export-data function is used, the Cityscapes data set export format is selected, and an output path is designated;
The manufactured data set is organized in the Cityscapes data set format, with 100 accurately labeled pictures prepared for each road surface type, 600 data set samples in total, of which 20% are randomly extracted as the verification set and the remaining 80% used as the training set; with data augmentation by mirroring, translation and brightness adjustment of the data set, the training strategy and optimization method of the pre-training stage are continued, and the trained network is verified on the verification set; finally, model inference with the trained semantic segmentation network can process an input driving scene picture and output a visualized segmentation result;
The experimental results of this embodiment show that the specifically trained semantic segmentation network adapts to the interference of rainy and snowy road surfaces: on the rain-and-snow portion of the verification set, the MIoU of the road class improves from 0.5532 to 0.7309, further improving the precision of the constructed semantic segmentation network and making the algorithm more reliable;
step four, extracting a pavement area by using the segmentation network to manufacture a pavement classification network data set:
steps one to three constitute the construction and offline training of the semantic segmentation network; when the road surface adhesion coefficient prediction technique is applied in practice, the segmentation network must be used for online identification, so an industrial personal computer is used to simulate the online identification process of the semantic segmentation network;
inputting the image sample library containing the 6 road surface types obtained in the third step into the semantic segmentation network for prediction processing by road surface type, a process similar to the online identification process in a daily driving environment, and then sending the prediction result and the original image into the processor to extract the road surface area; the specific process of extracting the road surface area is as follows: firstly, the semantic segmentation prediction result graph is split by color channel to obtain the two-dimensional matrices corresponding to the three color channels; the palette is then consulted to find the BGR value (128,64,128) corresponding to 'road', masks are made for the three BGR color channels with OpenCV, and only pixels whose BGR values equal those of 'road' are retained, ensuring that only the road surface area is extracted; finally, the three single-channel masks are merged and a matrix dot product is taken with the corresponding channels of the original image, extracting a picture containing only the road surface area, with the non-road area set as background and turned black; the extracted image sample library containing only road surface areas is stored in dedicated folders by the six road surface types, and the specific road surface types with their corresponding 5-bit binary label information are shown in Table 2:
Table 2: pavement type and label information correspondence table
Screening the manufactured data set according to image quality, the final data set contains 6000 image samples, 1000 images per road surface category; finally, the road surface image data set is shuffled, 20% of each category is randomly extracted as the verification set, and the remainder is used as the training set;
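The shuffle-and-split step can be sketched as follows; the file names and seed are hypothetical, and only the 20% verification ratio comes from the text:

```python
import random

def split_dataset(samples, val_ratio=0.20, seed=42):
    """Shuffle the samples of one road-surface class and hold out
    val_ratio of them as the verification set; the rest train."""
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    n_val = int(len(samples) * val_ratio)
    return samples[n_val:], samples[:n_val]   # (training set, verification set)

# hypothetical file names for one of the six 1000-image categories
images = [f"wet_asphalt_{i:04d}.png" for i in range(1000)]
train, val = split_dataset(images)
```

Splitting per category, as the text describes, keeps the 6 road surface types balanced in both the training and verification sets.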
step five, constructing and training a road surface image type classification network:
steps one to four complete the road surface area extraction from the image information captured during driving; to realize the vision-based road surface adhesion coefficient prediction method, road surface type recognition must also be completed on the basis of the extracted road surface area, so the construction and training of the road surface image type classification network is particularly critical; the specific implementation process is as follows:
the specific implementation process of constructing the road surface image type classification network is as follows: firstly, the road surface image type classification network is built in the Anaconda environment; considering the complexity of the road surface classification task and the real-time requirement of the classification process, a lightweight convolutional neural network is constructed as the backbone of the classifier, and a channel attention mechanism is introduced, reducing the network parameters and floating point operations as much as possible while guaranteeing recognition precision, and improving the real-time performance of the classification network; the specific structure of the road surface image type classification network is shown in Table 3:
Table 3: road surface classification network structure table
Firstly, the input layer of the network adjusts the road surface image to be identified into a tensor of 288 × 288 × 3; a convolution layer of 3 × 3 kernels with stride 2, with batch normalization BN and Swish activation, then performs preliminary feature extraction, and the resulting feature map is taken as the input of the constructed bottleneck module, which performs feature extraction on it; the structure and specific process of the network bottleneck module based on the channel attention mechanism are shown in figure 4: firstly, a 1 × 1 convolution changes the feature channel dimension according to the expansion ratio, followed by batch normalization BN and Swish activation; the feature map is input into a 3 × 3 depthwise separable convolution DWConv layer for feature extraction, and the obtained feature map enters the channel attention SE module, which applies global average pooling to the input feature map and Swish activation to aggregate its global features, applies a full-connection operation to the obtained global features through 1 × 1 convolution layers with a certain activation rate, and obtains the attention weights of the different channels by a sigmoid activation function; the obtained weights are dot-multiplied with the input feature map, retaining the main features of the input, removing noise interference, reducing parameter computation, improving the real-time performance of the network, and achieving the purpose of attending to the more important information channels; finally, the output of the SE module passes through a 1 × 1 convolution to give the output of the whole bottleneck module;
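The squeeze-and-excitation weighting inside the bottleneck module can be sketched in NumPy; the fully connected weights below are random stand-ins rather than trained parameters, and the Swish activation follows the description above:

```python
import numpy as np

def swish(z):
    """Swish activation: z * sigmoid(z)."""
    return z / (1.0 + np.exp(-z))

def se_attention(x, w1, w2):
    """SE channel attention: global average pooling aggregates each
    channel, two fully connected (1x1 convolution) layers produce
    per-channel weights via sigmoid, and the input is reweighted."""
    squeeze = x.mean(axis=(0, 1))                    # global average pool -> (C,)
    hidden = swish(squeeze @ w1)                     # reduction FC with Swish
    weights = 1.0 / (1.0 + np.exp(-(hidden @ w2)))   # sigmoid attention weights
    return x * weights                               # dot-multiplication weighting

rng = np.random.default_rng(0)
C, r = 8, 2                                          # channels, reduced width
x = rng.random((4, 4, C))                            # toy H x W x C feature map
w1 = rng.standard_normal((C, r))                     # stand-in FC weights
w2 = rng.standard_normal((r, C))
y = se_attention(x, w1, w2)
```

Because the sigmoid weights lie in (0, 1), the module can only attenuate channels, which is how it suppresses noisy channels while keeping the informative ones.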
As shown in Table 3, the feature map extracted by the stride-2, 3 × 3 convolution layer is input into two serial bottleneck modules with expansion rate 1 for feature extraction, giving a 144 × 144 × 24 feature map; the first three layers of the bottleneck-module feature extraction network are designed so that each pass through a bottleneck layer halves the size of the output feature map while doubling the number of output channels, giving a 36 × 36 × 64 feature map after the first three layers; the last three feature extraction layers adopt channel attention with activation proportions of 0.35, 0.30 and 0.25 in turn, attending more to the content-rich feature channels, and finally produce a 9 × 9 × 256 feature map group containing high-level semantic information; its channel dimension is raised by a 1 × 1 convolution layer, and the raised 9 × 9 × 1280 feature map group is pooled by a 9 × 9 global average pooling layer to give a 1 × 1 × 1280 feature sequence; finally, through the fully connected layer together with the Softmax function, the probability values of the road surface categories are output, and the network classification result is determined from the maximum probability value by the Argmax function;
The constructed road surface image classification network is trained on the self-built data set; the training process adopts a cross-entropy loss function and the Adam adaptive gradient descent optimization algorithm, with the base learning rate set to 0.0001, and the model and training results are saved according to the number of iterations (epochs);
in the embodiment, the trained model and parameters are saved and an online identification process is simulated: 1200 image samples are randomly extracted from the sample library acquired by a real vehicle for verification. The concrete performance of the constructed classification network is shown in table 4, where the numbers on the diagonal of the confusion matrix represent the number of correctly predicted image samples; 1179 of the verification samples are predicted correctly, the average verification accuracy of the model is 0.9825, and the processing time per image is 0.009 s, meeting the precision and real-time requirements of road surface classification;
table 4: road surface classification network classification result confusion matrix
Step six, establishing a mapping rule to obtain road adhesion coefficient information:
finally, the corresponding road surface adhesion coefficient is obtained from the road surface image classification result of the preceding steps. To obtain the adhesion coefficient of the identified road surface, the reference value tables of the automobile longitudinal-slip adhesion coefficient for ordinary road surfaces and for ice and snow road surfaces in GA/T 643-2006 "Technical identification of vehicle travelling speed in typical traffic accident forms" are consulted; considering the influence of travelling speed on the road surface adhesion coefficient, and assuming the low-speed driving state in which the vehicle travels at about 48 km/h under urban working conditions, the mapping rule between road surface type and adhesion coefficient is defined as shown in Table 5. According to the vehicle speed and the road surface type identification result, the adhesion coefficient range of the current road surface is obtained by looking up Table 5, and the middle value of its upper and lower limits is taken as the adhesion coefficient of the current road, i.e. the output of the whole road surface adhesion coefficient prediction algorithm;
Road surface type | Adhesion coefficient | Determination value of adhesion coefficient
---|---|---
Dry asphalt | 0.55-0.8 | 0.675
Wet asphalt | 0.45-0.7 | 0.575
Dry cement | 0.55-0.8 | 0.675
Wet cement | 0.45-0.75 | 0.600
Snow covered road | 0.1-0.25 | 0.225
Ice-covered road | 0.1-0.2 | 0.150
Table 5: road surface type and attachment coefficient mapping table
In the embodiment, the sum of the average processing times of the semantic segmentation network and of the mask extraction and recognition network in the designed road surface adhesion coefficient prediction algorithm is 0.0314 s, with high accuracy, meeting the precision and real-time requirements of the prediction algorithm.
Claims (1)
1. A pavement adhesion coefficient prediction method based on a semantic segmentation network, which predicts the road surface type and the pavement adhesion coefficient of the vehicle driving area under urban working conditions, characterized by comprising the following specific steps:
step one, building a semantic segmentation network based on a multiscale space attention mechanism:
the vehicle-mounted camera of the intelligent driving automobile sensing system collects video data of the road ahead while the vehicle is running, and the drivable road surface area is extracted through the semantic segmentation network:
firstly, the network environment is configured: a Linux operating system is selected as the environment for image processing and for network construction and training, the program code is written in the Python language, and PyTorch, developed by the Meta AI team, is selected as the deep learning framework; a virtual environment is created with the environment management software Anaconda, in which Python 3.10.8, PyTorch 1.11.0 and OpenCV 4.6.0.66 are installed;
Secondly, the semantic segmentation network is constructed, adopting a lightweight Encoder-Decoder architecture to guarantee the precision of the semantic segmentation network while meeting the real-time requirement of the algorithm;
then, the hierarchical attention-based Encoder structure of the semantic segmentation network is built under the Encoder-Decoder framework to generate rich multi-scale semantic features: the input image is first scaled into a 1024×1024×3 tensor and fed into the semantic segmentation network; a 3×3 convolution layer, a BN layer and GELU activation then perform preliminary feature extraction and downsampling of the input picture;
the obtained feature map is then taken as the input of the feature extraction module, whose specific implementation is as follows: first, a 1×1 convolution adjusts the number of input channels; after a GELU activation function, the result is fed into a spatial attention module formed by a 5×5 main-branch convolution followed by 7×7 and 13×13 branch convolutions, and a 1×1 convolution outputs the weight parameter representing spatial attention; the multi-scale spatial attention mechanism is formulated as:

Atten = Conv_1×1(Atten_0 + Σ_i Atten_i) (1)

where Atten in formula (1) is the output weight parameter representing spatial attention, Atten_0 is the weight output of the main branch of the spatial attention module, and Atten_i is the weight output of the i-th branch convolution, i being the index of the branch convolutions;
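A rough NumPy illustration of the multi-scale attention of formula (1), with k×k mean filters standing in for the learned 5×5, 7×7 and 13×13 depthwise branch convolutions; the filters and the omission of the 1×1 output convolution are assumptions made purely for illustration:

```python
import numpy as np

def mean_filter(x, k):
    # Depthwise k x k mean filter with zero padding; a stand-in for a
    # learned depthwise convolution on a (C, H, W) feature map.
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(x)
    h, w = x.shape[1], x.shape[2]
    for di in range(k):
        for dj in range(k):
            out += xp[:, di:di + h, dj:dj + w]
    return out / (k * k)

def multi_scale_attention(feat):
    # Formula (1): the attention weight is the main-branch output plus the
    # sum of the branch outputs; the weight then reweights the input features.
    atten0 = mean_filter(feat, 5)                          # 5x5 main branch
    atten = atten0 + sum(mean_filter(atten0, k) for k in (7, 13))
    return atten * feat                                    # weighted features
```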
the number of channels of the feature map is then expanded by a 1×1 convolution; a 3×3 convolution performs finer feature extraction on the spatially attention-weighted feature map, the extraction result is activated by a GELU function, and a final 1×1 convolution outputs the processing result of the feature extraction module:

Output = Conv2D(Gelu(Conv2D(Atten ⊗ F))) (2)

where Output is the output of the bottleneck module, F is the input of the bottleneck module, Conv2D(·) represents the convolutions processing the input feature map, Gelu(·) represents further processing of the feature map with the activation function, and ⊗ denotes the attention weighting;
the constructed encoder bottleneck modules perform four serial stages of feature extraction on the network input tensor, giving four stages of feature maps representing different semantic levels: stage one uses three serial feature extraction modules and outputs a 256×256×32 feature map; a downsampling layer adjusts the number of convolution layers, in-layer convolutions and channels, after which three serial feature extraction modules give the 128×128×64 stage-two feature map; a downsampling layer and five serial feature processing modules then give the 64×64×160 stage-three feature map; finally, one more downsampling and two feature extraction modules give the 32×32×256 stage-four feature map;
Finally, a lightweight Decoder architecture is built under the Encoder-Decoder framework: the feature maps of the different stages extracted by the encoder are processed in turn by upsampling, channel concatenation, a lightweight ham_head decoder module and a fully connected classifier to fuse semantic information of different levels. To collect multi-scale semantic information and enlarge the receptive field, stage one, which carries mostly low-level semantic information, is discarded; the feature maps of stages two, three and four are brought to the same size by bilinear-interpolation upsampling and concatenated along the channel dimension (concat), and the integrated 128×128×480 feature map is input into the ham_head decoder module. The ham_head decoder module is implemented as follows: the 480 channels of the feature map obtained by the concat operation are compressed to 256 through a 1×1 convolution layer, a 32-group GroupNorm layer and a ReLU activation layer, and the non-negative matrix factorization (NMF) algorithm of formula (3) is applied:
V_{m×n} = P_{m×r} × Q_{r×n} (3)
the feature matrix V_{m×n} is decomposed into the low-rank m×r and r×n matrices P_{m×r} and Q_{r×n}, where P_{m×r} is the feature basis matrix reflecting the main characteristics of the data, and Q_{r×n} is the feature coefficient matrix representing the distribution of the data features; replacing the feature matrix by this factorization in the algorithm avoids the interference of redundant matrices, accelerates the image processing and improves the real-time performance of the algorithm. Finally, the processing result is output through a linearization layer formed by a 1×1 convolution layer, a 32-group GroupNorm layer and a ReLU activation layer, giving the output of the ham_head decoder module;
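Formula (3) can be sketched with the classic Lee–Seung multiplicative updates; this is a generic NMF illustration, not necessarily the exact solver used inside the ham_head module:

```python
import numpy as np

def nmf(V, r, steps=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (m x n) into P (m x r) and Q (r x n)
    using multiplicative updates, which keep both factors non-negative."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    P = rng.random((m, r)) + eps
    Q = rng.random((r, n)) + eps
    for _ in range(steps):
        Q *= (P.T @ V) / (P.T @ P @ Q + eps)   # update feature coefficient matrix
        P *= (V @ Q.T) / (P @ Q @ Q.T + eps)   # update feature basis matrix
    return P, Q
```

Working with the low-rank factors P and Q instead of V is what makes the decoder cheaper: subsequent products touch r·(m+n) entries rather than m·n.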
the processed feature map then undergoes a fully connected operation through a 1×1 convolution layer, and a softmax function converts the category scores into a probability distribution, giving a 256×256×nls feature map in which the value of each pixel represents the probability of belonging to a semantic category, nls being the number of semantic categories to classify; the obtained feature map is then upsampled by bilinear interpolation, and the ArgMax function assigns each pixel the semantic category of maximum probability, yielding a 1024×1024 predicted segmentation map of the same size as the input picture, i.e. the output of the decoder;
step two, pre-training the built segmentation network on a public data set:
the Cityscapes data set is selected for pre-training of the semantic segmentation network; this data set, released by Daimler (Mercedes-Benz) for street-view image segmentation tasks, contains urban street driving scene images from 50 cities, with accurate annotations covering 19 semantic classes including road surfaces, buildings, pedestrians and vehicles. Since the image samples resemble daily urban driving conditions, the pre-trained semantic segmentation network acquires a certain generalization ability. The number of finely annotated image samples participating in the training and verification of the network model is 5000, of which 2975 are training set pictures, 500 are verification set pictures and 1525 are test set pictures. The specific pre-training process is as follows: first, images are loaded from the data set directory together with the corresponding annotation information; the training images and annotations are then augmented by random scaling, random cropping of the picture tensors and random horizontal flipping; each pixel of the input image is normalized through a normalization function; finally, the cross-entropy loss function of formula (4) is designed:

L = −Σ_{c=1}^{nls} y_c log(P_c) (4)

where nls is the number of semantic categories to classify, taking the value 19 when training on the Cityscapes data set; y_c is a vector with values 0 or 1 whose elements indicate whether the category is the same as the sample category; and P_c represents the probability that the predicted sample belongs to class c, c ∈ (1, nls);
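For a single sample, the cross-entropy loss of formula (4) reduces to the negative log-probability of the true class, since y_c is one-hot:

```python
import math

def cross_entropy(probs, onehot):
    """Formula (4): L = -sum_c y_c * log(P_c); only the true class
    (where y_c = 1) contributes to the sum."""
    return -sum(math.log(p) for p, y in zip(probs, onehot) if y)
```

The more confidently the network predicts the true class, the smaller the loss.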
the Poly learning rate rule is selected for training the segmentation network, with the learning rate decay expression:

LR(iter) = LR_initial × (1 − iter / max_iter)^power (5)

where LR_initial in formula (5) is the initial learning rate of the network training, set to 0.002 in the pre-training process; iter is the iteration step of the network training; max_iter is the maximum number of training steps, set to 40K; and power is the decay coefficient controlling the shape of the learning rate curve, set to 0.9. LR(iter) is the learning rate corresponding to the current step, calculated and updated during training; the Adam optimization algorithm dynamically adjusts the learning rate of each parameter using the first- and second-moment estimates of the gradient. The batch size is set to 32 according to the hardware performance of the computer, model parameters are saved every 4K steps, and the performance of the network is evaluated on the verification set;
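The Poly decay of formula (5), with the stated hyper-parameters (LR_initial = 0.002, max_iter = 40K, power = 0.9), can be written as:

```python
def poly_lr(step, lr_initial=0.002, max_iter=40_000, power=0.9):
    """Formula (5): LR(iter) = LR_initial * (1 - iter / max_iter) ** power."""
    return lr_initial * (1.0 - step / max_iter) ** power
```

The schedule starts at the base rate and decays smoothly to zero at the final step.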
for the semantic segmentation task, the average pixel accuracy (Acc) and mean intersection-over-union (MIoU) indexes based on the confusion matrix are selected to evaluate the network pre-training result. The implementation is as follows: in the two-class confusion matrix of table 1, each row represents the true class of the data, each column represents the predicted class, and each element of the matrix is the number of samples predicted as a given class;
Table 1: schematic table of two-class Confusion Matrix fusion Matrix
The accuracy Acc indicates the percentage of correctly predicted pixels in the total number of pixels, as in formula (6):

Acc = Σ_i p_ii / (Σ_i Σ_j p_ij) (6)

MIoU is the per-class ratio of the intersection to the union of the predicted and true pixel sets, summed and averaged over the classes, as in formula (7):

MIoU = (1/nls) × Σ_i [ p_ii / (Σ_j p_ij + Σ_j p_ji − p_ii) ] (7)

where p_ij denotes the number of pixels of true class i predicted as class j;
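Both indexes follow directly from the confusion matrix; a NumPy sketch, with rows as true classes and columns as predicted classes as in Table 1:

```python
import numpy as np

def pixel_accuracy(cm):
    # Formula (6): correctly predicted pixels (the diagonal) over all pixels.
    return np.trace(cm) / cm.sum()

def mean_iou(cm):
    # Formula (7): per-class IoU = TP / (TP + FP + FN), then averaged.
    tp = np.diag(cm)
    union = cm.sum(axis=1) + cm.sum(axis=0) - tp
    return float(np.mean(tp / union))
```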
the training data are saved into a log file using the logging tool in PyTorch and the training parameter curves are analyzed; the constructed semantic segmentation network shows excellent performance in urban driving scenes in good weather;
step three, enriching the semantic segmentation data set and performing specific training on the segmentation network:
Since the segmentation network is easily disturbed and distorted under rainy and snowy weather conditions, a semantic segmentation data set containing six road surface types (dry and wet asphalt, dry and wet cement, snow-covered and ice-covered road surfaces) is prepared, and the pre-trained semantic segmentation network is further trained and evaluated on this data set, improving the generalization ability of the constructed network so that it applies better to daily urban driving scenes. The specific implementation is as follows:
first, driving scene data under different weather conditions are collected with the vehicle-mounted camera equipment, and the collected video files are split at a fixed frame rate to obtain an image sample library covering the six road surface categories; image samples with rich variety and obvious features are then screened from the image database for fine annotation. The specific annotation process is as follows: the EISeg interactive annotation tool is installed in Anaconda and the platform started; the driving scene pictures to be annotated are imported and uploaded to the EISeg platform; an annotation task is created, the semantic segmentation task type is selected, and the Cityscapes annotation format is chosen; the annotation function is then used to perform segmentation annotation of the imported pictures in the interactive annotation interface; after annotation, the data export function is used, the Cityscapes export format is selected and the output path specified. The produced data set is organized in the Cityscapes data set format, with 100 accurately annotated pictures per road surface type, 600 samples in total, of which 20% are randomly extracted as the verification set and the remaining 80% used as the training set. The data set is expanded by mirroring, translation and brightness adjustment; the training strategy and optimization method of the pre-training are continued, and the trained network is verified on the verification set. Finally, model inference is performed with the trained semantic segmentation network: an input driving scene picture is processed and a visualized pixel-level semantic segmentation result is output;
Step four, extracting the road surface area with the segmentation network to produce the road surface classification network data set:
the image sample library containing the 6 road surface types obtained in step three is input by type into the semantic segmentation network for prediction, a process similar to online identification in a daily driving environment, and the prediction result and the original image are sent to the processor for road surface area extraction. The specific extraction process is as follows: the semantic segmentation prediction map is first split by color channel into the three two-dimensional matrices of the B, G and R channels; the BGR value (128, 64, 128) corresponding to 'road' is found by table lookup; masks for the three BGR color channels are produced with OpenCV, keeping only the pixels whose BGR values equal those of 'road', ensuring that only the road surface area is extracted. Finally, the masks of the three channels are stitched and a matrix dot product is performed with the corresponding channels of the original image, extracting a picture containing only the road surface area, with the non-road area set to black as background. The extracted image samples containing only the road surface area are stored in dedicated folders for the six road surface types; the correspondence between road surface type and the label information of the corresponding 5-bit binary code is prepared as shown in table 2:
Table 2: pavement type and label information correspondence table
the produced data set is screened by image quality; the final data set contains 6000 image samples, 1000 per road surface category. Finally, the road surface image data set is shuffled, 20% of each category is randomly extracted as the verification set, and the remainder is used as the training set;
step five, constructing and training a road surface image type classification network:
the specific implementation of the road surface image classification network is as follows: the network is built in the Anaconda environment; considering the complexity of the road surface classification task and the real-time requirement of the classification process, a lightweight convolutional neural network is constructed as the backbone of the classifier and a channel attention mechanism is introduced, reducing network parameters and floating point operations as much as possible while guaranteeing recognition precision, and improving the real-time performance of the classification network. The specific network structure is shown in table 3:
Table 3: road surface classification network structure table
Firstly, the input layer of the network resizes the road surface image to be identified into a 288×288×3 tensor; features are then preliminarily extracted by a convolution layer of 3×3 convolution kernels with a stride of 2, followed by batch normalization (BN) and a Swish activation function. The obtained feature map is then used as the input of the constructed bottleneck module, which performs feature extraction on it; the specific process of building the network bottleneck module based on a channel attention mechanism is as follows: first, a 1×1 convolution changes the feature channel dimension according to the expansion ratio, followed by batch normalization (BN) and Swish activation; the feature map is then fed into a 3×3 depthwise separable convolution (DWConv) layer for feature extraction, and the resulting feature map is input into the channel attention (SE) module. The SE module performs global average pooling on the input feature map and applies Swish activation to aggregate its global features; the aggregated global features are passed through 1×1 convolution layers acting as fully connected layers with a given activation (reduction) ratio, and a sigmoid activation function yields the attention weights of the different channels. The obtained weights are dot-multiplied with the input feature map, which retains the main input features, removes noise interference, reduces parameter computation, improves the real-time performance of the network, and focuses attention on the more informative channels. Finally, the output of the SE module passes through a 1×1 convolution to give the output of the whole bottleneck module;
As shown in table 3, the feature map extracted by the convolution layer with a stride of 2 and a 3×3 convolution kernel is input into two serial bottleneck modules with an expansion rate of 1 for feature extraction, giving a 144×144×24 feature map. The first three bottleneck-based layers of the network are designed so that each pass through a bottleneck layer halves the size of the output feature map relative to the input while doubling the number of output channels, yielding a 36×36×64 feature map after the first three layers. The last three feature extraction layers employ a channel attention mechanism with activation ratios of 0.35, 0.30 and 0.25 in turn, paying more attention to content-rich feature channels, and finally produce a 9×9×256 feature map group containing high-level semantic information. A 1×1 convolution layer raises the channel dimension, and a 9×9 global average pooling layer pools the up-dimensioned 9×9×1280 feature map group into a 1×1×1280 feature sequence. Finally, a fully connected layer combined with a Softmax function outputs the probability of each road surface category, and the Argmax function determines the network classification result from the maximum probability;
The constructed road surface image classification network is trained on the self-built data set; the training process adopts a cross-entropy loss function and the Adam adaptive gradient descent optimization algorithm, with the base learning rate set to 0.0001, and the model and training results are saved according to the number of iterations (epochs);
step six, establishing a mapping rule to obtain road adhesion coefficient information:
finally, the corresponding road surface adhesion coefficient is obtained from the road surface image classification result of the preceding steps. To obtain the adhesion coefficient of the identified road surface, the reference value tables of the automobile longitudinal-slip adhesion coefficient for ordinary road surfaces and for ice and snow road surfaces in GA/T 643-2006 "Technical identification of vehicle travelling speed in typical traffic accident forms" are consulted; considering the influence of travelling speed on the road surface adhesion coefficient, and assuming the low-speed driving state in which the vehicle travels at approximately 48 km/h under urban working conditions, the mapping rule between road surface type and adhesion coefficient is defined as shown in Table 4. According to the vehicle speed and the road surface type identification result, the adhesion coefficient range of the current road surface is obtained by looking up Table 4, and the middle value of its upper and lower limits is taken as the adhesion coefficient of the current road, i.e. the output of the whole road surface adhesion coefficient prediction algorithm;
Table 4: road surface type and attachment coefficient mapping table.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310580153.7A CN116630702A (en) | 2023-05-23 | 2023-05-23 | Pavement adhesion coefficient prediction method based on semantic segmentation network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116630702A true CN116630702A (en) | 2023-08-22 |
Family
ID=87616534
Cited By (2)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN117523318A (en) * | 2023-12-26 | 2024-02-06 | 宁波微科光电股份有限公司 | Anti-light interference subway shielding door foreign matter detection method, device and medium
CN117523318B (en) * | 2023-12-26 | 2024-04-16 | 宁波微科光电股份有限公司 | Anti-light interference subway shielding door foreign matter detection method, device and medium
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |