CN116630702A - Pavement adhesion coefficient prediction method based on semantic segmentation network - Google Patents


Publication number: CN116630702A
Authority: CN (China)
Prior art keywords: network, road surface, image, training, feature
Legal status: Pending (assumed; not a legal conclusion)
Application number: CN202310580153.7A
Other languages: Chinese (zh)
Inventors: 郭洪艳, 万俊成, 管人生, 刘俊, 孟庆瑜, 赵旭, 戴启坤, 刘嫣然, 谭中秋, 李佳霖, 王含, 李光尧
Current Assignee: Jilin University
Original Assignee: Jilin University
Application filed by Jilin University
Priority application: CN202310580153.7A
Publication: CN116630702A (legal status: pending)

Classifications

    • G06V10/765 — Image/video recognition using classification rules for partitioning the feature space
    • G06V10/26 — Segmentation of patterns in the image field
    • G06V10/774 — Generating sets of training patterns
    • G06V10/82 — Image/video recognition using neural networks
    • G06V20/41 — Higher-level semantic classification or understanding of video scenes
    • G06V20/46 — Extracting features or characteristics from video content
    • G06N3/045 — Neural network architectures: combinations of networks
    • G06N3/0464 — Convolutional networks [CNN, ConvNet]
    • G06N3/08 — Learning methods
    • Y02T10/40 — Engine management systems (climate-change mitigation in road transport)


Abstract

The invention discloses a road surface adhesion coefficient prediction method based on a semantic segmentation network. The method first builds a semantic segmentation network based on a multi-scale spatial attention mechanism; next, it pre-trains the constructed segmentation network on a public data set; it then enriches the semantic segmentation data set and performs task-specific training of the segmentation network; it then extracts the road surface area with the segmentation network to produce a road surface classification data set; it then builds and trains a road surface type classification network; finally, it establishes a mapping rule to obtain road adhesion coefficient information. The method strengthens the generalization ability of the algorithm in rain and snow driving scenes, and further improves the accuracy, real-time performance and robustness of driving road surface extraction. By combining a lightweight road surface recognition network, a serial algorithm structure of semantic segmentation network, road surface extraction and road surface recognition network is designed, enabling fast and accurate prediction of road surface adhesion information in rich driving scenes.

Description

Pavement adhesion coefficient prediction method based on semantic segmentation network
Technical Field
The invention belongs to the technical field of intelligent automobiles, relates to an adhesion coefficient prediction method based on computer vision, and more particularly relates to a road surface adhesion coefficient prediction method based on a semantic segmentation network.
Background
With the growth of vehicle ownership, the convenience of automobiles has been accompanied by traffic safety problems, and traffic accidents have become one of the main causes of casualties at home and abroad. Under low-adhesion conditions in particular, vehicles are more prone to instability such as sideslip, drifting and collision, and the resulting loss of life and property is more severe. Acquiring accurate road adhesion coefficient information in advance therefore provides a reference for drivers and helps improve driving safety. At the same time, advanced automotive active safety systems require an accurate road adhesion coefficient as support: accurately obtaining it can expand the range of working conditions in which active safety systems apply, and sensing road state changes ahead of time allows the active safety system to adjust its control strategy promptly. Accurate identification of the road surface type and adhesion coefficient of the driving road is therefore key to improving driving safety and comfort.
At present, methods for obtaining the road adhesion coefficient mainly comprise estimation methods based on dynamic response and image-based visual prediction methods using neural networks. The accuracy of estimators based on dynamic models is limited by the accuracy of the vehicle and tire models, and the estimated adhesion coefficient is difficult to obtain in advance, so a certain lag exists. The rapid development of neural networks and the upgrading of intelligent-vehicle hardware such as on-board cameras have made visual road adhesion coefficient identification more reliable. Vision-based prediction effectively mitigates the lag of dynamics estimators, can perceive the road condition ahead in advance, and improves responsiveness to dangerous working conditions. In addition, embedding a semantic segmentation network in the visual prediction algorithm lets the algorithm concentrate on road information, eliminates interference from redundant information, improves road surface recognition accuracy, and yields more accurate adhesion coefficient predictions.
Disclosure of Invention
Aiming at the problems in the prior art, and to improve the accuracy, real-time performance and robustness of road adhesion coefficient prediction, the invention provides a road surface adhesion coefficient prediction method based on a semantic segmentation network. The method builds a semantic segmentation network based on an attention mechanism to extract the road surface area of the road ahead, improves the accuracy and robustness of the network through pre-training and task-specific training, then builds and trains a road surface classification network based on channel attention, and combines the classification result with a mapping rule to obtain the predicted road surface adhesion coefficient.
the invention is realized by adopting the following technical scheme:
a pavement adhesion coefficient prediction method based on a semantic segmentation network predicts the road type classification and pavement adhesion coefficient of a vehicle driving area under urban working conditions, and specifically comprises the following steps:
step one, building a semantic segmentation network based on a multiscale space attention mechanism:
the vehicle-mounted camera equipped with the intelligent driving automobile sensing system is used for collecting video data of a road ahead in the running process of the automobile, and the exercisable road surface area is extracted through the semantic segmentation network:
First, the network environment is configured and built: a Linux operating system is selected as the environment for image processing and for network construction and training, program code is written in Python, and PyTorch, developed by the Meta AI team, is selected as the deep learning framework. A virtual environment is created with Anaconda, and Python 3.10.8, PyTorch 1.11.0 and OpenCV 4.6.0.66 are installed in the new environment;
secondly, the semantic segmentation network is built on an advanced lightweight Encoder-Decoder framework, ensuring segmentation accuracy while meeting the real-time requirement of the algorithm;
then, the hierarchical, attention-based Encoder structure of the semantic segmentation network is built under this Encoder-Decoder framework to generate rich multi-scale semantic features: the input image is first scaled into a tensor of size 1024 × 1024 × 3 and fed into the network; a 3 × 3 convolutional layer, a BN layer and GELU activation then perform preliminary feature extraction and downsampling of the input picture;
the resulting feature map is then used as input to the feature extraction module, which is implemented as follows: a 1 × 1 convolution first adjusts the number of input channels; after a GELU activation, the feature map enters a spatial attention module formed by a 5 × 5 main-branch convolution followed by 7 × 7 and 13 × 13 branch convolutions, and a 1 × 1 convolution outputs the weight parameter representing spatial attention; the multi-scale spatial attention formula is:
Atten = Conv_{1×1}(Atten_0 + Σ_i Atten_i)    (1)

where Atten in formula (1) is the output weight parameter representing spatial attention, Atten_0 is the weight output of the main branch of the spatial attention module, Atten_i is the weight output of the i-th branch convolution, and i indexes the branch convolutions;
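The attention described by formula (1) can be sketched in PyTorch as follows. This is a minimal illustration, not the patented implementation: the kernel sizes (5 × 5 main branch, 7 × 7 and 13 × 13 branches, 1 × 1 projection) follow the text, while the use of depthwise convolutions and the final reweighting of the input are assumptions.

```python
import torch
import torch.nn as nn

class MultiScaleSpatialAttention(nn.Module):
    """Sketch of formula (1): Atten = Conv1x1(Atten_0 + sum_i Atten_i).

    Kernel sizes follow the patent text; depthwise convolutions are an
    assumption to keep the module lightweight.
    """
    def __init__(self, channels: int):
        super().__init__()
        self.main = nn.Conv2d(channels, channels, 5, padding=2, groups=channels)
        self.branch7 = nn.Conv2d(channels, channels, 7, padding=3, groups=channels)
        self.branch13 = nn.Conv2d(channels, channels, 13, padding=6, groups=channels)
        self.proj = nn.Conv2d(channels, channels, 1)  # 1x1 conv -> attention weights

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        atten0 = self.main(x)                                   # Atten_0
        atten = atten0 + self.branch7(atten0) + self.branch13(atten0)  # + Atten_i
        return self.proj(atten) * x                             # weight the input
```

Applied to a feature map, the module preserves the input shape while reweighting it spatially.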
the number of channels of the feature map is then expanded by a 1 × 1 convolution, finer feature extraction is applied to the spatially attention-weighted feature map by a 3 × 3 convolution, the extraction result is activated by a GELU function, and the output of the feature extraction module is finally produced by a 1 × 1 convolution, as in formula (2):

Output = Conv2D(Gelu(Conv2D(Conv2D(F))))    (2)

where Output is the output of the bottleneck module, F is its input, Conv2D(·) denotes the convolutions processing the feature map, and Gelu(·) denotes further processing of the feature map with the activation function;
performing four-time serial feature extraction on the network input tensor by using the constructed encoder bottleneck module to obtain four-stage feature graphs representing different semantic levels; the first stage feature extraction is performed by three serial feature extraction modules, and 256×256×32 feature graphs are output; the number of layers of the convolution layers, the number of in-layer convolutions and the number of channels are adjusted through the downsampling layer architecture, and after the feature map is downsampled, the feature map with the size of 128 multiplied by 64 in the second stage is obtained through three serial feature extraction modules; then obtaining a 64×64×160 phase three feature map through a downsampling layer and five serial feature processing modules; finally, obtaining a 32 multiplied by 256 feature map of the fourth stage through downsampling once again and two feature extraction modules;
Finally, a lightweight Decoder is built as the specific expansion of the Encoder-Decoder framework: the feature maps of the different stages extracted by the encoder are processed in turn by upsampling, channel concatenation, a lightweight ham_head decoder module and a fully connected classifier, which fuse semantic information of different levels. To collect multi-scale semantic information and enlarge the receptive field, stage one, which carries mostly low-level semantic information, is discarded; the feature maps of stages two, three and four are brought to the same size by bilinear-interpolation upsampling, concatenated along the channel dimension (concat), and fed into the ham_head decoder module, which processes the fused 128 × 128 × 480 feature map. The ham_head decoder module is implemented as follows: the 480 channels of the concatenated feature map are compressed to 256 by a 1 × 1 convolutional layer, a 32-group GroupNorm layer and a ReLU activation layer, and the non-negative matrix factorization (NMF) algorithm of formula (3) is applied:
V_{m×n} = P_{m×r} × Q_{r×n}    (3)

The feature matrix V_{m×n} is decomposed into the low-rank matrices P_{m×r} and Q_{r×n}, where P_{m×r} is the feature basis matrix reflecting the main characteristics of the data, and Q_{r×n} is the feature coefficient matrix representing the distribution of the data features. Substituting the feature basis matrix for the full feature matrix in the algorithm avoids interference from redundant matrices, accelerates the image processing pipeline, and improves real-time performance. Finally, the processing result passes through a linearization layer formed by a 1 × 1 convolutional layer, a 32-group GroupNorm layer and a ReLU activation layer, giving the output of the ham _head decoder module;
the processed feature map then undergoes a fully connected operation through a 1 × 1 convolutional layer, and a softmax function converts the category scores into a probability distribution, giving a 256 × 256 × nls feature map in which the value at each pixel represents the probability of belonging to a semantic category, nls being the number of semantic categories to classify. The feature map is then upsampled by bilinear interpolation, and an ArgMax function assigns each pixel the semantic class with maximum probability, producing a 1024 × 1024 predicted segmentation map of the same size as the input picture, i.e. the output of the decoder;
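The decoder-side fusion and prediction steps can be sketched as follows. This is a minimal illustration under stated assumptions: the ham_head internals are replaced by a single 1 × 1 convolution standing in for the head plus classifier, and nls = 19 as on Cityscapes.

```python
import torch
import torch.nn.functional as F

# Stages II-IV are bilinearly upsampled to a common 128x128 size,
# concatenated to 480 channels, classified per pixel, and the class map is
# recovered by argmax at the input resolution. The ham_head internals are
# omitted; a 1x1 conv stands in for head + classifier (an assumption).
nls = 19
s2 = torch.randn(1, 64, 128, 128)
s3 = torch.randn(1, 160, 64, 64)
s4 = torch.randn(1, 256, 32, 32)

up = lambda t: F.interpolate(t, size=(128, 128), mode="bilinear",
                             align_corners=False)
fused = torch.cat([up(s2), up(s3), up(s4)], dim=1)   # 1 x 480 x 128 x 128

logits = torch.nn.Conv2d(480, nls, 1)(fused)         # stand-in classifier
probs = logits.softmax(dim=1)                        # per-pixel distribution
pred = F.interpolate(probs, size=(1024, 1024), mode="bilinear",
                     align_corners=False).argmax(dim=1)  # 1 x 1024 x 1024
```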
step two, pre-training the built segmentation network on a public data set:
the method comprises the steps of selecting a Cityscapes data set to conduct pre-training of a semantic segmentation network, wherein the data set is pushed by a Benz company to issue street view image segmentation tasks which can be used for city street view driving scene images of 50 cities, and the street view driving scene images have accurate labels covering 19 semantic information of road surfaces, buildings, pedestrians, vehicles, buildings and the like; meanwhile, the image sample is similar to the driving form of the daily urban road condition, so that the pre-trained semantic segmentation network has certain generalization capability; the number of the accurate annotation image samples participating in the training and verification of the network model is 5000, wherein 2975 training set pictures are 500 verification set pictures, and 1525 test set pictures are measured; the specific pre-training process is as follows: firstly, loading images from a data set catalog, then loading corresponding annotation information, and then carrying out data augmentation on the trained images and annotations, wherein the augmentation mode comprises random scaling, random clipping of picture tensors and random left-right overturn, then normalizing each pixel point of an input image through a normalization function, and finally designing a cross entropy loss function shown as a formula (4):
Wherein nls is the semantic category to be classified, and the value of nls is 19 and y when training in the Cityscapes data set c To take a vector with a value of 0 or 1, the value of the element is used to judge whether the category is the same as the sample category or not, P c The probability c E (1, nls) representing that the predicted sample belongs to class c;
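A small numeric instance of the cross entropy of formula (4), using a toy 3-class distribution (the real training uses nls = 19):

```python
import math

def cross_entropy(y, p):
    """Cross-entropy loss of formula (4): L = -sum_c y_c * log(P_c),
    with y a one-hot ground-truth vector and p the predicted probabilities."""
    return -sum(yc * math.log(pc) for yc, pc in zip(y, p) if yc > 0)

# Toy 3-class example:
y = [0, 1, 0]          # true class is c = 2
p = [0.2, 0.7, 0.1]    # predicted class distribution
loss = cross_entropy(y, p)   # -log(0.7) ≈ 0.357
```

Only the term of the true class survives the sum, so a confident correct prediction (P_c near 1) drives the loss toward zero.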
The Poly learning rate rule is selected to train the segmentation network; the Poly decay expression is:

LR(iter) = LR_initial × (1 − iter / max_iter)^power    (5)

where LR_initial in formula (5) is the initial learning rate of network training, set to 0.002 during pre-training; iter is the iteration step of training; max_iter is the maximum number of training steps, set to 40K; and power is the decay coefficient controlling the shape of the learning rate curve, set to 0.9. LR(iter) is the learning rate computed and updated at each specific training step. The Adam optimization algorithm is used, dynamically adjusting the learning rate of each parameter from the first and second moment estimates of the gradient. According to the hardware performance of the computer, the batch size is set to 32, model parameters are saved every 4K steps, and the network is evaluated on the validation set at the same time;
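Formula (5) with the stated hyperparameters (LR_initial = 0.002, max_iter = 40K, power = 0.9) can be written directly as:

```python
def poly_lr(iter_, lr_initial=0.002, max_iter=40_000, power=0.9):
    """Poly learning-rate decay of formula (5):
    LR(iter) = LR_initial * (1 - iter / max_iter) ** power."""
    return lr_initial * (1 - iter_ / max_iter) ** power

poly_lr(0)        # 0.002 at the start of training
poly_lr(20_000)   # 0.002 * 0.5 ** 0.9, roughly half the initial rate
poly_lr(40_000)   # 0.0 at the end of training
```

In practice this function would be called once per step to set the optimizer's learning rate.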
For the semantic segmentation task, the mean pixel accuracy Acc and the mean intersection-over-union MIoU, both based on the confusion matrix, are selected to evaluate the pre-training result. For the two-class confusion matrix shown in Table 1, each row represents the true class of the data, each column represents the predicted class, and each element of the matrix is the number of samples predicted as a given class;
Table 1: Two-class confusion matrix

                     Predicted positive    Predicted negative
True positive        TP                    FN
True negative        FP                    TN
The accuracy Acc is the percentage of correctly predicted pixels among all pixels, formula (6):

Acc = (TP + TN) / (TP + TN + FP + FN)    (6)
MIoU is, for each class, the ratio of the intersection to the union of the predicted and true values, summed and averaged, formula (7):

MIoU = (1 / nls) × Σ_{c=1}^{nls} TP_c / (TP_c + FP_c + FN_c)    (7)
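Formulas (6) and (7) can be computed from a confusion matrix as follows; the toy two-class matrix is illustrative.

```python
def pixel_accuracy(cm):
    """Acc of formula (6): correctly predicted pixels over all pixels.
    cm[i][j] = number of pixels of true class i predicted as class j."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

def mean_iou(cm):
    """MIoU of formula (7): per-class TP / (TP + FP + FN), averaged."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, true other
        fn = sum(cm[c]) - tp                        # true c, predicted other
        ious.append(tp / (tp + fp + fn))
    return sum(ious) / n

cm = [[50, 10],   # toy two-class confusion matrix
      [5, 35]]
pixel_accuracy(cm)   # (50 + 35) / 100 = 0.85
mean_iou(cm)         # mean of 50/65 and 35/50
```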
training data are stored in log files with a logging tool in PyTorch and the training-parameter curves are analyzed; the built semantic segmentation network shows excellent performance in urban driving scenes in good weather;
Step three, enriching the semantic segmentation data set and performing specific training on the segmentation network:
Because the segmentation network is easily disturbed and distorted in rainy and snowy weather, the method prepares a semantic segmentation data set containing six road surface types (dry asphalt, wet asphalt, dry cement, wet cement, snow-covered and ice-covered road surfaces) and continues to train and evaluate the pre-trained semantic segmentation network on it. This further improves the generalization ability of the built network so that it applies better to everyday urban driving scenes. The implementation is as follows:
First, driving scene data under different weather conditions are collected with the on-board camera equipment, and the captured video files are split at a fixed frame rate to obtain an image sample library covering the six road surface categories. Image samples with rich variation and distinct features are then selected from the library for fine annotation. The annotation process is as follows: the EISeg interactive annotation tool is installed in Anaconda and launched; the driving scene pictures to be annotated are imported and uploaded to the EISeg platform; an annotation task is created with the semantic segmentation task type and the Cityscapes annotation format; the imported pictures are then segmented and labeled in the interactive annotation interface with the annotation tool. After labeling, the data export function is used, with the Cityscapes export format selected and an output path designated. The produced data set follows the Cityscapes data set format, with 100 accurately annotated pictures per road surface type, 600 samples in total; 20% are randomly drawn as the validation set and the remaining 80% form the training set. The data set is expanded by mirroring, translation and brightness adjustment; the training strategy and optimization method of pre-training are retained, and the trained network is verified on the validation set. Finally, the trained semantic segmentation network performs inference on input driving scene pictures and outputs visualized pixel-level semantic segmentation results;
Step four, extracting a pavement area by using the segmentation network to manufacture a pavement classification network data set:
The image sample library of six road surface types obtained in step three is input, by road surface type, into the semantic segmentation network for prediction, a process similar to online recognition in a daily driving environment; the prediction result and the original image are then sent to a processor to extract the road surface area. The extraction proceeds as follows: the semantic segmentation prediction map is split by color channel into three two-dimensional matrices; the BGR value corresponding to 'road' is looked up as (128, 64, 128); masks for the three BGR color channels are made with OpenCV, keeping only pixels whose BGR values equal those of 'road', so that only the road surface area is extracted. Finally, the three channel masks are combined and a matrix dot product is taken with the corresponding channels of the original image, extracting a picture containing only the road surface area, with the non-road regions set to black background. The extracted road-only image samples are stored in separate folders for the six road surface types, and the correspondence between the road surface types and their 5-bit binary label information is prepared as shown in Table 2:
Table 2: pavement type and label information correspondence table
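The mask-and-dot-product extraction described above can be sketched with NumPy (the document uses OpenCV for the same per-channel masking; this compact equivalent uses a boolean mask broadcast over the three channels). The toy 2 × 2 image is illustrative.

```python
import numpy as np

def extract_road(image_bgr, pred_bgr, road_bgr=(128, 64, 128)):
    """Keep only the pixels whose predicted BGR value equals the 'road'
    colour (128, 64, 128); all other pixels become black background."""
    mask = np.all(pred_bgr == np.array(road_bgr), axis=-1)      # H x W bool
    return image_bgr * mask[..., None].astype(image_bgr.dtype)  # dot product

# Toy 2x2 prediction: only the top-left pixel is 'road'.
pred = np.array([[[128, 64, 128], [0, 0, 0]],
                 [[0, 0, 0], [0, 0, 0]]], dtype=np.uint8)
img = np.full((2, 2, 3), 200, dtype=np.uint8)
out = extract_road(img, pred)   # top-left pixel kept, rest black
```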
The produced data set is screened by image quality; the final data set contains 6000 image samples, 1000 per road surface category. Finally, the road surface image data set is shuffled, 20% of the pictures of each class are randomly drawn as the validation set, and the remainder forms the training set;
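The per-class 80/20 split can be sketched as follows; the file names are illustrative placeholders.

```python
import random

def split_dataset(samples, val_ratio=0.2, seed=0):
    """Shuffle one class's samples and hold out val_ratio for validation,
    as in the per-class 80/20 split described above."""
    rng = random.Random(seed)
    samples = samples[:]          # leave the caller's list untouched
    rng.shuffle(samples)
    n_val = int(len(samples) * val_ratio)
    return samples[n_val:], samples[:n_val]   # (train, val)

# One class of the self-built set: 1000 images (hypothetical file names).
per_class = [f"dry_asphalt_{i:04d}.png" for i in range(1000)]
train, val = split_dataset(per_class)   # 800 train, 200 val
```

Running the same split for each of the six classes yields the stated 4800/1200 partition of the 6000-image set.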
step five, constructing and training a road surface image type classification network:
The road surface image type classification network is built as follows: first, the network is constructed in the Anaconda environment; considering the complexity of the road surface classification task and its real-time requirement, a lightweight convolutional neural network is built as the classifier backbone and a channel attention mechanism is introduced, reducing network parameters and floating-point operations as much as possible while preserving recognition accuracy and improving the real-time performance of the classification network. The network structure is designed as shown in Table 3:
table 3: road surface classification network structure table
The input layer of the network first resizes the road surface image to a 288 × 288 × 3 tensor; a convolutional layer of 3 × 3 kernels with stride 2, with batch normalization (BN) and a Swish activation, performs preliminary feature extraction. The resulting feature map is the input of the built bottleneck module, which performs feature extraction; the bottleneck module based on a channel attention mechanism is built as follows: a 1 × 1 convolution changes the channel dimension according to an expansion ratio, followed by BN and Swish activation; the feature map enters a 3 × 3 depthwise separable convolution (DWConv) layer for feature extraction; the result is fed to the channel attention SE module, which applies global average pooling and Swish activation to aggregate the global features of the input feature map, passes the global features through 1 × 1 convolutional layers in a fully connected manner with a given squeeze ratio, and obtains per-channel attention weights through a sigmoid activation. The weights are dot-multiplied with the input feature map, retaining the main input features, removing noise interference, reducing parameter computation, improving the real-time performance of the network, and focusing attention on the more informative channels. Finally, a 1 × 1 convolution of the SE module output gives the output of the whole bottleneck module;
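The SE channel-attention step just described can be sketched in PyTorch as follows. The squeeze ratio default is illustrative (the network's later layers use 0.35, 0.30 and 0.25 per the text); the two 1 × 1 convolutions realize the fully connected squeeze-excite operation.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Sketch of the SE channel attention described above: global average
    pooling aggregates each channel, two 1x1 convs with a squeeze ratio
    produce per-channel weights via sigmoid, and the input is reweighted.
    The default ratio is illustrative."""
    def __init__(self, channels: int, ratio: float = 0.25):
        super().__init__()
        squeezed = max(1, int(channels * ratio))
        self.pool = nn.AdaptiveAvgPool2d(1)          # global average pooling
        self.fc = nn.Sequential(
            nn.Conv2d(channels, squeezed, 1),
            nn.SiLU(),                               # Swish activation
            nn.Conv2d(squeezed, channels, 1),
            nn.Sigmoid(),                            # per-channel weights
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.fc(self.pool(x))             # dot-multiply weighting
```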
As shown in Table 3, the feature map from the stride-2, 3 × 3 convolutional layer is passed through two serial bottleneck modules with expansion rate 1 to obtain a 144 × 144 × 24 feature map. The first three bottleneck feature-extraction layers of the network structure are designed so that each bottleneck extraction halves the spatial size of the feature map and doubles the number of channels, giving a 36 × 36 × 64 feature map after the first three layers. The last three feature extraction layers use channel attention with squeeze ratios 0.35, 0.30 and 0.25 in turn, attending more to content-rich feature channels, and finally yield a 9 × 9 × 256 feature map group containing high-level semantic information. A 1 × 1 convolutional layer raises the channel dimension, and a 9 × 9 global average pooling layer pools the resulting 9 × 9 × 1280 feature map group into a 1 × 1 × 1280 feature sequence. Finally, through the fully connected layer with a Softmax function, probability values of the road surface categories are output, and the Argmax function determines the network classification result from the maximum probability;
The constructed road surface image classification network is trained on the self-built data set: a cross-entropy loss function is adopted during training, the Adam adaptive gradient descent optimization algorithm is used with a base learning rate of 0.0001, and the model and training results are saved according to the number of iterations (epochs);
step six, establishing a mapping rule to obtain road adhesion coefficient information:
finally, the corresponding road surface adhesion coefficient is obtained from the road surface image classification result of the preceding steps; to obtain the adhesion coefficient of the identified road surface, the reference value tables of the automobile longitudinal-slip adhesion coefficient, including the table for ice and snow road surfaces, in GA/T 643-2006 "Technical identification of vehicle driving speed in typical traffic accident forms" are consulted, and, considering the influence of driving speed on the road adhesion coefficient, the mapping rule between road surface type and road adhesion coefficient shown in Table 4 is defined for the low-speed driving state in which the vehicle speed under urban conditions remains around 48 km/h; according to the vehicle speed and the road surface type identification result, the table is looked up to obtain the adhesion coefficient range of the current road surface, and the midpoint of the upper and lower limits is taken as the adhesion coefficient of the current road, which is the output of the whole road adhesion coefficient prediction algorithm;
Table 4: road surface type and attachment coefficient mapping table.
Compared with the prior art, the invention has the beneficial effects that:
the invention discloses a road adhesion coefficient prediction method based on semantic segmentation which, by processing road images of the road surface ahead, obtains a prediction of the road adhesion coefficient in advance and can thus provide key adhesion coefficient information for the active safety control and driver assistance systems of an automobile; the invention builds a lightweight semantic segmentation network based on a multi-scale spatial attention mechanism and performs specific training on a self-built data set, which strengthens the generalization capability of the algorithm in rain and snow driving scenes and further improves the accuracy, real-time performance, and robustness of drivable road surface extraction; at the same time, a serial algorithm structure of semantic segmentation network, road surface extraction, and lightweight road surface classification network is designed, enabling fast and accurate prediction of road adhesion information in rich driving scenes;
drawings
Fig. 1 is a flow chart of a pavement adhesion coefficient prediction method based on a semantic segmentation network.
Fig. 2 is a schematic diagram of the semantic segmentation network constructed in the first step.
Fig. 3 is a schematic diagram of a bottleneck module structure for extracting features of the semantic segmentation network constructed in the first step of the method.
FIG. 4 is a schematic diagram of the bottleneck module structure for feature extraction in the road surface classification network constructed in step five of the method.
Detailed Description
The invention provides a pavement adhesion coefficient prediction method based on a semantic segmentation network, which predicts the road type classification and pavement adhesion coefficient of a vehicle driving area under urban working conditions, and specifically comprises the following steps:
step one, building a semantic segmentation network based on a multiscale space attention mechanism:
the vehicle-mounted camera of the intelligent driving automobile's perception system collects video data of the road ahead while the vehicle is driving, and the drivable road surface area is extracted by the semantic segmentation network; the performance of the semantic segmentation network largely determines the overall performance of the whole road adhesion coefficient prediction algorithm, so building a semantic segmentation network with high accuracy, good real-time performance, and strong robustness is critical, and it is designed as follows:
firstly, configuring the environment for building the network: the Linux operating system is selected as the operating environment for image processing and for network construction and training, with the following hardware configuration: an Intel(R) Core(TM) i9-9900K CPU, 16 GB of RAM, an NVIDIA RTX A6000 graphics card with CUDA version 11.4, and storage consisting of a 256 GB solid-state drive plus a 2 TB mechanical hard disk; the program code is written in Python, and PyTorch, developed by the Meta AI team, is selected as the deep learning framework; the software is then configured in this hardware environment: a virtual environment is created with the environment manager Anaconda, and Python 3.10.8, PyTorch 1.11.0, and OpenCV 4.6.0.66 are installed in the new environment;
Secondly, the semantic segmentation network is constructed with an advanced lightweight Encoder-Decoder framework, which meets the real-time requirement of the algorithm while guaranteeing the accuracy of the semantic segmentation network; the specific network structure is shown in Fig. 2;
then the hierarchical, attention-based Encoder structure of the semantic segmentation network is developed under the Encoder-Decoder framework to generate rich multi-scale semantic features: first, the input image is scaled into a tensor of size 1024×1024×3 and fed into the semantic segmentation network; a 3×3 convolution layer, a BN layer, and GELU activation then perform preliminary feature extraction and downsampling of the input picture;
the obtained feature map is then taken as the input of the feature extraction module, which is implemented as follows: first, a 1×1 convolution adjusts the number of input channels; after a GELU activation function, the result enters a spatial attention module formed by a 5×5 main-branch convolution followed by 7×7 and 13×13 branch convolutions, and a 1×1 convolution outputs a weight parameter representing the spatial attention; the multi-scale spatial attention mechanism can be written as

Atten = Conv_1×1(Atten_0 + Σ_i Atten_i) (1)

where Atten in formula (1) is the output weight parameter representing the spatial attention, Atten_0 is the weight output of the main branch of the spatial attention module, Atten_i is the weight output of the i-th branch convolution in the module, and i indexes the branch convolutions;
the number of channels of the feature map is then expanded by a 1×1 convolution, a 3×3 convolution performs finer feature extraction on the spatially attention-weighted feature map, the extraction result is activated by the GELU function, and finally a 1×1 convolution outputs the result of the feature extraction module, which can be summarized as

Output = Conv2D(Gelu(Conv2D(F))) (2)

where Output is the output of the bottleneck module, F is the input of the bottleneck module, Conv2D(·) denotes a convolution processing the input feature map, and Gelu(·) denotes further processing of the feature map with the activation function;
the constructed encoder bottleneck module is applied to the network input tensor in four serial feature extraction stages, yielding four feature maps representing different semantic levels; the first stage uses three serial feature extraction modules and outputs a 256×256×32 feature map; a downsampling layer adjusts the number of convolution layers, in-layer convolutions, and channels, after which three serial feature extraction modules produce the second-stage feature map of size 128×128×64; a further downsampling layer and five serial feature processing modules then yield the 64×64×160 third-stage feature map; finally, one more downsampling step and two feature extraction modules give the 32×32×256 fourth-stage feature map;
Finally, a lightweight Decoder is built as the specific expansion of the Encoder-Decoder framework: the feature maps of the different stages extracted by the encoder are processed in turn by upsampling, channel concatenation, the lightweight ham_head decoder module, and a fully connected classifier, fusing semantic information of different levels; to collect multi-scale semantic information and enlarge the receptive field, stage one, which carries mostly low-level semantic information, is discarded; the feature maps of stages two, three, and four are brought to the same size by bilinear-interpolation upsampling and concatenated along the channel dimension (concat), and the integrated 128×128×480 feature map is input into the ham_head decoder module; the ham_head decoder module is implemented as follows: the 480 channels of the concatenated feature map are compressed to 256 by a 1×1 convolution layer, a 32-group GroupNorm layer, and a ReLU activation layer, and the non-negative matrix factorization (NMF) algorithm shown in formula (3) is applied:
V_m×n = P_m×r × Q_r×n (3)
the feature matrix V_m×n is decomposed into the low-rank matrices P_m×r and Q_r×n, where P_m×r is the feature base matrix reflecting the main characteristics of the data, and Q_r×n is the feature coefficient matrix representing the distribution of the data features; the algorithm replaces the original feature matrix with this factorization, mapping high-dimensional data to a low-dimensional representation while keeping its important features, thereby discarding the interference of redundant components, enhancing the generalization capability of the model, and at the same time accelerating the image processing pipeline and improving real-time performance; finally, the processing result is output through a linearization layer consisting of a 1×1 convolution layer, a 32-group GroupNorm layer, and a ReLU activation layer, giving the output of the ham_head decoder module;
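As an illustrative sketch (not the patent's actual implementation), the NMF of formula (3) can be computed with the standard multiplicative-update rules; the function name and iteration count below are assumptions:

```python
import numpy as np

def nmf(V, r, steps=200, eps=1e-9, seed=0):
    """Factorize a non-negative matrix V (m x n) into P (m x r) and
    Q (r x n) with multiplicative updates minimizing ||V - P @ Q||_F."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    P = rng.random((m, r))
    Q = rng.random((r, n))
    for _ in range(steps):
        # multiplicative updates preserve non-negativity of P and Q
        Q *= (P.T @ V) / (P.T @ P @ Q + eps)
        P *= (V @ Q.T) / (P @ Q @ Q.T + eps)
    return P, Q

V = np.random.default_rng(1).random((8, 6))   # toy non-negative "feature matrix"
P, Q = nmf(V, r=3)
err = np.linalg.norm(V - P @ Q)               # reconstruction error
```

Because r is smaller than m and n, P @ Q is a low-rank approximation that keeps the dominant structure of V, which is what lets the ham_head module discard redundant components.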
then the processed feature map undergoes a fully connected operation through a 1×1 convolution layer, class scores are converted into a probability distribution by the softmax function, and a 256×256×nls feature map is obtained in which the value of each pixel represents its probability of belonging to each semantic category, nls being the number of semantic categories to classify; the obtained feature map is then upsampled by bilinear interpolation, and the ArgMax function assigns each pixel the semantic class with the maximum probability, giving a 1024×1024 predicted segmentation map of the same size as the input picture, i.e. the output of the decoder;
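The softmax-then-argmax step above can be sketched in a few lines of numpy; the function name is hypothetical and the tiny score tensor stands in for the real 256×256×nls map:

```python
import numpy as np

def pixelwise_predict(scores):
    """scores: (H, W, nls) raw class scores; returns the per-pixel
    softmax probabilities and the argmax label map."""
    # numerically stable softmax over the class dimension
    s = scores - scores.max(axis=-1, keepdims=True)
    e = np.exp(s)
    probs = e / e.sum(axis=-1, keepdims=True)
    labels = probs.argmax(axis=-1)   # one class index per pixel
    return probs, labels

scores = np.zeros((2, 2, 3))
scores[0, 0, 1] = 5.0   # pixel (0, 0) strongly favours class 1
probs, labels = pixelwise_predict(scores)
```

In the full pipeline the label map would then be upsampled (e.g. by bilinear interpolation of the probabilities before the argmax) back to the 1024×1024 input resolution.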
Step two, pre-training the built segmentation network on a public data set:
the Cityscapes data set is selected for pre-training the semantic segmentation network; this data set, released by Mercedes-Benz for street-scene image segmentation tasks, contains urban street driving scene images from 50 cities with accurate labels covering 19 semantic classes such as road surface, buildings, pedestrians, and vehicles; since its image samples resemble daily urban driving conditions, the pre-trained semantic segmentation network acquires a certain generalization capability; 5000 finely annotated image samples participate in the training and validation of the network model, of which 2975 are training images, 500 validation images, and 1525 test images; the specific pre-training process is as follows: first, images are loaded from the data set directory together with their corresponding annotation information; the training images and annotations are then augmented by random scaling, random cropping of the picture tensors, and random horizontal flipping; each pixel of the input image is normalized by a normalization function; finally, the cross-entropy loss function shown in formula (4) is designed:

Loss = -Σ_c y_c log(P_c) (4)

where nls is the number of semantic categories to classify, taking the value 19 when training on the Cityscapes data set; y_c is an indicator taking the value 0 or 1 according to whether category c equals the sample's true category, and P_c is the predicted probability that the sample belongs to class c, with c ∈ (1, nls);
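A minimal numpy sketch of the cross-entropy loss of formula (4), assuming one-hot labels (the function name is an assumption, not the patent's code):

```python
import numpy as np

def cross_entropy(y_onehot, p, eps=1e-12):
    """Cross-entropy of formula (4): -sum_c y_c * log(P_c),
    averaged over the sample dimension; eps guards log(0)."""
    return float(-(y_onehot * np.log(p + eps)).sum(axis=-1).mean())

# one sample, 3 classes, true class 2 predicted with probability 0.7
y = np.array([[0.0, 0.0, 1.0]])
p = np.array([[0.2, 0.1, 0.7]])
loss = cross_entropy(y, p)
```

Because y_c is one-hot, only the predicted probability of the true class contributes, so the loss here reduces to -log(0.7).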
the Poly learning rate rule is selected for training the segmentation network, with the decay expression

LR(iter) = LR_initial × (1 - iter/max_iter)^power (5)

where LR_initial in formula (5) is the initial learning rate of the network training, set to 0.002 during pre-training; iter is the current training iteration; max_iter is the maximum number of training steps, set to 40K; and power is the decay coefficient controlling the shape of the learning rate curve, set to 0.9; LR(iter) is the learning rate computed and updated for each specific step of training, and the Adam optimization algorithm dynamically adjusts the learning rate of each parameter using the first- and second-moment estimates of the gradient; the batch size is set to 16 according to the computer hardware performance, model parameters are saved every 4K steps, and the validation set is used to evaluate network performance;
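The Poly decay of formula (5) with the stated hyperparameters (0.002, 40K steps, power 0.9) is a one-liner; the function name is an assumption:

```python
def poly_lr(iter_, lr_initial=0.002, max_iter=40_000, power=0.9):
    """Poly learning-rate decay of formula (5):
    LR(iter) = LR_initial * (1 - iter/max_iter) ** power."""
    return lr_initial * (1.0 - iter_ / max_iter) ** power

lr_start = poly_lr(0)        # full initial learning rate
lr_mid = poly_lr(20_000)     # decayed mid-training value
lr_end = poly_lr(40_000)     # decays to zero at max_iter
```

The schedule starts at LR_initial and decays smoothly to zero at max_iter; power < 1 keeps the learning rate relatively high for most of training before the final drop.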
for the semantic segmentation task, the pre-training result is evaluated with the mean pixel accuracy (Acc) and mean intersection-over-union (MIoU) indices based on the confusion matrix, implemented as follows: in the two-class confusion matrix of Table 1, each row represents the true class of the data, each column represents the predicted class, and each element of the matrix is the number of samples predicted as a given class;
Table 1: schematic table of two-class Confusion Matrix fusion Matrix
The accuracy Acc is the percentage of correctly predicted pixels among all pixels, given by formula (6):

Acc = (TP + TN) / (TP + TN + FP + FN) (6)

MIoU is the ratio of the intersection to the union of the predicted and true pixels of each class, summed over the classes and averaged, given by formula (7):

MIoU = (1/nls) Σ_c TP_c / (TP_c + FP_c + FN_c) (7)
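Both metrics of formulas (6) and (7) can be read directly off the confusion matrix (rows = true class, columns = predicted class, as in Table 1); the helper name and toy matrix below are assumptions:

```python
import numpy as np

def accuracy_and_miou(cm):
    """cm: (nls, nls) confusion matrix, rows = true class,
    cols = predicted class; returns (Acc, MIoU)."""
    cm = np.asarray(cm, dtype=float)
    acc = np.trace(cm) / cm.sum()        # formula (6): correct / total
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp             # predicted as c but not c
    fn = cm.sum(axis=1) - tp             # truly c but missed
    iou = tp / (tp + fp + fn)            # per-class intersection / union
    return acc, iou.mean()               # formula (7): class-averaged IoU

cm = [[50, 10],
      [5, 35]]
acc, miou = accuracy_and_miou(cm)
```

On this toy matrix, 85 of 100 samples sit on the diagonal, so Acc = 0.85, while the class IoUs are 50/65 and 35/50.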
the logging tool in PyTorch stores the training data in a log file, and analysis of the training-parameter curves shows that the constructed semantic segmentation network performs excellently in urban driving scenes in good weather;
in this embodiment, the pixel accuracy of the road class is 0.9909 and its IoU is 0.9835, while the MIoU over all semantic classes is 0.7959, showing that the constructed semantic segmentation network reaches high segmentation accuracy; the prediction time for a single picture is 0.02 s, which meets the real-time requirement of the algorithm while the vehicle is driving;
step three, enriching the semantic segmentation data set and performing specific training on the segmentation network:
In the experimental process, the semantic segmentation network built and trained only through the preceding steps was found to perform poorly, and even to suffer severe distortion, on the task of segmenting waterlogged and snow-covered road surfaces in rainy and snowy weather; the training data of the Cityscapes data set were collected in urban driving scenes under good weather and include no driving scenes in rain or snow, so the merely pre-trained semantic segmentation network is easily disturbed into misclassification by bad weather and special road conditions, which affects its overall performance;
To solve this problem and better apply the method to daily driving scenes, and specifically to address the segmentation network's susceptibility to disturbance and distortion in rainy and snowy weather, a semantic segmentation data set containing six road surface types (dry and wet asphalt, dry and wet cement, snow-covered road surface, and ice-covered road surface) is produced, and the pre-trained semantic segmentation network is further trained and evaluated on this data set, which further improves the generalization capability of the constructed network for daily urban driving scenes; the specific implementation process is as follows:
first, suitable vehicle-mounted camera equipment is selected to collect driving scene data under different weather conditions, and the collected video files are split at a fixed frame rate to obtain an image sample library covering the six road surface categories:
in this embodiment, the selected image acquisition device is an AXIS F1005-E Sensor Unit high-definition image sensor, with an operating temperature of -30 to 55 °C, an adjustable focal length range of 28-120 mm, a captured video resolution of 1920×1200 at a frame rate of 60 fps, and a wide-angle lens field of view (FOV) of 113°;
in this embodiment, the image acquisition device is mounted on an image acquisition experimental vehicle, and after long road surface image acquisition experiments, video files are collected covering different weather conditions such as sunny, cloudy, rainy, and snowy days and different acquisition periods such as daytime, evening, and night; the video files are decomposed into pictures at fixed frame intervals, sorted by class under a unified naming scheme, and images of the same road surface type are placed in the same folder; six road surface type image sample libraries are finally obtained;
Image samples with rich variety and distinct features are then selected from the image database for fine annotation, as follows: the EISeg interactive annotation tool is installed under Anaconda and the platform is started; the driving scene pictures to be annotated are imported and uploaded to the EISeg platform; an annotation task is created, the semantic segmentation task type is selected together with the Cityscapes annotation format; the annotation function is then selected and segmentation annotation is performed on the imported pictures in the interactive annotation interface using the annotation tools; after annotation is finished, the export function is used, the Cityscapes export format is selected, and the output path is specified;
The produced data set is organized in the Cityscapes format, with 100 accurately annotated pictures prepared for each road surface type, 600 samples in total; 20% are randomly extracted as the validation set and the remaining 80% serve as the training set; the data set is expanded by mirroring, translation, and brightness adjustment, while the training strategy and optimization method of the pre-training stage are retained; the trained network is verified on the validation set, and finally model inference with the trained semantic segmentation network can process input driving scene pictures and output visualized segmentation results;
The experimental results in this embodiment show that the specifically trained semantic segmentation network adapts to the disturbance of rainy and snowy road surfaces: on the rain-and-snow portion of the validation set, the road-class IoU improves from 0.5532 to 0.7309, so the accuracy of the constructed semantic segmentation network is further improved and the algorithm becomes more reliable;
step four, extracting a pavement area by using the segmentation network to manufacture a pavement classification network data set:
steps one to three constitute the construction and offline training of the semantic segmentation network; in the practical application of the road adhesion coefficient prediction technique, the segmentation network must perform online identification, so an industrial personal computer is used to simulate the online identification process of the semantic segmentation network;
the image sample library containing the six road surface types obtained in step three is input, by road surface type, into the semantic segmentation network for prediction, a process similar to online identification in a daily driving environment, and the prediction result and the original image are then sent to the processor to extract the road surface area; the specific extraction process is as follows: first, the semantic segmentation prediction map is split by color channel into the two-dimensional matrices of the three color channels; the BGR value (128, 64, 128) corresponding to 'road' is found by table lookup, masks for the three BGR color channels are produced with OpenCV, and only pixels whose BGR values equal those of 'road' are retained, ensuring that only the road surface area is extracted; finally, the masks of the three color channels are combined and a matrix dot-product operation is applied to each channel of the original image, extracting a picture that contains only the road surface area, while non-road areas are set to black background; the extracted image sample library containing only road surface areas is stored in dedicated folders for the six road surface types, and the specific road surface types with the corresponding 5-bit binary label information are shown in Table 2:
Table 2: pavement type and label information correspondence table
The produced data set is screened by image quality; the final data set contains 6000 image samples, 1000 images per road surface category; finally, the road surface image data set is shuffled, 20% of each category is randomly extracted as the validation set, and the remainder serves as the training set;
step five, constructing and training a road surface image type classification network:
steps one to four complete the road surface area extraction from the image information captured during driving; to realize the vision-based prediction of the road adhesion coefficient, road surface type recognition must also be completed on the basis of the extracted road surface area, so the construction and training of the road surface image classification network is particularly critical; the specific implementation is as follows:
the road surface image classification network is constructed as follows: first, the network is built under the Anaconda environment; considering the complexity of the road surface classification task and the real-time requirements of the classification process, a lightweight convolutional neural network is constructed as the backbone of the classifier, and a channel attention mechanism is introduced, which guarantees recognition accuracy while reducing network parameters and floating-point operations as much as possible and improving the real-time performance of the classification network; the specific structure of the road surface image classification network shown in Table 3 is designed as follows:
Table 3: road surface classification network structure table
Firstly, the input layer of the network resizes the road surface image to be identified into a 288×288×3 tensor, and preliminary features are extracted by a convolution layer of 3×3 kernels with stride 2, followed by batch normalization (BN) and the Swish activation function; the resulting feature map is then taken as the input of the constructed bottleneck module, which performs the feature extraction, and the structure and specific process of the network bottleneck module based on the channel attention mechanism are shown in Fig. 3: first, a 1×1 convolution changes the feature channel dimension according to the expansion ratio, followed by batch normalization BN and Swish activation; the feature map is then fed into a 3×3 depthwise separable convolution (DWConv) layer for feature extraction, and the resulting feature map is input to the channel attention (SE) module; this module applies global average pooling to the input feature map and Swish activation to aggregate its global features, performs a fully connected operation on the global features through 1×1 convolution layers with a given activation ratio, and obtains the attention weights of the different channels through a sigmoid activation function; the obtained weights are multiplied element-wise with the input feature map, which retains the main input features, removes noise interference, reduces parameter computation, and improves the real-time performance of the network, so that the more informative channels receive more attention; finally, the output of the SE module passes through a 1×1 convolution to give the output of the whole bottleneck module;
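The SE weighting at the heart of the bottleneck module can be sketched in numpy as follows; this is a minimal illustration under assumed names, with randomly initialized matrices w1 and w2 standing in for the trained 1×1 convolutions of the squeeze-and-excite step:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def swish(x):
    return x * sigmoid(x)

def se_attention(feat, w1, w2):
    """Channel attention (SE) on feat of shape (C, H, W):
    global average pooling -> squeeze FC (w1) with Swish ->
    excite FC (w2) -> sigmoid channel weights -> rescaling."""
    z = feat.mean(axis=(1, 2))          # global average pooling, (C,)
    s = swish(w1 @ z)                   # squeeze to a few channels
    w = sigmoid(w2 @ s)                 # per-channel weights in (0, 1)
    return feat * w[:, None, None], w   # element-wise (dot-product) weighting

rng = np.random.default_rng(0)
feat = rng.random((8, 4, 4))   # toy feature map, 8 channels
w1 = rng.random((2, 8))        # squeeze 8 -> 2 channels (ratio 0.25)
w2 = rng.random((8, 2))        # excite 2 -> 8 channels
out, weights = se_attention(feat, w1, w2)
```

Channels whose sigmoid weight is close to 1 pass through almost unchanged, while low-weight channels are suppressed, which is the "pay attention to more informative channels" behaviour described above.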
As shown in Table 3, the feature map extracted by the stride-2, 3×3 convolution layer is fed into two serial bottleneck modules with an expansion rate of 1 for feature extraction, yielding a 144×144×24 feature map; the first three bottleneck-based layers of the network are designed so that each bottleneck feature-extraction stage halves the spatial size of the feature map while doubling the number of channels, giving a 36×36×64 feature map after the first three layers; the last three feature extraction layers adopt channel attention mechanisms with activation ratios of 0.35, 0.30, and 0.25 in turn, paying more attention to content-rich feature channels, and finally produce a 9×9×256 feature map group containing high-level semantic information; the channel dimension is then raised through a 1×1 convolution layer, and the resulting 9×9×1280 feature map group is pooled by a 9×9 global average pooling layer into a 1×1×1280 feature sequence; finally, the fully connected layers combined with the Softmax function output the probability of each road surface category, and the Argmax function determines the network classification result from the maximum probability;
The constructed road surface image classification network is trained on the self-built data set: a cross-entropy loss function is adopted during training, the Adam adaptive gradient descent optimization algorithm is used with a base learning rate of 0.0001, and the model and training results are saved according to the number of iterations (epochs);
in this embodiment, the trained model and parameters are saved and the online identification process is simulated: 1200 image samples are randomly extracted from the sample library acquired by the real vehicle for verification; the concrete performance of the constructed classification network is shown in Table 4, where the numbers on the diagonal of the confusion matrix are the counts of correctly predicted images, i.e. 1179 of the verification samples are predicted correctly; the average verification accuracy of the model is 0.9825 and the per-picture processing time is 0.009 s, meeting the accuracy and real-time requirements of the road surface classification technique;
table 4: road surface classification network classification result confusion matrix
Step six, establishing a mapping rule to obtain road adhesion coefficient information:
finally, the corresponding road surface adhesion coefficient is obtained from the road surface image classification result of the preceding steps; to obtain the adhesion coefficient of the identified road surface, the reference value tables of the automobile longitudinal-slip adhesion coefficient, including the table for ice and snow road surfaces, in GA/T 643-2006 "Technical identification of vehicle driving speed in typical traffic accident forms" are consulted, and, considering the influence of driving speed on the road adhesion coefficient, the mapping rule between road surface type and road adhesion coefficient shown in Table 5 is defined for the low-speed driving state in which the vehicle speed under urban conditions remains around 48 km/h; according to the vehicle speed and the road surface type identification result, Table 5 is looked up to obtain the adhesion coefficient range of the current road surface, and the midpoint of its upper and lower limits is taken as the adhesion coefficient of the current road, i.e. the output of the whole road adhesion coefficient prediction algorithm;
Road surface type Adhesion coefficient Determination value of adhesion coefficient
Dry asphalt 0.55-0.8 0.675
Wet asphalt 0.45-0.7 0.575
Dry cement 0.55-0.8 0.675
Wet cement 0.45-0.75 0.600
Snow covered road 0.1-0.25 0.225
Ice-covered road 0.1-0.2 0.150
Table 5: road surface type and attachment coefficient mapping table
In the embodiment, the combined average processing time of the semantic segmentation network and the mask extraction and recognition network in the designed road surface adhesion coefficient prediction algorithm is 0.0314 s, with high accuracy, so the precision and real-time requirements of the prediction algorithm are met.

Claims (1)

1. A road surface adhesion coefficient prediction method based on a semantic segmentation network, which predicts the road surface type and the road surface adhesion coefficient of the vehicle driving area under urban working conditions, characterized by comprising the following specific steps:
step one, building a semantic segmentation network based on a multi-scale spatial attention mechanism:
the vehicle-mounted camera of the intelligent driving automobile sensing system collects video data of the road ahead while the vehicle is running, and the drivable road surface area is extracted by the semantic segmentation network:
firstly, the network environment is configured and built: a Linux operating system is selected as the environment for image processing and for network construction and training, program code is written in the Python language, and PyTorch, developed by the Meta AI team, is selected as the deep learning framework; a virtual environment is created with the environment management software Anaconda, in which Python 3.10.8, PyTorch 1.11.0 and OpenCV 4.6.0.66 are installed;
Secondly, the semantic segmentation network is constructed, adopting a lightweight Encoder-Decoder architecture to ensure the precision of the semantic segmentation network while meeting the real-time requirement of the algorithm;
then, the encoder of the hierarchical semantic segmentation network, based on an attention mechanism, is built under the Encoder-Decoder framework to generate rich multi-scale semantic features: the input image is first scaled to a tensor of size 1024×1024×3 and fed into the semantic segmentation network; a 3×3 convolutional layer, a BN layer and GeLu activation then perform preliminary feature extraction and downsampling of the input picture;
the resulting feature map is then taken as the input of the feature extraction module, whose concrete implementation is as follows: a 1×1 convolution first adjusts the number of input channels; after a GeLu activation function, the result is fed into a spatial attention module formed by a 5×5 main-branch convolution followed by 7×7 and 13×13 branch convolutions, and a weight parameter representing spatial attention is output through a 1×1 convolution; the multi-scale spatial attention mechanism is given by formula (1):

Atten = Conv_1×1(Atten_0 + Σ_i Atten_i)  (1)

where Atten in formula (1) is the output weight parameter representing spatial attention, Atten_0 is the weight output of the main branch of the spatial attention module, Atten_i is the weight output of the i-th branch convolution in the module, and i indexes the branch convolutions;
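A minimal NumPy sketch of the spatial weighting of formula (1) is given below. It is an assumption-laden illustration, not the patent's network: the main-branch and branch convolution outputs are replaced by random placeholder tensors, and the 1×1 convolution is implemented as its mathematical equivalent, a per-pixel linear mix across channels.

```python
import numpy as np

# Hypothetical sketch of the multi-scale spatial attention of formula (1):
# the main-branch output Atten_0 and the branch outputs Atten_i are summed,
# mixed by a 1x1 convolution (a per-pixel channel mix), and the resulting
# attention weights multiply the input feature map F element-wise.
rng = np.random.default_rng(0)
C, H, W = 8, 16, 16
F = rng.random((C, H, W))

# Placeholder stand-ins for the 5x5 main branch and the 7x7 / 13x13 branches.
atten_0 = rng.random((C, H, W))
branches = [rng.random((C, H, W)) for _ in range(2)]  # i = 1, 2

summed = atten_0 + sum(branches)

# A 1x1 convolution is a linear combination of channels at each pixel.
W_1x1 = rng.random((C, C))
atten = np.einsum("oc,chw->ohw", W_1x1, summed)

weighted = atten * F  # spatially weighted feature map
print(weighted.shape)  # (8, 16, 16)
```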
the number of channels of the feature map is then expanded by a 1×1 convolution, finer feature extraction is performed on the spatially weighted feature map by a 3×3 convolution, the extraction result is activated by a GeLu function, and the output of the feature extraction module is finally produced by a 1×1 convolution, as in formula (2):

Output = Conv2D(Gelu(Conv2D(Atten ⊗ F)))  (2)

where Output is the output of the bottleneck module, F is the input of the bottleneck module, ⊗ denotes element-wise weighting by the attention map, Conv2D(·) represents a convolution processing the input feature map, and Gelu(·) represents further processing of the feature map by the activation function;
the constructed encoder bottleneck modules perform four serial stages of feature extraction on the network input tensor, obtaining four feature maps representing different semantic levels; stage one uses three serial feature extraction modules and outputs a 256×256×32 feature map; the downsampling layer architecture adjusts the number of convolutional layers, in-layer convolutions and channels, and after downsampling, three serial feature extraction modules produce the stage-two feature map of size 128×128×64; a downsampling layer and five serial feature processing modules then yield the 64×64×160 stage-three feature map; finally, one more downsampling and two feature extraction modules yield the 32×32×256 stage-four feature map;
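The stage-shape progression above can be checked with a few lines of arithmetic. This is an illustrative sketch; it assumes the stem reduces the 1024×1024 input by a factor of 4 to 256×256 (consistent with the stated stage-one size), and uses the channel widths 32, 64, 160, 256 given in the text.

```python
# Hypothetical check of the encoder shape progression: the stem downsamples
# the 1024x1024 input to 256x256, then each downsampling layer halves the
# spatial size between stages while channel width grows per the text.
channels = [32, 64, 160, 256]  # stage widths; 3, 3, 5, 2 modules per stage

size = 1024 // 4  # stem: preliminary extraction + downsampling (assumed x4)
stages = []
for c in channels:
    stages.append((size, size, c))
    size //= 2  # downsampling layer before the next stage

print(stages)  # [(256, 256, 32), (128, 128, 64), (64, 64, 160), (32, 32, 256)]
```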
Finally, a lightweight decoder architecture is built under the Encoder-Decoder framework: the feature maps of the different stages extracted by the encoder are processed in turn by upsampling, channel concatenation, a lightweight ham_head decoder module and a fully connected classifier, fusing semantic information of different levels; to collect multi-scale semantic information and enlarge the receptive field, stage one, which carries mostly low-level semantic information, is discarded; the feature maps of stages two, three and four are brought to the same size by bilinear-interpolation upsampling, concatenated along the channel dimension (concat), and fed into the ham_head decoder module as an integrated feature map of size 128×128×480; the ham_head decoder module is implemented as follows: the 480 channels of the concatenated feature map are compressed to 256 by a 1×1 convolutional layer, a 32-group GroupNorm layer and a ReLu activation layer, and the non-negative matrix factorization (NMF) algorithm of formula (3) is applied:
V_(m×n) = P_(m×r) × Q_(r×n)  (3)
the feature matrix V_(m×n) is decomposed into low-rank matrices P_(m×r) and Q_(r×n) of sizes m×r and r×n, where P_(m×r) is the feature basis matrix reflecting the main characteristics of the data, and Q_(r×n) is the feature coefficient matrix representing the distribution of the data features; replacing the feature matrix by this factorization in the algorithm avoids the interference of redundant matrices, accelerates the image processing and improves the real-time performance of the algorithm; finally, the processing result is output through a linearization layer of 1×1 convolution architecture, composed of a 1×1 convolutional layer, a 32-group GroupNorm layer and a ReLu activation layer, giving the output of the ham_head decoder module;
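The factorization of formula (3) can be sketched with the classic multiplicative update rules for NMF. This is a generic illustration under stated assumptions, not the patent's ham_head implementation: the rank r, step count and random data are all placeholder choices.

```python
import numpy as np

# Hypothetical sketch of the NMF of formula (3): V (m x n) is factorized into
# non-negative P (m x r) and Q (r x n) with multiplicative updates; with
# r << min(m, n), P captures the feature basis and Q the coefficients.
def nmf(V, r, steps=200, eps=1e-9, seed=0):
    rng = np.random.default_rng(seed)
    m, n = V.shape
    P = rng.random((m, r))
    Q = rng.random((r, n))
    for _ in range(steps):
        Q *= (P.T @ V) / (P.T @ P @ Q + eps)  # update coefficients
        P *= (V @ Q.T) / (P @ Q @ Q.T + eps)  # update basis
    return P, Q

V = np.random.default_rng(1).random((16, 24))  # non-negative toy feature matrix
P, Q = nmf(V, r=4)
print(P.shape, Q.shape)  # (16, 4) (4, 24)
```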
the processed feature map then undergoes a fully connected operation through a 1×1 convolutional layer, the category scores are converted into a probability distribution by a softmax function, and a 256×256×nls feature map is obtained, where the value at each pixel is the probability of belonging to a semantic category and nls is the number of semantic categories to be classified; the obtained feature map is then upsampled by bilinear interpolation, and an ArgMax function assigns each pixel the semantic class prediction of maximum probability, giving a 1024×1024 predicted segmentation map of the same size as the input picture, i.e., the output of the decoder;
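The per-pixel softmax and ArgMax step described above can be sketched as follows. This is an illustrative NumPy fragment with random scores; the shapes (nls = 19, 4×4 spatial size) are placeholder assumptions.

```python
import numpy as np

# Hypothetical sketch of the classification head: per-pixel scores over nls
# classes become probabilities via softmax, and argmax picks the class of
# maximum probability at each pixel.
rng = np.random.default_rng(0)
nls, H, W = 19, 4, 4
scores = rng.random((nls, H, W))

exp = np.exp(scores - scores.max(axis=0, keepdims=True))  # numerically stable
probs = exp / exp.sum(axis=0, keepdims=True)              # softmax over classes
pred = probs.argmax(axis=0)                               # per-pixel class map

print(pred.shape)  # (4, 4)
```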
step two, pre-training the built segmentation network on a public data set:
the Cityscapes dataset is selected for pre-training the semantic segmentation network; the dataset, released by Mercedes-Benz for street-view image segmentation tasks, contains urban street driving-scene images from 50 cities with accurate labels covering 19 semantic classes such as road surfaces, buildings, pedestrians and vehicles; because the image samples resemble daily urban driving conditions, the pre-trained semantic segmentation network acquires a certain generalization capability; 5000 accurately annotated image samples participate in the training and verification of the network model: 2975 training images, 500 validation images and 1525 test images; the specific pre-training process is as follows: images are first loaded from the dataset directory together with the corresponding annotation information; data augmentation is applied to the training images and annotations by random scaling, random cropping of the picture tensors and random horizontal flipping; each pixel of the input image is then normalized by a normalization function; and finally the cross entropy loss function of formula (4) is designed:
Loss = −Σ_(c=1)^(nls) y_c · log(P_c)  (4)

where nls is the number of semantic categories to be classified, with nls taking the value 19 when training on the Cityscapes dataset; y_c is a vector taking values 0 or 1, whose elements indicate whether a category matches the sample category; and P_c is the predicted probability that the sample belongs to class c, c ∈ (1, nls);
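For a one-hot target the loss of formula (4) reduces to the negative log of the probability assigned to the true class. The sketch below illustrates this with placeholder probabilities (the 0.01/0.82 split is an arbitrary example, not from the patent).

```python
import numpy as np

# Hypothetical sketch of the cross entropy loss of formula (4): with one-hot
# target y over nls classes and predicted probabilities P, the loss equals
# -log P_c for the true class c.
def cross_entropy(y, P, eps=1e-12):
    return float(-np.sum(y * np.log(P + eps)))

nls = 19
P = np.full(nls, 0.01)
P[3] = 1 - 0.01 * (nls - 1)  # most probability mass on class 3 (0.82)
y = np.zeros(nls)
y[3] = 1.0                   # one-hot true label

loss = cross_entropy(y, P)
print(round(loss, 4))  # -log(0.82), about 0.1985
```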
a Poly learning-rate rule is selected to train the segmentation network; the Poly learning-rate decay is expressed as:

LR(iter) = LR_initial × (1 − iter / max_iter)^power  (5)
in formula (5), LR_initial is the initial learning rate of network training, set to 0.002 during pre-training; iter is the iteration step of network training; max_iter is the maximum number of training steps, set to 40K; and power is the decay coefficient controlling the shape of the learning-rate curve, set to 0.9; LR(iter) is the learning rate computed and updated for the given step during training; an Adam optimization algorithm dynamically adjusts the learning rate of each parameter using first- and second-moment estimates of the gradients; according to the hardware performance of the computer, the batch size is set to 32, model parameters are saved every 4K steps, and the network is evaluated on the validation set at the same time;
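The Poly schedule of formula (5) with the stated hyperparameters can be sketched directly:

```python
# Sketch of the Poly learning-rate decay of formula (5), using the
# hyperparameters stated in the text (LR_initial = 0.002, max_iter = 40K,
# power = 0.9).
def poly_lr(iteration, lr_initial=0.002, max_iter=40_000, power=0.9):
    return lr_initial * (1 - iteration / max_iter) ** power

print(poly_lr(0))       # 0.002 at the start of training
print(poly_lr(40_000))  # 0.0 at the final step
```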
for the semantic segmentation task, the average pixel accuracy Acc and the mean intersection-over-union MIoU, both based on the confusion matrix, are selected to evaluate the network pre-training result; concretely, in the two-class confusion matrix layout of Table 1, each row represents the true class of the data, each column represents the predicted class, and each element of the matrix is the number of samples predicted as a given class;
Table 1: schematic table of the two-class confusion matrix
The accuracy Acc is the percentage of correctly predicted pixels among all pixels, formula (6):

Acc = (TP + TN) / (TP + TN + FP + FN)  (6)
MIoU is the ratio of the intersection to the union between each class's predicted and true values, summed over the classes and averaged, formula (7):

MIoU = (1/nls) · Σ_(c=1)^(nls) TP_c / (TP_c + FP_c + FN_c)  (7)
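Both metrics fall out of the confusion matrix directly. The sketch below illustrates this on a toy two-class matrix (the numbers are invented for the example, not from the patent).

```python
import numpy as np

# Hypothetical sketch of the confusion-matrix metrics of formulas (6) and (7):
# rows are true classes, columns predicted classes; Acc is the fraction of
# correctly predicted pixels, MIoU averages the per-class IoU.
def acc_miou(cm):
    cm = np.asarray(cm, dtype=float)
    acc = np.trace(cm) / cm.sum()       # correct predictions / all pixels
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp            # predicted as class c but wrong
    fn = cm.sum(axis=1) - tp            # class c missed
    iou = tp / (tp + fp + fn)
    return acc, iou.mean()

cm = [[50, 10],
      [5, 35]]
acc, miou = acc_miou(cm)
print(round(acc, 3), round(miou, 3))  # 0.85 0.735
```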
training data are stored in a log file using the logging tool in PyTorch, and the training-parameter curves are analyzed; the built semantic segmentation network shows excellent performance in urban driving scenes in good weather;
step three, enriching the semantic segmentation dataset and specifically training the segmentation network:
Since the segmentation network is easily disturbed and degraded under rain and snow conditions, a semantic segmentation dataset containing six road surface types — dry and wet asphalt, dry and wet cement, snow-covered and ice-covered road surfaces — is prepared, and the pre-trained semantic segmentation network is further trained and evaluated on this dataset, further improving the generalization capability of the built network so that it applies better to daily urban driving scenes; the concrete implementation is as follows:
firstly, driving-scene data under different weather conditions are acquired with vehicle-mounted camera equipment, and the captured video files are split at a fixed frame rate to obtain an image sample library covering the six road surface categories; image samples with rich variety and distinct features are then screened from the image database for fine annotation, as follows: the EISeg interactive annotation tool is installed in Anaconda and the platform is started; the driving-scene pictures to be annotated are imported and uploaded to the EISeg platform; an annotation task is created, the semantic segmentation task type is chosen, and the Cityscapes annotation format is selected; the imported pictures are then segmented and annotated in the interactive annotation interface using the annotation tools; after annotation, the data export function is used with the Cityscapes export format and a designated output path; the produced dataset is organized in the Cityscapes dataset format, with 100 accurately annotated pictures per road surface type, 600 dataset samples in total, of which a random 20% serve as the validation set and the remaining 80% as the training set; the dataset is expanded by mirroring, translation and brightness adjustment, the training strategy and optimization method of pre-training are retained, and the trained network is verified on the validation set; finally, model inference is performed with the trained semantic segmentation network, input driving-scene pictures are processed, and visualized pixel-level semantic segmentation results are output;
Step four, extracting a pavement area by using the segmentation network to manufacture a pavement classification network data set:
the image sample library containing the six road surface types obtained in step three is fed, type by type, into the semantic segmentation network for prediction, a process analogous to online identification in a daily driving environment; the prediction result and the original image are then sent to the processor for road surface area extraction, as follows: the semantic segmentation prediction map is first split by color channel into the two-dimensional matrices of the three color channels; looking up the table, the BGR value corresponding to 'road' is (128, 64, 128); masks for the three BGR color channels are made with OpenCV, keeping only pixels whose BGR values equal those of 'road', ensuring that only the road surface area is extracted; finally, the three channel masks are combined and a matrix dot product is taken with the corresponding channels of the original image, extracting a picture containing only the road surface area, with non-road regions set to black as background; the extracted road-surface-only image samples are stored in dedicated folders for the six road surface types; the correspondence between road surface type, label and 6-bit binary label information is prepared as shown in table 2:
Road surface type     Folder name     Index    Label
Dry asphalt           dry_asphalt     0        000001
Wet asphalt           wet_asphalt     1        000010
Dry cement            dry_cement      2        000100
Wet cement            wet_cement      3        001000
Snow covered road     loose_snow      4        010000
Ice-covered road      ice_film        5        100000
Table 2: pavement type and label information correspondence table
The produced dataset is screened by image quality; the final dataset contains 6000 image samples, with 1000 images per road surface category; finally, the road surface image dataset is shuffled, 20% of each category's pictures are randomly extracted as the validation set, and the remainder serves as the training set;
step five, constructing and training a road surface image type classification network:
the road surface image type classification network is constructed as follows: first, a road surface image type recognition network is built in the Anaconda environment; considering the complexity of the road surface classification task and the real-time requirement of the classification process, a lightweight convolutional neural network is constructed as the backbone of the classifier and a channel attention mechanism is introduced, reducing network parameters and floating-point operations as far as possible while ensuring recognition precision and improving the real-time performance of the classification network; the specific structure of the road surface image classification network is designed as shown in table 3:
Layer name              Convolution size    Step size    Activation rate    Expansion rate    Repetitions    Feature map size    Output channels
Input layer             -                   -            -                  -                 -              288×288             3
Convolutional layer     3×3                 2            -                  -                 1              144×144             24
Feature extraction      3×3                 1            -                  1                 2              144×144             24
Feature extraction      3×3                 2            -                  4                 4              72×72               48
Feature extraction      3×3                 2            -                  4                 4              36×36               64
Feature extraction      3×3                 2            0.35               4                 3              36×36               128
Feature extraction      3×3                 1            0.30               6                 6              18×18               160
Feature extraction      3×3                 1            0.25               6                 9              9×9                 256
Convolutional layer     1×1                 -            -                  -                 -              9×9                 1280
Pooling layer           9×9                 1            -                  -                 1              1×1                 1280
Fully connected layer   1×1                 -            -                  -                 -              -                   nls
ArgMax                  -                   -            -                  -                 -              1×1                 1
Table 3: road surface classification network structure table
Firstly, the input layer of the network resizes the road surface image to be identified into a 288×288×3 tensor; a convolutional layer of 3×3 kernels with stride 2, with batch normalization BN and Swish activation, then performs preliminary feature extraction; the obtained feature map serves as input to the constructed bottleneck module, which performs feature extraction on the input feature map as follows. The bottleneck module, built on a channel attention mechanism, first changes the feature channel dimension according to the expansion ratio by a 1×1 convolution, with batch normalization BN and Swish activation; the feature map is fed into a 3×3 depthwise separable convolution DWConv layer for feature extraction, and the obtained feature map enters the channel attention SE module: the module applies global average pooling to the input feature map and Swish activation to aggregate its global features, performs a fully connected operation on the obtained global features through 1×1 convolutional layers with a given activation rate, and obtains the attention weights of the different channels through a sigmoid activation function; the obtained weights are dot-multiplied with the input feature map, retaining the main input features, removing noise interference, reducing parameter computation, improving the real-time performance of the network, and achieving the purpose of attending to the more informative channels; finally, the output of the SE module passes through a 1×1 convolution to give the output of the whole bottleneck module;
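The SE channel attention inside the bottleneck module can be sketched in NumPy. This is a simplified illustration under stated assumptions: the fully connected layers are plain weight matrices (random placeholders here), the reduction ratio of 0.25 is one of the activation rates from Table 3, and Swish is implemented as z·sigmoid(z).

```python
import numpy as np

# Hypothetical sketch of the SE channel attention: global average pooling
# squeezes each channel, a two-layer bottleneck with Swish and sigmoid
# produces per-channel weights in (0, 1), and the input feature map is
# reweighted channel-wise.
def se_block(x, w1, w2):
    # x: C x H x W feature map; w1: C x Cr, w2: Cr x C (squeeze / excite)
    s = x.mean(axis=(1, 2))                # squeeze: global average pooling
    z = s @ w1
    z = z / (1.0 + np.exp(-z))             # Swish activation: z * sigmoid(z)
    a = 1.0 / (1.0 + np.exp(-(z @ w2)))    # sigmoid channel weights
    return x * a[:, None, None]            # channel-wise reweighting

rng = np.random.default_rng(0)
C, Cr = 8, 2                               # activation rate 0.25 -> Cr = 2
x = rng.random((C, 4, 4))
out = se_block(x, rng.random((C, Cr)), rng.random((Cr, C)))
print(out.shape)  # (8, 4, 4)
```

Because the sigmoid weights lie strictly in (0, 1), the block can only attenuate channels, which is what lets it suppress noisy, low-information channels.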
As shown in table 3, the feature map extracted by the stride-2 3×3 convolutional layer is fed into two serial bottleneck modules with expansion rate 1, giving a 144×144×24 feature map; the first three bottleneck-based layers of the network structure are designed so that each bottleneck feature extraction halves the size of the output feature map relative to the input while doubling the number of output channels, producing a 36×36×64 feature map after the first three layers; the last three feature extraction layers adopt channel attention with activation rates of 0.35, 0.30 and 0.25 in turn, attending more to the content-rich feature channels, and finally yield a 9×9×256 feature map group containing high-level semantic information; a 1×1 convolutional layer raises the channel dimension, and a 9×9 global average pooling layer pools the raised 9×9×1280 feature map group into a 1×1×1280 feature sequence; finally, through the fully connected layer matched with a Softmax function, probability values for the various road surface categories are output, and the Argmax function determines the network classification result from the maximum probability value;
The built road surface image type network is trained on the self-built dataset, using a cross entropy loss function in the training process and the Adam adaptive gradient-descent optimization algorithm, with the base learning rate set to 0.0001; the model and training results are saved by iteration epoch;
step six, establishing a mapping rule to obtain road adhesion coefficient information:
finally, the corresponding road surface adhesion coefficient is obtained from the road surface image classification result of the preceding steps; to obtain the adhesion coefficient of the identified road surface, the reference value tables of longitudinal slip adhesion coefficients for automobiles, and for automobiles on ice and snow road surfaces, in GA/T 643-2006 "Technical identification of vehicle running speed in typical traffic accident patterns" are consulted; taking into account the influence of driving speed on the adhesion coefficient, and considering the low-speed urban driving state in which the vehicle travels at about 48 km/h, the mapping rule between road surface type and road surface adhesion coefficient is defined as shown in Table 4; according to the vehicle speed and the road surface type identification result, the adhesion coefficient range of the current road surface is obtained by table lookup, and the midpoint of its upper and lower limits is taken as the adhesion coefficient of the current road, i.e., the output of the whole road surface adhesion coefficient prediction algorithm;
Road surface type     Adhesion coefficient    Determination value of adhesion coefficient
Dry asphalt           0.55-0.8                0.675
Wet asphalt           0.45-0.7                0.575
Dry cement            0.55-0.8                0.675
Wet cement            0.45-0.75               0.600
Snow covered road     0.1-0.25                0.225
Ice-covered road      0.1-0.2                 0.150
Table 4: road surface type and attachment coefficient mapping table.
CN202310580153.7A 2023-05-23 2023-05-23 Pavement adhesion coefficient prediction method based on semantic segmentation network Pending CN116630702A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310580153.7A CN116630702A (en) 2023-05-23 2023-05-23 Pavement adhesion coefficient prediction method based on semantic segmentation network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310580153.7A CN116630702A (en) 2023-05-23 2023-05-23 Pavement adhesion coefficient prediction method based on semantic segmentation network

Publications (1)

Publication Number Publication Date
CN116630702A true CN116630702A (en) 2023-08-22

Family

ID=87616534

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310580153.7A Pending CN116630702A (en) 2023-05-23 2023-05-23 Pavement adhesion coefficient prediction method based on semantic segmentation network

Country Status (1)

Country Link
CN (1) CN116630702A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117523318A (en) * 2023-12-26 2024-02-06 宁波微科光电股份有限公司 Anti-light interference subway shielding door foreign matter detection method, device and medium
CN117523318B (en) * 2023-12-26 2024-04-16 宁波微科光电股份有限公司 Anti-light interference subway shielding door foreign matter detection method, device and medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination