WO2022126377A1 - Traffic lane line detection method and apparatus, and terminal device and readable storage medium - Google Patents

Traffic lane line detection method and apparatus, and terminal device and readable storage medium

Info

Publication number
WO2022126377A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature map
convolution
road
network layer
images
Application number
PCT/CN2020/136540
Other languages
French (fr)
Chinese (zh)
Inventor
王磊
钟宏亮
马森炜
程俊
林佩珍
范筱媛
Original Assignee
中国科学院深圳先进技术研究院
Application filed by 中国科学院深圳先进技术研究院
Priority to PCT/CN2020/136540
Publication of WO2022126377A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition

Definitions

  • the present application belongs to the technical field of computer vision and image processing, and in particular, relates to a method, apparatus, terminal device and readable storage medium for detecting lane lines.
  • the automatic driving of vehicles plays an important role in the safe driving of cars; and lane recognition is an important part of the automatic driving system.
  • the result of lane recognition provides the basis for the control system of automatic driving, and plays an irreplaceable role in the fields of automatic parking, anti-collision warning and unmanned driving.
  • semantic segmentation models have achieved good performance in lane recognition tasks, but limited by the lack of global and contextual information, ordinary semantic segmentation models cannot handle the lane recognition task well under poor lighting conditions or lane occlusions.
  • most of the deep learning models currently used for lane recognition have a large amount of calculation and are relatively complex, which is not conducive to the real-time requirements in practical application scenarios of automatic driving tasks.
  • One of the purposes of the embodiments of the present application is to provide a method, device, terminal device and readable storage medium for detecting lane lines, which can solve the problem that most of the deep learning models currently used for lane recognition involve a large amount of calculation and are relatively complex, which is not conducive to the real-time requirements in the actual application scenarios of automatic driving tasks.
  • an embodiment of the present application provides a method for detecting lane lines, including:
  • the trained neural network model is obtained by training according to the sample images in the training set and the semantic segmentation model, and the sample images in the training set include collected road images of multiple scenes and marked images corresponding to the road images of the multiple scenes.
  • the trained neural network model includes a residual network layer, an atrous convolutional network layer, an upsampling network layer and a detector; inputting the road image into the trained neural network model for processing includes:
  • the residual network layer includes a first residual module, a second residual module and a third residual module; inputting the road image into the residual network layer and outputting, through the convolution processing of the residual network layer, a first feature map containing semantic features includes:
  • the atrous convolutional network layer includes a first convolution module, a second convolution module, a third convolution module, a fourth convolution module and a global average pooling module; inputting the first feature map into the atrous convolutional network layer and extracting features through the atrous convolutional network layer includes:
  • the first feature map is input into the atrous convolutional network layer, and after feature extraction of the atrous convolutional network layer, the method further includes:
  • the feature maps respectively output by the first convolution module, the second convolution module, the third convolution module, the fourth convolution module and the global average pooling module are spliced in the channel dimension to obtain a spliced feature map; after the spliced feature map is processed by a 1×1 convolution, the second feature map is output.
  • the method includes:
  • the second feature map is subjected to up-sampling processing, and the result of the up-sampling processing is input into the detector; after the convolution processing of the detector, the classification prediction result of the road image of the current scene is output, and the classification prediction result is used to indicate the position of the lane in the road image of the current scene.
  • the method includes:
  • the neural network model is trained according to the sample images in the training set and the semantic segmentation model, including:
  • the road images of the multiple scenes in the sample images are input into the residual network layer of the neural network model, and after the convolution processing of the residual network layer, the fourth feature map is output; the fourth feature map is input into the atrous convolutional network layer of the neural network model, and the fifth feature map is output through the feature extraction of the atrous convolutional network layer;
  • the fifth feature map is subjected to up-sampling processing, and the result of the up-sampling processing is input into the detector of the neural network model; after the convolution processing of the detector, the classification prediction results of the road images of the multiple scenes are output, and the classification prediction results are used to indicate the positions of the lanes in the road images of the multiple scenes.
  • the residual network layer of the neural network model includes three residual modules, and the method includes:
  • after the convolution processing of the first two residual modules, the seventh feature map is output; the seventh feature map is input into the semantic segmentation model, the lane line pixels in the seventh feature map are segmented through the semantic segmentation model, and the lane line instance feature map is output; the lane line instance feature map and the sixth feature map are spliced to obtain a spliced image.
  • the method includes:
  • the spliced image is input into a segmenter, and after convolution processing by the segmenter, a semantic segmentation prediction result is output, and the semantic segmentation prediction result is used to indicate the lane recognition results in the road images of the multiple scenes.
  • the method includes:
  • the first loss function is used to calculate the first error value of the classification prediction result of the neural network model relative to the marked images in the sample images, and the parameters of the neural network model are adjusted by the first error value;
  • the first loss function is expressed as follows, in the standard Focal Loss form: FL(p_t) = -α_t · (1 - p_t)^γ · log(p_t), where p_t = p when y = 1 and p_t = 1 - p otherwise;
  • y is the classification truth value of the road images of the multiple scenes in the sample images in the training set;
  • p is the predicted probability obtained after the road images of the multiple scenes in the sample images in the training set are processed by the neural network model;
  • α is the preset weight value, and γ is the focusing factor.
  • the method includes:
  • the second loss function is expressed as follows, as a cross-entropy loss: L = -[y · log(p) + (1 - y) · log(1 - p)];
  • y is the recognition truth value of the road images of the multiple scenes in the sample images in the training set;
  • p is the predicted probability obtained after the road images of the multiple scenes in the sample images in the training set are processed by the semantic segmentation model.
  • an embodiment of the present application provides a device for detecting lane lines, including:
  • an acquisition unit for acquiring the road image of the current scene
  • the processing unit is used to input the road image into the trained neural network model for processing and output the detection result of the lane lines in the road image of the current scene; wherein the trained neural network model is obtained by training according to the sample images in the training set and the semantic segmentation model, and the sample images in the training set include collected road images of multiple scenes and marked images corresponding to the road images of the multiple scenes.
  • an embodiment of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the described method when executing the computer program.
  • an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and the computer program implements the method when executed by a processor.
  • an embodiment of the present application provides a computer program product that, when the computer program product runs on a terminal device, causes the terminal device to execute the method described in any one of the above-mentioned first aspects.
  • the terminal device obtains the road image of the current scene; the road image is input into the trained neural network model for processing, and the detection result of the lane lines in the road image of the current scene is output;
  • the trained neural network model is obtained by training according to the sample images in the training set and the semantic segmentation model, and the sample images in the training set include collected road images of multiple scenes and marked images corresponding to the road images of the multiple scenes;
  • this solves the problems of the low accuracy of lane line recognition in complex environments and of the slow response caused by the large amount of calculation and relative complexity of current lane line detection models; the detection accuracy is improved while the real-time requirements in the actual application scenarios of automatic driving tasks are met; the method has strong ease of use and practicability.
  • FIG. 1 is a schematic flowchart of an application scenario provided by an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of a method for detecting lane lines provided by an embodiment of the present application
  • FIG. 3 is a schematic diagram of the overall architecture of the neural network model after training provided by an embodiment of the present application
  • FIG. 4 is a schematic diagram of the architecture of a residual network layer provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of an atrous convolutional network layer provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a detector provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of the overall architecture of the training neural network model provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a segmenter provided by an embodiment of the present application.
  • FIG. 9 is a visual schematic diagram of a detection result of a lane line provided by an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a device for detecting lane lines provided by an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
  • the term "if" may be contextually interpreted as "when", "once", "in response to determining" or "in response to detecting".
  • the phrases "if it is determined" or "if the [described condition or event] is detected" may be interpreted, depending on the context, to mean "once it is determined", "in response to the determination", "once the [described condition or event] is detected" or "in response to detection of the [described condition or event]".
  • references in this specification to "one embodiment” or “some embodiments” and the like mean that a particular feature, structure or characteristic described in connection with the embodiment is included in one or more embodiments of the present application.
  • appearances of the phrases "in one embodiment", "in some embodiments", "in other embodiments", etc. in various places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments" unless specifically emphasized otherwise.
  • the terms "including", "comprising", "having" and their variants mean "including but not limited to" unless specifically emphasized otherwise.
  • lane line detection technology plays an irreplaceable role in the field of autonomous driving, and research on lane line detection technology is also emerging one after another.
  • traditional lane detection algorithm models are often based on image visual cues; for example, edge extraction is used to detect lanes, and the operating efficiency of the algorithm is optimized.
  • in another approach, the image is converted through a color model (Hue-Saturation-Intensity, HSI) into a representation of hue, saturation and intensity, and then a fuzzy mean clustering algorithm is used to process the pixels of each row to identify the lane lines; the algorithm model used is complex, and the model calculation amount is large.
  • in the related art, the detection task of lane lines is also realized through a semantic segmentation model and an image classification model; however, the adopted models are relatively complex and require a large amount of calculation, which is not conducive to the real-time requirements in the actual application of automatic driving tasks; moreover, the processing of data by such models destroys the spatial structure of each part of the image and reduces the accuracy of lane line detection.
  • an embodiment of the present application proposes a method for detecting lane lines.
  • the lane lines are detected and recognized through a trained neural network model, and the task of lane recognition is regarded as a task of classifying the positions of the pixels in each row of a picture; analyzing the position of the lane line with the row as the unit makes better use of global and contextual information and optimizes the operating efficiency of the model.
  • during training, a model for processing the semantic segmentation task is built as an aid; at the same time, in order to better deal with the global structural features of the image, the setting of the loss function is also adjusted to guide the model to pay better attention to the continuity law of lane lines; the neural network model used to detect lane lines is thereby improved.
  • the spatial structure of the data is maintained, the perception ability of the model is optimized, and the complexity of the model is reduced.
  • as shown in FIG. 1, which is a schematic flowchart of an application scenario provided by an embodiment of the present application.
  • after the trained neural network model is obtained, the road image of the current scene is input into the trained neural network model; the trained convolutional neural network model performs feature extraction and feature learning, outputs the detection result of the lane lines, and realizes the prediction of the position of the lane lines.
  • an atrous convolutional network is introduced into the neural network model for detecting lane lines, and the classifier in the neural network model is optimized.
  • the atrous convolutional network optimizes the perception ability of the model while maintaining the spatial structure of the data; it optimizes the complexity of the model, greatly reduces the computational load of the model, and improves the response speed.
  • during training, the semantic segmentation model is used as an auxiliary model, and the training process is optimized in combination with the atrous convolutional network, which greatly improves the training speed of the neural network model; at the same time, the classifier of the neural network model and the segmenter of the semantic segmentation model are optimized.
  • the spatial structure of the feature map is maintained in the last layer of the network, which reduces the number of parameters of the model, reduces the complexity of the loss function, and optimizes the computational efficiency of the neural network model.
  • the trained neural network model still achieves high detection accuracy for images collected under poor lighting conditions or under occlusion conditions.
  • model training and model architecture are further introduced below with reference to the implementation steps of the method for detecting lane lines provided by the embodiments of the present application.
  • as shown in FIG. 2, which is a schematic flowchart of a method for detecting lane lines provided by an embodiment of the present application; in application, the method for detecting lane lines includes the following steps:
  • Step S201 obtaining a road image of the current scene.
  • the terminal device may use a camera to capture a road image of the current driving scene of the vehicle; the captured road image includes a road image in front of, behind or on the side of the vehicle.
  • the terminal device may be an in-vehicle device, which is respectively connected to the camera and the control system for automatic driving of the vehicle.
  • the terminal device can control the camera to collect the road image of the scene where the vehicle is located in real time or in a preset period according to the requirements of the vehicle driving scene.
  • the road image may include continuous lane lines, partially occluded lane lines, or no lane lines.
  • the characteristics of the lane lines are preset in the terminal device, which provides a prediction basis for the detection of the lane lines in the road image.
  • the width of the lane lines includes 10 cm, 15 cm and 20 cm; the lane lines are divided into solid lines and dotted lines, and the colors include yellow and white.
  • Step S202 input the road image into the trained neural network model for processing, and output the detection result of the lane lines in the road image of the current scene; wherein, the trained neural network model is trained according to the sample images in the training set and the semantic segmentation model It is obtained that the sample images in the training set include collected road images of multiple scenes and marked images corresponding to the road images of multiple scenes.
  • the processing of road images by the neural network model is the actual detection process of lane lines; the actually detected lane lines are composed of lane lines with different line widths, line types and colors in various environments; the lane line detection task is to identify lane lines in various environments, and the goal of lane line detection is to determine the location and direction of the lane lines.
  • the processing of the road image by the trained neural network model includes the extraction and detection of target features.
  • the trained neural network model includes a residual network layer, an atrous convolutional network layer, an upsampling network layer and a detector; inputting the road image into the trained neural network model for processing includes:
  • the road image is input into the residual network layer, and after the convolution processing of the residual network layer, the first feature map containing semantic features is output; the first feature map is input into the atrous convolutional network layer, and after the feature extraction of the atrous convolutional network layer, the second feature map containing detailed features is output; the second feature map is input into the up-sampling network layer, and after the up-sampling processing of the up-sampling network layer, the third feature map is output; the third feature map is input into the detector, and after the convolution processing of the detector, the detection result of the lane lines in the road image of the current scene is output.
  • as shown in FIG. 3, which is a schematic diagram of the overall architecture of the trained neural network model provided by the embodiment of the present application; the trained neural network model is a classification detection model.
  • the classification detection model adopts the residual network layer of the residual neural network ResNet-34 (Residual Neural Network-34); the input road image is convolved through the residual network layer to extract the semantic features in the road image and obtain the first feature map.
  • an atrous convolutional network layer is also introduced; the input first feature map is sampled and classified through the atrous convolutional network layer, and while a receptive field of the preset size is ensured, multiple atrous convolution kernels perform convolution processing on the first feature map to extract the detailed features and global features of the first feature map and obtain the second feature map.
  • the upsampling network layer in the classification detection model performs upsampling processing on the input second feature map, and outputs the sampled feature map, that is, the third feature map.
  • the detector of the classification detection model performs two-dimensional convolution processing on the input third feature map, and outputs the recognition result of the lane line.
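  • as an illustration of the above pipeline, the following is a minimal sketch of the classification detection model, assuming a PyTorch implementation; the 800×288 input size and the [number of lanes, number of rows, number of columns + 1] output grid follow the text, while the backbone and atrous-layer stand-ins, the channel widths and the lane/row/column counts are illustrative assumptions rather than the patented architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LaneClassificationModel(nn.Module):
    """Backbone -> atrous convolutional layer -> upsampling -> detector."""

    def __init__(self, num_lanes=4, num_rows=36, num_cols=100):
        super().__init__()
        self.out_hw = (num_rows, num_cols + 1)  # extra column = "lane absent"
        # Stand-in for the three ResNet-34 residual modules (semantic features).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=2, padding=3), nn.BatchNorm2d(64), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.BatchNorm2d(256), nn.ReLU(),
        )
        # Stand-in for the atrous convolutional network layer (detail features).
        self.aspp = nn.Conv2d(256, num_lanes, 1)
        # Detector: bilinear upsampling followed by a 3x3 2-D convolution.
        self.detector = nn.Conv2d(num_lanes, num_lanes, 3, padding=1)

    def forward(self, x):
        f1 = self.backbone(x)                    # first feature map
        f2 = self.aspp(f1)                       # second feature map
        f3 = F.interpolate(f2, size=self.out_hw, mode="bilinear",
                           align_corners=False)  # third feature map
        return self.detector(f3)                 # [N, lanes, rows, cols + 1]

model = LaneClassificationModel()
logits = model(torch.randn(1, 3, 288, 800))      # an 800x288 road image
print(logits.shape)                              # torch.Size([1, 4, 36, 101])
```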
  • the residual network layer includes a first residual module, a second residual module and a third residual module; the road image is input into the residual network layer, and after convolution processing of the residual network layer, the output includes The first feature map of semantic features, including:
  • the road image is input into the first residual module, and the first result is obtained through the convolution processing of the first residual module; the first result is input into the second residual module, and the second result is obtained through the convolution processing of the second residual module; the second result is input into the third residual module, and after the convolution processing of the third residual module, the first feature map is output.
  • as shown in FIG. 4, which is a schematic structural diagram of a residual network layer provided by an embodiment of the present application.
  • the residual network layer in the classification and detection model includes three residual modules, namely the first residual module, the second residual module and the third residual module.
  • each residual module includes six 3×3 convolutional layers with edge padding, and the result of processing by the six 3×3 convolutional layers with edge padding is superimposed on the input feature map to obtain the output of each residual module.
  • the residual module in the ResNet-34 network is used to extract and transform the features of the road image.
  • (b) in Figure 4 shows the structure of the residual block.
  • the residual block uses multiple stacked convolutional layers to process the input feature map so as to analyze and extract the effective information of the image more deeply; the final convolution result is accumulated with the original input, which mitigates the vanishing gradients that an overly deep stack of layers may otherwise cause.
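  • a minimal sketch of one residual module as described for FIG. 4, assuming PyTorch: six edge-padded 3×3 convolutions whose result is superimposed on the module input; the channel count and the BatchNorm/ReLU placement are assumptions.

```python
import torch
import torch.nn as nn

class ResidualModule(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        layers = []
        for _ in range(6):  # six 3x3 convolutional layers with edge padding
            layers += [nn.Conv2d(channels, channels, 3, padding=1),
                       nn.BatchNorm2d(channels), nn.ReLU()]
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        # Accumulate the convolution result with the original input,
        # mitigating vanishing gradients in deep stacks of layers.
        return x + self.body(x)

x = torch.randn(1, 64, 72, 200)
print(ResidualModule()(x).shape)  # torch.Size([1, 64, 72, 200])
```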
  • the identification result of the lane lines includes the identified lanes and, for each row of pixels or pixel units in the image, the column of that row in which each lane is located, or a determination that the lane does not lie in any column of that row.
  • the atrous convolutional network layer includes a first convolution module, a second convolution module, a third convolution module, a fourth convolution module and a global average pooling module; inputting the first feature map into the atrous convolutional network layer and extracting features through the atrous convolutional network layer includes:
  • the first feature map is input into the first convolution module, the second convolution module, the third convolution module, the fourth convolution module and the global average pooling module respectively, and feature extraction is performed on the first feature map.
  • the atrous convolutional network layer uses convolution kernels of different sizes to perform convolution processing on the feature maps.
  • as shown in FIG. 5, which is a schematic structural diagram of an atrous convolutional network layer provided by an embodiment of the present application.
  • the atrous spatial pyramid pooling module and the global average pooling module in the atrous convolutional network layer are shown in (a) of FIG. 5.
  • the atrous convolution of each size and sampling rate performs convolution processing on the input first feature map, and extracts detailed features of different channel dimensions in the first feature map.
  • the global features of the input first feature map are extracted through the global average pooling module of the atrous convolutional network layer.
  • the first convolution module is a 1×1 convolution, and the remaining convolution modules are atrous convolutions with different sampling rates.
  • the first feature map is input into the atrous convolutional network layer, and after the feature extraction of the atrous convolutional network layer, the method further includes:
  • the feature maps output by the first convolution module, the second convolution module, the third convolution module, the fourth convolution module and the global average pooling module are spliced in the channel dimension to obtain the spliced feature map; after the spliced feature map is processed by a 1×1 convolution, the second feature map is output.
  • in the global average pooling branch, the feature map is output after upsampling and 1×1 convolution processing; the feature map output by the global average pooling module and the feature maps output by the atrous convolutions and the 1×1 convolution of each size and sampling rate are spliced in the channel dimension, and the spliced feature map is output; the spliced feature map is then processed by a layer of 1×1 convolution to output the second feature map.
  • the atrous convolution processing in the atrous convolutional network layer can effectively expand the convolution receptive field while reducing the number of parameters of the classification detection model, thereby optimizing the operating efficiency of the model.
  • the left side of the figure shows the general structure of the atrous convolution kernel.
  • the atrous convolution kernel only selects some positions in the kernel to carry actual weights, such as the weight at the position of the small black square, and ignores the input at the remaining positions.
  • the sampling interval of the atrous convolution is called the sampling rate; an ordinary convolution is an atrous convolution with a sampling rate of 1, i.e. a special case of atrous convolution.
  • the use of atrous convolution will not cause loss of image information due to the gaps between the actual weights of the convolution kernel: after feature map 2 is output, it continues to undergo multi-layer ordinary convolution processing, so the gaps between the actual weights of the atrous convolution are filled.
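  • a sketch of the atrous convolutional network layer, assuming PyTorch: the 1×1 branch, three 3×3 atrous branches with sampling rates 1, 3 and 5, and the global average pooling branch follow the text, while the channel widths are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    def __init__(self, in_ch=256, branch_ch=64, out_ch=128):
        super().__init__()
        self.conv1x1 = nn.Conv2d(in_ch, branch_ch, 1)
        # Atrous 3x3 convolutions: padding = rate keeps the size unchanged.
        self.atrous = nn.ModuleList(
            [nn.Conv2d(in_ch, branch_ch, 3, padding=r, dilation=r)
             for r in (1, 3, 5)])
        self.gap_conv = nn.Conv2d(in_ch, branch_ch, 1)
        self.project = nn.Conv2d(5 * branch_ch, out_ch, 1)  # final 1x1 conv

    def forward(self, x):
        h, w = x.shape[2:]
        branches = [self.conv1x1(x)] + [conv(x) for conv in self.atrous]
        # Global average pooling -> 1x1 conv -> upsample back to input size.
        g = F.adaptive_avg_pool2d(x, 1)
        g = F.interpolate(self.gap_conv(g), size=(h, w), mode="bilinear",
                          align_corners=False)
        branches.append(g)
        # Splice all branches in the channel dimension, then 1x1 convolution.
        return self.project(torch.cat(branches, dim=1))

x = torch.randn(1, 256, 36, 100)  # first feature map
print(ASPP()(x).shape)            # second feature map: [1, 128, 36, 100]
```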
  • the method includes:
  • the second feature map is subjected to upsampling processing, and the result of the upsampling processing is input into the detector; after the convolution processing of the detector, the classification prediction result of the road image of the current scene is output, and the classification prediction result is used to indicate the position of the lane lines in the road image of the current scene.
  • as shown in FIG. 6, which is a schematic structural diagram of a detector provided by an embodiment of the present application.
  • after the second feature map input into the upsampling network layer is sampled, a sampling feature map in the shape of [number of lanes, number of rows, number of columns + 1] is output, and the sampling feature map then passes through two layers of edge-padded atrous convolution.
  • the final classification result in the shape of [number of lanes, number of rows, number of columns + 1] is output; the shape of the classification result is the same as that of the input sampling feature map, and it represents, for each lane and each row of the image, the column of that row in which the lane is located, or that the lane does not lie in any column of that row.
  • the extraction and transformation of multi-size features greatly simplifies the detector of the classification detection model: bilinear interpolation and a 3×3 two-dimensional convolution are directly used to perform feature adjustment on the feature output of the atrous convolution to obtain the final classification output, which further improves the operating efficiency of the model.
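  • a sketch of the detector of FIG. 6 and of decoding its row-wise classification output, assuming PyTorch; the [number of lanes, number of rows, number of columns + 1] grid and the two edge-padded atrous convolutions follow the text, while the dilation rate of 2 is an assumption.

```python
import torch
import torch.nn as nn

num_lanes, num_rows, num_cols = 4, 36, 100

detector = nn.Sequential(  # keeps the [lanes, rows, cols + 1] shape intact
    nn.Conv2d(num_lanes, num_lanes, 3, padding=2, dilation=2), nn.ReLU(),
    nn.Conv2d(num_lanes, num_lanes, 3, padding=2, dilation=2),
)

sampled = torch.randn(1, num_lanes, num_rows, num_cols + 1)  # upsampled map
logits = detector(sampled)

# For every lane and every row, pick the most probable column; the extra
# class (index num_cols) means the lane is absent from that row.
columns = logits.argmax(dim=3)       # [1, num_lanes, num_rows]
present = columns < num_cols         # rows in which each lane actually lies
print(columns.shape, present.shape)
```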
  • the road image is processed by multi-layer residual network blocks and then outputs a feature map; after a layer of 1×1 convolution processing, the feature map is processed by the atrous spatial pyramid pooling (ASPP) in the atrous convolutional network layer in the middle of the classification detection model; it is processed by three atrous convolution blocks with sampling rates of 1, 3 and 5, and by another 1×1 convolution, and all convolution operations use padding to keep the input and output sizes unchanged.
  • the atrous convolutional network layer also adopts global average pooling to obtain a global generalization of the input feature map, and restores it to the input size with the help of bilinear interpolation upsampling and 1×1 convolution.
  • the second feature map is output after a 1×1 convolution that adjusts the dimension features; atrous convolution processing is thus used in the middle and at the end of the classification detection model to comprehensively extract and analyze image features from multiple dimensions; the features of the last layer in the classification detection model are no longer straightened; instead, the spatial structure of the last-layer feature map is maintained, and the final model output is generated with the help of the improved detector, which greatly reduces the number of parameters of the model and improves the operating efficiency of the model.
  • an auxiliary model based on semantic segmentation is used for training.
  • the method includes: marking, in the collected road images of multiple scenes, the pixel coordinates of the lane lines, the type of the lane lines, and the feature of whether the current lane is a drivable lane, to obtain the marked images corresponding to the road images of the multiple scenes in the sample images.
  • road images of different scenes are collected, the lane lines in the road images are marked, regions of interest are extracted from the road images and down-sampling operations are performed, and data sets with different annotation formats are processed by a preset processing function to obtain the lane line data set required for the training of the neural network model, that is, the marked images corresponding to the road images of multiple scenes in the sample images.
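  • a minimal sketch of the region-of-interest extraction and down-sampling step described above, assuming OpenCV; the crop fraction and the 800×288 target size are illustrative assumptions, and the file name is hypothetical.

```python
import cv2

def preprocess(image_path, roi_top=0.4, size=(800, 288)):
    """Crop the sky region off the top of a road image and down-sample it."""
    image = cv2.imread(image_path)
    h = image.shape[0]
    roi = image[int(h * roi_top):, :]  # keep the lower region with the road
    return cv2.resize(roi, size, interpolation=cv2.INTER_AREA)

# sample = preprocess("road_scene.jpg")  # hypothetical file name
```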
  • the neural network model is trained according to the sample images in the training set and the semantic segmentation model, including:
  • the road images of the multiple scenes in the sample images are input into the residual network layer of the neural network model, and after the convolution processing of the residual network layer, the fourth feature map of the road images is output; the fourth feature map is input into the atrous convolutional network layer of the neural network model, and after the feature extraction of the atrous convolutional network layer, the fifth feature map is output; the fifth feature map is up-sampled, and the result of the up-sampling processing is input into the detector of the neural network model; after the convolution processing of the detector, the classification prediction results of the road images of the multiple scenes are output, and the classification prediction results are used to indicate the positions of the lanes in the road images of the multiple scenes.
  • the residual network layer of the neural network model includes three residual modules, and the method includes:
  • the fifth feature map is up-sampled to obtain the sixth feature map; the road images of multiple scenes in the sample images are input into the residual network layer, and after the convolution processing of the first two residual modules of the residual network layer, the seventh feature map is output; the seventh feature map is input into the semantic segmentation model, the lane line pixels in the seventh feature map are segmented through the semantic segmentation model, and the lane line instance feature map is output; the lane line instance feature map and the sixth feature map are spliced to obtain a spliced image.
  • the method includes: inputting the spliced image into a segmenter and, after convolution processing by the segmenter, outputting a semantic segmentation prediction result, where the semantic segmentation prediction result is used to indicate the lane recognition results in the road images of the multiple scenes.
  • as shown in FIG. 7, which is a schematic diagram of the overall architecture for training the neural network model provided by the embodiment of the present application; the classification detection model in the upper half is the model used in the lane line detection process, and its output is used as the final result of the task in the prediction stage; the semantic segmentation model in the lower half only participates in the training process of the model and is used to guide the model towards a more accurate recognition effect.
  • the road images of multiple scenes in the sample images are input into the residual network layer of the neural network model, and the semantic features of the road images are extracted to obtain the fourth feature map.
  • the fourth feature map is input into the atrous convolutions of each size and sampling rate in the atrous convolutional network layer, the 1×1 convolution, and the global average pooling for convolution processing, so as to extract different detailed features and global features; after the splicing process in the channel dimension and the 1×1 convolution process, the fifth feature map is output.
  • the feature map of the road image is sampled and classified by the atrous convolutional network layer in the neural network model, which ensures a larger receptive field and detail perception ability while reducing the number of parameters of the model.
  • global feature extraction is performed on the feature map through the global average pooling in the atrous convolutional network layer, and the feature map is output after upsampling and 1×1 convolution; the feature maps output by each atrous convolution kernel and the feature map output by the global average pooling are spliced in the channel dimension, and the fifth feature map is then output after a layer of 1×1 convolution processing.
  • the output result of the second residual module of the neural network model (such as ResNet-34) is input into the semantic segmentation model, the output of the second residual module is segmented at the pixel level through the semantic segmentation model, and the lane line instance feature map is output.
  • the fifth feature map output after the processing of the atrous convolutional network layer is subjected to up-sampling processing, and the sixth feature map is output; the lane line instance feature map and the sixth feature map are spliced to obtain a spliced image.
  • the spliced image is input into a segmenter, and the segmenter performs two-dimensional convolution processing to output a semantic segmentation prediction result of the shape [number of lanes + 1, number of rows, number of columns], indicating, for each basic pixel region in the image, which lane it belongs to or that it does not belong to any lane.
  • as shown in FIG. 8, which is a schematic structural diagram of a segmenter provided by an embodiment of the present application.
  • the network before the segmenter upsamples the feature map to the shape of [number of lanes + 1, number of rows, number of columns]; after two layers of edge-padded atrous convolution and a layer of ordinary 3×3 convolution, the final semantic segmentation prediction result is output; the shape of the semantic segmentation prediction result is the same as that of the input feature map, namely [number of lanes + 1, number of rows, number of columns], indicating for each pixel region which lane it belongs to or that it does not belong to any lane.
  • the segmenter is the output part of the semantic segmentation model.
  • the neural network is trained using an auxiliary model based on semantic segmentation: after the result of the second residual module of the neural network model ResNet-34 is processed by the semantic segmentation model and spliced with the up-sampled output of the atrous convolutional network layer, the result is directly processed by a semantic segmenter based on two-dimensional convolution; the goal of this output is to determine the recognition results of each lane in the road images of multiple scenes, such as whether each part of the image belongs to a certain lane and which lane it belongs to, performing the task of semantic segmentation.
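  • a sketch of the auxiliary segmenter of FIG. 8, assuming PyTorch: two edge-padded atrous convolutions followed by an ordinary 3×3 convolution, preserving the [number of lanes + 1, number of rows, number of columns] shape; the dilation rate of 2 and the choice of index 0 as the "no lane" class are assumptions.

```python
import torch
import torch.nn as nn

num_lanes, num_rows, num_cols = 4, 288, 800

segmenter = nn.Sequential(
    nn.Conv2d(num_lanes + 1, num_lanes + 1, 3, padding=2, dilation=2), nn.ReLU(),
    nn.Conv2d(num_lanes + 1, num_lanes + 1, 3, padding=2, dilation=2), nn.ReLU(),
    nn.Conv2d(num_lanes + 1, num_lanes + 1, 3, padding=1),  # ordinary 3x3 conv
)

spliced = torch.randn(1, num_lanes + 1, num_rows, num_cols)  # spliced input
seg_logits = segmenter(spliced)
# For each pixel region: the lane it belongs to, with one class (here taken
# to be index 0) meaning that it belongs to no lane.
labels = seg_logits.argmax(dim=1)
print(seg_logits.shape, labels.shape)
```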
  • the first loss function is used to calculate the first error value of the classification prediction result of the neural network model relative to the marked images in the sample images, and the parameters of the neural network model are adjusted by the first error value;
  • the first loss function is expressed as follows, in the standard Focal Loss form (formula (1)): FL(p_t) = -α_t · (1 - p_t)^γ · log(p_t), where p_t = p when y = 1 and p_t = 1 - p otherwise;
  • y is the classification truth value of the road images of the multiple scenes in the sample images in the training set;
  • p is the predicted probability obtained after the road images of the multiple scenes in the sample images in the training set are processed by the neural network model;
  • α is the preset weight value, and γ is the focusing factor.
  • in the training process, the classification detection model adopts Focal Loss as its first loss function, where y refers to the classification truth value of the sample, p refers to the predicted probability of the sample, and α is the set weight.
  • Focal Loss can attach larger weights to hard-to-classify and severely misclassified samples, making the model focus more on such samples during training.
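  • a sketch of the first loss function, assuming PyTorch and the standard binary Focal Loss form given above; y is the classification truth value, p the predicted probability, α the preset weight and γ the focusing factor.

```python
import torch

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """-alpha_t * (1 - p_t)**gamma * log(p_t), averaged over the samples."""
    p = p.clamp(eps, 1 - eps)
    p_t = torch.where(y == 1, p, 1 - p)  # probability of the true class
    alpha_t = torch.where(y == 1, torch.full_like(p, alpha),
                          torch.full_like(p, 1 - alpha))
    return (-alpha_t * (1 - p_t) ** gamma * p_t.log()).mean()

y = torch.tensor([1.0, 0.0, 1.0])
p = torch.tensor([0.9, 0.2, 0.3])  # the hard sample (0.3) dominates the loss
print(focal_loss(p, y))
```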
  • a second error value of the lane recognition result of the semantic segmentation model relative to the marked images in the sample images is calculated by a second loss function, and the parameters of the neural network model are adjusted by the second error value; the second loss function is expressed as follows (formula (2)): L = -[y · log(p) + (1 - y) · log(1 - p)];
  • y is the recognition truth value of the road images of the multiple scenes in the sample images in the training set;
  • p is the predicted probability obtained after the road images of the multiple scenes in the sample images in the training set are processed by the semantic segmentation model.
  • in the training process, the semantic segmentation model adopts the cross-entropy loss function Cross Entropy Loss as the second loss function; the calculation formula of Cross Entropy Loss is shown in formula (2), where y refers to the classification truth value of the sample and p refers to the predicted probability of the sample.
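  • the two error values can jointly adjust the model parameters in one training step, as in the following sketch; the model is assumed to return both branch outputs, and the 0.5 weighting of the auxiliary segmentation loss is an illustrative assumption.

```python
def training_step(model, optimizer, images, row_labels, pixel_labels,
                  focal_loss, ce_loss, aux_weight=0.5):
    """One joint update: Focal Loss on the classification branch plus
    weighted Cross Entropy Loss on the auxiliary segmentation branch."""
    optimizer.zero_grad()
    cls_logits, seg_logits = model(images)      # detection + auxiliary branch
    loss = focal_loss(cls_logits, row_labels)   # first error value
    loss = loss + aux_weight * ce_loss(seg_logits, pixel_labels)  # second
    loss.backward()                             # both errors adjust the model
    optimizer.step()
    return loss.item()
```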
  • the straightening operation is no longer performed on the feature map of the last layer in the classification detection model; instead, a structured loss function that can constrain the spatial features of the lane lines is used for processing, and the output feature map of the atrous convolutional network layer is directly subjected to bilinear interpolation upsampling and two-dimensional convolution processing; the spatial structure of the feature map is maintained, and the training burden of the model is reduced.
  • the number of parameters of the model is optimized and the operating efficiency of the model is improved; the spatial structure of the feature map is maintained at the end of the model, which is conducive to the analysis of the overall characteristics of the image; the use of atrous spatial pyramid pooling optimizes the feature analysis in the second half of the model, improves the receptive field of the convolution kernels without increasing the training burden of the model, and uses multiple convolution kernels of different sizes to analyze different features of the feature map, enhancing the generalization of the classification and semantic segmentation tasks and improving the detection accuracy.
  • the trained neural network model was experimentally verified on the same data set as a traditional neural network model; compared with the traditional neural network model, significant progress has been made in terms of training speed, convergence speed and detection accuracy.
  • Table 1 shows the comparison of the detection accuracy and detection speed between the trained neural network model provided by the embodiment of the present application and the traditional original model; the correct rate refers to the rate of correct recognition of lane line pixels by the model on an image of 800×288 pixels, and the running speed refers to the time it takes the model to process a batch of 16 images.

| Model | Correct rate | Running speed |
| --- | --- | --- |
| Trained neural network model | 92.96% | 32 ms/batch |
| Traditional original model | 92.04% | 60 ms/batch |
  • an embodiment of the present application provides a visual schematic diagram of a detection result of a lane line.
  • FIG. 9 shows the detection results of the trained neural network model provided by the embodiments of the present application on two sets of test images. As shown in (a) of FIG. 9, the trained neural network model provided by the embodiment of the present application can accurately identify multiple lane lines in the image; and as shown in (b) of FIG. 9, even when the lane line is blocked by obstacles, it can still maintain a good recognition effect.
  • FIG. 10 shows a structural block diagram of the device for detecting lane lines provided by the embodiments of the present application; for ease of description, only the parts related to the embodiments of the present application are shown.
  • the device includes:
  • an acquisition unit 101 used for acquiring a road image of the current scene
  • the processing unit 102 is configured to input the road image into the trained neural network model for processing and output the detection result of the lane lines in the road image of the current scene; wherein the trained neural network model is obtained by training according to the sample images in the training set and the semantic segmentation model, and the sample images in the training set include collected road images of multiple scenes and marked images corresponding to the road images of the multiple scenes.
  • the terminal device obtains the road image of the current scene, inputs the road image into the trained neural network model for processing, and outputs the detection result of the lane lines in the road image of the current scene; wherein the trained neural network model is obtained by training according to the sample images in the training set and the semantic segmentation model, and the sample images in the training set include collected road images of multiple scenes and marked images corresponding to the road images of the multiple scenes;
  • this solves the problems of the low accuracy of lane line recognition in complex environments and of the slow response caused by the large amount of calculation and relative complexity of current lane line detection models; the detection accuracy is improved while the real-time requirements in the actual application scenarios of automatic driving tasks are met; the method has strong ease of use and practicability.
  • Embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the steps in the foregoing method embodiments can be implemented.
  • the embodiments of the present application provide a computer program product; when the computer program product runs on a mobile terminal, the mobile terminal is caused to implement the steps in the foregoing method embodiments.
  • FIG. 11 is a schematic structural diagram of a terminal device 11 according to an embodiment of the present application.
  • the terminal device 11 of this embodiment includes: at least one processor 110 (only one is shown in FIG. 11), a memory 111, and a computer program 112 stored in the memory 111 and executable on the at least one processor 110; the processor 110 implements the steps in any of the foregoing method embodiments when executing the computer program 112.
  • the terminal device 11 may be a computing device such as a desktop computer, a notebook, a palmtop computer, a vehicle-mounted device, and a cloud server.
  • the terminal device 11 may include, but is not limited to, a processor 110 and a memory 111 .
  • FIG. 11 is only an example of the terminal device 11 and does not constitute a limitation on the terminal device 11, which may include more or fewer components than shown, or combine some components, or have different components; for example, it may also include input and output devices, network access devices, and the like.
  • the so-called processor 110 may be a central processing unit (Central Processing Unit, CPU), and the processor 110 may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the memory 111 may be an internal storage unit of the terminal device 11 in some embodiments, such as a hard disk or a memory of the terminal device 11; in other embodiments, the memory 111 may also be an external storage device of the terminal device 11, such as a plug-in hard disk equipped on the terminal device 11, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, a flash memory card (Flash Card), etc.; further, the memory 111 may also include both an internal storage unit of the terminal device 11 and an external storage device.
  • the memory 111 is used to store an operating system, an application program, a boot loader (BootLoader), data, and other programs, such as program codes of the computer program and the like. The memory 111 may also be used to temporarily store data that has been output or will be output.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium.
  • the present application realizes all or part of the processes in the methods of the above embodiments, which can be completed by instructing the relevant hardware through a computer program, and the computer program can be stored in a computer-readable storage medium.
  • the computer program includes computer program code
  • the computer program code may be in the form of source code, object code, executable file or some intermediate form, and the like.
  • the computer-readable medium may include at least: any entity or device capable of carrying the computer program code to the photographing device/terminal device, a recording medium, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), an electrical carrier signal, a telecommunication signal, and a software distribution medium, for example, a USB flash drive, a mobile hard disk, a magnetic disk or an optical disc.
  • in some jurisdictions, according to legislation and patent practice, computer-readable media may not include electrical carrier signals and telecommunication signals.
  • the disclosed apparatus/network device and method may be implemented in other manners.
  • the apparatus/network device embodiments described above are only illustrative.
  • the division of the modules or units is only a logical function division; in actual implementation, there may be other division methods; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • the shown or discussed mutual coupling, direct coupling or communication connection may be implemented through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.

Abstract

A traffic lane line detection method and apparatus, and a terminal device and a readable storage medium, applicable to the technical field of computer vision and image processing. The method comprises: obtaining a road image of the current scene (S201); and inputting the road image into a trained neural network model for processing, and outputting a detection result of a traffic lane line in the road image of the current scene, wherein the trained neural network model is obtained by training according to sample images in a training set and a semantic segmentation model, and the sample images in the training set comprise collected road images of multiple scenes and marked images corresponding to the road images of the multiple scenes (S202). The method can solve the problems that most deep learning models currently used for lane recognition involve a relatively large amount of calculation and are relatively complex, which is unfavorable for meeting the real-time requirements in actual application scenarios of automatic driving tasks.

Description

Method, apparatus, terminal device and readable storage medium for detecting lane lines

Technical Field

The present application belongs to the technical field of computer vision and image processing, and in particular relates to a method, apparatus, terminal device and readable storage medium for detecting lane lines.

Background
With the rapid development of artificial intelligence and the automotive industry, the automatic driving of vehicles (such as fully automatic driverless driving or semi-automatic assisted driving) plays an important role in the safe driving of cars, and lane recognition is an important part of the automatic driving system. The result of lane recognition provides the basis for the control system of automatic driving, and plays an irreplaceable role in the fields of automatic parking, anti-collision warning and unmanned driving.

In recent years, semantic segmentation models have achieved good performance in lane recognition tasks, but limited by the lack of global and contextual information, ordinary semantic segmentation models cannot handle the lane recognition task well under bad lighting conditions or lane occlusions. In addition, most of the deep learning models currently used for lane recognition involve a large amount of calculation and are relatively complex, which is not conducive to the real-time requirements in practical application scenarios of automatic driving tasks.
Technical Problem

One of the purposes of the embodiments of the present application is to provide a method, device, terminal device and readable storage medium for detecting lane lines, which can solve the problem that most of the deep learning models currently used for lane recognition involve a large amount of calculation and are relatively complex, which is not conducive to the real-time requirements in the actual application scenarios of automatic driving tasks.

Technical Solutions

In order to solve the above technical problems, the technical solutions adopted in the embodiments of the present application are as follows:
In a first aspect, an embodiment of the present application provides a method for detecting lane lines, including:

obtaining a road image of the current scene; inputting the road image into a trained neural network model for processing, and outputting a detection result of the lane lines in the road image of the current scene; wherein the trained neural network model is obtained by training according to sample images in a training set and a semantic segmentation model, and the sample images in the training set include collected road images of multiple scenes and marked images corresponding to the road images of the multiple scenes.
In a possible implementation manner of the first aspect, the trained neural network model includes a residual network layer, an atrous convolutional network layer, an upsampling network layer and a detector; inputting the road image into the trained neural network model for processing includes:

inputting the road image into the residual network layer, and outputting, through the convolution processing of the residual network layer, a first feature map containing semantic features; inputting the first feature map into the atrous convolutional network layer, and outputting, through the feature extraction of the atrous convolutional network layer, a second feature map containing detailed features; inputting the second feature map into the upsampling network layer, and outputting a third feature map after the upsampling processing of the upsampling network layer; inputting the third feature map into the detector, and outputting, after the convolution processing of the detector, the detection result of the lane lines in the road image of the current scene.
In a possible implementation manner of the first aspect, the residual network layer includes a first residual module, a second residual module and a third residual module; inputting the road image into the residual network layer and outputting, through the convolution processing of the residual network layer, the first feature map containing semantic features includes:

inputting the road image into the first residual module and performing convolution processing through the first residual module to obtain a first result; inputting the first result into the second residual module and obtaining a second result through the convolution processing of the second residual module; inputting the second result into the third residual module and outputting the first feature map through the convolution processing of the third residual module.
In a possible implementation manner of the first aspect, the atrous convolutional network layer includes a first convolution module, a second convolution module, a third convolution module, a fourth convolution module and a global average pooling module; inputting the first feature map into the atrous convolutional network layer and performing feature extraction through the atrous convolutional network layer includes:

inputting the first feature map into the first convolution module, the second convolution module, the third convolution module, the fourth convolution module and the global average pooling module respectively, and performing feature extraction on the first feature map.
In a possible implementation of the first aspect, after the first feature map is input into the dilated convolutional network layer and feature extraction is performed by the dilated convolutional network layer, the method further includes:

concatenating, in the channel dimension, the feature maps respectively output by the first convolution module, the second convolution module, the third convolution module, the fourth convolution module, and the global average pooling module to obtain a concatenated feature map; and outputting the second feature map after the concatenated feature map is processed by a 1×1 convolution.
In a possible implementation of the first aspect, after feature extraction is performed on the first feature map by the dilated convolutional network layer, the method includes:

performing upsampling processing on the second feature map, inputting the result of the upsampling processing into the detector, and outputting, after convolution processing by the detector, a classification prediction result of the road image of the current scene, where the classification prediction result indicates the positions of the lanes in the road image of the current scene.
In a possible implementation of the first aspect, the method includes:

labeling, for the collected road images of the multiple scenes, the pixel coordinates of the lane lines, the lane line types, and whether the current lane is a drivable lane, to obtain the labeled images, among the sample images, corresponding to the road images of the multiple scenes.
In a possible implementation of the first aspect, training the neural network model according to the sample images in the training set and the semantic segmentation model includes:

inputting the road images of the multiple scenes in the sample images into the residual network layer of the neural network model, and outputting a fourth feature map of the road images after convolution processing by the residual network layer; inputting the fourth feature map into the dilated convolutional network layer of the neural network model, and outputting a fifth feature map after feature extraction by the dilated convolutional network layer; and performing upsampling processing on the fifth feature map, inputting the result of the upsampling processing into the detector of the neural network model, and outputting, after convolution processing by the detector, classification prediction results of the road images of the multiple scenes, where the classification prediction results indicate the positions of the lanes in the road images of the multiple scenes.
In a possible implementation of the first aspect, the residual network layer of the neural network model includes three residual modules, and the method includes:

performing upsampling processing on the fifth feature map to obtain a sixth feature map; inputting the road images of the multiple scenes in the sample images into the residual network layer, and outputting a seventh feature map after convolution processing by the first two residual modules of the residual network layer; inputting the seventh feature map into the semantic segmentation model, segmenting the lane line pixels in the seventh feature map by the semantic segmentation model, and outputting a lane line instance feature map; and concatenating the lane line instance feature map with the sixth feature map to obtain a concatenated image.
In a possible implementation of the first aspect, the method includes:

inputting the concatenated image into a segmenter, and outputting, after convolution processing by the segmenter, a semantic segmentation prediction result, where the semantic segmentation prediction result indicates the lane recognition results in the road images of the multiple scenes.
In a possible implementation of the first aspect, the method includes:

calculating, by a first loss function, a first error value of the classification prediction result of the neural network model relative to the prediction probability of the labeled images in the sample images, and adjusting the parameters of the neural network model according to the first error value; the first loss function is expressed as follows:
$$L_{1} = -y\,(1-p)^{\gamma}\log(p) - (1-y)\,p^{\gamma}\log(1-p) \qquad (1)$$
where y is the classification ground truth of the road images of the multiple scenes in the sample images of the training set, p is the prediction probability obtained by processing the road images of the multiple scenes in the sample images of the training set with the neural network model, and γ is a preset weight value.
In a possible implementation of the first aspect, the method includes:

calculating, by a second loss function, a second error value of the lane recognition result of the semantic segmentation model relative to the prediction probability of the labeled images in the sample images, and adjusting the parameters of the neural network model according to the second error value; the second loss function is expressed as follows:
$$L_{2} = -y\,\log(p) - (1-y)\,\log(1-p) \qquad (2)$$
where y is the recognition ground truth of the road images of the multiple scenes in the sample images of the training set, and p is the prediction probability obtained by processing the road images of the multiple scenes in the sample images of the training set with the semantic segmentation model.
In a second aspect, an embodiment of the present application provides an apparatus for detecting lane lines, including:

an acquisition unit configured to acquire a road image of a current scene; and

a processing unit configured to input the road image into a trained neural network model for processing and to output a detection result of the lane lines in the road image of the current scene, where the trained neural network model is obtained by training according to sample images in a training set and a semantic segmentation model, and the sample images in the training set include collected road images of multiple scenes and labeled images corresponding to the road images of the multiple scenes.
In a third aspect, an embodiment of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the above method when executing the computer program.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program, where the computer program implements the above method when executed by a processor.

In a fifth aspect, an embodiment of the present application provides a computer program product which, when run on a terminal device, causes the terminal device to execute the method according to any one of the implementations of the first aspect.

It can be understood that, for the beneficial effects of the second to fifth aspects, reference may be made to the relevant description of the first aspect, and details are not repeated here.
Beneficial effects

The beneficial effects of the embodiments of the present application are as follows: the terminal device acquires a road image of the current scene, inputs the road image into a trained neural network model for processing, and outputs the detection result of the lane lines in the road image of the current scene, where the trained neural network model is obtained by training according to sample images in a training set and a semantic segmentation model, and the sample images in the training set include collected road images of multiple scenes and the labeled images corresponding to them. Training according to the sample images and the semantic segmentation model yields a trained neural network model that addresses the low accuracy of lane line recognition in complex environments as well as the slow response caused by the heavy computation and complexity of existing lane detection models; it improves detection accuracy while meeting the real-time requirements of practical autonomous driving scenarios, and offers strong usability and practicality.
Description of drawings

To describe the technical solutions in the embodiments of the present application more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments or the exemplary technologies. Obviously, the accompanying drawings in the following description are merely some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from these drawings without creative effort.
FIG. 1 is a schematic flowchart of an application scenario provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of a method for detecting lane lines provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of the overall architecture of a trained neural network model provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a residual network layer provided by an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a dilated convolutional network layer provided by an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a detector provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of the overall architecture for training a neural network model provided by an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a segmenter provided by an embodiment of the present application;
FIG. 9 is a visual schematic diagram of lane line detection results provided by an embodiment of the present application;
FIG. 10 is a schematic structural diagram of an apparatus for detecting lane lines provided by an embodiment of the present application;
FIG. 11 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
Embodiments of the present invention

In the following description, for the purpose of illustration rather than limitation, specific details such as particular system structures and technologies are set forth to provide a thorough understanding of the embodiments of the present application. However, it should be clear to those skilled in the art that the present application may also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, apparatuses, circuits, and methods are omitted so that unnecessary details do not obscure the description of the present application.
It should be understood that, when used in the specification and the appended claims of the present application, the term "comprising" indicates the presence of the described features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or combinations thereof.

It should also be understood that the term "and/or" used in the specification and the appended claims of the present application refers to any combination and all possible combinations of one or more of the associated listed items, and includes these combinations.

As used in the specification and the appended claims of the present application, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining", or "in response to detecting". Similarly, the phrases "if it is determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, as "once it is determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]".

In addition, in the description of the specification and the appended claims of the present application, the terms "first", "second", "third", and the like are merely used to distinguish the descriptions, and shall not be understood as indicating or implying relative importance.

Reference to "one embodiment" or "some embodiments" in the specification of the present application means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, the phrases "in one embodiment", "in some embodiments", "in other embodiments", "in still other embodiments", and the like appearing in different places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments", unless otherwise specifically emphasized. The terms "including", "comprising", "having", and variants thereof all mean "including but not limited to", unless otherwise specifically emphasized.
At present, lane line detection technology plays an irreplaceable role in the field of autonomous driving, and research on it continues to emerge. Some approaches detect lanes by edge extraction based on the Hough transform, taking factors such as road surface type and weather into account, and optimize the running efficiency of the algorithm. Traditional lane detection algorithms often rely on visual cues in the image: after the image is converted by the Hue-Saturation-Intensity (HSI) color model into a representation of hue, saturation, and intensity, a fuzzy c-means clustering algorithm processes the pixels of each row to recognize the lane lines. Such algorithm models are relatively complex and computationally expensive. With deep-learning-based semantic segmentation, large numbers of samples collected under severe weather conditions have been added to lane recognition datasets, and multi-task models that reason about the vanishing point have been proposed; other work builds lane recognition models that extract knowledge through multiple attention mechanisms, or introduces long short-term memory networks to handle the elongated features of lane lines. Conventional convolutional neural networks, however, cannot adequately analyze the spatial relationship between the rows and columns of an image.

In addition, some work addresses the lane line detection task from a single semantic segmentation perspective to optimize the model; however, the image is flattened (straightened) during classification, which destroys the spatial structure of the data and prevents proper analysis of the global and contextual information of the image.

Furthermore, some approaches fuse the semantic segmentation and image classification tasks, realizing lane line detection through a semantic segmentation model and an image classification model; but the adopted models are relatively complex and computationally heavy, which is unfavorable to the real-time requirements of practical autonomous driving scenarios, and their data processing destroys the spatial structure of the parts of the image, reducing the accuracy of lane line detection.
Based on the above problems, an embodiment of the present application proposes a method for detecting lane lines. Lane lines are detected and recognized by a trained neural network model, and the lane recognition task is treated as a task of classifying the position of each lane on every row of pixels of the image; analyzing the lane line positions row by row captures the global and contextual information well while improving the running efficiency of the model. In addition, a model for the semantic segmentation task is constructed during training as a training aid; meanwhile, to better handle the global structural features of the image, the loss functions are adjusted to guide the model to focus on the continuity of lane lines. The neural network model used for detecting lane lines is improved so that, during classification and detection, the spatial structure of the data is preserved, the perception capability of the model is optimized, and the complexity of the model is reduced.
Referring to FIG. 1, which is a schematic flowchart of an application scenario provided by an embodiment of the present application: after the neural network model is trained with the assistance of a semantic segmentation model, a trained neural network model is obtained; the road image of the current scene is input into the trained neural network model, which performs feature extraction and feature learning and outputs the detection result of the lane lines, realizing the prediction of the lane line positions.

In the embodiments of the present application, a dilated convolutional network is introduced into the neural network model for detecting lane lines and the classifier of the model is optimized, so that lane lines are recognized while the spatial structure of the image is preserved. The multi-scale dilated convolutional network improves the perception capability of the model and, while maintaining the spatial structure of the data, reduces the complexity of the model, greatly decreases its computational load, and improves the response speed.

During the training of the neural network model, a semantic segmentation model is used as an auxiliary model, and the training process is optimized in combination with the dilated convolutional network, which greatly improves the training speed of the neural network model. At the same time, the classifier of the neural network model and the segmenter of the semantic segmentation model are optimized, preserving the spatial structure of the feature map in the last network layer, reducing the number of model parameters, lowering the complexity of the loss functions, and improving the computational efficiency of the neural network model. As a result, the trained neural network model still achieves high detection accuracy on images collected under poor lighting conditions or with occlusions.

The specific details of model training and of the model architecture are further described below in conjunction with the implementation steps of the method for detecting lane lines provided by the embodiments of the present application.
Referring to FIG. 2, which is a schematic flowchart of the method for detecting lane lines provided by an embodiment of the present application, the method includes the following steps in application:

Step S201: acquiring a road image of the current scene.
In some embodiments, the terminal device may capture, through a camera, a road image of the scene in which the vehicle is currently driving; the captured road image includes road images in front of, behind, or to the side of the vehicle. The terminal device may be an in-vehicle device communicatively connected to the camera and to the automatic driving control system of the vehicle, respectively.

The terminal device may control the camera to collect road images of the scene in which the vehicle is located in real time or at a preset period, according to the requirements of the driving scenario. A road image may contain continuous lane lines, partially occluded lane lines, or no lane lines at all.

It can be understood that lane line features are preset in the terminal device, providing a prediction basis for detecting lane lines in road images. According to the relevant specifications, lane line widths include 10 cm, 15 cm, and 20 cm; lane lines are divided into solid and dashed lines, and their colors include yellow and white.
Step S202: inputting the road image into the trained neural network model for processing, and outputting the detection result of the lane lines in the road image of the current scene; where the trained neural network model is obtained by training according to sample images in a training set and a semantic segmentation model, and the sample images in the training set include collected road images of multiple scenes and labeled images corresponding to the road images of the multiple scenes.

In some embodiments, the processing of the road image by the neural network model is the actual lane line detection process; the lane lines actually detected consist of lane lines of different widths, line types, and colors in various environments. The task of lane line detection is to recognize lane lines in various environments, and its goal is to determine the position and direction of the lane lines.

The processing of the road image by the trained neural network model includes the extraction and detection of target features.
In some embodiments, the trained neural network model includes a residual network layer, a dilated convolutional network layer, an upsampling network layer, and a detector. Inputting the road image into the trained neural network model for processing includes:

inputting the road image into the residual network layer and outputting, after convolution processing by the residual network layer, a first feature map containing semantic features; inputting the first feature map into the dilated convolutional network layer and outputting, after feature extraction by the dilated convolutional network layer, a second feature map containing detail features; inputting the second feature map into the upsampling network layer and outputting a third feature map after upsampling processing; and inputting the third feature map into the detector and outputting, after convolution processing by the detector, the detection result of the lane lines in the road image of the current scene.
Referring to FIG. 3, which is a schematic diagram of the overall architecture of the trained neural network model provided by an embodiment of the present application, the trained neural network model is a classification detection model. The classification detection model adopts the residual network layer of the residual neural network ResNet-34; the input road image is convolved by the residual network layer to extract semantic features from the road image, yielding the first feature map. A dilated convolutional network layer is also introduced into the classification detection model: it samples and classifies the input first feature map and, while guaranteeing a receptive field of a preset size, convolves the first feature map with dilated convolution kernels of multiple sizes to extract detail features and global features, yielding the second feature map. To keep the feature map the same size as the original input road image, the upsampling network layer of the classification detection model upsamples the input second feature map and outputs the sampled feature map, i.e., the third feature map. The detector of the classification detection model performs two-dimensional convolution on the input third feature map and outputs the lane line recognition result.
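Exemplarily, the following is a minimal PyTorch sketch of how the four stages of the classification detection model might be wired together. The module interfaces, channel assumptions, and the output grid (num_rows, num_cols) are illustrative placeholders rather than values specified in this embodiment; the backbone, dilated convolutional layer, and detector are sketched separately in the sections below.

```python
import torch.nn as nn
import torch.nn.functional as F

class LaneClassificationModel(nn.Module):
    # Wiring of the four stages: residual backbone -> dilated convolutional
    # layer -> upsampling -> detector. Assumes the dilated convolutional
    # layer already outputs num_lanes channels, so the detector preserves
    # the [num_lanes, num_rows, num_cols + 1] shape.
    def __init__(self, backbone, dilated_layer, detector,
                 num_rows=36, num_cols=100):
        super().__init__()
        self.backbone = backbone          # residual network layer (ResNet-34 style)
        self.dilated_layer = dilated_layer
        self.detector = detector          # 2-D convolutional classification head
        self.grid = (num_rows, num_cols + 1)

    def forward(self, road_image):
        feat1 = self.backbone(road_image)      # first feature map (semantic features)
        feat2 = self.dilated_layer(feat1)      # second feature map (detail + global)
        feat3 = F.interpolate(feat2, size=self.grid, mode='bilinear',
                              align_corners=False)  # upsampling network layer
        return self.detector(feat3)            # [batch, num_lanes, num_rows, num_cols + 1]
```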
In some embodiments, the residual network layer includes a first residual module, a second residual module, and a third residual module; inputting the road image into the residual network layer and outputting, after convolution processing by the residual network layer, the first feature map containing semantic features includes:

inputting the road image into the first residual module and performing convolution processing to obtain a first result; inputting the first result into the second residual module and obtaining a second result after convolution processing by the second residual module; and inputting the second result into the third residual module and outputting the first feature map after convolution processing by the third residual module.

FIG. 4 is a schematic structural diagram of the residual network layer provided by an embodiment of the present application. As shown in part (a) of FIG. 4, the residual network layer of the classification detection model includes three residual modules, namely the first residual module, the second residual module, and the third residual module. As shown in part (b) of FIG. 4, each residual module includes six 3×3 convolutional layers with edge padding, and the result processed by the six padded 3×3 convolutional layers is added to the input feature map to obtain the output of the residual module.
Exemplarily, the residual modules of the ResNet-34 network are used to extract and transform the features of the road image. Taking the first residual block of ResNet-34 as an example, part (b) of FIG. 4 shows the structure of this residual block. The residual block processes the input feature map with stacked convolutional layers so as to analyze and extract the effective information of the image in greater depth. The final convolution result is accumulated with the original input to mitigate the gradient vanishing that an excessively deep network may cause. To ensure that the size of the output features after each convolution is unchanged relative to the input, the edges of the input feature map are zero-padded before each convolution.
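Exemplarily, a hedged sketch of such a residual module (six zero-padded 3×3 convolutions whose stacked result is added to the module input) is given below; the fixed channel count and the batch normalization and ReLU layers between convolutions are assumptions not stated in this embodiment:

```python
import torch.nn as nn

class ResidualModule(nn.Module):
    # Six 3x3 convolutions; padding=1 zero-pads the edges so each convolution
    # keeps the spatial size unchanged, and the stacked result is added to
    # the module input to mitigate vanishing gradients.
    def __init__(self, channels):
        super().__init__()
        layers = []
        for _ in range(6):
            layers += [nn.Conv2d(channels, channels, kernel_size=3, padding=1),
                       nn.BatchNorm2d(channels),
                       nn.ReLU(inplace=True)]
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        return x + self.body(x)  # residual (skip) connection
```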
It can be understood that the lane line recognition result includes the recognized lanes, the position of each lane on every row of pixels or pixel units of the image, and a determination of which column of that row each lane occupies, or that it does not occupy any column of that row.
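To make this row-wise formulation concrete, the following hedged sketch decodes an output of shape [number of lanes, number of rows, number of columns + 1] into per-row lane positions; the convention that the extra column class means the lane is absent from the row follows the description above, while the tensor layout itself is an assumption:

```python
import torch

def decode_lanes(logits):
    # logits: [num_lanes, num_rows, num_cols + 1]; the last class index
    # means "this lane does not occupy any column of this row".
    num_lanes, num_rows, num_classes = logits.shape
    absent = num_classes - 1
    cols = logits.argmax(dim=-1)  # most likely column per lane and row
    lanes = []
    for lane in range(num_lanes):
        points = [(row, int(col))
                  for row, col in enumerate(cols[lane].tolist())
                  if col != absent]
        lanes.append(points)  # (row, column) pairs where the lane is present
    return lanes
```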
In some embodiments, the dilated convolutional network layer includes a first convolution module, a second convolution module, a third convolution module, a fourth convolution module, and a global average pooling module. Inputting the first feature map into the dilated convolutional network layer and performing feature extraction by the dilated convolutional network layer includes:

inputting the first feature map into the first convolution module, the second convolution module, the third convolution module, the fourth convolution module, and the global average pooling module, respectively, to perform feature extraction on the first feature map.

In some embodiments, the dilated convolutional network layer uses convolution kernels of different sizes to convolve the feature map.
Referring to FIG. 5, which is a schematic structural diagram of the dilated convolutional network layer provided by an embodiment of the present application: part (a) of FIG. 5 shows the atrous spatial pyramid pooling (ASPP) module and the global average pooling module of the dilated convolutional network layer. The ASPP module includes convolution kernels of different sizes and dilated convolutions with different rates, and analyzes and extracts detail features of the input first feature map through the multiple dilated convolutions, for example the 1×1 convolution, the 3×3 convolution with rate=1, the 3×3 convolution with rate=3, and the 3×3 convolution with rate=5 shown in part (a) of FIG. 5. The dilated convolution of each size and rate convolves the input first feature map and extracts detail features across different channel dimensions. The global average pooling module of the dilated convolutional network layer extracts the global features of the input first feature map.

Exemplarily, the first convolution module is a 1×1 convolution, the second convolution module is a 3×3 convolution with rate=1, the third convolution module is a 3×3 convolution with rate=3, and the fourth convolution module is a 3×3 convolution with rate=5.
In some embodiments, after the first feature map is input into the dilated convolutional network layer and feature extraction is performed, the method further includes:

concatenating, in the channel dimension, the feature maps respectively output by the first convolution module, the second convolution module, the third convolution module, the fourth convolution module, and the global average pooling module to obtain a concatenated feature map; and outputting the second feature map after the concatenated feature map is processed by a 1×1 convolution.
As shown in part (a) of FIG. 5, after the global features of the first feature map are extracted by the global average pooling module, the result is upsampled and processed by a 1×1 convolution to output a feature map. The feature map output by the global average pooling module and the feature maps respectively output by the dilated convolutions of each size and rate and by the 1×1 convolution are concatenated in the channel dimension to produce a concatenated feature map; the concatenated feature map then passes through another 1×1 convolution layer to output the second feature map.
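Exemplarily, the dilated convolutional network layer may be sketched as follows; the branch structure (a 1×1 convolution, three 3×3 dilated convolutions with rates 1, 3, and 5, and a global average pooling branch restored by upsampling and a 1×1 convolution, concatenated on the channel axis and fused by a final 1×1 convolution) follows the description above, while the channel counts are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DilatedConvLayer(nn.Module):
    # For a 3x3 kernel, setting padding equal to the dilation rate keeps
    # the spatial size of every branch unchanged.
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, out_ch, 1)                         # 1x1 conv
        self.branch2 = nn.Conv2d(in_ch, out_ch, 3, padding=1, dilation=1)  # rate 1
        self.branch3 = nn.Conv2d(in_ch, out_ch, 3, padding=3, dilation=3)  # rate 3
        self.branch4 = nn.Conv2d(in_ch, out_ch, 3, padding=5, dilation=5)  # rate 5
        self.pool = nn.AdaptiveAvgPool2d(1)                # global average pooling
        self.pool_conv = nn.Conv2d(in_ch, out_ch, 1)
        self.fuse = nn.Conv2d(5 * out_ch, out_ch, 1)       # final 1x1 conv

    def forward(self, x):
        h, w = x.shape[-2:]
        g = self.pool_conv(self.pool(x))                   # global feature summary
        g = F.interpolate(g, size=(h, w), mode='bilinear',
                          align_corners=False)             # restore to input size
        y = torch.cat([self.branch1(x), self.branch2(x),
                       self.branch3(x), self.branch4(x), g], dim=1)
        return self.fuse(y)                                # second feature map
```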
Compared with conventional convolution, the dilated convolution processing in the dilated convolutional network layer can effectively enlarge the convolutional receptive field while reducing the number of parameters of the classification detection model, thereby improving the running efficiency of the model.
As shown in part (b) of FIG. 5, the left side shows the general structure of a dilated convolution kernel. Compared with a conventional convolution kernel, a dilated convolution kernel assigns actual weights to only some positions in the kernel, such as the positions of the small black squares, and ignores the input at the remaining positions. The sampling interval of a dilated convolution is called the dilation rate; an ordinary convolution is the special case of a dilated convolution with a rate of 1. As shown in the right part of (b) in FIG. 5, adding several layers of ordinary convolution at the tail of the model ensures that using dilated convolution does not lose image information through the gaps between the actual weights of the kernel: for example, feature map 1 is processed by a dilated convolution to output feature map 2, and feature map 2 then passes through several layers of ordinary convolution, which fill in the gaps between the actual weights of the dilated convolution and output feature map 3, whose image information is more complete. Dilated convolution is therefore an effective way to enlarge the image receptive field while reducing the computational load of the model.
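The effective extent of a dilated kernel follows directly from its rate: for a $k \times k$ kernel with dilation rate $r$,

$$k_{\text{eff}} = k + (k-1)(r-1),$$

so the 3×3 kernels with rates 1, 3, and 5 used above cover 3×3, 7×7, and 11×11 regions respectively, while each keeps only nine actual weights.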
In some embodiments, after feature extraction is performed on the first feature map by the dilated convolutional network layer, the method includes:

performing upsampling processing on the second feature map, inputting the result of the upsampling processing into the detector, and outputting, after convolution processing by the detector, the classification prediction result of the road image of the current scene, where the classification prediction result indicates the positions of the lanes in the road image of the current scene.
Referring to FIG. 6, which is a schematic structural diagram of the detector provided by an embodiment of the present application: after the second feature map is input into the upsampling network layer and sampled, a sampled feature map with the shape [number of lanes, number of rows, number of columns + 1] is output; the sampled feature map then passes through two layers of edge-padded dilated convolution (for example, 3×3 convolution with rate=1), and the final classification result with the shape [number of lanes, number of rows, number of columns + 1] is output. The shape of the classification result is the same as that of the input sampled feature map, representing, respectively, each lane, the localization of each lane on every row of the image, and the determination of which column of that row each lane occupies, or that it does not occupy any column of that row.
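Exemplarily, the detector may be sketched as follows; keeping the channel count equal to the number of lanes so that the [number of lanes, number of rows, number of columns + 1] shape is preserved follows the description above, while the intermediate activation is an assumption:

```python
import torch.nn as nn

class Detector(nn.Module):
    # Two edge-padded 3x3 convolutions (dilated convolutions with rate=1);
    # padding=1 keeps the [num_rows, num_cols + 1] spatial grid, and the
    # channel count stays at num_lanes so the output shape matches the input.
    def __init__(self, num_lanes):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(num_lanes, num_lanes, 3, padding=1, dilation=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(num_lanes, num_lanes, 3, padding=1, dilation=1),
        )

    def forward(self, x):
        # x: [batch, num_lanes, num_rows, num_cols + 1]
        return self.head(x)
```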
Through the embodiments of the present application, the extraction and transformation of multi-scale features by the dilated convolutions of the dilated convolutional network layer greatly simplify the detector of the classification detection model: bilinear interpolation and 3×3 two-dimensional convolution are applied directly to the feature output of the dilated convolutions to obtain the final classification output, further improving the running efficiency of the model.

In the embodiments of the present application, the road image is processed by multiple residual network blocks to output a feature map which, after a 1×1 convolution, is processed by the atrous spatial pyramid pooling (ASPP) of the dilated convolutional network layer in the middle of the classification detection model. The feature map is processed in parallel by three dilated convolution blocks with rates 1, 3, and 5 and by another 1×1 convolution, and all convolution operations use padding to keep the input and output sizes unchanged. To better capture the global features of the feature map, the dilated convolutional network layer also applies global average pooling to obtain a global summary of the input feature map, restores it to the size of the input feature map by bilinear interpolation upsampling and 1×1 convolution, concatenates it with the outputs of the convolutions and dilated convolutions in the channel dimension, and outputs the second feature map after a 1×1 convolution adjusts the channel features. Dilated convolution is thus used in the middle and at the end of the classification detection model to extract and analyze image features at multiple scales; the last-layer features of the classification detection model are no longer flattened, but instead the spatial structure of the feature map is preserved at the end of the model and the final model output is produced by the improved detector, which greatly reduces the number of model parameters and improves the running efficiency of the model.
In the training phase, in order to better guide the output evaluation of the model and obtain a more accurate detection effect, an auxiliary model based on semantic segmentation is used during the training of the neural network model.

In some embodiments, the method includes: labeling, for the collected road images of the multiple scenes, the pixel coordinates of the lane lines, the lane line types, and whether the current lane is a drivable lane, to obtain the labeled images, among the sample images, corresponding to the road images of the multiple scenes.
Exemplarily, road images of different scenes are collected, the lane lines in the road images are annotated, a region of interest is extracted from the road images and a downsampling operation is applied, and datasets in different annotation formats are processed by preset processing functions to obtain the lane line dataset required for training the neural network model, i.e., the labeled images corresponding to the road images of the multiple scenes in the sample images.
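As a hedged illustration of turning per-lane point annotations into the row-wise classification targets used during training, the following sketch assumes annotations given as (x, y) pixel coordinates per lane and an evenly spaced grid of row anchors; both the annotation format and the quantization scheme are assumptions rather than details of this embodiment:

```python
import numpy as np

def build_row_targets(lane_points, img_h, img_w, num_lanes, num_rows, num_cols):
    # lane_points: list with one entry per lane, each a list of (x, y)
    # pixel coordinates. Returns targets of shape [num_lanes, num_rows],
    # where the value num_cols encodes "lane absent from this row",
    # matching the (num_cols + 1)-way classification described above.
    targets = np.full((num_lanes, num_rows), num_cols, dtype=np.int64)
    row_ys = np.linspace(0, img_h - 1, num_rows)
    for lane_idx, points in enumerate(lane_points[:num_lanes]):
        for x, y in points:
            row = int(np.argmin(np.abs(row_ys - y)))      # nearest row anchor
            col = int(round(x / img_w * (num_cols - 1)))  # quantized column index
            targets[lane_idx, row] = col
    return targets
```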
In some embodiments, training the neural network model according to the sample images in the training set and the semantic segmentation model includes:

inputting the road images of the multiple scenes in the sample images into the residual network layer of the neural network model and outputting a fourth feature map of the road images after convolution processing by the residual network layer; inputting the fourth feature map into the dilated convolutional network layer of the neural network model and outputting a fifth feature map after feature extraction by the dilated convolutional network layer; and performing upsampling processing on the fifth feature map, inputting the result of the upsampling processing into the detector of the neural network model, and outputting, after convolution processing by the detector, the classification prediction results of the road images of the multiple scenes, where the classification prediction results indicate the positions of the lanes in the road images of the multiple scenes.
In some embodiments, the residual network layer of the neural network model includes three residual modules, and the method includes:

performing upsampling processing on the fifth feature map to obtain a sixth feature map; inputting the road images of the multiple scenes in the sample images into the residual network layer and outputting a seventh feature map after convolution processing by the first two residual modules of the residual network layer; inputting the seventh feature map into the semantic segmentation model, segmenting the lane line pixels in the seventh feature map by the semantic segmentation model, and outputting a lane line instance feature map; and concatenating the lane line instance feature map with the sixth feature map to obtain a concatenated image.

In some embodiments, the method includes: inputting the concatenated image into a segmenter and outputting, after convolution processing by the segmenter, a semantic segmentation prediction result, where the semantic segmentation prediction result indicates the lane recognition results in the road images of the multiple scenes.
As shown in FIG. 7, which is a schematic diagram of the overall architecture for training the neural network model provided by an embodiment of the present application: the classification detection model in the upper half is the model used in the lane line detection process, and its output serves as the final result of the task in the prediction phase; the semantic segmentation model in the lower half participates only in the training process and is used to guide the model toward a more accurate recognition effect.

In the process of training the neural network model, the road images of the multiple scenes in the sample images are input into the residual network layer of the neural network model, and the semantic features of the road images are extracted through the convolution processing of the residual network layer to obtain the fourth feature map. The fourth feature map is input into the dilated convolutions of each size and rate, the 1×1 convolution, and the global average pooling of the dilated convolutional network layer for convolution processing to extract different detail features and global features, and the fifth feature map is output after the concatenation of the feature maps and a 1×1 convolution. The fifth feature map is input into the detector of the neural network model and, after the two-dimensional convolution processing of the detector, the classification prediction result [number of lanes, number of rows, number of columns + 1] is output; this classification prediction result represents each lane, the localization of each lane on every row of the image, and the determination of which column of that row each lane occupies, or that it does not occupy any column of that row.
The feature map of the road image is sampled and classified by the dilated convolutional network layer of the neural network model, which guarantees a large receptive field while using multiple dilated convolution kernels of different sizes to improve the model's perception of details in road images of different scenes and to reduce the number of model parameters. Global features of the feature map are extracted by the global average pooling of the dilated convolutional network layer, and the feature map is output after upsampling and 1×1 convolution. The feature maps output by the dilated convolution kernels and the feature map output by the global average pooling are concatenated in the channel dimension, and the fifth feature map is output after another 1×1 convolution layer.

Exemplarily, the multiple dilated convolution kernels may be a 1×1 convolution, a 3×3 convolution with rate=1, a 3×3 convolution with rate=3, and a 3×3 convolution with rate=5.
In some embodiments, during training with the semantic segmentation model, the output of the second residual module of the neural network model (e.g., ResNet-34) is input into the semantic segmentation model, the semantic segmentation model performs pixel segmentation on the output of the second residual module, and the lane line instance feature map is output. The fifth feature map output after processing by the dilated convolutional network layer is upsampled to output the sixth feature map. The lane line instance feature map is concatenated with the sixth feature map to obtain the concatenated image.

In some embodiments, the concatenated image is input into the segmenter and, after the two-dimensional convolution processing of the segmenter, the semantic segmentation prediction result [number of lanes + 1, number of rows, number of columns] is output, indicating, for each basic pixel region of the image, which lane it belongs to or that it does not belong to any lane.
As shown in FIG. 8, which is a schematic structural diagram of the segmenter provided by an embodiment of the present application: the network before the segmenter upsamples the feature map to the shape [number of lanes + 1, number of rows, number of columns], and after two layers of edge-padded dilated convolution and one layer of ordinary 3×3 convolution, the final semantic segmentation prediction result is output. The shape of the semantic segmentation prediction result is the same as that of the input feature map, i.e., [number of lanes + 1, number of rows, number of columns], indicating, for each basic pixel region of the image, which lane it belongs to or that it does not belong to any lane. The segmenter is the output part of the semantic segmentation model.
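Exemplarily, the segmenter may be sketched as follows; the two edge-padded dilated 3×3 convolutions followed by one ordinary 3×3 convolution, with the channel count kept at number of lanes + 1, follow the description above, while the dilation rate of the first two layers and the intermediate activations are assumptions:

```python
import torch.nn as nn

class Segmenter(nn.Module):
    # Channels equal num_lanes + 1 (one class per lane plus background),
    # so the [num_lanes + 1, num_rows, num_cols] shape is preserved;
    # padding equal to the dilation rate keeps the spatial size unchanged.
    def __init__(self, num_lanes, dilation=2):
        super().__init__()
        ch = num_lanes + 1
        self.head = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=dilation, dilation=dilation),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=dilation, dilation=dilation),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),  # ordinary 3x3 convolution
        )

    def forward(self, x):
        # x: [batch, num_lanes + 1, num_rows, num_cols]
        return self.head(x)
```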
In the training phase, in order to better guide the output evaluation of the model and obtain a more accurate recognition effect, the neural network is trained with an auxiliary model based on semantic segmentation. The semantic segmentation model concatenates the result of the second residual module of the neural network model ResNet-34 with the upsampled output of the dilated convolutional network layer, and a semantic segmenter based on two-dimensional convolution then directly produces the output. The goal of this output is to determine the lane recognition results of the road images of the multiple scenes, for example whether each part of the image belongs to a lane and to which lane it belongs; this is a semantic segmentation task.

In some embodiments, the first error value of the classification prediction result of the neural network model relative to the prediction probability of the labeled images in the sample images is calculated by the first loss function, and the parameters of the neural network model are adjusted according to the first error value; the first loss function is expressed as follows:
$$L_{1} = -y\,(1-p)^{\gamma}\log(p) - (1-y)\,p^{\gamma}\log(1-p) \qquad (1)$$
where y is the classification ground truth of the road images of the multiple scenes in the sample images of the training set, p is the prediction probability obtained by processing the road images of the multiple scenes in the sample images of the training set with the neural network model, and γ is a preset weight value.

In the embodiments of the present application, the classification detection model uses Focal Loss, given in formula (1), as its first loss function during training, where y is the classification ground truth of a sample, p is the predicted probability of the sample, and γ is the set weight. Focal Loss attaches greater weight to samples that are hard to classify or severely misclassified, so that the model focuses more on those samples during training.
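A minimal sketch of the per-sample Focal Loss of formula (1), assuming probabilities and binary ground-truth labels as inputs; the default γ = 2 is illustrative, since the embodiment only states that γ is a preset weight:

```python
import torch

def focal_loss(p, y, gamma=2.0, eps=1e-7):
    # p: predicted probabilities in (0, 1); y: ground-truth labels in {0, 1}.
    # The (1 - p)^gamma and p^gamma factors shrink the loss of
    # well-classified samples, so hard samples dominate training.
    p = p.clamp(eps, 1.0 - eps)  # numerical stability
    loss = -y * (1 - p) ** gamma * torch.log(p) \
           - (1 - y) * p ** gamma * torch.log(1 - p)
    return loss.mean()
```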
In some embodiments, the second error value of the lane recognition result of the semantic segmentation model relative to the prediction probability of the labeled images in the sample images is calculated by the second loss function, and the parameters of the neural network model are adjusted according to the second error value; the second loss function is expressed as follows:
$$L_{2} = -y\,\log(p) - (1-y)\,\log(1-p) \qquad (2)$$
where y is the recognition ground truth of the road images of the multiple scenes in the sample images of the training set, and p is the prediction probability obtained by processing the road images of the multiple scenes in the sample images of the training set with the semantic segmentation model.

In the embodiments of the present application, the semantic segmentation model uses the cross-entropy loss (Cross Entropy Loss) as the second loss function during training; its calculation formula is shown in formula (2), where y is the classification ground truth of a sample and p is the predicted probability of the sample. For each class of classification result, when y = 1, i.e., the ground truth of the class is 1, the predicted probability should approach 1 as closely as possible, so the loss is -log(p); for a sample whose ground truth is 0, the predicted probability should approach 0, and the loss is -log(1-p).
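A corresponding sketch of the binary cross-entropy of formula (2), under the same input assumptions:

```python
import torch

def cross_entropy_loss(p, y, eps=1e-7):
    # p: predicted probabilities in (0, 1); y: ground-truth labels in {0, 1}.
    # y = 1 contributes -log(p) and y = 0 contributes -log(1 - p),
    # matching the case analysis above.
    p = p.clamp(eps, 1.0 - eps)
    loss = -y * torch.log(p) - (1 - y) * torch.log(1 - p)
    return loss.mean()
```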
Through the embodiments of the present application, because atrous spatial pyramid pooling effectively enlarges the receptive field while adding almost no model parameters, the feature map of the last layer of the classification detection model is no longer flattened. Instead, a structured loss function capable of constraining the spatial characteristics of lane lines is used, and the output feature map of the atrous convolutional network layer is directly processed by bilinear-interpolation upsampling and two-dimensional convolution. This preserves the spatial structure of the feature map and also lightens the training burden of the model.
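A sketch of such a head follows; the input channel count, the number of lane classes, and the output resolution are assumptions for illustration:

```python
import torch.nn as nn
import torch.nn.functional as F

class DetectionHead(nn.Module):
    """Bilinear upsampling followed by a 2D convolution; the feature
    map is never flattened, so its spatial structure is preserved and
    the head's parameter count is independent of the input size."""

    def __init__(self, in_ch=256, num_classes=5, out_size=(288, 800)):
        super().__init__()
        self.out_size = out_size  # (height, width) of the network input
        self.conv = nn.Conv2d(in_ch, num_classes, kernel_size=1)

    def forward(self, x):
        x = F.interpolate(x, size=self.out_size,
                          mode="bilinear", align_corners=False)
        return self.conv(x)  # per-pixel lane scores at full resolution
```

Because no fully connected layer acts on a flattened map, the head adds no resolution-dependent parameters, which is one reason the training burden stays low.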
Through the embodiments of the present application, the number of model parameters is reduced and the running efficiency of the model is improved; the spatial structure of the feature map is preserved at the end of the model, which facilitates the analysis of the global characteristics of the image; and atrous spatial pyramid pooling optimizes the feature analysis in the second half of the model, enlarging the receptive field of the convolution kernels without increasing the training burden, while convolution kernels of several different sizes analyze different characteristics of the feature map. This strengthens the generalization ability of both classification and semantic segmentation and improves the detection accuracy of the task.
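The atrous spatial pyramid pooling described above might be assembled as in the sketch below. The dilation rates and channel widths are illustrative assumptions; the application itself only specifies four convolution modules plus a global average pooling module whose outputs are concatenated on the channel dimension and fused by a 1×1 convolution:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Four parallel convolution branches with different effective
    receptive fields plus a global-average-pooling branch; the outputs
    are concatenated on the channel dimension and fused by a 1x1
    convolution, enlarging the receptive field at constant depth."""

    def __init__(self, in_ch=256, out_ch=256, rates=(6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, kernel_size=1)] +        # 1x1 branch
            [nn.Conv2d(in_ch, out_ch, kernel_size=3,
                       padding=r, dilation=r) for r in rates]  # atrous branches
        )
        self.gap = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                 nn.Conv2d(in_ch, out_ch, kernel_size=1))
        self.project = nn.Conv2d(out_ch * (len(rates) + 2), out_ch,
                                 kernel_size=1)  # fuse the five branches

    def forward(self, x):
        feats = [branch(x) for branch in self.branches]
        pooled = F.interpolate(self.gap(x), size=x.shape[2:],
                               mode="bilinear", align_corners=False)
        feats.append(pooled)
        return self.project(torch.cat(feats, dim=1))
```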
To verify the effectiveness of the proposed method, experiments were carried out on the same dataset used by the traditional neural network model, and the proposed model was compared with the traditional neural network model in terms of training speed, convergence speed and detection accuracy; a clear improvement was obtained on each of them. Table 1 compares the trained neural network model provided by the embodiments of the present application with the traditional original model in detection accuracy and detection speed. Accuracy refers to the proportion of lane-line pixels identified correctly by the model on images of 800×288 pixels. Running speed refers to the time the model needs to process one batch, i.e. 16 images.
                 Trained neural network model    Traditional original model
Accuracy         92.96%                          92.04%
Running speed    32 ms/batch                     60 ms/batch

Table 1
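The running-speed figure can be reproduced in spirit with a simple timing loop such as the sketch below. Only the batch size of 16 and the 800×288 input follow the text; the CUDA device, warm-up count and run count are assumptions:

```python
import time
import torch

@torch.no_grad()
def ms_per_batch(model, device="cuda", n_warmup=5, n_runs=20):
    """Average wall-clock time for one batch of 16 images at
    800x288 pixels (the setting reported in Table 1)."""
    x = torch.randn(16, 3, 288, 800, device=device)  # NCHW: 288 high, 800 wide
    model = model.to(device).eval()
    for _ in range(n_warmup):            # warm up kernels and caches
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()         # wait for queued GPU work
    start = time.perf_counter()
    for _ in range(n_runs):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / n_runs * 1000.0
```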
FIG. 9 is a visualization of the lane-line detection results provided by an embodiment of the present application. It shows the detection results of the trained neural network model provided by the embodiments of the present application on two groups of test images. As shown in (a) of FIG. 9, the trained neural network model can accurately identify multiple lane lines in an image; and, as shown in (b) of FIG. 9, a good recognition result is maintained even when the lane lines are occluded by obstacles.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
Corresponding to the method for detecting lane lines described in the above embodiments, FIG. 10 shows a structural block diagram of the apparatus for detecting lane lines provided by the embodiments of the present application. For ease of description, only the parts related to the embodiments of the present application are shown.
Referring to FIG. 10, the apparatus includes:
an acquisition unit 101, configured to acquire a road image of the current scene; and
a processing unit 102, configured to input the road image into a trained neural network model for processing and to output the detection result of the lane lines in the road image of the current scene, wherein the trained neural network model is obtained by training according to the sample images in a training set and a semantic segmentation model, the sample images in the training set including collected road images of multiple scenes and labeled images corresponding to the road images of the multiple scenes.
Through the embodiments of the present invention, the terminal device acquires a road image of the current scene, inputs the road image into the trained neural network model for processing, and outputs the detection result of the lane lines in the road image of the current scene. The trained neural network model is obtained by training according to the sample images in the training set and the semantic segmentation model, the sample images including collected road images of multiple scenes and the corresponding labeled images. Training in this way addresses both the low accuracy of lane-line recognition in complex environments and the slow response caused by the heavy computation and complexity of current lane-detection models; it improves detection accuracy while meeting the real-time requirements of practical autonomous-driving scenarios, and offers good usability and practicability.
It should be noted that, since the information exchange between the above apparatus/units and their execution processes are based on the same concept as the method embodiments of the present application, their specific functions and technical effects can be found in the method embodiment section and are not repeated here.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the division into the above functional units and modules is only an example; in practical applications, the above functions may be assigned to different functional units and modules as needed, that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for convenience of distinguishing them from one another and are not intended to limit the protection scope of the present application. For the specific working processes of the units and modules in the above system, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
The embodiments of the present application further provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of each of the foregoing method embodiments.
The embodiments of the present application provide a computer program product which, when run on a mobile terminal, causes the mobile terminal to implement the steps of each of the foregoing method embodiments.
FIG. 11 is a schematic structural diagram of a terminal device 11 provided by an embodiment of the present application. As shown in FIG. 11, the terminal device 11 of this embodiment includes: at least one processor 110 (only one is shown in FIG. 11), a memory 111, and a computer program 112 stored in the memory 111 and executable on the at least one processor 110; the processor 110 implements the steps in any of the foregoing method embodiments when executing the computer program 112.
The terminal device 11 may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, a vehicle-mounted device, or a cloud server. The terminal device 11 may include, but is not limited to, the processor 110 and the memory 111. Those skilled in the art will understand that FIG. 11 is only an example of the terminal device 11 and does not constitute a limitation on it; the device may include more or fewer components than shown, combine certain components, or use different components, and may, for example, further include input/output devices, network access devices, and the like.
The processor 110 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
In some embodiments, the memory 111 may be an internal storage unit of the terminal device 11, such as a hard disk or internal memory of the terminal device 11. In other embodiments, the memory 111 may be an external storage device of the terminal device 11, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the terminal device 11. Further, the memory 111 may include both an internal storage unit and an external storage device of the terminal device 11. The memory 111 is used to store the operating system, application programs, a boot loader (BootLoader), data, and other programs, such as the program code of the computer program; it may also be used to temporarily store data that has been output or is to be output.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the present application may implement all or part of the processes in the methods of the above embodiments by instructing the relevant hardware through a computer program; the computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each of the above method embodiments. The computer program includes computer program code, which may be in source-code form, object-code form, an executable file, some intermediate form, or the like. The computer-readable medium may include at least: any entity or apparatus capable of carrying the computer program code to the photographing apparatus/terminal device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk, or an optical disc. In some jurisdictions, according to legislation and patent practice, computer-readable media may not be electrical carrier signals or telecommunication signals.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts not detailed or described in a certain embodiment, reference may be made to the relevant descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementations should not be considered beyond the scope of the present application.
In the embodiments provided by the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other ways. For example, the apparatus/network device embodiments described above are merely illustrative; the division into modules or units is only a logical functional division, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses or units, and may be electrical, mechanical or in other forms.
The above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of the technical features; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all fall within the protection scope of the present application.

Claims (15)

1. A method for detecting lane lines, characterized by comprising:
acquiring a road image of a current scene;
inputting the road image into a trained neural network model for processing, and outputting a detection result of lane lines in the road image of the current scene;
wherein the trained neural network model is obtained by training according to sample images in a training set and a semantic segmentation model, the sample images in the training set comprising collected road images of multiple scenes and labeled images corresponding to the road images of the multiple scenes.
2. The method according to claim 1, characterized in that the trained neural network model comprises a residual network layer, an atrous convolutional network layer, an upsampling network layer and a detector;
the inputting the road image into the trained neural network model for processing comprises:
inputting the road image into the residual network layer and, after convolution processing by the residual network layer, outputting a first feature map containing semantic features;
inputting the first feature map into the atrous convolutional network layer and, after feature extraction by the atrous convolutional network layer, outputting a second feature map containing detail features;
inputting the second feature map into the upsampling network layer and, after upsampling processing by the upsampling network layer, outputting a third feature map;
inputting the third feature map into the detector and, after convolution processing by the detector, outputting the detection result of the lane lines in the road image of the current scene.
3. The method according to claim 2, characterized in that the residual network layer comprises a first residual module, a second residual module and a third residual module;
the inputting the road image into the residual network layer and, after convolution processing by the residual network layer, outputting the first feature map containing semantic features comprises:
inputting the road image into the first residual module and performing convolution processing by the first residual module to obtain a first result;
inputting the first result into the second residual module and, after convolution processing by the second residual module, obtaining a second result;
inputting the second result into the third residual module and, after convolution processing by the third residual module, outputting the first feature map.
4. The method according to claim 2, characterized in that the atrous convolutional network layer comprises a first convolution module, a second convolution module, a third convolution module, a fourth convolution module and a global average pooling module;
the inputting the first feature map into the atrous convolutional network layer and performing feature extraction by the atrous convolutional network layer comprises:
inputting the first feature map into the first convolution module, the second convolution module, the third convolution module, the fourth convolution module and the global average pooling module respectively, to perform feature extraction on the first feature map.
5. The method according to claim 4, characterized in that, after the first feature map is input into the atrous convolutional network layer and feature extraction is performed by the atrous convolutional network layer, the method further comprises:
concatenating, in the channel dimension, the feature maps respectively output by the first convolution module, the second convolution module, the third convolution module, the fourth convolution module and the global average pooling module, to obtain a concatenated feature map;
outputting the second feature map after the concatenated feature map is processed by a 1×1 convolution.
6. The method according to claim 5, characterized in that, after feature extraction is performed on the first feature map by the atrous convolutional network layer, the method comprises:
upsampling the second feature map and inputting the result of the upsampling into the detector; after convolution processing by the detector, outputting a classification prediction result of the road image of the current scene, the classification prediction result being used to indicate the positions of the lanes in the road image of the current scene.
7. The method according to claim 1, characterized in that the method comprises:
labeling, in the collected road images of the multiple scenes, the pixel coordinates of the lane lines, the lane-line types, and whether the current lane is a drivable lane, to obtain the labeled images corresponding to the road images of the multiple scenes in the sample images.
8. The method according to claim 1 or 7, characterized in that training the neural network model according to the sample images in the training set and the semantic segmentation model comprises:
inputting the road images of the multiple scenes in the sample images into the residual network layer of the neural network model and, after convolution processing by the residual network layer, outputting a fourth feature map of the road images;
inputting the fourth feature map into the atrous convolutional network layer of the neural network model and, after feature extraction by the atrous convolutional network layer, outputting a fifth feature map;
upsampling the fifth feature map and inputting the result of the upsampling into the detector of the neural network model; after convolution processing by the detector, outputting a classification prediction result of the road images of the multiple scenes, the classification prediction result being used to indicate the positions of the lanes in the road images of the multiple scenes.
9. The method according to claim 8, characterized in that the residual network layer of the neural network model comprises three residual modules, and the method comprises:
upsampling the fifth feature map to obtain a sixth feature map;
inputting the road images of the multiple scenes in the sample images into the residual network layer and, after convolution processing by the first two residual modules of the residual network layer, outputting a seventh feature map;
inputting the seventh feature map into the semantic segmentation model, segmenting the lane-line pixels in the seventh feature map by the semantic segmentation model, and outputting a lane-line instance feature map;
concatenating the lane-line instance feature map with the sixth feature map to obtain a concatenated image.
10. The method according to claim 9, characterized in that the method comprises:
inputting the concatenated image into a segmenter and, after convolution processing by the segmenter, outputting a semantic segmentation prediction result, the semantic segmentation prediction result being used to indicate the lane recognition results in the road images of the multiple scenes.
11. The method according to claim 8, characterized in that the method further comprises:
computing, by a first loss function, a first error value of the classification prediction result of the neural network model relative to the labeled images in the sample images in terms of the predicted probability, and adjusting the parameters of the neural network model according to the first error value, the first loss function being expressed as follows:
$$L_{1} = -\,y\,(1-p)^{\gamma}\log(p)\;-\;(1-y)\,p^{\gamma}\log(1-p)$$
wherein y is the classification ground truth of the road images of the multiple scenes in the sample images of the training set, p is the predicted probability obtained when the road images of the multiple scenes in the sample images of the training set are processed by the neural network model, and γ is a preset weight value.
12. The method according to claim 8 or 11, characterized in that the method further comprises:
computing, by a second loss function, a second error value of the lane recognition result of the semantic segmentation model relative to the labeled images in the sample images in terms of the predicted probability, and adjusting the parameters of the neural network model according to the second error value, the second loss function being expressed as follows:
$$L_{2} = -\,y\log(p)\;-\;(1-y)\log(1-p)$$
wherein y is the recognition ground truth of the road images of the multiple scenes in the sample images of the training set, and p is the predicted probability obtained when the road images of the multiple scenes in the sample images of the training set are processed by the semantic segmentation model.
13. An apparatus for detecting lane lines, characterized by comprising:
an acquisition unit, configured to acquire a road image of a current scene; and
a processing unit, configured to input the road image into a trained neural network model for processing and to output a detection result of lane lines in the road image of the current scene, wherein the trained neural network model is obtained by training according to sample images in a training set and a semantic segmentation model, the sample images in the training set comprising collected road images of multiple scenes and labeled images corresponding to the road images of the multiple scenes.
14. A terminal device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any one of claims 1 to 12 when executing the computer program.
15. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the method according to any one of claims 1 to 12.

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/136540 WO2022126377A1 (en) 2020-12-15 2020-12-15 Traffic lane line detection method and apparatus, and terminal device and readable storage medium


Publications (1)

Publication Number Publication Date
WO2022126377A1

Family

ID=82059813

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/136540 WO2022126377A1 (en) 2020-12-15 2020-12-15 Traffic lane line detection method and apparatus, and terminal device and readable storage medium

Country Status (1)

Country Link
WO (1) WO2022126377A1 (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115082888A (en) * 2022-08-18 2022-09-20 北京轻舟智航智能技术有限公司 Lane line detection method and device
CN115147812A (en) * 2022-07-05 2022-10-04 小米汽车科技有限公司 Lane line detection method, lane line detection device, vehicle, and storage medium
CN115471803A (en) * 2022-08-31 2022-12-13 北京四维远见信息技术有限公司 Method, device and equipment for extracting traffic identification line and readable storage medium
CN116071374A (en) * 2023-02-28 2023-05-05 华中科技大学 Lane line instance segmentation method and system
CN116129379A (en) * 2022-12-28 2023-05-16 国网安徽省电力有限公司芜湖供电公司 Lane line detection method in foggy environment
CN116229379A (en) * 2023-05-06 2023-06-06 浙江大华技术股份有限公司 Road attribute identification method and device, electronic equipment and storage medium
CN116453121A (en) * 2023-06-13 2023-07-18 合肥市正茂科技有限公司 Training method and device for lane line recognition model
CN116543365A (en) * 2023-07-06 2023-08-04 广汽埃安新能源汽车股份有限公司 Lane line identification method and device, electronic equipment and storage medium
CN116935349A (en) * 2023-09-15 2023-10-24 华中科技大学 Lane line detection method, system, equipment and medium based on Zigzag transformation
CN116994145A (en) * 2023-09-05 2023-11-03 腾讯科技(深圳)有限公司 Lane change point identification method and device, storage medium and computer equipment
CN117081806A (en) * 2023-08-18 2023-11-17 四川农业大学 Channel authentication method based on feature extraction
CN117237286A (en) * 2023-09-02 2023-12-15 国网山东省电力公司淄博供电公司 Method for detecting internal defects of gas-insulated switchgear
CN117372983A (en) * 2023-10-18 2024-01-09 北京化工大学 Low-calculation-force automatic driving real-time multitasking sensing method and device


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108537197A (en) * 2018-04-18 2018-09-14 吉林大学 A kind of lane detection prior-warning device and method for early warning based on deep learning
US10423840B1 (en) * 2019-01-31 2019-09-24 StradVision, Inc. Post-processing method and device for detecting lanes to plan the drive path of autonomous vehicle by using segmentation score map and clustering map
CN110363770A (en) * 2019-07-12 2019-10-22 安徽大学 A kind of training method and device of the infrared semantic segmentation model of margin guide formula
CN110490205A (en) * 2019-07-23 2019-11-22 浙江科技学院 Road scene semantic segmentation method based on the empty convolutional neural networks of Complete Disability difference
CN111460921A (en) * 2020-03-13 2020-07-28 华南理工大学 Lane line detection method based on multitask semantic segmentation


Similar Documents

Publication Publication Date Title
WO2022126377A1 (en) Traffic lane line detection method and apparatus, and terminal device and readable storage medium
CN112528878B (en) Method and device for detecting lane line, terminal equipment and readable storage medium
CN112132156B (en) Image saliency target detection method and system based on multi-depth feature fusion
CN108846854B (en) Vehicle tracking method based on motion prediction and multi-feature fusion
CN111738995B (en) RGBD image-based target detection method and device and computer equipment
WO2020103893A1 (en) Lane line property detection method, device, electronic apparatus, and readable storage medium
CN109543641B (en) Multi-target duplicate removal method for real-time video, terminal equipment and storage medium
WO2022134996A1 (en) Lane line detection method based on deep learning, and apparatus
WO2023193401A1 (en) Point cloud detection model training method and apparatus, electronic device, and storage medium
CN114359851A (en) Unmanned target detection method, device, equipment and medium
US11887346B2 (en) Systems and methods for image feature extraction
WO2021013227A1 (en) Image processing method and apparatus for target detection
CN116188999B (en) Small target detection method based on visible light and infrared image data fusion
CN111178161A (en) Vehicle tracking method and system based on FCOS
WO2021083126A1 (en) Target detection and intelligent driving methods and apparatuses, device, and storage medium
CN110852327A (en) Image processing method, image processing device, electronic equipment and storage medium
CN111191582A (en) Three-dimensional target detection method, detection device, terminal device and computer-readable storage medium
CN112395962A (en) Data augmentation method and device, and object identification method and system
CN115493612A (en) Vehicle positioning method and device based on visual SLAM
CN109977862B (en) Recognition method of parking space limiter
CN115115973A (en) Weak and small target detection method based on multiple receptive fields and depth characteristics
CN114898306B (en) Method and device for detecting target orientation and electronic equipment
CN116052090A (en) Image quality evaluation method, model training method, device, equipment and medium
CN112446292B (en) 2D image salient object detection method and system
CN112446230B (en) Lane line image recognition method and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20965395; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20965395; Country of ref document: EP; Kind code of ref document: A1)