WO2022126377A1 - Method and apparatus for detecting lane lines, terminal device and readable storage medium - Google Patents

Method and apparatus for detecting lane lines, terminal device and readable storage medium

Info

Publication number
WO2022126377A1
PCT/CN2020/136540 (published as WO2022126377A1)
Authority
WO
WIPO (PCT)
Prior art keywords
feature map
convolution
road
network layer
images
Prior art date
Application number
PCT/CN2020/136540
Other languages
English (en)
Chinese (zh)
Inventor
王磊
钟宏亮
马森炜
程俊
林佩珍
范筱媛
Original Assignee
中国科学院深圳先进技术研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院深圳先进技术研究院 filed Critical 中国科学院深圳先进技术研究院
Priority to PCT/CN2020/136540 priority Critical patent/WO2022126377A1/fr
Publication of WO2022126377A1 publication Critical patent/WO2022126377A1/fr

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition

Definitions

  • the present application belongs to the technical field of computer vision and image processing, and in particular, relates to a method, apparatus, terminal device and readable storage medium for detecting lane lines.
  • automatic driving plays an important role in safe vehicle operation, and lane recognition is an important part of an automatic driving system.
  • the result of lane recognition provides the basis for the control system of automatic driving, and plays an irreplaceable role in the fields of automatic parking, anti-collision warning and unmanned driving.
  • semantic segmentation models have achieved good performance in lane recognition tasks, but, limited by the lack of global and contextual information, ordinary semantic segmentation models cannot handle bad lighting conditions or lane occlusions in the lane recognition task.
  • most of the deep learning models currently used for lane recognition require a large amount of calculation and are relatively complex, which is not conducive to the real-time requirements of practical automatic driving scenarios.
  • One of the purposes of the embodiments of the present application is to provide a method, device, terminal device and readable storage medium for detecting lane lines, which can solve the problem that most deep learning models currently used for lane recognition require a large amount of calculation and are relatively complex, which is not conducive to the real-time requirements of practical automatic driving scenarios.
  • an embodiment of the present application provides a method for detecting lane lines, including:
  • the trained neural network model is obtained by training according to the sample images in the training set and the semantic segmentation model, and the sample images in the training set include collected road images of multiple scenes and marked images corresponding to the road images of the multiple scenes.
  • the trained neural network model includes a residual network layer, an atrous convolutional network layer, an up-sampling network layer, and a detector; inputting the road image into the trained neural network model for processing includes:
  • the residual network layer includes a first residual module, a second residual module and a third residual module; inputting the road image into the residual network layer and outputting, after the convolution processing of the residual network layer, a first feature map containing semantic features includes:
  • the atrous convolutional network layer includes a first convolution module, a second convolution module, a third convolution module, a fourth convolution module, and a global average pooling module; inputting the first feature map into the atrous convolutional network layer and extracting features through the atrous convolutional network layer includes:
  • the first feature map is input into the atrous convolutional network layer, and after feature extraction of the atrous convolutional network layer, the method further includes:
  • the feature maps respectively output by the first convolution module, the second convolution module, the third convolution module, the fourth convolution module and the global average pooling module are spliced in the channel dimension to obtain a spliced feature map; after the spliced feature map is processed by a 1×1 convolution, the second feature map is output.
  • the method includes:
  • the second feature map is subjected to up-sampling processing, and the result of the up-sampling processing is input into the detector.
  • the classification prediction result of the road image of the current scene is output.
  • the classification prediction result is used to indicate the position of the lane in the road image of the current scene.
  • the method includes:
  • the neural network model is trained according to the sample images in the training set and the semantic segmentation model, including:
  • the fourth feature map is input into the atrous convolutional network layer of the neural network model, and the fifth feature map is output through feature extraction of the atrous convolutional network layer;
  • the fifth feature map is subjected to up-sampling processing, the result of the up-sampling processing is input into the detector of the neural network model, and after the convolution processing of the detector, the classification prediction results of the road images of the multiple scenes are output, the classification prediction results being used to indicate the positions of the lanes in the road images of the multiple scenes.
  • the residual network layer of the neural network model includes three residual modules, and the method includes:
  • the convolution processing of the residual modules outputs a seventh feature map; the seventh feature map is input into the semantic segmentation model, the lane line pixels in the seventh feature map are segmented through the semantic segmentation model, and the lane line instance feature map is output; the lane line instance feature map and the sixth feature map are spliced to obtain a spliced image.
  • the method includes:
  • the spliced image is input into a segmenter, and after convolution processing by the segmenter, a semantic segmentation prediction result is output, the semantic segmentation prediction result being used to indicate the lane recognition results in the road images of the multiple scenes.
  • the method includes:
  • the first loss function is used to calculate the first error value of the classification prediction result of the neural network model relative to the prediction probability of the marked image in the sample image, and the parameters of the neural network model are adjusted by the first error value;
  • the first loss function is expressed as follows:
  • y is the classification truth value of the road images of the multiple scenes in the sample images in the training set
  • p is the predicted probability output after the road images of the multiple scenes in the sample images in the training set are processed by the neural network model
  • α is the preset weight value
  • the method includes:
  • the second loss function is expressed as follows:
  • y is the recognition truth value of the road images of the multiple scenes in the sample images in the training set
  • p is the predicted probability output after the road images of the multiple scenes in the sample images in the training set are processed by the semantic segmentation model.
  • an embodiment of the present application provides a device for detecting lane lines, including:
  • an acquisition unit for acquiring the road image of the current scene
  • the processing unit is used to input the road image into the trained neural network model for processing, and to output the detection result of the lane line in the road image of the current scene; wherein the trained neural network model is obtained by training according to the sample images in the training set and the semantic segmentation model, and the sample images in the training set include collected road images of multiple scenes and marked images corresponding to the road images of the multiple scenes.
  • an embodiment of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the method described above when executing the computer program.
  • an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and the computer program implements the method when executed by a processor.
  • an embodiment of the present application provides a computer program product that, when the computer program product runs on a terminal device, causes the terminal device to execute the method described in any one of the above-mentioned first aspects.
  • the terminal device obtains the road image of the current scene; the road image is input into the trained neural network model for processing, and the detection result of the lane line in the road image of the current scene is output
  • the neural network model after training is obtained by training according to the sample images in the training set and the semantic segmentation model, and the sample images in the training set include collected road images of multiple scenes and marked images corresponding to the road images of multiple scenes;
  • by training according to the sample images in the training set and the semantic segmentation model, the trained neural network model can be obtained, which solves the problems of low lane line recognition accuracy in complex environments and of the large calculation amount, high complexity and slow response of current lane line detection models; detection accuracy is improved while the real-time requirements of practical automatic driving scenarios are met, giving strong ease of use and practicability.
  • FIG. 1 is a schematic flowchart of an application scenario provided by an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of a method for detecting lane lines provided by an embodiment of the present application
  • FIG. 3 is a schematic diagram of the overall architecture of the neural network model after training provided by an embodiment of the present application
  • FIG. 4 is a schematic diagram of the architecture of a residual network layer provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of an atrous convolutional network layer provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a detector provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of the overall architecture for training the neural network model provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a segmenter provided by an embodiment of the present application.
  • FIG. 9 is a visual schematic diagram of a detection result of a lane line provided by an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a device for detecting lane lines provided by an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
  • the term “if” may be contextually interpreted as “when” or “once” or “in response to determining” or “in response to detecting “.
  • the phrases “if it is determined” or “if the [described condition or event] is detected” may be interpreted, depending on the context, to mean “once it is determined”, “in response to determining”, “once the [described condition or event] is detected”, or “in response to detecting the [described condition or event]”.
  • references in this specification to "one embodiment” or “some embodiments” and the like mean that a particular feature, structure or characteristic described in connection with the embodiment is included in one or more embodiments of the present application.
  • appearances of the phrases “in one embodiment,” “in some embodiments,” “in other embodiments,” etc. in various places in this specification do not necessarily all refer to the same embodiment, but mean “one or more but not all embodiments” unless specifically emphasized otherwise.
  • the terms “including”, “having” and their variants mean “including but not limited to” unless specifically emphasized otherwise.
  • lane line detection technology plays an irreplaceable role in the field of autonomous driving, and research on lane line detection technology is also emerging one after another.
  • traditional lane detection algorithm models are often based on visual cues in the image; for example, edge extraction is used to detect lanes and the operating efficiency of the algorithm is optimized.
  • in another example, after the image is converted through a Hue-Saturation-Intensity (HSI) color model into a representation of hue, saturation and brightness, a fuzzy mean clustering algorithm is used to process the pixels of each row to identify the lane lines; the algorithm model used is complex, and its calculation amount is large.
  • in related work, the lane line detection task is realized through a semantic segmentation model and an image classification model; however, the adopted model is relatively complex and requires a large amount of calculation, which is not conducive to the real-time requirements of practical automatic driving tasks; moreover, this model's processing of the data destroys the spatial structure of each part of the image and reduces the accuracy of lane line detection.
  • an embodiment of the present application proposes a method for detecting lane lines.
  • the lane lines are detected and recognized through a trained neural network model, and the task of lane recognition is regarded as a task of classifying the positions of pixels in each row of a picture.
  • analyzing the position of the lane line with the row as the unit makes better use of global and contextual information and optimizes the operating efficiency of the model.
  • during training, a model for the semantic segmentation task is built as a training aid; at the same time, to better handle the global structural features of the image, the loss function is adjusted to guide the model to attend to the continuity of lane lines; the neural network model used to detect lane lines is thereby improved.
  • the spatial structure of the data is maintained, the perception ability of the model is optimized, and the complexity of the model is reduced.
  • FIG. 1 is a schematic flowchart of an application scenario provided by an embodiment of the present application.
  • after the trained neural network model is obtained, the road image of the current scene is input into it; the trained convolutional neural network model performs feature extraction and feature learning, outputs the detection result of the lane line, and realizes the prediction of the position of the lane line.
  • an atrous convolutional network is introduced into the neural network model for detecting lane lines, and the classifier in the neural network model is optimized.
  • the atrous convolutional network optimizes the perception ability of the model, maintains the spatial structure of the data, reduces the complexity of the model, greatly reduces its computational load, and improves its response speed.
  • the semantic segmentation model is used as an auxiliary model for training, and the training process is optimized in combination with the atrous convolutional network, which greatly improves the training speed of the neural network model; at the same time, the classifier of the neural network model and the segmenter of the semantic segmentation model maintain the spatial structure of the feature map in the last layer of the network, reduce the number of parameters of the model, reduce the complexity of the loss function, and optimize the computational efficiency of the neural network model.
  • the trained neural network model still achieves high detection accuracy for images collected under poor lighting conditions or under occlusion conditions.
  • model training and model architecture are further introduced below with reference to the implementation steps of the method for detecting lane lines provided by the embodiments of the present application.
  • FIG. 2 is a schematic flowchart of a method for detecting lane lines provided by an embodiment of the present application; in application, the method for detecting lane lines includes the following steps:
  • Step S201: obtain a road image of the current scene.
  • the terminal device may use a camera to capture a road image of the current driving scene of the vehicle; the captured road image includes a road image in front of, behind or on the side of the vehicle.
  • the terminal device may be an in-vehicle device, which is respectively connected to the camera and the control system for automatic driving of the vehicle.
  • the terminal device can control the camera to collect the road image of the scene where the vehicle is located in real time or in a preset period according to the requirements of the vehicle driving scene.
  • the road image may include continuous lane lines, partially occluded lane lines, or no lane lines.
  • the characteristics of the lane lines are preset in the terminal device, which provides a prediction basis for the detection of the lane lines in the road image.
  • the width of the lane lines includes 10 cm, 15 cm and 20 cm; the lane lines are divided into solid lines and dotted lines, and the colors include yellow and white.
  • Step S202: input the road image into the trained neural network model for processing, and output the detection result of the lane lines in the road image of the current scene; wherein the trained neural network model is obtained by training according to the sample images in the training set and the semantic segmentation model, and the sample images in the training set include collected road images of multiple scenes and marked images corresponding to the road images of the multiple scenes.
  • the processing of road images by the neural network model is the actual detection process of lane lines; the lane lines actually detected have different line widths, line types and colors in various environments; the task of lane line detection is to identify lane lines in various environments, and its goal is to determine the location and direction of the lane lines.
  • the processing of the road image by the trained neural network model includes the extraction and detection of target features.
  • the trained neural network model includes a residual network layer, an atrous convolutional network layer, an upsampling network layer, and a detector. Input the road image into the trained neural network model for processing, including:
  • the road image is input into the residual network layer, and after the convolution processing of the residual network layer, the first feature map containing semantic features is output; the first feature map is input into the atrous convolutional network layer, and the second feature map containing detailed features is output; the second feature map is input into the up-sampling network layer, and the third feature map is output after the up-sampling processing of the up-sampling network layer; the third feature map is input into the detector, and after the convolution processing of the detector, the detection results of the lane lines in the road image of the current scene are output.
  • FIG. 3 is a schematic diagram of the overall architecture of the trained neural network model provided by the embodiment of the present application; the trained neural network model is a classification detection model.
  • the classification detection model adopts the residual network layer of the residual neural network ResNet-34 (Residual Neural Network-34); the input road image is convolved through the residual network layer to extract the semantic features in the road image and obtain the first feature map.
  • an atrous convolutional network layer is also introduced; the input first feature map is sampled and classified through the atrous convolutional network layer and, while a receptive field of the preset size is ensured, multiple atrous convolution kernels perform convolution processing on the first feature map, extracting the detailed features and global features of the first feature map to obtain the second feature map.
  • the upsampling network layer in the classification detection model performs upsampling processing on the input second feature map, and outputs the sampled feature map, that is, the third feature map.
  • the detector of the classification detection model performs two-dimensional convolution processing on the input third feature map, and outputs the recognition result of the lane line.
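  • A shape-level PyTorch sketch of this four-stage pipeline may help fix the data flow; the wrapper below takes the three submodules as arguments, and hedged sketches of each are given later in this section. The ×2 upsampling factor and all names are illustrative assumptions, not the patent's implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LaneClassificationModel(nn.Module):
    """Shape-level sketch of the classification detection model pipeline:
    residual layer -> atrous layer -> bilinear upsampling -> detector."""
    def __init__(self, residual_layer: nn.Module, atrous_layer: nn.Module,
                 detector: nn.Module):
        super().__init__()
        self.residual_layer = residual_layer  # outputs the first feature map
        self.atrous_layer = atrous_layer      # outputs the second feature map
        self.detector = detector              # outputs the classification result

    def forward(self, road_image: torch.Tensor) -> torch.Tensor:
        f1 = self.residual_layer(road_image)    # semantic features
        f2 = self.atrous_layer(f1)              # detailed + global features
        f3 = F.interpolate(f2, scale_factor=2,  # up-sampling network layer
                           mode="bilinear", align_corners=False)
        return self.detector(f3)  # shape [N, num_lanes, num_rows, num_cols + 1]
```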
  • the residual network layer includes a first residual module, a second residual module and a third residual module; inputting the road image into the residual network layer and outputting, after the convolution processing of the residual network layer, the first feature map containing semantic features includes:
  • the road image is input into the first residual module, and the first result is obtained through convolution processing by the first residual module; the first result is input into the second residual module, and the second result is obtained through convolution processing by the second residual module; the second result is input into the third residual module, and the first feature map is output after the convolution processing of the third residual module.
  • FIG. 4 is a schematic structural diagram of a residual network layer provided by an embodiment of the present application.
  • the residual network layer in the classification detection model includes three residual modules, namely the first residual module, the second residual module and the third residual module.
  • each residual module includes six 3×3 convolutional layers with edge padding, and the result after the six edge-padded 3×3 convolutional layers is superimposed with the input feature map to obtain the output of each residual module.
  • the residual module in the ResNet-34 network is used to extract and transform the features of the road image.
  • (b) in Figure 4 shows the structure of the residual block.
  • the residual block uses multiple stacked convolutional layers to process the input feature map to analyze and extract the effective information of the image more deeply.
  • the final convolution result is accumulated with the original input, which mitigates the vanishing gradients that may arise when the number of layers is too deep.
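  • A minimal PyTorch sketch of one such residual module follows; the uniform channel count and the BatchNorm/ReLU placement are assumptions, since the text only specifies six edge-padded 3×3 convolutions plus the skip connection:

```python
import torch
import torch.nn as nn

class ResidualModule(nn.Module):
    """One residual module: six edge-padded 3x3 convolutions whose result
    is superimposed with the module input (the skip connection)."""
    def __init__(self, channels: int):
        super().__init__()
        layers = []
        for _ in range(6):  # six 3x3 convolutions with edge padding
            layers += [nn.Conv2d(channels, channels, kernel_size=3, padding=1),
                       nn.BatchNorm2d(channels),
                       nn.ReLU(inplace=True)]
        self.body = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # accumulating with the original input mitigates vanishing gradients
        return x + self.body(x)
```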
  • the identification result of the lane line includes the identified lanes and the positions of the lanes on each row of pixels or pixel units in the image, i.e., determining, for each row, which column each lane falls in, or that the lane does not appear in that row.
  • the atrous convolutional network layer includes a first convolution module, a second convolution module, a third convolution module, a fourth convolution module, and a global average pooling module; inputting the first feature map into the atrous convolutional network layer and extracting features through the atrous convolutional network layer includes:
  • the first feature map is input into the first convolution module, the second convolution module, the third convolution module, the fourth convolution module and the global average pooling module respectively, and feature extraction is performed on the first feature map.
  • the atrous convolutional network layer uses convolution kernels of different sizes to perform convolution processing on the feature maps.
  • FIG. 5 is a schematic structural diagram of an atrous convolutional network layer provided by an embodiment of the present application.
  • the atrous spatial pyramid pooling module and the global average pooling module in the atrous convolutional network layer are shown in (a) of FIG. 5.
  • the atrous convolution of each size and sampling rate performs convolution processing on the input first feature map, and extracts detailed features of different channel dimensions in the first feature map.
  • the global features of the input first feature map are extracted through the global average pooling module of the atrous convolutional network layer.
  • the first convolution module is a 1×1 convolution
  • the first feature map is input into the atrous convolutional network layer, and after the feature extraction of the atrous convolutional network layer, the method further includes:
  • the feature maps output by the first convolution module, the second convolution module, the third convolution module, the fourth convolution module and the global average pooling module are spliced in the channel dimension to obtain the spliced feature map; after the spliced feature map is processed by a 1×1 convolution, the second feature map is output.
  • the feature map of the global average pooling branch is output after upsampling and 1×1 convolution processing.
  • the feature map output by the global average pooling module and the feature maps output by the atrous convolutions and the 1×1 convolution of each size and sampling rate are spliced in the channel dimension, and the spliced feature map is output; the spliced feature map is then passed through a layer of 1×1 convolution, which outputs the second feature map.
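  • A hedged PyTorch sketch of this layer, assuming the sampling rates 1, 3 and 5 mentioned later in the text and arbitrary channel counts:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AtrousConvLayer(nn.Module):
    """Sketch of the atrous convolutional network layer: a 1x1 convolution,
    three 3x3 atrous convolutions, and a global-average-pooling branch,
    spliced on the channel dimension and fused by a final 1x1 convolution."""
    def __init__(self, in_ch: int = 256, branch_ch: int = 64):
        super().__init__()
        self.conv1x1 = nn.Conv2d(in_ch, branch_ch, kernel_size=1)
        self.atrous = nn.ModuleList([
            nn.Conv2d(in_ch, branch_ch, 3, padding=r, dilation=r)
            for r in (1, 3, 5)  # padding = rate keeps input/output sizes equal
        ])
        self.gap_proj = nn.Conv2d(in_ch, branch_ch, kernel_size=1)
        self.fuse = nn.Conv2d(branch_ch * 5, in_ch, kernel_size=1)

    def forward(self, f1: torch.Tensor) -> torch.Tensor:
        h, w = f1.shape[-2:]
        branches = [self.conv1x1(f1)] + [conv(f1) for conv in self.atrous]
        g = F.adaptive_avg_pool2d(f1, 1)              # global generalization
        g = F.interpolate(self.gap_proj(g), size=(h, w),
                          mode="bilinear", align_corners=False)
        spliced = torch.cat(branches + [g], dim=1)    # channel-dimension splicing
        return self.fuse(spliced)                     # the second feature map
```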
  • the atrous convolution processing in the atrous convolutional network layer can effectively expand the convolution receptive field while reducing the parameter amount of the classification detection model, thereby optimizing the operating efficiency of the model.
  • the left side shows the general structure of the atrous convolution kernel.
  • the atrous convolution kernel assigns actual weights only to some positions in the kernel, such as the positions of the small black squares, and ignores the input at the remaining positions.
  • the sampling interval of atrous convolution is called the sampling rate.
  • ordinary convolution is atrous convolution with a sampling rate of 1, i.e., a special case of atrous convolution.
  • the use of atrous convolution does not cause possible loss of image information due to the gaps between the actual weights of the convolution kernel: after feature map 2 is output, feature map 2 continues to undergo multi-layer ordinary convolution processing, so the gaps between the actual weights of the atrous convolution are filled.
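  • A short runnable check of the two properties used here: with padding equal to the sampling rate, a 3×3 atrous convolution preserves the spatial size while covering an effective window of 2r + 1. The tensor sizes below are arbitrary:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 8, 36, 100)  # arbitrary [N, C, H, W]
for rate in (1, 3, 5):
    conv = nn.Conv2d(8, 8, kernel_size=3, padding=rate, dilation=rate)
    assert conv(x).shape == x.shape  # input and output sizes unchanged
    print(f"sampling rate {rate}: effective window {2 * rate + 1}x{2 * rate + 1}")
```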
  • the method includes:
  • the second feature map is subjected to upsampling processing, and the result of the upsampling processing is input into the detector.
  • the classification prediction result of the road image of the current scene is output, and the classification prediction result is used to indicate the position of the lane in the road image of the current scene.
  • FIG. 6 is a schematic structural diagram of a detector provided by an embodiment of the present application.
  • after the second feature map is input into the upsampling network layer and sampled, a sampling feature map of shape [number of lanes, number of rows, number of columns + 1] is output, and the sampling feature map then passes through two layers of edge-padded atrous convolution.
  • the final classification result of shape [number of lanes, number of rows, number of columns + 1] is output; the shape of the classification result is the same as that of the input sampling feature map, and it represents each lane, locates each lane on each row of the image, and determines which column of the row each lane is located in, or that the lane does not appear in that row.
  • the extraction and transformation of multi-size features greatly simplifies the detector of the classification detection model: bilinear interpolation and 3×3 two-dimensional convolution directly perform feature adjustment on the feature output of the atrous convolution to obtain the final classification output, which further improves the operating efficiency of the model.
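  • A hedged sketch of such a detector, together with a decoding helper that reads the [number of lanes, number of rows, number of columns + 1] output (the extra column index standing for "lane absent in this row"); the channel counts and dilation rate are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Detector(nn.Module):
    """Bilinear upsampling to [num_lanes, num_rows, num_cols + 1], followed
    by two edge-padded atrous convolutions that preserve that shape."""
    def __init__(self, in_ch: int, num_lanes: int, num_rows: int, num_cols: int):
        super().__init__()
        self.num_rows, self.num_cols = num_rows, num_cols
        self.project = nn.Conv2d(in_ch, num_lanes, kernel_size=1)
        self.refine = nn.Sequential(
            nn.Conv2d(num_lanes, num_lanes, 3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv2d(num_lanes, num_lanes, 3, padding=2, dilation=2),
        )

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        x = F.interpolate(self.project(f), size=(self.num_rows, self.num_cols + 1),
                          mode="bilinear", align_corners=False)
        return self.refine(x)  # [N, num_lanes, num_rows, num_cols + 1]

def decode(logits: torch.Tensor, num_cols: int) -> torch.Tensor:
    """Per lane and per row, argmax over the column bins; bin index num_cols
    means the lane does not appear in that row (encoded here as -1)."""
    cols = logits.argmax(dim=-1)  # [N, num_lanes, num_rows]
    return torch.where(cols < num_cols, cols, torch.full_like(cols, -1))
```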
  • the road image is processed by multi-layer residual network blocks and outputs a feature map; after a layer of 1×1 convolution processing, the feature map is processed by the atrous spatial pyramid pooling (ASPP) in the atrous convolutional network layer in the middle of the classification detection model: it is processed by three atrous convolution blocks with sampling rates of 1, 3, and 5, and another 1×1 convolution. All convolution operations use padding to keep the input and output sizes unchanged.
  • the atrous convolutional network layer also adopts global average pooling to obtain a global generalization of the input feature map, and restores it to the input size with the help of bilinear interpolation upsampling and 1×1 convolution.
  • the second feature map is output after a 1×1 convolution adjusts the dimension features. Atrous convolution processing is thus used in the middle and at the end of the classification detection model to comprehensively extract and analyze image features in multiple dimensions; the last-layer features in the classification detection model are no longer straightened; instead, the spatial structure of the last-layer feature map is maintained and the final model output is generated with the help of the improved detector, which greatly reduces the number of parameters of the model and improves its operating efficiency.
  • an auxiliary model based on semantic segmentation is used for training.
  • the method includes: marking, in the collected road images of multiple scenes, the pixel coordinates of the lane lines, the type of the lane lines, and whether the current lane is a drivable lane, to obtain the marked images corresponding to the road images of the multiple scenes in the sample images.
  • road images of different scenes are collected, the lane lines in the road images are marked, regions of interest are extracted from the road images and down-sampling operations are performed, and data sets with different annotation formats are processed by a preset processing function to obtain the lane line data set required for training the neural network model, that is, the marked images corresponding to the road images of multiple scenes in the sample images.
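  • A minimal sketch of such a preprocessing step, assuming OpenCV is available; the function name, the `roi_top` crop boundary and the output size are hypothetical parameters for illustration, not values from the patent:

```python
import cv2  # assumed available; any crop/resize routine would do
import numpy as np

def preprocess(frame: np.ndarray, roi_top: int, out_h: int, out_w: int) -> np.ndarray:
    """Crop the region of interest (the road area below row roi_top)
    and downsample it to a fixed size for training."""
    roi = frame[roi_top:, :, :]  # region-of-interest extraction
    return cv2.resize(roi, (out_w, out_h), interpolation=cv2.INTER_AREA)
```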
  • the neural network model is trained according to the sample images in the training set and the semantic segmentation model, including:
  • the road images of multiple scenes in the sample images are input into the residual network layer of the neural network model, and after the convolution processing of the residual network layer, the fourth feature map of the road image is output; the fourth feature map is input into the atrous convolutional network layer of the neural network model and, after feature extraction by the atrous convolutional network layer, the fifth feature map is output; the fifth feature map is up-sampled, the result of the up-sampling processing is input into the detector of the neural network model, and after the convolution processing of the detector, the classification prediction results of the road images of the multiple scenes are output, the classification prediction results being used to indicate the positions of the lanes in the road images of the multiple scenes.
  • the residual network layer of the neural network model includes three residual modules, and the method includes:
  • the fifth feature map is up-sampled to obtain the sixth feature map; the road images of multiple scenes in the sample images are input into the residual network layer, and after the convolution processing of the first two residual modules of the residual network layer, the seventh feature map is output; the seventh feature map is input into the semantic segmentation model, the lane line pixels in the seventh feature map are segmented through the semantic segmentation model, and the lane line instance feature map is output; the lane line instance feature map and the sixth feature map are spliced to obtain a spliced image.
  • the method includes: inputting the spliced image into a segmenter and, after convolution processing by the segmenter, outputting a semantic segmentation prediction result, where the semantic segmentation prediction result is used to indicate the lane recognition results in the road images of the multiple scenes.
  • FIG. 7 is a schematic diagram of the overall architecture for training the neural network model provided by the embodiment of the present application; the classification detection model in the upper half is the model used in the lane line detection process, and its output serves as the final result of the task in the prediction stage; the semantic segmentation model in the lower half only participates in the training process and is used to guide the model toward a more accurate recognition effect.
  • the road images of multiple scenes in the sample image are input into the residual network layer of the neural network model.
  • after the convolution processing of the residual network layer, the semantic features of the road image are extracted to obtain the fourth feature map.
  • the fourth feature map is input into the atrous convolutions of each size and sampling rate in the atrous convolutional network layer, the 1×1 convolution, and the global average pooling for convolution processing, to extract different detailed features and global features.
  • after the channel-dimension splicing and the 1×1 convolution processing, the fifth feature map is output.
  • the feature map of the road image is sampled and classified by the atrous convolutional network layer in the neural network model, which ensures a larger receptive field and detail perception ability while reducing the number of parameters of the model.
  • global feature extraction is performed on the feature map through the global average pooling in the atrous convolutional network layer, and the feature map is output after upsampling and 1×1 convolution.
  • the feature maps output by the atrous convolution kernels and the feature map output by the global average pooling are spliced in the channel dimension, and the fifth feature map is output after a layer of 1×1 convolution processing.
  • the output of the second residual module of the neural network model (e.g., ResNet-34) is input into the semantic segmentation model, and the semantic segmentation model performs pixel-level segmentation on the output of the second residual module and outputs the lane line instance feature map.
  • the fifth feature map output by the atrous convolutional network layer is subjected to up-sampling processing, and the sixth feature map is output.
  • the lane line instance feature map and the sixth feature map are spliced to obtain a spliced image.
  • the spliced image is input into a segmenter, and the segmenter performs two-dimensional convolution processing to output a semantic segmentation prediction result of shape [number of lanes + 1, number of rows, number of columns], indicating, for each basic pixel region in the image, which lane it belongs to, or that it belongs to no lane.
  • FIG. 8 is a schematic structural diagram of a segmenter provided by an embodiment of the present application.
  • the network before the segmenter upsamples the feature map to the shape [number of lanes + 1, number of rows, number of columns]; after two layers of edge-padded atrous convolution and one layer of ordinary 3×3 convolution, the final semantic segmentation prediction result is output. Its shape is the same as that of the input feature map, [number of lanes + 1, number of rows, number of columns], indicating for each pixel region which lane it belongs to, or that it belongs to no lane.
  • the segmenter is the output part of the semantic segmentation model.
  • the neural network is trained using an auxiliary model based on semantic segmentation: after the result of the second residual module of the neural network model ResNet-34, processed by the semantic segmentation model, is spliced with the up-sampled output of the atrous convolutional network layer, the spliced result is directly processed by a two-dimensional-convolution-based semantic segmenter. The goal of this output is to determine the recognition result of each lane in the road images of multiple scenes, i.e., whether each part of the image belongs to a lane and which lane it belongs to, thereby performing the semantic segmentation task.
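  • A hedged PyTorch sketch of this segmenter; the channel counts and the dilation rate of the edge-padded atrous convolutions are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Segmenter(nn.Module):
    """Upsamples the spliced feature map to [num_lanes + 1, num_rows, num_cols],
    then applies two edge-padded atrous convolutions and one ordinary 3x3
    convolution to produce the per-pixel semantic segmentation prediction."""
    def __init__(self, in_ch: int, num_lanes: int, num_rows: int, num_cols: int):
        super().__init__()
        self.size = (num_rows, num_cols)
        ch = num_lanes + 1  # one channel per lane plus "no lane"
        self.project = nn.Conv2d(in_ch, ch, kernel_size=1)
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1),  # ordinary 3x3 convolution
        )

    def forward(self, spliced: torch.Tensor) -> torch.Tensor:
        x = F.interpolate(self.project(spliced), size=self.size,
                          mode="bilinear", align_corners=False)
        return self.body(x)  # [N, num_lanes + 1, num_rows, num_cols]
```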
  • the first loss function is used to calculate the first error value of the classification prediction result of the neural network model relative to the prediction probability of the marked image in the sample image, and the parameters of the neural network model are adjusted by the first error value;
  • the first loss function is represented as follows:
  • y is the classification truth value of the road images of the multiple scenes in the sample images in the training set
  • p is the predicted probability output after the road images of the multiple scenes in the sample images in the training set are processed by the neural network model
  • α is the preset weight value
  • the classification detection model adopts Focal Loss as its first loss function in the training process.
  • y refers to the classification truth value of the sample
  • p refers to the predicted probability of the sample
  • α is the set weight.
  • Focal Loss can attach larger weights to hard-to-classify and severely misclassified samples, making the model focus more on such samples during training.
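  • The rendered formula does not survive in this text; the standard binary Focal Loss, consistent with the variable definitions above (y the truth value, p the predicted probability, α the preset weight, γ the focusing exponent), is:

```latex
\mathrm{FL}(p, y) = -\,\alpha\, y\,(1-p)^{\gamma}\log p \;-\;(1-\alpha)\,(1-y)\, p^{\gamma}\log(1-p) \tag{1}
```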
  • a second error value of the lane recognition result of the semantic segmentation model relative to the predicted probability of the marked image in the sample image is calculated by a second loss function, and the parameters of the neural network model are adjusted by the second error value; the second loss function is represented as follows:
  • y is the recognition truth value of the road images of the multiple scenes in the sample images in the training set
  • p is the predicted probability output after the road images of the multiple scenes in the sample images in the training set are processed by the semantic segmentation model.
  • the semantic segmentation model adopts the cross entropy loss function (Cross Entropy Loss) as the second loss function; the calculation formula of Cross Entropy Loss is shown in formula (2), where y refers to the classification truth value of the sample and p refers to the predicted probability of the sample.
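  • Formula (2) likewise does not survive in this text; the standard binary Cross Entropy Loss matching these definitions is:

```latex
\mathrm{CE}(p, y) = -\,y \log p \;-\;(1-y)\log(1-p) \tag{2}
```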
  • the straightening operation is no longer performed on the feature map of the last layer in the classification detection model.
  • a structured loss function that can constrain the spatial features of the lane lines is used for processing, and the output feature map of the atrous convolutional network layer is directly subjected to bilinear interpolation upsampling and two-dimensional convolution processing; the spatial structure of the feature map is maintained, which reduces the training burden of the model.
  • the parameter quantity of the model is optimized and its operating efficiency is improved; the spatial structure of the feature map is maintained at the end of the model, which is conducive to analyzing the overall characteristics of the image; the use of atrous spatial pyramid pooling optimizes the feature analysis in the second half of the model, enlarging the receptive field of the convolution kernel without increasing the training burden, and multiple convolution kernels of different sizes analyze different features of the feature map, enhancing the generalization of the classification and semantic segmentation tasks and improving detection accuracy.
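  • As a hedged illustration of how the two losses described above might be combined during training, a minimal sketch follows; the softmax axis, the α and γ defaults and the weighting `lam` are illustrative assumptions, not values from the patent:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, target: torch.Tensor,
               alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """Focal Loss over the column bins of the classifier output
    ([N, lanes, rows, cols + 1] logits, [N, lanes, rows] integer targets)."""
    logp = F.log_softmax(logits, dim=-1)
    logp_t = logp.gather(-1, target.unsqueeze(-1)).squeeze(-1)
    p_t = logp_t.exp()
    # down-weights well-classified samples, focusing training on hard ones
    return (-alpha * (1.0 - p_t) ** gamma * logp_t).mean()

def total_loss(cls_logits, cls_target, seg_logits, seg_target, lam=1.0):
    """Classification branch trained with Focal Loss; auxiliary semantic
    segmentation branch trained with Cross Entropy Loss."""
    return focal_loss(cls_logits, cls_target) + lam * F.cross_entropy(seg_logits, seg_target)
```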
  • experimental verification was performed on the same data set as the traditional neural network model, and the comparison with the traditional neural network model in terms of training speed, convergence speed and detection accuracy shows significant progress.
  • Table 1 compares the detection accuracy and detection speed of the trained neural network model provided by the embodiment of the present application with those of the traditional original model.
  • the correct rate refers to the rate of correct recognition of lane line pixels by the model on an image of 800×288 pixels.
  • the running speed refers to the time it takes for the model to process a batch of 16 images.
  • Model                          Correct rate    Running speed
    Trained neural network model   92.96%          32 ms/batch
    Traditional original model     92.04%          60 ms/batch
  • an embodiment of the present application provides a visual schematic diagram of a detection result of a lane line.
  • FIG. 9 shows the detection results of the trained neural network model provided by the embodiments of the present application on two sets of test images. As shown in (a) of FIG. 9, the trained neural network model provided by the embodiment of the present application can accurately identify multiple lane lines in the image; and as shown in (b) of FIG. 9, even when the lane line is blocked by obstacles, a good recognition effect is still maintained.
  • FIG. 10 shows a structural block diagram of the device for detecting lane lines provided by the embodiments of the present application.
  • the device includes:
  • an acquisition unit 101 used for acquiring a road image of the current scene
  • the processing unit 102 is configured to input the road image into the trained neural network model for processing, and to output the detection result of the lane line in the road image of the current scene; wherein the trained neural network model is obtained by training according to the sample images in the training set and the semantic segmentation model, and the sample images in the training set include collected road images of multiple scenes and marked images corresponding to the road images of the multiple scenes.
  • the terminal device obtains the road image of the current scene, inputs the road image into the trained neural network model for processing, and outputs the detection result of the lane line in the road image of the current scene; the trained neural network model is obtained by training according to the sample images in the training set, which include collected road images of multiple scenes and marked images corresponding to those road images, and the semantic segmentation model.
  • this solves the problems of low lane line recognition accuracy in complex environments and of the large calculation amount, high complexity and slow response of current lane line detection models; detection accuracy is improved while the real-time requirements of practical automatic driving scenarios are met, giving strong ease of use and practicability.
  • Embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the steps in the foregoing method embodiments can be implemented.
  • the embodiments of the present application provide a computer program product; when the computer program product runs on a mobile terminal, the mobile terminal implements the steps in the foregoing method embodiments.
  • FIG. 11 is a schematic structural diagram of a terminal device 11 according to an embodiment of the present application.
  • the terminal device 11 of this embodiment includes: at least one processor 110 (only one is shown in FIG. 11), a memory 111, and a computer program 112 stored in the memory 111 and executable on the at least one processor 110; the processor 110 implements the steps in any of the foregoing method embodiments when executing the computer program 112.
  • the terminal device 11 may be a computing device such as a desktop computer, a notebook, a palmtop computer, a vehicle-mounted device, and a cloud server.
  • the terminal device 11 may include, but is not limited to, a processor 110 and a memory 111 .
  • FIG. 11 is only an example of the terminal device 11 and does not constitute a limitation on the terminal device 11, which may include more or fewer components than shown, combine some components, or use different components; for example, it may also include input and output devices, network access devices, and the like.
  • the so-called processor 110 may be a central processing unit (Central Processing Unit, CPU); the processor 110 may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the memory 111 may be an internal storage unit of the terminal device 11 in some embodiments, such as a hard disk or a memory of the terminal device 11 . In other embodiments, the memory 111 may also be an external storage device of the terminal device 11, such as a plug-in hard disk equipped on the terminal device 11, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, flash memory card (Flash Card), etc. Further, the memory 111 may also include both an internal storage unit of the terminal device 11 and an external storage device.
  • the memory 111 is used to store an operating system, an application program, a boot loader (BootLoader), data, and other programs, such as program codes of the computer program and the like. The memory 111 may also be used to temporarily store data that has been output or will be output.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium.
  • the present application realizes all or part of the processes in the methods of the above embodiments, which can be completed by instructing the relevant hardware through a computer program, and the computer program can be stored in a computer-readable storage medium.
  • the computer program includes computer program code
  • the computer program code may be in the form of source code, object code, executable file or some intermediate form, and the like.
  • the computer-readable medium may include at least: any entity or device capable of carrying the computer program code to the photographing device/terminal device, recording medium, computer memory, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), electrical carrier signals, telecommunication signals, and software distribution media.
  • the software distribution media include, for example, a USB flash drive, a removable hard disk, a magnetic disk or an optical disc.
  • in some jurisdictions, according to legislation and patent practice, computer-readable media may not include electrical carrier signals and telecommunication signals.
  • the disclosed apparatus/network device and method may be implemented in other manners.
  • the apparatus/network device embodiments described above are only illustrative.
  • the division of the modules or units is only a logical functional division; in actual implementation there may be other division methods, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • the shown or discussed mutual coupling, direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a method and apparatus for detecting lane lines, a terminal device and a readable storage medium, in the technical field of computer vision and image processing. The method comprises the steps of: obtaining a road image of the current scene (S201); and inputting the road image into a trained neural network model for processing, and outputting a detection result of a lane line in the road image of the current scene, the trained neural network model being obtained by training according to sample images in a training set and a semantic segmentation model, and the sample images in the training set comprising collected road images of multiple scenes and marked images corresponding to the road images of the multiple scenes (S202). The method can solve the problems that most current deep learning models used for lane recognition involve a relatively large amount of computation, and that the models are relatively complex and can hardly satisfy the real-time performance requirement in a practical application scenario of an automatic driving task.
PCT/CN2020/136540 2020-12-15 2020-12-15 Method and apparatus for detecting lane lines, terminal device and readable storage medium WO2022126377A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/136540 WO2022126377A1 (fr) 2020-12-15 2020-12-15 Method and apparatus for detecting lane lines, terminal device and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/136540 WO2022126377A1 (fr) 2020-12-15 2020-12-15 Method and apparatus for detecting lane lines, terminal device and readable storage medium

Publications (1)

Publication Number Publication Date
WO2022126377A1 (fr)

Family

ID=82059813

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/136540 WO2022126377A1 (fr) 2020-12-15 2020-12-15 Method and apparatus for detecting lane lines, terminal device and readable storage medium

Country Status (1)

Country Link
WO (1) WO2022126377A1 (fr)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115082888A (zh) * 2022-08-18 2022-09-20 北京轻舟智航智能技术有限公司 一种车道线检测方法和装置
CN115147812A (zh) * 2022-07-05 2022-10-04 小米汽车科技有限公司 车道线检测方法、装置、车辆和存储介质
CN115187743A (zh) * 2022-07-29 2022-10-14 江西科骏实业有限公司 一种地铁站内部环境布置预测和白模采集方法及系统
CN115471803A (zh) * 2022-08-31 2022-12-13 北京四维远见信息技术有限公司 交通标识线的提取方法、装置、设备和可读存储介质
CN116071374A (zh) * 2023-02-28 2023-05-05 华中科技大学 一种车道线实例分割方法及系统
CN116129379A (zh) * 2022-12-28 2023-05-16 国网安徽省电力有限公司芜湖供电公司 一种雾天环境下的车道线检测方法
CN116229379A (zh) * 2023-05-06 2023-06-06 浙江大华技术股份有限公司 一种道路属性识别方法、装置、电子设备及存储介质
CN116453121A (zh) * 2023-06-13 2023-07-18 合肥市正茂科技有限公司 一种车道线识别模型的训练方法及装置
CN116543365A (zh) * 2023-07-06 2023-08-04 广汽埃安新能源汽车股份有限公司 一种车道线识别方法、装置、电子设备及存储介质
CN116935349A (zh) * 2023-09-15 2023-10-24 华中科技大学 一种基于Zigzag变换的车道线检测方法、系统、设备及介质
CN116994145A (zh) * 2023-09-05 2023-11-03 腾讯科技(深圳)有限公司 车道线变化点的识别方法、装置、存储介质及计算机设备
CN117081806A (zh) * 2023-08-18 2023-11-17 四川农业大学 一种基于特征提取的信道认证方法
CN117237286A (zh) * 2023-09-02 2023-12-15 国网山东省电力公司淄博供电公司 一种气体绝缘开关设备内部缺陷检测方法
CN117372983A (zh) * 2023-10-18 2024-01-09 北京化工大学 一种低算力的自动驾驶实时多任务感知方法及装置
CN118015286A (zh) * 2024-04-09 2024-05-10 杭州像素元科技有限公司 通过背景分割进行收费站车道通行状态检测的方法及装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108537197A (zh) * 2018-04-18 2018-09-14 吉林大学 一种基于深度学习的车道线检测预警装置及预警方法
US10423840B1 (en) * 2019-01-31 2019-09-24 StradVision, Inc. Post-processing method and device for detecting lanes to plan the drive path of autonomous vehicle by using segmentation score map and clustering map
CN110363770A (zh) * 2019-07-12 2019-10-22 安徽大学 一种边缘引导式红外语义分割模型的训练方法及装置
CN110490205A (zh) * 2019-07-23 2019-11-22 浙江科技学院 基于全残差空洞卷积神经网络的道路场景语义分割方法
CN111460921A (zh) * 2020-03-13 2020-07-28 华南理工大学 一种基于多任务语义分割的车道线检测方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108537197A (zh) * 2018-04-18 2018-09-14 吉林大学 一种基于深度学习的车道线检测预警装置及预警方法
US10423840B1 (en) * 2019-01-31 2019-09-24 StradVision, Inc. Post-processing method and device for detecting lanes to plan the drive path of autonomous vehicle by using segmentation score map and clustering map
CN110363770A (zh) * 2019-07-12 2019-10-22 安徽大学 一种边缘引导式红外语义分割模型的训练方法及装置
CN110490205A (zh) * 2019-07-23 2019-11-22 浙江科技学院 基于全残差空洞卷积神经网络的道路场景语义分割方法
CN111460921A (zh) * 2020-03-13 2020-07-28 华南理工大学 一种基于多任务语义分割的车道线检测方法

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115147812B (zh) * 2022-07-05 2023-05-12 小米汽车科技有限公司 车道线检测方法、装置、车辆和存储介质
CN115147812A (zh) * 2022-07-05 2022-10-04 小米汽车科技有限公司 车道线检测方法、装置、车辆和存储介质
CN115187743A (zh) * 2022-07-29 2022-10-14 江西科骏实业有限公司 一种地铁站内部环境布置预测和白模采集方法及系统
CN115082888B (zh) * 2022-08-18 2022-10-25 北京轻舟智航智能技术有限公司 一种车道线检测方法和装置
CN115082888A (zh) * 2022-08-18 2022-09-20 北京轻舟智航智能技术有限公司 一种车道线检测方法和装置
CN115471803A (zh) * 2022-08-31 2022-12-13 北京四维远见信息技术有限公司 交通标识线的提取方法、装置、设备和可读存储介质
CN115471803B (zh) * 2022-08-31 2024-01-26 北京四维远见信息技术有限公司 交通标识线的提取方法、装置、设备和可读存储介质
CN116129379B (zh) * 2022-12-28 2023-11-07 国网安徽省电力有限公司芜湖供电公司 一种雾天环境下的车道线检测方法
CN116129379A (zh) * 2022-12-28 2023-05-16 国网安徽省电力有限公司芜湖供电公司 一种雾天环境下的车道线检测方法
CN116071374A (zh) * 2023-02-28 2023-05-05 华中科技大学 一种车道线实例分割方法及系统
CN116071374B (zh) * 2023-02-28 2023-09-12 华中科技大学 一种车道线实例分割方法及系统
CN116229379A (zh) * 2023-05-06 2023-06-06 浙江大华技术股份有限公司 一种道路属性识别方法、装置、电子设备及存储介质
CN116229379B (zh) * 2023-05-06 2024-02-02 浙江大华技术股份有限公司 一种道路属性识别方法、装置、电子设备及存储介质
CN116453121A (zh) * 2023-06-13 2023-07-18 合肥市正茂科技有限公司 一种车道线识别模型的训练方法及装置
CN116453121B (zh) * 2023-06-13 2023-12-22 合肥市正茂科技有限公司 一种车道线识别模型的训练方法及装置
CN116543365B (zh) * 2023-07-06 2023-10-10 广汽埃安新能源汽车股份有限公司 一种车道线识别方法、装置、电子设备及存储介质
CN116543365A (zh) * 2023-07-06 2023-08-04 广汽埃安新能源汽车股份有限公司 一种车道线识别方法、装置、电子设备及存储介质
CN117081806A (zh) * 2023-08-18 2023-11-17 四川农业大学 一种基于特征提取的信道认证方法
CN117081806B (zh) * 2023-08-18 2024-03-19 四川农业大学 一种基于特征提取的信道认证方法
CN117237286A (zh) * 2023-09-02 2023-12-15 国网山东省电力公司淄博供电公司 一种气体绝缘开关设备内部缺陷检测方法
CN117237286B (zh) * 2023-09-02 2024-05-17 国网山东省电力公司淄博供电公司 一种气体绝缘开关设备内部缺陷检测方法
CN116994145A (zh) * 2023-09-05 2023-11-03 腾讯科技(深圳)有限公司 车道线变化点的识别方法、装置、存储介质及计算机设备
CN116935349B (zh) * 2023-09-15 2023-11-28 华中科技大学 一种基于Zigzag变换的车道线检测方法、系统、设备及介质
CN116935349A (zh) * 2023-09-15 2023-10-24 华中科技大学 一种基于Zigzag变换的车道线检测方法、系统、设备及介质
CN117372983A (zh) * 2023-10-18 2024-01-09 北京化工大学 一种低算力的自动驾驶实时多任务感知方法及装置
CN118015286A (zh) * 2024-04-09 2024-05-10 杭州像素元科技有限公司 通过背景分割进行收费站车道通行状态检测的方法及装置

Similar Documents

Publication Publication Date Title
WO2022126377A1 (fr) Procédé et appareil de détection de ligne de voie de circulation, dispositif terminal et support de stockage lisible
CN112528878B (zh) 检测车道线的方法、装置、终端设备及可读存储介质
CN112132156B (zh) 多深度特征融合的图像显著性目标检测方法及系统
WO2019169816A1 (fr) Réseau neuronal profond pour la reconnaissance précise d'attributs de véhicule, et son procédé d'apprentissage
WO2020103893A1 (fr) Procédé de détection de propriété de ligne de voie, dispositif, appareil électronique et support de stockage lisible
CN109543641B (zh) 一种实时视频的多目标去重方法、终端设备及存储介质
CN111738995B (zh) 一种基于rgbd图像的目标检测方法、装置及计算机设备
WO2023193401A1 (fr) Procédé et appareil de formation de modèle de détection de nuage de points, dispositif électronique et support de stockage
US11887346B2 (en) Systems and methods for image feature extraction
WO2021013227A1 (fr) Procédé et appareil de traitement d'image pour la détection de cible
CN116188999B (zh) 一种基于可见光和红外图像数据融合的小目标检测方法
EP4252148A1 (fr) Procédé de détection de ligne de voie basé sur un apprentissage profond et appareil
CN111178161A (zh) 一种基于fcos的车辆追踪方法及系统
CN110852327A (zh) 图像处理方法、装置、电子设备及存储介质
CN111191582A (zh) 三维目标检测方法、检测装置、终端设备及计算机可读存储介质
CN112395962A (zh) 数据增广方法及装置、物体识别方法及系统
CN115493612A (zh) 一种基于视觉slam的车辆定位方法及装置
CN109977862B (zh) 一种车位限位器的识别方法
WO2021083126A1 (fr) Procédés et appareils de détection de cible et de conduite intelligente, dispositif, et support d'informations
CN114898306B (zh) 一种检测目标朝向的方法、装置及电子设备
CN116052090A (zh) 图像质量评估方法、模型训练方法、装置、设备及介质
CN112446292B (zh) 一种2d图像显著目标检测方法及系统
CN112446230B (zh) 车道线图像的识别方法及装置
CN114359572A (zh) 多任务检测模型的训练方法、装置及终端设备
CN114170267A (zh) 目标跟踪方法、装置、设备及计算机可读存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20965395

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20965395

Country of ref document: EP

Kind code of ref document: A1