CN112528878A - Method and device for detecting lane line, terminal device and readable storage medium - Google Patents

Method and device for detecting lane line, terminal device and readable storage medium

Info

Publication number
CN112528878A
Authority
CN
China
Prior art keywords
convolution, road, network layer, images, module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011481079.6A
Other languages
Chinese (zh)
Other versions
CN112528878B (en)
Inventor
王磊
钟宏亮
马森炜
程俊
林佩珍
范筱媛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS
Priority to CN202011481079.6A
Publication of CN112528878A
Application granted
Publication of CN112528878B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V 20/588 Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application belongs to the technical field of computer vision and image processing, and provides a method and a device for detecting lane lines, a terminal device and a readable storage medium. The method comprises: acquiring a road image of a current scene; and inputting the road image into a trained neural network model for processing, and outputting a detection result of the lane lines in the road image of the current scene. The trained neural network model is obtained by training with sample images in a training set and a semantic segmentation model, where the sample images in the training set comprise collected road images of a plurality of scenes and labeled images corresponding to those road images. The method and device address the problems that most existing deep learning models for lane recognition are computationally expensive and structurally complex, and therefore fail to meet the real-time requirements of practical autonomous driving applications.

Description

Method and device for detecting lane line, terminal device and readable storage medium
Technical Field
The application belongs to the technical field of computer vision and image processing, and particularly relates to a method and device for detecting lane lines, a terminal device and a readable storage medium.
Background
With the rapid development of artificial intelligence and the automotive industry, automatic driving (fully autonomous driving or semi-automatic driver assistance) plays an important role in vehicle safety. Lane recognition is an important component of an autonomous driving system: its results provide a basis for the driving control system, and it plays an irreplaceable role in automatic parking, collision warning, unmanned driving and related fields.
In recent years, semantic segmentation models have performed well on lane recognition tasks, but they lack global and contextual information, so ordinary semantic segmentation models cannot handle lane recognition well under poor illumination, occluded lane lines and similar conditions. Moreover, most current deep learning models for lane recognition are computationally expensive and structurally complex, which is unfavorable for the real-time requirements of practical autonomous driving applications.
Disclosure of Invention
The embodiments of the application provide a method and a device for detecting lane lines, a terminal device and a readable storage medium, which can solve the problems that most existing deep learning models for lane recognition are computationally expensive and structurally complex, and therefore fail to meet the real-time requirements of practical autonomous driving applications.
In a first aspect, an embodiment of the present application provides a method for detecting a lane line, including:
acquiring a road image of a current scene; inputting the road image into a trained neural network model for processing, and outputting a detection result of the lane lines in the road image of the current scene; wherein the trained neural network model is obtained by training with sample images in a training set and a semantic segmentation model, and the sample images in the training set comprise collected road images of a plurality of scenes and labeled images corresponding to those road images.
In a possible implementation manner of the first aspect, the trained neural network model includes a residual network layer, an atrous (hole) convolution network layer, an upsampling network layer, and a detector; inputting the road image into the trained neural network model for processing comprises:
inputting the road image into the residual network layer, and outputting a first feature map containing semantic features after convolution processing by the residual network layer; inputting the first feature map into the atrous convolution network layer, and outputting a second feature map containing detail features after feature extraction by the atrous convolution network layer; inputting the second feature map into the upsampling network layer, and outputting a third feature map after upsampling; and inputting the third feature map into the detector, and outputting the detection result of the lane lines in the road image of the current scene after convolution processing by the detector.
In a possible implementation manner of the first aspect, the residual network layer includes a first residual module, a second residual module, and a third residual module; inputting the road image into the residual network layer and outputting the first feature map containing semantic features after convolution processing by the residual network layer includes:
inputting the road image into the first residual module, and obtaining a first result after convolution processing by the first residual module; inputting the first result into the second residual module, and obtaining a second result after convolution processing by the second residual module; and inputting the second result into the third residual module, and outputting the first feature map after convolution processing by the third residual module.
In a possible implementation manner of the first aspect, the atrous convolution network layer includes a first convolution module, a second convolution module, a third convolution module, a fourth convolution module, and a global average pooling module; inputting the first feature map into the atrous convolution network layer and performing feature extraction includes:
inputting the first feature map into the first convolution module, the second convolution module, the third convolution module, the fourth convolution module and the global average pooling module respectively, and performing feature extraction on the first feature map.
In a possible implementation manner of the first aspect, after inputting the first feature map into the atrous convolution network layer and performing feature extraction, the method further includes:
concatenating, on the channel dimension, the feature maps output by the first convolution module, the second convolution module, the third convolution module, the fourth convolution module and the global average pooling module to obtain a concatenated feature map; and outputting the second feature map after applying a 1 × 1 convolution to the concatenated feature map.
In a possible implementation manner of the first aspect, after feature extraction is performed on the first feature map by the atrous convolution network layer, the method includes:
performing upsampling on the second feature map, inputting the upsampled result into the detector, and outputting a classification prediction result for the road image of the current scene after convolution processing by the detector, wherein the classification prediction result indicates the positions of the lanes in the road image of the current scene.
In one possible implementation manner of the first aspect, the method includes:
labeling the collected road images of the plurality of scenes with lane line pixel coordinates, lane line types, and whether the current lane is drivable, to obtain the labeled images corresponding to the road images of the plurality of scenes in the sample images.
In a possible implementation manner of the first aspect, training the neural network model according to the sample images in the training set and the semantic segmentation model includes:
inputting the road images of the plurality of scenes in the sample images into a residual network layer of the neural network model, and outputting a fourth feature map of the road images after convolution processing by the residual network layer; inputting the fourth feature map into an atrous convolution network layer of the neural network model, and outputting a fifth feature map after feature extraction by the atrous convolution network layer; and performing upsampling on the fifth feature map, inputting the upsampled result into a detector of the neural network model, and outputting classification prediction results for the road images of the plurality of scenes after convolution processing by the detector, wherein the classification prediction results indicate the positions of the lanes in the road images of the plurality of scenes.
In one possible implementation manner of the first aspect, the residual network layer of the neural network model includes three residual modules, and the method includes:
performing upsampling on the fifth feature map to obtain a sixth feature map; inputting the road images of the plurality of scenes in the sample images into the residual network layer, and outputting a seventh feature map after convolution processing by the first two residual modules of the residual network layer; inputting the seventh feature map into the semantic segmentation model, segmenting the lane line pixels in the seventh feature map through the semantic segmentation model, and outputting a lane line instance feature map; and concatenating the lane line instance feature map and the sixth feature map to obtain a stitched image.
In one possible implementation manner of the first aspect, the method includes:
inputting the stitched image into a segmenter, and outputting a semantic segmentation prediction result after convolution processing by the segmenter, wherein the semantic segmentation prediction result indicates the lane recognition results in the road images of the plurality of scenes.
In one possible implementation manner of the first aspect, the method includes:
calculating, through a first loss function, a first error value of the classification prediction result of the neural network model relative to the prediction probability for the labeled images in the sample images, and adjusting the parameters of the neural network model through the first error value; the first loss function is expressed as formula (1):

$$ \mathrm{FL}(p, y) = -y\,(1-p)^{\gamma}\,\log(p) - (1-y)\,p^{\gamma}\,\log(1-p) \tag{1} $$

where y is the classification ground-truth value of the road images of the scenes in the sample images in the training set, p is the prediction probability for those road images after processing by the neural network model, and γ is a preset weight value.
In one possible implementation manner of the first aspect, the method includes:
calculating, through a second loss function, a second error value of the lane recognition result of the semantic segmentation model relative to the prediction probability for the labeled images in the sample images, and adjusting the parameters of the neural network model through the second error value; the second loss function is expressed as formula (2):

$$ \mathrm{CE}(p, y) = -y\,\log(p) - (1-y)\,\log(1-p) \tag{2} $$

where y is the recognition ground-truth value of the road images of the scenes in the sample images in the training set, and p is the prediction probability for those road images after processing by the semantic segmentation model.
In a second aspect, an embodiment of the present application provides an apparatus for detecting a lane line, including:
the acquisition unit is used for acquiring a road image of a current scene;
the processing unit is used for inputting the road image into the trained neural network model for processing and outputting the detection result of the lane lines in the road image of the current scene; the trained neural network model is obtained by training with sample images in a training set and a semantic segmentation model, wherein the sample images in the training set comprise collected road images of a plurality of scenes and labeled images corresponding to those road images.
In a third aspect, an embodiment of the present application provides a terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the method when executing the computer program.
In a fourth aspect, the present application provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the method.
In a fifth aspect, the present application provides a computer program product, which when run on a terminal device, causes the terminal device to execute the method of any one of the above first aspects.
It is understood that, for the beneficial effects of the second to fifth aspects, reference may be made to the related description of the first aspect, and details are not repeated here.
Compared with the prior art, the embodiments of the application have the following advantages. The terminal device acquires a road image of the current scene, inputs the road image into the trained neural network model for processing, and outputs the detection result of the lane lines in the road image of the current scene; the trained neural network model is obtained by training with sample images in a training set and a semantic segmentation model, where the sample images comprise collected road images of a plurality of scenes and the corresponding labeled images. Because the trained neural network model is obtained with the assistance of the semantic segmentation model, the approach addresses the low accuracy of lane line recognition in complex environments, as well as the large computational cost and slow response caused by the complexity of current lane line detection models; it improves detection precision, meets the real-time requirements of practical autonomous driving applications, and has strong usability and practicability.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without inventive effort.
Fig. 1 is a schematic flowchart of an application scenario provided in an embodiment of the present application;
fig. 2 is a schematic flowchart of a method for detecting a lane line according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an overall architecture of a trained neural network model according to an embodiment of the present application;
fig. 4 is a schematic diagram of an architecture of a residual network layer provided in an embodiment of the present application;
FIG. 5 is a schematic structural diagram of the atrous convolution network layer provided in an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a detector provided in an embodiment of the present application;
FIG. 7 is a schematic diagram of the overall architecture of the trained neural network model, including the auxiliary semantic segmentation branch used in training, provided in an embodiment of the present application;
FIG. 8 is a schematic structural diagram of the segmenter provided in an embodiment of the present application;
FIG. 9 is a schematic view of lane line detection results provided in an embodiment of the present application;
fig. 10 is a schematic structural diagram of an apparatus for detecting lane lines according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "upon", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
At present, lane line detection technology plays an irreplaceable role in the field of autonomous driving, and research on it is extensive. Some approaches are based on the Hough transform, combining factors such as road surface type and weather, detecting lanes through edge extraction and optimizing the running efficiency of the algorithm. Traditional lane detection algorithms usually identify lane lines from visual cues in images: the image is converted through a color model into a representation of hue, saturation and intensity (HSI), and a fuzzy mean clustering algorithm then processes the pixels of each row to identify the lane lines; such algorithm models are complex, and their computational cost is large. Deep learning semantic segmentation approaches add large numbers of samples collected under severe weather conditions to lane line recognition datasets and propose multi-task models based on vanishing point analysis; other work builds lane line recognition models with knowledge extraction based on multiple attention mechanisms, or introduces long short-term memory networks to handle the elongated shape of lane lines; however, traditional convolutional neural networks cannot analyze well the spatial relationship between rows and columns in the image.
In addition, for the lane line detection task, detecting from a single semantic segmentation perspective does optimize the model; however, the image is flattened during classification, which destroys the spatial structure of the data, so the global and contextual information of the image cannot be analyzed well.
Furthermore, fusing the semantic segmentation and image classification tasks realizes lane line detection with a semantic segmentation model plus an image classification model; but the resulting model is complex and computationally expensive, which is unfavorable for the real-time requirements of practical autonomous driving applications, and the way the model processes the data destroys the spatial structure of parts of the image, reducing the accuracy of lane line detection.
Based on the above problems, the embodiments of the application provide a method for detecting lane lines in which a trained neural network model detects and recognizes the lane lines. The lane recognition task is treated as classifying the position of the lane within each row of pixels of the picture: lane line positions are analyzed row by row, global and contextual information is analyzed well, and the running efficiency of the model is optimized at the same time. In addition, during training a semantic segmentation model is constructed as a training aid; meanwhile, in order to better handle the global structural features of the image, the loss function is also adjusted specifically, so as to guide the model to pay better attention to the continuity of lane lines. The neural network model for detecting lane lines is thus improved: the spatial structure of the data is kept during classification and detection, the perception capability of the model is optimized, and the complexity of the model is reduced.
Fig. 1 is a schematic flowchart of an application scenario provided in an embodiment of the present application. After the neural network model is trained with the assistance of the semantic segmentation model, the trained neural network model is obtained; the road image of the current scene is input into the trained neural network model, which performs feature extraction and feature learning and outputs the detection result of the lane lines, realizing the prediction of lane line positions.
In the embodiments of the application, an atrous convolution network is introduced into the neural network model for detecting lane lines, and the classifier in the model is optimized, so that lane lines are recognized continuously while the spatial structure of the image is kept. The multi-size atrous convolution network optimizes the perception capability of the model, keeps the spatial structure of the data, reduces the complexity of the model, greatly reduces its computational cost, and improves its response speed.
In the process of training the neural network model, a semantic segmentation model is used as an auxiliary model, and the training process is optimized in combination with the atrous convolution network, which greatly improves the training speed of the neural network model. Meanwhile, the classifier of the neural network model and the segmenter of the semantic segmentation model are optimized: the spatial structure of the feature map is maintained in the last network layer, the number of parameters of the model is reduced, the complexity of the loss function is reduced, and the computational efficiency of the neural network model is optimized. The trained neural network model still achieves high detection precision on images acquired under poor illumination or occlusion.
The following further introduces the specifics of model training and model architecture in combination with the implementation steps of the method for detecting lane lines provided by the embodiments of the application.
Fig. 2 is a schematic flowchart of a method for detecting a lane line according to an embodiment of the present application. In application, the method for detecting lane lines comprises the following steps:
step S201, a road image of a current scene is acquired.
In some embodiments, the terminal device may capture a road image of a current driving scene of the vehicle through the camera; the captured road image includes an image of the road ahead, behind, or to the side of the vehicle. The terminal equipment can be vehicle-mounted equipment and is respectively in communication connection with the camera and a control system of automatic driving of the vehicle.
The terminal equipment can control the camera to acquire road images of the scene where the vehicle is located in real time or according to a preset period according to the requirements of the driving scene of the vehicle. The road image may include a continuous lane line, a partially blocked lane line, or no lane line.
Understandably, the terminal device is preset with the characteristics of lane lines, which provide a basis for predicting the lane lines in the road image. According to road marking specifications, lane line widths include 10 cm, 15 cm and 20 cm; lane lines are divided into solid lines and dashed lines, and their colors include yellow and white.
Step S202, inputting the road image into the trained neural network model for processing, and outputting the detection result of the lane lines in the road image of the current scene; the trained neural network model is obtained by training with the sample images in the training set and the semantic segmentation model, wherein the sample images in the training set comprise collected road images of a plurality of scenes and labeled images corresponding to those road images.
In some embodiments, the processing of the road image by the neural network model is the actual detection process of the lane lines. The lane lines actually detected have different line widths, line types and colors in various environments; the task of lane line detection is to recognize lane lines in these environments, and its goal is to determine the position and direction of the lane lines.
The processing of the road image through the trained neural network model comprises the extraction and detection of target features.
In some embodiments, the trained neural network model includes a residual network layer, an atrous convolution network layer, an upsampling network layer, and a detector. Inputting the road image into the trained neural network model for processing includes the following steps:
inputting the road image into the residual network layer, and outputting a first feature map containing semantic features after convolution processing by the residual network layer; inputting the first feature map into the atrous convolution network layer, and outputting a second feature map containing detail features after feature extraction by the atrous convolution network layer; inputting the second feature map into the upsampling network layer, and outputting a third feature map after upsampling; and inputting the third feature map into the detector, and outputting the detection result of the lane lines in the road image of the current scene after convolution processing by the detector.
Referring to fig. 3, a schematic diagram of the overall architecture of the trained neural network model provided in the embodiment of the present application is shown; the trained neural network model is a classification detection model. The classification detection model adopts the residual network layers of the residual neural network ResNet-34. The input road image is processed by convolution in the residual network layer, and the semantic features in the road image are extracted to obtain the first feature map. An atrous convolution network layer is also introduced into the classification detection model: it samples and classifies the input first feature map and, while guaranteeing the preset receptive field, performs convolution with atrous convolution kernels of multiple sizes and sampling rates, extracting the detail features and global features of the first feature map to obtain the second feature map. To ensure that the feature map matches the size of the original input road image, the second feature map is upsampled by the upsampling network layer of the classification detection model, which outputs the sampled feature map, namely the third feature map. Finally, the detector of the classification detection model applies two-dimensional convolution to the input third feature map and outputs the lane line recognition result.
In some embodiments, the residual network layer includes a first residual module, a second residual module, and a third residual module; inputting the road image into the residual network layer and outputting the first feature map containing semantic features after convolution processing by the residual network layer includes:
inputting the road image into the first residual module, and obtaining a first result after convolution processing by the first residual module; inputting the first result into the second residual module, and obtaining a second result after convolution processing by the second residual module; and inputting the second result into the third residual module, and outputting the first feature map after convolution processing by the third residual module.
As shown in fig. 4, a schematic structural diagram of the residual network layer provided in the embodiment of the present application is shown. As shown in diagram (a) of fig. 4, the residual network layer in the classification detection model includes three residual modules: a first residual module, a second residual module, and a third residual module. As shown in diagram (b) of fig. 4, each residual module includes six edge-padded 3 × 3 convolutional layers, and the result of these six convolutional layers is superimposed on the input feature map to obtain the output of the residual module.
Illustratively, the residual modules of a ResNet-34 network are adopted to extract and transform the features of the road image. Taking the first residual module of ResNet-34 as an example, its structure is shown in diagram (b) of fig. 4. The residual module processes the input feature map with multiple stacked convolutional layers to further analyze and extract the effective information of the image. The final convolution result is accumulated with the original input, which mitigates the vanishing-gradient phenomenon that can arise when the network is too deep. To ensure that the size of the output features is unchanged compared with the input features after each convolution, the edges of the input feature map are padded with 0 values before each convolution.
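To make the structure concrete, the following is a minimal sketch of one such residual module in PyTorch. The use of PyTorch, the batch normalization and ReLU between convolutions, and the fixed channel width are assumptions for illustration; the description only specifies six edge-padded 3 × 3 convolutional layers whose result is superimposed on the module input.

```python
import torch
import torch.nn as nn

class ResidualModule(nn.Module):
    """Sketch of one residual module: six edge-padded 3x3 convolutions
    whose output is added back to the input feature map."""

    def __init__(self, channels: int):
        super().__init__()
        layers = []
        for _ in range(6):
            layers += [
                # padding=1 keeps the spatial size unchanged, matching the
                # "edges padded with 0 values before each convolution" behaviour.
                nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
                nn.BatchNorm2d(channels),   # assumption: BN + ReLU between convs
                nn.ReLU(inplace=True),
            ]
        self.body = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The convolution result is superimposed on the original input.
        return x + self.body(x)
```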
It will be appreciated that the lane line recognition result includes the individual lanes identified, the location of each lane on each row of pixels (or pixel cells) of the image, and the determination of which column of that row each lane occupies, or that it does not appear in any column of that row.
In some embodiments, the atrous convolution network layer includes a first convolution module, a second convolution module, a third convolution module, a fourth convolution module, and a global average pooling module. Inputting the first feature map into the atrous convolution network layer and performing feature extraction includes:
inputting the first feature map into the first convolution module, the second convolution module, the third convolution module, the fourth convolution module and the global average pooling module respectively, and performing feature extraction on the first feature map.
In some embodiments, the atrous convolution network layer performs convolution on the feature map with convolution kernels of different sizes.
Fig. 5 is a schematic structural diagram of the atrous convolution network layer provided in the embodiment of the present application. Diagram (a) of fig. 5 shows the atrous spatial pyramid pooling (ASPP) module and the global average pooling module in the atrous convolution network layer. The ASPP module comprises atrous convolutions with several kernel sizes and sampling rates, through which the detail features of the input first feature map are analyzed and extracted: the 1 × 1 convolution shown in diagram (a) of fig. 5, the 3 × 3 convolution with a sampling rate of 1, the 3 × 3 convolution with a sampling rate of 3, and the 3 × 3 convolution with a sampling rate of 5. The atrous convolution of each size and sampling rate processes the input first feature map and extracts detail features over different channel dimensions. The global features of the input first feature map are extracted through the global average pooling module of the atrous convolution network layer.
Illustratively, the first convolution module is a 1 × 1 convolution, the second convolution module is a 3 × 3 convolution with a sampling rate of 1, the third convolution module is a 3 × 3 convolution with a sampling rate of 3, and the fourth convolution module is a 3 × 3 convolution with a sampling rate of 5.
In some embodiments, after inputting the first feature map into the atrous convolution network layer and performing feature extraction, the method further comprises:
concatenating, on the channel dimension, the feature maps output by the first convolution module, the second convolution module, the third convolution module, the fourth convolution module and the global average pooling module to obtain a concatenated feature map; and outputting the second feature map after applying a 1 × 1 convolution to the concatenated feature map.
As shown in diagram (a) of fig. 5, after the global features of the first feature map are extracted by the global average pooling module, the resulting feature map is output through upsampling and a 1 × 1 convolution. This feature map is concatenated, on the channel dimension, with the feature maps output by the atrous convolutions of the various sizes and sampling rates and by the 1 × 1 convolution, producing a concatenated feature map; a 1 × 1 convolution is then applied to the concatenated feature map to output the second feature map.
Compared with traditional convolution, the atrous convolution processing in the atrous convolution network layer can effectively enlarge the receptive field of the convolution and reduce the number of parameters of the classification detection model, thereby optimizing the running efficiency of the model.
As shown in diagram (b) of fig. 5, the left side shows the rough structure of an atrous convolution kernel. Compared with a traditional convolution kernel, the atrous convolution kernel assigns actual weights only to selected positions in the kernel, such as the positions of the small black squares, and ignores the input at the remaining positions. The sampling interval of the atrous convolution is called the sampling rate; an ordinary convolution is simply the special case of an atrous convolution with a sampling rate of 1. As shown in the right part of diagram (b) of fig. 5, when multiple ordinary convolution layers are added at the tail of the model, using atrous convolution does not lose image information through the gaps between the kernel weights: for example, feature map 1 is processed by an atrous convolution to output feature map 2, and feature map 2 then passes through multiple ordinary convolution layers, during which the gaps between the actual weights of the atrous convolution are filled in, outputting feature map 3 with more complete image information. Atrous convolution is therefore an effective way to increase the receptive field while reducing the computational cost of the model.
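The following sketch shows, under the same PyTorch assumption, how the five parallel branches of the atrous convolution network layer described above could be wired: the 1 × 1 convolution, the 3 × 3 atrous convolutions with sampling rates 1, 3 and 5, and the global average pooling branch, concatenated on the channel dimension and fused by a final 1 × 1 convolution. The channel widths are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AtrousConvLayer(nn.Module):
    """Sketch of the atrous convolution network layer (ASPP-style)."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv1x1 = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        # Atrous 3x3 branches; padding equals the sampling rate so the
        # spatial size of the input feature map is preserved.
        self.rate1 = nn.Conv2d(in_ch, out_ch, 3, padding=1, dilation=1)
        self.rate3 = nn.Conv2d(in_ch, out_ch, 3, padding=3, dilation=3)
        self.rate5 = nn.Conv2d(in_ch, out_ch, 3, padding=5, dilation=5)
        # Global average pooling branch, reduced back with a 1x1 convolution.
        self.gap = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                 nn.Conv2d(in_ch, out_ch, kernel_size=1))
        # Final 1x1 convolution over the concatenated channels.
        self.fuse = nn.Conv2d(5 * out_ch, out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        # Upsample the pooled global feature back to the input size
        # by bilinear interpolation, as in the text.
        g = F.interpolate(self.gap(x), size=(h, w),
                          mode="bilinear", align_corners=False)
        cat = torch.cat([self.conv1x1(x), self.rate1(x),
                         self.rate3(x), self.rate5(x), g], dim=1)
        return self.fuse(cat)  # the second feature map
```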
In some embodiments, after feature extraction is performed on the first feature map by the atrous convolution network layer, the method includes:
performing upsampling on the second feature map, inputting the upsampled result into the detector, and outputting a classification prediction result for the road image of the current scene after convolution processing by the detector, wherein the classification prediction result indicates the positions of the lanes in the road image of the current scene.
Fig. 6 is a schematic structural diagram of the detector provided in the embodiment of the present application. After the second feature map is input into the upsampling network layer and sampled, a sampled feature map of shape [number of lanes, number of rows, number of columns + 1] is output. The sampled feature map then passes through two edge-padded atrous convolution layers (for example, 3 × 3 convolutions with a sampling rate of 1), and the final classification result, also of shape [number of lanes, number of rows, number of columns + 1], is output. The classification result has the same shape as the input sampled feature map; it corresponds to each lane respectively, locates each lane on each row of the image, and determines which column of that row each lane occupies, or that it does not appear in any column of that row.
In the embodiments of the application, based on the atrous convolutions of the atrous convolution network layer, multi-size feature extraction and transformation are carried out, which greatly simplifies the detector of the classification detection model: the feature output of the atrous convolutions is directly adjusted with bilinear interpolation and 3 × 3 two-dimensional convolution to obtain the final classification output, further improving the running efficiency of the model.
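A sketch of the detector head under the same assumptions follows. Per the description, the feature map entering the detector already has the shape [number of lanes, number of rows, number of columns + 1], and two edge-padded 3 × 3 convolutions with a sampling rate of 1 preserve that shape; the intermediate activation is an assumption.

```python
import torch.nn as nn

def make_detector(num_lanes: int) -> nn.Sequential:
    """Two edge-padded 3x3 convolutions (sampling rate 1) that keep the
    [num_lanes, num_rows, num_cols + 1] shape of the sampled feature map."""
    return nn.Sequential(
        nn.Conv2d(num_lanes, num_lanes, kernel_size=3, padding=1, dilation=1),
        nn.ReLU(inplace=True),  # assumption: activation between the two convs
        nn.Conv2d(num_lanes, num_lanes, kernel_size=3, padding=1, dilation=1),
    )
```

For each lane channel, a row-wise argmax over the column axis then gives either a column index or the extra "no lane on this row" class.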
In the embodiments of the application, the road image is processed by several residual network blocks, the resulting feature map is processed by one layer of 1 × 1 convolution, and the features are then processed by the atrous spatial pyramid pooling (ASPP) in the middle atrous convolution network layer of the classification detection model: three atrous convolution blocks with sampling rates of 1, 3 and 5 respectively, plus another 1 × 1 convolution, where all convolution operations use padding to keep the input and output sizes unchanged. In order to better grasp the global characteristics of the feature map, the atrous convolution network layer also uses global average pooling to obtain a global summary of the input feature map, restores it to the size of the input feature map through bilinear-interpolation upsampling and 1 × 1 convolution, concatenates the outputs of all the convolutions and atrous convolutions on the channel dimension, and outputs the second feature map after the channel dimension is adjusted by a 1 × 1 convolution. Atrous convolution processing is thus used in the middle and at the end of the classification detection model, and the image features are comprehensively extracted and analyzed at multiple sizes. The last-layer features of the classification detection model are not flattened: the spatial structure of the feature map is kept in the last layer of the model, and the final model output is generated by the improved detector, which greatly reduces the number of parameters of the model and improves its running efficiency.
In the training stage, in order to better guide the evaluation of the model output and obtain a more accurate detection effect, an auxiliary model based on semantic segmentation is adopted during training of the neural network model.
In some embodiments, the method comprises: labeling the collected road images of the plurality of scenes with lane line pixel coordinates, lane line types, and whether the current lane is drivable, to obtain the labeled images corresponding to the road images of the plurality of scenes in the sample images.
Illustratively, road images of different scenes are collected, the lane lines in the road images are labeled, regions of interest are extracted and downsampling is applied, and datasets with different annotation formats are processed by a preset processing function to obtain the lane line dataset required for training the neural network model, namely the labeled images corresponding to the road images of the scenes in the sample images.
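As a hedged illustration of how the labeled coordinates could be turned into the row-wise classification targets the detector predicts, the sketch below maps each labeled lane point to a (row, column) bin, with one extra column index meaning "this lane is absent on this row". The grid resolution, the binning rule and the absent-lane index are all assumptions; the description only states that pixel coordinates, lane line types and drivability are annotated.

```python
import numpy as np

def make_row_targets(lane_coords, num_rows, num_cols, img_h, img_w):
    """lane_coords: one list of (x, y) pixel points per lane.
    Returns an int array of shape [num_lanes, num_rows] whose entries are
    column bins in [0, num_cols - 1], or num_cols for "no lane on this row"."""
    num_lanes = len(lane_coords)
    target = np.full((num_lanes, num_rows), num_cols, dtype=np.int64)
    for lane_idx, points in enumerate(lane_coords):
        for x, y in points:
            row = int(y / img_h * num_rows)   # map pixel row to a row anchor
            col = int(x / img_w * num_cols)   # map pixel column to a column bin
            if 0 <= row < num_rows:
                target[lane_idx, row] = min(max(col, 0), num_cols - 1)
    return target
```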
In some embodiments, training the neural network model with the sample images in the training set and the semantic segmentation model comprises:
inputting the road images of the plurality of scenes in the sample images into the residual network layer of the neural network model, and outputting a fourth feature map of the road images after convolution processing by the residual network layer; inputting the fourth feature map into the atrous convolution network layer of the neural network model, and outputting a fifth feature map after feature extraction by the atrous convolution network layer; and performing upsampling on the fifth feature map, inputting the upsampled result into the detector of the neural network model, and, after convolution processing by the detector, outputting classification prediction results for the road images of the plurality of scenes, wherein the classification prediction results indicate the positions of the lanes in the road images of the plurality of scenes.
In some embodiments, the residual network layer of the neural network model comprises three residual modules, and the method comprises:
performing upsampling on the fifth feature map to obtain a sixth feature map; inputting the road images of the plurality of scenes in the sample images into the residual network layer, and outputting a seventh feature map after convolution processing by the first two residual modules of the residual network layer; inputting the seventh feature map into the semantic segmentation model, segmenting the lane line pixels in the seventh feature map through the semantic segmentation model, and outputting a lane line instance feature map; and concatenating the lane line instance feature map and the sixth feature map to obtain a stitched image.
In some embodiments, the method comprises: inputting the stitched image into a segmenter, and outputting a semantic segmentation prediction result after convolution processing by the segmenter, wherein the semantic segmentation prediction result indicates the lane recognition results in the road images of the plurality of scenes.
As shown in fig. 7, an overall architecture diagram of the trained neural network model provided in the embodiment of the present application is shown. The classification detection model in the upper half is the model used in the lane line detection process, and its output serves as the final task result in the prediction stage; the semantic segmentation model in the lower half participates only in the training process and is used to guide the model towards a more accurate recognition effect.
In the process of training the neural network model, the road images of the plurality of scenes in the sample images are input into the residual network layer of the neural network model, and the semantic features of the road images are extracted through convolution processing by the residual network layer to obtain the fourth feature map. The fourth feature map is input into the atrous convolutions of each size and sampling rate, the 1 × 1 convolution, and the global average pooling of the atrous convolution network layer; different detail features and global features are extracted, the resulting feature maps are concatenated and processed by a 1 × 1 convolution, and the fifth feature map is output. The fifth feature map is input into the detector of the neural network model, and after two-dimensional convolution processing by the detector a classification prediction result of shape [number of lanes, number of rows, number of columns + 1] is output; the classification prediction indicates the lanes, the location of each lane on each row of the image, and which column of that row each lane occupies, or that it does not appear in any column of that row.
The feature maps of the road images are sampled and classified through the atrous convolution network layer of the neural network model, which guarantees a larger receptive field; atrous convolution kernels of several different sizes improve the model's perception of details in road images of different scenes while reducing the number of parameters. The global features of the feature map are extracted through the global average pooling in the atrous convolution network layer and output after upsampling and a 1 × 1 convolution. The feature maps output by each atrous convolution kernel and by the global average pooling are concatenated on the channel dimension, and the fifth feature map is output after one layer of 1 × 1 convolution.
For example, the multiple atrous convolution kernels may be a 1 × 1 convolution, a 3 × 3 convolution with a sampling rate of 1, a 3 × 3 convolution with a sampling rate of 3, and a 3 × 3 convolution with a sampling rate of 5.
In some embodiments, during training with the semantic segmentation model, the output of the second residual module of the neural network model (such as ResNet-34) is input into the semantic segmentation model, which performs pixel segmentation on it and outputs the lane line instance feature map. The fifth feature map output by the atrous convolution network layer is upsampled, and the sixth feature map is output. The lane line instance feature map and the sixth feature map are concatenated to obtain the stitched image.
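The sketch below shows one way the training-time branches described above could be wired together, assuming PyTorch modules for the three residual modules, the atrous convolution layer and the auxiliary segmentation model (the module names are assumptions): the seventh feature map from the first two residual modules feeds the segmentation model, the fifth feature map is upsampled into the sixth, and the two results are concatenated on the channel dimension.

```python
import torch
import torch.nn.functional as F

def training_forward(image, res1, res2, res3, atrous_layer, seg_model):
    """Assembles the stitched image used by the auxiliary segmenter."""
    f1 = res1(image)
    seventh = res2(f1)            # output of the first two residual modules
    fourth = res3(seventh)        # fourth feature map
    fifth = atrous_layer(fourth)  # fifth feature map
    # Sixth feature map: the fifth feature map upsampled (to the spatial
    # size of the seventh feature map, an assumption made here so the two
    # branches can be concatenated).
    sixth = F.interpolate(fifth, size=seventh.shape[-2:],
                          mode="bilinear", align_corners=False)
    lane_instances = seg_model(seventh)  # lane line instance feature map
    stitched = torch.cat([lane_instances, sixth], dim=1)
    return fifth, stitched        # fifth feeds the detector branch
```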
In some embodiments, the stitched image is input into the segmenter and processed by two-dimensional convolution in the segmenter, which outputs a semantic segmentation prediction result of shape [number of lanes + 1, number of rows, number of columns], indicating, for each basic pixel region in the image, which lane it belongs to, or that it belongs to no lane.
As shown in fig. 8, a schematic structural diagram of the segmenter provided in the embodiment of the present application is shown. The network before the segmenter samples the feature map to the shape [number of lanes + 1, number of rows, number of columns]; after two edge-padded atrous convolution layers and one ordinary 3 × 3 convolution layer, the final semantic segmentation prediction result is output. Its shape is the same as that of the input feature map, namely [number of lanes + 1, number of rows, number of columns], and it represents, for each basic pixel region in the image, which lane it belongs to, or that it belongs to no lane. The segmenter is the output part of the semantic segmentation model.
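A matching sketch of the segmenter head is given below: per the description, its input is already shaped [number of lanes + 1, number of rows, number of columns], and two edge-padded atrous convolutions plus one ordinary 3 × 3 convolution preserve that shape. The sampling rate of the two atrous layers and the activations are assumptions, since the description does not specify them.

```python
import torch.nn as nn

def make_segmenter(num_lanes: int) -> nn.Sequential:
    """Two edge-padded atrous 3x3 convolutions plus one ordinary 3x3
    convolution; the [num_lanes + 1, rows, cols] shape is preserved."""
    ch = num_lanes + 1
    return nn.Sequential(
        nn.Conv2d(ch, ch, 3, padding=2, dilation=2),  # assumed sampling rate 2
        nn.ReLU(inplace=True),
        nn.Conv2d(ch, ch, 3, padding=2, dilation=2),
        nn.ReLU(inplace=True),
        nn.Conv2d(ch, ch, 3, padding=1),              # ordinary 3x3 convolution
    )
```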
In the training stage, in order to better guide the evaluation of the model output and obtain a more accurate recognition effect, the neural network is trained using an auxiliary model based on semantic segmentation. The result of the second residual module of the ResNet-34 neural network model is concatenated with the upsampled output of the atrous convolution network layer through the semantic segmentation model, and a semantic segmenter based on two-dimensional convolution directly processes the result to generate the output. The goal of this output is to determine the recognition result of each lane in the road images of the plurality of scenes, e.g., whether each part of the image belongs to a lane and to which lane, thereby carrying out the semantic segmentation task.
In some embodiments, a first error value of the classification prediction result of the neural network model, relative to the prediction probability for the labeled images in the sample images, is calculated through a first loss function, and the parameters of the neural network model are adjusted through the first error value; the first loss function is expressed as formula (1):

$$ \mathrm{FL}(p, y) = -y\,(1-p)^{\gamma}\,\log(p) - (1-y)\,p^{\gamma}\,\log(1-p) \tag{1} $$

where y is the classification ground-truth value of the road images of the scenes in the sample images in the training set, p is the prediction probability for those road images after processing by the neural network model, and γ is a preset weight value.
During training, the classification detection model adopts the Focal Loss as the first loss function, where y is the classification ground truth of the sample, p is the prediction probability of the sample, and γ is the preset weight. Focal Loss gives more weight to samples that are difficult to classify or severely misclassified, making the model focus more on those samples during training.
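A minimal sketch of formula (1) in PyTorch follows; the mean reduction, the clamping epsilon and the default γ = 2 are assumptions.

```python
import torch

def focal_loss(p: torch.Tensor, y: torch.Tensor,
               gamma: float = 2.0, eps: float = 1e-7) -> torch.Tensor:
    """Binary focal loss of formula (1); p are predicted probabilities,
    y the 0/1 ground-truth values."""
    p = p.clamp(eps, 1.0 - eps)  # avoid log(0)
    loss = (-y * (1.0 - p).pow(gamma) * torch.log(p)
            - (1.0 - y) * p.pow(gamma) * torch.log(1.0 - p))
    return loss.mean()
```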
In some embodiments, a second error value of the lane recognition result of the semantic segmentation model relative to the prediction probability of the marker image in the sample image is calculated through a second loss function, and the parameters of the neural network model are adjusted through the second error value; the second loss function is expressed as follows:
L2 = -y·log(p) - (1 - y)·log(1 - p)    (2)
wherein y is the recognition true value of the road images of the scenes in the sample images in the training set, and p is the prediction probability output by the semantic segmentation model for those road images.
In the training process of the semantic segmentation model, the cross entropy loss (Cross Entropy Loss) is used as the second loss function; its calculation formula is shown in formula (2), where y is the classification true value of a sample and p is its prediction probability. For each class, when y is 1, i.e., the true value of the class is 1, the prediction probability should be as close to 1 as possible and the loss is -log(p); when the true value is 0, the prediction probability should approach 0 and the loss is -log(1 - p).
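The per-class form of formula (2) coincides with the standard binary cross entropy, so it can be checked against PyTorch's built-in implementation; a quick numerical sketch:

import torch
import torch.nn.functional as F

p = torch.tensor([0.9, 0.2])   # prediction probabilities
y = torch.tensor([1.0, 0.0])   # classification true values
manual = -(y * torch.log(p) + (1 - y) * torch.log(1 - p))   # formula (2)
builtin = F.binary_cross_entropy(p, y, reduction='none')
print(torch.allclose(manual, builtin))                      # True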
According to the embodiment of the present application, atrous spatial pyramid pooling effectively enlarges the receptive field while hardly increasing the parameter count of the model. The last-layer feature map of the classification detection model is therefore not flattened; instead, a structured loss function capable of constraining the spatial features of the lane lines is adopted, and bilinear-interpolation up-sampling and two-dimensional convolution are applied directly to the feature map output by the dilated convolution network layer. This preserves the spatial structure of the feature map and also reduces the training burden of the model.
Through the embodiment of the present application, the parameter count of the model is optimized and the running efficiency of the model is improved; keeping the spatial structure of the feature map at the tail of the model facilitates analysis of the overall features of the image; and atrous spatial pyramid pooling optimizes the feature analysis in the latter half of the model, enlarging the receptive field of the convolution kernels without increasing the training burden. Analyzing different features of the feature map with several convolution kernels of different effective sizes (dilation rates) strengthens the generalization ability of classification and semantic segmentation and improves the detection precision of the tasks.
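For illustration, a compact PyTorch sketch of such an atrous spatial pyramid pooling layer follows. The dilation rates (1, 6, 12, 18) follow common DeepLab practice and, like the channel counts, are assumptions of the sketch rather than values fixed by this passage; the structure (four parallel convolution branches plus global average pooling, channel-wise stitching, 1 × 1 fusion) is the one described in the embodiments above.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    # Four parallel convolution branches with different dilation rates plus a
    # global average pooling branch; outputs are stitched on the channel
    # dimension and fused by a 1 x 1 convolution.
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates]
        )
        self.gap = nn.Sequential(              # global average pooling branch
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, 1),
        )
        self.fuse = nn.Conv2d(out_ch * (len(rates) + 1), out_ch, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [branch(x) for branch in self.branches]
        # broadcast the pooled branch back to the input resolution
        feats.append(F.interpolate(self.gap(x), size=(h, w),
                                   mode='bilinear', align_corners=False))
        return self.fuse(torch.cat(feats, dim=1))

aspp = ASPP(in_ch=256, out_ch=128)
print(aspp(torch.randn(1, 256, 36, 100)).shape)   # torch.Size([1, 128, 36, 100])

The pooled branch is broadcast back to the input resolution before stitching, so the fused map keeps the spatial structure that the passage above emphasizes.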
In order to verify the effectiveness of the proposed method, experimental verification was carried out on the same data set as used for the conventional neural network model, comparing training speed, convergence speed, and detection precision; the proposed method shows a clear improvement. Table 1 compares the trained neural network model provided in the embodiment of the present application with the conventional original model in terms of detection accuracy and running speed. Accuracy refers to the accuracy with which the model identifies lane line pixels on images of 800 × 288 pixels. Running speed refers to the time the model needs to process one batch of 16 images.
TABLE 1

                 Trained neural network model    Conventional original model
Accuracy         92.96%                          92.04%
Running speed    32 ms/batch                     60 ms/batch
Fig. 9 is a visualization of lane line detection results provided in the embodiment of the present application, showing the detection results of the trained neural network model on two sets of test images. As shown in fig. 9 (a), the trained neural network model accurately identifies multiple lane lines in an image; as shown in fig. 9 (b), a good recognition effect is maintained even when a lane line is occluded by an obstacle.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an execution order; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
Fig. 10 shows a block diagram of a device for detecting a lane line according to an embodiment of the present application, which corresponds to the method for detecting a lane line according to the foregoing embodiment, and only shows portions related to the embodiment of the present application for convenience of description.
Referring to fig. 10, the apparatus includes:
an acquiring unit 101, configured to acquire a road image of a current scene;
the processing unit 102 is configured to input the road image into a trained neural network model for processing, and output a detection result of a lane line in the road image of the current scene; the trained neural network model is obtained by training according to sample images in a training set and a semantic segmentation model, wherein the sample images in the training set comprise collected road images of a plurality of scenes and mark images corresponding to the road images of the plurality of scenes.
According to the embodiment of the present application, the terminal device acquires a road image of the current scene, inputs the road image into the trained neural network model for processing, and outputs the detection result of the lane lines in the road image of the current scene. The trained neural network model is obtained by training on the sample images in the training set together with the semantic segmentation model, the sample images comprising collected road images of a plurality of scenes and the marked images corresponding to those road images. Because the model is trained in this way, the device addresses the low lane line recognition accuracy of current methods in complex environments, as well as the large computation load and slow response caused by the complexity of current lane line detection models; detection precision is improved, and the real-time requirements of practical automatic driving scenarios are met. The device therefore has strong usability and practicality.
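A minimal usage sketch of these two units might look as follows, assuming the trained model has been exported as a TorchScript file; the class name, file name, image path, and the input size of 288 × 800 (matching the test images above) are hypothetical placeholders, not details fixed by the application.

import torch
import torchvision.transforms.functional as TF
from PIL import Image

class LaneLineDetectionDevice:
    def __init__(self, model_path="lane_model.pt"):   # hypothetical file name
        # processing unit 102: the trained neural network model
        self.model = torch.jit.load(model_path).eval()

    def acquire(self, image_path):                    # acquiring unit 101
        img = Image.open(image_path).convert("RGB")
        # 288 x 800 matches the test image size reported above
        return TF.to_tensor(TF.resize(img, [288, 800])).unsqueeze(0)

    @torch.no_grad()
    def detect(self, image_path):
        # detection result of the lane lines in the current scene
        return self.model(self.acquire(image_path))

device = LaneLineDetectionDevice()                    # hypothetical paths
result = device.detect("road_scene.jpg")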
It should be noted that the information interaction and execution processes between the above devices/units are based on the same concept as the method embodiments of the present application; their specific functions and technical effects may be found in the method embodiments and are not repeated here.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps in the above-mentioned method embodiments.
The embodiments of the present application further provide a computer program product which, when run on a mobile terminal, enables the mobile terminal to implement the steps in the above method embodiments.
Fig. 11 is a schematic structural diagram of a terminal device 11 according to an embodiment of the present application. As shown in fig. 11, the terminal device 11 of this embodiment includes: at least one processor 110 (only one is shown in fig. 11), a memory 111, and a computer program 112 stored in the memory 111 and operable on the at least one processor 110; the processor 110 implements the steps in any of the above method embodiments when executing the computer program 112.
The terminal device 11 may be a desktop computer, a notebook computer, a palmtop computer, a vehicle-mounted device, a cloud server, or another computing device. The terminal device 11 may include, but is not limited to, the processor 110 and the memory 111. Those skilled in the art will appreciate that fig. 11 is only an example of the terminal device 11 and does not constitute a limitation on it; the terminal device may include more or fewer components than those shown, combine some components, or use different components, and may further include, for example, input/output devices and network access devices.
The Processor 110 may be a Central Processing Unit (CPU), and the Processor 110 may be other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), an off-the-shelf Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 111 may, in some embodiments, be an internal storage unit of the terminal device 11, such as a hard disk or a memory of the terminal device 11. In other embodiments, the memory 111 may also be an external storage device of the terminal device 11, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card provided on the terminal device 11. Further, the memory 111 may include both an internal storage unit and an external storage device of the terminal device 11. The memory 111 is used for storing the operating system, application programs, a boot loader (BootLoader), data, and other programs, such as the program code of the computer programs. The memory 111 may also be used to temporarily store data that has been output or is to be output.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the processes in the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to a photographing apparatus/terminal apparatus, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, for example a USB flash disk, a removable hard disk, a magnetic disk, or an optical disk. In certain jurisdictions, in accordance with legislation and patent practice, computer-readable media may not include electrical carrier signals or telecommunications signals.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other ways. For example, the above-described apparatus/network device embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implementing, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (15)

1. A method of detecting a lane line, comprising:
acquiring a road image of a current scene;
inputting the road image into a trained neural network model for processing, and outputting a detection result of a lane line in the road image of the current scene;
the trained neural network model is obtained by training according to sample images in a training set and a semantic segmentation model, wherein the sample images in the training set comprise collected road images of a plurality of scenes and mark images corresponding to the road images of the plurality of scenes.
2. The method of claim 1, wherein the trained neural network model comprises a residual network layer, a dilated convolution network layer, an up-sampling network layer, and a detector;
the inputting the road image into the trained neural network model for processing comprises:
inputting the road image into the residual network layer, and outputting a first feature map containing semantic features through convolution processing of the residual network layer;
inputting the first feature map into the dilated convolution network layer, and outputting a second feature map containing detailed features through feature extraction of the dilated convolution network layer;
inputting the second feature map into the up-sampling network layer, and outputting a third feature map after up-sampling processing by the up-sampling network layer;
and inputting the third feature map into the detector, and outputting the detection result of the lane line in the road image of the current scene through convolution processing of the detector.
3. The method of claim 2, wherein the residual network layer comprises a first residual module, a second residual module, and a third residual module;
the inputting the road image into the residual network layer, and outputting a first feature map containing semantic features through convolution processing of the residual network layer, comprises:
inputting the road image into the first residual module, and obtaining a first result through convolution processing of the first residual module;
inputting the first result into the second residual module, and obtaining a second result through convolution processing of the second residual module;
and inputting the second result into the third residual module, and outputting the first feature map after convolution processing of the third residual module.
4. The method of claim 2, wherein the dilated convolution network layer comprises a first convolution module, a second convolution module, a third convolution module, a fourth convolution module, and a global average pooling module;
the inputting the first feature map into the dilated convolution network layer and performing feature extraction through the dilated convolution network layer comprises:
and inputting the first feature map into the first convolution module, the second convolution module, the third convolution module, the fourth convolution module and the global average pooling module respectively, and performing feature extraction on the first feature map.
5. The method of claim 4, wherein after the first feature map is input into the dilated convolution network layer and feature extraction is performed by the dilated convolution network layer, the method further comprises:
stitching the feature maps respectively output by the first convolution module, the second convolution module, the third convolution module, the fourth convolution module, and the global average pooling module in the channel dimension to obtain a stitched feature map;
and outputting the second feature map after the stitched feature map is subjected to 1 × 1 convolution processing.
6. The method of claim 5, wherein after feature extraction of the first feature map through the dilated convolution network layer, the method comprises:
and performing up-sampling processing on the second feature map, inputting the result of the up-sampling processing into the detector, and outputting a classification prediction result of the road image of the current scene after convolution processing of the detector, wherein the classification prediction result is used for indicating the position of the lane in the road image of the current scene.
7. The method of claim 1, wherein the method comprises:
marking, in the acquired road images of the plurality of scenes, the lane line pixel point coordinates, the lane line types, and whether the current lane is a drivable lane, to obtain the marked images corresponding to the road images of the plurality of scenes in the sample images.
8. The method of claim 1 or 7, wherein training the neural network model based on the sample images in the training set and the semantic segmentation model comprises:
inputting the road images of the scenes in the sample images into a residual network layer of the neural network model, and outputting a fourth feature map of the road images after convolution processing of the residual network layer;
inputting the fourth feature map into a dilated convolution network layer of the neural network model, and outputting a fifth feature map through feature extraction of the dilated convolution network layer;
and performing up-sampling processing on the fifth feature map, inputting the result of the up-sampling processing into a detector of the neural network model, and outputting classification prediction results of the road images of the plurality of scenes after convolution processing of the detector, wherein the classification prediction results are used for indicating the positions of the lanes in the road images of the plurality of scenes.
9. The method of claim 8, wherein a residual network layer of the neural network model comprises three residual modules, the method comprising:
performing up-sampling processing on the fifth feature map to obtain a sixth feature map;
inputting the road images of the scenes in the sample images into the residual network layer, and outputting a seventh feature map through convolution processing of the first two residual modules of the residual network layer;
inputting the seventh feature map into the semantic segmentation model, segmenting the lane line pixels in the seventh feature map through the semantic segmentation model, and outputting a lane line instance feature map;
and stitching the lane line instance feature map and the sixth feature map to obtain a stitched image.
10. The method of claim 9, wherein the method comprises:
and inputting the stitched image into a segmenter, and outputting a semantic segmentation prediction result after convolution processing by the segmenter, wherein the semantic segmentation prediction result is used for indicating the lane recognition results in the road images of the plurality of scenes.
11. The method of claim 8, wherein the method further comprises:
calculating, through a first loss function, a first error value of the classification prediction probability of the neural network model relative to the marked images in the sample images, and adjusting parameters of the neural network model according to the first error value; the first loss function is represented as follows:
L1 = -y(1 - p)^γ·log(p) - (1 - y)·p^γ·log(1 - p)
wherein y is the classification true value of the road images of the scenes in the sample images in the training set, p is the prediction probability output by the neural network model for those road images, and γ is a preset weight value.
12. The method of claim 8 or 11, wherein the method further comprises:
calculating, through a second loss function, a second error value of the lane recognition result of the semantic segmentation model relative to the marked images in the sample images, and adjusting the parameters of the neural network model according to the second error value; the second loss function is expressed as follows:
L2 = -y·log(p) - (1 - y)·log(1 - p)
wherein y is the recognition true value of the road images of the scenes in the sample images in the training set, and p is the prediction probability output by the semantic segmentation model for those road images.
13. An apparatus for detecting a lane line, comprising:
the acquisition unit is used for acquiring a road image of a current scene;
the processing unit is used for inputting the road image into the trained neural network model for processing and outputting the detection result of the lane line in the road image of the current scene; the trained neural network model is obtained by training according to sample images in a training set and a semantic segmentation model, wherein the sample images in the training set comprise collected road images of a plurality of scenes and mark images corresponding to the road images of the plurality of scenes.
14. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 12 when executing the computer program.
15. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 12.
CN202011481079.6A 2020-12-15 2020-12-15 Method and device for detecting lane line, terminal equipment and readable storage medium Active CN112528878B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011481079.6A CN112528878B (en) 2020-12-15 2020-12-15 Method and device for detecting lane line, terminal equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN112528878A true CN112528878A (en) 2021-03-19
CN112528878B CN112528878B (en) 2024-01-09

Family

ID=75000403

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011481079.6A Active CN112528878B (en) 2020-12-15 2020-12-15 Method and device for detecting lane line, terminal equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN112528878B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20190053355A (en) * 2017-11-10 2019-05-20 연세대학교 산학협력단 Method and Apparatus for Recognizing Road Symbols and Lanes
WO2020181870A1 (en) * 2019-03-12 2020-09-17 Zhejiang Dahua Technology Co., Ltd. Systems and methods for lane detection
CN110363770A (en) * 2019-07-12 2019-10-22 安徽大学 A kind of training method and device of the infrared semantic segmentation model of margin guide formula
CN111460921A (en) * 2020-03-13 2020-07-28 华南理工大学 Lane line detection method based on multitask semantic segmentation
CN111507226A (en) * 2020-04-10 2020-08-07 北京觉非科技有限公司 Road image recognition model modeling method, image recognition method and electronic equipment
CN111582083A (en) * 2020-04-25 2020-08-25 华南理工大学 Lane line detection method based on vanishing point estimation and semantic segmentation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LEONARDO CABRERA LO BIANCO ET AL.: "Joint semantic segmentation of road objects and lanes using Convolutional Neural Networks", 《ROBOTICS AND AUTONOMOUS SYSTEMS》, pages 1 - 11 *
徐国晟 等: "基于卷积神经网络的车道线语义分割算法", 《电子测量与仪器学报》, vol. 32, no. 7, pages 89 - 94 *

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113052135A (en) * 2021-04-22 2021-06-29 淮阴工学院 Lane line detection method and system based on deep neural network Lane-Ar
CN113191256A (en) * 2021-04-28 2021-07-30 北京百度网讯科技有限公司 Method and device for training lane line detection model, electronic device and storage medium
WO2022227769A1 (en) * 2021-04-28 2022-11-03 北京百度网讯科技有限公司 Training method and apparatus for lane line detection model, electronic device and storage medium
CN112883948A (en) * 2021-05-06 2021-06-01 深圳市城市交通规划设计研究中心股份有限公司 Semantic segmentation and edge detection model building and guardrail abnormity monitoring method
CN112883948B (en) * 2021-05-06 2021-09-03 深圳市城市交通规划设计研究中心股份有限公司 Semantic segmentation and edge detection model building and guardrail abnormity monitoring method
CN113160217A (en) * 2021-05-12 2021-07-23 北京京东乾石科技有限公司 Method, device and equipment for detecting foreign matters in circuit and storage medium
CN113343817A (en) * 2021-05-31 2021-09-03 扬州大学 Unmanned vehicle path detection method and device for target area and medium
CN113221907A (en) * 2021-06-01 2021-08-06 平安科技(深圳)有限公司 Vehicle part segmentation method, device, equipment and storage medium
CN113344905A (en) * 2021-06-28 2021-09-03 燕山大学 Strip deviation amount detection method and system
CN113392812B (en) * 2021-07-08 2022-06-07 湖南大学 Road lane line detection method and system based on deep neural network
CN113392812A (en) * 2021-07-08 2021-09-14 湖南大学 Road lane line detection method and system based on deep neural network
CN113609980A (en) * 2021-08-04 2021-11-05 东风悦享科技有限公司 Lane line sensing method and device for automatic driving vehicle
CN113781374A (en) * 2021-08-30 2021-12-10 中山大学 Method and device for enhancing lane line detection in low-illumination scene and terminal equipment
CN113781374B (en) * 2021-08-30 2023-09-01 中山大学 Lane line detection enhancement method and device under low-light scene and terminal equipment
CN113705515A (en) * 2021-09-03 2021-11-26 北京百度网讯科技有限公司 Training of semantic segmentation model and generation method and equipment of high-precision map lane line
CN113705515B (en) * 2021-09-03 2024-04-12 北京百度网讯科技有限公司 Training of semantic segmentation model and generation method and device of high-precision map lane line
CN114202746A (en) * 2021-11-10 2022-03-18 深圳先进技术研究院 Road surface state identification method and device, terminal equipment and storage medium
CN114202746B (en) * 2021-11-10 2024-04-12 深圳先进技术研究院 Pavement state identification method, device, terminal equipment and storage medium
CN114518122A (en) * 2022-02-18 2022-05-20 腾讯科技(深圳)有限公司 Driving navigation method, driving navigation device, computer equipment, storage medium and computer program product
CN114782915B (en) * 2022-04-11 2023-04-07 哈尔滨工业大学 Intelligent automobile end-to-end lane line detection system and equipment based on auxiliary supervision and knowledge distillation
CN114782915A (en) * 2022-04-11 2022-07-22 哈尔滨工业大学 Intelligent automobile end-to-end lane line detection system and equipment based on auxiliary supervision and knowledge distillation
CN115294548A (en) * 2022-07-28 2022-11-04 烟台大学 Lane line detection method based on position selection and classification method in row direction
CN115273013A (en) * 2022-09-27 2022-11-01 江西小马机器人有限公司 Lane line detection method, system, computer and readable storage medium
CN115273013B (en) * 2022-09-27 2024-05-03 江西小马机器人有限公司 Lane line detection method, system, computer and readable storage medium
CN115311573A (en) * 2022-10-08 2022-11-08 浙江壹体科技有限公司 Site line detection and target positioning method, electronic equipment and storage medium
CN115661556A (en) * 2022-10-20 2023-01-31 南京领行科技股份有限公司 Image processing method and device, electronic equipment and storage medium
CN115661556B (en) * 2022-10-20 2024-04-12 南京领行科技股份有限公司 Image processing method and device, electronic equipment and storage medium
CN115565148A (en) * 2022-11-09 2023-01-03 福思(杭州)智能科技有限公司 Road image detection method, road image detection device, storage medium and electronic device

Also Published As

Publication number Publication date
CN112528878B (en) 2024-01-09

Similar Documents

Publication Publication Date Title
CN112528878B (en) Method and device for detecting lane line, terminal equipment and readable storage medium
WO2022126377A1 (en) Traffic lane line detection method and apparatus, and terminal device and readable storage medium
CN112132156B (en) Image saliency target detection method and system based on multi-depth feature fusion
CN106683119B (en) Moving vehicle detection method based on aerial video image
CN112287912B (en) Deep learning-based lane line detection method and device
CN111738995B (en) RGBD image-based target detection method and device and computer equipment
CN114359851A (en) Unmanned target detection method, device, equipment and medium
Chao et al. Multi-lane detection based on deep convolutional neural network
CN110852327A (en) Image processing method, image processing device, electronic equipment and storage medium
WO2021083126A1 (en) Target detection and intelligent driving methods and apparatuses, device, and storage medium
CN113255444A (en) Training method of image recognition model, image recognition method and device
CN115578590A (en) Image identification method and device based on convolutional neural network model and terminal equipment
CN112784639A (en) Intersection detection, neural network training and intelligent driving method, device and equipment
CN114898306B (en) Method and device for detecting target orientation and electronic equipment
CN114359572A (en) Training method and device of multi-task detection model and terminal equipment
CN114898321A (en) Method, device, equipment, medium and system for detecting road travelable area
CN115018926A (en) Method, device and equipment for determining pitch angle of vehicle-mounted camera and storage medium
CN115240168A (en) Perception result obtaining method and device, computer equipment and storage medium
CN111709377B (en) Feature extraction method, target re-identification method and device and electronic equipment
US10373004B1 (en) Method and device for detecting lane elements to plan the drive path of autonomous vehicle by using a horizontal filter mask, wherein the lane elements are unit regions including pixels of lanes in an input image
CN113569752A (en) Lane line structure identification method, device, equipment and medium
Merugu et al. Multi lane detection, curve fitting and lane type classification
TWI807904B (en) Method for training depth identification model, method for identifying depth of images and related devices
CN115063594B (en) Feature extraction method and device based on automatic driving
CN113903015B (en) Lane line identification method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant