CN112883807A - Lane line detection method and system
- Publication number: CN112883807A
- Application number: CN202110091223.3A
- Authority: CN (China)
- Legal status: Pending
Classifications
- G06V 20/588: Recognition of the road, e.g. of lane markings; recognition of the vehicle driving pattern in relation to the road
- G06N 3/04: Architecture, e.g. interconnection topology
- G06N 3/08: Learning methods
- G06N 3/084: Backpropagation, e.g. using gradient descent
Abstract
The invention discloses a lane line detection method and system. The method specifically comprises the following steps: modeling the lane detection task, building a convolutional neural network model, and extracting features from the convolutional feature map; in the training stage, collecting lane line training sample images, increasing the diversity of the training samples through preprocessing, and obtaining converged network model parameters through iterative training; in the inference stage, post-processing the inference result to obtain the lane lines predicted by the model. An attention mechanism is introduced into the network structure, so that the fusion of local and global information makes the semantic features of the extracted feature map richer, and the network model trained with the attention structure further improves the positioning accuracy of the far end of the lane line.
Description
Technical Field
The invention relates to lane line detection technology, and in particular to a lane line detection method and a lane line detection system.
Background
As an important component of road surface markings, lane lines effectively guide intelligent vehicles to drive within structured road areas. Detecting lane lines on the road surface in real time is therefore an important link in intelligent driver-assistance systems: it supports functions such as path planning and lane-departure warning, and provides a reference for localization and navigation.
Currently, the most advanced lane line detection methods in industry are CNN-based. For example, the SCNN and SAD networks treat lane line detection as a semantic segmentation task and rely on heavy encoder-decoder structures; such methods usually take a small image as input, which makes it difficult to accurately predict the far end of a curved lane line. In addition, these methods are generally limited to detecting a predefined number of lane lines, whereas on real roads the scenes in images vary and the number of lane lines is not fixed. To address this, PointLaneNet follows a candidate-region-based strategy and generates multiple candidate lines in the image, thereby breaking the limitations of inefficient encoders and a predefined number of lanes.
In the prior art, ResNet122 is used as the backbone network to extract semantic features: the input image is processed by the backbone to obtain a corresponding feature map, each grid cell on the feature map is treated as an anchor, and each anchor predicts the offsets of the lane line passing through that cell; lane lines with higher confidence are then retained by an NMS algorithm, which filters out redundant candidate lane lines (a simplified sketch of such filtering is given below). The output of whole lane lines can thus be obtained directly through end-to-end training, removing the restriction of outputting a fixed number of lane lines. However, because anchors on the feature map predict the whole lane line passing through the corresponding grid cell, performance degrades sharply at the far end of curves; moreover, due to the limited receptive field of the convolution kernel, the feature map obtained by this scheme only captures local information and cannot capture long-range and short-range context simultaneously.
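As an illustration of this confidence-plus-NMS filtering step, the following is a minimal Python sketch; the lane representation (points sampled at fixed vertical positions), the mean-horizontal-distance criterion, and the `dist_thresh` value are assumptions for illustration, not the exact post-processing of the cited scheme.

```python
import numpy as np

def lane_nms(lanes, scores, dist_thresh=20.0):
    """Greedy NMS over candidate lane lines (illustrative sketch).

    `lanes` is a list of (N, 2) arrays of (x, y) points sampled at the same
    y positions; `scores` are confidence values; `dist_thresh` is a
    hypothetical mean horizontal distance threshold in pixels."""
    order = np.argsort(scores)[::-1]   # highest-confidence lanes first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Mean horizontal distance between the kept lane and the rest.
        dists = np.array([np.mean(np.abs(lanes[i][:, 0] - lanes[j][:, 0]))
                          for j in order[1:]])
        order = order[1:][dists > dist_thresh]  # drop near-duplicate lanes
    return keep
```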
Disclosure of Invention
The embodiment of the invention provides a lane line detection method that enriches the semantic features of the extracted feature map by fusing local and global information, and improves the positioning accuracy of the far end of the lane line at low computational cost.
In a first aspect, a lane line detection method is provided, where the method includes:
constructing a convolutional neural network model;
constructing the training and inference stages on the network model;
adjusting the preprocessing of the training stage;
and obtaining the predicted lane lines through the inference stage.
In some realizations of the first aspect, the neural network model is constructed as follows:
the neural network model extracts features from the convolutional feature map;
the convolutional feature map is fed into three branches, each using a 1x1 convolutional layer;
the 1x1 convolutional layers each compress the number of channels of the feature map;
after one compressed branch is transposed, matrix multiplication is performed between the first and second branches to obtain an attention map;
the attention map is passed through softmax to obtain a normalized attention map;
matrix multiplication is performed between the third compressed branch and the normalized attention map to obtain a weighted attention feature map;
and the number of channels of the feature map is raised back with a 1x1 convolution, outputting a self-attention feature map.
In some implementations of the first aspect, an attention structure is derived from the neural network model; the attention structure is specified as follows:
assume the feature map obtained from the backbone network has size N×C×H×W;
the three branches then each apply a 1x1 convolution to compress the number of channels of the feature map,
and the compressed feature maps P, Q and R all have size N×C/8×(H×W);
next, the feature map Q of the second branch is transposed to obtain Q', whose size is N×(H×W)×C/8;
Q' and R are matrix-multiplied to obtain M, and M is normalized with Softmax to obtain M', where the size of M is N×(H×W)×(H×W);
M' represents the extracted global semantic information; P and M' are matrix-multiplied to obtain the feature map O, whose size is N×C/8×(H×W);
finally, a 1x1 convolution raises the number of channels, and the output feature map has size N×C×(H×W); a code sketch of this structure follows.
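To make the shapes above concrete, here is a minimal PyTorch sketch of the described attention structure; the module name, the C/8 reduction ratio, and the mapping of Q, R and P onto query, key and value branches follow the text above, while everything else (framework, initialization) is an assumption rather than the patent's exact implementation.

```python
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """Sketch of the three-branch attention block described above (assumes
    PyTorch; q/r/p name the Q, R and P branches as in the text)."""
    def __init__(self, c):
        super().__init__()
        self.q = nn.Conv2d(c, c // 8, 1)    # 1x1 conv, second branch Q
        self.r = nn.Conv2d(c, c // 8, 1)    # 1x1 conv, branch R
        self.p = nn.Conv2d(c, c // 8, 1)    # 1x1 conv, branch P
        self.out = nn.Conv2d(c // 8, c, 1)  # 1x1 conv restoring C channels
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, x):
        n, c, h, w = x.shape
        q = self.q(x).view(n, c // 8, h * w)      # N x C/8 x (H*W)
        r = self.r(x).view(n, c // 8, h * w)      # N x C/8 x (H*W)
        p = self.p(x).view(n, c // 8, h * w)      # N x C/8 x (H*W)
        m = torch.bmm(q.permute(0, 2, 1), r)      # M = Q'R, N x (H*W) x (H*W)
        m = self.softmax(m)                       # normalized attention M'
        o = torch.bmm(p, m)                       # O = PM', N x C/8 x (H*W)
        return self.out(o.view(n, c // 8, h, w))  # back to N x C x H x W
```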
In some implementations of the first aspect, the training stage operates as follows:
collecting lane line training sample images, where the samples are RGB three-channel color images annotated with the position and category information of the lane lines;
converting the training samples through preprocessing into the format required by the convolutional neural network, further improving the training effect;
dividing the data set into a training set, a validation set and a test set, where the training set is used to train the convolutional neural network, the validation set is used to select the optimal trained model, and the test set is used to evaluate model metrics;
building the neural network structure, replacing an ordinary convolution structure in the PointLaneNet architecture with the attention structure;
iterative training: forward propagation, computing the loss function between the model prediction and the annotated ground truth, and computing gradients through back propagation;
observing the descending loss curve, validating the converged models on the validation set, and selecting the optimal trained model;
the training flow comprises the following steps (a code sketch follows the list):
(1) initializing the iteration count;
(2) generating a mini-batch of lane line training data according to the iteration count;
(3) obtaining predictions by forward-propagating the training data, and computing the loss function;
(4) updating the weights through back propagation;
(5) judging whether the updated weights reach the training target; if so, the training is finished; otherwise, proceeding to the next step;
(6) judging whether the maximum number of iterations has been reached; if not, incrementing the iteration count by 1 and returning to step (2); if so, ending the training.
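The numbered flow above corresponds to a conventional mini-batch loop; a minimal sketch under assumed interfaces (a PyTorch DataLoader yielding (image, target) pairs, and a hypothetical `target_loss` standing in for the training target of step (5)) is:

```python
import torch

def train(model, loader, loss_fn, optimizer, max_iters, target_loss=None):
    """Sketch of the iterative training flow; `target_loss` is a
    hypothetical stand-in for the training target checked in step (5)."""
    it = 0                                      # step (1): initialize count
    while it < max_iters:
        for images, targets in loader:          # step (2): mini-batch data
            preds = model(images)               # step (3): forward pass
            loss = loss_fn(preds, targets)      # step (3): loss function
            optimizer.zero_grad()
            loss.backward()                     # step (4): back propagation
            optimizer.step()
            it += 1
            if target_loss is not None and loss.item() < target_loss:
                return model                    # step (5): target reached
            if it >= max_iters:                 # step (6): iteration budget
                break
    return model
```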
According to one aspect of the invention, the inference stage specifically comprises the following steps:
randomly shuffling the test set;
running convolutional neural network inference on the images in the test set to obtain inference results;
and post-processing the inference results to obtain the predicted lane lines (a code sketch of this stage follows).
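A minimal sketch of these three inference steps, assuming a PyTorch model, a list of CHW image tensors, and a `postprocess` callable standing in for the post-processing described above:

```python
import random
import torch

def infer(model, test_images, postprocess):
    """Sketch of the inference stage; `postprocess` is a stand-in for the
    post-processing that turns raw network output into lane lines."""
    random.shuffle(test_images)               # randomly shuffle the test set
    model.eval()
    results = []
    with torch.no_grad():
        for img in test_images:
            raw = model(img.unsqueeze(0))     # convolutional network inference
            results.append(postprocess(raw))  # post-process into lane lines
    return results
```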
In some implementations of the first aspect, the preprocessing flow is as follows:
reading the image I and the annotation data L;
scaling the image I and the annotation data L;
and normalizing the scaled image I and annotation data L.
In some realizations of the first aspect, the scaling of the image I and the annotation data L is expressed as follows:
dst_x = s_x · src_x, dst_y = s_y · src_y
where dst_x and dst_y represent the position in the scaled image; src_x and src_y represent the corresponding position in the original image; s_x and s_y represent the scaling factors;
the normalization of the scaled image I and annotation data L is expressed as follows:
I'_(x,y) = (I_(x,y) - mean) / std
where I'_(x,y) represents the pixel value of the normalized image; I_(x,y) represents the pixel value of the image before normalization; mean represents the mean of the image pixel values before normalization; and std represents the standard deviation of the image pixel values before normalization (a preprocessing sketch follows).
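A sketch of this scale-then-normalize preprocessing, assuming OpenCV and NumPy, an (N, 2) array of annotated lane points for L, and images scaled to [0, 1] before normalization (the RGB mean/std defaults are the values quoted later in the text):

```python
import cv2
import numpy as np

def preprocess(image, points, s_x, s_y,
               mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)):
    """Scale image I and annotation L by (s_x, s_y), then normalize I."""
    h, w = image.shape[:2]
    resized = cv2.resize(image, (int(w * s_x), int(h * s_y)))  # scale image I
    scaled_points = points * np.array([s_x, s_y])              # scale labels L
    img = resized.astype(np.float32) / 255.0      # assumed [0, 1] pixel range
    img = (img - np.array(mean)) / np.array(std)  # normalize per channel
    return img, scaled_points
```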
In some implementations of the first aspect, the data enhancement strategies of the lane line detection task used in the training stage include horizontal flipping, rotation, translation and scaling (the four coordinate transforms are sketched in code below). For horizontal flipping, let the coordinates of a point on the lane line before flipping be (x, y) and the coordinates after flipping be (x', y'); the calculation formula is:
x' = w - x, y' = y
where w represents the width of the image.
For rotation, let the coordinates of a point on the lane line before rotation be (x, y), the coordinates after rotation be (x', y'), and the rotation angle be θ; rotating about the image center, the calculation formula is:
x' = (x - w/2)·cosθ - (y - h/2)·sinθ + w/2, y' = (x - w/2)·sinθ + (y - h/2)·cosθ + h/2
where w represents the width of the image and h represents the height of the image.
For translation, let the coordinates of a point on the lane line before translation be (x, y) and the coordinates after translation be (x', y'); the calculation formula is:
x' = x + (x_end - x_start), y' = y + (y_end - y_start)
where (x_start, y_start) represents the coordinates selected before translation and (x_end, y_end) represents the coordinates after the point is translated.
For scaling, let the coordinates of a point on the lane line before image scaling be (x, y) and the scaled coordinates be (x', y'); the calculation formula is:
x' = scale·x + x_top, y' = scale·y + y_top
where scale represents the scaling factor and (x_top, y_top) represents the starting coordinates of the scaled image within the padded image.
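The four coordinate transforms above, sketched with NumPy under the stated conventions (rotation about the image center, scaled image placed at (x_top, y_top) inside the padded image); all function and argument names are illustrative:

```python
import math
import numpy as np

def flip_points(pts, w):
    """Horizontal flip: (x, y) -> (w - x, y)."""
    return np.stack([w - pts[:, 0], pts[:, 1]], axis=1)

def rotate_points(pts, theta, w, h):
    """Rotation by theta about the image center (assumed convention)."""
    cx, cy = w / 2.0, h / 2.0
    c, s = math.cos(theta), math.sin(theta)
    x, y = pts[:, 0] - cx, pts[:, 1] - cy
    return np.stack([x * c - y * s + cx, x * s + y * c + cy], axis=1)

def translate_points(pts, start, end):
    """Translation by the vector from `start` to `end`."""
    return pts + (np.asarray(end) - np.asarray(start))

def scale_points(pts, scale, top):
    """Scaling by `scale`, then placement at `top` in the padded image."""
    return pts * scale + np.asarray(top)
```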
In a second aspect, there is provided a lane line detection system, the system comprising:
a first module for constructing a convolutional neural network model;
a second module for constructing the training and inference stages on the network model;
a third module for adjusting the preprocessing of the training stage;
and a fourth module for predicting the lane lines through the inference stage.
In some realizations of the second aspect, the first module further constructs the neural network model as follows:
the neural network model extracts features from the convolutional feature map;
the convolutional feature map is fed into three branches, each using a 1x1 convolutional layer;
the 1x1 convolutional layers each compress the number of channels of the feature map;
after one compressed branch is transposed, matrix multiplication is performed between the first and second branches to obtain an attention map;
the attention map is passed through softmax to obtain a normalized attention map;
matrix multiplication is performed between the third compressed branch and the normalized attention map to obtain a weighted attention feature map;
and the number of channels of the feature map is raised back with a 1x1 convolution, outputting a self-attention feature map;
further, an attention structure is derived from the neural network model; the attention structure is specified as follows:
assume the feature map obtained from the backbone network has size N×C×H×W;
the three branches then each apply a 1x1 convolution to compress the number of channels of the feature map,
and the compressed feature maps P, Q and R all have size N×C/8×(H×W);
next, the feature map Q of the second branch is transposed to obtain Q', whose size is N×(H×W)×C/8;
Q' and R are matrix-multiplied to obtain M, and M is normalized with Softmax to obtain M', where the size of M is N×(H×W)×(H×W);
M' represents the extracted global semantic information; P and M' are matrix-multiplied to obtain the feature map O, whose size is N×C/8×(H×W);
finally, a 1x1 convolution raises the number of channels, and the output feature map has size N×C×(H×W).
In some implementations of the second aspect, the second module is further configured to perform the training stage; the specific operation steps are as follows:
collecting lane line training sample images, where the samples are RGB three-channel color images annotated with the position and category information of the lane lines;
converting the training samples through preprocessing into the format required by the convolutional neural network, further improving the training effect;
dividing the data set into a training set, a validation set and a test set, where the training set is used to train the convolutional neural network, the validation set is used to select the optimal trained model, and the test set is used to evaluate model metrics;
building the neural network structure, replacing an ordinary convolution structure in the PointLaneNet architecture with the attention structure;
iterative training: forward propagation, computing the loss function between the model prediction and the annotated ground truth, and computing gradients through back propagation;
observing the descending loss curve, validating the converged models on the validation set, and selecting the optimal trained model;
the training flow comprises the following steps:
(1) initializing the iteration count;
(2) generating a mini-batch of lane line training data according to the iteration count;
(3) obtaining predictions by forward-propagating the training data, and computing the loss function;
(4) updating the weights through back propagation;
(5) judging whether the updated weights reach the training target; if so, the training is finished; otherwise, proceeding to the next step;
(6) judging whether the maximum number of iterations has been reached; if not, incrementing the iteration count by 1 and returning to step (2); if so, ending the training.
In some implementations of the second aspect, the third module further performs the preprocessing as follows:
reading the image I and the annotation data L;
scaling the image I and the annotation data L;
normalizing the scaled image I and annotation data L;
the scaling of the image I and the annotation data L is expressed as follows:
dst_x = s_x · src_x, dst_y = s_y · src_y
where dst_x and dst_y represent the position in the scaled image; src_x and src_y represent the corresponding position in the original image; s_x and s_y represent the scaling factors;
the normalization of the scaled image I and annotation data L is expressed as follows:
I'_(x,y) = (I_(x,y) - mean) / std
where I'_(x,y) represents the pixel value of the normalized image; I_(x,y) represents the pixel value of the image before normalization; mean represents the mean of the image pixel values before normalization; and std represents the standard deviation of the image pixel values before normalization.
In some implementations of the second aspect, the data enhancement strategies applied by the fourth module for the lane line detection task in the training stage include horizontal flipping, rotation, translation and scaling. For horizontal flipping, let the coordinates of a point on the lane line before flipping be (x, y) and the coordinates after flipping be (x', y'); the calculation formula is:
x' = w - x, y' = y
where w represents the width of the image.
For rotation, let the coordinates of a point on the lane line before rotation be (x, y), the coordinates after rotation be (x', y'), and the rotation angle be θ; rotating about the image center, the calculation formula is:
x' = (x - w/2)·cosθ - (y - h/2)·sinθ + w/2, y' = (x - w/2)·sinθ + (y - h/2)·cosθ + h/2
where w represents the width of the image and h represents the height of the image.
For translation, let the coordinates of a point on the lane line before translation be (x, y) and the coordinates after translation be (x', y'); the calculation formula is:
x' = x + (x_end - x_start), y' = y + (y_end - y_start)
where (x_start, y_start) represents the coordinates selected before translation and (x_end, y_end) represents the coordinates after the point is translated.
For scaling, let the coordinates of a point on the lane line before image scaling be (x, y) and the scaled coordinates be (x', y'); the calculation formula is:
x' = scale·x + x_top, y' = scale·y + y_top
where scale represents the scaling factor and (x_top, y_top) represents the starting coordinates of the scaled image within the padded image.
In a third aspect, there is provided a lane line detection apparatus, the apparatus comprising:
a processor and a memory storing computer program instructions;
the processor reads and executes the computer program instructions to implement the lane line detection method of the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, having computer program instructions stored thereon; when executed by a processor, the computer program instructions implement the lane line detection method of the first aspect.
Advantageous effects: the invention provides a lane line detection method and system aimed at the problem of inaccurate positioning of the far end of the lane line. By introducing an attention mechanism into the network structure, the fusion of local and global information makes the semantic features of the extracted feature map richer while keeping the computational cost low; the attention mechanism enriches the global semantic information of the feature map, and the network model trained with the attention structure improves the positioning accuracy of the far end of the lane line.
Drawings
Fig. 1 is a schematic diagram of the network architecture of the present invention.
Fig. 2 is a schematic view of the attention structure of the present invention.
Fig. 3 is a schematic diagram of the training process of the present invention.
FIG. 4 is a schematic flow chart of the training phase and the reasoning phase of the present invention.
Fig. 5 is a schematic diagram of horizontal flipping in the present invention.
Fig. 6 is a schematic diagram of rotation in the present invention.
Fig. 7 is a schematic diagram of translation in the present invention.
Fig. 8 is a schematic diagram of scaling in the present invention.
Fig. 9 is a flow chart of the preprocessing of the present invention.
Detailed Description
This embodiment provides a lane line detection method and system that enrich the semantic features of the extracted feature map by fusing local and global information, and improve the positioning accuracy of the far end of the lane line at low computational cost.
Currently, the most advanced lane line detection methods in industry are CNN-based. For example, the SCNN and SAD networks treat lane line detection as a semantic segmentation task and rely on heavy encoder-decoder structures; such methods usually take a small image as input, which makes it difficult to accurately predict the far end of a curved lane line. In addition, these methods are generally limited to detecting a predefined number of lane lines, whereas on real roads the scenes in images vary and the number of lane lines is not fixed. To address this, PointLaneNet follows a candidate-region-based strategy and generates multiple candidate lines in the image, thereby breaking the limitations of inefficient encoders and a predefined number of lanes.
In the prior art, ResNet122 is used as the backbone network to extract semantic features: the input image is processed by the backbone to obtain a corresponding feature map, each grid cell on the feature map is treated as an anchor, and each anchor predicts the offsets of the lane line passing through that cell; lane lines with higher confidence are then retained by an NMS algorithm, which filters out redundant candidate lane lines; the output of whole lane lines can thus be obtained directly through end-to-end training, removing the restriction of outputting a fixed number of lane lines.
In summary, in the present application, the applicant believes that there are at least the following disadvantages in the prior art:
and predicting the whole lane line passing through the corresponding grid by using anchors on the obtained characteristic diagram, so that the performance is greatly reduced when the far end of the curve is predicted.
In order to solve the disadvantages in the prior art, embodiments of the present invention provide a lane line detection method and system, and the following describes a technical solution of the embodiments of the present invention with reference to the accompanying drawings.
Embodiment 1:
According to an embodiment, there is provided a lane line detection method, including:
constructing a convolutional neural network model;
constructing the training and inference stages on the network model;
adjusting the preprocessing of the training stage;
and obtaining the predicted lane lines through the inference stage.
Embodiment 2:
On the basis of Embodiment 1, the neural network model is constructed as follows:
the neural network model extracts features from the convolutional feature map;
the convolutional feature map is fed into three branches, each using a 1x1 convolutional layer;
the 1x1 convolutional layers each compress the number of channels of the feature map;
after one compressed branch is transposed, matrix multiplication is performed between the first and second branches to obtain an attention map;
the attention map is passed through softmax to obtain a normalized attention map;
matrix multiplication is performed between the third compressed branch and the normalized attention map to obtain a weighted attention feature map;
and the number of channels of the feature map is raised back with a 1x1 convolution, outputting a self-attention feature map.
Embodiment 3:
On the basis of Embodiment 2, an attention structure is derived from the neural network model; the attention structure is specified as follows:
assume the feature map obtained from the backbone network has size N×C×H×W;
the three branches then each apply a 1x1 convolution to compress the number of channels of the feature map,
and the compressed feature maps P, Q and R all have size N×C/8×(H×W);
next, the feature map Q of the second branch is transposed to obtain Q', whose size is N×(H×W)×C/8;
Q' and R are matrix-multiplied to obtain M, and M is normalized with Softmax to obtain M', where the size of M is N×(H×W)×(H×W);
M' represents the extracted global semantic information; P and M' are matrix-multiplied to obtain the feature map O, whose size is N×C/8×(H×W);
finally, a 1x1 convolution raises the number of channels, and the output feature map has size N×C×(H×W).
Embodiment 4:
On the basis of Embodiment 1, the training stage operates as follows:
collecting lane line training sample images, where the samples are RGB three-channel color images annotated with the position and category information of the lane lines;
converting the training samples through preprocessing into the format required by the convolutional neural network, further improving the training effect;
dividing the data set into a training set, a validation set and a test set, where the training set is used to train the convolutional neural network, the validation set is used to select the optimal trained model, and the test set is used to evaluate model metrics;
building the neural network structure, replacing an ordinary convolution structure in the PointLaneNet architecture with the attention structure;
iterative training: forward propagation, computing the loss function between the model prediction and the annotated ground truth, and computing gradients through back propagation;
observing the descending loss curve, validating the converged models on the validation set, and selecting the optimal trained model.
Embodiment 5:
On the basis of Embodiment 4, the training flow comprises the following steps:
(1) initializing the iteration count;
(2) generating a mini-batch of lane line training data according to the iteration count;
(3) obtaining predictions by forward-propagating the training data, and computing the loss function;
(4) updating the weights through back propagation;
(5) judging whether the updated weights reach the training target; if so, the training is finished; otherwise, proceeding to the next step;
(6) judging whether the maximum number of iterations has been reached; if not, incrementing the iteration count by 1 and returning to step (2); if so, ending the training.
Embodiment 6:
On the basis of Embodiment 1, the inference stage specifically comprises the following operation steps:
randomly shuffling the test set;
running convolutional neural network inference on the images in the test set to obtain inference results;
and post-processing the inference results to obtain the predicted lane lines.
Embodiment 7:
On the basis of Embodiment 6, the preprocessing flow is as follows:
reading the image I and the annotation data L;
scaling the image I and the annotation data L;
and normalizing the scaled image I and annotation data L.
Embodiment 8:
On the basis of Embodiment 7, the scaling of the image I and the annotation data L is expressed as follows:
dst_x = s_x · src_x, dst_y = s_y · src_y
where dst_x and dst_y represent the position in the scaled image; src_x and src_y represent the corresponding position in the original image; s_x and s_y represent the scaling factors;
the normalization of the scaled image I and annotation data L is expressed as follows:
I'_(x,y) = (I_(x,y) - mean) / std
where I'_(x,y) represents the pixel value of the normalized image; I_(x,y) represents the pixel value of the image before normalization; mean represents the mean of the image pixel values before normalization; and std represents the standard deviation of the image pixel values before normalization.
Embodiment 9:
On the basis of Embodiment 7, the data enhancement strategies of the lane line detection task used in the training stage comprise horizontal flipping, rotation, translation and scaling. For horizontal flipping, let the coordinates of a point on the lane line before flipping be (x, y) and the coordinates after flipping be (x', y'); the calculation formula is:
x' = w - x, y' = y
where w represents the width of the image.
For rotation, let the coordinates of a point on the lane line before rotation be (x, y), the coordinates after rotation be (x', y'), and the rotation angle be θ; rotating about the image center, the calculation formula is:
x' = (x - w/2)·cosθ - (y - h/2)·sinθ + w/2, y' = (x - w/2)·sinθ + (y - h/2)·cosθ + h/2
where w represents the width of the image and h represents the height of the image.
For translation, let the coordinates of a point on the lane line before translation be (x, y) and the coordinates after translation be (x', y'); the calculation formula is:
x' = x + (x_end - x_start), y' = y + (y_end - y_start)
where (x_start, y_start) represents the coordinates selected before translation and (x_end, y_end) represents the coordinates after the point is translated.
For scaling, let the coordinates of a point on the lane line before image scaling be (x, y) and the scaled coordinates be (x', y'); the calculation formula is:
x' = scale·x + x_top, y' = scale·y + y_top
where scale represents the scaling factor and (x_top, y_top) represents the starting coordinates of the scaled image within the padded image.
Embodiment 10:
According to an embodiment, there is provided a lane line detection system, the system comprising:
a first module for constructing a convolutional neural network model; the first module further constructs the neural network model as follows:
the neural network model extracts features from the convolutional feature map;
the convolutional feature map is fed into three branches, each using a 1x1 convolutional layer;
the 1x1 convolutional layers each compress the number of channels of the feature map;
after one compressed branch is transposed, matrix multiplication is performed between the first and second branches to obtain an attention map;
the attention map is passed through softmax to obtain a normalized attention map;
matrix multiplication is performed between the third compressed branch and the normalized attention map to obtain a weighted attention feature map;
and the number of channels of the feature map is raised back with a 1x1 convolution, outputting a self-attention feature map.
An attention structure is derived from the neural network model; the attention structure is specified as follows:
assume the feature map obtained from the backbone network has size N×C×H×W;
the three branches then each apply a 1x1 convolution to compress the number of channels of the feature map,
and the compressed feature maps P, Q and R all have size N×C/8×(H×W);
next, the feature map Q of the second branch is transposed to obtain Q', whose size is N×(H×W)×C/8;
Q' and R are matrix-multiplied to obtain M, and M is normalized with Softmax to obtain M', where the size of M is N×(H×W)×(H×W);
M' represents the extracted global semantic information; P and M' are matrix-multiplied to obtain the feature map O, whose size is N×C/8×(H×W);
finally, a 1x1 convolution raises the number of channels, and the output feature map has size N×C×(H×W).
Embodiment 11:
On the basis of Embodiment 10, the second module is used for constructing the training and inference stages on the network model; the second module further performs the training stage, with the following specific operation steps:
collecting lane line training sample images, where the samples are RGB three-channel color images annotated with the position and category information of the lane lines;
converting the training samples through preprocessing into the format required by the convolutional neural network, further improving the training effect;
dividing the data set into a training set, a validation set and a test set, where the training set is used to train the convolutional neural network, the validation set is used to select the optimal trained model, and the test set is used to evaluate model metrics;
building the neural network structure, replacing an ordinary convolution structure in the PointLaneNet architecture with the attention structure;
iterative training: forward propagation, computing the loss function between the model prediction and the annotated ground truth, and computing gradients through back propagation;
observing the descending loss curve, validating the converged models on the validation set, and selecting the optimal trained model.
Embodiment 12:
On the basis of Embodiment 10, the third module adjusts the preprocessing of the training stage; the third module further performs the preprocessing as follows:
reading the image I and the annotation data L;
scaling the image I and the annotation data L;
and normalizing the scaled image I and annotation data L.
The scaling of the image I and the annotation data L is expressed as follows:
dst_x = s_x · src_x, dst_y = s_y · src_y
where dst_x and dst_y represent the position in the scaled image; src_x and src_y represent the corresponding position in the original image; s_x and s_y represent the scaling factors;
the normalization of the scaled image I and annotation data L is expressed as follows:
I'_(x,y) = (I_(x,y) - mean) / std
where I'_(x,y) represents the pixel value of the normalized image; I_(x,y) represents the pixel value of the image before normalization; mean represents the mean of the image pixel values before normalization; and std represents the standard deviation of the image pixel values before normalization; here the means of the RGB three channels are [0.485, 0.456, 0.406] in sequence, and the corresponding standard deviations are [0.229, 0.224, 0.225].
Embodiment 13:
On the basis of Embodiment 12, the fourth module predicts the lane lines through the inference stage; the fourth module further applies data enhancement strategies for the lane line detection task in the training stage, including horizontal flipping, rotation, translation and scaling. For horizontal flipping, let the coordinates of a point on the lane line before flipping be (x, y) and the coordinates after flipping be (x', y'); the calculation formula is:
x' = w - x, y' = y
where w represents the width of the image.
For rotation, let the coordinates of a point on the lane line before rotation be (x, y), the coordinates after rotation be (x', y'), and the rotation angle be θ; rotating about the image center, the calculation formula is:
x' = (x - w/2)·cosθ - (y - h/2)·sinθ + w/2, y' = (x - w/2)·sinθ + (y - h/2)·cosθ + h/2
where w represents the width of the image and h represents the height of the image.
For translation, let the coordinates of a point on the lane line before translation be (x, y) and the coordinates after translation be (x', y'); the calculation formula is:
x' = x + (x_end - x_start), y' = y + (y_end - y_start)
where (x_start, y_start) represents the coordinates selected before translation and (x_end, y_end) represents the coordinates after the point is translated.
For scaling, let the coordinates of a point on the lane line before image scaling be (x, y) and the scaled coordinates be (x', y'); the calculation formula is:
x' = scale·x + x_top, y' = scale·y + y_top
where scale represents the scaling factor and (x_top, y_top) represents the starting coordinates of the scaled image within the padded image.
Embodiment 14:
On the basis of Embodiment 12, there is provided a lane line detection apparatus, the apparatus comprising:
a processor and a memory storing computer program instructions;
the processor reads and executes the computer program instructions to implement the lane line detection method of Embodiment 1.
Embodiment 15:
On the basis of Embodiment 12, there is provided a computer-readable storage medium having computer program instructions stored thereon; when executed by a processor, the computer program instructions implement the lane line detection method of Embodiment 1.
It should also be noted that the exemplary embodiments mentioned in this patent describe some methods or systems based on a series of steps or devices. However, the present invention is not limited to the above-mentioned order of the steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.
As will be apparent to those skilled in the art, for convenience and brevity of description, the specific working processes of the systems, modules and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. It should be understood that the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the present invention, and these modifications or substitutions should be covered within the scope of the present invention.
Claims (10)
1. A lane line detection method is characterized by comprising the following steps:
constructing a convolutional neural network model;
constructing the training and inference stages on the network model;
adjusting the preprocessing of the training stage;
and obtaining the predicted lane lines through the inference stage.
2. The method according to claim 1, wherein the neural network model is constructed as follows:
the neural network model extracts features from the convolutional feature map;
the convolutional feature map is fed into three branches, each using a 1x1 convolutional layer;
the 1x1 convolutional layers each compress the number of channels of the feature map;
after one compressed branch is transposed, matrix multiplication is performed between the first and second branches to obtain an attention map;
the attention map is passed through softmax to obtain a normalized attention map;
matrix multiplication is performed between the third compressed branch and the normalized attention map to obtain a weighted attention feature map;
and the number of channels of the feature map is raised back with a 1x1 convolution, outputting a self-attention feature map;
an attention structure is derived from the neural network model; the attention structure is specified as follows:
assume the feature map obtained from the backbone network has size N×C×H×W;
the three branches then each apply a 1x1 convolution to compress the number of channels of the feature map,
and the compressed feature maps P, Q and R all have size N×C/8×(H×W);
next, the feature map Q of the second branch is transposed to obtain Q', whose size is N×(H×W)×C/8;
Q' and R are matrix-multiplied to obtain M, and M is normalized with Softmax to obtain M', where the size of M is N×(H×W)×(H×W);
M' represents the extracted global semantic information; P and M' are matrix-multiplied to obtain the feature map O, whose size is N×C/8×(H×W);
finally, a 1x1 convolution raises the number of channels, and the output feature map has size N×C×(H×W).
3. The lane line detection method according to claim 1, wherein the training stage operates as follows:
collecting lane line training sample images, where the samples are RGB three-channel color images annotated with the position and category information of the lane lines;
converting the training samples through preprocessing into the format required by the convolutional neural network, further improving the training effect;
dividing the data set into a training set, a validation set and a test set, where the training set is used to train the convolutional neural network, the validation set is used to select the optimal trained model, and the test set is used to evaluate model metrics;
building the neural network structure, replacing an ordinary convolution structure in the PointLaneNet architecture with the attention structure;
iterative training: forward propagation, computing the loss function between the model prediction and the annotated ground truth, and computing gradients through back propagation;
observing the descending loss curve, validating the converged models on the validation set, and selecting the optimal trained model;
the training flow comprises the following steps:
(1) initializing the iteration count;
(2) generating a mini-batch of lane line training data according to the iteration count;
(3) obtaining predictions by forward-propagating the training data, and computing the loss function;
(4) updating the weights through back propagation;
(5) judging whether the updated weights reach the training target; if so, the training is finished; otherwise, proceeding to the next step;
(6) judging whether the maximum number of iterations has been reached; if not, incrementing the iteration count by 1 and returning to step (2); if so, ending the training.
4. The lane line detection method according to claim 1, wherein the inference stage specifically comprises the following steps:
randomly shuffling the test set;
running convolutional neural network inference on the images in the test set to obtain inference results;
post-processing the inference results to obtain the predicted lane lines;
the preprocessing flow is as follows:
reading the image I and the annotation data L;
scaling the image I and the annotation data L;
and normalizing the scaled image I and annotation data L.
5. The lane line detection method according to claim 4, wherein the scaling of the image I and the annotation data L is expressed as follows:
dst_x = s_x · src_x, dst_y = s_y · src_y
where dst_x and dst_y represent the position in the scaled image; src_x and src_y represent the corresponding position in the original image; s_x and s_y represent the scaling factors;
the normalization of the scaled image I and annotation data L is expressed as follows:
I'_(x,y) = (I_(x,y) - mean) / std
where I'_(x,y) represents the pixel value of the normalized image; I_(x,y) represents the pixel value of the image before normalization; mean represents the mean of the image pixel values before normalization; and std represents the standard deviation of the image pixel values before normalization.
6. The lane line detection method according to claim 1, wherein the data enhancement strategies of the lane line detection task in the training stage comprise horizontal flipping, rotation, translation and scaling; for horizontal flipping, let the coordinates of a point on the lane line before flipping be (x, y) and the coordinates after flipping be (x', y'); the calculation formula is:
x' = w - x, y' = y
where w represents the width of the image;
for rotation, let the coordinates of a point on the lane line before rotation be (x, y), the coordinates after rotation be (x', y'), and the rotation angle be θ; rotating about the image center, the calculation formula is:
x' = (x - w/2)·cosθ - (y - h/2)·sinθ + w/2, y' = (x - w/2)·sinθ + (y - h/2)·cosθ + h/2
where w represents the width of the image and h represents the height of the image;
for translation, let the coordinates of a point on the lane line before translation be (x, y) and the coordinates after translation be (x', y'); the calculation formula is:
x' = x + (x_end - x_start), y' = y + (y_end - y_start)
where (x_start, y_start) represents the coordinates selected before translation and (x_end, y_end) represents the coordinates after the point is translated;
for scaling, let the coordinates of a point on the lane line before image scaling be (x, y) and the scaled coordinates be (x', y'); the calculation formula is:
x' = scale·x + x_top, y' = scale·y + y_top
where scale represents the scaling factor and (x_top, y_top) represents the starting coordinates of the scaled image within the padded image.
7. A lane line detection system, characterized by comprising the following modules:
a first module for constructing a convolutional neural network model;
a second module for constructing the training and inference stages on the network model;
a third module for adjusting the preprocessing of the training stage;
a fourth module for predicting the lane lines through the inference stage;
the first module further constructs the neural network model as follows:
the neural network model extracts features from the convolutional feature map;
the convolutional feature map is fed into three branches, each using a 1x1 convolutional layer;
the 1x1 convolutional layers each compress the number of channels of the feature map;
after one compressed branch is transposed, matrix multiplication is performed between the first and second branches to obtain an attention map;
the attention map is passed through softmax to obtain a normalized attention map;
matrix multiplication is performed between the third compressed branch and the normalized attention map to obtain a weighted attention feature map;
and the number of channels of the feature map is raised back with a 1x1 convolution, outputting a self-attention feature map;
further, an attention structure is derived from the neural network model; the attention structure is specified as follows:
assume the feature map obtained from the backbone network has size N×C×H×W;
the three branches then each apply a 1x1 convolution to compress the number of channels of the feature map,
and the compressed feature maps P, Q and R all have size N×C/8×(H×W);
next, the feature map Q of the second branch is transposed to obtain Q', whose size is N×(H×W)×C/8;
Q' and R are matrix-multiplied to obtain M, and M is normalized with Softmax to obtain M', where the size of M is N×(H×W)×(H×W);
M' represents the extracted global semantic information; P and M' are matrix-multiplied to obtain the feature map O, whose size is N×C/8×(H×W);
finally, a 1x1 convolution raises the number of channels, and the output feature map has size N×C×(H×W);
the second module further performs the training stage, with the following specific operation steps:
collecting lane line training sample images, where the samples are RGB three-channel color images annotated with the position and category information of the lane lines;
converting the training samples through preprocessing into the format required by the convolutional neural network, further improving the training effect;
dividing the data set into a training set, a validation set and a test set, where the training set is used to train the convolutional neural network, the validation set is used to select the optimal trained model, and the test set is used to evaluate model metrics;
building the neural network structure, replacing an ordinary convolution structure in the PointLaneNet architecture with the attention structure;
iterative training: forward propagation, computing the loss function between the model prediction and the annotated ground truth, and computing gradients through back propagation;
observing the descending loss curve, validating the converged models on the validation set, and selecting the optimal trained model;
the training flow comprises the following steps:
(1) initializing the iteration count;
(2) generating a mini-batch of lane line training data according to the iteration count;
(3) obtaining predictions by forward-propagating the training data, and computing the loss function;
(4) updating the weights through back propagation;
(5) judging whether the updated weights reach the training target; if so, the training is finished; otherwise, proceeding to the next step;
(6) judging whether the maximum number of iterations has been reached; if not, incrementing the iteration count by 1 and returning to step (2); if so, ending the training.
8. The lane line detection system of claim 7, wherein the third module further performs the preprocessing as follows:
reading the image I and the annotation data L;
scaling the image I and the annotation data L;
normalizing the scaled image I and annotation data L;
the scaling of the image I and the annotation data L is expressed as follows:
dst_x = s_x · src_x, dst_y = s_y · src_y
where dst_x and dst_y represent the position in the scaled image; src_x and src_y represent the corresponding position in the original image; s_x and s_y represent the scaling factors;
the normalization of the scaled image I and annotation data L is expressed as follows:
I'_(x,y) = (I_(x,y) - mean) / std
where I'_(x,y) represents the pixel value of the normalized image; I_(x,y) represents the pixel value of the image before normalization; mean represents the mean of the image pixel values before normalization; and std represents the standard deviation of the image pixel values before normalization;
the fourth module further applies data enhancement strategies, including horizontal flipping, rotation, translation and scaling, for the lane line detection task in the training stage; for horizontal flipping, let the coordinates of a point on the lane line before flipping be (x, y) and the coordinates after flipping be (x', y'); the calculation formula is:
x' = w - x, y' = y
where w represents the width of the image;
for rotation, let the coordinates of a point on the lane line before rotation be (x, y), the coordinates after rotation be (x', y'), and the rotation angle be θ; rotating about the image center, the calculation formula is:
x' = (x - w/2)·cosθ - (y - h/2)·sinθ + w/2, y' = (x - w/2)·sinθ + (y - h/2)·cosθ + h/2
where w represents the width of the image and h represents the height of the image;
for translation, let the coordinates of a point on the lane line before translation be (x, y) and the coordinates after translation be (x', y'); the calculation formula is:
x' = x + (x_end - x_start), y' = y + (y_end - y_start)
where (x_start, y_start) represents the coordinates selected before translation and (x_end, y_end) represents the coordinates after the point is translated;
for scaling, let the coordinates of a point on the lane line before image scaling be (x, y) and the scaled coordinates be (x', y'); the calculation formula is:
x' = scale·x + x_top, y' = scale·y + y_top
where scale represents the scaling factor and (x_top, y_top) represents the starting coordinates of the scaled image within the padded image.
9. A lane line detection apparatus, the apparatus comprising:
a processor and a memory storing computer program instructions;
the processor reads and executes the computer program instructions to implement a lane line detection method according to any one of claims 1 to 6.
10. A computer-readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the lane line detection method according to any one of claims 1 to 6.
Priority Applications (1)
- CN202110091223.3A, priority/filing date 2021-01-22: Lane line detection method and system (CN112883807A)
Publications (1)
- CN112883807A, published 2021-06-01
Family ID: 76050531
Family Applications (1)
- CN202110091223.3A, filed 2021-01-22, status Pending
Patent Citations (3)
- CN110222591A, priority 2019-05-16, published 2019-09-10: A kind of method for detecting lane lines based on deep neural network
- CN110298387A, priority 2019-06-10, published 2019-10-01: Incorporate the deep neural network object detection method of Pixel-level attention mechanism
- CN111242037A, priority 2020-01-15, published 2020-06-05: Lane line detection method based on structural information
Non-Patent Citations (1)
- Wang Zhewei, "Research on lane line detection algorithm based on deep learning", China Master's Theses Electronic Journals Database, no. 01, pages 035-472.
Cited By (4)
- CN113269171A / CN113269171B, Motovis Intelligent Technology (Shanghai) Co., Ltd., priority 2021-07-20: Lane line detection method, electronic device and vehicle
- CN114782915A / CN114782915B, Harbin Institute of Technology, priority 2022-04-11: Intelligent automobile end-to-end lane line detection system and equipment based on auxiliary supervision and knowledge distillation
Legal Events
- PB01: Publication
- SE01: Entry into force of request for substantive examination