CN116189130A - Lane line segmentation method and device based on image annotation model


Info

Publication number
CN116189130A
CN116189130A (application CN202310183836.9A)
Authority
CN
China
Prior art keywords
image
result
sample
neural network
lane line
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310183836.9A
Other languages
Chinese (zh)
Inventor
孟鹏飞
朱磊
贾双成
郭杏荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhidao Network Technology Beijing Co Ltd
Original Assignee
Zhidao Network Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhidao Network Technology Beijing Co Ltd filed Critical Zhidao Network Technology Beijing Co Ltd
Priority to CN202310183836.9A
Publication of CN116189130A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V 20/588 Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

The application relates to a lane line segmentation method and device based on an image annotation model. The method comprises the following steps: acquiring a sample image, the sample image comprising a road image annotated with lane lines; inputting the sample image into a conditional random field algorithm to obtain a first result; inputting the sample image into a preset neural network to obtain a sample output result, and inputting the sample output result into a preset random field algorithm to obtain a second result; calculating a loss function of the preset neural network from the first result and the second result, and iteratively optimizing the parameters of the preset neural network based on the loss function to obtain a target image annotation model; and determining the lane lines of a road image to be detected from the road image to be detected and the target image annotation model. The scheme provided by the application can improve the efficiency of lane line segmentation from road images.

Description

Lane line segmentation method and device based on image annotation model
Technical Field
The application relates to the field of intelligent driving, in particular to a lane line segmentation method and device based on an image annotation model.
Background
When an autonomous vehicle is driving, or when a high-precision map is being produced, the exact position of the lane lines must be known in order to determine a safe driving area or to make decisions based on the lane lines. At present, various image segmentation models have been developed for segmenting images, among which an important class is the image semantic segmentation model. A semantic segmentation model classifies every pixel of an image and can therefore segment the image finely. Before semantic segmentation of lane lines, every pixel belonging to a lane line in the image must be annotated manually. Such annotation is difficult to make accurate at the pixel level, and it is hard to guarantee that every pixel is labeled correctly, so manually annotated data generally contains some mislabeled pixels. If the annotation were required to be accurate for every pixel, then on the one hand annotators would need to spend far more time and the annotation cost would rise; on the other hand, even with a great deal of time spent, it would still be difficult to ensure that every pixel is labeled correctly.
Disclosure of Invention
In order to solve, or at least partially solve, the problems in the related art, the present application provides a lane line segmentation method and device based on an image annotation model, which improve the efficiency of segmenting lane lines from a road image.
A first aspect of the present application provides a lane line segmentation method based on an image annotation model, comprising the following steps:
acquiring a sample image, wherein the sample image comprises a road image marked by a lane line;
inputting a sample image into a conditional random field algorithm to obtain a first result;
inputting the sample image into a preset neural network to obtain a sample output result, and inputting the sample output result into a preset random field algorithm to obtain a second result;
according to the first result and the second result, calculating a loss function of the preset neural network, and performing iterative optimization on parameters of the preset neural network based on the loss function to obtain a target image annotation model;
and determining the lane line of the road image to be detected according to the road image to be detected and the target image annotation model.
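The claimed steps above can be illustrated with a deliberately tiny, hedged sketch. Everything here is hypothetical scaffolding rather than the patent's actual implementation: `TinyNet` stands in for the preset neural network, `crf_predict` for the conditional random field algorithm, and the step of passing the network output through a second random field algorithm is omitted for brevity.

```python
import numpy as np

class TinyNet:
    """Hypothetical stand-in for the preset neural network: one scalar weight."""
    def __init__(self):
        self.w = 0.0

    def forward(self, img):
        # "sample output result": per-pixel lane probability
        return 1.0 / (1.0 + np.exp(-self.w * img))

    def step(self, img, target, lr=0.5):
        pred = self.forward(img)
        # gradient of binary cross-entropy with respect to the single weight
        self.w -= lr * float(np.mean((pred - target) * img))

def crf_predict(img):
    # placeholder for the conditional random field algorithm: here it simply
    # thresholds brightness into a pseudo-label (1 = lane, 0 = background)
    return (img > 0.5).astype(float)

def train_annotation_model(sample_images, net, steps=50):
    for _ in range(steps):
        for img in sample_images:
            first = crf_predict(img)   # "first result" (CRF on the sample image)
            net.step(img, first)       # iterative optimization toward that target
    return net                         # the "target image annotation model"
```

At inference time, the trained model's `forward` output on a road image to be detected would be thresholded to determine the lane line pixels.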
Optionally, acquiring the sample image includes:
collecting a road image, and carrying out gray processing on the road image;
and identifying lane lines in the road image after the gray level processing, and marking the lane lines to obtain a marked image.
Optionally, identifying the lane line in the road image and labeling the lane line image includes:
edge detection is carried out on the road image after gray processing, and pixel points with lane line characteristics are determined;
and marking first marks on pixel points of the lane line characteristics, and marking second marks on other pixels in the road image.
Optionally, inputting the sample image into a conditional random field algorithm to obtain a first result, including:
setting a characteristic function of a conditional random field, and calculating characteristic scores of lane lines in a sample image according to the characteristic function;
the feature scores of each road image in the sample image are weighted and summed, and a feature function set in the sample image is determined;
and obtaining the probability that the pixel points in the sample image are the pixel points of the lane lines according to the characteristic function set so as to obtain a first result.
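The feature-function step above can be sketched as follows. This is a toy numpy illustration under stated assumptions, not the patent's actual CRF: each feature function is assumed to yield a per-pixel score map, the maps are weighted and summed, and a logistic squash converts the sum into a lane-line probability; the name `crf_first_result` is hypothetical.

```python
import numpy as np

def crf_first_result(feature_scores, weights):
    """Weighted sum of per-pixel feature-function scores -> lane probability.

    feature_scores: list of (H, W) arrays, one per feature function
    weights: one weight per feature function
    """
    total = np.zeros_like(feature_scores[0], dtype=float)
    for score, w in zip(feature_scores, weights):
        total += w * score                  # weighted sum of feature scores
    return 1.0 / (1.0 + np.exp(-total))     # probability pixel is a lane line
```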
Optionally, the preset neural network is a semantic segmentation model, the method further includes constructing the preset neural network, and constructing the preset neural network includes:
the method comprises the steps of setting a feature extraction module and a category output module, wherein the feature extraction module is used for extracting depth features of sample images, and the category output module is used for outputting probability that the sample images belong to each category according to the depth features;
and setting the up-sampling function and the number of up-sampling layers of the feature extraction module, and setting the depth convolution function of the category output module.
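The two-module structure described above can be sketched minimally. This is an illustrative simplification, not the claimed network: average pooling stands in for the convolutional backbone of the feature extraction module, a per-pixel logit with a softmax-style pair of class maps stands in for the category output module, and nearest-neighbour repetition stands in for the up-sampling layers; all function names and parameters (`w_lane`, `bias`) are hypothetical.

```python
import numpy as np

def feature_extraction(img, pool=2):
    """Stand-in for the feature extraction module: block-average pooling
    produces a coarse "depth feature" map from the input image."""
    h = img.shape[0] // pool * pool
    w = img.shape[1] // pool * pool
    x = img[:h, :w].astype(float)
    return x.reshape(h // pool, pool, w // pool, pool).mean(axis=(1, 3))

def category_output(features, up=2, w_lane=2.0, bias=-1.0):
    """Stand-in for the category output module: per-pixel class probabilities
    from the depth features, upsampled back to the input resolution."""
    p_lane = 1.0 / (1.0 + np.exp(-(w_lane * features + bias)))
    probs = np.stack([1.0 - p_lane, p_lane])            # (2, h, w): bg, lane
    return probs.repeat(up, axis=1).repeat(up, axis=2)  # nearest-neighbour upsample
```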
Optionally, the method further comprises constructing a loss function, the constructing the loss function comprising:
acquiring a first probability that the pixel point is predicted to be a lane line according to a first result;
acquiring a second probability of the pixel point being predicted as a lane line according to a second result;
setting class labels of lane line pixels and non-lane line pixels;
and constructing a loss function according to the first probability, the second probability and the class labels.
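One plausible concrete form of such a loss, sketched under the assumption that it is a binary cross-entropy treating the first probability (CRF result) as a soft target for the second probability (the network's refined prediction) — the patent does not fix the exact formula, so this is illustrative only:

```python
import numpy as np

def annotation_loss(p_first, p_second, eps=1e-7):
    """Binary cross-entropy between the CRF target and the refined prediction.

    p_first:  per-pixel lane probability from the first result (target)
    p_second: per-pixel lane probability from the second result (prediction)
    """
    p = np.clip(p_second, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-(p_first * np.log(p)
                           + (1.0 - p_first) * np.log(1.0 - p))))
```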
Optionally, performing multiple iterative optimization on the preset neural network according to the loss function to obtain the target image annotation model, including:
performing repeated iterative training on the preset neural network according to the loss function result until the preset neural network converges to obtain a target image annotation model;
wherein, each iterative training in the multiple iterative training of the preset neural network comprises:
and calculating the error of the preset neural network annotation according to the preliminary result of the loss function calculation and the sample annotation in the sample image, and correcting and optimizing the model parameters of the preset neural network by adopting a random gradient descent method.
The second aspect of the present application provides a lane line segmentation device based on an image labeling model, comprising:
the acquisition module is used for acquiring a sample image, wherein the sample image comprises a road image marked by a lane line;
the first acquisition module is used for inputting the sample image into a conditional random field algorithm to obtain a first result;
the second acquisition module is used for inputting the sample image into a preset neural network to obtain a sample output result, and inputting the sample output result into a preset random field algorithm to obtain a second result;
the training module is used for calculating a loss function of the preset neural network according to the first result and the second result, and performing iterative optimization on parameters of the preset neural network based on the loss function to obtain a target image annotation model;
the segmentation module is used for determining lane lines of the road image to be detected according to the road image to be detected and the target image annotation model.
A third aspect of the present application provides an electronic device, comprising:
a processor; and
a memory having executable code stored thereon which, when executed by the processor, causes the processor to perform the method as described above.
A fourth aspect of the present application provides a computer readable storage medium having stored thereon executable code which, when executed by a processor of an electronic device, causes the processor to perform a method as described above.
The technical solution provided by the present application can have the following beneficial effects. On the one hand, because the sample image is predicted with a conditional random field algorithm, a training set for the preset neural network can be built from fewer annotated images, which improves the subsequent training efficiency of the preset neural network. On the other hand, because the result obtained by the conditional random field algorithm may contain more category information, this is equivalent to annotating the more valuable pixels selected from the sample image, which reduces the impact of annotation noise on the training of the image semantic segmentation model. In addition, since only part of the pixels are selected from the sample image, compared with annotating every pixel in the sample image, the annotation scheme in the embodiments of the present application increases the uncertainty of the annotated objects, and thus the uncertainty during model training, which in turn helps to improve the generalization capability of the image semantic segmentation model.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The foregoing and other objects, features and advantages of the application will be apparent from the following more particular descriptions of exemplary embodiments of the application as illustrated in the accompanying drawings wherein like reference numbers generally represent like parts throughout the exemplary embodiments of the application.
Fig. 1 is a flow chart of a lane line segmentation method based on an image labeling model according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a lane line segmentation apparatus based on an image labeling model according to an embodiment of the present application;
fig. 3 is a schematic structural view of a vehicle shown in an embodiment of the present application.
Detailed Description
Embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The terminology used in the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the present application. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms "first," "second," "third," etc. may be used herein to describe various information, these information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first message may also be referred to as a second message, and similarly, a second message may also be referred to as a first message, without departing from the scope of the present application. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the present application, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
In order to facilitate a better understanding of the technical solutions of the present application, the following description refers to the terms related to the present application.
1. Artificial intelligence (Artificial Intelligence, AI): the theory, methods, technologies, and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a way similar to human intelligence. Artificial intelligence is the study of the design principles and implementation methods of various intelligent machines, so that machines have the capabilities of perception, reasoning, and decision-making.
Artificial intelligence is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big-data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
2. Machine Learning (ML): a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other subjects. It studies how a computer can simulate or implement human learning behavior in order to acquire new knowledge or skills, and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way of giving computers intelligence; it is applied throughout all areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
3. Convolutional neural network (Convolutional Neural Network, CNN): a feed-forward neural network whose artificial neurons respond to part of the surrounding units within their coverage area; it performs excellently on large-scale image processing. A convolutional neural network consists of one or more convolutional layers and fully connected layers on top (corresponding to a classical neural network), and also includes associated weights and pooling layers.
4. Depth characteristics: the image features extracted through the depth network contain abstract information of the image.
5. Semantic segmentation: assigning each pixel in an image a corresponding class label according to the object of interest to which the pixel belongs.
6. Semantic image: the result obtained after assigning a category label to each pixel in the image.
7. Mask image: in the embodiments of the present application, the annotation image used to represent the road image. It may be a binary image containing a first type of pixel with a first value and a second type of pixel with a second value; for example, a value of 0 in the binary image indicates that the pixel is not selected, and a value of 1 indicates that the pixel is selected.
8. Conditional generative adversarial network (Conditional Generative Adversarial Nets, CGAN): an improvement on the basic GAN, achieved by adding additional condition information to the Generator and the Discriminator of the original GAN. The additional condition information may be a category label or other auxiliary information.
9. ImageNet database: a large-scale image database covering 1000 categories.
10. MobileNet V2, a commonly used lightweight network model architecture, is trained on an ImageNet database and can be used to extract image features.
11. Image classification and category: image classification refers to an image processing method that distinguishes objects of different categories according to the different features reflected in the image information. A computer is used to analyze the image quantitatively and to assign each pixel or region in the image to one of several categories, replacing human visual interpretation. A category may also be called a class. In the embodiments of the present application there may be two or more categories, such as vehicles, roads, and so on; when the image semantic segmentation model is applied to different scenes, the categories to be annotated may differ. Each object in the image is in fact composed of pixels, and the category of a pixel corresponds to the category of the object.
12. Sample image and target image: both are images. In the embodiments of the present application, an image used to train the model is called a sample image, and an image subsequently processed with the model is called a target image.
13. Edge information and edge pixel points: edge information describes the pixels in an image whose gray values change discontinuously relative to their neighboring pixels; such pixels are edge pixel points. Edge information may specifically include the gray values of the edge pixel points, the shapes formed by them, and so on. Edges exist widely between objects and the background, and between one object and another. Edge information in an image can be obtained by image edge detection.
14. Conditional Random Fields (CRFs): a discriminative probabilistic model and a type of random field, commonly used to label or analyze sequence data such as natural language text or biological sequences. A conditional random field is a conditional probability distribution model P(Y|X): given a set of input random variables X, the output random variables Y form a Markov random field; that is, a CRF is characterized by the assumption that the output random variables constitute a Markov random field. Conditional random fields can be seen as a generalization of the maximum-entropy Markov model to labeling problems. Like a Markov random field, a conditional random field is an undirected graphical model: the vertices of the graph represent random variables and the edges between vertices represent dependencies between them; in a conditional random field, the distribution of the random variable Y is a conditional probability given the observed random variable X. In principle the graph layout of a conditional random field can be arbitrary, but a common layout is a chain architecture, for which relatively efficient algorithms are available for training, inference, and decoding. A conditional random field is a typical discriminative model whose joint probability can be written as a product of potential functions, with the linear-chain conditional random field being the most common form.
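As a concrete illustration of the mask image described in item 7 above (a minimal numpy sketch; the shape and values are arbitrary examples, not taken from the patent):

```python
import numpy as np

# Hypothetical 4x6 binary mask: value 1 (first value) marks selected pixels,
# e.g. lane line pixels; value 0 (second value) marks unselected pixels.
mask = np.zeros((4, 6), dtype=np.uint8)
mask[1:3, 2:5] = 1                  # a 2x3 patch of "lane line" pixels
num_lane_pixels = int(mask.sum())   # count of selected pixels
```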
As described in the Background above, an autonomous vehicle or a high-precision mapping pipeline needs the accurate position of lane lines, but manual pixel-level annotation for semantic segmentation is costly and difficult to make accurate for every pixel, so manually annotated data generally contains some mislabeled pixels.
In view of the above problems, embodiments of the present application provide a lane line segmentation method based on an image labeling model, which improves the efficiency of lane line segmentation from a road image.
The following describes the technical scheme of the embodiments of the present application in detail with reference to the accompanying drawings.
Referring to fig. 1, a flow chart of a lane line segmentation method based on an image labeling model according to an embodiment of the present application mainly includes steps S101 to S104, which are described as follows:
step S101: and acquiring a sample image, wherein the sample image comprises a road image with lane line labels.
In this embodiment, the road image is captured by the vehicle-mounted camera and includes surrounding lane lines, and the road image in the sample image carries lane line labels. Acquiring a sample image, comprising: collecting a road image, and carrying out gray processing on the road image; and identifying lane lines in the road image, and marking the lane lines to obtain a marked image.
In one embodiment, identifying and labeling lane lines in a road image includes: edge detection is carried out on the road image after gray processing, and pixel points with lane line characteristics are determined; and marking first marks on pixel points of the lane line characteristics, and marking second marks on other pixels in the road image.
In this embodiment, edge detection includes: detecting edge information of the sample image, determining from the edge information the edge pixel points in the sample image that have lane line edge characteristics, and correspondingly determining the lane line pixels among those edge pixel points. Edge pixel points can be understood as pixels whose pixel values change sharply in accordance with the lane line characteristics. Specifically, the sample image is converted into a grayscale image and the edge information of the image is extracted with a preset edge detection algorithm, for example the Canny edge detection algorithm. The edge detection process is described below using the Canny algorithm as an example:
S1.2, gaussian filtering is carried out on the sample image after gray processing.
The main purpose of Gaussian filtering is to reduce noise in the grayscale sample image. Gaussian filtering of the grayscale sample image can be understood as a weighted average over the image: the new gray value of each pixel is a weighted average of the gray values of that pixel and the other pixels in its neighborhood. This weighted averaging filters out some of the noise in the image, but it also makes the contours in the grayscale sample image somewhat blurred: the processed image as a whole is smoother, and the contour widths are relatively increased. Next, the gradient magnitude and gradient direction are computed in the Gaussian-filtered image. An edge can be understood as a set of pixels with large gray-value changes; for example, between a black region and a white region, the portion in between is generally an edge. In implementation, edges in an image can therefore be found by detecting gray-value changes, where the degree of change is represented by the gradient magnitude and the direction of change by the gradient direction.
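The smoothing-then-gradient step above can be sketched in numpy (an illustrative sketch only: a fixed 3x3 Gaussian kernel and central-difference gradients stand in for whatever kernel size and gradient operator an implementation would actually use):

```python
import numpy as np

def gaussian_blur3(img):
    """Weighted average of each pixel's 3x3 neighbourhood (noise reduction)."""
    k = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], dtype=float) / 16.0
    out = np.zeros_like(img, dtype=float)
    pad = np.pad(img.astype(float), 1, mode="edge")
    for dy in range(3):
        for dx in range(3):
            out += k[dy, dx] * pad[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

def gradient(img):
    """Gradient magnitude (degree of gray-value change) and direction."""
    gy, gx = np.gradient(img.astype(float))
    return np.hypot(gx, gy), np.arctan2(gy, gx)
```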
S1.3: non-maximum values are filtered.
Since the contour widths in the image have actually been enlarged, which may affect the accuracy of edge detection, S1.3 is mainly used to screen out pixels that do not belong to edges. Specifically, if the gradient value of a pixel point is the largest along its gradient direction, the pixel point is determined to be a suspected edge pixel point; if its gradient value is not the largest along the gradient direction, the pixel point is determined not to be an edge pixel point, and so on, thereby excluding pixels that do not belong to edges. Suspected edge pixel points can be understood as preliminarily identified edge pixel points that may be confirmed further. Alternatively, the suspected edge pixel points may be used directly as edge pixel points, yielding the edge pixel points of the grayscale sample image.
S1.4: the edge is determined using an upper threshold. In order to determine more accurate edge pixel points, in the embodiment of the present application, the suspected edge pixel points in S1.3 may be further screened by a high threshold and a low threshold. Wherein the high threshold is greater than the low threshold. Specifically, if the first server determines that the gradient value of the suspected edge pixel point is greater than the high threshold value, determining that the suspected edge pixel point is an edge pixel point; if the gradient value of the suspected edge pixel point is smaller than the high threshold value but larger than the low threshold value, determining that the suspected edge pixel point belongs to the edge pixel point; and if the gradient value of the suspected edge pixel point is smaller than or equal to the low threshold value, determining that the suspected edge pixel point does not belong to the edge pixel point. In this way, all edge pixel points in the sample image after gray processing can be determined, so that an edge image can be obtained. An edge image may be understood as an image that identifies edge pixels and non-edge pixels. Non-edge pixels may be understood as pixels in an image that do not belong to edge pixels.
After the edge image is obtained, edge pixels can be further identified by their gray values; for example, a pixel whose gray value equals a preset value belongs to the edge pixels. Once the edge image is obtained, the number of edge pixels in each sample image can naturally be determined. It should be noted that the number of edge pixels in a sample image may be 0, 1, or more.
After the edge image is determined, the pixels of the lane lines are given a first mark, and the pixels other than the lane lines in the plurality of sample pictures are given a second mark, so that a mask image is obtained. The first mark and the second mark are different marks. They may be of the same type while still being distinct marks within that type: for example, both may be expressed as colors, with the first mark being white (which may be represented by '1') and the second mark being black (which may be represented by '0'). Using the same type for the first and second marks makes it easier for subsequent devices to parse the mask image. The first mark and the second mark may also be of different types, which is not limited in this application.
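The marking of lane and non-lane pixels can be sketched as follows (an illustrative sketch; the image size and lane coordinates are invented, and the '1'/'0' encoding follows the white/black example above):

```python
def make_mask(height, width, lane_pixels):
    """Build a mask image: lane-line pixels receive the first mark
    (1, white), all other pixels the second mark (0, black)."""
    mask = [[0] * width for _ in range(height)]
    for r, c in lane_pixels:
        mask[r][c] = 1
    return mask

# A 2x3 image in which two pixels were detected as lane-line pixels.
mask = make_mask(2, 3, [(0, 1), (1, 2)])
```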
Step S102, inputting the sample image into a conditional random field algorithm to obtain a first result.
In one embodiment, obtaining the first result based on a conditional random field algorithm and the sample image comprises: setting a feature function of a conditional random field, and calculating feature scores of road images in the sample images according to the feature function; weighting and summing the feature scores of each road image in the sample image, and determining a feature function set in the sample image; and obtaining the probability that the pixel points in the sample image are the pixel points of the lane lines according to the characteristic function set so as to obtain the first result.
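The sequence above — per-pixel feature scores, a weighted sum per label, and a normalized probability — can be sketched as follows (an illustrative sketch; the two-label softmax normalization and all weights are assumptions, not the patent's actual feature functions):

```python
import math

def lane_probability(scores, weights):
    """Probability that one pixel is a lane-line pixel.

    scores: feature scores of the pixel; weights[label]: weight vector
    of that label's feature functions (label 0 background, label 1 lane).
    The weighted sums are normalized with a softmax over the two labels.
    """
    totals = [sum(w * s for w, s in zip(label_weights, scores))
              for label_weights in weights]
    z = sum(math.exp(t) for t in totals)
    return math.exp(totals[1]) / z

# With equal weights the two labels are equally likely.
p_neutral = lane_probability([1.0, 2.0], [[0.5, 0.5], [0.5, 0.5]])
# A lane label with larger weights pulls the probability above 0.5.
p_lane = lane_probability([1.0, 2.0], [[0.0, 0.0], [1.0, 1.0]])
```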
Specifically, each element in the sample picture is compared with the labels divided in the labeling picture, a feature function (which may be a probability matrix) is introduced through the conditional random field, and the transition probability of each pixel being a lane line pixel is calculated, so that the label and probability score corresponding to each pixel in the sample picture are obtained. A conditional random field (CRF) is a discriminative probability model and a type of random field, commonly used for labeling or analyzing sequence data such as natural language text or biological sequences. A conditional random field is a conditional probability distribution model P(Y|X) that, for a given set of input random variables X and a set of output random variables Y, represents a Markov random field; that is to say, a conditional random field is characterized by the assumption that the output random variables constitute a Markov random field.
In the embodiment of the present application, a feature function is introduced through the conditional random field, and the initial scores are input into the feature function, so that the probability of each pixel in the sample picture being a lane line pixel is calculated. This yields a probability score and the label probability distribution of each pixel, that is, the similarity between each element and each label.
S2.1: based on the probability score and the label probability of each pixel, obtain a total score for each pixel.
S2.2: accumulate the total score of each pixel step by step according to the Viterbi algorithm to obtain the optimal solving path in each direction.
S2.3: combine the optimal solving paths of the pixels into the optimal solving path of the first result, and generate the feature function set from that optimal solving path.
Specifically, the idea behind using the Viterbi algorithm to predict the optimal annotation sequence is this: the total score of each element is accumulated step by step until the annotation is reached; since the annotation is deterministic, the optimal assignment of each element can then be predicted backwards from the total score of the annotation.
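The accumulate-then-trace-back idea can be sketched with a standard Viterbi decoder (an illustrative sketch over toy scores; the score matrices are invented for the example):

```python
def viterbi(obs_scores, trans):
    """Decode the best label path.

    obs_scores[t][s]: score of label s at step t;
    trans[a][b]: transition score from label a to label b.
    Total scores are accumulated step by step, and the optimal
    assignment is then traced backwards from the best final score.
    """
    n = len(obs_scores[0])
    best = list(obs_scores[0])        # best total score ending in each label
    backptrs = []                     # best predecessor of each label per step
    for t in range(1, len(obs_scores)):
        new_best, ptrs = [], []
        for s in range(n):
            cands = [best[p] + trans[p][s] for p in range(n)]
            p = max(range(n), key=lambda i: cands[i])
            new_best.append(cands[p] + obs_scores[t][s])
            ptrs.append(p)
        best = new_best
        backptrs.append(ptrs)
    s = max(range(n), key=lambda i: best[i])
    path = [s]
    for ptrs in reversed(backptrs):   # trace the optimal path backwards
        s = ptrs[s]
        path.append(s)
    return path[::-1]

# Two labels; observation scores favor label 0 once, then label 1 twice.
path = viterbi([[1, 0], [0, 1], [0, 1]], [[0, 0], [0, 0]])
```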
Step S103: and inputting the sample image into a preset neural network to obtain a sample output result, and inputting the sample output result into a preset random field algorithm to obtain a second result.
In one embodiment, the preset neural network is a semantic segmentation model, and acquiring the preset neural network includes: the method comprises the steps of setting a feature extraction module and a category output module, wherein the feature extraction module is used for extracting depth features of sample images, and the category output module is used for outputting probability that the sample images belong to each category according to the depth features; and setting up-sampling functions and up-sampling layer numbers of the feature extraction module, and setting a depth convolution function of the category output module.
Specifically, the feature extraction module may be implemented with a MobileNet network, for example MobileNetV2. MobileNetV2 can be pre-trained using images from the ImageNet database. The category output module may be implemented with convolution layers, activation layers, and upsampling layers; optionally, the category output module consists of a first activation layer, a first convolution layer, a second activation layer, a second convolution layer, a third activation layer, a first upsampling layer, a fourth activation layer, a second upsampling layer, a fifth activation layer, a third upsampling layer, a sixth activation layer, a fourth upsampling layer, a seventh activation layer, and a fifth upsampling layer, connected in sequence.
Specifically, the MobileNetV2 network is used as the feature extraction module, which extracts the convolutional features of the sample image and outputs the corresponding feature map; the feature map is then convolved by the convolution layers, and its scale is enlarged by the 5 upsampling layers. Each upsampling layer inserts zeros between the elements of its input feature map to enlarge the image and then performs a convolution on the enlarged image to output an enlarged feature map; for example, the width and height of the output feature map may be twice those of the input feature map. The number of output channels of the last upsampling layer is C, corresponding to the probabilities of each pixel belonging to the various categories, so that the probability of each pixel belonging to each category is obtained.
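The zero-insertion enlargement performed by each upsampling layer can be sketched as follows (an illustrative sketch on a single-channel map; the subsequent convolution of the enlarged map is omitted):

```python
def zero_insert_upsample(fmap):
    """Insert zeros between the elements of a feature map so that its
    width and height are doubled; in the model, a convolution of the
    enlarged map follows, which is omitted here."""
    h, w = len(fmap), len(fmap[0])
    out = [[0] * (2 * w) for _ in range(2 * h)]
    for r in range(h):
        for c in range(w):
            out[2 * r][2 * c] = fmap[r][c]
    return out

up = zero_insert_upsample([[1, 2],
                           [3, 4]])
```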
For example, the sample image, the labeling result, and the mask image may each be processed to a preset size, such as 640×360×3, where 3 is the number of channels. The processed sample image is input into the image semantic segmentation model of this embodiment, and a first feature map is obtained after processing by the MobileNetV2 network. The first feature map passes sequentially through the first activation layer and the first convolution layer, where the convolution kernel size of the first convolution layer is 4 and the stride is 2, and a second feature map with 512 channels is output.
Similarly, the second feature map passes sequentially through the second activation layer and the second convolution layer to obtain a third feature map with 512 channels. The third feature map is input into the third activation layer and the first upsampling layer to obtain a fourth feature map with 512 channels. The fourth feature map is input sequentially into the fourth activation layer and the second upsampling layer to obtain a fifth feature map with 256 channels. The fifth feature map is input sequentially into the fifth activation layer and the third upsampling layer to obtain a sixth feature map with 128 channels. The sixth feature map is input sequentially into the sixth activation layer and the fourth upsampling layer to obtain a seventh feature map with 64 channels. The seventh feature map is input into the seventh activation layer and the fifth upsampling layer to obtain a semantic segmentation image with 7 channels, whose size is 640×360×7.
In the embodiment of the present application, the sample output result of the preset neural network is substituted into the conditional random field algorithm, and in the output second result each pixel carries a probability value of being a lane line. Introducing the conditional random field algorithm attaches probability-value labels to the categories output by the semantic segmentation result, so that the comparison between the first result and the second result becomes a comparison between their probability values, allowing the loss function of the preset neural network to be calculated more accurately.
Step S104: calculating a loss function of the preset neural network according to the first result and the second result, and performing iterative optimization on the parameters of the preset neural network based on the loss function to obtain a target image annotation model.
In one embodiment, the method further comprises constructing a loss function, the constructing the loss function comprising: acquiring a first probability value of a pixel point predicted as a lane line according to a first result; acquiring a second probability value of the pixel point predicted as the lane line according to a second result; setting class labels of lane line pixels and non-lane line pixels; and constructing a loss function according to the first probability value, the second probability value and the category label.
The loss function may take various forms and can, for example, be represented by the following formula:

L = -\frac{1}{K}\sum_{i=1}^{K}\sum_{p=1}^{P} m_{i,p}\sum_{c=1}^{C} y_{i,p,c}\,\log \hat{y}_{i,p,c}

wherein K denotes the number of sample images input into the image semantic segmentation model, P denotes the total number of pixels contained in a sample image, C denotes the total number of categories, m_{i,p} is the value of the p-th pixel in the mask image corresponding to the i-th sample image, \hat{y}_{i,p,c} is the predicted probability that the p-th pixel of the i-th sample image belongs to the c-th category, and y_{i,p,c} is the labeling result of whether the p-th pixel of the i-th sample image belongs to the c-th category: if the category corresponding to the p-th pixel is c, y_{i,p,c} takes the value 1; otherwise y_{i,p,c} takes the value 0.
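The masked cross-entropy described above can be sketched as follows (an illustrative sketch; the toy prediction, label, and mask values are invented for the example):

```python
import math

def masked_loss(y_hat, y, mask):
    """Loss over K sample images: for each pixel, the mask value
    mask[i][p] gates the cross-entropy between the predicted
    probabilities y_hat[i][p][c] and the one-hot labels y[i][p][c]."""
    K = len(y_hat)
    total = 0.0
    for i in range(K):
        for p in range(len(y_hat[i])):
            for c in range(len(y_hat[i][p])):
                total += mask[i][p] * y[i][p][c] * math.log(y_hat[i][p][c])
    return -total / K

# One sample image, one pixel, two categories, mask value 1.
loss = masked_loss([[[0.5, 0.5]]], [[[1, 0]]], [[1]])
# A mask value of 0 removes the pixel from the loss entirely.
masked_out = masked_loss([[[0.5, 0.5]]], [[[1, 0]]], [[0]])
```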
In one embodiment, performing multiple iterative optimizations on the preset neural network according to the loss function to obtain the target image annotation model includes: performing multiple rounds of iterative training on the preset neural network according to the loss function until the preset neural network converges, so as to obtain the target image annotation model. Each round of the iterative training of the preset neural network includes: calculating the annotation error of the preset neural network from the preliminary result of the loss function calculation and the sample annotations in the sample image, and correcting and optimizing the model parameters of the preset neural network by stochastic gradient descent.
Specifically, the model parameters of the image semantic segmentation model are adjusted according to the pixel probability values of the first result and the pixel probability values (the semantic segmentation result) of the second result. How the model parameters of the image semantic segmentation model are adjusted is described as follows:
the value of the loss function can be determined from the labeling result of each pixel of the first result in the sample image (the probability that the pixel is a lane line) and the semantic segmentation result of the corresponding pixel in the second result; the model parameters of the image semantic segmentation model are then adjusted according to the value of the loss function, so as to reduce the difference between the labeling result and the semantic segmentation result of the image semantic segmentation model. The trained image semantic segmentation model is obtained once the model converges.
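The stochastic-gradient-descent correction of the model parameters can be sketched on a toy one-parameter problem (an illustrative sketch; the quadratic objective and the learning rate are invented for the example):

```python
def sgd_minimize(x0, grad_fn, lr=0.1, steps=50):
    """Repeatedly move the parameter against its gradient, scaled by
    the learning rate, as each round of iterative training does for
    every model parameter."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad_fn(x)
    return x

# Minimize the toy loss f(x) = (x - 3)^2, whose gradient is 2*(x - 3);
# the iterations converge toward the minimizer x = 3.
x_star = sgd_minimize(10.0, lambda x: 2 * (x - 3))
```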
Specifically, when the image semantic segmentation model is trained through multiple iterations, training is determined to be complete once the model converges, and the trained image semantic segmentation model is obtained. Convergence of the image semantic segmentation model may mean that the value of the loss function falls below a loss threshold, or that the number of iterations reaches a preset number, and so on; the specific convergence condition is not limited here.
In the embodiment of the present application, the image semantic segmentation model is trained with the first result; because the first result is generated by the conditional random field, a large number of data samples can be generated from a small number of labeled images. Moreover, since not every pixel of the sample image is labeled in the first result, the possibility of the image semantic segmentation model overfitting is reduced, and the trained semantic segmentation model generalizes better. Because the sample image is not completely labeled, the uncertainty of the labeling result of the sample image is increased, which can improve the processing capability of the image semantic segmentation model and thus the accuracy of the segmentation results it produces.
Step S105: determining the lane line of the road image to be detected according to the road image to be detected and the target image annotation model.
In this embodiment, the road image to be detected is input into the target image annotation model, which produces, for each pixel of the road image to be detected, the probability that it is a lane line pixel; the pixels whose probability exceeds a preset threshold are determined to be lane line pixels and are extracted to obtain the target lane line.
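The final thresholding step can be sketched as follows (an illustrative sketch; the probability map and the threshold are invented for the example):

```python
def extract_lane_pixels(prob_map, threshold=0.5):
    """Collect the coordinates of the pixels whose lane-line
    probability exceeds the preset threshold."""
    return [(r, c)
            for r, row in enumerate(prob_map)
            for c, p in enumerate(row)
            if p > threshold]

# A 2x2 probability map output for a road image to be detected.
lane_pixels = extract_lane_pixels([[0.9, 0.1],
                                   [0.4, 0.6]])
```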
On the one hand, predicting the sample image with the conditional random field algorithm means that fewer labeled images are needed to build the training data set of the preset neural network, which improves the training efficiency of the preset neural network. On the other hand, because the image result obtained by the conditional random field algorithm may contain more category information, this is equivalent to labeling the more valuable pixels among the screened sample images, so the impact on the training of the image semantic segmentation model can be reduced. In addition, since only some pixels are screened from the sample image, the labeling approach of the embodiment of the present application, compared with labeling all pixels of the sample image, relatively increases the uncertainty of the labeled objects, thereby increasing the uncertainty in the model training process, which in turn helps improve the generalization capability of the image semantic segmentation model.
Corresponding to the foregoing embodiments of the method, the present application further provides a lane line segmentation apparatus based on an image annotation model, a vehicle, and corresponding embodiments.
Fig. 2 is a schematic structural diagram of a lane line segmentation apparatus based on an image labeling model according to an embodiment of the present application. For ease of illustration, only portions relevant to embodiments of the present application are shown. The apparatus of fig. 2 mainly includes an acquisition module 201, a first acquisition module 202, a second acquisition module 203, a training module 204, and a segmentation module 205, where:
the acquisition module 201 is configured to acquire a sample image, where the sample image includes a road image with lane line labels;
the first acquisition module 202 is configured to input the sample image into a conditional random field algorithm to obtain a first result;
the second obtaining module 203 is configured to input a sample image into a preset neural network to obtain a sample output result, and input the sample output result into a preset random field algorithm to obtain a second result;
the training module 204 is configured to calculate a loss function of the preset neural network according to the first result and the second result, and iteratively optimize parameters of the preset neural network based on the loss function to obtain a target image labeling model;
The segmentation module 205 is configured to determine a lane line of the road image to be detected according to the road image to be detected and the target image annotation model.
Referring to fig. 3, a schematic structural diagram of an electronic device according to an embodiment of the present application is shown. The electronic device 300 includes a memory 310 and a processor 320.
The processor 320 may be a central processing unit (Central Processing Unit, CPU), but may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
Memory 310 may include various types of storage units, such as system memory, read-only memory (ROM), and persistent storage. The ROM may store static data or instructions required by the processor 320 or other modules of the computer. The persistent storage may be a read-write storage device, i.e., a non-volatile storage device that does not lose stored instructions and data even after the computer is powered down. In some embodiments, a mass storage device (e.g., a magnetic or optical disk, or flash memory) is employed as the persistent storage. In other embodiments, the persistent storage may be a removable storage device (e.g., a diskette or optical drive). The system memory may be a read-write memory device or a volatile read-write memory device, such as dynamic random access memory. The system memory may store instructions and data required by some or all of the processors at runtime. Furthermore, memory 310 may include any combination of computer-readable storage media, including various types of semiconductor memory chips (e.g., DRAM, SRAM, SDRAM, flash memory, programmable read-only memory); magnetic disks and/or optical disks may also be employed. In some implementations, memory 310 may include a readable and/or writable removable storage device, such as a compact disc (CD), a read-only digital versatile disc (e.g., DVD-ROM, dual-layer DVD-ROM), a read-only Blu-ray disc, an ultra-density disc, a flash memory card (e.g., SD card, mini SD card, micro-SD card, etc.), a magnetic floppy disk, and the like. Computer-readable storage media do not contain carrier waves or transient electronic signals transmitted wirelessly or over wires.
The memory 310 has stored thereon executable code that, when processed by the processor 320, can cause the processor 320 to perform some or all of the methods described above.
Furthermore, the method according to the present application may also be implemented as a computer program or computer program product comprising computer program code instructions for performing part or all of the steps of the above-described method of the present application.
Alternatively, the present application may also be embodied as a computer-readable storage medium (or non-transitory machine-readable storage medium or machine-readable storage medium) having stored thereon executable code (or a computer program or computer instruction code) which, when executed by a processor of a vehicle (or a server, etc.), causes the processor to perform part or all of the steps of the above-described method according to the present application.
The embodiments of the present application have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the embodiments described. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or improvements over technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

1. The lane line segmentation method based on the image annotation model is characterized by comprising the following steps of:
acquiring a sample image, wherein the sample image comprises a road image with lane line marks;
inputting the sample image into a conditional random field algorithm to obtain a first result;
inputting the sample image into a preset neural network to obtain a sample output result, and inputting the sample output result into the preset random field algorithm to obtain a second result;
according to the first result and the second result, calculating a loss function of the preset neural network, and performing iterative optimization on parameters of the preset neural network based on the loss function to obtain a target image annotation model;
and determining the lane line of the road image to be detected according to the road image to be detected and the target image annotation model.
2. The method of claim 1, wherein the acquiring the sample image comprises:
collecting a road image, and carrying out gray processing on the road image;
and identifying lane lines in the road image after the gray level processing, and marking the lane lines to obtain a marked image.
3. The method of claim 2, wherein the identifying and labeling the lane lines in the road image comprises:
Edge detection is carried out on the road image after gray processing, and pixel points with lane line characteristics are determined;
and marking first marks on pixel points of the lane line characteristics, and marking second marks on other pixels in the road image.
4. The method of claim 1, wherein said inputting said sample image into a conditional random field algorithm results in a first result comprising:
setting a characteristic function of a conditional random field, and calculating characteristic scores of lane lines in the sample image according to the characteristic function;
weighting and summing the feature scores of each road image in the sample image, and determining a feature function set in the sample image;
and obtaining the probability that the pixel points in the sample image are the pixel points of the lane lines according to the characteristic function set so as to obtain the first result.
5. The method of claim 1, wherein the predetermined neural network is a semantic segmentation model, the method further comprising constructing the predetermined neural network, the constructing the predetermined neural network comprising:
the method comprises the steps of setting a feature extraction module and a category output module, wherein the feature extraction module is used for extracting depth features of sample images, and the category output module is used for outputting probabilities that the sample images belong to various categories according to the depth features;
And setting up-sampling functions and up-sampling layer numbers of the feature extraction module, and setting a depth convolution function of the category output module.
6. The method of claim 1, further comprising constructing a loss function, the constructing a loss function comprising:
acquiring a first probability that the pixel point is predicted to be a lane line according to a first result;
acquiring a second probability of the pixel point being predicted as a lane line according to a second result;
setting class labels of lane line pixels and non-lane line pixels;
and constructing the loss function according to the first probability, the second probability and the class label.
7. The method according to claim 1, wherein performing multiple iterative optimizations on the preset neural network according to the loss function to obtain a target image annotation model comprises:
performing repeated iterative training on a preset neural network according to the loss function result until the preset neural network converges to obtain a target image annotation model;
wherein, each iterative training in the multiple iterative training of the preset neural network comprises:
and calculating the error of the preset neural network annotation according to the preliminary result of the loss function calculation and the sample annotation in the sample image, and correcting and optimizing the model parameters of the preset neural network by adopting a random gradient descent method.
8. A lane segmentation device based on an image annotation model, the device comprising:
the acquisition module is used for acquiring a sample image, wherein the sample image comprises a road image with lane line marks;
the first acquisition module is used for inputting the sample image into a conditional random field algorithm to obtain a first result;
the second acquisition module is used for inputting the sample image into a preset neural network to obtain a sample output result, and inputting the sample output result into the preset random field algorithm to obtain a second result;
the training module is used for calculating a loss function of the preset neural network according to the first result and the second result, and performing iterative optimization on parameters of the preset neural network based on the loss function to obtain a target image annotation model;
the segmentation module is used for determining lane lines of the road image to be detected according to the road image to be detected and the target image annotation model.
9. An electronic device, comprising:
a processor; and
a memory having executable code stored thereon, which when executed by the processor causes the processor to perform the method of any of claims 1 to 7.
10. A computer readable storage medium having stored thereon executable code which when executed by a processor of an electronic device causes the processor to perform the method of any of claims 1 to 7.
CN202310183836.9A 2023-02-24 2023-02-24 Lane line segmentation method and device based on image annotation model Pending CN116189130A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310183836.9A CN116189130A (en) 2023-02-24 2023-02-24 Lane line segmentation method and device based on image annotation model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310183836.9A CN116189130A (en) 2023-02-24 2023-02-24 Lane line segmentation method and device based on image annotation model

Publications (1)

Publication Number Publication Date
CN116189130A true CN116189130A (en) 2023-05-30

Family

ID=86442097

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310183836.9A Pending CN116189130A (en) 2023-02-24 2023-02-24 Lane line segmentation method and device based on image annotation model

Country Status (1)

Country Link
CN (1) CN116189130A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117593717A (en) * 2024-01-18 2024-02-23 武汉大学 Lane tracking method and system based on deep learning
CN117593717B (en) * 2024-01-18 2024-04-05 武汉大学 Lane tracking method and system based on deep learning

Similar Documents

Publication Publication Date Title
CN112734775B (en) Image labeling, image semantic segmentation and model training methods and devices
EP3620980B1 (en) Learning method, learning device for detecting lane by using cnn and testing method, testing device using the same
US11640714B2 (en) Video panoptic segmentation
CN112232371B (en) American license plate recognition method based on YOLOv3 and text recognition
CN111680753A (en) Data labeling method and device, electronic equipment and storage medium
CN110781980A (en) Training method of target detection model, target detection method and device
Li et al. Localizing and quantifying infrastructure damage using class activation mapping approaches
CN116189130A (en) Lane line segmentation method and device based on image annotation model
CN113378642B (en) Method for detecting illegal occupation buildings in rural areas
CN111310820A (en) Foundation meteorological cloud chart classification method based on cross validation depth CNN feature integration
CN112991281B (en) Visual detection method, system, electronic equipment and medium
CN112418207B (en) Weak supervision character detection method based on self-attention distillation
CN111582057B (en) Face verification method based on local receptive field
CN116071719A (en) Lane line semantic segmentation method and device based on model dynamic correction
KR102026280B1 (en) Method and system for scene text detection using deep learning
INTHIYAZ et al. YOLO (YOU ONLY LOOK ONCE) Making Object detection work in Medical Imaging on Convolution detection System.
CN115797701A (en) Target classification method and device, electronic equipment and storage medium
US11836223B2 (en) Systems and methods for automated detection of building footprints
CN117011566A (en) Target detection method, detection model training method, device and electronic equipment
CN114998702A (en) Entity recognition and knowledge graph generation method and system based on BlendMask
CN116563530A (en) Lane line semantic segmentation method and device based on improved mutual information loss function
CN115424250A (en) License plate recognition method and device
CN116168201B (en) Lane line segmentation method and device without accurate data labeling
KR102418476B1 (en) Method of operating parking control system that precisely performs reading of license plates and readings of counterfeiting and contamination by using two-dimensional image data and three-dimensional depth data
CN116612466B (en) Content identification method, device, equipment and medium based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination