CN111062347A - Traffic element segmentation method in automatic driving, electronic device and storage medium - Google Patents


Info

Publication number
CN111062347A
CN111062347A CN201911332028.4A
Authority
CN
China
Prior art keywords
feature matrix
traffic
segmentation
matrix
scene image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911332028.4A
Other languages
Chinese (zh)
Other versions
CN111062347B (en)
Inventor
熊迹
惠念
肖圣
刘春成
罗跃军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Heading Data Intelligence Co Ltd
Original Assignee
Heading Data Intelligence Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Heading Data Intelligence Co Ltd filed Critical Heading Data Intelligence Co Ltd
Priority to CN201911332028.4A priority Critical patent/CN111062347B/en
Publication of CN111062347A publication Critical patent/CN111062347A/en
Application granted granted Critical
Publication of CN111062347B publication Critical patent/CN111062347B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Traffic Control Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a traffic element segmentation method in automatic driving, an electronic device and a storage medium. The method comprises the following steps: acquiring a collected traffic scene image, and segmenting the traffic scene image through a pre-trained segmentation model; performing convolution calculation on the traffic scene image to obtain a feature matrix, down-sampling and up-sampling the feature matrix, then superposing it with a feature matrix processed by an ASPP (Atrous Spatial Pyramid Pooling) module and an attention mechanism to obtain a target feature matrix, and classifying the target feature matrix through a sigmoid function to obtain a traffic element segmentation result; and simultaneously inputting the segmentation result and predetermined standard data into a countermeasure network to train the countermeasure network, and generating a traffic element segmentation image with confidence through the countermeasure network. The scheme solves the problem that existing traffic element segmentation models have poor generalization capability; it can effectively improve the generalization capability of the network model and enhance adaptability to different traffic scenes.

Description

Traffic element segmentation method in automatic driving, electronic device and storage medium
Technical Field
The present invention relates to the field of computer vision, and in particular, to a method for segmenting traffic elements in automatic driving, an electronic device, and a storage medium.
Background
In the field of automatic driving, a camera mounted on a vehicle is required to identify and analyze traffic elements such as roads and signs in a traffic scene so that a driving system can make a decision further according to the scene. Generally, various different objects exist in a traffic scene, and in order to ensure the accuracy of detection and identification, different traffic elements need to be accurately segmented.
At present, image segmentation methods based on deconvolution can segment traffic elements relatively accurately and retain details that would otherwise be lost. However, because the traffic scene images acquired in automatic driving are highly varied, image segmentation models trained on the collected samples simply with methods such as deconvolution and dilated convolution have poor generalization capability and are difficult to adapt to multi-scene traffic element segmentation and recognition.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method, an electronic device, and a storage medium for segmenting traffic elements in automatic driving, so as to solve the problem that an existing traffic scene image segmentation recognition model is poor in generalization capability.
In a first aspect of an embodiment of the present invention, a method for segmenting traffic elements in automatic driving is provided, including:
acquiring a traffic scene image acquired by a vehicle-mounted camera, and segmenting the traffic scene image through a pre-trained segmentation model;
specifically, convolution calculation is carried out on the traffic scene image to obtain a feature matrix; the feature matrix is subjected to down-sampling and up-sampling and then superposed with a feature matrix processed by an ASPP (Atrous Spatial Pyramid Pooling) module and an attention mechanism to obtain a target feature matrix, and the target feature matrix is classified through a sigmoid function to obtain a traffic element segmentation result;
and simultaneously inputting the segmentation result and the preset standard data into a countermeasure network to train the countermeasure network, and generating a traffic element segmentation image with confidence degree through the countermeasure network.
In a second aspect of the embodiments of the present invention, there is provided an electronic device for traffic element segmentation in automatic driving, including:
the segmentation module is used for acquiring a traffic scene image acquired by a vehicle-mounted camera and segmenting the traffic scene image through a pre-trained segmentation model;
specifically, convolution calculation is carried out on the traffic scene image to obtain a feature matrix; the feature matrix is subjected to down-sampling and up-sampling and then superposed with a feature matrix processed by an ASPP (Atrous Spatial Pyramid Pooling) module and an attention mechanism to obtain a target feature matrix, and the target feature matrix is classified through a sigmoid function to obtain a traffic element segmentation result;
and the generation module is used for simultaneously inputting the segmentation result and the preset standard data to the countermeasure network to train the countermeasure network, and generating the traffic element segmentation image with confidence degree through the countermeasure network.
In a third aspect of the embodiments of the present invention, there is provided an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the steps of the method according to the first aspect of the embodiments of the present invention.
In a fourth aspect of the embodiments of the present invention, a computer-readable storage medium is provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the method provided in the first aspect of the embodiments of the present invention.
In the embodiment of the invention, a traffic scene image is segmented through a pre-trained segmentation model, the segmentation result and predetermined standard data are simultaneously input into a countermeasure network to train the countermeasure network, and a traffic element segmentation image with confidence is generated through the countermeasure network. A feature matrix is obtained by performing convolution calculation on the traffic scene image; after down-sampling and up-sampling, the feature matrix is superposed with a feature matrix processed by an ASPP (Atrous Spatial Pyramid Pooling) module and an attention mechanism to obtain a target feature matrix, and the target feature matrix is classified through a sigmoid function to obtain a traffic element segmentation result. Because the segmentation model is adjusted backwards based on the countermeasure network, the generalization capability of the neural network model can be enhanced, the problem of poor generalization of existing segmentation models is solved, the dependence of the traditional convolutional neural network on the sample set is reduced, and different traffic scenes can be better adapted to. Meanwhile, on the basis of up-sampling and down-sampling, processing with the ASPP module and the attention mechanism improves the traffic element segmentation precision, obtains more information from the acquired image, and gives the trained model stronger recognition capability.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a flow chart illustrating a method for segmenting traffic elements in automatic driving according to an embodiment of the present invention;
FIG. 2 is another schematic flow chart of a method for segmenting traffic elements in automatic driving according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an electronic device for traffic element segmentation in automatic driving according to an embodiment of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the embodiments described below are only a part of the embodiments of the present invention, and not all of the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by persons skilled in the art without any inventive work shall fall within the protection scope of the present invention, and the principle and features of the present invention shall be described below with reference to the accompanying drawings.
The terms "comprises" and "comprising," when used in this specification and claims, and in the accompanying drawings and figures, are intended to cover non-exclusive inclusions, such that a process, method or system, or apparatus that comprises a list of steps or elements is not limited to the listed steps or elements.
Referring to fig. 1, fig. 1 is a schematic flow chart of a traffic element segmentation method in automatic driving according to an embodiment of the present invention, including:
s101, acquiring a traffic scene image acquired by a vehicle-mounted camera, and segmenting the traffic scene image through a pre-trained segmentation model;
the traffic scene image is an image acquired in each direction in the driving process of the vehicle, generally is influence data of the vehicle in the road driving process, and traffic factors such as roads, road marking lines, signs, signal lamps and the like can be acquired according to the traffic scene image so as to facilitate a driving system to make driving decisions.
The pre-trained segmentation model is used for detecting and segmenting traffic elements of the traffic scene image, and the pre-trained segmentation model is pre-trained through the constructed traffic element training set, so that the segmentation model based on the convolutional neural network can perform segmentation processing on the traffic scene image.
Preferably, the traffic elements are extracted from the acquired traffic scene images, and the traffic elements are used as samples to train the segmentation model, wherein the iterative update algorithm of the segmentation model is a binary cross entropy loss function BCELoss.
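The binary cross-entropy loss (BCELoss) mentioned above can be sketched in plain NumPy; the function name and toy values below are illustrative, not the patent's implementation:

```python
import numpy as np

def bce_loss(pred, target, eps=1e-7):
    """Binary cross-entropy loss averaged over all pixels.

    pred: predicted probabilities in (0, 1); target: 0/1 ground-truth mask.
    """
    pred = np.clip(pred, eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))

# A confident correct prediction gives a loss near zero;
# a confident wrong one gives a large loss that drives the iterative update.
good = bce_loss(np.array([0.99, 0.01]), np.array([1.0, 0.0]))
bad = bce_loss(np.array([0.01, 0.99]), np.array([1.0, 0.0]))
```

During training, the loss is back-propagated to update the segmentation model's weights iteratively.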
Specifically, convolution calculation is carried out on the traffic scene image in the pre-trained segmentation model to obtain a feature matrix, the feature matrix is subjected to down-sampling and up-sampling and then is superposed with the feature matrix processed based on an ASPP (automatic position protocol) module and an attention mechanism to obtain a target feature matrix, and the target feature matrix is classified through a sigmoid function to obtain a traffic element segmentation result;
and setting related training network parameters to train the deep learning network by taking the traffic scene image as a sample. The traffic scene image is cut to a preset size, the traffic scene image can be conveniently used as a sample to train the segmentation model, and traffic element characteristics are obtained so as to detect traffic elements in the segmented traffic image.
The down-sampling is used for reducing the image so that it fits the size of the display area; the up-sampling enlarges the image so that it carries more information. Both are realized through pooling, which reduces the feature dimensionality while retaining effective information. In the down-sampling process, the size of the image features decreases and the number of feature maps increases, so that features of various sizes in one image can be obtained as far as possible. The up-sampling decodes the image: it enlarges the feature map by interpolation, fuses it with the down-sampled feature map of the same size, and locates features at their positions in the original image. This facilitates classifying each pixel of the feature map and acquiring the position information of each class.
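As a minimal illustration of the pooling-based down-sampling and interpolation-based up-sampling described above (2x2 max pooling and nearest-neighbour interpolation are assumed here as the simplest instances):

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling: halves the spatial size, keeping the strongest response."""
    h, w = x.shape
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample_nearest_2x(x):
    """Nearest-neighbour up-sampling: doubles the spatial size so the feature
    map can be fused with the same-size down-sampled map in the decoder."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

x = np.arange(16, dtype=float).reshape(4, 4)
pooled = max_pool_2x2(x)                 # shape (2, 2)
restored = upsample_nearest_2x(pooled)   # back to shape (4, 4)
```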
Using dilated convolution (atrous convolution) in the ASPP module expands the receptive field of feature extraction without sacrificing feature spatial resolution, and ASPP exploits multi-scale information to further enhance the segmentation effect. The attention mechanism is used between image encoding and decoding; it reduces the limited feature storage space occupied in the encoding process, and lets the decoding process weight its inputs according to the useful information of different stages; that is, the attention mechanism suppresses what the features have in common and attends only to the differences between them.
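The receptive-field growth of dilated convolution can be seen in one dimension (a hypothetical 1-D sketch, not the patent's 2-D layers): with the same three-tap kernel, dilation spreads the taps apart, enlarging the span each output sees without adding parameters.

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation=1):
    """'Valid' 1-D convolution with holes: taps are `dilation` samples apart,
    so the receptive field grows without extra weights."""
    k = len(kernel)
    span = (k - 1) * dilation + 1  # receptive field of one output sample
    out = [
        sum(kernel[j] * x[i + j * dilation] for j in range(k))
        for i in range(len(x) - span + 1)
    ]
    return np.array(out), span

x = np.ones(16)
out1, rf1 = dilated_conv1d(x, [1, 1, 1], dilation=1)  # receptive field 3
out4, rf4 = dilated_conv1d(x, [1, 1, 1], dilation=4)  # receptive field 9
```

ASPP applies several such convolutions with different dilation rates in parallel and fuses them, which is how it captures multi-scale context.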
Carrying out convolution calculation on a sample traffic scene image to obtain a characteristic matrix, processing the characteristic matrix through an ASPP module and an attention mechanism to obtain a new characteristic matrix, carrying out down-sampling and convolution calculation on an original characteristic matrix, then continuing to carry out processing through the ASPP module and the attention mechanism, carrying out up-sampling on the characteristic matrix obtained through multiple times of convolution, and superposing the characteristic matrix obtained through processing through the ASPP module and the attention mechanism to obtain a target characteristic matrix.
In another embodiment of the present invention, the process of processing the original traffic scene image to obtain the target feature includes:
performing convolution calculation on the cut traffic scene image to obtain a characteristic matrix R1;
processing the feature matrix R1 through an ASPP module and an attention mechanism to obtain a feature matrix Q1;
after downsampling is carried out on the feature matrix R1, convolution calculation is carried out to obtain a feature matrix R2;
processing the feature matrix R2 through an ASPP module and an attention mechanism to obtain a feature matrix Q2;
after downsampling is carried out on the feature matrix R2, convolution calculation is carried out to obtain a feature matrix R3;
processing the feature matrix R3 through an ASPP module and an attention mechanism to obtain a feature matrix Q3;
the feature matrix R3 is convolved for multiple times to generate a feature matrix W1;
after upsampling, the feature matrix W1 is superposed with the feature matrix Q3 to obtain a feature matrix T1;
performing convolution calculation on the feature matrix T1 to obtain a feature matrix U1, upsampling the feature matrix U1, and overlapping the upsampled feature matrix U1 with the feature matrix Q2 to obtain a feature matrix T2;
performing convolution calculation on the feature matrix T2 to obtain a feature matrix U2, upsampling the feature matrix U2, and overlapping the upsampled feature matrix U2 with the feature matrix Q1 to obtain a feature matrix T3;
the obtained feature matrix T3 is the target feature matrix, and the feature matrix T3 is used as the target feature matrix.
And classifying and outputting the target feature matrix through a sigmoid function so as to detect and segment traffic elements in the traffic scene image.
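The R/Q/W/T/U flow in the steps above can be sketched shape-wise as follows. The `conv` and `aspp_attention` functions are hypothetical shape-preserving stand-ins (not the patent's actual layers), and `W1` is assumed to sit one resolution level below R3 so that up-sampling it matches Q3:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv(x):            # stand-in for a shape-preserving convolution block
    return x + 0.0

def aspp_attention(x):  # stand-in for ASPP + attention (shape-preserving)
    return x + 0.0

def down(x):            # 2x down-sampling
    return x[::2, ::2]

def up(x):              # 2x nearest-neighbour up-sampling
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

R1 = conv(rng.random((64, 64)));  Q1 = aspp_attention(R1)
R2 = conv(down(R1));              Q2 = aspp_attention(R2)
R3 = conv(down(R2));              Q3 = aspp_attention(R3)
W1 = conv(down(R3))               # repeated (strided) convolutions on R3
T1 = up(W1) + Q3                  # up-sample W1, superpose with Q3
U1 = conv(T1);  T2 = up(U1) + Q2  # up-sample, superpose with Q2
U2 = conv(T2);  T3 = up(U2) + Q1  # up-sample, superpose with Q1: target matrix
```

The superpositions fuse decoder features with the ASPP/attention-processed encoder features at each matching resolution, which is what restores spatial detail in T3.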
The ASPP algorithm is added on the basis of the original LinkNet algorithm, so that the edges of the extracted traffic elements are more accurate and more context information is captured. Adding the attention algorithm reduces the computational complexity of the algorithm and accelerates the inference speed of the model.
The sigmoid function is an activation function, and the target feature matrix can be identified and classified through the sigmoid function. Based on the classified output of the sigmoid function to the feature matrix, the traffic elements can be segmented. And performing secondary classification on the target feature matrix through the sigmoid function based on feature learning of historical samples, and segmenting traffic elements from the traffic scene image.
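The sigmoid-based two-class output described above amounts to mapping each pixel's feature value to a probability and thresholding it (the logit values and 0.5 threshold below are illustrative):

```python
import numpy as np

def sigmoid(z):
    """Activation function mapping real-valued features to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Toy per-pixel values from a target feature matrix; the sigmoid turns them
# into probabilities, and thresholding yields the binary traffic-element mask.
logits = np.array([-2.0, -0.1, 0.3, 4.0])
probs = sigmoid(logits)
mask = (probs > 0.5).astype(int)  # 1 = traffic element, 0 = background
```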
S102, inputting the segmentation result and the preset standard data to a countermeasure network simultaneously to train the countermeasure network, and generating a traffic element segmentation image with confidence degree through the countermeasure network.
The segmentation result is the segmented traffic element output by the trained convolutional neural network model, i.e., the output of the segmentation model, and the predetermined standard data is standard traffic element image data (also called the gold standard). The countermeasure network comprises a generator network and a discriminator network: the generator network mainly learns the mapping relation between an input random vector and the real standard data, and the discriminator network is a two-class classifier used for adjusting the generator network according to the difference between real sample data and the output of the generator network.
The segmentation performance of the segmentation model can be improved by introducing a countermeasure network into the segmentation network: the two input high-dimensional images are mapped into a low-dimensional output space in the countermeasure network, and the distance between them is calculated in the output space with a specified distance metric, so that the similarity between the images is judged and the segmentation image with confidence is generated.
In an embodiment, as shown in fig. 2, fig. 2 is a schematic flowchart of a segmentation map with confidence generated by the countermeasure network on the basis of step S102 according to this embodiment, and includes:
the traffic element image segmentation result 2010 obtained by the segmentation of the pre-trained segmentation model is input to the countermeasure network module 2030 together with the gold standard (i.e., the traffic element image of the predetermined standard) 2020, and the segmentation map with confidence is generated by the network module 2030.
Wherein, the updating algorithm of the countermeasure network is the standard generative-adversarial minimax objective:

min_G max_D V(D, G) = E_{x~p_data(x)}[log D(x)] + E_{z~p_z(z)}[log(1 - D(G(z)))]
the parameters of the convolutional neural network are adjusted back by setting the gap between the convolutional neural network output and the gold standard measured by the antagonistic network module 2030, thereby improving the segmentation result. The segmentation module and the confrontation module confront each other in training, the network model which starts training may not achieve an ideal segmentation effect, the accuracy is very low, and a great loss value is caused, while the confrontation network module can easily compare the matching degree between pictures, the matching degree between the pictures at the back is gradually improved, and the segmentation model can also gradually achieve the ideal segmentation effect.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
Fig. 3 is a schematic structural diagram of an electronic device for a traffic element segmentation method in automatic driving according to an embodiment of the present invention, where the electronic device includes:
the segmentation module 310 is configured to acquire a traffic scene image acquired by a vehicle-mounted camera, and segment the traffic scene image through a pre-trained segmentation model;
specifically, convolution calculation is carried out on the traffic scene image to obtain a feature matrix; the feature matrix is subjected to down-sampling and up-sampling and then superposed with a feature matrix processed by an ASPP (Atrous Spatial Pyramid Pooling) module and an attention mechanism to obtain a target feature matrix, and the target feature matrix is classified through a sigmoid function to obtain a traffic element segmentation result;
optionally, the segmentation module 310 includes:
the training unit is used for extracting traffic elements from the acquired traffic scene images and training the segmentation model by taking the traffic elements as samples, wherein the iterative update algorithm of the segmentation model is a binary cross entropy loss function.
Optionally, the performing convolution calculation on the traffic scene image to obtain a feature matrix, and after down-sampling and up-sampling the feature matrix, superimposing the feature matrix with a feature matrix processed based on an ASPP module and an attention mechanism to obtain a target feature matrix specifically includes:
performing convolution calculation on the cut traffic scene image to obtain a characteristic matrix R1;
processing the feature matrix R1 through an ASPP module and an attention mechanism to obtain a feature matrix Q1;
after downsampling is carried out on the feature matrix R1, convolution calculation is carried out to obtain a feature matrix R2;
processing the feature matrix R2 through an ASPP module and an attention mechanism to obtain a feature matrix Q2;
after downsampling is carried out on the feature matrix R2, convolution calculation is carried out to obtain a feature matrix R3;
processing the feature matrix R3 through an ASPP module and an attention mechanism to obtain a feature matrix Q3;
the feature matrix R3 is convolved for multiple times to generate a feature matrix W1, and the feature matrix W1 is subjected to upsampling and then is superposed with the feature matrix Q3 to obtain a feature matrix T1;
performing convolution calculation on the feature matrix T1 to obtain a feature matrix U1, upsampling the feature matrix U1, and overlapping the upsampled feature matrix U1 with the feature matrix Q2 to obtain a feature matrix T2;
and performing convolution calculation on the feature matrix T2 to obtain a feature matrix U2, performing upsampling on the feature matrix U2, and then overlapping the upsampled feature matrix U2 with the feature matrix Q1 to obtain a feature matrix T3, wherein the feature matrix T3 is used as the target feature matrix.
And a generating module 320, configured to input the segmentation result and the predetermined standard data to a countermeasure network simultaneously to train the countermeasure network, and generate a traffic element segmentation image with confidence through the countermeasure network.
Optionally, the step of inputting the segmentation result and the predetermined standard data to a countermeasure network simultaneously to train the countermeasure network, and the generating the traffic element segmentation image with the confidence degree by the countermeasure network specifically includes:
and acquiring the traffic element image obtained by segmentation with the pre-trained segmentation model, inputting the segmented traffic element image and the traffic element image of the predetermined standard into the trained countermeasure network, and calculating and judging the similarity of the two images through the trained countermeasure network.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those skilled in the art will appreciate that all or part of the steps in the method for implementing the above embodiments may be implemented by a program to instruct associated hardware, where the program may be stored in a computer-readable storage medium, and when the program is executed, the program includes steps S101 to S102, where the storage medium includes, for example: ROM/RAM, magnetic disk, optical disk, etc.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for segmenting traffic elements in automatic driving is characterized by comprising the following steps:
acquiring a traffic scene image acquired by a vehicle-mounted camera, and segmenting the traffic scene image through a pre-trained segmentation model;
specifically, convolution calculation is carried out on the traffic scene image to obtain a feature matrix; the feature matrix is subjected to down-sampling and up-sampling and then superposed with a feature matrix processed by an ASPP (Atrous Spatial Pyramid Pooling) module and an attention mechanism to obtain a target feature matrix, and the target feature matrix is classified through a sigmoid function to obtain a traffic element segmentation result;
and simultaneously inputting the segmentation result and the preset standard data into a countermeasure network to train the countermeasure network, and generating a traffic element segmentation image with confidence degree through the countermeasure network.
2. The method of claim 1, wherein the obtaining of the traffic scene image captured by the vehicle-mounted camera, and the segmenting of the traffic scene image by the pre-trained segmentation model further comprises:
extracting traffic elements from the acquired traffic scene images, and training a segmentation model by taking the traffic elements as samples, wherein the iterative update algorithm of the segmentation model is a two-class cross entropy loss function.
3. The method according to claim 1, wherein the performing convolution calculation on the traffic scene image to obtain a feature matrix, and after down-sampling and up-sampling the feature matrix, superimposing the feature matrix with a feature matrix processed based on an ASPP module and an attention mechanism to obtain a target feature matrix specifically comprises:
performing convolution calculation on the cropped traffic scene image to obtain a feature matrix R1;
processing the feature matrix R1 through an ASPP module and an attention mechanism to obtain a feature matrix Q1;
after downsampling is carried out on the feature matrix R1, convolution calculation is carried out to obtain a feature matrix R2;
processing the feature matrix R2 through an ASPP module and an attention mechanism to obtain a feature matrix Q2;
after downsampling is carried out on the feature matrix R2, convolution calculation is carried out to obtain a feature matrix R3;
processing the feature matrix R3 through an ASPP module and an attention mechanism to obtain a feature matrix Q3;
performing convolution calculation on the feature matrix R3 multiple times to generate a feature matrix W1, up-sampling the feature matrix W1, and superimposing it with the feature matrix Q3 to obtain a feature matrix T1;
performing convolution calculation on the feature matrix T1 to obtain a feature matrix U1, upsampling the feature matrix U1, and overlapping the upsampled feature matrix U1 with the feature matrix Q2 to obtain a feature matrix T2;
and performing convolution calculation on the feature matrix T2 to obtain a feature matrix U2, performing upsampling on the feature matrix U2, and then overlapping the upsampled feature matrix U2 with the feature matrix Q1 to obtain a feature matrix T3, wherein the feature matrix T3 is used as a target feature matrix.
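The down-sampling, up-sampling, and superposition steps in claim 3 form a U-Net-style encoder-decoder, and superposition is only shape-consistent if each up-sampled matrix lands on the resolution of its Q counterpart. A minimal resolution-flow sketch; the 256×256 input size and the extra halving inside the repeated convolutions on R3 are assumptions, since the claim specifies no strides:

```python
def down(shape):
    """Down-sampling halves the spatial resolution."""
    h, w = shape
    return (h // 2, w // 2)

def up(shape):
    """Up-sampling doubles the spatial resolution."""
    h, w = shape
    return (h * 2, w * 2)

def superpose(a, b):
    """Superposition (element-wise combination) needs matching resolutions."""
    assert a == b, f"cannot superpose {a} with {b}"
    return a

# Encoder: R1 -> R2 -> R3; ASPP + attention keep resolution, so each Qi matches Ri.
R1 = (256, 256)              # resolution after convolving the cropped image (assumed)
Q1 = R1
R2 = down(R1); Q2 = R2       # (128, 128)
R3 = down(R2); Q3 = R3       # (64, 64)

# Decoder: assume the repeated convolutions on R3 halve resolution once more,
# so every up-sampling step lands exactly on the matching Q matrix.
W1 = down(R3)                # (32, 32)
T1 = superpose(up(W1), Q3)   # (64, 64)
U1 = T1                      # convolution on T1 keeps resolution (assumed)
T2 = superpose(up(U1), Q2)   # (128, 128)
U2 = T2                      # convolution on T2 keeps resolution (assumed)
T3 = superpose(up(U2), Q1)   # (256, 256) -- the target feature matrix
```

With these assumptions, the target feature matrix T3 recovers the full input resolution, which is what allows the per-pixel sigmoid classification in claim 1.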
4. The method of claim 1, wherein the segmentation result and the predetermined standard data are simultaneously input to a countermeasure network to train the countermeasure network, and wherein generating a traffic element segmentation image with confidence by the countermeasure network comprises:
acquiring the traffic element image obtained through segmentation by the pre-trained segmentation model, inputting the segmented traffic element image and the preset standard traffic element image into the trained countermeasure network, and calculating and judging the similarity of the two images through the trained countermeasure network.
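Claim 4 leaves the similarity computation to the trained countermeasure network's discriminator. Purely as an illustration of comparing a segmented mask against a standard mask (not the patent's learned discriminator), a simple overlap ratio:

```python
def mask_similarity(pred, truth):
    """Intersection-over-union between two binary masks.

    A hand-written stand-in for the countermeasure network's learned
    similarity judgment; here similarity is an overlap ratio in [0, 1].
    """
    inter = union = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    return inter / union if union else 1.0

pred  = [[1, 1, 0],
         [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
score = mask_similarity(pred, truth)  # 2 overlapping / 4 covered = 0.5
```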
5. An electronic device for traffic element segmentation in autonomous driving, comprising:
the segmentation module is used for acquiring a traffic scene image acquired by a vehicle-mounted camera and segmenting the traffic scene image through a pre-trained segmentation model;
specifically, performing convolution calculation on the traffic scene image to obtain a feature matrix; after down-sampling and up-sampling the feature matrix, superimposing it with the feature matrix processed by an ASPP (Atrous Spatial Pyramid Pooling) module and an attention mechanism to obtain a target feature matrix; and classifying the target feature matrix through a sigmoid function to obtain a traffic element segmentation result;
and the generation module is used for simultaneously inputting the segmentation result and the preset standard data to the countermeasure network to train the countermeasure network, and generating the traffic element segmentation image with confidence degree through the countermeasure network.
6. The electronic device of claim 5, wherein the segmentation module comprises:
the training unit is used for extracting traffic elements from the acquired traffic scene images and training the segmentation model by taking the traffic elements as samples, wherein the iterative update algorithm of the segmentation model is a binary cross entropy loss function.
7. The electronic device according to claim 5, wherein the convolving the traffic scene image to obtain a feature matrix, and after down-sampling and up-sampling the feature matrix, superimposing the feature matrix with a feature matrix processed based on an ASPP module and an attention mechanism to obtain a target feature matrix specifically comprises:
performing convolution calculation on the cropped traffic scene image to obtain a feature matrix R1;
processing the feature matrix R1 through an ASPP module and an attention mechanism to obtain a feature matrix Q1;
after downsampling is carried out on the feature matrix R1, convolution calculation is carried out to obtain a feature matrix R2;
processing the feature matrix R2 through an ASPP module and an attention mechanism to obtain a feature matrix Q2;
after downsampling is carried out on the feature matrix R2, convolution calculation is carried out to obtain a feature matrix R3;
processing the feature matrix R3 through an ASPP module and an attention mechanism to obtain a feature matrix Q3;
performing convolution calculation on the feature matrix R3 multiple times to generate a feature matrix W1, up-sampling the feature matrix W1, and superimposing it with the feature matrix Q3 to obtain a feature matrix T1;
performing convolution calculation on the feature matrix T1 to obtain a feature matrix U1, upsampling the feature matrix U1, and overlapping the upsampled feature matrix U1 with the feature matrix Q2 to obtain a feature matrix T2;
and performing convolution calculation on the feature matrix T2 to obtain a feature matrix U2, up-sampling the feature matrix U2 and then superimposing it with the feature matrix Q1 to obtain a feature matrix T3, wherein the feature matrix T3 is used as the target feature matrix.
8. The electronic device of claim 5, wherein the segmentation result and the predetermined standard data are simultaneously input to a countermeasure network for training the countermeasure network, and the generation of the traffic element segmentation image with confidence through the countermeasure network is specifically:
acquiring the traffic element image obtained through segmentation by the pre-trained segmentation model, inputting the segmented traffic element image and the preset standard traffic element image into the trained countermeasure network, and calculating and judging the similarity of the two images through the trained countermeasure network.
9. An electronic device comprising a processor, a memory and a computer program stored in the memory and running on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method for traffic element segmentation in autonomous driving according to any of claims 1 to 4.
10. A computer-readable storage medium, in which a computer program is stored which, when executed by a processor, implements the steps of the method for segmenting traffic elements in automatic driving according to any one of claims 1 to 4.
CN201911332028.4A 2019-12-21 2019-12-21 Traffic element segmentation method in automatic driving, electronic equipment and storage medium Active CN111062347B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911332028.4A CN111062347B (en) 2019-12-21 2019-12-21 Traffic element segmentation method in automatic driving, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911332028.4A CN111062347B (en) 2019-12-21 2019-12-21 Traffic element segmentation method in automatic driving, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111062347A true CN111062347A (en) 2020-04-24
CN111062347B CN111062347B (en) 2023-04-28

Family

ID=70301465

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911332028.4A Active CN111062347B (en) 2019-12-21 2019-12-21 Traffic element segmentation method in automatic driving, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111062347B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111914482A (en) * 2020-07-27 2020-11-10 武汉中海庭数据技术有限公司 Driving condition generation method and system for automatic driving test
CN112597825A (en) * 2020-12-07 2021-04-02 深延科技(北京)有限公司 Driving scene segmentation method and device, electronic equipment and storage medium
CN113486716A (en) * 2021-06-04 2021-10-08 电子科技大学长三角研究院(衢州) Airport scene target segmentation method and system thereof

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108961253A (en) * 2018-06-19 2018-12-07 深动科技(北京)有限公司 A kind of image partition method and device
CN109191472A (en) * 2018-08-28 2019-01-11 杭州电子科技大学 Based on the thymocyte image partition method for improving U-Net network
CN109643377A (en) * 2018-11-19 2019-04-16 深圳市汇顶科技股份有限公司 Fingerprint identification device and electronic equipment
CN109685813A (en) * 2018-12-27 2019-04-26 江西理工大学 A kind of U-shaped Segmentation Method of Retinal Blood Vessels of adaptive scale information
CN109903292A (en) * 2019-01-24 2019-06-18 西安交通大学 A kind of three-dimensional image segmentation method and system based on full convolutional neural networks
CN110111335A (en) * 2019-05-08 2019-08-09 南昌航空大学 A kind of the urban transportation Scene Semantics dividing method and system of adaptive confrontation study
CN110188765A (en) * 2019-06-05 2019-08-30 京东方科技集团股份有限公司 Image, semantic parted pattern generation method, device, equipment and storage medium
CN110490884A (en) * 2019-08-23 2019-11-22 北京工业大学 A kind of lightweight network semantic segmentation method based on confrontation
WO2019238560A1 (en) * 2018-06-12 2019-12-19 Tomtom Global Content B.V. Generative adversarial networks for image segmentation
CN110598714A (en) * 2019-08-19 2019-12-20 中国科学院深圳先进技术研究院 Cartilage image segmentation method and device, readable storage medium and terminal equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Liao Xuan et al., "Multi-image object semantic segmentation with fused segmentation priors", Journal of Image and Graphics (中国图象图形学报) *


Also Published As

Publication number Publication date
CN111062347B (en) 2023-04-28

Similar Documents

Publication Publication Date Title
US11475681B2 (en) Image processing method, apparatus, electronic device and computer readable storage medium
CN111461039B (en) Landmark identification method based on multi-scale feature fusion
CN112560831B (en) Pedestrian attribute identification method based on multi-scale space correction
CN112990065B (en) Vehicle classification detection method based on optimized YOLOv5 model
CN115631344B (en) Target detection method based on feature self-adaptive aggregation
CN108305260B (en) Method, device and equipment for detecting angular points in image
CN111062347B (en) Traffic element segmentation method in automatic driving, electronic equipment and storage medium
CN112733885A (en) Point cloud identification model determining method and point cloud identification method and device
CN110781980B (en) Training method of target detection model, target detection method and device
CN111008576B (en) Pedestrian detection and model training method, device and readable storage medium
CN117058646B (en) Complex road target detection method based on multi-mode fusion aerial view
CN110852327A (en) Image processing method, image processing device, electronic equipment and storage medium
CN110991414B (en) Traffic element high-precision segmentation method, electronic equipment and storage medium
CN114037640A (en) Image generation method and device
CN115147648A (en) Tea shoot identification method based on improved YOLOv5 target detection
CN110969104A (en) Method, system and storage medium for detecting travelable area based on binarization network
CN114332814A (en) Parking frame identification method and device, electronic equipment and storage medium
Al Mamun et al. Efficient lane marking detection using deep learning technique with differential and cross-entropy loss.
CN114220087A (en) License plate detection method, license plate detector and related equipment
CN113989814A (en) Image generation method and device, computer equipment and storage medium
CN117612136A (en) Automatic driving target detection method based on increment small sample learning
CN111160282B (en) Traffic light detection method based on binary Yolov3 network
CN116778470A (en) Object recognition and object recognition model training method, device, equipment and medium
CN114898306A (en) Method and device for detecting target orientation and electronic equipment
CN115424250A (en) License plate recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant