CN117037095A

CN117037095A - Road topology prediction method of high-precision map based on Transformer

Info

Publication number: CN117037095A
Application number: CN202310970236.7A
Authority: CN
Inventors: 姚琼杰; 尹玉成; 石涤文; 丁豪; 张志军
Original assignee: Heading Data Intelligence Co Ltd
Current assignee: Heading Data Intelligence Co Ltd
Priority date: 2023-08-01
Filing date: 2023-08-01
Publication date: 2023-11-10

Abstract

The invention provides a Transformer-based road topology prediction method for a high-precision map, comprising the following steps: S1, extracting semantic features from the perception data of an intelligent driving vehicle; S2, fusing the semantic features extracted from multiple sources and generating a bird's-eye view (BEV); S3, identifying lane centerlines and traffic lights in the bird's-eye view with a target detection model; S4, predicting the topological relation among the lane centerlines and the association relation between the lane centerlines and the traffic lights. The invention solves the problem that an intelligent driving vehicle perceives its surrounding environment with insufficient accuracy.

Description

Road topology prediction method of high-precision map based on Transformer

Technical Field

The invention relates to the technical field of intelligent driving, in particular to a road topology prediction method, a road topology prediction system, electronic equipment and a storage medium of a high-precision map based on a Transformer.

Background

High-precision semantic map construction is an important component of autonomous driving. An autonomous driving system needs a good understanding of its surroundings, including dynamic objects and the static high-precision semantic map. The current mainstream method fuses perception data into a dense point cloud map using SLAM (simultaneous localization and mapping) or similar techniques, after which a vector map must be annotated manually; its map elements include lane centerlines, key points, lane lines, traffic lights and the like. This method is costly, and it is difficult to keep the resulting map up to date. Generating local maps online from the real-time perception of the ego vehicle has become a trend for solving this problem. Some recent studies model map construction as an image semantic segmentation problem: the map is represented as a raster, and the type of each grid cell is predicted with a segmentation approach such as a fully convolutional network. However, a rasterized map lacks instance information, spatial consistency is difficult to guarantee, and nearby pixels may carry contradictory semantic categories or geometric shapes, which makes it difficult to improve the precision of the resulting high-precision map.

Therefore, it is necessary to study a road topology prediction method of a high-precision map with higher accuracy.

Disclosure of Invention

In view of the technical problems in the prior art, the invention provides a road topology prediction method of a high-precision map based on a Transformer (a neural network model based on an attention mechanism), so as to solve the problem that an intelligent driving vehicle perceives its surrounding environment with insufficient accuracy.

According to a first aspect of the present invention, there is provided a road topology prediction method of a high-precision map based on a Transformer, comprising:

S1, extracting semantic features of intelligent driving vehicle perception data;

S2, fusing the extracted semantic features of multiple sources, and generating a bird's-eye view (BEV);

S3, identifying lane centerlines and traffic lights in the bird's-eye view (BEV) through a target detection model;

S4, predicting the topological relation among the lane centerlines and the association relation between the lane centerlines and the traffic lights.

On the basis of the technical scheme, the invention can also make the following improvements.

Optionally, in step S1, the extracting semantic features of the intelligent driving vehicle perception data includes:

obtaining perception data from multiple directions around the intelligent driving vehicle;

if the perception data is image data, extracting features with a ResNet network or a ViT network;

if the perception data is point cloud data, extracting features with a PointCert network.

Optionally, before features are extracted from the image data, the image data is preprocessed and the whole image is normalized.
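The normalization step can be illustrated with a minimal sketch. The source does not specify which statistics are used, so the per-channel zero-mean, unit-variance scheme below, the function name `normalize_image`, and the nested-list image layout (height x width x channels) are all assumptions made for illustration.

```python
from statistics import fmean, pstdev


def normalize_image(image, eps=1e-6):
    """Normalize an H x W x C image (nested lists) per channel.

    Assumption: statistics are computed over the whole image, one
    mean and one standard deviation per channel.
    """
    channels = len(image[0][0])
    # Gather every pixel value of each channel into a flat list.
    flat = [[px[c] for row in image for px in row] for c in range(channels)]
    means = [fmean(values) for values in flat]
    stds = [pstdev(values) for values in flat]
    # eps guards against division by zero on constant channels.
    return [
        [[(px[c] - means[c]) / (stds[c] + eps) for c in range(channels)]
         for px in row]
        for row in image
    ]
```

In a multi-camera setup the same function would simply be applied to each camera image in turn.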

Optionally, in step S2, the features extracted from multiple sources are fused into feature vectors through an FPN network, and a semantic feature vector of the bird's-eye view is generated by a Transformer-based network.
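The FPN fusion mentioned above can be sketched as a top-down pathway that upsamples a coarse feature map and adds it to the next finer one. This is a simplified illustration, not the network of the invention: it assumes single-channel maps stored as nested lists, each level half the resolution of the previous one, and it omits the lateral and smoothing convolutions of a full FPN. The names `upsample_nearest` and `fuse_top_down` are hypothetical.

```python
def upsample_nearest(fmap, factor=2):
    """Nearest-neighbour upsampling of a 2D map (list of rows)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def fuse_top_down(pyramid):
    """Fuse feature maps ordered from fine to coarse.

    Each coarser result is upsampled and added element-wise to the
    next finer map, so fine levels receive coarse context.
    """
    fused = [pyramid[-1]]  # the coarsest level is kept unchanged
    for fmap in reversed(pyramid[:-1]):
        up = upsample_nearest(fused[0])
        merged = [[a + b for a, b in zip(r1, r2)]
                  for r1, r2 in zip(fmap, up)]
        fused.insert(0, merged)
    return fused
```

For a 4x4 map of ones and a 2x2 coarse map, each 2x2 block of the fused fine level equals one plus the corresponding coarse value.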

Optionally, in step S3, the lane centerline and traffic lights are identified from the BEV map via the DETR network.
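As a rough illustration of how the output of a DETR-style detector is typically turned into detections: each object query yields class logits whose last entry stands for "no object", and a query is kept only when its best real class is confident enough. The source does not describe this post-processing, so the function `decode_queries`, the threshold, and the data layout are assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_queries(class_logits, geometries, class_names, threshold=0.5):
    """Keep queries whose best real class scores at least `threshold`.

    class_logits: one list per query, len(class_names) + 1 entries;
                  the final entry is the "no object" class.
    geometries:   one predicted shape per query (e.g. centerline points).
    """
    detections = []
    for logits, geometry in zip(class_logits, geometries):
        probs = softmax(logits)
        best = max(range(len(class_names)), key=lambda k: probs[k])
        if probs[best] >= threshold:
            detections.append({
                "label": class_names[best],
                "score": probs[best],
                "geometry": geometry,
            })
    return detections
```

A centerline head and a traffic light head could each call `decode_queries` with their own `class_names`.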

Optionally, in step S4, the topological relation among the plurality of lane centerlines and the association relation between the lane centerlines and the traffic lights are predicted by an MLP.
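One common way to realize such an MLP-based relation predictor is to concatenate the embeddings of two elements, pass the pair through the MLP, and read the sigmoid output as the probability that the two are connected. The sketch below follows that pattern under stated assumptions: the class `PairMLP`, the function `predict_topology`, the random untrained weights, the 0.5 threshold, and the exclusion of self-connections are all hypothetical choices, not details taken from the source.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


class PairMLP:
    """Scores an ordered pair of embeddings with a small MLP."""

    def __init__(self, embed_dim, hidden_dim, num_layers=3, seed=0):
        rng = random.Random(seed)
        dims = [2 * embed_dim] + [hidden_dim] * (num_layers - 1) + [1]
        # Untrained random weights, for illustration only.
        self.layers = [
            ([[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
              for _ in range(n_out)],
             [0.0] * n_out)
            for n_in, n_out in zip(dims[:-1], dims[1:])
        ]

    def score(self, source, target):
        x = list(source) + list(target)
        for index, (weights, bias) in enumerate(self.layers):
            x = linear(x, weights, bias)
            if index < len(self.layers) - 1:
                x = [max(0.0, v) for v in x]  # ReLU on hidden layers
        return 1.0 / (1.0 + math.exp(-x[0]))  # sigmoid probability


def predict_topology(mlp, embeddings, threshold=0.5):
    """Binary adjacency matrix over all ordered pairs (no self-loops)."""
    n = len(embeddings)
    return [
        [int(i != j and mlp.score(embeddings[i], embeddings[j]) > threshold)
         for j in range(n)]
        for i in range(n)
    ]
```

The same scorer could be applied to (lane centerline, traffic light) pairs, which yields a rectangular association matrix instead of a square one.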

According to a second aspect of the present invention, there is provided a road topology prediction system of a high-precision map based on a Transformer, comprising:

the feature extraction module is used for extracting semantic features of perception data of the intelligent driving vehicle;

the feature fusion module fuses the extracted semantic features of multiple sources and generates a bird's eye view;

the target detection module is used for identifying lane centerlines and traffic lights in the bird's-eye view through a target detection model;

the prediction module predicts the topological relation between the center lines of all the lanes and the association relation between the center lines of the lanes and the traffic lights.

According to a third aspect of the present invention, there is provided an electronic device including a memory and a processor, the processor being configured to implement the steps of the above-described Transformer-based road topology prediction method of a high-precision map when executing a computer management class program stored in the memory.

According to a fourth aspect of the present invention, there is provided a computer-readable storage medium having stored thereon a computer management class program which, when executed by a processor, implements the steps of the above-described Transformer-based road topology prediction method of a high-precision map.

The Transformer-based road topology prediction method, system, electronic equipment and storage medium for a high-precision map provided by the invention use a Transformer network structure designed to generate the BEV map. Through this network structure, the topological relation between lane centerlines and traffic light elements is generated end to end, which solves the problem that an intelligent driving vehicle perceives its surrounding environment with insufficient accuracy and improves the accuracy of road topology prediction for the high-precision map.

Drawings

Fig. 1 is a flowchart of a road topology prediction method of a high-precision map based on a Transformer;

Fig. 2 is a schematic diagram of a network structure for end-to-end prediction of the lane centerline and traffic light topology of a high-precision map in accordance with the method of the present invention;

Fig. 3 is a schematic diagram of a ResNet network architecture used in one embodiment;

Fig. 4 is a schematic diagram of an FPN network architecture used in one embodiment;

Fig. 5 is a schematic diagram of a Transformer network architecture;

Fig. 6 is a block diagram of a road topology prediction system of a high-precision map based on a Transformer;

Fig. 7 is a schematic hardware structure of a possible electronic device according to the present invention;

Fig. 8 is a schematic hardware structure of a possible computer-readable storage medium according to the present invention.

Detailed Description

The following describes in further detail the embodiments of the present invention with reference to the drawings and examples. The following examples are illustrative of the invention and are not intended to limit the scope of the invention.

Fig. 1 is a flowchart of the Transformer-based road topology prediction method of a high-precision map, and Fig. 2 is a schematic diagram of the network structure designed to implement the method, which predicts the topology of the lane centerlines and traffic lights of the high-precision map end to end. As shown in Fig. 2, the network comprises, in sequence, a backbone network, a neck network, and a head network; the head network consists mainly of classification networks (lc_head and te_head) and prediction networks (lc_head and lcte_head). The backbone network mainly extracts semantic features from the intelligent driving perception data; the neck network mainly fuses the multiple features; the head network mainly performs classification or prediction.

Referring to Figs. 1 and 2, the Transformer-based road topology prediction method of a high-precision map provided in the present embodiment includes steps S1 to S4:

S1, extracting semantic features of the intelligent driving vehicle perception data through a backbone network; for image data, features are extracted with a network such as a ResNet network or a ViT network, and for point cloud data, features are extracted with a PointCert network.

S2, fusing the semantic features extracted from a plurality of sources through a neck network, and generating a bird's-eye view. The perception data of the intelligent driving vehicle come from different sensors and have different data types, and the semantic features extracted by the different backbone networks describe the scene around the current vehicle from different angles; the neck network fuses these different features and thereby improves the expressive power of the perception semantic vectors of the intelligent driving environment. In this scheme, the semantic feature vectors of the bird's-eye view are generated by a Transformer-based network.

S3, identifying lane centerlines in the bird's-eye view through a target detection model (here, the lc_head network in the head network), and identifying traffic light elements in the bird's-eye view by target detection through the te_head network in the head network;

S4, predicting the topological relation among the lane centerlines through the lc_head network in the head network, and predicting the association relation between the lane centerlines and the traffic lights through the lcte_head network in the head network.
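The data flow of steps S1 to S4 can be sketched as a composition of backbone, neck and head stages. This is only a structural illustration: the class `TopologyNet` and its field names are hypothetical, and each stage is an arbitrary callable standing in for the corresponding trained network.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass
class TopologyNet:
    """Backbone -> neck -> heads, mirroring steps S1 to S4."""
    backbone: Callable[[Any], Any]                    # S1: per-sensor features
    neck: Callable[[Sequence[Any]], Any]              # S2: fuse into BEV features
    detect_centerlines: Callable[[Any], list]         # S3: lane centerlines
    detect_traffic_lights: Callable[[Any], list]      # S3: traffic lights
    relate_centerlines: Callable[[list], Any]         # S4: centerline topology
    relate_centerlines_to_lights: Callable[[list, list], Any]  # S4: association

    def __call__(self, sensor_inputs: Sequence[Any]) -> dict:
        features = [self.backbone(x) for x in sensor_inputs]
        bev = self.neck(features)
        centerlines = self.detect_centerlines(bev)
        lights = self.detect_traffic_lights(bev)
        return {
            "bev": bev,
            "centerlines": centerlines,
            "traffic_lights": lights,
            "centerline_topology": self.relate_centerlines(centerlines),
            "centerline_light_relation":
                self.relate_centerlines_to_lights(centerlines, lights),
        }
```

Keeping the stages as separate callables reflects the description's split into backbone, neck and head networks; in a trained model they would share parameters and be optimized jointly.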

It can be appreciated that, in view of the defects described in the background, the embodiment of the invention provides a road topology prediction method of a high-precision map based on a Transformer. The method generates the topological relation between lane centerlines and traffic light elements end to end, solves the problem that an intelligent driving vehicle perceives its surrounding environment with insufficient accuracy, and improves the accuracy of road topology prediction for the high-precision map.

Once the Transformer-based network model required by the method of the invention has been built, the invention is illustrated below through the training process of the model in a specific implementation scenario.

1. Image data from intelligent driving perception is taken as input. The image data comes from 7 cameras on the vehicle: front center, left front, right front, left middle side, right middle side, left rear and right rear. The data is preprocessed and the images are normalized.

2. Image feature extraction. The preprocessed image is used as input to the ResNet network to extract image features. The ResNet network consists mainly of 4 ResNetLayer layers, each of which is composed of Conv2d, BatchNorm2d and the activation function ReLU, as shown in Fig. 3.

3. Feature fusion layer. The features from the plurality of sources are input into the FPN network and fused into feature vectors carrying richer information. As shown in Fig. 4, the FPN network consists mainly of 3 Conv2d layers.

4. Generation of the BEV map. In this embodiment, the feature vectors of the BEV are generated by a Transformer network. The Transformer network structure is shown in Fig. 5 and is composed of the activation function ReLU, SinePositionalEncoding, the regularization layer Dropout, the linear layer Linear, LayerNorm, and the like.

5. The lane centerlines are generated by the DETR network (lc_head). The DETR network adds an FFN layer to the structure of Fig. 5; the FFN layer is composed of network layers such as ReLU, Linear and Dropout.

6. Traffic lights are detected through the DETR network (te_head).

7. Lane centerline topology prediction: MLP layers are added to the structure of Fig. 5; combined with the lane centerline structure of step 5, the topological relation between pairs of lane centerlines is predicted through a 3-layer MLP network (lc_head).

8. Lane centerline and traffic light relationship prediction: combining the structures of step 5 and step 6, the relationship between the lane centerlines and the traffic lights is predicted by a 5-layer MLP (lcte_head).
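The BEV generation step above names SinePositionalEncoding among the Transformer components. The source does not give its formula; the sketch below shows the standard one-dimensional sinusoidal encoding, in which even dimensions use a sine and odd dimensions a cosine of the position scaled by a geometric series of wavelengths. The function name, the `base` of 10000 and the 1D setting are assumptions; encodings for BEV grids are usually two-dimensional variants of the same idea.

```python
import math


def sine_positional_encoding(num_positions, dim, base=10000.0):
    """Return a num_positions x dim table of sinusoidal encodings.

    PE[pos][2k]   = sin(pos / base ** (2k / dim))
    PE[pos][2k+1] = cos(pos / base ** (2k / dim))
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(0, dim, 2):  # i plays the role of 2k
            angle = pos / (base ** (i / dim))
            row.extend([math.sin(angle), math.cos(angle)])
        table.append(row)
    return table
```

Because the encoding depends only on the position index, it can be precomputed once and added to the feature vectors before they enter the attention layers.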

Fig. 6 is a block diagram of a road topology prediction system of a high-precision map based on a Transformer. As shown in Fig. 6, the system includes a feature extraction module, a feature fusion module, a target detection module and a prediction module.

It can be understood that the Transformer-based road topology prediction system of a high-precision map provided by the invention corresponds to the Transformer-based road topology prediction method of the foregoing embodiments; for the relevant technical features of the system, reference may be made to the corresponding technical features of the method, which are not repeated here.

Referring to Fig. 7, Fig. 7 is a schematic diagram of an embodiment of an electronic device according to an embodiment of the invention. As shown in Fig. 7, an embodiment of the present invention provides an electronic device 700, including a memory 710, a processor 720, and a computer program 711 stored in the memory 710 and executable on the processor 720, wherein the processor 720 executes the computer program 711 to implement the following steps:

S1, extracting semantic features of intelligent driving vehicle perception data;

S2, fusing the extracted semantic features of multiple sources, and generating a bird's-eye view;

S3, identifying lane centerlines and traffic lights in the bird's-eye view through a target detection model;

S4, predicting the topological relation among the lane centerlines and the association relation between the lane centerlines and the traffic lights.

Referring to Fig. 8, Fig. 8 is a schematic diagram of an embodiment of a computer-readable storage medium according to the present invention. As shown in Fig. 8, the present embodiment provides a computer-readable storage medium 800 having stored thereon a computer program 811 which, when executed by a processor, implements steps S1 to S4 of the method described above.

In the Transformer-based road topology prediction method, system and storage medium for a high-precision map provided by the embodiments of the invention, the backbone network uses ResNet to extract image features; the neck network uses an FPN network to fuse the image features and extracts BEV features, generating a bird's-eye view that fuses feature vectors from a plurality of sources; the lc_head and te_head networks in the head network each use a DETR network for target detection to generate the lane centerlines and traffic light elements; the lc_head network in the head network uses an MLP to predict the topological relation among the lane centerlines; and the lcte_head network in the head network uses an MLP to predict the association relation between the lane centerlines and the traffic lights. The invention designs a Transformer network structure for generating the BEV map and, through this network structure, generates the topological relation between lane centerlines and traffic light elements end to end, thereby solving the problem that an intelligent driving vehicle perceives its surrounding environment with insufficient accuracy and improving the accuracy of road topology prediction for the high-precision map.

In the foregoing embodiments, each embodiment is described with its own emphasis; for portions of one embodiment that are not described in detail, reference may be made to the related descriptions of the other embodiments.

It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention.

It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims

1. The road topology prediction method of the high-precision map based on the Transformer is characterized by comprising the following steps of:

2. The method for predicting road topology based on a high-precision map of claim 1, wherein in step S1, the extracting semantic features of intelligent driving vehicle perception data comprises:

3. The method for predicting road topology based on a high-precision map of claim 1, further comprising preprocessing the image data and normalizing the entire image before feature extraction is performed on the image data.

4. The method for predicting road topology based on a high-precision map of claim 1, wherein in step S2, the extracted features of multiple sources are fused into feature vectors through an FPN network; a semantic feature vector of the bird's-eye view is generated by a Transformer-based network.

5. The method according to claim 1, wherein in step S3, the lane center line and the traffic light are identified from the BEV map through the DETR network.

6. The method according to claim 1, wherein in step S4, the topological relation between the plurality of lane centerlines and the topological relation between the lane centerlines and the traffic lights are predicted by MLP.

7. A Transformer-based road topology prediction system for high-precision maps, comprising:

8. An electronic device comprising a memory, a processor for implementing the steps of the Transformer-based road topology prediction method of the high-precision map of any one of claims 1-6 when executing a computer management class program stored in the memory.

9. A computer-readable storage medium, having stored thereon a computer-management-class program which, when executed by a processor, implements the steps of the Transformer-based road topology prediction method of a high-precision map according to any one of claims 1-6.