WO2022222095A1

WO2022222095A1 - Trajectory prediction method and apparatus, and computer device and storage medium

Info

Publication number: WO2022222095A1
Application number: PCT/CN2021/088937
Authority: WO
Inventors: 许家妙
Original assignee: 深圳元戎启行科技有限公司
Priority date: 2021-04-22
Filing date: 2021-04-22
Publication date: 2022-10-27
Also published as: CN115917559A

Abstract

A trajectory prediction method, comprising: acquiring a motion trajectory of an obstacle to be subjected to prediction (202); according to the motion trajectory, determining target map information corresponding to said obstacle (204); converting the motion trajectory into a corresponding first trajectory matrix, and converting the target map information into a corresponding first map matrix (206); and inputting the first trajectory matrix and the first map matrix into a trained trajectory prediction model, performing embedding processing on the first trajectory matrix and the first map matrix, so as to obtain a target matrix, performing feature extraction on the target matrix on the basis of a multi-head attention mechanism, so as to obtain an output feature, and performing regression processing on the output feature, so as to obtain a predicted trajectory of said obstacle (208).

Description

Trajectory prediction method, device, computer equipment and storage medium

Technical field

The present application relates to a trajectory prediction method, apparatus, computer equipment, storage medium and vehicle.

Background

In autonomous driving, it is necessary to predict the trajectories of obstacles in the surrounding environment over a certain period of time. By predicting the future trajectory of an obstacle, the autonomous vehicle can identify the obstacle's intention earlier and plan its driving route and speed accordingly, so as to avoid collisions and reduce safety accidents. The traditional approach uses an existing trajectory prediction model to extract features from the historical trajectory information of the obstacles and from the map information, where the network processes either raster images or vectorized information.

Although map information is particularly important for predicting obstacle trajectories, existing trajectory prediction models only roughly consider the correlation between the map information and the obstacle trajectory information and cannot fully extract the deeper correlation between the map information and the obstacle information, resulting in low trajectory prediction accuracy.

SUMMARY OF THE INVENTION

According to various embodiments disclosed in the present application, a trajectory prediction method, apparatus, computer device, storage medium, and vehicle are provided.

A trajectory prediction method, comprising:

obtaining a motion trajectory of an obstacle to be predicted;

determining, according to the motion trajectory, target map information corresponding to the obstacle to be predicted;

converting the motion trajectory into a corresponding first trajectory matrix, and converting the target map information into a corresponding first map matrix; and

inputting the first trajectory matrix and the first map matrix into a trained trajectory prediction model, performing embedding processing on the first trajectory matrix and the first map matrix to obtain a target matrix, performing feature extraction on the target matrix based on a multi-head attention mechanism to obtain output features, and performing regression processing on the output features to obtain a predicted trajectory of the obstacle to be predicted.

A trajectory prediction device, comprising:

a trajectory acquisition module, configured to acquire the motion trajectory of the obstacle to be predicted;

a map acquisition module, configured to determine target map information corresponding to the obstacle to be predicted according to the motion trajectory;

a matrix conversion module, configured to convert the motion trajectory into a corresponding first trajectory matrix, and to convert the target map information into a corresponding first map matrix; and

a trajectory prediction module, configured to input the first trajectory matrix and the first map matrix into the trained trajectory prediction model, perform embedding processing on the first trajectory matrix and the first map matrix to obtain a target matrix, perform feature extraction on the target matrix based on the multi-head attention mechanism to obtain output features, and perform regression processing on the output features to obtain the predicted trajectory of the obstacle to be predicted.

A computer device comprising a memory and one or more processors, the memory having computer-readable instructions stored therein, wherein the computer-readable instructions, when executed by the one or more processors, cause the one or more processors to execute the following steps:

Obtain the motion trajectory of the obstacle to be predicted;

One or more computer storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:

Obtain the motion trajectory of the obstacle to be predicted;

A vehicle, configured to execute the steps of the above trajectory prediction method.

The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below. Other features and advantages of the present application will be apparent from the description, drawings, and claims.

Description of drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the following briefly introduces the drawings required in the embodiments. Obviously, the drawings in the following description are only some embodiments of the present application. For those of ordinary skill in the art, other drawings can also be obtained from these drawings without any creative effort.

FIG. 1 is an application environment diagram of the trajectory prediction method in one or more embodiments.

FIG. 2 is a schematic flowchart of a trajectory prediction method in one or more embodiments.

FIG. 3 is a schematic structural diagram of a trained trajectory prediction model in one or more embodiments.

FIG. 4 is a schematic flowchart of a step of embedding a first trajectory matrix and a first map matrix to obtain a target matrix in one or more embodiments.

FIG. 5 is a block diagram of a trajectory prediction apparatus in one or more embodiments.

FIG. 6 is a block diagram of a computer device in one or more embodiments.

Detailed description

In order to make the technical solutions and advantages of the present application clearer, the present application will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present application, but not to limit the present application.

It should be noted that the terms "first", "second" and the like in the description and claims of the present application are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence.

The trajectory prediction method provided in this application can be applied to the application environment shown in FIG. 1. The vehicle-mounted sensor 102 communicates with the vehicle-mounted computer device 104 over a network. There may be one or more vehicle-mounted sensors. The vehicle-mounted computer device may be referred to simply as the computer device. The vehicle-mounted sensor 102 sends the collected drive test data to the computer device 104. The computer device 104 performs detection, tracking and sampling on the drive test data to obtain the motion trajectory of the obstacle to be predicted, and determines the target map information corresponding to the obstacle to be predicted according to the motion trajectory. It then converts the motion trajectory into the corresponding first trajectory matrix and the target map information into the corresponding first map matrix, and inputs the first trajectory matrix and the first map matrix into the trained trajectory prediction model. In the model, the first trajectory matrix and the first map matrix are embedded to obtain the target matrix, feature extraction is performed on the target matrix based on the multi-head attention mechanism to obtain the output features, and regression processing is performed on the output features to obtain the predicted trajectory of the obstacle to be predicted. The vehicle-mounted sensor 102 can be, but is not limited to, a lidar or a laser scanner.

In one of the embodiments, as shown in FIG. 2, a trajectory prediction method is provided, and the method is applied to the computer device in FIG. 1 as an example for description, including the following steps:

Step 202, acquiring the motion trajectory of the obstacle to be predicted.

The obstacles to be predicted refer to the dynamic obstacles around the vehicle during the driving process of the vehicle. The obstacles to be predicted may include pedestrians, vehicles, and the like.

When the vehicle is driving, the sensors installed on the vehicle can send the collected drive test data to the computer device. The computer device can store the drive test data in units of frames, and record the collection time and other information of each frame of drive test data. The vehicle-mounted sensor can be a lidar, a laser scanner, a camera, and the like. The drive test data can be point cloud data or surrounding environment images. When the sensor is a lidar or a laser scanner, the collected point cloud data is sent to the computer device. When the sensor is a camera, the captured image of the surrounding environment is sent to the computer device. Point cloud data refers to data in which the sensor records the scanned surrounding environment information in the form of a point cloud. The surrounding environment information includes the obstacles to be predicted in the surrounding environment of the vehicle, and there can be multiple obstacles to be predicted. The point cloud data may specifically include the three-dimensional coordinates of each point, laser reflection intensity, color information, and the like. The three-dimensional coordinates represent the position of the surface of the obstacle to be predicted in the surrounding environment. The surrounding environment image may be a panoramic image around the vehicle collected by a plurality of cameras.

Each time the computer device acquires drive test data within a preset time period, it performs target detection and target tracking on the drive test data to obtain a motion trajectory within the preset time period. For example, the preset time period may be 2s. Target detection refers to detecting obstacles in the drive test data and predicting the location and category of each obstacle. Target tracking refers to predicting the position of the obstacle in subsequent frames and determining the speed information of the obstacle to be predicted when the position of the obstacle in the initial frame is known. The trajectory information includes the position information, speed, orientation, etc. of the obstacle to be predicted in each frame of drive test data. The position information refers to the position coordinates of the obstacle to be predicted in world coordinates. Specifically, the computer device inputs the collected drive test data into the corresponding target detection model, locates the area where each obstacle to be predicted is located, and frames that area with a bounding box to obtain the bounding box corresponding to each obstacle to be predicted. The bounding box includes the center point coordinates, size, orientation, etc. of each obstacle to be predicted. The coordinates of the center point of the bounding box represent the position information of the obstacle to be predicted. By identifying the bounding box corresponding to each obstacle to be predicted, different obstacles to be predicted can be accurately distinguished. The computer device can input the bounding box of the current frame corresponding to the obstacle to be predicted, together with the consecutive bounding boxes from the frames before the current frame, into the pre-trained target tracking model to obtain the speed and acceleration of the obstacle to be predicted in the current frame.

The computer device obtains the trajectory information of the obstacle to be predicted in each frame by performing target detection and target tracking on each frame of drive test data. A high-precision map is stored in the computer device, and the high-precision map contains rich and detailed road traffic information elements. A high-precision map not only has high-precision coordinates, but also includes accurate road shapes as well as data on the slope, curvature, heading, elevation, roll, etc. of each lane. A high-precision map describes not only the road but also how many lanes the road has, and faithfully reflects the actual layout of the road. The computer device can sample the trajectory information of the obstacle to be predicted based on the high-precision map and obtain the trajectories that satisfy the preset sampling conditions, thereby obtaining the motion trajectory of the obstacle to be predicted. The preset sampling conditions refer to trajectories in junction areas, trajectories with changes in curvature and speed, and trajectories with lane changes and cut-ins. The motion trajectory includes a plurality of trajectory points, and each trajectory point includes coordinate values in the x-direction and the y-direction.

Step 204: Determine target map information corresponding to the obstacle to be predicted according to the motion trajectory.

The computer device searches for the lane centerlines corresponding to the motion trajectory; there may be multiple lane centerlines. Each lane centerline is sampled and represented by the points obtained by sampling, which can be called location points. The target map information corresponding to the obstacle to be predicted is thus obtained from the lane centerlines. The target map information may include the lane centerlines corresponding to each motion trajectory, and each motion trajectory may correspond to multiple lane centerlines. The lane centerlines corresponding to one motion trajectory can be referred to as one piece of track map information, and a single lane centerline can be referred to as one piece of track lane information. Therefore, one piece of track map information can contain multiple pieces of track lane information. Each lane centerline includes a plurality of location points, and each location point includes coordinate values in the x-direction and the y-direction.
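As an illustration of sampling a lane centerline into location points, the following is a minimal sketch that resamples a polyline at equal arc-length spacing. The function name `resample_polyline`, the polyline representation as a list of (x, y) tuples, and the choice of equal arc-length spacing are assumptions made for this example; the application does not prescribe a specific sampling routine here.

```python
import math


def resample_polyline(points, n):
    """Return n points spaced evenly by arc length along a polyline.

    `points` is a list of (x, y) tuples describing a lane centerline.
    This helper is hypothetical and only illustrates the sampling idea.
    """
    if n < 2 or len(points) < 2:
        raise ValueError("need at least two input points and n >= 2")
    # Cumulative arc length at every vertex of the polyline.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    out = []
    seg = 0
    for i in range(n):
        target = total * i / (n - 1)
        # Advance to the segment that contains the target arc length.
        while seg < len(points) - 2 and cum[seg + 1] < target:
            seg += 1
        seg_len = cum[seg + 1] - cum[seg]
        t = 0.0 if seg_len == 0 else (target - cum[seg]) / seg_len
        x0, y0 = points[seg]
        x1, y1 = points[seg + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


# A centerline with a 90-degree corner, sampled into six location points.
centerline = [(0.0, 0.0), (4.0, 0.0), (4.0, 6.0)]
location_points = resample_polyline(centerline, 6)
```

Each returned point carries an x and a y coordinate, which matches the description of location points above.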

Step 206, converting the motion trajectory into a corresponding first trajectory matrix, and converting the target map information into a corresponding first map matrix.

Since each trajectory point includes coordinate values in the x-direction and the y-direction, the x-direction and the y-direction can be processed separately in order to improve the trajectory prediction speed. Specifically, the motion trajectory can be converted into a first trajectory matrix, and the target map information can be converted into a corresponding first map matrix. The first trajectory matrix has the format N_1×T1×2, where N_1 represents the number of trajectories in the motion trajectory, T1 represents the number of trajectory points in each trajectory, and 2 represents the x and y coordinate directions. The first map matrix has the format N_2×T2×2, where N_2 represents the number of pieces of track lane information in the target map information, T2 represents the number of location points in each piece of track lane information, and 2 represents the x and y coordinate directions.
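The matrix formats described above can be sketched with plain nested lists. The helper names `to_matrix`, `shape` and `split_xy` are hypothetical, and the sample coordinates are made up for illustration; the sketch only shows the N×T×2 layout and how the x and y directions can be separated.

```python
def to_matrix(polylines):
    """Stack equal-length polylines into a nested list shaped N x T x 2."""
    if not polylines:
        raise ValueError("at least one polyline is required")
    t = len(polylines[0])
    if any(len(p) != t for p in polylines):
        raise ValueError("all polylines must have the same number of points")
    return [[[float(x), float(y)] for x, y in p] for p in polylines]


def shape(matrix):
    """Return (N, T, 2) for a matrix built by to_matrix."""
    return (len(matrix), len(matrix[0]), len(matrix[0][0]))


def split_xy(matrix):
    """Separate the last axis so x and y can be processed independently."""
    xs = [[pt[0] for pt in row] for row in matrix]
    ys = [[pt[1] for pt in row] for row in matrix]
    return xs, ys


# Two trajectories with three trajectory points each: N_1 = 2, T1 = 3.
trajectories = [[(0, 0), (1, 0), (2, 0)], [(0, 3), (1, 3), (2, 4)]]
# Three lane centerlines with four location points each: N_2 = 3, T2 = 4.
lanes = [[(0, 0), (2, 0), (4, 0), (6, 0)],
         [(0, 3), (2, 3), (4, 3), (6, 3)],
         [(0, 6), (2, 6), (4, 6), (6, 6)]]

first_trajectory_matrix = to_matrix(trajectories)
first_map_matrix = to_matrix(lanes)
traj_x, traj_y = split_xy(first_trajectory_matrix)
```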

Step 208: Input the first trajectory matrix and the first map matrix into the trained trajectory prediction model, perform embedding processing on the first trajectory matrix and the first map matrix to obtain the target matrix, perform feature extraction on the target matrix based on the multi-head attention mechanism to obtain output features, and perform regression processing on the output features to obtain the predicted trajectory of the obstacle to be predicted.

The target matrix refers to a matrix that identifies the positional relationship between the obstacle to be predicted and the corresponding target map information.

A trained trajectory prediction model is pre-stored in the computer device, and the trained trajectory prediction model is a model based on a multi-head attention network. The multi-head attention network refers to the transformer network. The trained trajectory prediction model is trained with a large amount of sample data. The trained trajectory prediction model can include an embedding network, a multi-head attention network, and a regression network. The embedding network may be any existing one-dimensional convolutional network and is used to perform embedding processing on the first trajectory matrix and the first map matrix. The embedding processing may include performing feature extraction on the first trajectory matrix and the first map matrix, and performing position embedding on the extracted feature matrix to obtain the target matrix. Position embedding refers to identifying the positional relationship between the obstacle to be predicted and the corresponding target map information.

The target matrix is used as the input of the multi-head attention network, and feature extraction is performed on the target matrix through the multi-head attention network based on the multi-head attention mechanism to obtain the output matrix. The output matrix is a matrix obtained by concatenating the matrices extracted by multiple attention heads. The multi-head attention mechanism refers to the feature extraction mechanism of the multi-head self-attention layer in the multi-head attention network, which can attend, from different positions, to the relationship between the obstacles to be predicted in the target matrix and the target map information. It can therefore obtain richer and more comprehensive feature information and fully extract the deeper correlation between map information and obstacle information. The regression network can be any existing one-dimensional convolutional neural network. The output matrix is input into the regression network, and a prediction operation is performed on the output matrix through the regression network to obtain the predicted trajectory of the obstacle to be predicted. The predicted trajectory may be the motion trajectory of the obstacle to be predicted in the future, such as the motion trajectory in the next 3s.
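To make the concatenation of attention heads concrete, the following is a minimal sketch of multi-head scaled dot-product self-attention using only the Python standard library. It follows the standard transformer formulation rather than the specific trained model of this application: the projection weights are random placeholders instead of learned values, and the names `multi_head_self_attention`, `rand_matrix` and the toy input are assumptions for illustration.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def rand_matrix(rows, cols, rng):
    """Random placeholder weights; a trained model would supply learned ones."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def multi_head_self_attention(x, num_heads, rng):
    """x is L x d_model; returns an L x d_model output matrix."""
    d_model = len(x[0])
    if d_model % num_heads:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    wq, wk, wv, wo = (rand_matrix(d_model, d_model, rng) for _ in range(4))
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    heads = []
    for h in range(num_heads):
        lo, hi = h * d_head, (h + 1) * d_head
        qh = [r[lo:hi] for r in q]
        kh = [r[lo:hi] for r in k]
        vh = [r[lo:hi] for r in v]
        # Scores between every pair of positions, scaled by sqrt(d_head).
        scores = matmul(qh, [list(c) for c in zip(*kh)])
        weights = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]
        heads.append(matmul(weights, vh))
    # Concatenate the per-head outputs, then apply the output projection.
    concat = [sum((head[i] for head in heads), []) for i in range(len(x))]
    return matmul(concat, wo)


rng = random.Random(0)
tokens = [[0.1, 0.2, 0.3, 0.4],
          [0.5, 0.1, 0.0, 0.2],
          [0.9, 0.8, 0.7, 0.6],
          [0.0, 0.3, 0.6, 0.9]]
output_matrix = multi_head_self_attention(tokens, num_heads=2, rng=rng)
```

Each head sees a different slice of the projected features, so different heads can weight the same trajectory and lane entries differently before their outputs are concatenated.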

In this embodiment, the motion trajectory of the obstacle to be predicted is obtained, the target map information corresponding to the obstacle to be predicted is determined according to the motion trajectory, the motion trajectory is converted into a corresponding first trajectory matrix, and the target map information is converted into a corresponding first map matrix, so that the motion trajectory and the target map information meet the input requirements of the trajectory prediction model. The first trajectory matrix and the first map matrix are input into the trained trajectory prediction model, the first trajectory matrix and the first map matrix are embedded to obtain the target matrix, feature extraction is performed on the target matrix based on the multi-head attention mechanism to obtain the output features, and regression processing is performed on the output features to obtain the predicted trajectory of the obstacle to be predicted. Since the multi-head attention mechanism in the trajectory prediction model can attend, from different positions, to the relationship between the obstacles to be predicted in the target matrix and the target map information, richer and more comprehensive feature information can be obtained, and the deeper correlation between the map information and the obstacle information can be fully extracted, which improves the accuracy of trajectory prediction. In addition, by converting the motion trajectory into the corresponding first trajectory matrix and converting the target map information into the corresponding first map matrix, the motion trajectory and the target map information can be divided into information in the x direction and the y direction, so that the information in the x direction and the information in the y direction are processed independently, which improves the efficiency of trajectory prediction.

In one embodiment, acquiring the motion trajectory of the obstacle to be predicted includes: acquiring drive test data, and performing perceptual processing on the drive test data to obtain trajectory information of the obstacle to be predicted in the drive test data; and sampling the trajectory information of the obstacle to be predicted to obtain the motion trajectory corresponding to the obstacle to be predicted.

The drive test data refers to the environmental information around the autonomous vehicle collected by the sensors during autonomous driving.

The computer device acquires the drive test data collected by the sensor and performs perceptual processing on it. Perceptual processing refers to performing target detection and target tracking on the drive test data. The drive test data can be point cloud data or surrounding environment images. When the drive test data is point cloud data, target detection can be performed on the point cloud data by any target detection model, such as PointNet, PointPillar, PolarNet or a semantic segmentation model, to obtain a three-dimensional bounding box corresponding to each obstacle to be predicted, including the coordinates of the center point, the size, and the orientation of each obstacle to be predicted. The coordinates of the center point represent the position information of the obstacle to be predicted. When the drive test data is an image of the surrounding environment, target detection can be performed on the image by a target detection model such as the SSD (Single Shot MultiBox Detector) model, the RefineDet (Single-Shot Refinement Neural Network for Object Detection) model, the MobileNet-SSD (MobileNet-based Single Shot MultiBox Detector) model or the YOLO (You Only Look Once) model, to determine the two-dimensional bounding box corresponding to the obstacle to be predicted, including the coordinates of the center point, the size, and the orientation of the obstacle to be predicted. The coordinates of the center point represent the position information of the obstacle to be predicted.

In the process of target tracking, any traditional tracker, such as a Kalman filter (KF) or an Unscented Kalman Filter (UKF), can be used to predict the speed information of the obstacle to be predicted in subsequent frames. By performing target detection and target tracking on each frame of drive test data, the trajectory information of the obstacle to be predicted in each frame is obtained. Since an obstacle to be predicted may be stationary or moving at a uniform speed, the trajectory information can be sampled so that only trajectories that are not stationary and do not move at a uniform speed are kept, in order to improve the accuracy of trajectory prediction. The computer device can sample the trajectory information of the obstacle to be predicted according to the preset sampling conditions, so as to obtain the motion trajectory of the obstacle to be predicted. The preset sampling conditions refer to trajectories in junction areas, trajectories with changes in curvature and speed, and trajectories with lane changes and cut-ins. The motion trajectory includes a plurality of trajectory points, and each trajectory point includes coordinate values in the x-direction and the y-direction.
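As a rough illustration of how a Kalman filter can recover speed from per-frame positions, the following sketch implements a one-dimensional constant-velocity filter. The constant-velocity motion model, the noise values `q` and `r`, the initial covariance, and the function name `kalman_cv_1d` are all assumptions for this example; the application only states that a traditional tracker such as a KF or UKF may be used.

```python
def kalman_cv_1d(measurements, dt, q=1e-3, r=1e-2):
    """Track position and velocity along one axis from position measurements.

    State is [position, velocity]; only position is observed.
    q is the assumed process noise and r the assumed measurement noise.
    """
    p, v = 0.0, 0.0
    # Large initial covariance: the starting state is essentially unknown.
    P = [[100.0, 0.0], [0.0, 100.0]]
    estimates = []
    for z in measurements:
        # Predict step: x' = F x and P' = F P F^T + Q with F = [[1, dt], [0, 1]].
        p = p + v * dt
        p00 = P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1] + q
        p01 = P[0][1] + dt * P[1][1]
        p10 = P[1][0] + dt * P[1][1]
        p11 = P[1][1] + q
        # Update step with observation matrix H = [1, 0].
        s = p00 + r
        k0, k1 = p00 / s, p10 / s
        y = z - p
        p += k0 * y
        v += k1 * y
        P = [[(1 - k0) * p00, (1 - k0) * p01],
             [p10 - k1 * p00, p11 - k1 * p01]]
        estimates.append((p, v))
    return estimates


# Bounding-box center x-coordinates of an obstacle moving at 2 m/s, one frame every 0.1 s.
positions = [2.0 * 0.1 * k for k in range(1, 51)]
estimates = kalman_cv_1d(positions, dt=0.1)
final_position, final_speed = estimates[-1]
```

Running one such filter per axis would give speed estimates for the x and y directions separately.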

In this embodiment, the trajectory information of the obstacle to be predicted in the drive test data is obtained by perceptual processing of the drive test data, and the trajectory information of the obstacle to be predicted is sampled according to the preset sampling conditions, so as to obtain the motion trajectory corresponding to the obstacle to be predicted. By sampling representative trajectory information, the accuracy of trajectory prediction can be effectively improved.
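One possible way to screen out stationary and uniform-speed straight trajectories is sketched below. The thresholds, the function names, and the use of speed spread and net heading change as criteria are assumptions for illustration; checks for junction areas, lane changes and cut-ins would need map data and are not covered.

```python
import math


def speeds(track, dt):
    """Per-step speeds of a track given as a list of (x, y) points."""
    return [math.hypot(x1 - x0, y1 - y0) / dt
            for (x0, y0), (x1, y1) in zip(track, track[1:])]


def net_heading_change(track):
    """Absolute heading difference between the first and last moving step."""
    headings = [math.atan2(y1 - y0, x1 - x0)
                for (x0, y0), (x1, y1) in zip(track, track[1:])
                if (x0, y0) != (x1, y1)]
    if len(headings) < 2:
        return 0.0
    d = headings[-1] - headings[0]
    # Wrap the difference into [-pi, pi).
    d = (d + math.pi) % (2 * math.pi) - math.pi
    return abs(d)


def is_representative(track, dt, min_speed=0.1, min_speed_change=0.2,
                      min_heading_change=0.1):
    """Keep tracks that move and change speed or direction (assumed thresholds)."""
    v = speeds(track, dt)
    if not v or max(v) < min_speed:
        return False  # stationary obstacle
    speed_varies = max(v) - min(v) >= min_speed_change
    turns = net_heading_change(track) >= min_heading_change
    return speed_varies or turns


stationary = [(0.0, 0.0)] * 5
uniform = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
accelerating = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (6.0, 0.0)]
turning = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
kept = [t for t in (stationary, uniform, accelerating, turning)
        if is_representative(t, dt=1.0)]
```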

In one embodiment, determining the target map information corresponding to the obstacle to be predicted according to the motion trajectory includes: determining the corresponding lane centerline according to the motion trajectory; and sampling the lane centerline to obtain the target map information corresponding to the obstacle to be predicted.

For the motion trajectory of the obstacle to be predicted, the position of the initial trajectory point of the motion trajectory is determined, and a circular area with radius r, for example 3m, is determined with that position as its center. Based on the high-precision map, the computer device determines the lane centerlines that intersect the circular area. There may be multiple intersecting lane centerlines, and the target map information corresponding to the motion trajectory is obtained from them. If the initial trajectory point of the obstacle to be predicted is relatively close to a lane boundary (a lane change may occur), the lane centerlines of the obstacle to be predicted include the centerline of the lane where the initial trajectory point is located and the centerline of the lane to be changed into. Since vehicles on the road drive along lanes, the map information next to the trajectory is very important for trajectory prediction. In order to improve the accuracy of trajectory prediction, each lane centerline corresponding to the motion trajectory can be uniformly sampled into N points, that is, each lane centerline is represented by the sampled points and includes N location points. The number of sampling points can be set according to the duration of motion estimation and the duration of the trajectory to be predicted.
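The circular search around the initial trajectory point can be sketched as a distance test between a point and each centerline polyline. The dictionary layout of the map, the lane identifiers, and the helper names are hypothetical; a real high-precision map would expose its own query interface.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment from a to b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def lanes_near(start, centerlines, radius=3.0):
    """Return ids of centerlines that pass within `radius` of `start`."""
    hits = []
    for lane_id, pts in centerlines.items():
        d = min(point_segment_distance(start, a, b) for a, b in zip(pts, pts[1:]))
        if d <= radius:
            hits.append(lane_id)
    return hits


# Three parallel lanes; the obstacle starts between the first two.
centerlines = {
    "lane_a": [(0.0, 0.0), (20.0, 0.0)],
    "lane_b": [(0.0, 3.5), (20.0, 3.5)],
    "lane_c": [(0.0, 10.0), (20.0, 10.0)],
}
candidate_lanes = lanes_near((5.0, 1.5), centerlines, radius=3.0)
```

In this toy layout the obstacle is 1.5 m from the first centerline and 2.0 m from the second, so both are selected, which mirrors the lane-change case described above.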

In this embodiment, the corresponding lane centerline is determined according to the motion trajectory, and the lane centerline is sampled to obtain the target map information corresponding to the obstacle to be predicted. The target map information related to the motion trajectory can thus be accurately obtained, which helps improve the accuracy of trajectory prediction.

In one of the embodiments, the trained trajectory prediction model includes a multi-head attention network, the multi-head attention network includes a one-dimensional convolution layer, and the one-dimensional convolution layer is used to perform feature extraction in the abscissa direction and feature extraction in the ordinate direction of the target matrix respectively.

FIG. 3 is a schematic diagram of the structure of the trained trajectory prediction model. The trained trajectory prediction model includes a sequentially connected embedding network, multi-head attention network and regression network. The multi-head attention network is a transformer network, and "×N" indicates that the transformer network includes multiple multi-head attention layers and feed-forward neural network layers. There is an Add&Norm layer after each multi-head attention layer and each feed-forward neural network layer.

The multi-head attention layer extracts the features of the target matrix through the multi-head attention mechanism, which can attend to the trajectory points at different positions in the target matrix. The computation paths of the multiple trajectory points are then input to the feed-forward neural network layer, which enriches the interactions in the multi-head attention network and allows it to learn more complex relationships. Since these paths have no dependencies on one another in the feed-forward layer, the output features can be obtained by executing the computation paths of the multiple trajectory points in parallel through the feed-forward neural network layer.

Add denotes a residual connection, and the residual structure mitigates the information loss caused by increasing the number of layers. Norm refers to Layer Normalization. The Add&Norm unit therefore adds the input and the output of the multi-head attention layer or the feed-forward neural network layer and normalizes the sum. Layer Normalization converts its input into data with a mean of 0 and a variance of 1, to prevent the input from falling into the saturation region of the subsequent activation function.
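The Add&Norm operation can be written out in a few lines. This sketch omits the learnable gain and bias that layer normalization usually carries, and the names `layer_norm` and `add_and_norm` are chosen for illustration.

```python
import math


def layer_norm(vec, eps=1e-5):
    """Normalize one feature vector to zero mean and (nearly) unit variance."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]


def add_and_norm(x, sublayer_out):
    """Residual addition followed by layer normalization, row by row."""
    return [layer_norm([a + b for a, b in zip(x_row, s_row)])
            for x_row, s_row in zip(x, sublayer_out)]


layer_input = [[1.0, 2.0, 3.0, 4.0]]
layer_output = [[0.5, -0.5, 1.0, 0.0]]
normalized = add_and_norm(layer_input, layer_output)
row_mean = sum(normalized[0]) / 4
row_var = sum((v - row_mean) ** 2 for v in normalized[0]) / 4
```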

The traditional transformer network includes a Linear layer, whereas the transformer network in this embodiment is an improved transformer network in which the Linear layer of the traditional transformer network is replaced with a one-dimensional convolution layer. Feature extraction in the abscissa direction and feature extraction in the ordinate direction can therefore be performed on the target matrix separately through the one-dimensional convolution layer. Because the data in the x direction and the data in the y direction are processed independently, the efficiency of trajectory prediction is effectively improved, and the accuracy of trajectory prediction is also improved.
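The following sketch shows one way a one-dimensional convolution can act on the x and y directions independently: the same kernel is slid over the x sequence and the y sequence without mixing them. The zero padding, the shared kernel, and the names `conv1d` and `conv_xy` are assumptions for illustration; the application does not specify kernel sizes or how weights are shared between the two directions.

```python
def conv1d(signal, kernel, bias=0.0):
    """Zero-padded ('same') 1D cross-correlation with an odd-length kernel."""
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [bias + sum(w * padded[i + j] for j, w in enumerate(kernel))
            for i in range(len(signal))]


def conv_xy(features, kernel):
    """Apply conv1d to the x column and the y column of a T x 2 matrix separately."""
    xs = conv1d([f[0] for f in features], kernel)
    ys = conv1d([f[1] for f in features], kernel)
    return [[x, y] for x, y in zip(xs, ys)]


features = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
smoothed = conv_xy(features, kernel=[1.0, 1.0, 1.0])
```

Because the two columns never interact inside `conv_xy`, changing the x values leaves the y output untouched, which is the independence property the paragraph above describes.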

In one embodiment, as shown in FIG. 4, the step of embedding the first trajectory matrix and the first map matrix to obtain the target matrix includes:

Step 402: Perform feature extraction on the first trajectory matrix and the first map matrix respectively through the embedding network in the trained trajectory prediction model, obtain the number of channels of the last convolutional layer of the embedding network, and obtain, according to the number of channels, the first feature matrix corresponding to the first trajectory matrix and the second feature matrix corresponding to the first map matrix.

Step 404: Combine the first feature matrix and the second feature matrix to obtain a combined matrix.

Step 406: Add feature parameters to the combined matrix, and perform position embedding processing on the combined matrix after adding the feature parameters to obtain a target matrix.

The trained trajectory prediction model includes an embedding network, a multi-head attention network and a regression network, and the embedding network can be a one-dimensional convolutional network. The embedding network converts the first trajectory matrix and the first map matrix into the matrix format required by the multi-head attention network, and can be used to capture, in a high-dimensional space, the distance relationship between the trajectory points in the first trajectory matrix and the location points in the first map matrix. Feature extraction is performed on the first trajectory matrix and the first map matrix respectively through the embedding network, and the first feature matrix corresponding to the first trajectory matrix and the second feature matrix corresponding to the first map matrix are generated according to the number of channels of the last convolutional layer of the embedding network. The number of channels in the last convolutional layer can be represented by dim1. The first feature matrix can be represented as N_1×dim1×2, where N_1 represents the number of trajectories in the first feature matrix, and 2 represents the x and y coordinate directions. The second feature matrix can be represented as N_2×dim1×2, where N_2 represents the number of pieces of track lane information in the second feature matrix, and 2 represents the x and y coordinate directions.

The first feature matrix and the second feature matrix are combined in the second dimension through the embedding network to obtain a combined matrix. The combined matrix is a four-dimensional matrix and can be expressed as N_1×dim2×dim1×2, where dim2 represents the total number of features in the second dimension after the first feature matrix and the second feature matrix are combined in the second dimension. dim2 can be preset, so that the computer device combines the first feature matrix and the second feature matrix according to the preset value. In the merging process, each obstacle to be predicted is traversed. If the number of pieces of track lane information in the second feature matrix corresponding to the trajectory of the obstacle to be predicted, plus 1, is greater than dim2, then dim2-1 pieces of track lane information are randomly selected from the second feature matrix and merged, in the second dimension, with the trajectory of the obstacle to be predicted in the first feature matrix. If that number plus 1 is less than dim2, zero matrices need to be stacked in the second dimension, so that the total number of features in the second dimension after combination is dim2.
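The merging rule above (random subsampling when there are too many lane entries, zero-padding when there are too few) can be sketched as follows. This is an illustrative sketch, not the patented implementation: the function name `combine_features`, the per-obstacle list `lane_feats`, and the choice to place the obstacle's own trajectory at index 0 of the second dimension are assumptions.

```python
import numpy as np


def combine_features(traj_feat, lane_feats, dim2, rng):
    """Build the combined matrix of shape (N_1, dim2, dim1, 2).

    traj_feat:  (N_1, dim1, 2) first feature matrix, one row per obstacle
    lane_feats: list of N_1 arrays, each (k_i, dim1, 2), holding the lane
                features associated with obstacle i (k_i may be 0)
    dim2:       preset total number of features in the second dimension
    """
    n1, dim1, _ = traj_feat.shape
    combined = np.zeros((n1, dim2, dim1, 2), dtype=traj_feat.dtype)
    for i in range(n1):
        lanes = lane_feats[i]
        if lanes.shape[0] + 1 > dim2:
            # Too many lane entries: keep a random subset of dim2 - 1.
            keep = rng.choice(lanes.shape[0], size=dim2 - 1, replace=False)
            lanes = lanes[keep]
        combined[i, 0] = traj_feat[i]                # the obstacle's own track
        combined[i, 1:1 + lanes.shape[0]] = lanes    # its lane entries
        # Remaining slots stay zero, which is the zero-matrix padding.
    return combined


rng = np.random.default_rng(0)
dim1, dim2 = 4, 5
traj_feat = rng.normal(size=(2, dim1, 2))
lane_feats = [rng.normal(size=(8, dim1, 2)),   # 8 + 1 > dim2: subsample
              rng.normal(size=(2, dim1, 2))]   # 2 + 1 < dim2: zero-pad
combined = combine_features(traj_feat, lane_feats, dim2, rng)
print(combined.shape)  # (2, 5, 4, 2)
```

Fixing dim2 in advance gives every obstacle the same number of entries in the second dimension, so the obstacles can be stacked into one four-dimensional array.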

Feature parameters are added in the second dimension of the combined matrix through the embedding network. The combined matrix after adding the feature parameters can be expressed as N_1×(1+dim2)×dim1×2, where 1 represents the added feature parameter, which can be an arbitrary numerical value. The feature parameters are used to collect, at scale, information on the map and on the obstacles to be predicted for subsequent trajectory prediction.
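The change of shape from N_1×dim2×dim1×2 to N_1×(1+dim2)×dim1×2 can be sketched with a single concatenation. The helper name `add_feature_parameter`, the constant fill value, and the choice to place the new slot at index 0 are assumptions made for illustration; the text only states that the added value can be arbitrary.

```python
import numpy as np


def add_feature_parameter(combined, value=0.0):
    """Prepend one feature-parameter slot along the second dimension.

    combined: (N_1, dim2, dim1, 2)
    returns   (N_1, 1 + dim2, dim1, 2)
    """
    n1, _, dim1, coords = combined.shape
    slot = np.full((n1, 1, dim1, coords), value, dtype=combined.dtype)
    return np.concatenate([slot, combined], axis=1)


combined = np.ones((3, 5, 4, 2))
with_param = add_feature_parameter(combined, value=0.0)
print(with_param.shape)  # (3, 6, 4, 2)
```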

Since the multi-head attention network does not itself process the positional relationship between the obstacles to be predicted and the map information, position embedding processing can be performed on the combined matrix after adding the feature parameters to obtain the target matrix. Position embedding identifies the positional relationship between the obstacles to be predicted and the map information in the matrix, which makes up for the lack of positional information. The target matrix can be directly input into the multi-head attention network for feature extraction.
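The text does not state which form of position embedding is used. As one possible illustration, the sketch below adds a fixed sinusoidal encoding indexed by the position in the second dimension; the sinusoidal form, the base 10000, and the function names are assumptions, and a learned embedding would fit the description equally well.

```python
import numpy as np


def sinusoidal_position_embedding(num_positions, dim):
    """Return a (num_positions, dim) table of fixed sinusoidal encodings."""
    positions = np.arange(num_positions)[:, None]            # (P, 1)
    channels = np.arange(dim)[None, :]                       # (1, D)
    rates = 1.0 / np.power(10000.0, (2 * (channels // 2)) / dim)
    angles = positions * rates                               # (P, D)
    table = np.zeros((num_positions, dim))
    table[:, 0::2] = np.sin(angles[:, 0::2])   # even channels use sine
    table[:, 1::2] = np.cos(angles[:, 1::2])   # odd channels use cosine
    return table


def add_position_embedding(x):
    """x: (N_1, 1 + dim2, dim1, 2) -> target matrix of the same shape."""
    _, num_positions, dim1, _ = x.shape
    table = sinusoidal_position_embedding(num_positions, dim1)
    # Broadcast over obstacles (axis 0) and over the x/y axis (axis 3).
    return x + table[None, :, :, None]


x = np.zeros((3, 6, 4, 2))
target = add_position_embedding(x)
print(target.shape)  # (3, 6, 4, 2)
```

Because the encoding differs for each index of the second dimension, the attention layers can tell the feature-parameter slot, the obstacle trajectory and the individual lane entries apart.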

In this embodiment, feature extraction is performed on the first trajectory matrix and the first map matrix respectively through the embedding network, and the first feature matrix corresponding to the first trajectory matrix and the second feature matrix corresponding to the first map matrix are obtained according to the number of channels of the last convolutional layer of the embedding network. This yields the matrix format required by the multi-head attention network and can be used to capture, in a high-dimensional space, the relationship between the trajectory points in the first trajectory matrix and the position points in the first map matrix. The first feature matrix and the second feature matrix are combined to obtain a combined matrix, and feature parameters are added to the combined matrix, so that information on the map and on the obstacles to be predicted can be collected quickly for subsequent trajectory prediction. Performing position embedding processing on the combined matrix makes up for the lack of positional information between the obstacles to be predicted and the map information in the multi-head attention network, and can further improve the accuracy of trajectory prediction.

In one embodiment, before acquiring the motion trajectory of the obstacle to be predicted, the method further includes: acquiring a training sample, where the training sample includes trajectory information of a target obstacle and sample map information corresponding to the target obstacle; converting the trajectory information into a corresponding second trajectory matrix, and converting the sample map information into a corresponding second map matrix; inputting the second trajectory matrix and the second map matrix into the trajectory prediction model to be trained, and outputting the future trajectory of the target obstacle; and calculating the model loss of the trajectory prediction model to be trained according to the trajectory information and the future trajectory, and updating the model parameters of the trajectory prediction model to be trained according to the model loss until a preset condition is met, so as to obtain the trained trajectory prediction model.

The training sample refers to the sample data used to train the trajectory prediction model, and the training sample includes the trajectory information of the target obstacle and the sample map information corresponding to the target obstacle. The target obstacle refers to a dynamic obstacle, such as a vehicle or a pedestrian. Specifically, the computer device obtains the historical drive test data collected by the sensor, performs perception processing on the historical drive test data, and obtains the trajectory information of the dynamic obstacles in the historical drive test data. Perception processing refers to target detection and target tracking, which is the same as the perception processing method in the application process of the above trajectory prediction model, and will not be repeated here. Similarly, the computer device performs sampling processing on the trajectory information of the dynamic obstacle according to the preset sampling conditions, and obtains a trajectory sample set corresponding to the dynamic obstacle. The preset sampling conditions may include trajectories in junction areas, trajectories with changes in curvature and speed, and trajectories with lane changes and cut-ins. The trajectory sample set includes historical trajectories of multiple dynamic obstacles, and each historical trajectory includes multiple trajectory points. For example, each historical trajectory may include 50 trajectory points. Each trajectory point includes coordinate values in the x direction and the y direction. Then, the track lane information corresponding to the dynamic obstacle is determined according to each historical trajectory in the trajectory sample set, and the map sample set corresponding to the dynamic obstacle is obtained. The sampling method of the track lane information is the same as the sampling method in the application process of the above trajectory prediction model, and will not be repeated here.
The trajectory information and sample map information corresponding to the target obstacle are respectively selected from the trajectory sample set and the map sample set to generate training samples. The trajectory sample set and the map sample set can be divided into a training set, a test set and a validation set according to a preset ratio. For example, the preset ratio can be 3:1:1. The purpose of dividing the trajectory sample set and the map sample set into three sets is to select the model with the highest accuracy and the best generalization ability.
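A 3:1:1 division of the sample sets can be sketched in a few lines of standard-library Python. Shuffling before the split, the fixed seed, and the function name `split_samples` are assumptions; the text only gives the ratio.

```python
import random


def split_samples(samples, ratio=(3, 1, 1), seed=0):
    """Shuffle samples and split them into training, test and validation sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    total = sum(ratio)
    n_train = len(shuffled) * ratio[0] // total
    n_test = len(shuffled) * ratio[1] // total
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]   # remainder goes to validation
    return train, test, validation


# Each sample would pair a trajectory with its map information;
# integers stand in for the samples here.
train, test, validation = split_samples(range(100))
print(len(train), len(test), len(validation))  # 60 20 20
```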

The training process of the trajectory prediction model is the same as the trajectory prediction method in the application process. That is, the trajectory information in the training sample is converted into the corresponding second trajectory matrix, the sample map information in the training sample is converted into the corresponding second map matrix, the second trajectory matrix and the second map matrix are input into the trajectory prediction model to be trained, and the future trajectory of the target obstacle is output. The model loss of the trajectory prediction model to be trained is then calculated according to the trajectory information and the future trajectory, and the model parameters are adjusted according to the model loss to obtain the trained trajectory prediction model. For example, the model loss can be an existing loss function such as the mean squared error (MSE) loss or the cross-entropy loss. The model parameters are adjusted by backpropagating the output of the loss function. Since model training is an iterative process, it needs to go through multiple epochs, where one epoch means that all training samples are used for training once, and each epoch outputs a set of model parameters. The model parameters with the highest accuracy can be determined through the validation set, that is, it is determined which epoch's model parameters yield the most accurate future trajectory. The specific judgment method can be to determine whether the network loss value reaches the loss threshold, or whether the number of iterations reaches the iteration count threshold. If the network loss value reaches the loss threshold, or the number of iterations reaches the iteration count threshold, the model parameters output by the corresponding epoch can be used as the final model parameters, and the resulting model is the trained trajectory prediction model.
After the trained trajectory prediction model is obtained, the test set can be used for model prediction to measure the performance of the model. If the performance of the model on the test set is poor, the model parameters can be readjusted by using the training samples until the model parameters with the highest accuracy are obtained.
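The training procedure described above (MSE loss, a parameter update per epoch, selection of the best epoch on the validation set, and stopping at a loss threshold or an iteration count threshold) can be sketched with a toy stand-in. A single linear layer replaces the trajectory prediction model here so that the gradient can be written by hand; the model, the learning rate, the thresholds and the function names are all assumptions for illustration.

```python
import numpy as np


def mse(pred, target):
    """Mean squared error between predicted and ground-truth values."""
    return float(np.mean((pred - target) ** 2))


def fit(x_train, y_train, x_val, y_val,
        lr=0.3, max_epochs=200, loss_threshold=1e-4):
    """Gradient descent on an MSE loss with per-epoch parameter selection."""
    weights = np.zeros((x_train.shape[1], y_train.shape[1]))
    best_weights, best_val = weights.copy(), mse(x_val @ weights, y_val)
    for epoch in range(max_epochs):            # iteration count threshold
        pred = x_train @ weights
        # Gradient of the MSE loss with respect to the weights.
        grad = 2.0 * x_train.T @ (pred - y_train) / pred.size
        weights = weights - lr * grad          # parameter update for this epoch
        val_loss = mse(x_val @ weights, y_val)
        if val_loss < best_val:                # keep the best epoch's parameters
            best_weights, best_val = weights.copy(), val_loss
        if mse(x_train @ weights, y_train) <= loss_threshold:
            break                              # loss threshold reached
    return best_weights, best_val


rng = np.random.default_rng(0)
true_w = rng.normal(size=(4, 2))
x = rng.normal(size=(100, 4))
y = x @ true_w                                 # noiseless synthetic targets
best_w, best_val = fit(x[:60], y[:60], x[80:], y[80:])
test_loss = mse(x[60:80] @ best_w, y[60:80])   # held-out test-set check
```

The held-out slice `x[60:80]` plays the role of the test set: it is used only once, after the parameters have been chosen on the validation set.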

In this embodiment, a training sample is obtained, where the training sample includes trajectory information of the target obstacle and sample map information corresponding to the target obstacle; the trajectory information is converted into a corresponding second trajectory matrix, and the sample map information is converted into a corresponding second map matrix; the second trajectory matrix and the second map matrix are input into the trajectory prediction model to be trained; the model loss of the trajectory prediction model to be trained is calculated; and the model parameters of the trajectory prediction model to be trained are updated according to the model loss to obtain the trained trajectory prediction model. Since the multi-head attention mechanism in the trajectory prediction model can attend, from different positions, to the relationship between the target obstacle and the sample map information in the target matrix, it can obtain richer and more comprehensive feature information and fully extract the deeper correlation between the map information and the obstacle information, which improves the accuracy of trajectory prediction.
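For readers unfamiliar with the mechanism, the following NumPy sketch shows generic scaled dot-product attention with several heads. It is not the network of this application: the text elsewhere states that the multi-head attention network uses a one-dimensional convolutional layer, whereas this sketch uses plain linear projections, and the sequence length, model width, head count and function names are assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - np.max(x, axis=axis, keepdims=True)   # subtract max for stability
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Scaled dot-product self-attention with several heads.

    x: (seq_len, d_model); w_q, w_k, w_v, w_o: (d_model, d_model)
    """
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def split(t):  # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)   # (H, L, L)
    weights = softmax(scores, axis=-1)     # each row sums to 1
    heads = weights @ v                    # (H, L, d_head)
    merged = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return merged @ w_o, weights


rng = np.random.default_rng(0)
seq_len, d_model, num_heads = 6, 8, 2
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v, w_o = (rng.normal(size=(d_model, d_model)) for _ in range(4))
out, attn = multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads)
print(out.shape, attn.shape)  # (6, 8) (2, 6, 6)
```

Each head computes its own attention weights over the same sequence, which is the sense in which the relationship between obstacle and map entries is examined from several positions at once.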

In one embodiment, as shown in FIG. 5, a trajectory prediction apparatus is provided, including: a trajectory acquisition module 502, a map acquisition module 504, a matrix conversion module 506, and a trajectory prediction module 508, wherein:

The trajectory acquisition module 502 is used to acquire the motion trajectory of the obstacle to be predicted.

The map acquisition module 504 is configured to determine target map information corresponding to the obstacle to be predicted according to the motion trajectory.

The matrix conversion module 506 is configured to convert the motion trajectory into a corresponding first trajectory matrix, and convert the target map information into a corresponding first map matrix.

The trajectory prediction module 508 is configured to input the first trajectory matrix and the first map matrix into the trained trajectory prediction model, perform embedding processing on the first trajectory matrix and the first map matrix to obtain the target matrix, perform feature extraction on the target matrix based on the multi-head attention mechanism to obtain output features, and perform regression processing on the output features to obtain the predicted trajectory of the obstacle to be predicted.

In one embodiment, the trained trajectory prediction model includes a multi-head attention network, and the multi-head attention network includes a one-dimensional convolutional layer. The trajectory prediction module 508 is further configured to perform feature extraction in the abscissa direction and feature extraction in the ordinate direction respectively on the target matrix according to the one-dimensional convolutional layer.

In one embodiment, the trajectory prediction module 508 is further configured to perform feature extraction on the first trajectory matrix and the first map matrix respectively through the embedding network in the trained trajectory prediction model, obtain the number of channels of the last convolutional layer of the embedding network, and obtain, according to the number of channels, the first feature matrix corresponding to the first trajectory matrix and the second feature matrix corresponding to the first map matrix; combine the first feature matrix and the second feature matrix to obtain a combined matrix; and add feature parameters to the combined matrix, and perform position embedding processing on the combined matrix after adding the feature parameters to obtain the target matrix.

In one embodiment, the trajectory acquisition module 502 is further configured to acquire drive test data, perform perception processing on the drive test data, and obtain trajectory information of the obstacle to be predicted in the drive test data; and perform sampling processing on the trajectory information of the obstacle to be predicted according to preset sampling conditions to obtain the motion trajectory corresponding to the obstacle to be predicted.

In one embodiment, the map acquisition module 504 is further configured to determine the corresponding lane centerline according to the motion trajectory, and perform sampling processing on the lane centerline to obtain target map information corresponding to the obstacle to be predicted.

In one embodiment, the above-mentioned device further includes:

The sample acquisition module is used to acquire training samples, and the training samples include the trajectory information of the target obstacle and the sample map information corresponding to the target obstacle.

The sample conversion module is configured to convert the trajectory information into a corresponding second trajectory matrix, and convert the sample map information into a corresponding second map matrix.

The trajectory calculation module is used to input the second trajectory matrix and the second map matrix into the trajectory prediction model to be trained, and output the future trajectory of the target obstacle.

The parameter updating module is used to calculate the model loss of the trajectory prediction model to be trained according to the trajectory information and future trajectories, update the model parameters of the trajectory prediction model to be trained according to the model loss, and obtain the trained trajectory prediction model.

For the specific limitation of the trajectory prediction apparatus, reference may be made to the above limitation on the trajectory prediction method, which will not be repeated here. Each module in the above-mentioned trajectory prediction apparatus can be implemented in whole or in part by software, hardware and combinations thereof. The above modules can be embedded in or independent of the processor in the computer device in the form of hardware, or stored in the memory in the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.

In one of the embodiments, a computer device is provided, the internal structure of which can be as shown in FIG. 6. The computer device includes a processor, a memory, a communication interface, and a database connected by a system bus. The processor of the computer device is used to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions and a database. The internal memory provides an environment for the execution of the operating system and the computer-readable instructions in the non-volatile storage medium. The database of the computer device is used to store data for a trajectory prediction method. The communication interface of the computer device is used to connect and communicate with an external terminal. The computer-readable instructions, when executed by the processor, implement a trajectory prediction method.

Those skilled in the art can understand that the structure shown in FIG. 6 is only a block diagram of a partial structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.

A computer device, comprising a memory and one or more processors, where the memory stores computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the steps in each of the foregoing method embodiments.

One or more computer storage media storing computer-readable instructions, which when executed by one or more processors, cause the one or more processors to perform the steps in each of the foregoing method embodiments.

Wherein, the computer storage medium is a readable storage medium, and the readable storage medium may be non-volatile or volatile.

In one of the embodiments, a vehicle is provided, the vehicle may specifically include an autonomous driving vehicle, and the vehicle includes the above computer device, which can execute the steps in the above embodiment of the trajectory prediction method.

Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by instructing relevant hardware through computer-readable instructions. The computer-readable instructions can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the foregoing method embodiments. Any reference to memory, storage, database or other medium used in the various embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in various forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in a combination of these technical features, the combination should be considered to be within the scope described in this specification.

The above-mentioned embodiments only represent several embodiments of the present application, and the descriptions thereof are relatively specific and detailed, but should not be construed as a limitation on the scope of the invention patent. It should be noted that, for those skilled in the art, without departing from the concept of the present application, several modifications and improvements can be made, which all belong to the protection scope of the present application. Therefore, the scope of protection of the patent of the present application shall be subject to the appended claims.

Claims

A trajectory prediction method, characterized in that the method comprises:

acquiring the motion trajectory of the obstacle to be predicted;

determining the target map information corresponding to the obstacle to be predicted according to the motion trajectory;

converting the motion trajectory into a corresponding first trajectory matrix, and converting the target map information into a corresponding first map matrix; and

inputting the first trajectory matrix and the first map matrix into the trained trajectory prediction model, performing embedding processing on the first trajectory matrix and the first map matrix to obtain a target matrix, performing feature extraction on the target matrix based on a multi-head attention mechanism to obtain output features, and performing regression processing on the output features to obtain the predicted trajectory of the obstacle to be predicted.
The method according to claim 1, wherein the acquiring the motion trajectory of the obstacle to be predicted comprises:

acquiring drive test data, performing perception processing on the drive test data, and obtaining trajectory information of the obstacle to be predicted in the drive test data; and

performing sampling processing on the trajectory information of the obstacle to be predicted according to preset sampling conditions to obtain a motion trajectory corresponding to the obstacle to be predicted.
The method according to claim 1, wherein the determining the target map information corresponding to the obstacle to be predicted according to the motion trajectory comprises:

determining the corresponding lane centerline according to the motion trajectory; and

performing sampling processing on the lane centerline to obtain target map information corresponding to the obstacle to be predicted.
The method according to claim 1, wherein the trained trajectory prediction model comprises a multi-head attention network, the multi-head attention network comprises a one-dimensional convolution layer, and the one-dimensional convolution layer is used to perform feature extraction in the abscissa direction and feature extraction in the ordinate direction respectively on the target matrix.
The method according to claim 1, wherein the performing embedding processing on the first trajectory matrix and the first map matrix to obtain a target matrix comprises:

performing feature extraction on the first trajectory matrix and the first map matrix respectively through the embedding network in the trained trajectory prediction model, obtaining the number of channels of the last convolutional layer of the embedding network, and obtaining, according to the number of channels, the first feature matrix corresponding to the first trajectory matrix and the second feature matrix corresponding to the first map matrix;

combining the first feature matrix and the second feature matrix to obtain a combined matrix; and

adding a feature parameter to the combined matrix, and performing position embedding processing on the combined matrix after adding the feature parameter to obtain a target matrix.
The method according to any one of claims 1 to 5, characterized in that before acquiring the motion trajectory of the obstacle to be predicted, the method further comprises:

acquiring training samples, the training samples including the trajectory information of the target obstacle and the sample map information corresponding to the target obstacle;

converting the trajectory information into a corresponding second trajectory matrix, and converting the sample map information into a corresponding second map matrix;

inputting the second trajectory matrix and the second map matrix into the trajectory prediction model to be trained, and outputting the future trajectory of the target obstacle; and

calculating the model loss of the trajectory prediction model to be trained according to the trajectory information and the future trajectory, and updating the model parameters of the trajectory prediction model to be trained according to the model loss to obtain a trained trajectory prediction model.
A trajectory prediction device, comprising:

a trajectory acquisition module, configured to acquire the motion trajectory of the obstacle to be predicted;

a map acquisition module, configured to determine target map information corresponding to the to-be-predicted obstacle according to the motion trajectory;

a matrix conversion module for converting the motion trajectory into a corresponding first trajectory matrix, and converting the target map information into a corresponding first map matrix; and

a trajectory prediction module, configured to input the first trajectory matrix and the first map matrix into the trained trajectory prediction model, perform embedding processing on the first trajectory matrix and the first map matrix to obtain a target matrix, perform feature extraction on the target matrix based on a multi-head attention mechanism to obtain output features, and perform regression processing on the output features to obtain the predicted trajectory of the obstacle to be predicted.
The apparatus according to claim 7, wherein the trained trajectory prediction model comprises a multi-head attention network, the multi-head attention network comprises a one-dimensional convolution layer, and the trajectory prediction module is further configured to perform, according to the one-dimensional convolution layer, feature extraction in the abscissa direction and feature extraction in the ordinate direction respectively on the target matrix.
The apparatus according to claim 7, wherein the trajectory prediction module is further configured to perform feature extraction on the first trajectory matrix and the first map matrix respectively through an embedding network in the trained trajectory prediction model, obtain the number of channels of the last convolutional layer of the embedding network, and obtain, according to the number of channels, a first feature matrix corresponding to the first trajectory matrix and a second feature matrix corresponding to the first map matrix; combine the first feature matrix and the second feature matrix to obtain a combined matrix; and add a feature parameter to the combined matrix, and perform position embedding processing on the combined matrix after adding the feature parameter to obtain a target matrix.
The apparatus according to claim 7, wherein the trajectory acquisition module is further configured to acquire drive test data, perform perception processing on the drive test data, and obtain trajectory information of the obstacle to be predicted in the drive test data; and perform sampling processing on the trajectory information of the obstacle to be predicted according to preset sampling conditions to obtain a motion trajectory corresponding to the obstacle to be predicted.
A computer device comprising a memory and one or more processors, the memory storing computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to perform the following steps:

acquiring the motion trajectory of the obstacle to be predicted;

determining the target map information corresponding to the obstacle to be predicted according to the motion trajectory;

converting the motion trajectory into a corresponding first trajectory matrix, and converting the target map information into a corresponding first map matrix; and

inputting the first trajectory matrix and the first map matrix into the trained trajectory prediction model, performing embedding processing on the first trajectory matrix and the first map matrix to obtain a target matrix, performing feature extraction on the target matrix based on a multi-head attention mechanism to obtain output features, and performing regression processing on the output features to obtain the predicted trajectory of the obstacle to be predicted.
The computer device according to claim 11, wherein the trained trajectory prediction model comprises a multi-head attention network, the multi-head attention network comprises a one-dimensional convolution layer, and when executing the computer-readable instructions the processor further performs the following step: performing, through the one-dimensional convolution layer, feature extraction in the abscissa direction and feature extraction in the ordinate direction respectively on the target matrix.
The computer device according to claim 11, wherein, when the processor executes the computer-readable instructions, the processor further performs the following steps: performing feature extraction on the first trajectory matrix and the first map matrix respectively through an embedding network in the trained trajectory prediction model, obtaining the number of channels of the last convolutional layer of the embedding network, and obtaining, according to the number of channels, a first feature matrix corresponding to the first trajectory matrix and a second feature matrix corresponding to the first map matrix; combining the first feature matrix and the second feature matrix to obtain a combined matrix; and adding a feature parameter to the combined matrix, and performing position embedding processing on the combined matrix after adding the feature parameter to obtain the target matrix.
The computer device according to claim 11, wherein, when the processor executes the computer-readable instructions, the processor further performs the following steps: acquiring drive test data, performing perception processing on the drive test data, and obtaining trajectory information of the obstacle to be predicted in the drive test data; and performing sampling processing on the trajectory information of the obstacle to be predicted according to preset sampling conditions to obtain the motion trajectory corresponding to the obstacle to be predicted.
The computer device according to claim 11, wherein, when the processor executes the computer-readable instructions, the processor further performs the following steps: determining a corresponding lane centerline according to the motion trajectory; and performing sampling processing on the lane centerline to obtain target map information corresponding to the obstacle to be predicted.
One or more computer storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:

acquiring the motion trajectory of the obstacle to be predicted;

determining the target map information corresponding to the obstacle to be predicted according to the motion trajectory;

converting the motion trajectory into a corresponding first trajectory matrix, and converting the target map information into a corresponding first map matrix; and

inputting the first trajectory matrix and the first map matrix into the trained trajectory prediction model, performing embedding processing on the first trajectory matrix and the first map matrix to obtain a target matrix, performing feature extraction on the target matrix based on a multi-head attention mechanism to obtain output features, and performing regression processing on the output features to obtain the predicted trajectory of the obstacle to be predicted.
The storage medium according to claim 16, wherein the trained trajectory prediction model comprises a multi-head attention network, the multi-head attention network comprises a one-dimensional convolution layer, and when the computer-readable instructions are executed by the processor, the following step is further performed: performing, through the one-dimensional convolution layer, feature extraction in the abscissa direction and feature extraction in the ordinate direction respectively on the target matrix.
The storage medium according to claim 16, wherein when the computer-readable instructions are executed by the processor, the following steps are further performed: performing feature extraction on the first trajectory matrix and the first map matrix respectively through an embedding network in the trained trajectory prediction model, obtaining the number of channels of the last convolutional layer of the embedding network, and obtaining, according to the number of channels, a first feature matrix corresponding to the first trajectory matrix and a second feature matrix corresponding to the first map matrix; combining the first feature matrix and the second feature matrix to obtain a combined matrix; and adding a feature parameter to the combined matrix, and performing position embedding processing on the combined matrix after adding the feature parameter to obtain the target matrix.
The storage medium according to claim 16, wherein when the computer-readable instructions are executed by the processor, the following steps are further performed: acquiring drive test data, performing perception processing on the drive test data, and obtaining trajectory information of the obstacle to be predicted in the drive test data; and performing sampling processing on the trajectory information of the obstacle to be predicted according to preset sampling conditions to obtain the motion trajectory corresponding to the obstacle to be predicted.
A vehicle, wherein the vehicle performs the steps of the trajectory prediction method according to any one of claims 1 to 6.