CN112734808A - Trajectory prediction method for vulnerable road users in vehicle driving environment - Google Patents

Trajectory prediction method for vulnerable road users in vehicle driving environment

Info

Publication number
CN112734808A
Authority
CN
China
Prior art keywords
vru
prediction
sequence
behavior pattern
steps
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110069140.4A
Other languages
Chinese (zh)
Other versions
CN112734808B (en)
Inventor
游子诺
李克强
熊辉
许庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202110069140.4A priority Critical patent/CN112734808B/en
Publication of CN112734808A publication Critical patent/CN112734808A/en
Application granted granted Critical
Publication of CN112734808B publication Critical patent/CN112734808B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W60/00Drive control systems specially adapted for autonomous road vehicles
    • B60W60/001Planning or execution of driving tasks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30241Trajectory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30248Vehicle exterior or interior
    • G06T2207/30252Vehicle exterior; Vicinity of vehicle

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Multimedia (AREA)
  • Automation & Control Theory (AREA)
  • Human Computer Interaction (AREA)
  • Transportation (AREA)
  • Mechanical Engineering (AREA)
  • Traffic Control Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a trajectory prediction method for vulnerable road users (VRUs) in a vehicle driving environment, which comprises the following steps: calculating a semantic vector from the first-N-step VRU image-frame sequence, the first-N-step VRU motion-trajectory sequence and the first-N-step ego-vehicle driving odometer sequence in the training data set; predicting VRU behavior features; generating the prior and posterior behavior-pattern distributions of the VRU by continuous iterative computation with a behavior-pattern prediction network, according to the semantic vector, the subsequent-M-step VRU motion-trajectory sequence and the subsequent-M-step ego-vehicle driving odometer sequence of the training data set, and predicting the VRU motion trajectory by continuous iterative computation with a trajectory prediction network; calculating a behavior-pattern objective function, a trajectory-prediction objective function and a behavior-feature objective function; carrying out supervised learning through back-propagation to obtain a VRU motion-trajectory prediction model that accepts a planned ego-vehicle driving odometer sequence as input; and an online trajectory prediction stage. The method can be used for behavior prediction and safety protection of vulnerable road users in advanced driver-assistance systems and can support decision-making for autonomous vehicles.

Description

Trajectory prediction method for vulnerable road users in vehicle driving environment
Technical Field
The invention relates to the technical fields of computer vision and intelligent vehicles, and in particular to a trajectory prediction method for vulnerable road users in a vehicle driving environment.
Background
Vulnerable road users (VRUs for short) in traffic are the two classes of traffic participants consisting mainly of pedestrians and riders. Predicting the VRU motion trajectory is one of the key technologies of intelligent-vehicle perception, and the predicted trajectory together with an uncertainty estimate of the prediction result can provide a reference for the subsequent planning and decision-making of the intelligent vehicle.
The behavior patterns of VRUs are usually diverse and change dynamically, which poses a great challenge to prediction accuracy; decision systems therefore often adopt a conservative approach to reduce the potential risk to other traffic participants, but this is detrimental to the stability of the surrounding traffic environment and to the riding experience inside the vehicle. In recent years, multi-trajectory prediction methods based on random generative models have attracted attention. These methods assume that the behavior pattern of a target follows some prior probability distribution (e.g., a Gaussian distribution) that can be inferred from observed variables, and they characterize different behavior patterns by enumerating or sampling several values of that distribution, thereby predicting and generating trajectories under the various behavior patterns. Depending on the observed variables, they can be divided into methods that infer by modeling the interaction among similar targets and methods that infer by modeling the static environment in which the target is located. Multi-trajectory prediction accords with the common understanding that individual behavior is diverse, and achieves better prediction accuracy than deterministic prediction (whose output for a given observation is unique), which helps provide a better reference for planning and decision-making. Nevertheless, when applied to VRU motion-trajectory prediction in traffic environments, these methods leave room for improvement, with the following specific problems:
the lack of modeling of causal associations of VRU behavior patterns and human-vehicle interactions: the intelligent vehicle is mainly divided into three systems of perception, decision and control, VRU motion trail prediction belongs to a perception part, a decision system selects a driving strategy and plans actions by means of perception information, and the rationality of selection and planning is related to the accuracy of perception prediction. In a traffic scene, the behavior of the VRU is influenced by various variables such as internal attributes (such as characters and habits) and external attributes (such as human-vehicle interaction and barriers), but because the self-vehicle is the only independently controllable individual in the invention scene, the human-vehicle interaction is the main controllable variable capable of influencing the behavior mode of the VRU, the driving system of the self-vehicle can select different driving strategies, the human-vehicle interaction variable is influenced in the subsequent time, and the behavior of the VRU is further influenced, in other words, the diversity of the VRU behavior is associated with the behavior of vehicles in the future. If the VRU behavior mode after the self vehicle adopts a certain driving strategy can be effectively modeled, the intelligent vehicle decision system can be beneficial to more accurately evaluating the candidate driving strategy.
Although existing random-method trajectory prediction acknowledges the diversity of VRU behavior, it infers the VRU behavior pattern only once from the observed variables and then predicts the trajectory. In a highly dynamic human-vehicle interaction scene, however, a behavior pattern inferred only from observations is likely to change during the prediction period because of human-vehicle interaction. In other words, existing methods establish only an implicit association between the future driving strategy of the vehicle and the VRU behavior pattern and cannot associate them explicitly: the existing prediction result is the sum of the VRU predictions under all possible driving strategies, and no subset of the result distribution corresponds one-to-one to a driving strategy. Prediction based on a single behavior-pattern inference is therefore unsuited to predicting the VRU motion trajectory in a human-vehicle interaction environment. What is needed is a trajectory prediction method that continuously models the human-vehicle interaction during prediction, continuously adjusts the VRU behavior pattern accordingly, and makes more accurate trajectory predictions, so that the behavior prediction of the VRU can be explicitly associated with different candidate driving strategies.
Lack of modeling and measurement of prediction uncertainty: the prediction of VRU behavior and trajectories serves as a reference for the decision system of the intelligent vehicle, and the high safety and reliability demanded of a driving system set a high standard for the reliability of the prediction method. Although randomly generated predictions address diversity, current methods provide no uncertainty measure for each prediction result, so the method's confidence in its predictions cannot be quantified, which works against the high-safety requirements of a driving system.
No suitable data set for supervised training: existing data sets lack ego-vehicle driving odometer sequence annotations, lack VRU annotations and images, or cover only a single scene, and are therefore unsuited to a deep-learning VRU motion-trajectory prediction method that accounts for human-vehicle interaction across multiple scenes.
Lack of the prior knowledge humans use when judging VRU behavior: the traffic environment is complex and rich in prior rule knowledge; for example, a human driver judges VRU behavior from salient visual features such as head orientation, vehicle type and gestures, and the performance of a prediction method that does not incorporate such prior knowledge may suffer.
Disclosure of Invention
It is an object of the present invention to provide a method for trajectory prediction of vulnerable road users in a driving environment of a vehicle that overcomes or at least alleviates at least one of the above-mentioned drawbacks of the prior art.
To achieve the above object, the invention provides a method for predicting the trajectory of vulnerable road users in a vehicle driving environment, the method comprising:
step 1, establishing a VRU data set which is divided into a training data set and a testing data set;
step 2, preprocessing the various data in the VRU data set and passing the preprocessed data to steps 3 and 4;
step 3, an off-line training stage, which specifically comprises:
step 31, calculating a semantic vector from the first-N-step VRU image-frame sequence, the first-N-step VRU motion-trajectory sequence and the first-N-step ego-vehicle driving odometer sequence in the training data set;
step 32, predicting the VRU behavior features from the first-N-step VRU image-frame sequence in the training data set, the VRU behavior features comprising the head orientation angle of the VRU and a vehicle-type probability vector;
step 33, generating the prior and posterior behavior-pattern distributions of the VRU by continuous iterative computation with a behavior-pattern prediction network, according to the semantic vector, the subsequent-M-step VRU motion-trajectory sequence and the subsequent-M-step ego-vehicle driving odometer sequence, and predicting the VRU motion trajectory by continuous iterative computation with a trajectory prediction network;
step 34, calculating a behavior-pattern objective function according to the prior and posterior behavior-pattern distributions output in step 33;
step 35, calculating a trajectory-prediction objective function according to the VRU motion trajectory output in step 33 and the ground-truth VRU motion trajectory of the subsequent M steps;
step 36, calculating a behavior-feature objective function according to the first-N-step VRU behavior features output in step 32 and the first-N-step VRU behavior features in the training data set;
step 37, carrying out supervised learning of the behavior-pattern, trajectory-prediction and behavior-feature objective functions through back propagation, obtaining a VRU motion-trajectory prediction model that accepts a planned ego-vehicle driving odometer sequence as input;
step 4, in the online track prediction stage, the method specifically comprises the following steps:
step 41, calculating a semantic vector from the first-N-step VRU image-frame sequence, the first-N-step VRU motion trajectory and the first-N-step ego-vehicle driving odometer sequence acquired online;
step 42, predicting the VRU motion-trajectory distribution of the future M steps by continuous iterative computation with the VRU motion-trajectory prediction model obtained in step 37, according to the semantic vector output in step 41 and the subsequent-M-step ego-vehicle driving odometer sequence generated under the driving strategy selected by the decision module.
The invention provides a VRU motion-trajectory prediction method based on a conditional variational auto-encoder. When predicting each VRU, the method infers the initial VRU behavior-pattern distribution from the observed VRU local visual features, VRU motion features and ego-vehicle motion features, then predicts the VRU motion trajectory from the observed features and the behavior pattern. During prediction, the method can dynamically update the behavior pattern according to different assumed human-vehicle interaction scenarios; that is, the intelligent vehicle can obtain the prediction result under the corresponding interaction scenario by inputting different planned strategies. The trajectory prediction method of the invention can be used for behavior prediction and safety protection of vulnerable road users (VRUs) in advanced driver-assistance systems, and can also support decision-making for autonomous vehicles.
Drawings
FIG. 1 is a flowchart of an off-line training process in the method according to the embodiment of the present invention.
Fig. 2 is a schematic structural diagram of the VRU behavior feature sequence prediction network in fig. 1.
Fig. 3 is a schematic diagram of posterior behavior pattern distribution generation and trajectory prediction in fig. 1.
FIG. 4 is a flowchart of an on-line test in the method according to the embodiment of the present invention.
FIG. 5 is a schematic diagram of the prior behavior pattern distribution generation and trajectory prediction of FIG. 4.
Fig. 6 is a schematic view of an application scenario of the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and examples.
In order to make the implementation objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be described in more detail below with reference to the accompanying drawings in the embodiments of the present invention.
Specifically, the trajectory prediction method of the embodiment of the invention comprises the following steps:
step 1, establishing a VRU data set.
The VRU data set is obtained by labeling signals sampled in the driving environment by an onboard vision sensor, an inertial measurement unit (IMU) and a positioning sensor (GPS). The temporal frequency of the data set is a set value (for example, 10 frames per second), the data are ordered as a time series, and each step of data comprises the following parts:
the first part is a driving odometer of the self-vehicle, which is obtained by combining the degrees of freedom measured by the IMU sensor with GPS positioning correction, including position, speed and steering angle, and is used for representing the driving characteristics of the self-vehicle.
The second part is the features of the various VRU types in the visual scene, including pedestrians, riders of low-speed two-wheelers (e.g., cyclists) and riders of high-speed two-wheelers (e.g., motorcyclists). These features are obtained by manually labeling the sampled data and include position, image bounding-box coordinates, head orientation angle and vehicle type. The vehicle types are walking, bicycle, motorcycle and electric bicycle.
The third part is the original monocular image semantic information obtained from the vision-sensor image signals; combined with the VRU bounding-box position, it is used mainly to extract and encode the visual features of the VRU.
Specifically, two-dimensional (x-y) coordinates are used for the positions of the VRU and the ego vehicle, and the rotation angle about the axis perpendicular to the road plane is used for the VRU head orientation angle and the ego-vehicle steering angle. The choice of reference frames is explained in detail in the data preprocessing stages of offline training and online testing.
The VRU data set is split into a training data set and a test data set according to the data-collection scene. The cumulative number of time steps of the training data set exceeds 15000; it is used only in the offline training stage, for training the trajectory prediction model (the trainable part in FIG. 1). The cumulative number of time steps of the test data set exceeds 2500; it is used only in the online trajectory prediction stage.
Step 2: each kind of data in the VRU data set is preprocessed according to its type.
Specifically, step 2 comprises:
the length of a data observation time interval window is set to be N, the length of a prediction time interval window is set to be M, and the observation time interval is positioned before the prediction time interval, so that the former N steps and the later M steps are carried out. Then, one training data includes a feature sequence with a length of N + M of the same VRU and the sequence of the driving odometer of the own vehicle in the corresponding time window. Specifically, the training data set in this embodiment may select a first N-step VRU behavior feature sequence, a first N-step VRU image frame sequence, a first N + M-step VRU motion trajectory sequence, and a first N + M-step driving odometer sequence.
Based on the difference in the kind of data in the data set, the corresponding preprocessing is performed as listed below.
The first type is the first-N-step VRU behavior feature sequence, which includes the VRU head-orientation sequence S = {s_1, s_2, …, s_N} and the VRU vehicle type T, where s_N denotes the VRU head orientation angle in the N-th image.
Since the image signal is captured by a camera that moves with the vehicle, and the subsequent prediction model uses only the image signal when predicting the orientation angle, the VRU head orientation angle in the data set must be converted from the world coordinate system to the image coordinate system. Specifically, the reference-frame transformation subtracts the heading angle of the ego vehicle in world coordinates from the VRU head orientation angle in world coordinates; in the image coordinate system, the ego-vehicle heading is 0° at every step. The world reference frame takes the ego-vehicle position at the first data step as its origin; the X direction of the coordinates is the heading direction, the Y direction is perpendicular to the heading, and angles are measured clockwise with the heading direction as 0°. The VRU vehicle type T is encoded as a one-hot vector with exactly one entry equal to 1 and the rest 0.
The second type is the first-N-step VRU image-frame sequence.

First, the bounding-box position of the VRU at each step is used to crop the VRU image frames from the original monocular images.

Then, the VRU image frames undergo YUV color coding, histogram equalization and similar processing, and are uniformly scaled to a predetermined size, yielding a VRU image-frame sequence B = {b_1, b_2, …, b_N} that can be fed to the subsequent convolutional neural network (CNN), where b_N denotes the VRU image frame at step N. The predetermined size may, for example, be (224, 224, 3), the first dimension (224) denoting width, the second (224) height, and the third (3) the number of image channels.
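For illustration, this preprocessing (together with the angle-frame conversion of the first data type) can be sketched as below. This is a hedged sketch, not code from the patent: the OpenCV calls, the BGR input format and all helper names are assumptions.

```python
import cv2
import numpy as np

TARGET_SIZE = (224, 224)  # (width, height) expected by the CNN encoder

def preprocess_vru_crop(frame_bgr, box):
    """Crop the VRU with its bounding box, convert to YUV, equalize the
    luminance histogram, and scale to the predetermined size."""
    x1, y1, x2, y2 = box
    crop = frame_bgr[y1:y2, x1:x2]
    yuv = cv2.cvtColor(crop, cv2.COLOR_BGR2YUV)
    yuv[..., 0] = cv2.equalizeHist(yuv[..., 0])   # equalize the Y channel only
    return cv2.resize(yuv, TARGET_SIZE)           # shape (224, 224, 3)

def head_angle_to_image_frame(s_world, ego_heading_world):
    """World-frame head angle minus ego heading, as described above;
    in the image frame the ego heading is always 0 degrees."""
    return (s_world - ego_heading_world) % (2 * np.pi)

def one_hot_vehicle_type(idx, num_types=4):
    """One-hot vehicle type T: walking / bicycle / motorcycle / e-bicycle."""
    return np.eye(num_types, dtype=np.float32)[idx]
```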
The third type is the N+M-step VRU motion-trajectory sequence, given as a 2D coordinate sequence P = {p_1, p_2, …, p_{N+M}} = {(x_1, y_1), (x_2, y_2), …, (x_{N+M}, y_{N+M})}, where p_{N+M} denotes the position of the VRU in the world coordinate system at step N+M, and x_{N+M} and y_{N+M} denote its X- and Y-axis coordinates at that step.
The fourth type is the N+M-step driving odometer sequence of the ego vehicle, which includes its trajectory, velocity and steering angle and is written as:

    O = {o_1, o_2, …, o_{N+M}} = {(x^e_1, y^e_1, v^e_1, θ^e_1), …, (x^e_{N+M}, y^e_{N+M}, v^e_{N+M}, θ^e_{N+M})}

where x^e_1 denotes the X-axis coordinate of the ego vehicle in the world coordinate system at step 1, y^e_1 its Y-axis coordinate at step 1, v^e_1 its velocity along the X axis of the world coordinate system at step 1, and θ^e_1 its steering angle in the world coordinate system at step 1; likewise x^e_{N+M}, y^e_{N+M}, v^e_{N+M} and θ^e_{N+M} denote the corresponding quantities at step N+M. The superscript marks ego-vehicle features, distinguishing them from the features of the VRU.
Overall, since the invention involves many feature types in both the observation and prediction phases, to express the conditional probability distributions more simply, F = {f_1, f_2, …, f_{N+M}} denotes the sequence of all features in the data set, where f_i denotes all known observed features and predicted features corresponding to step i.
Step 3, an off-line training stage, which specifically comprises:
and step 31, calculating a semantic vector according to the first N steps of VRU image frame sequence, the first N steps of VRU motion track sequence and the first N steps of self-driving odometer sequence in the training data set.
Specifically, step 31 includes:
Step 311: the first type of data, i.e. the preprocessed first-N-step VRU image-frame sequence B = {b_1, b_2, …, b_N}, is taken as input and encoded by the visual feature encoder (Visual-Encoder) to obtain a visual feature vector sequence. Specifically, the Visual-Encoder comprises a convolutional neural network (CNN) and a visual timing encoding network LSTM_vis (an LSTM recurrent neural network). The CNN converts the image signals into the visual feature vector sequence Box_Feature = {bf_1, bf_2, …, bf_N}; LSTM_vis then computes the visual timing features of this sequence, and the hidden state of its last step is output as the visual timing feature vector.

For example, as shown in FIG. 2, VGG16 is a convolutional neural network commonly used for image processing, supporting an input size of (224, 224, 3). The original model consists of 5 blocks followed by 3 fully connected layers, each block comprising 2-4 convolutional layers and 1 pooling layer. The invention uses only the 5 blocks and the first 2 fully connected layers of VGG16 (with 4096 and 4096 neurons, respectively); the VRU image-frame sequence B = {b_1, b_2, …, b_N} is passed through VGG16 to compute the visual feature vector sequence Box_Feature = {bf_1, bf_2, …, bf_N} of dimension 4096, where bf_N denotes the visual feature vector of the N-th VRU image frame.

LSTM_vis has a cell dimension of 512; at each computation step its input is a 4096-dimensional visual feature vector (encoded by VGG16) and its output is a 512-dimensional hidden state. As shown in equations (1) and (2), the hidden state of the last step (t = N) is taken as the visual timing feature vector c_vis:

    h_vis(t) = LSTM_vis(h_vis(t-1), bf_t)    (1)
    c_vis = h_vis(N)    (2)

where h_vis(t) denotes the hidden state output by the visual timing encoding network LSTM_vis at step t, h_vis(t-1) its hidden state at step t-1, and bf_t the VRU visual feature vector at step t.
Step 312: the second type of data, i.e. the preprocessed first-N-step VRU motion-trajectory sequence, is encoded by the VRU motion-trajectory timing feature encoder LSTM_traj (an LSTM recurrent neural network) to extract the timing features of the sequence. LSTM_traj has an output dimension of 64; at each computation step its input is the 2-dimensional position vector p_t of the trajectory and its output is a 64-dimensional hidden state. As shown in equations (3) and (4), the hidden state of the last step (t = N) is taken as the VRU motion-trajectory timing feature vector c_traj:

    h_traj(t) = LSTM_traj(h_traj(t-1), p_t)    (3)
    c_traj = h_traj(N)    (4)

where h_traj(t) denotes the hidden state output by LSTM_traj at step t, h_traj(t-1) its hidden state at step t-1, and p_t the VRU trajectory position vector at step t.
Step 313: the third type of data, the first-N-step ego-vehicle driving odometer sequence representing the driving features, is encoded by the ego-vehicle driving timing feature encoder LSTM_odom (an LSTM recurrent neural network) to extract the timing features of the sequence. LSTM_odom has an output dimension of 64; at each computation step its input is the 4-dimensional odometer vector o_t and its output is a 64-dimensional hidden state. As shown in equations (5) and (6), the hidden state of the last step (t = N) is taken as the ego-vehicle driving timing feature vector c_odom:

    h_odom(t) = LSTM_odom(h_odom(t-1), o_t)    (5)
    c_odom = h_odom(N)    (6)

where h_odom(t) denotes the hidden state output by LSTM_odom at step t, h_odom(t-1) its hidden state at step t-1, and o_t the ego-vehicle odometer vector at step t.
Step 314: the semantic vector is obtained from the visual timing feature vector, the VRU motion-trajectory timing feature vector and the ego-vehicle driving timing feature vector. Concatenating c_vis, c_traj and c_odom (of dimensions 512, 64 and 64, respectively), obtained by encoding the observed part of the training data, yields the semantic vector C shown in equation (7). C has dimension 640 and jointly represents the VRU visual timing features, the VRU motion-trajectory features and the ego-vehicle driving odometer features of the first N steps:

    C = concat(c_vis, c_traj, c_odom)    (7)
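For illustration, the encoder pipeline of steps 311 to 314 can be sketched in PyTorch as below. This is a sketch under stated assumptions: the use of torchvision's VGG16, the batch-first tensor shapes and all class and variable names are illustrative, not taken from the patent.

```python
import torch
import torch.nn as nn
import torchvision

class SemanticEncoder(nn.Module):
    """Steps 311-314: truncated VGG16 + LSTM_vis over image crops, LSTM_traj
    over 2-D positions, LSTM_odom over 4-D odometry; the three final hidden
    states are concatenated into the 640-D semantic vector C."""
    def __init__(self):
        super().__init__()
        vgg = torchvision.models.vgg16()
        # 5 conv blocks plus the first two fully connected layers (4096, 4096)
        self.cnn = nn.Sequential(vgg.features, vgg.avgpool, nn.Flatten(),
                                 *list(vgg.classifier.children())[:5])
        self.lstm_vis = nn.LSTM(4096, 512, batch_first=True)
        self.lstm_traj = nn.LSTM(2, 64, batch_first=True)
        self.lstm_odom = nn.LSTM(4, 64, batch_first=True)

    def forward(self, frames, traj, odom):
        # frames: (B, N, 3, 224, 224); traj: (B, N, 2); odom: (B, N, 4)
        B, N = frames.shape[:2]
        bf = self.cnn(frames.flatten(0, 1)).view(B, N, 4096)  # Box_Feature
        _, (h_vis, _) = self.lstm_vis(bf)      # last hidden, eq. (1)-(2)
        _, (h_traj, _) = self.lstm_traj(traj)  # eq. (3)-(4)
        _, (h_odom, _) = self.lstm_odom(odom)  # eq. (5)-(6)
        return torch.cat([h_vis[-1], h_traj[-1], h_odom[-1]], dim=-1)  # 640-D C
```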
It is again emphasized that, in the offline training stage, the ground-truth odometer values of the M steps after the observation window are also used as input in the subsequent prediction; in the online testing stage these ground-truth values are replaced by the M-step odometer planned by the decision module under the selected driving strategy.
Step 32: the VRU behavior features are predicted from the first-N-step VRU image-frame sequence in the training data set; they comprise the head orientation angle of the VRU and the vehicle-type probability vector.
In the world reference frame, the ego-vehicle position at the first data step can be chosen as the origin, with the heading direction of the vehicle as the X direction, the direction perpendicular to the heading as the Y direction, and the heading direction as 0°; angles increase clockwise, with values in [0, 2π).
The specific method of step 32 for predicting the head orientation angle of the VRU from the first-N-step VRU image-frame sequence comprises the following steps:

Step 321: following the MultiBin algorithm, the head-orientation range s ∈ [0, 2π) is divided equally into several bins; for example, [0, 2π) may be divided into 16 bins, each covering 22.5°.
Step 322: taking the visual feature vector sequence obtained in step 311 as input, the bin containing the VRU head orientation angle is predicted with a classification method, and the deviation of the angle from the bin's center angle is predicted with a regression method.
For example: the probability vector conf_i over the bins containing the VRU head orientation angle is computed with fully connected layer FC1 and a Softmax normalization function, and the sine and cosine of the offset angle within the bin are obtained with fully connected layer FC2 and L2 regularization:

    conf_i = Softmax(FullyConnected(bf_i))
    (sin Δs_i, cos Δs_i) = NormalizationL2(FullyConnected(bf_i))

where bf_i denotes the visual feature vector of the VRU image frame at step i, FullyConnected(·) denotes fully connected layer FC1 or FC2, Softmax(·) denotes the Softmax normalization function, and NormalizationL2(·) denotes L2 normalization.
Step 323: the VRU head orientation angle is computed by adding the regressed in-bin offset to the center angle of the bin predicted in step 322.
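A minimal sketch of these MultiBin-style heads follows, assuming PyTorch; the layer names mirror FC1/FC2 above, while the bin-center convention (k + 0.5)·width and the single shared (sin, cos) offset are assumptions not fixed by the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_BINS = 16  # 360° / 22.5° per bin, as in step 321

class HeadOrientationHead(nn.Module):
    """Predicts the bin of the VRU head angle (classification, FC1) and
    the in-bin offset as a unit (sin, cos) pair (regression, FC2)."""
    def __init__(self, feat_dim=4096):
        super().__init__()
        self.fc1 = nn.Linear(feat_dim, NUM_BINS)   # bin confidences
        self.fc2 = nn.Linear(feat_dim, 2)          # (sin, cos) offset

    def forward(self, bf):                         # bf: (B, feat_dim)
        conf = F.softmax(self.fc1(bf), dim=-1)     # probability per bin
        sincos = F.normalize(self.fc2(bf), dim=-1) # L2-normalize to unit circle
        return conf, sincos

def decode_angle(conf, sincos, bin_width=2 * torch.pi / NUM_BINS):
    """Step 323: center angle of the most probable bin plus the offset."""
    k = conf.argmax(dim=-1)
    center = (k.float() + 0.5) * bin_width         # assumed bin-center convention
    offset = torch.atan2(sincos[..., 0], sincos[..., 1])
    return (center + offset) % (2 * torch.pi)
```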
In one embodiment, predicting the vehicle-type probability vector of the VRU behavior features from the first-N-step VRU image-frame sequence in step 32 specifically comprises:

Step 325: the vehicle-type probability vector is predicted from bf_i with a classification method. For example, the probability vector over vehicle types is computed with fully connected layer FC3 and a Softmax normalization function:

    conf_T(i) = Softmax(FullyConnected(bf_i))

where FullyConnected(·) denotes fully connected layer FC3.
Step 33: according to the semantic vector, the subsequent-M-step VRU motion-trajectory sequence and the subsequent-M-step ego-vehicle driving odometer sequence, the prior and posterior behavior-pattern distributions of the VRU are generated by continuous iterative computation with a behavior-pattern prediction network (Behavior Predictor), and the VRU motion trajectory is predicted by continuous iterative computation with a trajectory prediction network (Trajectory Predictor).
As shown in FIG. 3, the behavior-pattern prediction network and the trajectory prediction network together constitute a conditional variational auto-encoder (CVAE), representing the dynamically changing, interaction-affected VRU behavior pattern by latent codes that can be generated continuously. The behavior-pattern distribution prediction network comprises a prior network (Prior Behavior Predictor) and a posterior network (Posterior Behavior Predictor). In the offline training stage, the posterior network generates the VRU behavior-pattern distribution that is fed into the trajectory prediction network to participate in iterative prediction, and the prior network is trained so that its output matches that of the posterior network as closely as possible. In the online testing stage, the posterior network plays no role, and only the behavior-pattern distribution generated by the prior network participates in iterative prediction.
Generating the prior and posterior behavior-pattern distributions of the VRU by continuous iterative computation with the behavior-pattern prediction network, according to the semantic vector, the subsequent-M-step VRU motion trajectory and the subsequent-M-step ego-vehicle driving odometer sequence, specifically comprises the following steps:
step 331, set two LSTM recurrent neural networks: prior behavior pattern distribution prediction network LSTMpriorAnd posterior behavioral pattern distribution prediction network LSTMpost. The two networks are identical in that the unit dimension is 64, and the semantic vector C is calculated by the multilayer perceptron MLP to be used as an initial hidden state (the initial subscript is set to start from the last observation time t ═ N). The two networks differ in that the LSTM is used when the distribution of the behavior pattern of N +1 ≦ t ≦ N + M in the predicted tth steppostCoding the time sequence characteristics of the time interval of 1 ≤ i ≤ t, and generating posterior distribution, LSTM, of the t-th walking as modepriorAnd coding the time sequence characteristics of the time period with the time interval of more than or equal to 1 and less than t, and generating the prior distribution of the t-th walking as the mode. The former is used only for the offline training phase, the latter for the offline training and online testing phases, respectively.
After being processed by the multilayer perceptron, the semantic vector serves as the initial hidden state of both the prior behavior-pattern distribution prediction network LSTM_prior and the posterior network LSTM_post, as shown in equation (8):

    h_prior(N) = h_post(N) = MLP(C)    (8)

where h_prior(N) and h_post(N) denote the initial hidden states of LSTM_prior and LSTM_post (both networks start at global time t = N, hence the subscript N), MLP(·) denotes the multilayer perceptron, and C denotes the semantic vector.
Step 332: the subsequent-M-step VRU motion trajectory and the subsequent-M-step ego-vehicle odometer sequence are split into M steps in temporal order and fed to the behavior-pattern distribution prediction networks step by step; at each step the networks update their hidden states from the input. At step t, the prior network LSTM_prior receives the data of step t-1, while the posterior network LSTM_post receives the data of step t, as shown in equations (9) and (10):

    h_post(t) = LSTM_post(h_post(t-1), (p_t, o_t))    (9)
    h_prior(t) = LSTM_prior(h_prior(t-1), (p_{t-1}, o_{t-1}))    (10)

where h_post(t) and h_prior(t) denote the hidden states of the posterior and prior behavior-pattern distribution prediction networks at step t, LSTM_post(·) and LSTM_prior(·) denote the two networks, p_t denotes the coordinate position of the VRU at step t, and o_t denotes the ego-vehicle driving odometer at step t.
Step 333: the prior and posterior behavior-pattern distributions can be set as one-dimensional Gaussian distributions (but are not limited thereto), whose means and variances are predicted by the model. The hidden states of the two behavior-pattern prediction networks from step N+1 to N+M are processed by the multilayer perceptron to predict and generate the prior and posterior behavior-pattern distributions of the VRU at the corresponding steps, as shown in equations (11) and (12):

    (μ_post(t), σ_post(t)) = MLP(h_post(t))    (11)
    (μ_prior(t), σ_prior(t)) = MLP(h_prior(t))    (12)

where μ_post(t) and σ_post(t) denote the mean and standard deviation of the posterior behavior-pattern distribution at step t, μ_prior(t) and σ_prior(t) denote those of the prior distribution at step t, and MLP(·) denotes the multilayer perceptron.
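A sketch of one behavior-pattern distribution network follows; the prior and posterior share this structure but receive differently shifted inputs, cf. equations (9) and (10). The hidden sizes follow the text, while the tanh initialization and the softplus parameterization of the standard deviation are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BehaviorPatternNet(nn.Module):
    """An LSTM cell over (p_t, o_t) inputs whose hidden state is mapped
    to the mean and std of a 1-D Gaussian over the latent code z_t."""
    def __init__(self, sem_dim=640, in_dim=6, hidden=64):  # in: 2-D p + 4-D o
        super().__init__()
        self.init_mlp = nn.Linear(sem_dim, hidden)  # semantic vector -> h(N)
        self.cell = nn.LSTMCell(in_dim, hidden)
        self.head = nn.Linear(hidden, 2)            # -> (mu, raw_sigma)

    def init_state(self, C):
        h = torch.tanh(self.init_mlp(C))            # eq. (8), assumed activation
        return h, torch.zeros_like(h)

    def step(self, x, state):
        h, c = self.cell(x, state)                  # eq. (9) / (10)
        mu, raw_sigma = self.head(h).chunk(2, dim=-1)
        return mu, F.softplus(raw_sigma), (h, c)    # positive std, eq. (11)/(12)
```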
In the offline training stage, the posterior behavior-pattern distribution is fed to the trajectory prediction network as the behavior-pattern distribution participating in trajectory prediction. In online testing, however, the data required to compute the posterior distribution cannot be obtained in real time, so the prior distribution is used as the behavior-pattern distribution participating in trajectory prediction; accordingly, in the offline training stage the prior behavior-pattern prediction network is trained, using only the data available online, to fit the behavior of the posterior network.
Predicting the VRU motion trajectory by continuous iterative computation with the trajectory prediction network (Trajectory Predictor), according to the semantic vector, the subsequent-M-step VRU motion trajectory and the subsequent-M-step ego-vehicle driving odometer sequence, specifically comprises the following steps:
steps 331 to 333 are the same as those provided in the above embodiment.
Step 334: an LSTM recurrent neural network LSTM_traj with a unit dimension of 64 is set up to predict the trajectory. After being processed by the multilayer perceptron, the semantic vector serves as the initial hidden state of the trajectory prediction network LSTM_traj, as shown in equation (13):

    h_traj(N) = MLP(C)    (13)

where h_traj(N) denotes the initial hidden state of the trajectory prediction network (the network starts at global time t = N, hence the subscript N).
Step 335: the posterior behavior-pattern distribution is used as the behavior-pattern distribution P(z_t | f_{≤t}) participating in trajectory prediction, and the behavior vector z_t is sampled from it with the reparameterization method as the input of the trajectory prediction network LSTM_traj at the corresponding step, as shown in equation (14):

    z_t ~ N(μ_post(t), σ_post(t)²)    (14)

where N(·,·) denotes the normal distribution determined by the mean and standard-deviation parameters.
Step 336: the trajectory prediction network takes the behavior vector as input and updates its own hidden state, as shown in equation (15):

    h_traj(t) = LSTM_traj(h_traj(t-1), z_t)    (15)

where h_traj(t) denotes the hidden state of the trajectory prediction network at step t and LSTM_traj(·) denotes the trajectory prediction network.
Step 337: the hidden states of the trajectory prediction network from step N+1 to N+M are processed by the multilayer perceptron to predict the VRU trajectory distribution; the prediction result is set as a two-dimensional Gaussian distribution so that prediction uncertainty can be quantified.

Specifically, at each step the input of the LSTM network is a behavior-pattern sample value (a vector of dimension 1) and the output is a 64-dimensional hidden-state vector, which the MLP maps to a 5-dimensional trajectory prediction distribution: a 2-dimensional mean, a 2-dimensional standard deviation and a 1-dimensional correlation coefficient, as shown in equations (16) to (18):

    (μ_x(t), μ_y(t), σ_x(t), σ_y(t), ρ(t)) = MLP(h_traj(t))    (16)
    μ(t) = (μ_x(t), μ_y(t)),  Σ(t) = [ σ_x(t)², ρ(t)σ_x(t)σ_y(t) ; ρ(t)σ_x(t)σ_y(t), σ_y(t)² ]    (17)
    p̂_t ~ N(μ(t), Σ(t))    (18)
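A matching sketch of the trajectory prediction network of steps 334 to 337, with the reparameterized sample of equation (14) as input and the 5-parameter bivariate Gaussian of equations (16) to (18) as output; the softplus and tanh squashings of the standard deviations and correlation coefficient are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TrajectoryDecoder(nn.Module):
    """Consumes a reparameterized behavior sample z_t and emits a bivariate
    Gaussian over the step-t trajectory point (mean, std, correlation)."""
    def __init__(self, sem_dim=640, z_dim=1, hidden=64):
        super().__init__()
        self.init_mlp = nn.Linear(sem_dim, hidden)      # C -> h_traj(N), eq. (13)
        self.cell = nn.LSTMCell(z_dim, hidden)
        self.head = nn.Linear(hidden, 5)                # eq. (16)

    def init_state(self, C):
        h = torch.tanh(self.init_mlp(C))
        return h, torch.zeros_like(h)

    def step(self, mu_z, sigma_z, state):
        z = mu_z + sigma_z * torch.randn_like(sigma_z)  # reparameterization, eq. (14)
        h, c = self.cell(z, state)                      # eq. (15)
        out = self.head(h)
        mean = out[..., :2]
        std = F.softplus(out[..., 2:4])                 # positive standard deviations
        rho = torch.tanh(out[..., 4:5])                 # correlation kept in (-1, 1)
        return mean, std, rho, (h, c)
```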
Step 34: the behavior-pattern objective function Loss_Behavior is calculated from the prior and posterior behavior-pattern distributions output in step 33. As shown in equation (19), Loss_Behavior uses as objective the KL divergence between the posterior and prior distributions computed by LSTM_post and LSTM_prior at the same step; the KL divergence measures the similarity between two distributions:

    Loss_Behavior = Σ_{t=N+1}^{N+M} D_KL( P_post(z_t | f_{≤t}) || P_prior(z_t | f_{<t}) )    (19)

where D_KL denotes the KL divergence (also called relative entropy), f_{≤t} denotes all known observed features and predicted features up to step t, P(·) denotes a probability distribution, and || is the separator between the two distributions in the KL-divergence notation.
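Since both distributions are one-dimensional Gaussians (step 333), the KL term in equation (19) can be evaluated in closed form; this is a standard identity, not spelled out in the patent:

    D_KL( N(μ_post, σ_post²) || N(μ_prior, σ_prior²) )
        = log(σ_prior / σ_post) + ( σ_post² + (μ_post − μ_prior)² ) / ( 2 σ_prior² ) − 1/2

with μ_post(t), σ_post(t) the posterior and μ_prior(t), σ_prior(t) the prior parameters at step t.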
Step 35: the trajectory-prediction objective function Loss_Trajectory is calculated from the VRU motion trajectory predicted in step 33 and the ground-truth VRU motion trajectory of the subsequent M steps. As shown in equation (20), Loss_Trajectory uses maximum-likelihood estimation as the objective, approximated with L = 20 samples:

    Loss_Trajectory = − Σ_{t=N+1}^{N+M} (1/L) Σ_{l=1}^{L} log P( p_t | z_t^(l), f_{<t} )    (20)

where z_t^(l) denotes the l-th sample of the posterior behavior-pattern distribution at step t and | denotes the conditioning symbol of the conditional probability distribution.
Step 36: the behavior-feature objective function Loss_Feature is calculated from the first-N-step VRU behavior features output in step 32 and the first-N-step VRU behavior features in the training data set. As shown in equation (21), Loss_Feature consists of the head-orientation objective L_orientation and the vehicle-classification objective L_transportation:

    Loss_Feature = L_orientation + L_transportation    (21)

with, following the MultiBin formulation of step 32,

    L_orientation = Σ_{t=1}^{N} [ CrossEntropy(conf_t, θ_t) − cos( s_t − mean(θ_t) − Δŝ_t ) ]
    L_transportation = CrossEntropy(conf_T, T)

where s_t denotes the true head orientation angle of the VRU at step t, Δŝ_t denotes the predicted in-bin offset angle at step t, conf_t denotes the predicted probability vector over the head-orientation bins at step t, θ_t denotes the one-hot vector of the bin containing the true value at step t, mean(θ_t) denotes the center angle of the bin containing the true orientation angle, and CrossEntropy(·) denotes the cross-entropy function.
As shown in equation (22), the behavior-pattern, trajectory-prediction and behavior-feature objective functions constitute the loss function L, which the Adam optimizer minimizes over the trainable networks through back-propagation:

    L = α × Loss_Feature + β × Loss_Behavior + Loss_Trajectory    (22)

where α and β are hyper-parameters whose values can be determined by enumerating candidates and comparing experimental results.
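A minimal sketch of the resulting training step, assuming the three objectives above are returned by a combined model; the learning rate is an illustrative value not given in the patent.

```python
import torch

def training_step(model, optimizer, batch, alpha, beta):
    """One supervised update of step 37: the weighted sum of the three
    objectives (eq. (22)) is minimized by Adam via back-propagation."""
    loss_feature, loss_behavior, loss_trajectory = model(batch)
    loss = alpha * loss_feature + beta * loss_behavior + loss_trajectory
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# usage sketch: optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```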
Step 37: supervised learning of the behavior-pattern, trajectory-prediction and behavior-feature objective functions is carried out through back-propagation, yielding a VRU motion-trajectory prediction model that accepts a planned ego-vehicle driving odometer sequence as input.
As shown in fig. 4, step 4, the online trajectory prediction stage specifically includes:
and step 41, calculating a semantic vector according to the VRU image frame sequence in the first N steps, the VRU motion track in the first N steps and the self-driving odometer sequence in the first N steps acquired on line. Specifically, three types of sequences in online test data are coded by using a visual characteristic encoder, a VRU motion track time sequence characteristic encoder and a self-driving characteristic time sequence characteristic encoder which are trained in an off-line stage to form a time sequence characteristic vector cvis,ctraj,codemAnd splicing to obtain a semantic vector C as shown in formulas (23) to (26).
cvis=Visual_Encoder(B) (23)
ctraj=Trajectory_Encoder(P) (24)
codom=Odometry_Encoder(O) (25)
C=concat(cvis,ctraj,codom) (26)
Step 42: according to the semantic vector output in step 41 and the subsequent-M-step ego-vehicle driving odometer sequence generated under the driving strategy selected by the decision module, the VRU motion-trajectory distribution of the future M steps is predicted by continuous iterative computation with the VRU motion-trajectory prediction model obtained in step 37; the prediction result provides a reference for the subsequent driving decisions of the intelligent vehicle.
In general, the intelligent vehicle has several candidate driving strategies in the scene described by a given piece of observation data; for example, when driving up to an intersection to prepare a right turn and finding a pedestrian about to cross the street, the candidate strategies include, but are not limited to, braking sharply, creeping slowly, and passing at constant speed. Because human-vehicle interaction in traffic scenes is often highly dynamic, different driving strategies may produce different human-vehicle interactions within the subsequent prediction window, and hence different changes in the VRU behavior pattern and different trajectories.
In reality the VRU motion trajectory is influenced by many factors, but in an autonomous-driving scene the ego vehicle is the only controllable individual, and its influence on VRU behavior acts mainly through changes in the human-vehicle interaction between the VRU and the ego vehicle. The trajectory prediction of the invention is therefore developed under the assumption that the driving strategy influences the human-vehicle interaction and thereby the VRU trajectory.
To let the prediction network support changes of the VRU behavior pattern within the prediction window, and to strengthen the causal link between the predicted VRU trajectory and human-vehicle interaction so as to improve the accuracy and usability of prediction, the driving strategy that shapes the interaction must be fed to the prediction network in some form. Two considerations govern this form: it must be an input form the model supports (the strategy-related variable used in training is the odometer), and the VRU, as a third party, can judge the human-vehicle interaction only from observable variables (so the strategy must enter the network in a form observable by the VRU). The odometer is therefore chosen to represent the driving strategy. During online prediction, the decision module inputs each candidate driving strategy to the model as the VRU-observable ego-vehicle driving odometer sequence, and the model then models the human-vehicle interaction under the selected strategy and predicts the behavior pattern and trajectory of the VRU.
Let D denote the selected driving strategy, and let the odometer O^D = {o^D_{N+1}, …, o^D_{N+M}} denote the planned, VRU-observable driving features generated under D; its format is consistent with the ego-vehicle driving odometer in the data set. Step 42 specifically comprises:
Step 421: after the semantic vector is processed by the multilayer perceptron, it serves as the initial hidden state of the prior behavior-pattern distribution prediction network LSTM_prior and of the trajectory prediction network LSTM_traj.
Step 422: according to the driving strategy D specified by the intelligent-vehicle decision module, the corresponding odometer O^D is generated by planning and used as input to the VRU motion-trajectory prediction model.
Step 423: as shown in FIG. 5, with the prediction horizon set to M steps, for the prediction of step t (N+1 ≤ t ≤ N+M) the prior behavior-pattern distribution prediction network LSTM_prior takes the generated ego-vehicle driving odometer sequence and the trajectory point predicted in the preceding step (for t = N+1, the ground-truth trajectory point of step N) as input and computes the current prior behavior-pattern distribution P(z_t | f_{<t}).
Step 424: a sample is drawn from this distribution with the reparameterization method, and the trajectory prediction network LSTM_traj predicts the trajectory-point distribution of step t from the sampled behavior vector.
Step 425, as shown in equation (27), take the two-dimensional mean of the trajectory point distribution, denoted $\hat{p}_t$, as the input of LSTM_prior in the next iteration, and perform the behavior pattern distribution and trajectory prediction of step t+1 in an autoregressive manner:

$$\hat{p}_t = \hat{\mu}_{traj}(t), \qquad p_t \sim \mathcal{N}\big(\hat{\mu}_{traj}(t), \hat{\Sigma}_{traj}(t)\big) \tag{27}$$

where $\hat{\mu}_{traj}(t)$ denotes the mean of the predicted trajectory distribution at step t, and $\hat{\Sigma}_{traj}(t)$ denotes the covariance matrix of the predicted trajectory distribution at step t.
Step 426, integrate the VRU trajectory points predicted at each step into a sequence to obtain the predicted VRU motion trajectory distribution under the corresponding driving strategy D; each predicted step is represented by a two-dimensional Gaussian distribution, and the uncertainty of the prediction result is quantified by the covariance matrix of that distribution. The above is the result of a single execution. Keeping the semantic vector obtained in step 41 and the generated self-vehicle odometer O_D unchanged, the substeps of step 42 may be executed multiple times; owing to the randomness of the resampling method, a variety of feasible VRU motion trajectory distributions can be generated, as shown in equation (29) (a code sketch of this loop follows the equation):
$$P\big(\hat{p}_{N+1:N+M} \mid f_{1:N}, O_D\big) \tag{29}$$

where P(·) denotes a conditional probability distribution, $\hat{p}_{N+1:N+M}$ denotes the predicted VRU trajectory from step N+1 to step N+M, and $f_{1:N}$ denotes all known observed features and the features generated by the prediction from step 1 to step N.
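To make the autoregressive loop of steps 421-426 concrete, the following is a minimal PyTorch-style sketch. The module names (OnlinePredictor, init_prior, and so on), all tensor sizes, and the odometer format of one (x, y, yaw, v) row per step are illustrative assumptions rather than the patent's actual implementation, and the covariance head is reduced to raw parameters for brevity.

```python
import torch
import torch.nn as nn

# Assumed sizes: behavior vector, hidden state, odometer row (x, y, yaw, v), 2D point, semantic vector.
Z_DIM, H_DIM, ODO_DIM, PT_DIM, C_DIM = 16, 64, 4, 2, 128

class OnlinePredictor(nn.Module):
    """Sketch of step 42: autoregressive VRU rollout under a planned odometer O_D."""
    def __init__(self):
        super().__init__()
        self.init_prior = nn.Linear(C_DIM, H_DIM)     # step 421: C -> initial hidden states
        self.init_traj = nn.Linear(C_DIM, H_DIM)
        self.lstm_prior = nn.LSTMCell(PT_DIM + ODO_DIM, H_DIM)
        self.lstm_traj = nn.LSTMCell(Z_DIM, H_DIM)
        self.to_prior = nn.Linear(H_DIM, 2 * Z_DIM)   # -> (mu_prior, log_sigma_prior)
        self.to_traj = nn.Linear(H_DIM, 5)            # -> 2D mean + 3 covariance parameters

    def forward(self, C, p_N, odo_D):
        # C: (1, C_DIM) semantic vector; p_N: (2,) last observed point; odo_D: (M, ODO_DIM).
        h_p = self.init_prior(C); c_p = torch.zeros_like(h_p)
        h_t = self.init_traj(C);  c_t = torch.zeros_like(h_t)
        p_prev, traj = p_N, []
        for o_t in odo_D:                                       # steps 423-425
            inp = torch.cat([p_prev, o_t]).unsqueeze(0)
            h_p, c_p = self.lstm_prior(inp, (h_p, c_p))
            mu_z, log_sig = self.to_prior(h_p).chunk(2, dim=-1)
            z = mu_z + log_sig.exp() * torch.randn_like(mu_z)   # step 424: reparameterized sample
            h_t, c_t = self.lstm_traj(z, (h_t, c_t))
            out = self.to_traj(h_t).squeeze(0)
            mu_xy, cov_raw = out[:2], out[2:]                   # per-step 2D Gaussian parameters
            traj.append((mu_xy, cov_raw))                       # step 426: collect distributions
            p_prev = mu_xy                                      # eq. (27): feed the mean back
        return traj
```

Calling forward repeatedly with the same C and odo_D draws fresh behavior vectors z each time, so the set of rollouts approximates the strategy-conditioned distribution of equation (29).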
Fig. 6 illustrates an application example of the present invention; for simplicity of description, the example contains a single VRU target. Consider an intersection without a dedicated right-turn signal light, where the vehicle plans to turn right while a pedestrian intends to cross the road.
If a conventional multi-trajectory prediction method is used, all possible actions of the pedestrian are predicted from the observed known information alone, as shown in panel a of fig. 6.
If the method provided by the present invention is used, the planned subsequent path and behavior of the vehicle are input in addition to the observed known information, and the predictions associate different driving strategies with different VRU trajectories, for example:
when the vehicle keeps turning to the right, the pedestrian will tend to wait on site or try to go forward and turn back in a behavior pattern, as shown by b in fig. 6.
When the vehicle slows down to a stop, the pedestrian will tend to cross the intersection directly, as shown in panel c of fig. 6.
When the vehicle sounds its horn as a warning and keeps turning right at speed, the pedestrian will tend to wait in place without attempting to cross the road, as shown in panel d of fig. 6.
The prediction can therefore reflect the outcome under a given driving strategy more accurately, and the upper-layer decision system can evaluate and select strategies more precisely.
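As a hedged illustration of how an upper-layer decision system might consume these strategy-conditioned predictions, the sketch below scores each candidate strategy by a crude proximity risk between the predicted VRU means and the planned ego path. The OnlinePredictor from the earlier sketch, the dictionary inputs, and the risk metric are all assumptions, not part of the patent.

```python
import torch

def evaluate_strategies(predictor, C, p_N, strategies, ego_paths, n_samples=10):
    """Rank candidate driving strategies by predicted VRU proximity risk (illustrative).

    strategies: name -> planned odometer sequence O_D, shape (M, 4)
    ego_paths:  name -> planned ego (x, y) positions, shape (M, 2)
    """
    scores = {}
    for name, odo_D in strategies.items():
        risks = []
        for _ in range(n_samples):                         # eq. (29): multiple rollouts
            traj = predictor(C, p_N, odo_D)
            mus = torch.stack([mu for mu, _ in traj])      # predicted VRU means, (M, 2)
            d_min = (mus - ego_paths[name]).norm(dim=-1).min()
            risks.append((1.0 / (d_min + 1e-3)).item())    # closer approach -> higher risk
        scores[name] = sum(risks) / n_samples
    best = min(scores, key=scores.get)                      # lowest-risk strategy
    return best, scores
```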
Finally, it should be pointed out that the above examples are intended only to illustrate the technical solutions of the present invention, not to limit them. Those of ordinary skill in the art will understand that modifications may be made to the technical solutions described in the foregoing embodiments, or some technical features may be equivalently replaced, and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (7)

1. A method for predicting the trajectory of a vulnerable road user (VRU) in a vehicle driving environment, comprising:
step 1, establishing a VRU data set which is divided into a training data set and a testing data set;
step 2, preprocessing various data in the VRU data set, and transmitting the various data to the step 3 and the step 4;
step 3, an off-line training stage, which specifically comprises:
step 31, calculating a semantic vector from the first-N-step VRU image frame sequence, the first-N-step VRU motion trajectory sequence and the first-N-step self-vehicle driving odometer sequence in the training data set;
step 32, predicting VRU behavior features from the first-N-step VRU image frame sequence in the training data set, the VRU behavior features comprising the head orientation angle of the VRU and a vehicle-type probability vector;
step 33, predicting and generating the prior behavior pattern distribution and the posterior behavior pattern distribution of the VRU by continuous iterative computation using the behavior pattern prediction networks, according to the semantic vector, the subsequent-M-step VRU motion trajectory sequence and the subsequent-M-step self-vehicle driving odometer sequence, and predicting the VRU motion trajectory by continuous iterative computation using the trajectory prediction network;
step 34, calculating a behavior pattern objective function from the prior and posterior behavior pattern distributions output in step 33;
step 35, calculating a trajectory prediction objective function from the VRU motion trajectory predicted in step 33 and the ground-truth VRU motion trajectory of the subsequent M steps;
step 36, calculating a behavior feature objective function from the first-N-step VRU behavior features output in step 32 and the first-N-step VRU behavior features in the training data set;
step 37, performing supervised learning on the behavior pattern objective function, the trajectory prediction objective function and the behavior feature objective function through back propagation, to obtain a VRU motion trajectory prediction model that supports input of a planned self-vehicle driving odometer sequence;
step 4, an online trajectory prediction stage, which specifically comprises:
step 41, calculating the semantic vector from the first-N-step VRU image frame sequence, the first-N-step VRU motion trajectory and the first-N-step self-vehicle driving odometer sequence obtained online;
and step 42, predicting the VRU motion trajectory distribution of the future M steps by continuous iterative computation using the VRU motion trajectory prediction model obtained in step 37, according to the semantic vector output in step 41 and the subsequent-M-step self-vehicle driving odometer sequence generated under the driving strategy selected by the decision module.
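Steps 33-37 of claim 1 train three objectives jointly. The patent names the behavior pattern, trajectory prediction and behavior feature objective functions but (in this excerpt) not their exact forms, so the sketch below assumes the standard CVAE-style choices: a KL divergence from posterior to prior, a Gaussian negative log-likelihood on the trajectory, and a mean-squared behavior feature loss, with assumed weights.

```python
import torch
import torch.distributions as D

def training_loss(prior_params, post_params, traj_params, traj_gt,
                  behavior_pred, behavior_gt, w_kl=1.0, w_feat=1.0):
    """Sketch of steps 34-37: joint objective for back-propagation (assumed loss forms)."""
    kl = nll = 0.0
    for (mu_pr, sig_pr), (mu_po, sig_po), (mu_xy, cov), p_gt in zip(
            prior_params, post_params, traj_params, traj_gt):
        # Step 34: behavior pattern objective, per-step KL(posterior || prior).
        kl = kl + D.kl_divergence(D.Normal(mu_po, sig_po),
                                  D.Normal(mu_pr, sig_pr)).sum()
        # Step 35: trajectory objective, NLL of the ground truth under the 2D Gaussian.
        nll = nll - D.MultivariateNormal(mu_xy, covariance_matrix=cov).log_prob(p_gt).sum()
    # Step 36: behavior feature objective (assumed mean-squared error).
    feat = torch.nn.functional.mse_loss(behavior_pred, behavior_gt)
    # Step 37: single scalar minimized by back-propagation.
    return nll + w_kl * kl + w_feat * feat
```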
2. The method according to claim 1, wherein the step 31 comprises:
step 311, taking the first-N-step VRU image frame sequence as input and obtaining a visual feature vector sequence through encoding by a convolutional neural network and an LSTM recurrent neural network: after the image signals are converted into a visual feature vector sequence by the convolutional neural network, the temporal features of the sequence are computed by the LSTM recurrent neural network, and the hidden state output at the last step of the LSTM recurrent neural network is selected as the visual time-sequence feature vector;
step 312, selecting the first-N-step VRU motion trajectory sequence as the motion trajectory, extracting the temporal features of the sequence using a VRU motion trajectory temporal feature encoder composed of an LSTM recurrent neural network, and using the hidden state output at the last step of the encoder as the VRU motion trajectory time-sequence feature vector;
step 313, selecting the first-N-step self-vehicle driving odometer sequence to represent the driving features, extracting the temporal features of the sequence using a self-vehicle driving feature temporal encoder composed of an LSTM recurrent neural network, and using the hidden state output at the last step of the encoder as the self-vehicle driving time-sequence feature vector;
and step 314, obtaining the semantic vector by concatenating the visual time-sequence feature vector, the VRU motion trajectory time-sequence feature vector and the self-vehicle driving time-sequence feature vector.
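A minimal sketch of the three temporal encoders and the concatenation in claim 2 follows, assuming a toy CNN backbone, LSTM sizes and input shapes; all names and dimensions are illustrative, not the patent's.

```python
import torch
import torch.nn as nn

class SemanticEncoder(nn.Module):
    """Claim 2 sketch: encode frames, VRU track and ego odometer, then concatenate."""
    def __init__(self, img_feat=64, hid=32, pt_dim=2, odo_dim=4):
        super().__init__()
        # Step 311: a per-frame CNN, then an LSTM over the frame features.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(), nn.Linear(16 * 16, img_feat))
        self.visual_lstm = nn.LSTM(img_feat, hid, batch_first=True)
        # Steps 312 / 313: trajectory and odometer temporal encoders.
        self.traj_lstm = nn.LSTM(pt_dim, hid, batch_first=True)
        self.odo_lstm = nn.LSTM(odo_dim, hid, batch_first=True)

    def forward(self, frames, track, odo):
        # frames: (B, N, 3, H, W); track: (B, N, 2); odo: (B, N, 4)
        B, N = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(B, N, -1)
        _, (h_vis, _) = self.visual_lstm(feats)   # last hidden state = visual feature
        _, (h_trk, _) = self.traj_lstm(track)
        _, (h_odo, _) = self.odo_lstm(odo)
        # Step 314: concatenate the three last hidden states into the semantic vector C.
        return torch.cat([h_vis[-1], h_trk[-1], h_odo[-1]], dim=-1)  # (B, 3*hid)
```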
3. The method for predicting the trajectory of a vulnerable road user in a vehicle driving environment according to claim 1, wherein predicting the head orientation angle of the VRU among the VRU behavior features from the first-N-step VRU image frame sequence in step 32 specifically comprises:
step 321, according to the MultiBin algorithm, equally dividing the VRU head orientation angle range s ∈ [0, 2π] into several intervals;
step 322, from the visual time-sequence feature vector obtained in step 311, calculating the probability vector of the interval containing the VRU head orientation angle using the fully connected layer FC1 and the Softmax normalization function, and obtaining the sine and cosine of the offset angle within the interval using the fully connected layer FC2 and L2 regularization:

$$c_{bin} = \mathrm{Softmax}\big(\mathrm{FullyConnected}(b_{f_i})\big)$$
$$\big(\sin\Delta\theta, \cos\Delta\theta\big) = \mathrm{Normalization}_{L2}\big(\mathrm{FullyConnected}(b_{f_i})\big)$$

where $b_{f_i}$ denotes the visual feature vector of the VRU image frame at step i, FullyConnected(·) denotes the fully connected layer FC1 or FC2, Softmax(·) denotes the Softmax normalization function, and $\mathrm{Normalization}_{L2}$(·) denotes L2 regularization;
step 323, calculating the VRU head orientation angle using the MultiBin algorithm.
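A hedged sketch of the MultiBin decoding in steps 321-323: pick the most probable interval, then add the in-interval offset recovered from the predicted sine and cosine. The bin count and layer sizes below are assumptions.

```python
import torch
import torch.nn as nn

N_BINS, FEAT = 8, 96                       # assumed interval count and feature size

fc1 = nn.Linear(FEAT, N_BINS)              # interval classification head (FC1)
fc2 = nn.Linear(FEAT, 2 * N_BINS)          # per-interval (sin, cos) offsets (FC2)

def head_orientation(b_f: torch.Tensor) -> torch.Tensor:
    """Decode the head orientation angle from a visual feature vector b_f of shape (B, FEAT)."""
    probs = torch.softmax(fc1(b_f), dim=-1)              # step 322: interval probabilities
    sin_cos = fc2(b_f).view(-1, N_BINS, 2)
    sin_cos = nn.functional.normalize(sin_cos, dim=-1)   # L2 regularization to a unit vector
    k = probs.argmax(dim=-1)                             # most probable interval
    sc = sin_cos[torch.arange(len(k)), k]                # its (sin, cos) offset
    bin_center = (k.float() + 0.5) * (2 * torch.pi / N_BINS)
    offset = torch.atan2(sc[..., 0], sc[..., 1])         # step 323: in-interval offset
    return (bin_center + offset) % (2 * torch.pi)

# usage: angle = head_orientation(torch.randn(1, FEAT))
```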
4. The method for predicting the trajectory of a vulnerable road user in a vehicle driving environment according to claim 3, wherein predicting the vehicle-type probability vector among the VRU behavior features from the first-N-step VRU image frame sequence in step 32 specifically comprises:
step 325, from $b_{f_i}$, calculating the vehicle-type probability vector using the fully connected layer FC3 and the Softmax normalization function:

$$c_{veh} = \mathrm{Softmax}\big(\mathrm{FullyConnected}(b_{f_i})\big)$$

where FullyConnected(·) here denotes the fully connected layer FC3.
5. The method for predicting the trajectory of a vulnerable road user in a vehicle driving environment according to any one of claims 1 to 4, wherein predicting and generating the prior and posterior behavior pattern distributions of the VRU by continuous iterative computation using the behavior pattern prediction networks, according to the semantic vector, the subsequent-M-step VRU motion trajectory and the subsequent-M-step self-vehicle driving odometer sequence in step 33, specifically comprises:
step 331, passing the semantic vector through a multilayer perceptron (MLP) and using the result as the initial hidden states of the prior behavior pattern distribution prediction network LSTM_prior and the posterior behavior pattern distribution prediction network LSTM_post:

$$h^{prior}_N = \mathrm{MLP}(C), \qquad h^{post}_N = \mathrm{MLP}(C)$$

where $h^{prior}_N$ denotes the initial hidden state of the prior behavior pattern distribution prediction network, $h^{post}_N$ denotes the initial hidden state of the posterior behavior pattern distribution prediction network, MLP(·) denotes a multilayer perceptron network, and C denotes the semantic vector;
step 332, dividing the subsequent-M-step VRU motion trajectory and the subsequent-M-step self-vehicle odometer sequence into M steps in time order as the successive inputs of the behavior pattern distribution prediction networks, and updating the hidden state of each network with its input at every step, where at step t the prior behavior pattern distribution prediction network LSTM_prior takes the data of time t-1 as input and the posterior behavior pattern distribution prediction network LSTM_post takes the data of time t as input:

$$h^{prior}_t = \mathrm{LSTM}_{prior}\big(h^{prior}_{t-1}, [p_{t-1}, o_{t-1}]\big)$$
$$h^{post}_t = \mathrm{LSTM}_{post}\big(h^{post}_{t-1}, [p_t, o_t]\big)$$

where $h^{post}_t$ denotes the hidden state of the posterior behavior pattern distribution prediction network at step t, $h^{prior}_t$ denotes the hidden state of the prior behavior pattern distribution prediction network at step t, LSTM_post(·) denotes the posterior behavior pattern distribution prediction network, LSTM_prior(·) denotes the prior behavior pattern distribution prediction network, $p_t$ denotes the coordinate position of the VRU at step t, and $o_t$ denotes the self-vehicle driving odometer at step t;
step 333, passing the hidden states of the two behavior pattern prediction networks from step N+1 to step N+M through a multilayer perceptron to predict and generate the prior and posterior behavior pattern distributions of the VRU at the corresponding times:

$$\big[\mu_{post}(t), \sigma_{post}(t)\big] = \mathrm{MLP}\big(h^{post}_t\big)$$
$$\big[\mu_{prior}(t), \sigma_{prior}(t)\big] = \mathrm{MLP}\big(h^{prior}_t\big)$$

where μ_post(t) denotes the mean and σ_post(t) the standard deviation of the posterior behavior pattern distribution at step t, MLP(·) denotes a multilayer perceptron network, and μ_prior(t) denotes the mean and σ_prior(t) the standard deviation of the prior behavior pattern distribution at step t.
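A hedged sketch of claim 5's paired recurrent networks follows: the prior cell sees data up to step t-1, the posterior cell also sees step t, and both map their hidden states to Gaussian parameters. Dimensions and names are assumptions.

```python
import torch
import torch.nn as nn

class BehaviorPatternNets(nn.Module):
    """Claim 5 sketch: prior/posterior behavior pattern distributions over M steps."""
    def __init__(self, c_dim=96, hid=64, z_dim=16, in_dim=6):  # in_dim = |p_t| + |o_t|
        super().__init__()
        self.init_mlp = nn.Linear(c_dim, 2 * hid)        # step 331: C -> initial hidden states
        self.prior_cell = nn.LSTMCell(in_dim, hid)
        self.post_cell = nn.LSTMCell(in_dim, hid)
        self.prior_head = nn.Linear(hid, 2 * z_dim)      # step 333: h -> (mu, log_sigma)
        self.post_head = nn.Linear(hid, 2 * z_dim)

    def forward(self, C, x):
        # C: (B, c_dim); x: (M+1, B, in_dim), each row the concatenated [p_t, o_t].
        h_pr, h_po = self.init_mlp(C).chunk(2, dim=-1)
        c_pr, c_po = torch.zeros_like(h_pr), torch.zeros_like(h_po)
        dists = []
        for t in range(1, x.shape[0]):
            # Step 332: prior consumes step t-1, posterior consumes step t.
            h_pr, c_pr = self.prior_cell(x[t - 1], (h_pr, c_pr))
            h_po, c_po = self.post_cell(x[t], (h_po, c_po))
            mu_pr, ls_pr = self.prior_head(h_pr).chunk(2, dim=-1)
            mu_po, ls_po = self.post_head(h_po).chunk(2, dim=-1)
            dists.append(((mu_pr, ls_pr.exp()), (mu_po, ls_po.exp())))
        return dists  # per-step (prior, posterior) Gaussian parameters
```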
6. The method for predicting the trajectory of a vulnerable road user in a vehicle driving environment according to any one of claims 1 to 4, wherein predicting the VRU motion trajectory by continuous iterative computation using the trajectory prediction network, according to the semantic vector, the subsequent-M-step VRU motion trajectory and the subsequent-M-step self-vehicle driving odometer sequence in step 33, specifically comprises:
step 331, passing the semantic vector through a multilayer perceptron and using the result as the initial hidden states of the prior behavior pattern distribution prediction network LSTM_prior and the posterior behavior pattern distribution prediction network LSTM_post:

$$h^{prior}_N = \mathrm{MLP}(C), \qquad h^{post}_N = \mathrm{MLP}(C)$$
step 332, dividing the subsequent-M-step VRU motion trajectory and the subsequent-M-step self-vehicle odometer sequence into M steps in time order as the successive inputs of the behavior pattern distribution prediction networks, and updating the hidden state of each network with its input at every step, where at step t the prior behavior pattern distribution prediction network LSTM_prior takes the data of time t-1 as input and the posterior behavior pattern distribution prediction network LSTM_post takes the data of time t as input:

$$h^{prior}_t = \mathrm{LSTM}_{prior}\big(h^{prior}_{t-1}, [p_{t-1}, o_{t-1}]\big)$$
$$h^{post}_t = \mathrm{LSTM}_{post}\big(h^{post}_{t-1}, [p_t, o_t]\big)$$
step 333, passing the hidden states of the two behavior pattern prediction networks from step N+1 to step N+M through a multilayer perceptron to predict and generate the prior and posterior behavior pattern distributions of the VRU at the corresponding times:

$$\big[\mu_{post}(t), \sigma_{post}(t)\big] = \mathrm{MLP}\big(h^{post}_t\big), \qquad \big[\mu_{prior}(t), \sigma_{prior}(t)\big] = \mathrm{MLP}\big(h^{prior}_t\big)$$
step 334, passing the semantic vector through a multilayer perceptron and using the result as the initial hidden state of the trajectory prediction network LSTM_traj:

$$h^{traj}_N = \mathrm{MLP}(C)$$

where $h^{traj}_N$ denotes the initial hidden state of the trajectory prediction network;
step 335, using the posterior behavior pattern distribution as the behavior pattern distribution P(z_t | f_{1:t}) participating in trajectory prediction, and sampling a behavior vector z_t from this distribution with the resampling method as the input of the trajectory prediction network LSTM_traj at the corresponding time:

$$z_t \sim \mathcal{N}\big(\mu_{post}(t), \sigma_{post}(t)\big)$$

where $\mathcal{N}(\mu, \sigma)$ denotes the normal distribution determined by the mean and standard deviation parameters;
step 336, the trajectory prediction network taking the behavior vector as input and updating its own hidden state through computation:

$$h^{traj}_t = \mathrm{LSTM}_{traj}\big(h^{traj}_{t-1}, z_t\big)$$

where $h^{traj}_t$ denotes the hidden state of the trajectory prediction network at step t, and LSTM_traj(·) denotes the trajectory prediction network;
and step 337, passing the hidden states of the trajectory prediction network from step N+1 to step N+M through a multilayer perceptron to predict the VRU trajectory distribution, the prediction at each step being set as a two-dimensional Gaussian distribution in order to quantify prediction uncertainty.
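A hedged sketch of claim 6's steps 334-337 follows: sample z_t from the posterior with the reparameterization trick, advance the trajectory LSTM, and emit per-step 2D Gaussian parameters. Sizes, names and the covariance parameterization are assumptions.

```python
import torch
import torch.nn as nn

class TrajectoryDecoder(nn.Module):
    """Claim 6 sketch: behavior-vector-driven trajectory distribution decoder."""
    def __init__(self, c_dim=96, hid=64, z_dim=16):
        super().__init__()
        self.init_mlp = nn.Linear(c_dim, hid)   # step 334: C -> initial hidden state
        self.cell = nn.LSTMCell(z_dim, hid)
        self.head = nn.Linear(hid, 5)           # step 337: mean (2) + covariance params (3)

    def forward(self, C, posterior_params):
        # posterior_params: per-step (mu_z, sigma_z) pairs, e.g. from BehaviorPatternNets.
        h = self.init_mlp(C); c = torch.zeros_like(h)
        steps = []
        for mu_z, sigma_z in posterior_params:
            z = mu_z + sigma_z * torch.randn_like(mu_z)  # step 335: reparameterized sample
            h, c = self.cell(z, (h, c))                  # step 336: update hidden state
            out = self.head(h)
            mu_xy = out[..., :2]                          # 2D mean of the Gaussian
            log_sx, log_sy = out[..., 2], out[..., 3]
            rho = torch.tanh(out[..., 4])                 # correlation kept in (-1, 1)
            steps.append((mu_xy, log_sx.exp(), log_sy.exp(), rho))
        return steps  # per-step 2D Gaussian: mean, std_x, std_y, correlation
```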
7. The method according to any one of claims 1 to 4, wherein step 42 comprises:
step 421, passing the semantic vector through a multilayer perceptron and using the result as the initial hidden states of the prior behavior pattern distribution prediction network LSTM_prior and the trajectory prediction network LSTM_traj;
step 422, according to the driving strategy D specified by the smart-car decision module, planning and generating the corresponding odometer O_D as the input of the VRU motion trajectory prediction model;
step 423, setting the prediction horizon to M steps, and for the prediction at step t, computing the current prior behavior pattern distribution with the prior behavior pattern distribution prediction network LSTM_prior, which takes as input the generated self-vehicle driving odometer sequence and the trajectory points predicted in the preceding steps:

$$P(z_t) = \mathcal{N}\big(\mu_{prior}(t), \sigma_{prior}(t)\big)$$
step 424, sampling from this distribution with the reparameterization method, and predicting the trajectory point distribution of step t with the trajectory prediction network LSTM_traj based on the sampled behavior vector;
step 425, selecting the two-dimensional mean of the trajectory point distribution as the input of LSTM_prior in the next iteration, and performing the behavior pattern distribution and trajectory prediction of step t+1 in an autoregressive manner:

$$\hat{p}_t = \hat{\mu}_{traj}(t), \qquad p_t \sim \mathcal{N}\big(\hat{\mu}_{traj}(t), \hat{\Sigma}_{traj}(t)\big)$$

where $\hat{\mu}_{traj}(t)$ denotes the mean of the predicted trajectory distribution at step t, and $\hat{\Sigma}_{traj}(t)$ denotes the covariance matrix of the predicted trajectory distribution at step t;
and step 426, integrating the VRU trajectory point distributions predicted at each step into a sequence to obtain the predicted VRU motion trajectory distribution under the corresponding driving strategy D, each predicted coordinate point being represented by a two-dimensional Gaussian distribution whose covariance matrix quantifies the uncertainty of the prediction result; the above being the result of one execution, the substeps of step 42 may be executed multiple times while keeping the semantic vector obtained in step 41 and the generated self-vehicle odometer O_D unchanged, and owing to the randomness of the resampling method, a variety of feasible VRU motion trajectory distributions can be generated:

$$P\big(\hat{p}_{N+1:N+M} \mid f_{1:N}, O_D\big)$$

where P(·) denotes a conditional probability distribution, $\hat{p}_{N+1:N+M}$ denotes the predicted VRU trajectory from step N+1 to step N+M, and $f_{1:N}$ denotes all known observed features and the features generated by the prediction from step 1 to step N.
CN202110069140.4A 2021-01-19 2021-01-19 Trajectory prediction method for vulnerable road users in vehicle driving environment Active CN112734808B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110069140.4A CN112734808B (en) 2021-01-19 2021-01-19 Trajectory prediction method for vulnerable road users in vehicle driving environment

Publications (2)

Publication Number Publication Date
CN112734808A true CN112734808A (en) 2021-04-30
CN112734808B CN112734808B (en) 2022-10-14

Family

ID=75592423

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110069140.4A Active CN112734808B (en) 2021-01-19 2021-01-19 Trajectory prediction method for vulnerable road users in vehicle driving environment

Country Status (1)

Country Link
CN (1) CN112734808B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635793A (en) * 2019-01-31 2019-04-16 南京邮电大学 A kind of unmanned pedestrian track prediction technique based on convolutional neural networks
CN110415266A (en) * 2019-07-19 2019-11-05 东南大学 A method of it is driven safely based on this vehicle surrounding vehicles trajectory predictions
CN110599521A (en) * 2019-09-05 2019-12-20 清华大学 Method for generating trajectory prediction model of vulnerable road user and prediction method

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113344069B (en) * 2021-05-31 2023-01-24 成都快眼科技有限公司 Image classification method for unsupervised visual representation learning based on multi-dimensional relation alignment
CN113344069A (en) * 2021-05-31 2021-09-03 成都快眼科技有限公司 Image classification method for unsupervised visual representation learning based on multi-dimensional relation alignment
CN113705431A (en) * 2021-08-26 2021-11-26 山东大学 Method and system for track instance level segmentation and multi-motion visual mileage measurement
CN113705431B (en) * 2021-08-26 2023-08-08 山东大学 Track instance level segmentation and multi-motion visual mileage measurement method and system
CN113902776A (en) * 2021-10-27 2022-01-07 北京易航远智科技有限公司 Target pedestrian trajectory prediction method and device, electronic equipment and storage medium
CN113902776B (en) * 2021-10-27 2022-05-17 北京易航远智科技有限公司 Target pedestrian trajectory prediction method and device, electronic equipment and storage medium
CN114418159A (en) * 2021-10-29 2022-04-29 中国科学院宁波材料技术与工程研究所 Method and system for predicting limb movement locus and prediction error thereof and electronic device
CN114067371A (en) * 2022-01-18 2022-02-18 之江实验室 Cross-modal pedestrian trajectory generation type prediction framework, method and device
CN114626598A (en) * 2022-03-08 2022-06-14 南京航空航天大学 Multi-modal trajectory prediction method based on semantic environment modeling
CN114821812B (en) * 2022-06-24 2022-09-13 西南石油大学 Deep learning-based skeleton point action recognition method for pattern skating players
CN114821812A (en) * 2022-06-24 2022-07-29 西南石油大学 Deep learning-based skeleton point action recognition method for pattern skating players
CN115923847A (en) * 2023-03-15 2023-04-07 安徽蔚来智驾科技有限公司 Preprocessing method and device for perception information of automatic driving vehicle and vehicle
CN115923847B (en) * 2023-03-15 2023-06-02 安徽蔚来智驾科技有限公司 Preprocessing method and device for perception information of automatic driving vehicle and vehicle

Also Published As

Publication number Publication date
CN112734808B (en) 2022-10-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant