CN115049009A - Track prediction method based on semantic fusion representation - Google Patents

Track prediction method based on semantic fusion representation

Info

Publication number
CN115049009A
CN115049009A (application CN202210707333.2A)
Authority
CN
China
Prior art keywords
track
sequence
encoder
semantic
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210707333.2A
Other languages
Chinese (zh)
Inventor
陈剑
陈钢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yangtze River Delta Information Intelligence Innovation Research Institute
Original Assignee
Yangtze River Delta Information Intelligence Innovation Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yangtze River Delta Information Intelligence Innovation Research Institute filed Critical Yangtze River Delta Information Intelligence Innovation Research Institute
Priority to CN202210707333.2A priority Critical patent/CN115049009A/en
Publication of CN115049009A publication Critical patent/CN115049009A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Software Systems (AREA)
  • Development Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Game Theory and Decision Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Traffic Control Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a track prediction method based on semantic fusion representation, which comprises the following steps: removing error points and redundant points with a track data preprocessing method, and dividing the track data with a sliding window to form labelled track sequences; constructing a multi-dimensional track fusion vector representation that combines the longitude-latitude, time, speed and direction information of the vehicle; learning the depth features of the track sequence with an auto-encoder and, together with the original features of the track sequence, constructing the semantic representation of the sequence; and applying a Transformer-based trajectory prediction method that learns the correlation between trajectories through a multi-head self-attention mechanism and a mask-based self-attention method, thereby realizing trajectory prediction. The method can better predict the position and regional distribution of a vehicle at future times and improve the accuracy of vehicle trajectory prediction, enabling a driver to change the departure time or the travel route in advance and avoid traffic congestion.

Description

Track prediction method based on semantic fusion representation
Technical Field
The invention relates to a track prediction method based on semantic fusion representation.
Background
With the vigorous development of the internet and internet of things technology, massive trajectory data is produced. The trajectory data includes traffic trajectory data, human activity trajectory data, and trajectory data of other movable objects. By mining the track data, the activity rule of the mobile object can be obtained.
However, traditional methods are overly simplistic and predict the position of a vehicle with low accuracy. Specifically, time-series models such as RNN and LSTM are widely used to predict vehicle trajectories, but current trajectory prediction models lack a semantic fusion representation of trajectory points and trajectory sequences, and traditional time-series models have difficulty capturing the correlation between trajectory points, so the overall prediction accuracy of such models is not high.
Therefore, to address the shortcoming that traditional methods neither mine semantic representations of multi-modal trajectory data nor capture the correlation information between trajectory points, the invention provides a trajectory prediction method based on semantic fusion representation.
Disclosure of Invention
The invention aims to provide a track prediction method based on semantic fusion representation, which can better predict the position and the regional distribution of an automobile at the future time, improve the accuracy of predicting the track of the automobile, enable a driver to change the travel time or the travel track in advance and avoid traffic jam.
In order to achieve the above object, the present invention provides a trajectory prediction method based on semantic fusion characterization, wherein the method comprises:
step 1: removing error points and redundant points by using a track data preprocessing method, and dividing track data by adopting a sliding window to form a track sequence with labels;
step 2: carrying out multi-dimensional track fusion vector representation by combining longitude and latitude information, time information, speed information and direction information of the vehicle;
and step 3: learning the depth characteristics of the track sequence by using an automatic encoder, and jointly constructing the semantic representation of the track sequence by combining the original characteristics of the track sequence;
step 4: the Transformer-based trajectory prediction method learns the correlation between trajectories through a multi-head self-attention mechanism and a mask-based self-attention method, thereby realizing the prediction of the trajectories.
Preferably, the method further comprises step 5: and verifying the track prediction model.
Preferably, step 1 comprises:
step 1.1: removing error points;
Set the time interval between track points p_i and p_j to Δt_1 and their spatial distance to Δd_1;
Set the time interval between track points p_j and p_k to Δt_2 and their spatial distance to Δd_2;
Set the upper limit of the vehicle driving speed on urban roads to V_max;
If the conditions Δd_1 > Δt_1 · V_max and Δd_2 > Δt_2 · V_max are satisfied, the track point p_j is judged to be an error point and should be removed;
step 1.2: removing redundant points;
Set the successive distances between track points p_i, p_j, p_k, ..., p_n to Δd_1, Δd_2, ..., Δd_n, and set the circle radius threshold R to 20 m; if the conditions Δd_1 < 2R, Δd_2 < 2R, ..., Δd_n < 2R are satisfied, the position of the vehicle is considered essentially unchanged during the period, and a track point p is used as the equivalent point of the redundant points in the region, where the longitude and latitude of p are the averages over the redundant region;
step 1.3: forming a track sequence;
Dividing the track based on a sliding window: a fixed-length sliding window is set at the beginning of the track, the last track point in the window is the position point to be predicted, and the remaining track points serve as training features, forming one training sample; the window is then slid forward one position at a time to form new training samples, which are added to the training feature sequence and the label sequence respectively; when the window reaches the last position of the track sequence, the track points are extracted and the division is finished; specifically, for a time window T_j and a track T = {p_1, p_2, ..., p_n}, the points p_1, p_2, ..., p_{j-1} are used as training features and p_j as the next-position label of the track.
Preferably, in step 2, the multi-modal semantic track is defined as Traj(o_i) = {p_1(o_i), p_2(o_i), ..., p_n(o_i)}, where p_n(o_i) denotes the nth position of object o_i, i.e. p_n(o_i) = {L_n, T_n, D_n, I_n}; here L_n denotes the longitude-latitude semantic information of object o_i at the nth position, T_n the temporal semantic information, D_n the vehicle speed information, and I_n the direction information of the vehicle;
The multi-dimensional track fusion vector characterization comprises the following steps:
Semantic representation of track longitude and latitude characteristics, which comprises representing an area formed in a certain longitude and latitude range by adopting a grid division method and is used for capturing semantic information of the longitude and latitude characteristics;
Assuming that the grid is divided into n segments, L_n can be expressed as an n × 1 dimensional vector; at the same time, a conversion matrix E_l of dimension D_l × n is designed to convert L_n into a D_l × 1 vector L_n^e; the formula is as follows:
L_n^e = E_l · L_n
Semantic representation of track time characteristics, which comprises representing time periods by adopting a grid division method and is used for capturing semantic information of time characteristics;
Taking the hour as the period of the grid division, one hour is divided into m segments, so T_n can be expressed as an m × 1 dimensional vector; at the same time, a conversion matrix E_t of dimension D_t × m is designed to convert T_n into a D_t × 1 vector T_n^e; the formula is as follows:
T_n^e = E_t · T_n
Semantic representation of vehicle speed information: corresponding to the speed information contained in the track, V discrete values of the vehicle speed are set; first, the speed information D_n is encoded as a V × 1 vector, where V denotes the number of discrete speed values in the current data set; then a transformation matrix E_d of dimension D_d × V converts D_n into a D_d × 1 vector D_n^e; the formula is as follows:
D_n^e = E_d · D_n
Semantic representation of vehicle direction information: corresponding to the direction information contained in the track, the number of discrete direction values is set to Q, so I_n is a Q × 1 vector; a transformation matrix E_i of dimension D_i × Q converts I_n into a D_i × 1 vector I_n^e; the formula is as follows:
I_n^e = E_i · I_n
Finally, the multi-dimensional track semantic p_n(o_i) is calculated by fusing the four semantic vectors L_n^e, T_n^e, D_n^e and I_n^e.
preferably, step 3 comprises:
step 3.1: coding;
An encoder-decoder auto-encoder is adopted to learn the depth features of the track sequence: the obtained multi-dimensional track semantic sequence is input to the encoder part of the auto-encoder, and the hidden layer is updated as follows:
h_i = f_encoder(h_{i-1}, b_i)
where f_encoder denotes the encoder function of the auto-encoder and b_i denotes the semantic input of the track point;
step 3.2: decoding;
The final hidden-layer output h_i of the encoder in step 3.1 represents the entire track sequence and serves as the initial hidden state of the decoder LSTM, which produces the output sequence c_1, c_2, ..., c_i; the hidden layer of the decoder is updated as follows:
h'_i = f_decoder(h'_{i-1}, c_{i-1})
where f_decoder denotes the decoder function of the auto-encoder and c_{i-1} denotes the previous output of the decoder;
step 3.3: the learning objective of the decoder is to reconstruct the input of the encoder with minimum error, for example with the squared reconstruction error between the encoder inputs b_i and the decoder outputs c_i:
L = Σ_i ||b_i − c_i||²
the output of the trained encoder can effectively represent the input data, i.e. the output information contains the depth characteristic information of the track sequence.
Preferably, step 4 comprises trajectory prediction model coding training and trajectory prediction model decoding training.
Preferably, in the trajectory prediction model coding training, the characterization of the trajectory sequence is assumed to be T = (T_1 + T_2 + ... + T_N), where N is the number of track points and T_i is the semantic representation of each track point; an </s> token is appended to the end of the input track sequence, the sequence length is set to F, and track sequences shorter than F are padded to length F; likewise, an <s> token is added at the start position of the output sequence and an </s> token at its end, and the length of the output prediction sequence of the track is set to M; the training comprises the following steps:
Firstly, during training a batch mode is adopted with batch size B, so the input track sequence has dimension (B, F); the dimensions of the track encoding and the position encoding are set to the same value E, so after track Embedding and Position Embedding the vector dimension is (B, F, E);
Secondly, the embedded vector is reshaped from (B, F, E) to (B×F, E) and used to construct the query, key and value matrices, which therefore have dimension (B×F, E); the query, key and value matrices are then reshaped from (B×F, E) back to (B, F, E); since a multi-head attention mechanism with N heads is adopted and N × H = E, the query, key and value are split into (B, F, N, H), and the attention score is calculated from query, key and value as follows:
score = softmax(Q · K^T / √H)
Finally, the attention vector is obtained by combining the attention scores with the value matrix;
After the attention vector is obtained, the result of dimension (B, F, E) is fed into a fully connected layer; the dimension of the intermediate hidden variable is set to D, and each position in the sequence is multiplied by a weight matrix (E, D) and then by another weight matrix (D, E), so the final matrix dimension is (B, F, E); after the remaining 5 encoder layers, the final encoder output is (B, F, E).
Preferably, in the trajectory prediction model decoding training, the decoder receives the input sequence (B, T) during training and applies track encoding and position encoding to it, where the track encoding and position encoding share the same weight matrices as in the Encoder; the training comprises the following steps:
Firstly, after the input sequence has been track-encoded and position-encoded, the resulting matrix dimension is (B, T, E), and it is encoded with a masked multi-head attention mechanism; the masked multi-head attention has the same structure as the self-attention network, but a mask matrix is added to the complete input sequence T during training in order to mask future information;
Secondly, after the masked multi-head attention, attention is computed between the information encoded by the decoder and the information encoded by the encoder: the key and value vectors are obtained from the encoder output, the query vectors are constructed from the decoder's internal sequence variables, and the attention operation is then performed; the decoder of the same structure is repeated 5 times to obtain the final embedded representation of the output sequence, of dimension (B, T, E);
Then, the (B, T, E) vector output by the decoder is mapped by a fully connected layer into the region space of the region table, becoming (B, T, Z), where Z is the size of the region table; the logits vector is followed by a softmax layer to obtain probabilities, the prediction sequence is obtained from the probabilities, the loss value between the prediction sequence and the target sequence is calculated with a cross-entropy loss function, and the parameter optimization process then starts.
Preferably, step 5 comprises:
Inputting the track sequence to be predicted into the encoder to obtain the encoder coding vector (F, E); then a single-element sequence containing only the start mark <s> is input into the decoder to predict the next track region T1; <s> and T1 are concatenated as the decoder input sequence and the next track region is predicted, and so on until </s> appears in the predicted sequence.
According to the above technical scheme, on the basis of analyzing the existing vehicle track data, the invention preprocesses the data, removes error points and redundant points from the track data to form track sequence data, then proposes a multi-modal semantic fusion representation method that fuses the position, time, speed and direction information of the vehicle, and finally adopts a Transformer-based trajectory prediction method to realize vehicle trajectory prediction.
Additional features and advantages of the invention will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a schematic diagram of an auto-encoder learning track depth feature in accordance with the present invention;
FIG. 2 is a flow chart of a transform-based trajectory prediction method according to the present invention.
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present invention, are given by way of illustration and explanation only, not limitation.
The invention provides a track prediction method based on semantic fusion representation, which comprises the following steps:
step 1: removing error points and redundant points by using a track data preprocessing method, and dividing track data by adopting a sliding window to form a track sequence with labels;
step 2: carrying out multi-dimensional track fusion vector representation by combining longitude and latitude information, time information, speed information and direction information of the vehicle;
and step 3: learning the depth characteristics of the track sequence by using an automatic encoder, and jointly constructing the semantic representation of the track sequence by combining the original characteristics of the track sequence;
step 4: the Transformer-based trajectory prediction method (as shown in fig. 2) learns the correlation between trajectories by using a multi-head self-attention mechanism and a mask-based self-attention method, thereby realizing the prediction of the trajectories.
Wherein, step 1 includes:
step 1.1: removing error points;
Set the time interval between track points p_i and p_j to Δt_1 and their spatial distance to Δd_1;
Set the time interval between track points p_j and p_k to Δt_2 and their spatial distance to Δd_2;
Set the upper limit of the vehicle driving speed on urban roads to V_max;
If the conditions Δd_1 > Δt_1 · V_max and Δd_2 > Δt_2 · V_max are satisfied, the track point p_j is judged to be an error point and should be removed;
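As a concrete, non-limiting illustration of the error-point rule above, the following Python sketch filters a track given as (latitude, longitude, timestamp) tuples; the function names, the haversine distance helper and the default speed limit V_max are assumptions made for illustration and are not prescribed by the method.

```python
import math

def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon, t) points."""
    R_EARTH = 6371000.0
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * R_EARTH * math.asin(math.sqrt(a))

def remove_error_points(track, v_max=33.3):
    """Drop p_j when both adjacent segments imply a speed above v_max (m/s),
    i.e. dd1 > dt1 * v_max and dd2 > dt2 * v_max."""
    if len(track) < 3:
        return list(track)
    cleaned = [track[0]]
    for j in range(1, len(track) - 1):
        p_i, p_j, p_k = cleaned[-1], track[j], track[j + 1]
        dt1, dt2 = p_j[2] - p_i[2], p_k[2] - p_j[2]
        dd1, dd2 = haversine_m(p_i, p_j), haversine_m(p_j, p_k)
        if dd1 > dt1 * v_max and dd2 > dt2 * v_max:
            continue                      # p_j judged to be an error point
        cleaned.append(p_j)
    cleaned.append(track[-1])
    return cleaned
```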
step 1.2: removing redundant points;
Set the successive distances between track points p_i, p_j, p_k, ..., p_n to Δd_1, Δd_2, ..., Δd_n, and set the circle radius threshold R to 20 m; if the conditions Δd_1 < 2R, Δd_2 < 2R, ..., Δd_n < 2R are satisfied, the position of the vehicle is considered essentially unchanged during the period, and a track point p is used as the equivalent point of the redundant points in the region, where the longitude and latitude of p are the averages over the redundant region;
step 1.3: forming a track sequence;
The research objective of the invention is to predict the next position of the track, and directly using the last position of the track as the label is not beneficial to the prediction result. Therefore, the method divides the track based on a sliding window: a fixed-length sliding window is set at the beginning of the track, the last track point in the window is the position point to be predicted, and the remaining track points serve as training features, forming one training sample; the window is then slid forward one position at a time to form new training samples, which are added to the training feature sequence and the label sequence respectively; when the window reaches the last position of the track sequence, the track points are extracted and the division is finished; specifically, for a time window T_j and a track T = {p_1, p_2, ..., p_n}, the points p_1, p_2, ..., p_{j-1} are used as training features and p_j as the next-position label of the track.
In step 2, the multi-modal semantic track is defined as Traj(o_i) = {p_1(o_i), p_2(o_i), ..., p_n(o_i)}, where p_n(o_i) denotes the nth position of object o_i, i.e. p_n(o_i) = {L_n, T_n, D_n, I_n}; here L_n denotes the longitude-latitude semantic information of object o_i at the nth position, T_n the temporal semantic information, D_n the vehicle speed information, and I_n the direction information of the vehicle;
The multi-dimensional track fusion vector characterization comprises the following steps:
Semantic representation of track longitude and latitude characteristics, which comprises representing an area formed in a certain longitude and latitude range by adopting a grid division method and is used for capturing semantic information of the longitude and latitude characteristics;
Assuming that the grid is divided into n segments, L_n can be expressed as an n × 1 dimensional vector; at the same time, a conversion matrix E_l of dimension D_l × n is designed to convert L_n into a D_l × 1 vector L_n^e; the formula is as follows:
L_n^e = E_l · L_n
Semantic representation of track time characteristics, which comprises representing time periods by adopting a grid division method and is used for capturing semantic information of time characteristics;
Taking the hour as the period of the grid division, one hour is divided into m segments, so T_n can be expressed as an m × 1 dimensional vector; at the same time, a conversion matrix E_t of dimension D_t × m is designed to convert T_n into a D_t × 1 vector T_n^e; the formula is as follows:
T_n^e = E_t · T_n
Semantic representation of vehicle speed information: corresponding to the speed information contained in the track, V discrete values of the vehicle speed are set; first, the speed information D_n is encoded as a V × 1 vector, where V denotes the number of discrete speed values in the current data set; then a transformation matrix E_d of dimension D_d × V converts D_n into a D_d × 1 vector D_n^e; the formula is as follows:
D_n^e = E_d · D_n
Semantic representation of vehicle direction information: corresponding to the direction information contained in the track, the number of discrete direction values is set to Q, so I_n is a Q × 1 vector; a transformation matrix E_i of dimension D_i × Q converts I_n into a D_i × 1 vector I_n^e; the formula is as follows:
I_n^e = E_i · I_n
Finally, the multi-dimensional track semantic p_n(o_i) is calculated by fusing the four semantic vectors L_n^e, T_n^e, D_n^e and I_n^e.
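The four conversions above can be sketched as embedding lookups, which are equivalent to multiplying one-hot vectors by the conversion matrices E_l, E_t, E_d and E_i; the module name, the embedding dimensions and the use of concatenation as the fusion operation are assumptions made for illustration, since they are not fixed by the description.

```python
import torch
import torch.nn as nn

class TrackPointEmbedding(nn.Module):
    """Fuse the grid-cell, time-slot, speed and direction indices of a track
    point into one semantic vector (here by concatenation)."""

    def __init__(self, n_cells, m_slots, v_speeds, q_dirs,
                 d_l=64, d_t=16, d_d=16, d_i=16):
        super().__init__()
        self.loc = nn.Embedding(n_cells, d_l)     # plays the role of E_l
        self.time = nn.Embedding(m_slots, d_t)    # plays the role of E_t
        self.speed = nn.Embedding(v_speeds, d_d)  # plays the role of E_d
        self.direc = nn.Embedding(q_dirs, d_i)    # plays the role of E_i

    def forward(self, L_n, T_n, D_n, I_n):
        # each argument is a LongTensor of discrete indices, shape (batch,)
        return torch.cat([self.loc(L_n), self.time(T_n),
                          self.speed(D_n), self.direc(I_n)], dim=-1)
```

With the sketched sizes, each track point is fused into a 64 + 16 + 16 + 16 = 112-dimensional vector.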
the step 3 comprises the following steps:
step 3.1: coding;
An encoder-decoder auto-encoder learns the depth features of the track sequence; the structure of the auto-encoder is shown in figure 1. The obtained multi-dimensional track semantic sequence is input to the encoder part of the auto-encoder, and the hidden layer is updated as follows:
h_i = f_encoder(h_{i-1}, b_i)
where f_encoder denotes the encoder function of the auto-encoder and b_i denotes the semantic input of the track point;
step 3.2: decoding;
The final hidden-layer output h_i of the encoder in step 3.1 represents the entire track sequence and serves as the initial hidden state of the decoder LSTM, which produces the output sequence c_1, c_2, ..., c_i; the hidden layer of the decoder is updated as follows:
h'_i = f_decoder(h'_{i-1}, c_{i-1})
where f_decoder denotes the decoder function of the auto-encoder and c_{i-1} denotes the previous output of the decoder;
step 3.3: the learning objective of the decoder is to reconstruct the input of the encoder with minimum error, for example with the squared reconstruction error between the encoder inputs b_i and the decoder outputs c_i:
L = Σ_i ||b_i − c_i||²
the output of the trained encoder can effectively represent the input data, i.e. the output information contains the depth characteristic information of the track sequence.
Step 4 includes trajectory prediction model coding training and trajectory prediction model decoding training. The model is divided into an encoder part and a decoder part, which receive different inputs during training.
Specifically, in the trajectory prediction model coding training, the characterization of the trajectory sequence is assumed to be T = (T_1 + T_2 + ... + T_N), where N is the number of track points and T_i is the semantic representation of each track point; an </s> token is appended to the end of the input track sequence, the sequence length is set to F, and track sequences shorter than F are padded to length F; likewise, an <s> token is added at the start position of the output sequence and an </s> token at its end, and the length of the output prediction sequence of the track is set to M; the training comprises the following steps:
Firstly, during training a batch mode is adopted with batch size B, so the input track sequence has dimension (B, F); the dimensions of the track encoding and the position encoding are set to the same value E, so after track Embedding and Position Embedding the vector dimension is (B, F, E);
Secondly, the embedded vector is reshaped from (B, F, E) to (B×F, E) and used to construct the query, key and value matrices, which therefore have dimension (B×F, E); the query, key and value matrices are then reshaped from (B×F, E) back to (B, F, E); since a multi-head attention mechanism with N heads is adopted and N × H = E, the query, key and value are split into (B, F, N, H), and the attention score is calculated from query, key and value as follows:
score = softmax(Q · K^T / √H)
Finally, the attention vector is obtained by combining the attention scores with the value matrix;
After the attention vector is obtained, the result of dimension (B, F, E) is fed into a fully connected layer; the dimension of the intermediate hidden variable is set to D, and each position in the sequence is multiplied by a weight matrix (E, D) and then by another weight matrix (D, E), so the final matrix dimension is (B, F, E); after the remaining 5 encoder layers, the final encoder output is (B, F, E).
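The head split and scaled dot-product score described above can be sketched as follows; the function signature, the weight matrices passed as plain tensors and the use of PyTorch operations are assumptions for illustration only.

```python
import math
import torch

def multi_head_self_attention(x, w_q, w_k, w_v, n_heads):
    """x: (B, F, E); w_q, w_k, w_v: (E, E) projection matrices; N * H = E."""
    B, F, E = x.shape
    H = E // n_heads
    q = (x @ w_q).view(B, F, n_heads, H).transpose(1, 2)   # (B, N, F, H)
    k = (x @ w_k).view(B, F, n_heads, H).transpose(1, 2)
    v = (x @ w_v).view(B, F, n_heads, H).transpose(1, 2)
    # attention score: softmax(Q · K^T / sqrt(H))
    score = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(H), dim=-1)
    out = score @ v                                         # (B, N, F, H)
    return out.transpose(1, 2).reshape(B, F, E)             # back to (B, F, E)
```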
In the trajectory prediction model decoding training, the decoder receives the input sequence (B, T) during training and applies track encoding and position encoding to it, where the track encoding and position encoding share the same weight matrices as in the Encoder; the training comprises the following steps:
Firstly, after the input sequence has been track-encoded and position-encoded, the resulting matrix dimension is (B, T, E), and it is encoded with a masked multi-head attention mechanism; the masked multi-head attention has the same structure as the self-attention network, but a mask matrix is added to the complete input sequence T during training in order to mask future information;
Secondly, after the masked multi-head attention, attention is computed between the information encoded by the decoder and the information encoded by the encoder: the key and value vectors are obtained from the encoder output, the query vectors are constructed from the decoder's internal sequence variables, and the attention operation is then performed; the decoder of the same structure is repeated 5 times to obtain the final embedded representation of the output sequence, of dimension (B, T, E);
Then, the (B, T, E) vector output by the decoder is mapped by a fully connected layer into the region space of the region table, becoming (B, T, Z), where Z is the size of the region table; the logits vector is followed by a softmax layer to obtain probabilities, the prediction sequence is obtained from the probabilities, the loss value between the prediction sequence and the target sequence is calculated with a cross-entropy loss function, and the parameter optimization process then starts.
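The future-information mask and the projection of the decoder output onto the region table with a cross-entropy loss can be sketched as follows; the mask construction, the assumed embedding size E, region-table size Z and the variable names are illustrative assumptions.

```python
import torch
import torch.nn as nn

def causal_mask(seq_len):
    """Boolean upper-triangular mask that hides future positions; it would be
    passed to the masked multi-head attention during training."""
    return torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

E, Z = 128, 4096                       # assumed embedding size and region count
to_regions = nn.Linear(E, Z)           # fully connected layer to the region table

def decoder_loss(dec_out, target_regions):
    """dec_out: (B, T, E) decoder output; target_regions: (B, T) region IDs."""
    logits = to_regions(dec_out)                        # (B, T, Z)
    return nn.functional.cross_entropy(
        logits.reshape(-1, Z), target_regions.reshape(-1))
```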
Further, the method also comprises step 5: verifying the track prediction model, namely: the track sequence to be predicted is input into the encoder to obtain the encoder coding vector (F, E); then a single-element sequence containing only the start mark <s> is input into the decoder to predict the next track region T1; <s> and T1 are concatenated as the decoder input sequence and the next track region is predicted, and so on until </s> appears in the predicted sequence.
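A greedy autoregressive decoding loop along the lines of step 5; the model interface (encode/decode methods), the token identifiers and the maximum length are hypothetical names used only to illustrate the procedure.

```python
import torch

def predict_trajectory(model, src_seq, bos_id, eos_id, max_len=20):
    """Greedy decoding: start from <s>, repeatedly append the most likely
    next region until </s> is produced or max_len is reached."""
    memory = model.encode(src_seq)               # encoder coding vector (F, E)
    out = [bos_id]                               # the single-element <s> sequence
    for _ in range(max_len):
        logits = model.decode(torch.tensor([out]), memory)   # (1, len(out), Z)
        next_region = int(logits[0, -1].argmax())
        if next_region == eos_id:
            break
        out.append(next_region)                  # splice the prediction onto the input
    return out[1:]                               # predicted region sequence
```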
The preferred embodiments of the present invention have been described in detail with reference to the accompanying drawings, however, the present invention is not limited to the specific details of the above embodiments, and various simple modifications can be made to the technical solution of the present invention within the technical idea of the present invention, and these simple modifications are within the protective scope of the present invention.
It should be noted that the various technical features described in the above embodiments can be combined in any suitable manner without contradiction, and the invention is not described in any way for the possible combinations in order to avoid unnecessary repetition.
In addition, any combination of the various embodiments of the present invention is also possible, and the same should be considered as the disclosure of the present invention as long as it does not depart from the spirit of the present invention.

Claims (9)

1. A trajectory prediction method based on semantic fusion characterization is characterized by comprising the following steps:
step 1: removing error points and redundant points by using a track data preprocessing method, and dividing track data by adopting a sliding window to form a track sequence with labels;
step 2: carrying out multi-dimensional track fusion vector representation by combining longitude and latitude information, time information, speed information and direction information of the vehicle;
and step 3: learning the depth characteristics of the track sequence by using an automatic encoder, and jointly constructing the semantic representation of the track sequence by combining the original characteristics of the track sequence;
step 4: the Transformer-based trajectory prediction method learns the correlation between trajectories through a multi-head self-attention mechanism and a mask-based self-attention method, thereby realizing the prediction of the trajectories.
2. The trajectory prediction method based on semantic fusion characterization according to claim 1, further comprising the step 5: and verifying the track prediction model.
3. The trajectory prediction method based on the semantic fusion characterization according to claim 1, wherein the step 1 comprises:
step 1.1: removing error points;
Set the time interval between track points p_i and p_j to Δt_1 and their spatial distance to Δd_1;
Set the time interval between track points p_j and p_k to Δt_2 and their spatial distance to Δd_2;
Set the upper limit of the vehicle driving speed on urban roads to V_max;
If the conditions Δd_1 > Δt_1 · V_max and Δd_2 > Δt_2 · V_max are satisfied, the track point p_j is judged to be an error point and should be removed;
step 1.2: removing redundant points;
Set the successive distances between track points p_i, p_j, p_k, ..., p_n to Δd_1, Δd_2, ..., Δd_n, and set the circle radius threshold R to 20 m; if the conditions Δd_1 < 2R, Δd_2 < 2R, ..., Δd_n < 2R are satisfied, the position of the vehicle is considered essentially unchanged during the period, and a track point p is used as the equivalent point of the redundant points in the region, where the longitude and latitude of p are the averages over the redundant region;
step 1.3: forming a track sequence;
Dividing the track based on a sliding window: a fixed-length sliding window is set at the beginning of the track, the last track point in the window is the position point to be predicted, and the remaining track points serve as training features, forming one training sample; the window is then slid forward one position at a time to form new training samples, which are added to the training feature sequence and the label sequence respectively; when the window reaches the last position of the track sequence, the track points are extracted and the division is finished; specifically, for a time window T_j and a track T = {p_1, p_2, ..., p_n}, the points p_1, p_2, ..., p_{j-1} are used as training features and p_j as the next-position label of the track.
4. The method for predicting trajectories based on semantic fusion characterization according to claim 1, wherein in step 2, the multi-modal semantic trajectory is defined as Traj(o_i) = {p_1(o_i), p_2(o_i), ..., p_n(o_i)}, where p_n(o_i) denotes the nth position of object o_i, i.e. p_n(o_i) = {L_n, T_n, D_n, I_n}; here L_n denotes the longitude-latitude semantic information of object o_i at the nth position, T_n the temporal semantic information, D_n the vehicle speed information, and I_n the direction information of the vehicle;
the multi-dimensional track fusion vector characterization comprises the following steps:
semantic representation of track longitude and latitude characteristics, which comprises representing an area formed in a certain longitude and latitude range by adopting a grid division method and is used for capturing semantic information of the longitude and latitude characteristics;
assuming that the grid is divided into n segments, L_n can be expressed as an n × 1 dimensional vector; at the same time, a conversion matrix E_l of dimension D_l × n is designed to convert L_n into a D_l × 1 vector L_n^e; the formula is as follows:
L_n^e = E_l · L_n
semantic representation of track time characteristics, which comprises representing time periods by adopting a grid division method and is used for capturing semantic information of time characteristics;
taking the hour as the period of the grid division, one hour is divided into m segments, so T_n can be expressed as an m × 1 dimensional vector; at the same time, a conversion matrix E_t of dimension D_t × m is designed to convert T_n into a D_t × 1 vector T_n^e; the formula is as follows:
T_n^e = E_t · T_n
semantic representation of vehicle speed information: corresponding to the speed information contained in the track, V discrete values of the vehicle speed are set; first, the speed information D_n is encoded as a V × 1 vector, where V denotes the number of discrete speed values in the current data set; then a transformation matrix E_d of dimension D_d × V converts D_n into a D_d × 1 vector D_n^e; the formula is as follows:
D_n^e = E_d · D_n
semantic representation of vehicle direction information: corresponding to the direction information contained in the track, the number of discrete direction values is set to Q, so I_n is a Q × 1 vector; a transformation matrix E_i of dimension D_i × Q converts I_n into a D_i × 1 vector I_n^e; the formula is as follows:
I_n^e = E_i · I_n
finally, the multi-dimensional track semantic p_n(o_i) is calculated by fusing the four semantic vectors L_n^e, T_n^e, D_n^e and I_n^e.
5. the trajectory prediction method based on semantic fusion characterization according to claim 1, wherein step 3 comprises:
step 3.1: coding;
An encoder-decoder auto-encoder is adopted to learn the depth features of the track sequence: the obtained multi-dimensional track semantic sequence is input to the encoder part of the auto-encoder, and the hidden layer is updated as follows:
h_i = f_encoder(h_{i-1}, b_i)
where f_encoder denotes the encoder function of the auto-encoder and b_i denotes the semantic input of the track point;
step 3.2: decoding;
The final hidden-layer output h_i of the encoder in step 3.1 represents the entire track sequence and serves as the initial hidden state of the decoder LSTM, which produces the output sequence c_1, c_2, ..., c_i; the hidden layer of the decoder is updated as follows:
h'_i = f_decoder(h'_{i-1}, c_{i-1})
where f_decoder denotes the decoder function of the auto-encoder and c_{i-1} denotes the previous output of the decoder;
step 3.3: the learning objective of the decoder is to reconstruct the input of the encoder with minimum error, for example with the squared reconstruction error between the encoder inputs b_i and the decoder outputs c_i:
L = Σ_i ||b_i − c_i||²
the output of the trained encoder can effectively represent the input data, i.e. the output information contains the depth characteristic information of the track sequence.
6. The trajectory prediction method based on semantic fusion characterization according to claim 1, wherein the step 4 comprises trajectory prediction model coding training and trajectory prediction model decoding training.
7. The trajectory prediction method based on semantic fusion characterization according to claim 6, wherein in the trajectory prediction model coding training, the characterization of the trajectory sequence is assumed to be T = (T_1 + T_2 + ... + T_N), where N is the number of track points and T_i is the semantic representation of each track point; an </s> token is appended to the end of the input track sequence, the sequence length is set to F, and track sequences shorter than F are padded to length F; likewise, an <s> token is added at the start position of the output sequence and an </s> token at its end, and the length of the output prediction sequence of the track is set to M; the training comprises the following steps:
firstly, during training a batch mode is adopted with batch size B, so the input track sequence has dimension (B, F); the dimensions of the track encoding and the position encoding are set to the same value E, so after track Embedding and Position Embedding the vector dimension is (B, F, E);
secondly, the embedded vector is reshaped from (B, F, E) to (B×F, E) and used to construct the query, key and value matrices, which therefore have dimension (B×F, E); the query, key and value matrices are then reshaped from (B×F, E) back to (B, F, E); since a multi-head attention mechanism with N heads is adopted and N × H = E, the query, key and value are split into (B, F, N, H), and the attention score is calculated from query, key and value as follows:
score = softmax(Q · K^T / √H)
finally, the attention vector is obtained by combining the attention scores with the value matrix;
after the attention vector is obtained, the result of dimension (B, F, E) is fed into a fully connected layer; the dimension of the intermediate hidden variable is set to D, and each position in the sequence is multiplied by a weight matrix (E, D) and then by another weight matrix (D, E), so the final matrix dimension is (B, F, E); after the remaining 5 encoder layers, the final encoder output is (B, F, E).
8. The trajectory prediction method based on semantic fusion characterization as claimed in claim 6, wherein in the trajectory prediction model decoding training, the decoder receives the input sequence (B, T) during training and applies track encoding and position encoding to it, where the track encoding and position encoding share the same weight matrices as in the Encoder; the training comprises the following steps:
firstly, after the input sequence has been track-encoded and position-encoded, the resulting matrix dimension is (B, T, E), and it is encoded with a masked multi-head attention mechanism; the masked multi-head attention has the same structure as the self-attention network, but a mask matrix is added to the complete input sequence T during training in order to mask future information;
secondly, after the masked multi-head attention, attention is computed between the information encoded by the decoder and the information encoded by the encoder: the key and value vectors are obtained from the encoder output, the query vectors are constructed from the decoder's internal sequence variables, and the attention operation is then performed; the decoder of the same structure is repeated 5 times to obtain the final embedded representation of the output sequence, of dimension (B, T, E);
then, the (B, T, E) vector output by the decoder is mapped by a fully connected layer into the region space of the region table, becoming (B, T, Z), where Z is the size of the region table; the logits vector is followed by a softmax layer to obtain probabilities, the prediction sequence is obtained from the probabilities, the loss value between the prediction sequence and the target sequence is calculated with a cross-entropy loss function, and the parameter optimization process then starts.
9. The trajectory prediction method based on semantic fusion characterization according to claim 2, wherein the step 5 comprises:
inputting the track sequence to be predicted into the encoder to obtain the encoder coding vector (F, E); then inputting a single-element sequence containing only the start mark <s> into the decoder to predict the next track region T1; splicing <s> and T1 as the decoder input sequence and continuing to predict the next track region, until </s> appears in the predicted sequence.
CN202210707333.2A 2022-06-21 2022-06-21 Track prediction method based on semantic fusion representation Pending CN115049009A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210707333.2A CN115049009A (en) 2022-06-21 2022-06-21 Track prediction method based on semantic fusion representation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210707333.2A CN115049009A (en) 2022-06-21 2022-06-21 Track prediction method based on semantic fusion representation

Publications (1)

Publication Number Publication Date
CN115049009A true CN115049009A (en) 2022-09-13

Family

ID=83163534

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210707333.2A Pending CN115049009A (en) 2022-06-21 2022-06-21 Track prediction method based on semantic fusion representation

Country Status (1)

Country Link
CN (1) CN115049009A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115222159A (en) * 2022-09-14 2022-10-21 中国电子科技集团公司第二十八研究所 Hot area identification method based on spatial domain relevancy
CN116304560A (en) * 2023-01-17 2023-06-23 北京信息科技大学 Track characterization model training method, characterization method and device based on multi-scale enhanced contrast learning
CN116304560B (en) * 2023-01-17 2023-11-24 北京信息科技大学 Track characterization model training method, characterization method and device
CN116558541A (en) * 2023-07-11 2023-08-08 新石器慧通(北京)科技有限公司 Model training method and device, and track prediction method and device
CN116558541B (en) * 2023-07-11 2023-09-22 新石器慧通(北京)科技有限公司 Model training method and device, and track prediction method and device
CN117475090A (en) * 2023-12-27 2024-01-30 粤港澳大湾区数字经济研究院(福田) Track generation model, track generation method, track generation device, terminal and medium

Similar Documents

Publication Publication Date Title
CN111400620B (en) User trajectory position prediction method based on space-time embedded Self-orientation
CN115049009A (en) Track prediction method based on semantic fusion representation
CN113487061A (en) Long-time-sequence traffic flow prediction method based on graph convolution-Informer model
CN112257850B (en) Vehicle track prediction method based on generation countermeasure network
CN110738370A (en) novel moving object destination prediction algorithm
CN115240425B (en) Traffic prediction method based on multi-scale space-time fusion graph network
CN111930110A (en) Intent track prediction method for generating confrontation network by combining society
CN113993172B (en) Ultra-dense network switching method based on user movement behavior prediction
CN114202120A (en) Urban traffic travel time prediction method aiming at multi-source heterogeneous data
CN114372570A (en) Multi-mode vehicle trajectory prediction method
CN113479187B (en) Layered different-step-length energy management method for plug-in hybrid electric vehicle
CN116071715A (en) Automatic driving automobile real-time semantic segmentation model construction method
CN117141518A (en) Vehicle track prediction method based on intention perception spatiotemporal attention network
CN116484217A (en) Intelligent decision method and system based on multi-mode pre-training large model
CN115293237A (en) Vehicle track prediction method based on deep learning
CN113554060B (en) LSTM neural network track prediction method integrating DTW
US20230038673A1 (en) Sequential pedestrian trajectory prediction using step attention for collision avoidance
CN116331259A (en) Vehicle multi-mode track prediction method based on semi-supervised model
CN112950933B (en) Traffic flow prediction method based on novel space-time characteristics
CN113408786B (en) Traffic characteristic prediction method and system
CN115018134A (en) Pedestrian trajectory prediction method based on three-scale spatiotemporal information
Chen et al. Traffic flow prediction based on cooperative vehicle infrastructure for cloud control platform
Liu et al. Modeling trajectories with multi-task learning
CN113191539B (en) High-density composite scene track prediction method based on heterogeneous graph aggregation network
CN114312831B (en) Vehicle track prediction method based on spatial attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination