CN115049009A - Track prediction method based on semantic fusion representation - Google Patents

Track prediction method based on semantic fusion representation

Info

Publication number
CN115049009A
CN115049009A (application CN202210707333.2A)
Authority
CN
China
Prior art keywords
track
sequence
encoder
semantic
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210707333.2A
Other languages
Chinese (zh)
Inventor
陈剑
陈钢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yangtze River Delta Information Intelligence Innovation Research Institute
Original Assignee
Yangtze River Delta Information Intelligence Innovation Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yangtze River Delta Information Intelligence Innovation Research Institute filed Critical Yangtze River Delta Information Intelligence Innovation Research Institute
Priority to CN202210707333.2A priority Critical patent/CN115049009A/en
Publication of CN115049009A publication Critical patent/CN115049009A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Software Systems (AREA)
  • Development Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Game Theory and Decision Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Traffic Control Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a track prediction method based on semantic fusion representation, which comprises the following steps: removing error points and redundant points with a track data preprocessing method, and dividing the track data with a sliding window to form labelled track sequences; constructing a multi-dimensional track fusion vector representation that combines the longitude-latitude, time, speed and direction information of the vehicle; learning the depth features of the track sequence with an auto-encoder and, together with the original features of the track sequence, constructing the semantic representation of the sequence; and applying a Transformer-based trajectory prediction method that learns the correlation between trajectories through a multi-head self-attention mechanism and a mask-based self-attention method, thereby realizing trajectory prediction. The method can better predict the position and regional distribution of a vehicle at future times and improve the accuracy of vehicle trajectory prediction, enabling a driver to change the departure time or the travel route in advance and avoid traffic congestion.

Description

Track prediction method based on semantic fusion representation
Technical Field
The invention relates to a track prediction method based on semantic fusion representation.
Background
With the vigorous development of the internet and internet of things technology, massive trajectory data is produced. The trajectory data includes traffic trajectory data, human activity trajectory data, and trajectory data of other movable objects. By mining the track data, the activity rule of the mobile object can be obtained.
However, traditional methods are overly simplistic and predict the position of a vehicle with low accuracy. Specifically, time-series models such as RNN and LSTM are widely used to predict vehicle trajectories, but current trajectory prediction models lack a semantic fusion representation of trajectory points and trajectory sequences, and traditional time-series models have difficulty capturing the correlation between trajectory points, so the overall prediction accuracy of such models is not high.
Therefore, to address the shortcoming that traditional methods neither mine semantic representations of multi-modal trajectory data nor capture the correlation information between trajectory points, the invention provides a trajectory prediction method based on semantic fusion representation.
Disclosure of Invention
The invention aims to provide a track prediction method based on semantic fusion representation, which can better predict the position and the regional distribution of an automobile at the future time, improve the accuracy of predicting the track of the automobile, enable a driver to change the travel time or the travel track in advance and avoid traffic jam.
In order to achieve the above object, the present invention provides a trajectory prediction method based on semantic fusion characterization, wherein the method comprises:
step 1: removing error points and redundant points by using a track data preprocessing method, and dividing track data by adopting a sliding window to form a track sequence with labels;
step 2: carrying out multi-dimensional track fusion vector representation by combining longitude and latitude information, time information, speed information and direction information of the vehicle;
and step 3: learning the depth characteristics of the track sequence by using an automatic encoder, and jointly constructing the semantic representation of the track sequence by combining the original characteristics of the track sequence;
step 4: the Transformer-based trajectory prediction method learns the correlation between trajectories through a multi-head self-attention mechanism and a mask-based self-attention method, thereby realizing the prediction of the trajectories.
Preferably, the method further comprises step 5: and verifying the track prediction model.
Preferably, step 1 comprises:
step 1.1: removing error points;
Set the time interval between track points p_i and p_j to Δt_1 and their spatial distance to Δd_1;
Set the time interval between track points p_j and p_k to Δt_2 and their spatial distance to Δd_2;
Set the upper limit of the vehicle driving speed on urban roads to V_max;
If the conditions Δd_1 > Δt_1 · V_max and Δd_2 > Δt_2 · V_max are satisfied, the track point p_j is judged to be an error point and should be removed;
step 1.2: removing redundant points;
Set the successive distances between track points p_i, p_j, p_k, ..., p_n to Δd_1, Δd_2, ..., Δd_n, and set the circle radius threshold R to 20 m; if the conditions Δd_1 < 2R, Δd_2 < 2R, ..., Δd_n < 2R are satisfied, the position of the vehicle is considered essentially unchanged during the period, and a track point p is used as the equivalent point of the redundant points in the region, where the longitude and latitude of p are the averages over the redundant region;
step 1.3: forming a track sequence;
Dividing the track based on a sliding window: a fixed-length sliding window is set at the beginning of the track, the last track point in the window is the position point to be predicted, and the remaining track points serve as training features, forming one training sample; the window is then slid forward one position at a time to form new training samples, which are added to the training feature sequence and the label sequence respectively; when the window reaches the last position of the track sequence, the track points are extracted and the division is finished; specifically, for a time window T_j and a track T = {p_1, p_2, ..., p_n}, the points p_1, p_2, ..., p_{j-1} are used as training features and p_j as the next-position label of the track.
Preferably, in step 2, the multi-modal semantic track is defined as Traj(o_i) = {p_1(o_i), p_2(o_i), ..., p_n(o_i)}, where p_n(o_i) denotes the nth position of object o_i, i.e. p_n(o_i) = {L_n, T_n, D_n, I_n}; here L_n denotes the longitude-latitude semantic information of object o_i at the nth position, T_n the temporal semantic information, D_n the vehicle speed information, and I_n the direction information of the vehicle;
The multi-dimensional track fusion vector characterization comprises the following steps:
Semantic representation of track longitude and latitude characteristics, which comprises representing an area formed in a certain longitude and latitude range by adopting a grid division method and is used for capturing semantic information of the longitude and latitude characteristics;
Assuming that the grid is divided into n segments, L_n can be expressed as an n × 1 dimensional vector; at the same time, a conversion matrix E_l of dimension D_l × n is designed to convert L_n into a D_l × 1 vector L_n^e; the formula is as follows:
L_n^e = E_l · L_n
Semantic representation of track time characteristics, which comprises representing time periods by adopting a grid division method and is used for capturing semantic information of time characteristics;
Taking the hour as the period of the grid division, one hour is divided into m segments, so T_n can be expressed as an m × 1 dimensional vector; at the same time, a conversion matrix E_t of dimension D_t × m is designed to convert T_n into a D_t × 1 vector T_n^e; the formula is as follows:
T_n^e = E_t · T_n
Semantic representation of vehicle speed information: corresponding to the speed information contained in the track, V discrete values of the vehicle speed are set; first, the speed information D_n is encoded as a V × 1 vector, where V denotes the number of discrete speed values in the current data set; then a transformation matrix E_d of dimension D_d × V converts D_n into a D_d × 1 vector D_n^e; the formula is as follows:
D_n^e = E_d · D_n
Semantic representation of vehicle direction information: corresponding to the direction information contained in the track, the number of discrete direction values is set to Q, so I_n is a Q × 1 vector; a transformation matrix E_i of dimension D_i × Q converts I_n into a D_i × 1 vector I_n^e; the formula is as follows:
I_n^e = E_i · I_n
Finally, the multi-dimensional track semantic p_n(o_i) is calculated by fusing the four semantic vectors L_n^e, T_n^e, D_n^e and I_n^e.
preferably, step 3 comprises:
step 3.1: coding;
An encoder-decoder auto-encoder is adopted to learn the depth features of the track sequence: the obtained multi-dimensional track semantic sequence is input to the encoder part of the auto-encoder, and the hidden layer is updated as follows:
h_i = f_encoder(h_{i-1}, b_i)
where f_encoder denotes the encoder function of the auto-encoder and b_i denotes the semantic input of the track point;
step 3.2: decoding;
The final hidden-layer output h_i of the encoder in step 3.1 represents the entire track sequence and serves as the initial hidden state of the decoder LSTM, which produces the output sequence c_1, c_2, ..., c_i; the hidden layer of the decoder is updated as follows:
h'_i = f_decoder(h'_{i-1}, c_{i-1})
where f_decoder denotes the decoder function of the auto-encoder and c_{i-1} denotes the previous output of the decoder;
step 3.3: the learning objective of the decoder is to reconstruct the input of the encoder with minimum error, for example with the squared reconstruction error between the encoder inputs b_i and the decoder outputs c_i:
L = Σ_i ||b_i − c_i||²
the output of the trained encoder can effectively represent the input data, i.e. the output information contains the depth characteristic information of the track sequence.
Preferably, step 4 comprises trajectory prediction model coding training and trajectory prediction model decoding training.
Preferably, in the trajectory prediction model coding training, the characterization of the trajectory sequence is assumed to be T = (T_1 + T_2 + ... + T_N), where N is the number of track points and T_i is the semantic representation of each track point; an </s> token is appended to the end of the input track sequence, the sequence length is set to F, and track sequences shorter than F are padded to length F; likewise, an <s> token is added at the start position of the output sequence and an </s> token at its end, and the length of the output prediction sequence of the track is set to M; the training comprises the following steps:
Firstly, during training a batch mode is adopted with batch size B, so the input track sequence has dimension (B, F); the dimensions of the track encoding and the position encoding are set to the same value E, so after track Embedding and Position Embedding the vector dimension is (B, F, E);
Secondly, the embedded vector is reshaped from (B, F, E) to (B×F, E) and used to construct the query, key and value matrices, which therefore have dimension (B×F, E); the query, key and value matrices are then reshaped from (B×F, E) back to (B, F, E); since a multi-head attention mechanism with N heads is adopted and N × H = E, the query, key and value are split into (B, F, N, H), and the attention score is calculated from query, key and value as follows:
score = softmax(Q · K^T / √H)
Finally, the attention vector is obtained by combining the attention scores with the value matrix;
After the attention vector is obtained, the result of dimension (B, F, E) is fed into a fully connected layer; the dimension of the intermediate hidden variable is set to D, and each position in the sequence is multiplied by a weight matrix (E, D) and then by another weight matrix (D, E), so the final matrix dimension is (B, F, E); after the remaining 5 encoder layers, the final encoder output is (B, F, E).
Preferably, in the trajectory prediction model decoding training, the decoder receives the input sequence (B, T) during training and applies track encoding and position encoding to it, where the track encoding and position encoding share the same weight matrices as in the Encoder; the training comprises the following steps:
Firstly, after the input sequence has been track-encoded and position-encoded, the resulting matrix dimension is (B, T, E), and it is encoded with a masked multi-head attention mechanism; the masked multi-head attention has the same structure as the self-attention network, but a mask matrix is added to the complete input sequence T during training in order to mask future information;
Secondly, after the masked multi-head attention, attention is computed between the information encoded by the decoder and the information encoded by the encoder: the key and value vectors are obtained from the encoder output, the query vectors are constructed from the decoder's internal sequence variables, and the attention operation is then performed; the decoder of the same structure is repeated 5 times to obtain the final embedded representation of the output sequence, of dimension (B, T, E);
Then, the (B, T, E) vector output by the decoder is mapped by a fully connected layer into the region space of the region table, becoming (B, T, Z), where Z is the size of the region table; the logits vector is followed by a softmax layer to obtain probabilities, the prediction sequence is obtained from the probabilities, the loss value between the prediction sequence and the target sequence is calculated with a cross-entropy loss function, and the parameter optimization process then starts.
Preferably, step 5 comprises:
Inputting the track sequence to be predicted into the encoder to obtain the encoder coding vector (F, E); then a single-element sequence containing only the start mark <s> is input into the decoder to predict the next track region T1; <s> and T1 are concatenated as the decoder input sequence and the next track region is predicted, and so on until </s> appears in the predicted sequence.
According to the above technical scheme, on the basis of analyzing the existing vehicle track data, the invention preprocesses the data, removes error points and redundant points from the track data to form track sequence data, then proposes a multi-modal semantic fusion representation method that fuses the position, time, speed and direction information of the vehicle, and finally adopts a Transformer-based trajectory prediction method to realize vehicle trajectory prediction.
Additional features and advantages of the invention will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a schematic diagram of an auto-encoder learning track depth feature in accordance with the present invention;
FIG. 2 is a flow chart of a transform-based trajectory prediction method according to the present invention.
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present invention, are given by way of illustration and explanation only, not limitation.
The invention provides a track prediction method based on semantic fusion representation, which comprises the following steps:
step 1: removing error points and redundant points by using a track data preprocessing method, and dividing track data by adopting a sliding window to form a track sequence with labels;
step 2: carrying out multi-dimensional track fusion vector representation by combining longitude and latitude information, time information, speed information and direction information of the vehicle;
and step 3: learning the depth characteristics of the track sequence by using an automatic encoder, and jointly constructing the semantic representation of the track sequence by combining the original characteristics of the track sequence;
step 4: the Transformer-based trajectory prediction method (as shown in fig. 2) learns the correlation between trajectories by using a multi-head self-attention mechanism and a mask-based self-attention method, thereby realizing the prediction of the trajectories.
Wherein, step 1 includes:
step 1.1: removing error points;
Set the time interval between track points p_i and p_j to Δt_1 and their spatial distance to Δd_1;
Set the time interval between track points p_j and p_k to Δt_2 and their spatial distance to Δd_2;
Set the upper limit of the vehicle driving speed on urban roads to V_max;
If the conditions Δd_1 > Δt_1 · V_max and Δd_2 > Δt_2 · V_max are satisfied, the track point p_j is judged to be an error point and should be removed;
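As a concrete, non-limiting illustration of the error-point rule above, the following Python sketch filters a track given as (latitude, longitude, timestamp) tuples; the function names, the haversine distance helper and the default speed limit V_max are assumptions made for illustration and are not prescribed by the method.

```python
import math

def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon, t) points."""
    R_EARTH = 6371000.0
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * R_EARTH * math.asin(math.sqrt(a))

def remove_error_points(track, v_max=33.3):
    """Drop p_j when both adjacent segments imply a speed above v_max (m/s),
    i.e. dd1 > dt1 * v_max and dd2 > dt2 * v_max."""
    if len(track) < 3:
        return list(track)
    cleaned = [track[0]]
    for j in range(1, len(track) - 1):
        p_i, p_j, p_k = cleaned[-1], track[j], track[j + 1]
        dt1, dt2 = p_j[2] - p_i[2], p_k[2] - p_j[2]
        dd1, dd2 = haversine_m(p_i, p_j), haversine_m(p_j, p_k)
        if dd1 > dt1 * v_max and dd2 > dt2 * v_max:
            continue                      # p_j judged to be an error point
        cleaned.append(p_j)
    cleaned.append(track[-1])
    return cleaned
```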
step 1.2: removing redundant points;
Set the successive distances between track points p_i, p_j, p_k, ..., p_n to Δd_1, Δd_2, ..., Δd_n, and set the circle radius threshold R to 20 m; if the conditions Δd_1 < 2R, Δd_2 < 2R, ..., Δd_n < 2R are satisfied, the position of the vehicle is considered essentially unchanged during the period, and a track point p is used as the equivalent point of the redundant points in the region, where the longitude and latitude of p are the averages over the redundant region;
step 1.3: forming a track sequence;
The research objective of the invention is to predict the next position of the track, and directly using the last position of the track as the label is not beneficial to the prediction result. Therefore, the method divides the track based on a sliding window: a fixed-length sliding window is set at the beginning of the track, the last track point in the window is the position point to be predicted, and the remaining track points serve as training features, forming one training sample; the window is then slid forward one position at a time to form new training samples, which are added to the training feature sequence and the label sequence respectively; when the window reaches the last position of the track sequence, the track points are extracted and the division is finished; specifically, for a time window T_j and a track T = {p_1, p_2, ..., p_n}, the points p_1, p_2, ..., p_{j-1} are used as training features and p_j as the next-position label of the track.
In step 2, the multi-modal semantic track is defined as Traj(o_i) = {p_1(o_i), p_2(o_i), ..., p_n(o_i)}, where p_n(o_i) denotes the nth position of object o_i, i.e. p_n(o_i) = {L_n, T_n, D_n, I_n}; here L_n denotes the longitude-latitude semantic information of object o_i at the nth position, T_n the temporal semantic information, D_n the vehicle speed information, and I_n the direction information of the vehicle;
The multi-dimensional track fusion vector characterization comprises the following steps:
Semantic representation of track longitude and latitude characteristics, which comprises representing an area formed in a certain longitude and latitude range by adopting a grid division method and is used for capturing semantic information of the longitude and latitude characteristics;
Assuming that the grid is divided into n segments, L_n can be expressed as an n × 1 dimensional vector; at the same time, a conversion matrix E_l of dimension D_l × n is designed to convert L_n into a D_l × 1 vector L_n^e; the formula is as follows:
L_n^e = E_l · L_n
Semantic representation of track time characteristics, which comprises representing time periods by adopting a grid division method and is used for capturing semantic information of time characteristics;
Taking the hour as the period of the grid division, one hour is divided into m segments, so T_n can be expressed as an m × 1 dimensional vector; at the same time, a conversion matrix E_t of dimension D_t × m is designed to convert T_n into a D_t × 1 vector T_n^e; the formula is as follows:
T_n^e = E_t · T_n
Semantic representation of vehicle speed information: corresponding to the speed information contained in the track, V discrete values of the vehicle speed are set; first, the speed information D_n is encoded as a V × 1 vector, where V denotes the number of discrete speed values in the current data set; then a transformation matrix E_d of dimension D_d × V converts D_n into a D_d × 1 vector D_n^e; the formula is as follows:
D_n^e = E_d · D_n
Semantic representation of vehicle direction information: corresponding to the direction information contained in the track, the number of discrete direction values is set to Q, so I_n is a Q × 1 vector; a transformation matrix E_i of dimension D_i × Q converts I_n into a D_i × 1 vector I_n^e; the formula is as follows:
I_n^e = E_i · I_n
Finally, the multi-dimensional track semantic p_n(o_i) is calculated by fusing the four semantic vectors L_n^e, T_n^e, D_n^e and I_n^e.
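The four conversions above can be sketched as embedding lookups, which are equivalent to multiplying one-hot vectors by the conversion matrices E_l, E_t, E_d and E_i; the module name, the embedding dimensions and the use of concatenation as the fusion operation are assumptions made for illustration, since they are not fixed by the description.

```python
import torch
import torch.nn as nn

class TrackPointEmbedding(nn.Module):
    """Fuse the grid-cell, time-slot, speed and direction indices of a track
    point into one semantic vector (here by concatenation)."""

    def __init__(self, n_cells, m_slots, v_speeds, q_dirs,
                 d_l=64, d_t=16, d_d=16, d_i=16):
        super().__init__()
        self.loc = nn.Embedding(n_cells, d_l)     # plays the role of E_l
        self.time = nn.Embedding(m_slots, d_t)    # plays the role of E_t
        self.speed = nn.Embedding(v_speeds, d_d)  # plays the role of E_d
        self.direc = nn.Embedding(q_dirs, d_i)    # plays the role of E_i

    def forward(self, L_n, T_n, D_n, I_n):
        # each argument is a LongTensor of discrete indices, shape (batch,)
        return torch.cat([self.loc(L_n), self.time(T_n),
                          self.speed(D_n), self.direc(I_n)], dim=-1)
```

With the sketched sizes, each track point is fused into a 64 + 16 + 16 + 16 = 112-dimensional vector.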
the step 3 comprises the following steps:
step 3.1: coding;
An encoder-decoder auto-encoder learns the depth features of the track sequence; the structure of the auto-encoder is shown in figure 1. The obtained multi-dimensional track semantic sequence is input to the encoder part of the auto-encoder, and the hidden layer is updated as follows:
h_i = f_encoder(h_{i-1}, b_i)
where f_encoder denotes the encoder function of the auto-encoder and b_i denotes the semantic input of the track point;
step 3.2: decoding;
The final hidden-layer output h_i of the encoder in step 3.1 represents the entire track sequence and serves as the initial hidden state of the decoder LSTM, which produces the output sequence c_1, c_2, ..., c_i; the hidden layer of the decoder is updated as follows:
h'_i = f_decoder(h'_{i-1}, c_{i-1})
where f_decoder denotes the decoder function of the auto-encoder and c_{i-1} denotes the previous output of the decoder;
step 3.3: the learning objective of the decoder is to reconstruct the input of the encoder with minimum error, for example with the squared reconstruction error between the encoder inputs b_i and the decoder outputs c_i:
L = Σ_i ||b_i − c_i||²
the output of the trained encoder can effectively represent the input data, i.e. the output information contains the depth characteristic information of the track sequence.
Step 4 includes trajectory prediction model coding training and trajectory prediction model decoding training. The model is divided into an encoder part and a decoder part, which receive different inputs during training.
Specifically, in the trajectory prediction model coding training, the characterization of the trajectory sequence is assumed to be T = (T_1 + T_2 + ... + T_N), where N is the number of track points and T_i is the semantic representation of each track point; an </s> token is appended to the end of the input track sequence, the sequence length is set to F, and track sequences shorter than F are padded to length F; likewise, an <s> token is added at the start position of the output sequence and an </s> token at its end, and the length of the output prediction sequence of the track is set to M; the training comprises the following steps:
Firstly, during training a batch mode is adopted with batch size B, so the input track sequence has dimension (B, F); the dimensions of the track encoding and the position encoding are set to the same value E, so after track Embedding and Position Embedding the vector dimension is (B, F, E);
Secondly, the embedded vector is reshaped from (B, F, E) to (B×F, E) and used to construct the query, key and value matrices, which therefore have dimension (B×F, E); the query, key and value matrices are then reshaped from (B×F, E) back to (B, F, E); since a multi-head attention mechanism with N heads is adopted and N × H = E, the query, key and value are split into (B, F, N, H), and the attention score is calculated from query, key and value as follows:
score = softmax(Q · K^T / √H)
Finally, the attention vector is obtained by combining the attention scores with the value matrix;
After the attention vector is obtained, the result of dimension (B, F, E) is fed into a fully connected layer; the dimension of the intermediate hidden variable is set to D, and each position in the sequence is multiplied by a weight matrix (E, D) and then by another weight matrix (D, E), so the final matrix dimension is (B, F, E); after the remaining 5 encoder layers, the final encoder output is (B, F, E).
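The head split and scaled dot-product score described above can be sketched as follows; the function signature, the weight matrices passed as plain tensors and the use of PyTorch operations are assumptions for illustration only.

```python
import math
import torch

def multi_head_self_attention(x, w_q, w_k, w_v, n_heads):
    """x: (B, F, E); w_q, w_k, w_v: (E, E) projection matrices; N * H = E."""
    B, F, E = x.shape
    H = E // n_heads
    q = (x @ w_q).view(B, F, n_heads, H).transpose(1, 2)   # (B, N, F, H)
    k = (x @ w_k).view(B, F, n_heads, H).transpose(1, 2)
    v = (x @ w_v).view(B, F, n_heads, H).transpose(1, 2)
    # attention score: softmax(Q · K^T / sqrt(H))
    score = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(H), dim=-1)
    out = score @ v                                         # (B, N, F, H)
    return out.transpose(1, 2).reshape(B, F, E)             # back to (B, F, E)
```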
In the trajectory prediction model decoding training, the decoder receives the input sequence (B, T) during training and applies track encoding and position encoding to it, where the track encoding and position encoding share the same weight matrices as in the Encoder; the training comprises the following steps:
Firstly, after the input sequence has been track-encoded and position-encoded, the resulting matrix dimension is (B, T, E), and it is encoded with a masked multi-head attention mechanism; the masked multi-head attention has the same structure as the self-attention network, but a mask matrix is added to the complete input sequence T during training in order to mask future information;
Secondly, after the masked multi-head attention, attention is computed between the information encoded by the decoder and the information encoded by the encoder: the key and value vectors are obtained from the encoder output, the query vectors are constructed from the decoder's internal sequence variables, and the attention operation is then performed; the decoder of the same structure is repeated 5 times to obtain the final embedded representation of the output sequence, of dimension (B, T, E);
Then, the (B, T, E) vector output by the decoder is mapped by a fully connected layer into the region space of the region table, becoming (B, T, Z), where Z is the size of the region table; the logits vector is followed by a softmax layer to obtain probabilities, the prediction sequence is obtained from the probabilities, the loss value between the prediction sequence and the target sequence is calculated with a cross-entropy loss function, and the parameter optimization process then starts.
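The future-information mask and the projection of the decoder output onto the region table with a cross-entropy loss can be sketched as follows; the mask construction, the assumed embedding size E, region-table size Z and the variable names are illustrative assumptions.

```python
import torch
import torch.nn as nn

def causal_mask(seq_len):
    """Boolean upper-triangular mask that hides future positions; it would be
    passed to the masked multi-head attention during training."""
    return torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

E, Z = 128, 4096                       # assumed embedding size and region count
to_regions = nn.Linear(E, Z)           # fully connected layer to the region table

def decoder_loss(dec_out, target_regions):
    """dec_out: (B, T, E) decoder output; target_regions: (B, T) region IDs."""
    logits = to_regions(dec_out)                        # (B, T, Z)
    return nn.functional.cross_entropy(
        logits.reshape(-1, Z), target_regions.reshape(-1))
```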
Further, the method also comprises step 5: verifying the track prediction model, namely: the track sequence to be predicted is input into the encoder to obtain the encoder coding vector (F, E); then a single-element sequence containing only the start mark <s> is input into the decoder to predict the next track region T1; <s> and T1 are concatenated as the decoder input sequence and the next track region is predicted, and so on until </s> appears in the predicted sequence.
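A greedy autoregressive decoding loop along the lines of step 5; the model interface (encode/decode methods), the token identifiers and the maximum length are hypothetical names used only to illustrate the procedure.

```python
import torch

def predict_trajectory(model, src_seq, bos_id, eos_id, max_len=20):
    """Greedy decoding: start from <s>, repeatedly append the most likely
    next region until </s> is produced or max_len is reached."""
    memory = model.encode(src_seq)               # encoder coding vector (F, E)
    out = [bos_id]                               # the single-element <s> sequence
    for _ in range(max_len):
        logits = model.decode(torch.tensor([out]), memory)   # (1, len(out), Z)
        next_region = int(logits[0, -1].argmax())
        if next_region == eos_id:
            break
        out.append(next_region)                  # splice the prediction onto the input
    return out[1:]                               # predicted region sequence
```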
The preferred embodiments of the present invention have been described in detail with reference to the accompanying drawings, however, the present invention is not limited to the specific details of the above embodiments, and various simple modifications can be made to the technical solution of the present invention within the technical idea of the present invention, and these simple modifications are within the protective scope of the present invention.
It should be noted that the various technical features described in the above embodiments can be combined in any suitable manner without contradiction, and the invention is not described in any way for the possible combinations in order to avoid unnecessary repetition.
In addition, any combination of the various embodiments of the present invention is also possible, and the same should be considered as the disclosure of the present invention as long as it does not depart from the spirit of the present invention.

Claims (9)

1. A trajectory prediction method based on semantic fusion characterization is characterized by comprising the following steps:
step 1: removing error points and redundant points by using a track data preprocessing method, and dividing track data by adopting a sliding window to form a track sequence with labels;
step 2: carrying out multi-dimensional track fusion vector representation by combining longitude and latitude information, time information, speed information and direction information of the vehicle;
and step 3: learning the depth characteristics of the track sequence by using an automatic encoder, and jointly constructing the semantic representation of the track sequence by combining the original characteristics of the track sequence;
step 4: the Transformer-based trajectory prediction method learns the correlation between trajectories through a multi-head self-attention mechanism and a mask-based self-attention method, thereby realizing the prediction of the trajectories.
2. The trajectory prediction method based on semantic fusion characterization according to claim 1, further comprising the step 5: and verifying the track prediction model.
3. The trajectory prediction method based on the semantic fusion characterization according to claim 1, wherein the step 1 comprises:
step 1.1: removing error points;
Set the time interval between track points p_i and p_j to Δt_1 and their spatial distance to Δd_1;
Set the time interval between track points p_j and p_k to Δt_2 and their spatial distance to Δd_2;
Set the upper limit of the vehicle driving speed on urban roads to V_max;
If the conditions Δd_1 > Δt_1 · V_max and Δd_2 > Δt_2 · V_max are satisfied, the track point p_j is judged to be an error point and should be removed;
step 1.2: removing redundant points;
Set the successive distances between track points p_i, p_j, p_k, ..., p_n to Δd_1, Δd_2, ..., Δd_n, and set the circle radius threshold R to 20 m; if the conditions Δd_1 < 2R, Δd_2 < 2R, ..., Δd_n < 2R are satisfied, the position of the vehicle is considered essentially unchanged during the period, and a track point p is used as the equivalent point of the redundant points in the region, where the longitude and latitude of p are the averages over the redundant region;
step 1.3: forming a track sequence;
Dividing the track based on a sliding window: a fixed-length sliding window is set at the beginning of the track, the last track point in the window is the position point to be predicted, and the remaining track points serve as training features, forming one training sample; the window is then slid forward one position at a time to form new training samples, which are added to the training feature sequence and the label sequence respectively; when the window reaches the last position of the track sequence, the track points are extracted and the division is finished; specifically, for a time window T_j and a track T = {p_1, p_2, ..., p_n}, the points p_1, p_2, ..., p_{j-1} are used as training features and p_j as the next-position label of the track.
4. The method for predicting trajectories based on semantic fusion characterization according to claim 1, wherein in step 2, the multi-modal semantic trajectory is defined as Traj(o_i) = {p_1(o_i), p_2(o_i), ..., p_n(o_i)}, where p_n(o_i) denotes the nth position of object o_i, i.e. p_n(o_i) = {L_n, T_n, D_n, I_n}; here L_n denotes the longitude-latitude semantic information of object o_i at the nth position, T_n the temporal semantic information, D_n the vehicle speed information, and I_n the direction information of the vehicle;
the multi-dimensional track fusion vector characterization comprises the following steps:
semantic representation of track longitude and latitude characteristics, which comprises representing an area formed in a certain longitude and latitude range by adopting a grid division method and is used for capturing semantic information of the longitude and latitude characteristics;
assuming that the grid is divided into n segments, L_n can be expressed as an n × 1 dimensional vector; at the same time, a conversion matrix E_l of dimension D_l × n is designed to convert L_n into a D_l × 1 vector L_n^e; the formula is as follows:
L_n^e = E_l · L_n
semantic representation of track time characteristics, which comprises representing time periods by adopting a grid division method and is used for capturing semantic information of time characteristics;
taking the hour as the period of the grid division, one hour is divided into m segments, so T_n can be expressed as an m × 1 dimensional vector; at the same time, a conversion matrix E_t of dimension D_t × m is designed to convert T_n into a D_t × 1 vector T_n^e; the formula is as follows:
T_n^e = E_t · T_n
semantic representation of vehicle speed information: corresponding to the speed information contained in the track, V discrete values of the vehicle speed are set; first, the speed information D_n is encoded as a V × 1 vector, where V denotes the number of discrete speed values in the current data set; then a transformation matrix E_d of dimension D_d × V converts D_n into a D_d × 1 vector D_n^e; the formula is as follows:
D_n^e = E_d · D_n
semantic representation of vehicle direction information: corresponding to the direction information contained in the track, the number of discrete direction values is set to Q, so I_n is a Q × 1 vector; a transformation matrix E_i of dimension D_i × Q converts I_n into a D_i × 1 vector I_n^e; the formula is as follows:
I_n^e = E_i · I_n
finally, the multi-dimensional track semantic p_n(o_i) is calculated by fusing the four semantic vectors L_n^e, T_n^e, D_n^e and I_n^e.
5. the trajectory prediction method based on semantic fusion characterization according to claim 1, wherein step 3 comprises:
step 3.1: coding;
An encoder-decoder auto-encoder is adopted to learn the depth features of the track sequence: the obtained multi-dimensional track semantic sequence is input to the encoder part of the auto-encoder, and the hidden layer is updated as follows:
h_i = f_encoder(h_{i-1}, b_i)
where f_encoder denotes the encoder function of the auto-encoder and b_i denotes the semantic input of the track point;
step 3.2: decoding;
The final hidden-layer output h_i of the encoder in step 3.1 represents the entire track sequence and serves as the initial hidden state of the decoder LSTM, which produces the output sequence c_1, c_2, ..., c_i; the hidden layer of the decoder is updated as follows:
h'_i = f_decoder(h'_{i-1}, c_{i-1})
where f_decoder denotes the decoder function of the auto-encoder and c_{i-1} denotes the previous output of the decoder;
step 3.3: the learning objective of the decoder is to reconstruct the input of the encoder with minimum error, for example with the squared reconstruction error between the encoder inputs b_i and the decoder outputs c_i:
L = Σ_i ||b_i − c_i||²
the output of the trained encoder can effectively represent the input data, i.e. the output information contains the depth characteristic information of the track sequence.
6. The trajectory prediction method based on semantic fusion characterization according to claim 1, wherein the step 4 comprises trajectory prediction model coding training and trajectory prediction model decoding training.
7. The trajectory prediction method based on semantic fusion characterization according to claim 6, wherein in the trajectory prediction model coding training, the characterization of the trajectory sequence is assumed to be T = (T_1 + T_2 + ... + T_N), where N is the number of track points and T_i is the semantic representation of each track point; an </s> token is appended to the end of the input track sequence, the sequence length is set to F, and track sequences shorter than F are padded to length F; likewise, an <s> token is added at the start position of the output sequence and an </s> token at its end, and the length of the output prediction sequence of the track is set to M; the training comprises the following steps:
firstly, during training a batch mode is adopted with batch size B, so the input track sequence has dimension (B, F); the dimensions of the track encoding and the position encoding are set to the same value E, so after track Embedding and Position Embedding the vector dimension is (B, F, E);
secondly, the embedded vector is reshaped from (B, F, E) to (B×F, E) and used to construct the query, key and value matrices, which therefore have dimension (B×F, E); the query, key and value matrices are then reshaped from (B×F, E) back to (B, F, E); since a multi-head attention mechanism with N heads is adopted and N × H = E, the query, key and value are split into (B, F, N, H), and the attention score is calculated from query, key and value as follows:
score = softmax(Q · K^T / √H)
finally, the attention vector is obtained by combining the attention scores with the value matrix;
after the attention vector is obtained, the result of dimension (B, F, E) is fed into a fully connected layer; the dimension of the intermediate hidden variable is set to D, and each position in the sequence is multiplied by a weight matrix (E, D) and then by another weight matrix (D, E), so the final matrix dimension is (B, F, E); after the remaining 5 encoder layers, the final encoder output is (B, F, E).
8. The trajectory prediction method based on semantic fusion characterization as claimed in claim 6, wherein in the trajectory prediction model decoding training, the decoder receives the input sequence (B, T) during training and applies track encoding and position encoding to it, where the track encoding and position encoding share the same weight matrices as in the Encoder; the training comprises the following steps:
firstly, after the input sequence has been track-encoded and position-encoded, the resulting matrix dimension is (B, T, E), and it is encoded with a masked multi-head attention mechanism; the masked multi-head attention has the same structure as the self-attention network, but a mask matrix is added to the complete input sequence T during training in order to mask future information;
secondly, after the masked multi-head attention, attention is computed between the information encoded by the decoder and the information encoded by the encoder: the key and value vectors are obtained from the encoder output, the query vectors are constructed from the decoder's internal sequence variables, and the attention operation is then performed; the decoder of the same structure is repeated 5 times to obtain the final embedded representation of the output sequence, of dimension (B, T, E);
then, the (B, T, E) vector output by the decoder is mapped by a fully connected layer into the region space of the region table, becoming (B, T, Z), where Z is the size of the region table; the logits vector is followed by a softmax layer to obtain probabilities, the prediction sequence is obtained from the probabilities, the loss value between the prediction sequence and the target sequence is calculated with a cross-entropy loss function, and the parameter optimization process then starts.
9. The trajectory prediction method based on semantic fusion characterization according to claim 2, wherein the step 5 comprises:
inputting the track sequence to be predicted into the encoder to obtain the encoder coding vector (F, E); then inputting a single-element sequence containing only the start mark <s> into the decoder to predict the next track region T1; splicing <s> and T1 as the decoder input sequence and continuing to predict the next track region, until </s> appears in the predicted sequence.
CN202210707333.2A 2022-06-21 2022-06-21 Track prediction method based on semantic fusion representation Pending CN115049009A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210707333.2A CN115049009A (en) 2022-06-21 2022-06-21 Track prediction method based on semantic fusion representation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210707333.2A CN115049009A (en) 2022-06-21 2022-06-21 Track prediction method based on semantic fusion representation

Publications (1)

Publication Number Publication Date
CN115049009A true CN115049009A (en) 2022-09-13

Family

ID=83163534

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210707333.2A Pending CN115049009A (en) 2022-06-21 2022-06-21 Track prediction method based on semantic fusion representation

Country Status (1)

Country Link
CN (1) CN115049009A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115222159A (en) * 2022-09-14 2022-10-21 中国电子科技集团公司第二十八研究所 Hot area identification method based on spatial domain relevancy
CN116304560A (en) * 2023-01-17 2023-06-23 北京信息科技大学 Track characterization model training method, characterization method and device based on multi-scale enhanced contrast learning
CN116304560B (en) * 2023-01-17 2023-11-24 北京信息科技大学 Track characterization model training method, characterization method and device
CN116558541A (en) * 2023-07-11 2023-08-08 新石器慧通(北京)科技有限公司 Model training method and device, and track prediction method and device
CN116558541B (en) * 2023-07-11 2023-09-22 新石器慧通(北京)科技有限公司 Model training method and device, and track prediction method and device
CN117475090A (en) * 2023-12-27 2024-01-30 粤港澳大湾区数字经济研究院(福田) Track generation model, track generation method, track generation device, terminal and medium

Similar Documents

Publication Publication Date Title
CN111400620B (en) User trajectory position prediction method based on space-time embedded Self-orientation
CN115049009A (en) Track prediction method based on semantic fusion representation
CN113487061A (en) Long-time-sequence traffic flow prediction method based on graph convolution-Informer model
CN112257850B (en) Vehicle track prediction method based on generation countermeasure network
CN110738370A (en) novel moving object destination prediction algorithm
CN115240425B (en) Traffic prediction method based on multi-scale space-time fusion graph network
CN111930110A (en) Intent track prediction method for generating confrontation network by combining society
CN113993172B (en) Ultra-dense network switching method based on user movement behavior prediction
CN114202120A (en) Urban traffic travel time prediction method aiming at multi-source heterogeneous data
CN114372570A (en) Multi-mode vehicle trajectory prediction method
CN113479187B (en) Layered different-step-length energy management method for plug-in hybrid electric vehicle
CN116071715A (en) Automatic driving automobile real-time semantic segmentation model construction method
CN117141518A (en) Vehicle track prediction method based on intention perception spatiotemporal attention network
CN116484217A (en) Intelligent decision method and system based on multi-mode pre-training large model
CN115293237A (en) Vehicle track prediction method based on deep learning
CN113554060B (en) LSTM neural network track prediction method integrating DTW
US20230038673A1 (en) Sequential pedestrian trajectory prediction using step attention for collision avoidance
CN116331259A (en) Vehicle multi-mode track prediction method based on semi-supervised model
CN112950933B (en) Traffic flow prediction method based on novel space-time characteristics
CN113408786B (en) Traffic characteristic prediction method and system
CN115018134A (en) Pedestrian trajectory prediction method based on three-scale spatiotemporal information
Chen et al. Traffic flow prediction based on cooperative vehicle infrastructure for cloud control platform
Liu et al. Modeling trajectories with multi-task learning
CN113191539B (en) High-density composite scene track prediction method based on heterogeneous graph aggregation network
CN114312831B (en) Vehicle track prediction method based on spatial attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination