CN112766339A - Trajectory recognition model training method and trajectory recognition method - Google Patents
- Publication number
- CN112766339A (application number CN202110029664.0A)
- Authority
- CN
- China
- Prior art keywords
- track
- neural network
- trajectory
- vector
- recurrent neural
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 84
- 238000012549 training Methods 0.000 title claims abstract description 38
- 239000013598 vector Substances 0.000 claims abstract description 76
- 238000013528 artificial neural network Methods 0.000 claims abstract description 61
- 230000000306 recurrent effect Effects 0.000 claims abstract description 35
- 238000004364 calculation method Methods 0.000 claims abstract description 7
- 238000011156 evaluation Methods 0.000 claims abstract description 7
- 230000015654 memory Effects 0.000 claims description 11
- 230000008569 process Effects 0.000 claims description 10
- 230000002457 bidirectional effect Effects 0.000 claims description 7
- 238000004590 computer program Methods 0.000 claims description 4
- 230000006870 function Effects 0.000 description 11
- 230000007246 mechanism Effects 0.000 description 8
- 238000013135 deep learning Methods 0.000 description 6
- 230000004913 activation Effects 0.000 description 5
- 230000001537 neural effect Effects 0.000 description 5
- 230000000694 effects Effects 0.000 description 4
- 238000012545 processing Methods 0.000 description 4
- 238000004422 calculation algorithm Methods 0.000 description 3
- 238000012935 Averaging Methods 0.000 description 2
- 238000013459 approach Methods 0.000 description 2
- 230000008901 benefit Effects 0.000 description 2
- 210000004027 cell Anatomy 0.000 description 2
- 238000007781 pre-processing Methods 0.000 description 2
- 230000002123 temporal effect Effects 0.000 description 2
- 238000007476 Maximum Likelihood Methods 0.000 description 1
- 230000006399 behavior Effects 0.000 description 1
- 230000003247 decreasing effect Effects 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 230000008034 disappearance Effects 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 238000004880 explosion Methods 0.000 description 1
- 238000013178 mathematical model Methods 0.000 description 1
- 239000011159 matrix material Substances 0.000 description 1
- 238000005259 measurement Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 210000002569 neuron Anatomy 0.000 description 1
- 238000010606 normalization Methods 0.000 description 1
- 238000005457 optimization Methods 0.000 description 1
- 238000011084 recovery Methods 0.000 description 1
- 230000006403 short-term memory Effects 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
- 238000012360 testing method Methods 0.000 description 1
- 230000009466 transformation Effects 0.000 description 1
- 238000013519 translation Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/29—Geographical information databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Databases & Information Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Remote Sensing (AREA)
- Image Analysis (AREA)
Abstract
The invention provides a trajectory recognition model training method and a trajectory recognition method. The model training method comprises the following steps: S1, encoding target track data into track semantic vectors by constructing corpus pairs based on one-hot encoded vectors; S2, constructing a recurrent neural network for evaluating the track semantic vector and a classifier for classifying the evaluation result; and S3, training the recurrent neural network and the classifier with labeled track data. The track recognition method adopts a new way of calculating the track semantic vector, which projects track segments of indefinite length into a vector space of definite length. With this model training method the network can learn the track characteristics of different categories, and track categories are recognized by the recurrent neural network.
Description
Technical Field
The invention relates to the field of computers, in particular to processing and recognition of motion trajectory data.
Background
A trajectory is a series of data that records and describes the motion path of an object. Trajectory recognition analyzes a captured motion trajectory to infer information about the target, such as its identity, category, and intention. Existing trajectory recognition algorithms include clustering-based, sequence-modeling-based, and deep-learning-based methods.
Clustering-based trajectory recognition mainly includes methods such as DTW (Dynamic Time Warping) and EDR (Edit Distance on Real sequence). Its main idea is to design a function that measures the similarity of trajectory data, compute the degree of similarity between different trajectories, and cluster similar trajectories so as to partition them, thereby recognizing trajectories of different categories.
Sequence-modeling-based trajectory recognition mainly includes methods such as the HMM (Hidden Markov Model) and the CRF (Conditional Random Field). These methods model the trajectory data with mathematical models such as random processes, add the classification information of the trajectory to be recognized into the model as modeling parameters, train the model parameters with existing trajectory data labeled with identity information, estimate the parameters of the HMM or CRF model by maximum likelihood estimation, and obtain the trajectory identity information from the estimated parameters.
Deep-learning-based trajectory recognition is an end-to-end approach that includes methods such as TULER (Trajectory-User Linking via Embedding and Recurrent Neural Networks) and TULVAE (Trajectory-User Linking via Variational AutoEncoder); its input is the original trajectory data and its output is the recognition result of the trajectory identity. Deep-learning-based methods generally model trajectories with a recurrent neural network, a network that recurses along the evolution direction of a sequence with all nodes (recurrent units) connected in a chain; because such a network has memory and shares its parameters, it is widely applied to sequence data processing. Deep-learning-based trajectory recognition estimates the parameters of the network model (i.e., trains the network) on a large set of labeled trajectory data, after which the trained model can be used directly for trajectory recognition.
Trajectory data is particular: it is of indefinite length, contains both temporal and spatial information, and the spatial distribution of trajectory points is markedly random. These characteristics expose shortcomings of the existing trajectory recognition methods on trajectory recognition tasks, specifically:
The clustering-based trajectory recognition method faces the following problem: in the original space, trajectory data has large intra-class distances and inter-class distances that are difficult to estimate, so the recognition accuracy of clustering-based methods is low and the robustness of the algorithm is poor. Clustering-based methods typically work well on one type of data, while recognition accuracy on other trajectory data can fluctuate widely.
The sequence-modeling-based methods suffer from high modeling difficulty: real trajectory data has many features while a sequence model can take only a few factors into account, which makes sequence modeling very challenging; moreover, different sequence models must be built for specific application scenarios and rules, so the generalization ability of the models is poor.
The deep-neural-network-based methods suffer from memory fading when the trajectory is too long, i.e., the network forgets the information of earlier trajectory segments; in addition, they do not consider that different trajectory points influence the recognition result differently. Consequently, deep-learning-based trajectory recognition performs poorly on certain data sets.
Disclosure of Invention
To address these problems, the invention aims to provide a trajectory model training method based on an attention mechanism and a corresponding trajectory recognition method.
Before describing the technical scheme of the invention, the definition of a trajectory is given first: a trajectory is a sequence of geographic points that records the movement of a human or another object, such as an animal, a hurricane, or a vehicle. A trajectory generally has three key elements: target information, temporal information, and spatial information, in the form T = {<u, t_1, p_1>, <u, t_2, p_2>, …, <u, t_n, p_n>}, where u is the recorded target identification, t is the timestamp of the record, and p is the recorded location information. In some cases, such as radar systems, the identity of the target is unknown; such a trajectory is called an anonymous trajectory. In a real-world dataset, a location may be latitude-longitude coordinates or a point of interest, such as a park or a restaurant. A trajectory composed of points of interest is also referred to as a check-in based trajectory. The present invention uses a check-in based trajectory dataset as the basic input.
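For concreteness, the three-element record can be sketched as a small Python data structure; the field names and types below are illustrative assumptions, not prescribed by the filing.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TrackPoint:
    """One record <u, t, p> of a trajectory."""
    u: str      # target identification (unknown for anonymous trajectories)
    t: float    # timestamp of the record
    p: str      # location: latitude/longitude or a point of interest ("park", ...)

# A trajectory T = {<u, t1, p1>, ..., <u, tn, pn>} as a time-ordered list of records.
Trajectory = List[TrackPoint]
```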
In one aspect, the present invention provides a trajectory recognition model training method, including:
s1, encoding target track data into vector-form information encodings by constructing corpus pairs based on one-hot encoding vectors;
s2, constructing a recurrent neural network for converting the information encodings into track semantic vectors and a classifier for classifying tracks based on the track semantic vectors;
and S3, training the recurrent neural network and the classifier by using the marked track data.
In a preferred implementation, the step S1 includes:
s1.1, carrying out vector-form one-hot coding on each track point in target track data;
s1.2, extracting adjacent track information of each track point and forming a corpus pair by the track point information;
s1.3, constructing a learning network for track point information coding, taking the corpus pair as the input of the learning network, and taking the output layer parameters of the learning network as the information coding of the corresponding track point.
In another preferred implementation, the method further comprises temporally slicing the target trajectory data.
In another preferred implementation, the step S2 includes: classifying the corresponding track, with a multilayer perceptron, based on the evaluation result of the track semantic vector.
In another preferred implementation, the step S2 includes:
s2.1, calculating a track semantic vector based on a recurrent neural network;
s2.2, establishing a trajectory semantic vector reevaluation model based on attention;
and S2.3, recalculating the track semantic vector by using the track semantic vector obtained in the step S2.1 and the track semantic vector reevaluation model constructed in the step S2.2.
In another preferred implementation, in the track semantic vector calculation process of step S2.1 the method further includes: randomly erasing a number of neural network units in the recurrent neural network according to a certain proportion.
In another preferred implementation, the recurrent neural network unit is a long short-term memory network, a gated recurrent unit, or a bidirectional long short-term memory network.
In another aspect, the present invention provides a method for performing trajectory recognition with a model trained by the above method, comprising: substituting the track data to be recognized, as target track data, into the trained recurrent neural network and the corresponding classifier for recognition.
In another aspect, the invention provides a computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method described above.
In another aspect, the present invention provides a computer device comprising a memory and a processor, wherein the memory stores a computer program capable of running on the processor, wherein the processor implements the method described above when executing the program.
Technical effects
The trajectory recognition model training method and the corresponding trajectory recognition method adopt a new way of calculating the trajectory semantic vector, projecting trajectory segments of indefinite length into a vector space of definite length; original trajectory data that a recurrent neural network recognizes poorly can thereby be converted into trajectory data that a recurrent neural network recognizes effectively.
The track recognition model can learn track characteristics of different categories through training, and track categories are recognized through a recurrent neural network.
By introducing an attention mechanism, the method adaptively learns the weights of different trajectory points, highlighting the influence of important points and weakening that of unimportant points; taking these weights into account significantly improves the precision of trajectory recognition.
Drawings
The invention is illustrated and described by way of example, and not by way of limitation, in the accompanying drawings, in which:
Fig. 1 is a flowchart of the model training and trajectory recognition method according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a neural network structure used in the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions, design methods, and advantages of the present invention more apparent, the present invention will be further described in detail by specific embodiments with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The principles and embodiments of the present invention are described in detail below with reference to the accompanying drawings. Since the model training method and the trajectory recognition method differ only in the input data fed to the recurrent neural network, they are described together here.
As shown in Fig. 1, the model training and trajectory recognition method based on the attention recurrent neural network of Embodiment 1 comprises the following steps:
step 1: spatiotemporal trajectory data is obtained from a database.
The track data is read from the database and temporarily stored in memory or in a file.
Step 2: preprocessing the data.
The raw track data is preprocessed, e.g., denoised and deduplicated.
Step 3: slicing the data.
The track data is re-sliced. Collected tracks are generally recorded by day, which can place unrelated track position points into a single complete track. To solve this problem, in the present embodiment the original trajectory is subdivided at fixed time intervals (e.g., 4, 6, or 8 hours) into subsequences, each containing the trajectory data of one sub-period.
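A minimal sketch of this re-slicing, assuming time-ordered TrackPoint records with a numeric timestamp as in the sketch above; the interval length is a parameter.

```python
from itertools import groupby

def slice_trajectory(points, interval_hours=4):
    """Subdivide one recorded trajectory into sub-sequences by fixed time interval.

    Consecutive points whose timestamps fall into the same sub-period
    (e.g. the same 4-hour window) end up in the same sub-track.
    """
    seconds = interval_hours * 3600
    period = lambda pt: int(pt.t // seconds)   # index of the sub-period
    return [list(group) for _, group in groupby(points, key=period)]
```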
Step 4: encoding track point information.
Deep neural networks have difficulty processing trajectories expressed as raw latitude and longitude. Therefore, in the present invention each track location point is represented by a fixed-dimension vector v; such a point is called a check-in location point. The check-in location points learn context semantic information through embedding, which yields a vector representation of each point. The advantage of converting location points into embedded vectors is that points with similar semantics in geographic space are mapped to similar vectors in vector space, so real-world semantic information is expressed in the track point vectors.
In this embodiment, the vector representation of the location point is calculated as follows:
4.1. One-hot encode the track points. Assume the set of track points contained in all tracks is (p_1, p_2, …, p_n). Each track point is then encoded with a 1×n-dimensional vector: the track point with index i is encoded as [0, 0, …, 1, …, 0, 0], where the "1" sits at the i-th position and all other entries are 0. The symbol l(p_i) denotes the one-hot encoding of p_i.
4.2. Extract context information of the track points and build corpus pairs. For each track point, the k points before it and the k points after it in the original track are selected to form a corpus pair (p_i, C(p_i)), where p_i is the selected central track point and C(p_i) is the set of adjacent track points.
4.3. Construct the learning network and learn the track point information encodings. In one implementation, a neural network with the structure shown in Fig. 2 serves as the learning network for track point information encoding. The network has three layers: an input layer, a projection layer, and an output layer. The input layer receives the context points of a corpus pair; the projection layer sums the input vectors and applies one linear transformation, i.e., multiplies by a parameter W and adds an offset b; and the output layer predicts the central point p_i of the corpus pair. Both the input and the output of the learning network use the one-hot encodings of the track points. The corpus pairs constructed in step 4.2 are used to train the neural network, each corpus pair serving as one group of training data, until the network converges.
4.4. After the learning network has been trained, the target track data is input and the output-layer parameters of the learning network are extracted as the information encodings of the track points, denoted v(p_i).
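A hedged PyTorch sketch of steps 4.2-4.3: corpus pairs take the k neighbours on each side of each point, and a three-layer network (an embedding table standing in for the one-hot input plus projection layer, and a linear output layer with parameters W and b) predicts the central point from its context. The dimension 250 echoes the value reported later; the class names and k are illustrative.

```python
import torch
import torch.nn as nn

def build_corpus_pairs(track_ids, k=2):
    """Corpus pairs (p_i, C(p_i)): each point index with its k neighbours per side."""
    pairs = []
    for i, center in enumerate(track_ids):
        context = track_ids[max(0, i - k):i] + track_ids[i + 1:i + 1 + k]
        if context:
            pairs.append((center, context))
    return pairs

class PointEncoder(nn.Module):
    """Input layer (one-hot context), projection layer, and output layer."""
    def __init__(self, n_points, dim=250):
        super().__init__()
        self.proj = nn.Embedding(n_points, dim)   # one-hot input x table = lookup
        self.out = nn.Linear(dim, n_points)       # linear transform: parameters W, b

    def forward(self, context_ids):               # context_ids: (batch, 2k) indices
        summed = self.proj(context_ids).sum(dim=1)  # projection layer sums the context
        return self.out(summed)                   # logits for the central point p_i

# Training with nn.CrossEntropyLoss over the corpus pairs drives the network to
# convergence; the learned output-layer parameters then provide the encodings v(p_i).
```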
Step 5: calculating the track semantic vector with the recurrent neural network.
After the location points are mapped into vectors, the problem of tracks having indefinite length must still be solved. In this embodiment, a model based on a recurrent neural network (RNN) encodes the whole track into one vector, called the Track Semantic Vector (TSV). RNNs are a class of artificial neural networks in which the connections between nodes form a directed graph along a time series, which allows the model to exhibit temporal dynamic behavior. Unlike feed-forward neural networks, RNNs can process input sequences using their internal state (memory). Three RNN variants can be used to calculate the track vector: the Long Short-Term Memory network (LSTM), the Gated Recurrent Unit (GRU), and the Bidirectional Recurrent Neural Network (BRNN).
In this embodiment, the LSTM is used for track semantic vector evaluation. A typical LSTM unit consists of an input gate, an output gate, and a forgetting gate; the unit memorizes values over arbitrary time intervals, and the three gates regulate the flow of information into and out of the unit. The LSTM is designed to address the exploding and vanishing gradients that can be encountered in conventional RNN training. For each sub-track T = (l_1, l_2, …, l_k) obtained after slicing, let h_{t-1} denote the previous hidden state and h_t the current hidden state, substitute the track point information encodings v(p_t) obtained in step 4 as inputs, and evolve the recursion with the following formulas:
f_t = σ_g(W_f v(p_t) + U_f h_{t-1} + b_f)
i_t = σ_g(W_i v(p_t) + U_i h_{t-1} + b_i)
o_t = σ_g(W_o v(p_t) + U_o h_{t-1} + b_o)
c_t = f_t ∘ c_{t-1} + i_t ∘ σ_c(W_c v(p_t) + U_c h_{t-1} + b_c)
h_t = o_t ∘ σ_h(c_t)
during the training process, the initial value is c00 and h00, operator ° denotes a Hadamard product (elementary product). In addition, in the above formula, subscript t indexes time step; v (p)t) Is the input vector of the LSTM unit; f. oftIs the activation vector of the forgetting gate; i.e. itIs the activation vector that updates the gate; otIs the activation vector of the output gate; h istIs a hidden state vector, also called the output vector of the LSTM unit; c. CtIs the state vector of the LSTM unit; w, U and b are weight matrix and deviation vector parameters that need to be learned during trainingLearning; sigmagIs a sigmoid function; sigmacIs a hyperbolic tangent function; sigmahIs a hyperbolic tangent function. In this way, the RNN output after each recursive iteration is obtained. The average of each output is then used to evaluate the trajectory semantic vector, as shown below.
In this embodiment, the track semantic vector output formula is:

TSV = (1/k) · Σ_{t=1}^{k} h_t
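A minimal PyTorch sketch of step 5: the sequence of encodings v(p_t) runs through an LSTM, and the hidden states h_t are averaged into the track semantic vector. The hidden size is an illustrative assumption.

```python
import torch
import torch.nn as nn

class TSVEncoder(nn.Module):
    """Encode a variable-length sub-track into a fixed-length Track Semantic Vector."""
    def __init__(self, embed_dim=250, hidden_size=300):
        super().__init__()
        self.lstm = nn.LSTM(embed_dim, hidden_size, batch_first=True)

    def forward(self, v):        # v: (batch, k, embed_dim), the encodings v(p_t)
        h, _ = self.lstm(v)      # h: (batch, k, hidden_size), one h_t per time step
        return h.mean(dim=1)     # TSV = (1/k) * sum_t h_t
```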
Step 6: calculating the attention scores.
Step 5 above described in detail how a track is encoded into a vector and further into a track semantic vector, thereby associating the track with identification information.
In the prior art, a softmax function is generally used for classification with a recurrent neural network, and the loss function L can be calculated as:

L = cross_entropy(user, softmax(TSV))
However, directly judging the track category with the track semantic vector as the feature information raises two problems: (1) when the trajectory is too long, the original trajectory contains noise points, yet the noise points receive the same weight as normal points; (2) the RNN unit used in step 5 may gradually forget early track point information, so the information of the whole track is captured incompletely.
In view of these two problems, the present invention introduces an attention mechanism module. The attention mechanism does not discard the hidden states computed in the source RNN; instead it provides a way to let the decoder peek at them, treating them as a dynamic memory of the source information. By doing so, the attention mechanism improves the translation of long sentences. Attention mechanisms have since become a de facto standard and have been applied successfully to many other tasks, including image caption generation, speech recognition, and text summarization.
The inventors of the present application have found that the above two problems can be solved by introducing an attention mechanism module to evaluate the trajectory semantic vector.
In this embodiment, an attention-based trajectory semantic vector reevaluation model is established: the output at each time point is assigned a weight. Let a_t denote the weight of time point t, representing the different importance of each point. The calculation proceeds as follows:
1. Calculate an attention score e_t for each time point:

e_t = h_t^T · h̄

where h_t^T is the transpose of the h_t calculated in step 5, and h̄ is the mean of the h_t over all time points calculated in step 5.
2. Normalize the attention scores to obtain the weight a_t of each time point, using the softmax:

a_t = exp(e_t) / Σ_{j=1}^{k} exp(e_j)
Step 7: recalculating the trajectory vector.
Using the output values h_t calculated in step 5 and the weights a_t calculated in step 6, the track semantic vector is recalculated. The new track semantic vector, denoted TSV_attention, is calculated as:

TSV_attention = Σ_{t=1}^{k} a_t · h_t
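A minimal sketch of steps 6-7 together, assuming `h` holds the per-step outputs h_t from step 5: the score e_t = h_t^T · h̄ is computed against the mean state h̄, softmax-normalized into the weights a_t, and used to re-weight the states.

```python
import torch

def attention_tsv(h):
    """Re-evaluate the track semantic vector with attention.

    h: tensor of shape (k, hidden_size) holding h_1 ... h_k from step 5.
    """
    h_bar = h.mean(dim=0)                    # mean of all hidden states
    e = h @ h_bar                            # e_t = h_t^T . h_bar, shape (k,)
    a = torch.softmax(e, dim=0)              # normalized attention weights a_t
    return (a.unsqueeze(1) * h).sum(dim=0)   # TSV_attention = sum_t a_t * h_t
```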
and 8: and classifying the track vectors by utilizing a multilayer perceptron.
Multi-layer perceptrons (MLPs) are a class of feedforward Artificial Neural Networks (ANNs). The MLP consists of three layers of nodes: an input layer, a hidden layer, and an output layer. Each node, except the input nodes, is a neuron that uses a nonlinear activation function. MLPs are trained using a supervised learning technique called back propagation. Its multi-layered and non-linear activation distinguishes MLP from linear perceptrons. It is able to distinguish between data that are not linearly separable. For the trajectory semantic vector TSV, MLP (TSV) is used to represent the output of MLP as a perceptual result, as follows:
MLP(TSV) = tanh(W · TSV + b)
track identification is a multi-classification problem in nature, so the inventor classifies MLPs (TSVs) by using a softmax function, and the softmax function can acquire the element with the largest value in a vector and acquire the number corresponding to the largest value as the number corresponding to the track identification type.
Step 9: acquiring track identification information.
In this example, the open Python library Gensim (https://radimrehurek.com/gensim/) was used to train the information encodings of the check-in track points according to the procedure above. Three parameters matter in the embedding training: let vectorsize denote the dimension of the position embedding vector, windowsize the length of the context, and iterations the number of training iterations of the check-in position embedding model. The values of vectorsize and windowsize depend on the size of the trajectory training corpus; the model performed best with vectorsize = 250. Raising the number of iterations yields more stable embedding vectors, and the inventors found that beyond 100 iterations the embeddings converge. In addition, skip-gram was selected as the training algorithm for the embedding.
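A hedged usage sketch, assuming Gensim 4.x (where the relevant parameters are named vector_size, window, sg, and epochs) and the sliced sub-tracks from step 3; only the dimension 250, the skip-gram choice, and the ~100-iteration convergence come from the text above.

```python
from gensim.models import Word2Vec

# Each sliced sub-track is treated as a "sentence" of check-in point identifiers.
corpus = [[str(pt.p) for pt in track] for track in sliced_tracks]  # from step 3

model = Word2Vec(
    sentences=corpus,
    vector_size=250,   # embedding dimension; 250 reportedly performs best
    window=5,          # context length (illustrative value)
    sg=1,              # skip-gram training algorithm
    epochs=100,        # beyond ~100 iterations the embeddings converge
    min_count=1,
)
v_park = model.wv["park"]   # information encoding v(p_i) of one check-in point
```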
The inventors used cross-entropy as the loss function of the present invention and Adam as the optimization method, because Adam adaptively calculates the learning rate of each parameter.
Preferably, to reduce overfitting in the recurrent neural network, the inventors applied the regularization technique dropout (erasure) in step 5. It works as follows: during training, some units of a neural layer are randomly selected and temporarily hidden, the training and optimization step is carried out, and in the next training step other neurons are randomly hidden, and so on until training finishes; the proportion of randomly hidden neurons among all neurons of the network is called the dropout rate. Dropout is a very efficient way to achieve model averaging in neural networks, because it prevents complex co-adaptation on the training data. The inventors set the dropout rate to 0.5. Let hiddensize denote the size of the hidden layers in the recurrent neural network and attentionsize the size of the trajectory semantic vector; a grid-search strategy was used to select the optimal values of hiddensize and attentionsize. The initial learning rate was set to 0.00095 and gradually decreased to 0.0001. Extensive tests showed that the network converges after 30 training iterations.
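A hedged sketch of this training configuration in PyTorch; the two-layer depth, the hidden size, and the scheduler choice are illustrative assumptions, while the dropout rate of 0.5, Adam, cross-entropy, the learning rates, and the roughly 30 training iterations come from the text.

```python
import torch
import torch.nn as nn

encoder = nn.LSTM(input_size=250, hidden_size=300, num_layers=2,
                  dropout=0.5,    # dropout rate = 0.5, applied between recurrent layers
                  batch_first=True)
loss_fn = nn.CrossEntropyLoss()   # cross-entropy loss
optimizer = torch.optim.Adam(encoder.parameters(), lr=0.00095)  # initial learning rate
# Decay the learning rate from 0.00095 toward 0.0001 over ~30 training iterations.
scheduler = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=1.0, end_factor=0.0001 / 0.00095, total_iters=30)
```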
Embodiment 2
In another implementation, an approach similar to that of Embodiment 1 is used, except that in step 5 the GRU is used for trajectory semantic vector evaluation.
The Gated Recurrent Unit (GRU) is a gating mechanism in recurrent neural networks. A GRU resembles a Long Short-Term Memory (LSTM) unit with a forgetting gate, but has fewer parameters than an LSTM because it lacks an output gate. GRUs exhibit better performance on some smaller data sets.
Similar to the LSTM, the track semantic vector of a sub-track T is computed by averaging the hidden outputs of the GRU at every time node, using the following formulas.
z_t = σ_g(W_z v(p_t) + U_z h_{t-1} + b_z)
r_t = σ_g(W_r v(p_t) + U_r h_{t-1} + b_r)
h_t = (1 − z_t) ∘ h_{t-1} + z_t ∘ σ_h(W_h v(p_t) + U_h(r_t ∘ h_{t-1}) + b_h)
where v(p_t) is the input vector, h_t is the output vector, z_t is the update gate vector, r_t is the reset gate vector, and W, U, and b are the parameters to be learned. In the evaluation of the track semantic vector, the output vector at t = 0 is h_0 = 0. σ_g is the sigmoid function and σ_h is the hyperbolic tangent.
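A minimal sketch of this embodiment, mirroring the TSVEncoder of step 5 with the recurrent unit swapped for a GRU.

```python
import torch.nn as nn

class TSVEncoderGRU(nn.Module):
    """Embodiment 2: average the GRU hidden outputs of every time node into the TSV."""
    def __init__(self, embed_dim=250, hidden_size=300):
        super().__init__()
        self.gru = nn.GRU(embed_dim, hidden_size, batch_first=True)  # no output gate

    def forward(self, v):        # v: (batch, k, embed_dim)
        h, _ = self.gru(v)       # h_t for every time step, with h_0 = 0
        return h.mean(dim=1)     # averaged hidden outputs = track semantic vector
```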
Embodiment 3
In another implementation, an approach similar to that of Embodiment 1 is used, except that in step 5 a Bidirectional Recurrent Neural Network (BRNN) is used for trajectory semantic vector evaluation.
A bidirectional recurrent neural network connects two hidden layers running in opposite directions to the same output, so the output layer can obtain information from both past (backward) and future (forward) states. Bidirectional recurrent neural networks were introduced to increase the amount of input information available to the network: a standard recurrent neural network (RNN) is limited in that it cannot use future input for the current state, whereas a bidirectional recurrent neural network does not require its input data to be fixed and can access future input information from the current state. Bidirectional recurrent neural networks are particularly useful when the input context is needed; for example, in handwriting recognition, performance is enhanced by knowing the letters before and after the current letter.
To verify the effect of the present invention, the inventors used a pedestrian trajectory data set (download address: http://snap.stanford.edu/data/loc-brightkite.html) and measured precision (accuracy), recall, and the F1 score. Precision is calculated as:

precision = TP / (TP + FP)
the recall ratio is calculated as follows:
the F1 index is calculated as follows:
comparing the method of the present invention with the existing methods of LCSS (Long-Common-sequence), LDA (late Dirichlet allocation), SVM (support vector maps), TULER (project-User Link on Embedded Current Networks), TULVAE (project-User Link on Auto Encoding variant), etc., the comparison result is as follows:
as can be seen from the table, the track recognition model trained by the method and the corresponding recognition method can obviously improve the accuracy of track recognition.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
Claims (10)
1. A trajectory recognition model training method, characterized in that the method comprises:
s1, encoding target track data into vector-form information encodings by constructing corpus pairs based on one-hot encoding vectors;
s2, constructing a recurrent neural network for coding and converting the information into track semantic vectors and a classifier for classifying tracks based on the track semantic vectors;
and S3, training the recurrent neural network and the classifier by using the marked track data.
2. The trajectory recognition model training method according to claim 1, wherein the step S1 includes:
s1.1, carrying out vector-form one-hot coding on each track point in target track data;
s1.2, extracting adjacent track information of each track point and forming a corpus pair by the track point information;
s1.3, constructing a learning network for track point information coding, taking the corpus pair as the input of the learning network, and taking the output layer parameters of the learning network as the information coding of the corresponding track point.
3. The trajectory recognition model training method of claim 1, further comprising time slicing the target trajectory data.
4. The trajectory recognition model training method based on the attention recurrent neural network as claimed in claim 1, wherein the step S2 comprises: classifying the corresponding track, with a multilayer perceptron, based on the evaluation result of the track semantic vector.
5. The trajectory recognition model training method according to claim 1, wherein the step S2 includes:
s2.1, calculating a track semantic vector based on a recurrent neural network;
s2.2, establishing a trajectory semantic vector reevaluation model based on attention;
and S2.3, recalculating the track semantic vector by using the track semantic vector obtained in the step S2.1 and the track semantic vector reevaluation model constructed in the step S2.2.
6. The trajectory recognition model training method based on the attention recurrent neural network as claimed in claim 5, wherein the track semantic vector calculation process of step S2.1 further comprises: randomly erasing a number of neural network units in the recurrent neural network according to a certain proportion.
7. The trajectory recognition model training method according to claim 1, wherein the recurrent neural network unit is a long short-term memory network, a gated recurrent unit, or a bidirectional long short-term memory network.
8. A trajectory recognition method, comprising: inputting the trajectory data to be recognized as target trajectory data into a model obtained by the method according to any one of claims 1 to 7 for recognition.
9. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the method according to any one of claims 1 to 7.
10. A computer device comprising a memory and a processor, on which memory a computer program is stored which is executable on the processor, characterized in that the processor implements the method of any one of claims 1 to 7 when executing the program.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110029664.0A CN112766339A (en) | 2021-01-11 | 2021-01-11 | Trajectory recognition model training method and trajectory recognition method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110029664.0A CN112766339A (en) | 2021-01-11 | 2021-01-11 | Trajectory recognition model training method and trajectory recognition method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112766339A true CN112766339A (en) | 2021-05-07 |
Family
ID=75701262
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110029664.0A Pending CN112766339A (en) | 2021-01-11 | 2021-01-11 | Trajectory recognition model training method and trajectory recognition method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112766339A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112990154A (en) * | 2021-05-11 | 2021-06-18 | 腾讯科技(深圳)有限公司 | Data processing method, computer equipment and readable storage medium |
CN113240212A (en) * | 2021-07-09 | 2021-08-10 | 中航信移动科技有限公司 | Data processing method, electronic device and medium for generating flight trajectory |
CN117523382A (en) * | 2023-07-19 | 2024-02-06 | 石河子大学 | Abnormal track detection method based on improved GRU neural network |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180203852A1 (en) * | 2017-01-18 | 2018-07-19 | Xerox Corporation | Natural language generation through character-based recurrent neural networks with finite-state prior knowledge |
CN108629978A (en) * | 2018-06-07 | 2018-10-09 | 重庆邮电大学 | A kind of traffic trajectory predictions method based on higher-dimension road network and Recognition with Recurrent Neural Network |
CN110928993A (en) * | 2019-11-26 | 2020-03-27 | 重庆邮电大学 | User position prediction method and system based on deep cycle neural network |
CN111400620A (en) * | 2020-03-27 | 2020-07-10 | 东北大学 | User trajectory position prediction method based on space-time embedded Self-orientation |
CN111488984A (en) * | 2020-04-03 | 2020-08-04 | 中国科学院计算技术研究所 | Method for training trajectory prediction model and trajectory prediction method |
-
2021
- 2021-01-11 CN CN202110029664.0A patent/CN112766339A/en active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180203852A1 (en) * | 2017-01-18 | 2018-07-19 | Xerox Corporation | Natural language generation through character-based recurrent neural networks with finite-state prior knowledge |
CN108629978A (en) * | 2018-06-07 | 2018-10-09 | 重庆邮电大学 | A kind of traffic trajectory predictions method based on higher-dimension road network and Recognition with Recurrent Neural Network |
CN110928993A (en) * | 2019-11-26 | 2020-03-27 | 重庆邮电大学 | User position prediction method and system based on deep cycle neural network |
CN111400620A (en) * | 2020-03-27 | 2020-07-10 | 东北大学 | User trajectory position prediction method based on space-time embedded Self-orientation |
CN111488984A (en) * | 2020-04-03 | 2020-08-04 | 中国科学院计算技术研究所 | Method for training trajectory prediction model and trajectory prediction method |
Non-Patent Citations (3)
Title |
---|
TAO SUN et al.: "Trajectory-User Link with Attention Recurrent Networks", 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR) *
YONG YU et al.: "TULSN: Siamese Network for Trajectory-user Linking", 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) *
HU Zheng et al.: "Location prediction model based on user behavior sequence features", Journal of Beijing University of Posts and Telecommunications *
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112990154A (en) * | 2021-05-11 | 2021-06-18 | 腾讯科技(深圳)有限公司 | Data processing method, computer equipment and readable storage medium |
CN112990154B (en) * | 2021-05-11 | 2021-07-30 | 腾讯科技(深圳)有限公司 | Data processing method, computer equipment and readable storage medium |
CN113240212A (en) * | 2021-07-09 | 2021-08-10 | 中航信移动科技有限公司 | Data processing method, electronic device and medium for generating flight trajectory |
CN113240212B (en) * | 2021-07-09 | 2021-09-28 | 中航信移动科技有限公司 | Data processing method, electronic device and medium for generating flight trajectory |
CN117523382A (en) * | 2023-07-19 | 2024-02-06 | 石河子大学 | Abnormal track detection method based on improved GRU neural network |
CN117523382B (en) * | 2023-07-19 | 2024-06-04 | 石河子大学 | Abnormal track detection method based on improved GRU neural network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112084331B (en) | Text processing and model training method and device, computer equipment and storage medium | |
CN107506740B (en) | Human body behavior identification method based on three-dimensional convolutional neural network and transfer learning model | |
Sillito et al. | Semi-supervised learning for anomalous trajectory detection | |
Chakraborty et al. | A fully spiking hybrid neural network for energy-efficient object detection | |
Wang et al. | Research on Healthy Anomaly Detection Model Based on Deep Learning from Multiple Time‐Series Physiological Signals | |
CN112766339A (en) | Trajectory recognition model training method and trajectory recognition method | |
CN113570859B (en) | Traffic flow prediction method based on asynchronous space-time expansion graph convolution network | |
Karamouzas et al. | Crowd space: a predictive crowd analysis technique | |
CN111709754A (en) | User behavior feature extraction method, device, equipment and system | |
CN110659742A (en) | Method and device for acquiring sequence representation vector of user behavior sequence | |
CN110956309A (en) | Flow activity prediction method based on CRF and LSTM | |
Yeganejou et al. | Classification via deep fuzzy c-means clustering | |
Wang et al. | A novel multiface recognition method with short training time and lightweight based on ABASNet and H-softmax | |
Fan et al. | Entropy‐based variational Bayes learning framework for data clustering | |
CN111881299A (en) | Outlier event detection and identification method based on duplicate neural network | |
Kopčan et al. | Anomaly detection using autoencoders and deep convolution generative adversarial networks | |
CN117636477A (en) | Multi-target tracking matching method based on radial basis function fuzzy neural network | |
CN116208399A (en) | Network malicious behavior detection method and device based on metagraph | |
CN111709442A (en) | Multilayer dictionary learning method for image classification task | |
Zhang et al. | VESC: a new variational autoencoder based model for anomaly detection | |
CN116167353A (en) | Text semantic similarity measurement method based on twin long-term memory network | |
Rallis et al. | Bidirectional long short-term memory networks and sparse hierarchical modeling for scalable educational learning of dance choreographies | |
CN114116692A (en) | Missing POI track completion method based on mask and bidirectional model | |
Ramadhan et al. | Comparative analysis of various optimizers on residual network architecture for facial expression identification | |
Zhengfeng | Accurate recognition method of continuous sports action based on deep learning algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20210507 |