CN114387313A - Motion trajectory prediction method, device, equipment and storage medium

Info

Publication number: CN114387313A
Application number: CN202210016155.9A
Authority: CN
Inventors: 周斌; 胡波; 李艳红; 张子涵; 安宁
Original assignee: Wuhan Etah Information Technology Co ltd
Current assignee: Wuhan Etah Information Technology Co ltd
Priority date: 2022-01-07
Filing date: 2022-01-07
Publication date: 2022-04-22

Abstract

The invention discloses a motion trajectory prediction method, device, equipment and storage medium in the field of trajectory prediction. The method comprises the following steps: collecting position information and speed information of an observed object within a set time to obtain a position hidden state and a speed hidden state of the observed object at each moment; using an attention mechanism to assign weights based on the degree of influence so as to correct the speed hidden state; and connecting the corrected speed hidden state with the position hidden state to form a final context vector, outputting the final context vector, and decoding it to generate a predicted motion trajectory. The invention can improve the accuracy of trajectory prediction.

Description

Motion trajectory prediction method, device, equipment and storage medium

Technical Field

The invention relates to the field of trajectory prediction, in particular to a motion trajectory prediction method, a motion trajectory prediction device, motion trajectory prediction equipment and a storage medium.

Background

In recent years, with progress in computer vision and artificial intelligence, human trajectory prediction has become an active research topic in the computer vision community. Trajectory prediction models past motion trajectories in order to predict the trajectory over a future period. Pedestrian trajectory prediction is the foundation and focus of research in this field, and as techniques for understanding human behavior and processing trajectories have matured, pedestrian trajectory prediction methods have been widely applied in robot navigation, autonomous driving, intelligent video surveillance and similar fields.

Existing work on pedestrian trajectory prediction can be divided into methods based on traditional models and methods based on deep learning. Pedestrian trajectory prediction can be regarded as a typical sequence-to-sequence (seq2seq) problem, so recurrent neural networks (RNNs), which are good at handling time series, have attracted researchers' attention. However, because of vanishing or exploding gradients, a simple RNN has difficulty remembering long-term input information. Researchers therefore designed the long short-term memory (LSTM) network, which is good at processing data with long-term dependencies. In particular, the successful application of LSTM to sequence data tasks such as speech recognition, language translation and image captioning provides a necessary basis for pedestrian trajectory prediction.

Today, various trajectory prediction models and algorithms are also applied to predicting the trajectories of athletes. Compared with pedestrian trajectories, predicting an athlete's motion trajectory is a formidable challenge, because each athlete's choice of next move depends not only on their own intent but also on the positions, directions of motion and speeds of the other athletes. These factors cannot be observed directly and can only be estimated from past information.

Disclosure of Invention

In view of the defects in the prior art, the first aspect of the present invention provides a motion trajectory prediction method, which can improve the accuracy of trajectory prediction.

In order to achieve the above purposes, the technical scheme adopted by the invention is as follows:

a motion trajectory prediction method, the method comprising the steps of:

acquiring position information and speed information of an observed object within set time to acquire a position hidden state and a speed hidden state of the observed object at each moment;

using an attention mechanism to assign weights based on the degree of influence so as to correct the speed hidden state;

and connecting the corrected speed hidden state and the position hidden state to form a final context vector, outputting the final context vector, and decoding to generate a predicted motion track.

In some embodiments, the acquiring the position information and the speed information of the observed object within the set time to obtain the position hidden state and the speed hidden state of the observed object at each time includes:

embedding the position information and speed information of the observed object into vectors by using a multi-layer perceptron (MLP), where the outputs are the position feature vector and the relative speed feature vector at time t, $W_e$ is the corresponding weight, and the inputs are the position information $P_t^i$ and the speed information of observed object i at time t;

sequentially taking the position feature vector and the relative speed feature vector obtained at each moment as input vectors of a position-velocity long short-term memory network (PV-LSTM), which yields the position hidden state and the speed hidden state of observed object i at time t, each computed with its corresponding weight;

aggregating the position hidden states and the speed hidden states of observed object i over all moments to obtain $A^i$ and $B^i$, where $A^i$ is the position hidden state of observed object i at each moment and $B^i$ is the speed hidden state of observed object i at each moment.
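One plausible way to write the embedding and PV-LSTM steps above, offered as a sketch under assumed notation rather than as the patent's own formulas. The symbols $e_t^{p,i}$, $e_t^{u,i}$ (feature vectors), $h_t^{p,i}$, $h_t^{u,i}$ (hidden states), $U_t^i$ (speed information) and $W_p$, $W_u$ (LSTM weights) are hypothetical names; only $W_e$, $P_t^i$, $A^i$, $B^i$ and $T_S$ come from the text.

```latex
\begin{aligned}
e_t^{p,i} &= \mathrm{MLP}\!\left(P_t^{i};\, W_e\right), &
e_t^{u,i} &= \mathrm{MLP}\!\left(U_t^{i};\, W_e\right), \\
h_t^{p,i} &= \mathrm{LSTM}\!\left(h_{t-1}^{p,i},\, e_t^{p,i};\, W_p\right), &
h_t^{u,i} &= \mathrm{LSTM}\!\left(h_{t-1}^{u,i},\, e_t^{u,i};\, W_u\right), \\
A^{i} &= \left[h_1^{p,i}, \ldots, h_{T_S}^{p,i}\right], &
B^{i} &= \left[h_1^{u,i}, \ldots, h_{T_S}^{u,i}\right].
\end{aligned}
```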

In some embodiments, using the attention mechanism to assign weights based on the degree of influence so as to correct the speed hidden state includes:

calculating the weight that observed object i assigns, at time t, to its j-th speed hidden state; and

correcting $B^i$ by applying these weights to its speed hidden states, where j indexes the speed hidden states of observed object i and $T_S$ denotes the moment at which observation ends.

In some embodiments, calculating the weight that observed object i assigns, at time t, to its j-th speed hidden state includes:

calculating a scoring function for each speed hidden state; and

computing the weight from the scores,

where the inputs are the hidden state output by the decoder for observed object i at time t-1 and the speed hidden states of observed object i, $W_{fc}$ is the weight of the fully connected layer, $v^T$ is a learnable parameter, and the index k of the k-th speed hidden state ranges over $[1, T_S]$.
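A common additive-attention form that matches the quantities named above, given as a sketch rather than the patent's exact formulas. Here $s_{t-1}^{i}$ (decoder hidden state at time t-1), $h_j^{u,i}$ (j-th speed hidden state), $\alpha_{tj}^{i}$ (weight) and $\tilde{B}_t^{i}$ (corrected speed state) are assumed names, and the use of $\tanh$ and concatenation is an assumption; $W_{fc}$, $v^T$ and $T_S$ come from the text.

```latex
\begin{aligned}
\mathrm{score}\!\left(s_{t-1}^{i},\, h_j^{u,i}\right) &= v^{T} \tanh\!\left(W_{fc}\left[s_{t-1}^{i};\, h_j^{u,i}\right]\right), \\
\alpha_{tj}^{i} &= \frac{\exp\!\left(\mathrm{score}(s_{t-1}^{i}, h_j^{u,i})\right)}{\sum_{k=1}^{T_S} \exp\!\left(\mathrm{score}(s_{t-1}^{i}, h_k^{u,i})\right)}, \\
\tilde{B}_t^{i} &= \sum_{j=1}^{T_S} \alpha_{tj}^{i}\, h_j^{u,i}.
\end{aligned}
```

Under this form the weights at each decoding step are positive and sum to one over the $T_S$ observed time steps.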

In some embodiments, connecting the corrected speed hidden state with the position hidden state to form a final context vector, and outputting the final context vector for decoding to generate the predicted motion trajectory, includes:

obtaining the final context vector $C^i$ through a fully connected layer with a non-linearity, where $W_c$ is a weight matrix; and

decoding to generate the predicted motion trajectory, where the decoder inputs are the output predicted by the decoder at the previous time step and the final context vector at time t, and FC is the fully connected layer.
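One way the context and decoding steps above could be written, as a sketch under stated assumptions. The names $\phi$ (the fully connected layer with non-linearity), $\tilde{B}_t^{i}$ (corrected speed hidden state), $h_{T_S}^{p,i}$ (last position hidden state), $s_t^{i}$ (decoder state) and $\hat{Y}_t^{i}$ (predicted output) are hypothetical; the choice of the last position hidden state and of an LSTM decoder cell is also an assumption. $C^i$, $W_c$ and FC come from the text.

```latex
\begin{aligned}
C_t^{i} &= \phi\!\left(\left[h_{T_S}^{p,i};\, \tilde{B}_t^{i}\right];\, W_c\right), \\
s_t^{i} &= \mathrm{LSTM}\!\left(s_{t-1}^{i},\, \left[\hat{Y}_{t-1}^{i};\, C_t^{i}\right]\right), \\
\hat{Y}_t^{i} &= \mathrm{FC}\!\left(s_t^{i}\right).
\end{aligned}
```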

A second aspect of the present invention provides a motion trajectory prediction apparatus that can improve the accuracy of trajectory prediction.

a motion trajectory prediction apparatus comprising:

the encoder module is used for acquiring the position hiding state and the speed hiding state of the observation object at each moment according to the position information and the speed information of the observation object acquired within the set time;

the attention module is used for distributing weight based on the influence degree by utilizing an attention mechanism so as to correct the speed hidden state, and connecting the corrected speed hidden state with the position hidden state to form a final context vector for outputting;

a decoder module to receive the final context vector and decode to generate a predicted motion trajectory.

In some embodiments, the encoder module is configured to embed the position information and the speed information of observed object i at time t into feature vectors, including the position feature vector at time t, and to derive from them the position hidden state and the speed hidden state of observed object i at time t, each computed with its corresponding weight.

In some embodiments, the attention module is configured to correct $B^i$ by applying the attention weights to its speed hidden states, where j indexes the speed hidden states of observed object i and $T_S$ denotes the moment at which observation ends.

A third aspect of the present invention provides a computer apparatus that can improve the accuracy of trajectory prediction.

a computer device comprising a processor, a memory, and a computer program stored on the memory and executable by the processor, wherein the computer program, when executed by the processor, implements the steps of the method as described above.

A fourth aspect of the present invention provides a computer-readable storage medium that can improve the accuracy of trajectory prediction.

a computer-readable storage medium, having stored thereon a computer program, wherein the computer program, when executed by a processor, performs the steps of the method as described above.

Compared with the prior art, the invention has the advantages that:

In the motion trajectory prediction method of the invention, the attention mechanism assigns larger weights to the sequence positions that most influence the prediction, which makes the prediction more accurate. The method is therefore of practical value for predicting short track speed skating trajectories, particularly trajectories through curves.

Drawings

FIG. 1 is a diagram of the camera layout of an ice rink in the prior art;

FIG. 2 is a flow chart of a motion trajectory prediction method according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a motion trajectory prediction apparatus according to an embodiment of the present invention;

FIG. 4 is a schematic block diagram of the structure of a computer device according to an embodiment of the present invention.

Detailed Description

It should be noted that, for human trajectory prediction, traditional models have been proposed in the prior art, such as optical-flow Kalman filtering. This model is more accurate than conventional Kalman filtering but is limited to pedestrians moving slowly at constant speed. More generally, traditional models rely on manually specified pedestrian attributes and functions, are suitable only when pedestrians interact little, and have gradually been surpassed by data-driven deep learning models.

As for deep learning models, as described above, a simple RNN has difficulty remembering long-term input information because of vanishing or exploding gradients. Researchers therefore designed the long short-term memory (LSTM) network, which is good at processing data with long-term dependencies. In particular, the successful application of LSTM to sequence data tasks such as speech recognition, language translation and image captioning provides a necessary basis for pedestrian trajectory prediction.

A Social-LSTM model has been proposed in the prior art. In this model, the hidden states of pedestrians in a neighborhood, determined by spatial distance, are shared, so that information about a pedestrian's surroundings captures the influence of other pedestrians on the target pedestrian's trajectory. However, the Social-LSTM model is limited in its use of important scene context information. For this reason, the Deep Stochastic Inverse Optimal Control RNN Encoder-Decoder (DESIRE) framework was later proposed, in which the scene context is used for ranking and refinement instead of scene information being incorporated directly into the trajectory prediction. The Social-LSTM model has also been extended with a context pooling layer, which enables the neural network to learn how obstacles affect pedestrian motion.

Today, various trajectory prediction models and algorithms are also applied to predicting the trajectories of athletes. Compared with pedestrian trajectories, predicting an athlete's motion trajectory is a formidable challenge, because each athlete's choice of next move depends not only on their own intent but also on the positions, directions of motion and speeds of the other athletes. These factors cannot be observed directly and can only be estimated from past information. Especially in highly competitive sports such as football, basketball or short track speed skating, trajectory prediction plays a key role: improving prediction accuracy is very important for fully understanding the positions and movement patterns of one's own players and of opponents, for gaining a tactical advantage during a game, and for accurately analyzing game data afterwards.

Based on the above analysis, the embodiments of the present invention apply trajectory prediction to highly competitive sports such as short track speed skating, with the aim of predicting and analyzing athletes' motion trajectories. The analysis of short track speed skating trajectories belongs to the field of trajectory prediction and can draw on current theory and methods of pedestrian trajectory prediction.

It is worth mentioning that the motion characteristics of short track speed skaters differ from those of pedestrians mainly as follows:

short track speed skaters all move in the same direction, whereas a pedestrian's direction of motion is not fixed and is influenced by the scene and by other pedestrians;

short track speed skaters move faster than pedestrians walk, and their speed changes more frequently; the athlete's speed information is therefore treated as an important input in the embodiments of the invention;

the motion trajectories of short track speed skaters are more regular than those of pedestrians.

Short track speed skating trajectories are regular and can be roughly divided into straight segments and curve segments. However, referring to FIG. 1, in short track speed skating training or competition, six high-definition panoramic cameras shoot simultaneously from above the rink in order to record the movement of every athlete across the whole ice surface. When the images from the six cameras are processed and combined into a single video, athletes cross the boundaries between camera views in a very short time because of their high speed. With frequent occlusion and position swaps among athletes, trajectory mismatches at these boundaries are hard to avoid and easily lead to disordered trajectories in subsequent prediction.

To solve the above problems, embodiments of the present invention provide a Position-Velocity-LSTM (PV-LSTM) trajectory prediction model based on an LSTM encoder-decoder framework. The model applies trajectory prediction to short track speed skating and uses the athletes' motion trajectories in real training or competition to predict their future trajectories.

It is worth noting that the PV-LSTM uses a separate position LSTM and speed LSTM in the encoder module to process position and speed information respectively. An attention mechanism is introduced between the encoder and the decoder to weight the speed information according to its influence on the trajectory, which improves the accuracy of trajectory prediction. Finally, the trajectory is predicted in the decoder module.

For the purpose of making the purpose, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Referring to fig. 2, an embodiment of the present invention provides a motion trajectory prediction method, including the following steps:

S1, collecting position information and speed information of an observed object within a set time to obtain a position hidden state and a speed hidden state of the observed object at each moment.

It is worth noting that, in a sports match or training session, the i-th player on the field is denoted i. At time t, each player in the scene is represented by 2D coordinates $(x_t, y_t)$. The position of each player is observed from $t = 1$ to $t = T_S$ in order to predict the player's trajectory from $t = T_S$ to $t = T_P$, where $T_S$ and $T_P$ denote the time at which observation ends and the time at which prediction ends, respectively. This gives an observed trajectory $P_S = [(x_1, y_1), \ldots, (x_s, y_s)]$, where x and y denote the lateral and longitudinal positions. As for speed information, a short track speed skater's decisions depend more on relative speed than on absolute speed, so the speed of each surrounding skater relative to the target player is chosen as input: $U_S = [(u_1, v_1), \ldots, (u_s, v_s)]$, where u and v denote the lateral and longitudinal speeds.

The historical information therefore consists of the position information and the speed information of the i-th player at time t, where i denotes the i-th player.
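The relative-speed input described above can be illustrated with a small sketch. It assumes that speeds are obtained by finite differences of the observed 2D positions with a fixed frame interval `dt`; the function names, the sample tracks and the finite-difference choice are illustrative assumptions, not taken from the patent.

```python
def velocities(track, dt):
    """Finite-difference velocity (u, v) between consecutive (x, y) samples."""
    return [((x1 - x0) / dt, (y1 - y0) / dt)
            for (x0, y0), (x1, y1) in zip(track, track[1:])]


def relative_velocities(neighbor_track, target_track, dt):
    """Velocity of a surrounding skater relative to the target player."""
    nb = velocities(neighbor_track, dt)
    tg = velocities(target_track, dt)
    return [(un - ut, vn - vt) for (un, vn), (ut, vt) in zip(nb, tg)]


# Hypothetical observed tracks (x, y) sampled every 0.5 s.
target = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
neighbor = [(0.0, 1.0), (1.5, 1.0), (3.0, 1.5)]

U_S = relative_velocities(neighbor, target, dt=0.5)
print(U_S)  # [(1.0, 0.0), (1.0, 1.0)]
```

A track of n positions yields n - 1 velocity samples under this scheme, so the position and speed sequences would need to be aligned before being fed to an encoder.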

Furthermore, the hidden state in step S1 refers to the internal state variables of the recurrent neural network: the input layer passes the input to the hidden layer, the hidden state is used to compute the result, and the result is then passed to the output layer.

In combination with the above description, in a specific implementation, step S1 embeds the position information and the speed information of observed object i at time t into feature vectors, including the position feature vector at time t, and derives from them the position hidden state and the speed hidden state of observed object i at time t, each computed with its corresponding weight.
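Step S1 can be sketched in plain Python as two independent encoders, one for positions and one for speeds. This is a minimal illustration under stated assumptions: the embedding is a single linear layer with ReLU, the LSTM cell has no bias terms, the weights are random rather than trained, and the names `TinyLSTM`, `embed` and `encode` as well as the dimensions 4 and 8 are hypothetical.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyLSTM:
    """Bias-free LSTM cell; one weight matrix per gate, acting on [x; h]."""

    def __init__(self, in_dim, hid_dim, rng):
        self.hid_dim = hid_dim
        self.W = {g: rand_matrix(hid_dim, in_dim + hid_dim, rng) for g in "ifog"}

    def step(self, x, h, c):
        z = x + h  # list concatenation of input and previous hidden state
        i = [sigmoid(a) for a in matvec(self.W["i"], z)]    # input gate
        f = [sigmoid(a) for a in matvec(self.W["f"], z)]    # forget gate
        o = [sigmoid(a) for a in matvec(self.W["o"], z)]    # output gate
        g = [math.tanh(a) for a in matvec(self.W["g"], z)]  # candidate cell
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


def embed(W_e, x):
    """Assumed embedding: one linear layer followed by ReLU."""
    return [max(0.0, a) for a in matvec(W_e, x)]


def encode(seq, W_e, lstm):
    """Return the hidden state at every observed time step."""
    h = [0.0] * lstm.hid_dim
    c = [0.0] * lstm.hid_dim
    states = []
    for x in seq:
        h, c = lstm.step(embed(W_e, list(x)), h, c)
        states.append(h)
    return states


rng = random.Random(0)
positions = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.5)]   # hypothetical (x, y)
speeds = [(1.0, 0.5), (1.0, 1.0), (1.2, 1.1)]      # hypothetical (u, v)
A = encode(positions, rand_matrix(4, 2, rng), TinyLSTM(4, 8, rng))  # position hidden states
B = encode(speeds, rand_matrix(4, 2, rng), TinyLSTM(4, 8, rng))     # speed hidden states
```

Here `A` and `B` play the role of the per-moment position and speed hidden states: each holds one 8-dimensional vector per observed time step.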

S2, using an attention mechanism to assign weights based on the degree of influence so as to correct the speed hidden state.

It is worth noting that the $B^i$ output by a conventional encoder cannot fully represent all of the speed state information within $T_S$. The encoder-decoder model has an inherent limitation: information from earlier inputs is diluted or overwritten by later inputs, and the problem becomes more severe as the input sequence grows longer.

To solve this problem, the embodiment of the present invention employs an attention mechanism, whose core idea is to select a more appropriate context vector at each step of the decoding process. In the embodiment of the invention, speed information at different times influences the future trajectory to different degrees, and the attention mechanism assigns larger weights to the sequence positions that most influence the prediction, which makes the prediction more accurate.

Specifically, step S2 includes:

S21, calculating the weight that observed object i assigns, at time t, to its j-th speed hidden state.

Specifically, step S21 includes calculating a scoring function for each speed hidden state and then computing the weight from the scores, where j indexes the speed hidden states of observed object i and the index k of the k-th speed hidden state ranges over $[1, T_S]$.

S22, correcting $B^i$ by applying these weights to its speed hidden states.
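Steps S21 and S22 can be illustrated with a small additive-attention sketch. It assumes a tanh score over the concatenation of the previous decoder state and each speed hidden state, followed by a softmax; this form, the function names and all numeric values are illustrative assumptions rather than the patent's exact formulas.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attention_weights(s_prev, B, W_fc, v):
    """Score each speed hidden state against s_prev, then softmax the scores."""
    scores = []
    for h_j in B:
        hidden = [math.tanh(z) for z in matvec(W_fc, s_prev + h_j)]
        scores.append(sum(vk * hk for vk, hk in zip(v, hidden)))
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def corrected_speed_state(alphas, B):
    """Weighted sum of the speed hidden states under the attention weights."""
    dim = len(B[0])
    return [sum(a * h[d] for a, h in zip(alphas, B)) for d in range(dim)]


# Toy values: three observed time steps, 2-dimensional hidden states.
B = [[0.1, 0.0], [0.0, 0.2], [0.3, 0.3]]
s_prev = [0.5, -0.5]                      # decoder hidden state at time t-1
W_fc = [[0.1, 0.2, 0.3, 0.4],
        [0.0, -0.1, 0.2, 0.1],
        [0.3, 0.0, -0.2, 0.5]]            # fully connected layer weights
v = [1.0, -1.0, 0.5]                      # learnable scoring vector

alphas = attention_weights(s_prev, B, W_fc, v)
b_tilde = corrected_speed_state(alphas, B)
```

Because the weights are recomputed from the decoder state at every prediction step, a different mix of past speed states can be emphasized for each predicted point.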

S3, connecting the corrected speed hidden state with the position hidden state to form a final context vector, outputting the final context vector, and decoding it to generate a predicted motion trajectory.

Specifically, the final context vector $C^i$ is obtained through a fully connected layer with a non-linearity, whose output is the final context vector and where $W_c$ is a weight matrix.

Decoding then generates the predicted motion trajectory, taking as input the output predicted by the decoder at the previous time step.

It is worth noting that the output of the LSTM decoder at one time step is passed as input to the LSTM decoder at the next time step. That is, because time step t carries the position and speed information used for time step t+1, the position and speed information is re-weighted and updated before being input at the next time step.
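The feedback loop described above, in which each decoder output becomes the next input and the context is recomputed at every step, can be sketched as follows. The `decode` helper and the toy `toy_context` and `toy_step` stand-ins are hypothetical; a real decoder would use a trained LSTM cell and attention module in their place.

```python
def decode(last_obs, state, n_steps, context_fn, step_fn):
    """Roll the decoder forward, feeding each prediction back in as the next input."""
    y_prev, preds = last_obs, []
    for _ in range(n_steps):
        context = context_fn(state)  # context is recomputed at every step
        y_prev, state = step_fn(y_prev, context, state)
        preds.append(y_prev)
    return preds


# Toy stand-ins (hypothetical): the "context" is a fixed displacement per step
# and the "decoder" adds it to the previous position.
def toy_context(state):
    return (1.0, 0.5)


def toy_step(y_prev, context, state):
    y = (y_prev[0] + context[0], y_prev[1] + context[1])
    return y, state


trajectory = decode((0.0, 0.0), None, 3, toy_context, toy_step)
print(trajectory)  # [(1.0, 0.5), (2.0, 1.0), (3.0, 1.5)]
```

Passing the context and step functions as parameters keeps the loop independent of any particular attention or cell implementation.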

In summary, in the motion trajectory prediction method of the present invention, the attention mechanism assigns larger weights to the sequence positions that most influence the prediction, which makes the prediction more accurate. The method is therefore of practical value for predicting short track speed skating trajectories, particularly trajectories through curves.

Meanwhile, the motion trajectory prediction apparatus according to an embodiment of the present invention may be configured as shown in fig. 3, and includes an Encoder module (Encoder), an Attention module (Attention), and a Decoder module (Decoder).

The encoder module acquires the position hiding state and the speed hiding state of the observation object at each moment according to the position information and the speed information of the observation object collected within the set time.

And the attention module is used for distributing weight based on the influence degree by utilizing an attention mechanism so as to correct the speed hidden state, and connecting the corrected speed hidden state and the position hidden state to form a final context vector for outputting.

A decoder module is configured to receive the final context vector and decode it to generate a predicted motion trajectory.

In some embodiments, the encoder module is configured to embed the position information and the speed information of observed object i into the position feature vector and the relative speed feature vector at time t, where $W_e$ is the corresponding weight, and to derive from them the position hidden state and the speed hidden state of observed object i at time t, each computed with its corresponding weight.

In some embodiments, the attention module is configured to correct $B^i$ by applying the attention weights to its speed hidden states.

In some embodiments, the attention module calculates the weight that observed object i assigns, at time t, to its j-th speed hidden state by calculating a scoring function for each speed hidden state and then computing the weight from the scores, where j indexes the speed hidden states of observed object i.

In some embodiments, the attention module is configured to obtain the final context vector $C^i$ through a fully connected layer with a non-linearity, so that the final context vector is output and input to the decoder module, where $W_c$ is a weight matrix;

the decoder module is configured to decode to generate the predicted motion trajectory, taking as input the output predicted by the decoder at the previous time step.

In summary, the motion trajectory prediction apparatus of the present invention employs the attention mechanism, which assigns larger weights to the sequence positions that most influence the prediction and so makes the prediction more accurate. The apparatus is therefore of practical value for predicting short track speed skating trajectories, particularly trajectories through curves.

The apparatus provided by the above embodiments may be implemented in the form of a computer program, which can be run on a computer device as shown in fig. 4.

Referring to fig. 4, fig. 4 is a schematic block diagram of a computer device according to an embodiment of the present disclosure. The computer device may be a terminal.

As shown in fig. 4, the computer device includes a processor, a memory, and a network interface connected by a system bus, wherein the memory may include a nonvolatile storage medium and an internal memory.

The non-volatile storage medium may store an operating system and a computer program. The computer program comprises program instructions which, when executed, cause a processor to perform any of the methods.

The processor is used for providing calculation and control capability and supporting the operation of the whole computer equipment.

The internal memory provides an environment for the execution of a computer program on a non-volatile storage medium, which when executed by a processor causes the processor to perform any of the methods.

The network interface is used for network communication, such as sending assigned tasks and the like. Those skilled in the art will appreciate that the architecture shown in fig. 4 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.

It should be understood that the Processor may be a Central Processing Unit (CPU), and the Processor may be other general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, etc. Wherein a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

In one embodiment, the processor is configured to execute a computer program stored in the memory to implement steps including: collecting position information and speed information of an observed object within a set time to obtain a position hidden state and a speed hidden state of the observed object at each moment.

In one embodiment, the processor is configured to implement: embedding the position information and speed information of the observed object into vectors by using a multi-layer perceptron (MLP), where the outputs are the position feature vector and the relative speed feature vector at time t, $W_e$ is the corresponding weight, and the inputs are the position information $P_t^i$ and the speed information of observed object i at time t; and deriving from these vectors the position hidden state and the speed hidden state of observed object i at time t, each computed with its corresponding weight, where $A^i$ is the position hidden state of observed object i at each moment and $B^i$ is the speed hidden state of observed object i at each moment.

In one embodiment, the processor is configured to implement: calculating the weight that observed object i assigns, at time t, to its j-th speed hidden state; and correcting $B^i$ by applying these weights to its speed hidden states.

In one embodiment, the processor is configured to implement: calculating a scoring function for each speed hidden state and computing the weight from the scores, where j indexes the speed hidden states of observed object i; obtaining the final context vector $C^i$; and decoding to generate the predicted motion trajectory, taking as input the output predicted by the decoder at the previous time step.

Embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, where the computer program includes program instructions, and a method implemented when the program instructions are executed may refer to the embodiments of the present application.

The computer-readable storage medium may be an internal storage unit of the computer device described in the foregoing embodiment, for example, a hard disk or a memory of the computer device. The computer readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the computer device.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.

The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments. While the invention has been described with reference to specific embodiments, the scope of the invention is not limited thereto, and those skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the invention. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A motion trail prediction method is characterized by comprising the following steps:

2. The motion trajectory prediction method according to claim 1, wherein collecting the position information and the speed information of the observed object within a set time to obtain the position hidden state and the speed hidden state of the observed object at each moment comprises: embedding the position information and the speed information of observed object i at time t into feature vectors, including the position feature vector at time t; and deriving from them the position hidden state and the speed hidden state of observed object i at time t, each computed with its corresponding weight.

3. The method according to claim 2, wherein using an attention mechanism to assign weights based on the degree of influence so as to correct the speed hidden state comprises: correcting $B^i$ by applying the attention weights to its speed hidden states.

4. The method according to claim 3, wherein calculating the weight that observed object i assigns, at time t, to its j-th speed hidden state comprises: calculating a scoring function for each speed hidden state; and computing the weight from the scores.

5. The method according to claim 4, wherein connecting the corrected speed hidden state with the position hidden state to form a final context vector, and outputting the final context vector for decoding to generate the predicted motion trajectory, comprises:

obtaining the final context vector $C^i$ through a fully connected layer with a non-linearity, where $W_c$ is a weight matrix; and

decoding to generate the predicted motion trajectory, taking as input the output predicted by the decoder at the previous time step.

6. A motion trajectory prediction apparatus, comprising:

7. The motion trajectory prediction device of claim 6, wherein the encoder module is configured to: embed the position information and the speed information of observed object i at time t into feature vectors, including the position feature vector at time t; and derive from them the position hidden state and the speed hidden state of observed object i at time t, each computed with its corresponding weight.

8. The motion trajectory prediction device according to claim 7, wherein the attention module is configured to correct $B^i$ by applying the attention weights to its speed hidden states.

9. A computer arrangement comprising a processor, a memory, and a computer program stored on the memory and executable by the processor, wherein the computer program, when executed by the processor, performs the steps of any of claims 1 to 5.

10. A computer-readable storage medium, having a computer program stored thereon, wherein the computer program, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 5.