CN111798492A - Trajectory prediction method, apparatus, electronic device, and medium - Google Patents


Info

Publication number
CN111798492A
CN111798492A (application CN202010685801.1A)
Authority
CN
China
Prior art keywords
target object, time, information, trajectory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010685801.1A
Other languages
Chinese (zh)
Other versions
CN111798492B (en)
Inventor
余存俊
马骁
赵海宇
伊帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sensetime International Pte Ltd
Original Assignee
Sensetime International Pte Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sensetime International Pte Ltd filed Critical Sensetime International Pte Ltd
Priority to CN202010685801.1A priority Critical patent/CN111798492B/en
Publication of CN111798492A publication Critical patent/CN111798492A/en
Application granted granted Critical
Publication of CN111798492B publication Critical patent/CN111798492B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments (under G06T7/20 Analysis of motion, G06T7/00 Image analysis)
    • G06T2207/10016 Video; Image sequence (image acquisition modality)
    • G06T2207/20081 Training; Learning (special algorithmic details)
    • G06T2207/30196 Human being; Person (subject of image)
    • G06T2207/30241 Trajectory (subject of image)

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Traffic Control Systems (AREA)

Abstract

Embodiments of the present disclosure provide a trajectory prediction method, apparatus, electronic device, and medium. The trajectory prediction method includes: respectively acquiring trajectory information of a target object and of an associated object in a first period; performing an information transformation in the time dimension on the trajectory information of the target object to obtain a temporal feature vector of the target object, and performing an information transformation in the spatial dimension based on the trajectory information of the target object and the associated object to obtain a spatial feature vector of the target object, wherein at least one of the time-dimension and space-dimension information transformations employs an attention mechanism; obtaining an object feature vector of the target object according to the temporal feature vector and the spatial feature vector; and predicting trajectory information of the target object in a second period based on the object feature vector.

Description

Trajectory prediction method, apparatus, electronic device, and medium
Technical Field
The present disclosure relates to machine learning technologies, and in particular, to a trajectory prediction method, apparatus, electronic device, and medium.
Background
The prediction of a single object's future information may be influenced by various factors, and how to make such prediction more accurate remains a research topic. Taking pedestrian trajectory prediction as an example, the analysis and understanding of pedestrian walking behavior is an important research direction in computer vision and intelligent video surveillance, and walking-behavior models have important applications in many fields, such as walking-behavior prediction and pedestrian detection and tracking. Modeling human walking behavior is a complex problem in which many factors, both internal and external, need to be considered.
Disclosure of Invention
Embodiments of the present disclosure provide at least a trajectory prediction method, a trajectory prediction apparatus, an electronic device, and a medium.
In a first aspect, a trajectory prediction method is provided, the method including:
respectively acquiring trajectory information of a target object and of an associated object in a first period;
performing an information transformation in the time dimension on the trajectory information of the target object to obtain a temporal feature vector of the target object, and performing an information transformation in the spatial dimension based on the trajectory information of the target object and the associated object to obtain a spatial feature vector of the target object; wherein at least one of the time-dimension and space-dimension information transformations employs an attention mechanism;
obtaining an object feature vector of the target object according to the temporal feature vector and the spatial feature vector; and
predicting trajectory information of the target object in a second period based on the object feature vector, wherein the second period follows the first period in time.
In some embodiments, the trajectory information of the first period includes trajectory features respectively corresponding to each time in the first period. Performing the spatial-dimension information transformation based on the trajectory information of the target object and the associated object to obtain the spatial feature vector of the target object includes: for each time in the first period, obtaining an updated trajectory feature of the target object at that time through an attention mechanism, according to the trajectory feature of the target object at that time and the trajectory feature of at least one associated object of the target object at that time; and taking, as the spatial feature vector of the target object, the whole of the updated trajectory features respectively corresponding to the target object at each time in the first period.
In some embodiments, obtaining the updated trajectory feature of the target object at that time through the attention mechanism includes: obtaining a query vector corresponding to the target object at that time according to the trajectory feature of the target object at that time, and obtaining a key vector corresponding to each associated object at that time according to the trajectory feature of that associated object at that time; determining, according to the obtained query vector of the target object and the key vectors of the associated objects, the transfer information transferred to the target object by each associated object; obtaining an attention parameter corresponding to the target object according to the transfer information of each associated object corresponding to the target object at that time; and updating the trajectory feature of the target object at that time based on the attention parameter to obtain the updated trajectory feature.
In some embodiments, the method further includes: for a first associated object of the target object at a first time in the first period, obtaining an updated trajectory feature of the first associated object at that time through an attention mechanism, according to the trajectory feature of the first associated object and the trajectory feature of at least one first object, wherein an association relationship exists between the first object and the first associated object; and obtaining a re-updated trajectory feature of the target object at the first time through an attention mechanism, according to the updated trajectory feature of the target object at the first time and the updated trajectory feature of at least one first associated object at the first time.
In some embodiments, the distance between the first object and the first associated object is within a preset distance range.
In some embodiments, the trajectory information of the first period includes trajectory features respectively corresponding to each time in the first period; and performing the time-dimension information transformation on the trajectory information of the target object includes: obtaining the updated temporal feature vector of the target object through an attention mechanism, according to the trajectory features in the trajectory information respectively corresponding to each time.
In some embodiments, in a case that the time-dimension information transformation employs a self-attention mechanism, performing the time-dimension information transformation on the trajectory information of the target object to obtain the temporal feature vector of the target object includes: performing the time-dimension information transformation and multi-head attention processing on the trajectory information of the target object to obtain the temporal feature vector of the target object; and/or, in a case that the spatial-dimension information transformation employs a self-attention mechanism, performing the spatial-dimension information transformation based on the trajectory information of the target object and the associated object to obtain the spatial feature vector of the target object includes: performing the spatial-dimension information transformation and multi-head attention processing based on the trajectory information of the target object and the associated object to obtain the spatial feature vector of the target object.
In some embodiments, obtaining the object feature vector of the target object according to the temporal feature vector and the spatial feature vector includes: splicing (concatenating) the temporal feature vector and the spatial feature vector; and obtaining the object feature vector according to the spliced vector.
In some embodiments, obtaining the object feature vector of the target object according to the temporal feature vector and the spatial feature vector includes: performing the spatial-dimension information transformation on the temporal feature vector output by the time-dimension information transformation, and taking the resulting spatial feature vector as the object feature vector; and/or performing the time-dimension information transformation on the spatial feature vector output by the spatial-dimension information transformation, and taking the resulting temporal feature vector as the object feature vector.
In some embodiments, acquiring the trajectory information of the target object in the first period includes: acquiring the trajectory feature of the target object at each time in the first period, and obtaining the trajectory information of the target object in the first period according to the acquired trajectory features.
In some embodiments, the first period includes at least one of a second time and a third time, and acquiring the trajectory feature of the target object at each time in the first period includes: encoding the trajectory coordinate of the target object at the second time in the first period to obtain the trajectory feature corresponding to the second time; and/or reading, from a memory, the trajectory feature corresponding to the target object at the third time in the first period.
In some embodiments, the target object is a target pedestrian; the trajectory information of the target object in the second period includes at least one of: the trajectory coordinates of the target pedestrian at each time in the second period, and the movement trend of the target pedestrian in the second period; and the associated object is a pedestrian having an association relationship with the target pedestrian in the same scene.
In a second aspect, there is provided a trajectory prediction device, the device comprising:
an information acquisition module, configured to respectively acquire trajectory information of a target object and of an associated object in a first period;
an information transformation module, configured to perform an information transformation in the time dimension on the trajectory information of the target object to obtain a temporal feature vector of the target object, and to perform an information transformation in the spatial dimension based on the trajectory information of the target object and the associated object to obtain a spatial feature vector of the target object; wherein at least one of the time-dimension and space-dimension information transformations employs an attention mechanism;
a vector obtaining module, configured to obtain an object feature vector of the target object according to the temporal feature vector and the spatial feature vector; and
an information prediction module, configured to predict trajectory information of the target object in a second period based on the object feature vector, wherein the second period follows the first period in time.
In some embodiments, when the information transformation module performs the spatial-dimension information transformation based on the trajectory information of the target object and the associated object to obtain the spatial feature vector of the target object, it is configured to: for each time in the first period, obtain an updated trajectory feature of the target object at that time through an attention mechanism, according to the trajectory feature of the target object at that time and the trajectory feature of at least one associated object of the target object at that time; and take, as the spatial feature vector of the target object, the whole of the updated trajectory features respectively corresponding to the target object at each time in the first period. The trajectory information of the first period includes trajectory features respectively corresponding to each time in the first period.
In some embodiments, when the information transformation module obtains the updated trajectory feature of the target object at that time through an attention mechanism, it is configured to: obtain a query vector corresponding to the target object at that time according to the trajectory feature of the target object at that time, and obtain a key vector corresponding to each associated object at that time according to the trajectory feature of that associated object at that time; determine, according to the obtained query vector of the target object and the key vectors of the associated objects, the transfer information transferred to the target object by each associated object; obtain an attention parameter corresponding to the target object according to the transfer information of each associated object corresponding to the target object at that time; and update the trajectory feature of the target object at that time based on the attention parameter to obtain the updated trajectory feature.
In some embodiments, the information transformation module is further configured to: for a first associated object of the target object at a first time in the first period, obtain an updated trajectory feature of the first associated object at that time through an attention mechanism, according to the trajectory feature of the first associated object and the trajectory feature of at least one first object, wherein an association relationship exists between the first object and the first associated object; and obtain a re-updated trajectory feature of the target object at the first time through an attention mechanism, according to the updated trajectory feature of the target object at the first time and the updated trajectory feature of at least one first associated object at the first time.
In some embodiments, the distance between the first object and the first associated object is within a preset distance range.
In some embodiments, when the information transformation module performs the time-dimension information transformation on the trajectory information of the target object, it is configured to: obtain the updated temporal feature vector of the target object through an attention mechanism, according to the trajectory features in the trajectory information respectively corresponding to each time. The trajectory information of the first period includes trajectory features respectively corresponding to each time in the first period.
In some embodiments, when the information transformation module obtains the temporal feature vector of the target object, it is configured to: in a case that the time-dimension information transformation employs a self-attention mechanism, perform the time-dimension information transformation and multi-head attention processing on the trajectory information of the target object to obtain the temporal feature vector of the target object; and/or, when the information transformation module obtains the spatial feature vector of the target object, it is configured to: in a case that the spatial-dimension information transformation employs a self-attention mechanism, perform the spatial-dimension information transformation and multi-head attention processing based on the trajectory information of the target object and the associated object to obtain the spatial feature vector of the target object.
In some embodiments, the vector obtaining module is specifically configured to: splice (concatenate) the temporal feature vector and the spatial feature vector; and obtain the object feature vector according to the spliced vector.
In some embodiments, when the vector obtaining module obtains the object feature vector of the target object according to the temporal feature vector and the spatial feature vector, it is configured to: perform the spatial-dimension information transformation on the temporal feature vector output by the time-dimension information transformation, and take the resulting spatial feature vector as the object feature vector; and/or perform the time-dimension information transformation on the spatial feature vector output by the spatial-dimension information transformation, and take the resulting temporal feature vector as the object feature vector.
In some embodiments, when the information acquisition module acquires the trajectory information of the target object in the first period, it is configured to: acquire the trajectory feature of the target object at each time in the first period, and obtain the trajectory information of the target object in the first period according to the acquired trajectory features.
In some embodiments, when the information acquisition module acquires the trajectory feature of the target object at each time in the first period, it is configured to: encode the trajectory coordinate of the target object at a second time in the first period to obtain the trajectory feature corresponding to the second time; and/or read, from a memory, the trajectory feature corresponding to the target object at a third time in the first period.
In some embodiments, the target object is a target pedestrian; the trajectory information of the target object in the second period includes at least one of: the trajectory coordinates of the target pedestrian at each time in the second period, and the movement trend of the target pedestrian in the second period; and the associated object is a pedestrian having an association relationship with the target pedestrian in the same scene.
In a third aspect, an electronic device is provided, including a memory and a processor, wherein the memory is configured to store computer-readable instructions, and the processor is configured to invoke the instructions to implement the trajectory prediction method of any embodiment of the present disclosure.
In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the trajectory prediction method according to any of the embodiments of the present disclosure.
According to the trajectory prediction method, apparatus, electronic device, and medium above, the trajectory information is transformed in both the time and space dimensions, and an attention mechanism is employed in the transformation, so that the resulting object feature vector fuses more accurate and richer information, which in turn makes the trajectory prediction result more accurate.
Drawings
To describe the technical solutions in one or more embodiments of the present disclosure or in the related art more clearly, the drawings required in the description of the embodiments or the related art are briefly introduced below. Apparently, the drawings described below are only some of the embodiments recorded in one or more embodiments of the present disclosure, and those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 illustrates a flow chart of a trajectory prediction method provided by at least one embodiment of the present disclosure;
FIG. 2 illustrates a network architecture diagram for trajectory prediction provided by at least one embodiment of the present disclosure;
FIG. 3 illustrates a feature update diagram for a spatial dimension provided by at least one embodiment of the present disclosure;
FIG. 4 is a block diagram illustrating a trajectory prediction device provided in at least one embodiment of the present disclosure;
fig. 5 shows a schematic diagram of an electronic device provided by at least one embodiment of the present disclosure.
Detailed Description
To enable those skilled in the art to better understand the technical solutions in one or more embodiments of the present disclosure, these solutions are described below clearly and completely with reference to the accompanying drawings. Apparently, the described embodiments are only a part, rather than all, of the embodiments of the present disclosure. All other embodiments obtained by those of ordinary skill in the art based on one or more embodiments of the present disclosure without inventive effort shall fall within the protection scope of the present disclosure.
When the information of an object in a following period is predicted from its information in a preceding period, the prediction may be influenced not only by the history information of the object itself in the preceding period but also by the history information of other objects in that same preceding period. Therefore, the history information of the object and of the other objects can be integrated for prediction, so that a more accurate prediction result is obtained.
That is, with the technical solution provided by the present disclosure, the object information of an object in a following period can be predicted from the history information generated in the preceding period. Here, the preceding period can be understood as a period including the current time; the following period is adjacent to the preceding period in time, i.e., the next period connected to the current time, or one or more times belonging to that next period.
For example, in the field of purchase recommendation, the products that a target user may purchase at a certain time in the next period can be predicted from the historical purchase records of the target user and of associated users (e.g., friends or relatives of the target user), so as to recommend products the target user is likely to be interested in.
For example, in the field of trajectory prediction, the trajectory of an object may relate to the historical trajectory of the object itself, or may relate to the trajectory of another related object (e.g., another object at a relatively short distance) in the same walking scene, and therefore, the trajectory coordinates of the object at a certain time in the next period may be predicted from the historical trajectories of the object and the related objects.
The information prediction method provided by the present disclosure is described below by taking pedestrian trajectory prediction as an example, but it will be understood that the method may be applied to other scenarios as well. Fig. 1 illustrates a flowchart of a trajectory prediction method according to at least one embodiment of the present disclosure, and as shown in fig. 1, the method may include the following processes:
in step 100, trajectory information of the target object and the associated object in a first time period is respectively obtained.
In this embodiment, the trajectory information of the first period includes not only the information corresponding to the current time t, but also the information of each time within a predetermined length of time before the current time t; for example, the predetermined length may be one hour before the current time t.
For example, the trajectory coordinate of the pedestrian P1 at time T_obs+1 is predicted according to the historical trajectory data of P1 in [1, T_obs], where [1, T_obs] may be referred to as the first period, P1 is called the target object to be subjected to trajectory prediction, and T_obs is the current time. The historical trajectory data may include trajectory coordinates corresponding to respective times, for example the trajectory coordinate p_t^i = (x_t^i, y_t^i), which denotes the position coordinate of pedestrian i at time t, with abscissa x, ordinate y, and i being the pedestrian identifier. The historical trajectory data may include a plurality of times t, and the trajectory coordinates respectively corresponding to these times form a trajectory coordinate sequence.
In practical implementation, a Video whose duration corresponds to [1, T_obs] may first be obtained. The Video may include a plurality of image frames, each of which may include the target object, namely the pedestrian P1. The Video may be preprocessed by a detection-and-tracking algorithm to extract the trajectory coordinates of the pedestrian P1 in each image frame, finally obtaining the historical trajectory data of the target object in the Video.
Further, the trajectory information may be obtained by encoding the historical trajectory data. For example, the trajectory sequence of a user may be {p1, p2, ..., pt}, where pt is the trajectory coordinate (x_t, y_t) of the user at the current time t, p1 is the trajectory coordinate (x_1, y_1) of the user at the historical time t1, and so on. The trajectory coordinate of each time may be encoded (embedded) into a vector, for example through a fully connected layer FC; accordingly, the trajectory coordinate of each time corresponds to a vector. The vectors obtained by encoding the trajectory coordinates may be called trajectory features, and the whole of the trajectory features of all times of the first period may be called the trajectory information of the first period.
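As a minimal sketch of this encoding step (the feature size D, the random weight initialization, and the ReLU non-linearity are illustrative assumptions, not specified by the patent), the trajectory coordinate sequence of one pedestrian can be embedded into trajectory features with a single fully connected layer:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16                                      # embedding size (illustrative)
W = rng.normal(scale=0.1, size=(2, D))      # FC layer weights: (x, y) -> feature
b = np.zeros(D)                             # FC layer bias

def embed(coords):
    """Encode a sequence of (x, y) trajectory coordinates into trajectory
    features (FC layer + ReLU); the whole sequence of features plays the
    role of the trajectory information of the period."""
    return np.maximum(coords @ W + b, 0.0)

# trajectory coordinate sequence {p1, p2, ..., pt} of one pedestrian
track = np.array([[1.0, 2.0], [1.2, 2.1], [1.5, 2.3]])
feats = embed(track)                        # one trajectory feature per time
```

In a real model the FC weights would be learned jointly with the rest of the network rather than randomly fixed.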
An associated object is an object associated with the target object. In the scene at each time, at least one associated object of the target object at that time may be obtained; the at least one associated object may be, for example, an object in the same scene as the target object. Still taking the above Video as an example, if the pedestrian P2 and the pedestrian P1 both appear in an image frame of the Video at a certain time and the distance between them is less than a predetermined distance d, the pedestrian P2 may be regarded as an associated object of P1. The historical trajectory data of an associated object is also a trajectory coordinate sequence. Associated objects are considered because their information influences the information prediction of the target object: for example, if the pedestrian P2 is close to P1, the trajectory of P1 while walking is influenced by the trajectory of P2.
As above, the historical trajectory data of the target object and of its associated objects at each time is acquired. Suppose there are N persons in total in the scene at each time, where p_t^i = (x_t^i, y_t^i) denotes the trajectory coordinate of pedestrian i at time t; the target object and its associated objects total N pedestrians, and the trajectory coordinate of each person may be a two-dimensional coordinate representing a location in the scene. The trajectory coordinates of the associated objects at each time in the first period may likewise be encoded into trajectory features, thereby obtaining the trajectory information of the associated objects in the first period.
In step 102, an information transformation in the time dimension is performed on the trajectory information of the target object to obtain a temporal feature vector of the target object, and an information transformation in the spatial dimension is performed based on the trajectory information of the target object and the associated object to obtain a spatial feature vector of the target object; wherein at least one of the time-dimension and space-dimension information transformations employs an attention mechanism.
In this embodiment, on one hand, the information transformation of the trajectory information uses an attention mechanism; on the other hand, both the temporal and the spatial transformation are considered.
The time-dimension information transformation may obtain an updated sequence through an attention mechanism according to the trajectory features of the object respectively corresponding to each time in the trajectory information, and the updated sequence may be referred to as the temporal feature vector of the object.
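A hedged sketch of such a temporal transformation (identity query/key/value projections are assumed for brevity; a real model would use learned projection matrices, as in standard self-attention):

```python
import numpy as np

def temporal_self_attention(seq):
    """Each time step of one object's trajectory-feature sequence attends
    over all time steps; the updated sequence plays the role of the
    temporal feature vector."""
    d = seq.shape[-1]
    scores = seq @ seq.T / np.sqrt(d)              # time-by-time scores
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)       # softmax over time steps
    return attn @ seq                              # updated sequence

seq = np.arange(12, dtype=float).reshape(4, 3)     # 4 times, 3-dim features
out = temporal_self_attention(seq)                 # same shape as the input
```

Each output row is a convex combination of the input trajectory features, weighted by how strongly that time step attends to the others.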
The spatial information conversion may be a process of performing an attention mechanism once for each time. For example, at least one associated object of the object may be obtained at each time, and similarly, the trajectory coordinate of each associated object at the time may also be encoded into a trajectory feature, so that the updated trajectory feature of the object may be obtained through attention mechanism processing according to the trajectory features of the object and its associated objects. Each time can obtain a corresponding updated track feature, so that the whole track information of the object can be updated, and the updated track information can be called as a spatial feature vector.
In addition, the attention mechanism used for the above transformation of the time and space dimensions may be a self-attention mechanism (self-attention) or may be a multi-head attention mechanism after the self-attention processing.
In step 104, an object feature vector of the target object is obtained according to the temporal feature vector and the spatial feature vector.
In this step, the obtained temporal feature vector and spatial feature vector may be integrated to obtain an object feature vector of the target object, where the object feature vector may be updated trajectory information obtained after the temporal and spatial dimension processing.
Optionally, the transformations of the time and space dimensions may be performed in parallel and then fused together, or may be performed in series. For example, the trajectory information of the object may be subjected to the time-dimension transformation and the space-dimension transformation separately, and the two results then fused together: for instance, the temporal feature vector and the spatial feature vector are spliced and then processed through a fully connected layer to obtain the object feature vector. Alternatively, the time-dimension transformation may be performed first, the resulting temporal feature vector then subjected to the space-dimension information transformation, and the obtained spatial feature vector used as the object feature vector. Or the space-dimension transformation is performed first, the spatial feature vector is subjected to the time-dimension information transformation, and the obtained temporal feature vector is used as the object feature vector. Of course, other combinations of the temporal and spatial transformations may be used.
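As a rough sketch of the parallel variant (splice, then fully connected layer), assuming 8 observed time steps and a 32-dimensional feature size, with random weights standing in for trained parameters:

```python
import numpy as np

def fuse_parallel(temporal_vec, spatial_vec, W_fc):
    # splice (concatenate) the two feature vectors, then apply a
    # fully connected layer to obtain the object feature vector
    return np.concatenate([temporal_vec, spatial_vec], axis=-1) @ W_fc

rng = np.random.default_rng(1)
t_vec = rng.normal(size=(8, 32))  # temporal feature vector (T=8, D=32)
s_vec = rng.normal(size=(8, 32))  # spatial feature vector
W_fc = rng.normal(size=(64, 32))  # maps the 64-dim splice back to 32 dims
obj_vec = fuse_parallel(t_vec, s_vec, W_fc)
print(obj_vec.shape)   # → (8, 32)
```

The serial variants simply feed the output of one transformation into the other instead of splicing.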
In step 106, based on the object feature vector, the trajectory information of the target object in a second time interval, which is temporally subsequent to the first time interval, is predicted.
In this step, the trajectory information of the target object in the second time period may be predicted and obtained based on the object feature vector of the target object in the first time period obtained in the above step. The second time interval may be a next time interval adjacent to the first time interval in time sequence, or a certain time or a plurality of times belonging to the next time interval.
In addition, the target object of the trajectory prediction may be a pedestrian, or may be another type of object. When the target object is a pedestrian, the associated object may be a pedestrian having an associated relationship (e.g., within a preset distance range) with the target pedestrian in the same scene, and the predicted trajectory information of the pedestrian in the second time period may include at least one of: the track coordinate of the target pedestrian at least one moment in the second period, or the movement trend of the target pedestrian in the second period.
According to the information prediction method, the track information is subjected to time and space dimension transformation, and an attention mechanism is adopted for the transformation, so that more accurate and richer information is fused according to the transformed object feature vector, and the prediction result is more accurate during prediction.
How the trajectory prediction method of the present disclosure predicts the trajectory coordinate of a target pedestrian P1 at time T_obs+1, based on the historical trajectory data of P1 and its associated objects, may be exemplarily described by the network architecture shown in fig. 2. It should be noted that the network architecture shown in fig. 2 is only an example; the specific implementation is not limited thereto, and the following embodiments will provide several alternative network variants.
As shown in fig. 2, the network employs two encoders, encoder 1 and encoder 2, and one decoder. The historical trajectory data of the target pedestrian and its associated objects in [1, T_obs] may first be input to encoder 1. The following embodiments take the prediction of the trajectory coordinate of one of the persons, the target pedestrian, at time T_obs+1 as an example to describe the trajectory prediction method of the present disclosure. It should be noted, however, that in an actual implementation the trajectory coordinate at time T_obs+1 of each pedestrian in the scene, e.g., the associated objects of the target pedestrian, may also be predicted at the same time; that is, the trajectory prediction of each pedestrian can be carried out in parallel with the same prediction method, and the network output in fig. 2 may include not only the trajectory coordinate of the target pedestrian P1 at time T_obs+1 but also the trajectory coordinates at time T_obs+1 of the other pedestrians in the same scene.
The historical trajectory data of the target pedestrian P1 may be encoded into trajectory information of P1 through a fully connected layer FC (fully connected layer) in encoder 1; the resulting sequence includes trajectory features respectively corresponding to the current time and the historical times, that is, the trajectory coordinate at each time is converted into a corresponding trajectory feature. For example, the trajectory coordinate l_{t1}^i at time t1 is converted through the FC into a corresponding trajectory feature (embedding), and the trajectory coordinate l_{t2}^i at time t2 may be converted through the FC into a corresponding further trajectory feature. Likewise, the historical trajectory data of the associated objects of the target pedestrian may be converted into corresponding trajectory information through the fully connected layer FC in the same manner.
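A minimal sketch of this embedding step, assuming a single 2→32-dimensional linear layer with random weights standing in for trained ones:

```python
import numpy as np

def fc_embed(coords, W, b):
    # coords: (T, 2) trajectory coordinates; one fully connected layer maps
    # each 2-D coordinate to a D-dimensional trajectory feature (embedding)
    return coords @ W + b

rng = np.random.default_rng(0)
T, D = 8, 32                       # 8 observed times, 32-dim feature
coords = rng.normal(size=(T, 2))   # stand-in for one pedestrian's track
W, b = rng.normal(size=(2, D)), np.zeros(D)
features = fc_embed(coords, W, b)  # trajectory information: one feature per time
print(features.shape)              # → (8, 32)
```

The same layer would be applied to the coordinates of every associated object to obtain their trajectory information.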
Referring to fig. 2, encoder 1 may include a spatial conversion module (Spatial transform) and a temporal conversion module (Temporal transform); this embodiment takes the case where encoder 1 includes one spatial conversion module and one temporal conversion module as an example, but the specific implementation is not limited thereto. The spatial conversion module may be configured to perform information transformation of the space dimension based on the trajectory information of the target object and its associated objects, updating the trajectory information of the target object to finally obtain the spatial feature vector of the target object. The temporal conversion module may be configured to perform information transformation of the time dimension on the trajectory information of the target object to obtain the temporal feature vector of the target object. The temporal feature vector and the spatial feature vector are both sequences obtained by updating the trajectory information of the target object, and are respectively described as follows:
processing by the time conversion module:
The processing of the time conversion module updates the trajectory information of the target pedestrian P1 according to the trajectory information of pedestrian P1 alone, without considering the trajectory information of the associated objects; that is, the module processes the information of a single pedestrian.
The input, output and processing of the time conversion module are respectively as follows:
Input to the module: the trajectory features of the target pedestrian P1 corresponding to the respective times, which may be the sequence {h_t^i | t = 1, …, T_obs}, where i identifies pedestrian P1 among the N pedestrians (P1 and its associated objects), 1 to T_obs are the times with T_obs the current time, and h_t^i is the trajectory feature corresponding to each time, i.e., the vector obtained by embedding conversion of the trajectory coordinate at that time.
Output of the module: may be the updated trajectory information of the target pedestrian P1, i.e., the sequence {h'_t^i | t = 1, …, T_obs}. That is, the time conversion module outputs a piece of trajectory information that is the result of updating the input sequence, and the updated sequence may be called the temporal feature vector.
Internal processing of the modules: an attention mechanism is used.
First, according to the track information of the pedestrian P1, a query matrix Q, key (key) matrix K and a value matrix V corresponding to the pedestrian P1 are obtained:
Q_i = f_Q(h^i), K_i = f_K(h^i), V_i = f_V(h^i) ………(1)

wherein, in equation (1), f_Q, f_K and f_V are the functions used in the respective matrix calculations, and i represents the identification of pedestrian P1.
Next, the attention parameter Att corresponding to the pedestrian P1 is calculated by the self-attention mechanism from each matrix calculated as described above, as shown in the following equation (2):
Att = softmax(Q_i K_i^T / √d_k) V_i ………(2)

wherein d_k is the dimension of the query matrix Q.
Then, the attention parameter Att obtained by the above calculation may be added to the initial trajectory information (i.e., the sequence of input modules) of the pedestrian P1 to obtain a first parameter. The first parameter is processed by a full link layer to obtain a second parameter. The second parameter is added to the first parameter to finally obtain updated track information, namely a time characteristic vector. The temporal feature vector may be a 32-dimensional vector.
The attention mechanism processes the trajectory information of the pedestrian P1, and the attention mechanism calculates the trajectory features of each time in the sequence, so that the time dependence and the association relation between the trajectory features in the historical time period can be better learned, and the prediction effect of prediction according to the time feature vector can be improved.
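The temporal processing described by equations (1)-(2), together with the two residual additions described above, can be sketched as follows; the dimensions and the random weights are illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_transform(H, Wq, Wk, Wv, W_fc):
    # H: (T, D) trajectory information of a single pedestrian over T times
    Q, K, V = H @ Wq, H @ Wk, H @ Wv                   # equation (1)
    att = softmax(Q @ K.T / np.sqrt(Q.shape[-1])) @ V  # equation (2)
    first = H + att        # attention parameter added to the input sequence
    second = first @ W_fc  # fully connected layer
    return first + second  # updated sequence, i.e. the temporal feature vector

rng = np.random.default_rng(2)
T, D = 8, 32
H = rng.normal(size=(T, D))
Wq, Wk, Wv, W_fc = (rng.normal(size=(D, D)) * 0.1 for _ in range(4))
out = temporal_transform(H, Wq, Wk, Wv, W_fc)
print(out.shape)   # → (8, 32)
```

Each row of the output corresponds to the updated trajectory feature of one time step.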
Optionally, the calculation may be further performed by a multi-head attention mechanism after being processed by the self-attention mechanism. The following formula (3) and formula (4):
MultiHead = f_O(concat(Head_1, …, Head_J)) ………(3)

Head_j = Att_j(Q_i, K_i, V_i) ………(4)

wherein f_O is a fully connected layer for fusing the multiple heads.
The MultiHead obtained by the formula (3) may be referred to as a multi-head attention parameter, and after obtaining the multi-head attention parameter, the multi-head attention parameter may be added to the initial trajectory information of the pedestrian P1 to obtain a first parameter. The first parameter is processed by a full link layer to obtain a second parameter. The second parameter is added to the first parameter to finally obtain updated track information, namely a time characteristic vector.
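A sketch of the multi-head variant of equations (3)-(4), under the assumption of 4 heads of dimension 8 fused by a single fully connected layer playing the role of f_O:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def head(H, Wq, Wk, Wv):                        # one Head_j, equation (4)
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    return softmax(Q @ K.T / np.sqrt(Q.shape[-1])) @ V

def multi_head(H, head_weights, Wo):             # equation (3)
    # Wo plays the role of f_O, the fully connected layer fusing the heads
    heads = [head(H, *w) for w in head_weights]
    return np.concatenate(heads, axis=-1) @ Wo

rng = np.random.default_rng(3)
T, D, n_heads, d_h = 8, 32, 4, 8
H = rng.normal(size=(T, D))
weights = [tuple(rng.normal(size=(D, d_h)) for _ in range(3))
           for _ in range(n_heads)]
Wo = rng.normal(size=(n_heads * d_h, D))
fused = multi_head(H, weights, Wo)
print(fused.shape)   # → (8, 32)
```

The residual additions described in the text would then be applied to `fused` exactly as in the single-head case.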
Further, as shown in fig. 2, the trajectory information of the target pedestrian P1 input to the time conversion module may also be obtained by means of a memory. The trajectory information obtained in the previous prediction stored in the memory can be read out for use, and of course, the trajectory information obtained in the current prediction can also be stored in the memory for use in the next prediction. And will be described in detail later.
And (3) processing of the space conversion module:
the processing of the space conversion module is to update the trajectory information of the pedestrian P1, taking into account the trajectory information of the target pedestrian P1 and the trajectory information of the object related thereto.
The input, output and processing of the spatial conversion module are respectively as follows:
Input to the module: may be the trajectory information respectively corresponding to the target pedestrian P1 and its associated objects, for example, the trajectory information of pedestrian P1, the trajectory information of pedestrian P2, the trajectory information of pedestrian P3, and so on, where pedestrians P2 and P3 are both associated objects of P1. Each piece of trajectory information includes the trajectory features respectively corresponding to the times within the period 1 to T_obs, i.e., the sequence {h_t^i | t = 1, …, T_obs}, where i represents the pedestrian identification, N represents the N pedestrians (P1 and its associated objects), T_obs is the current time, and h_t^i is the trajectory feature obtained by converting the trajectory coordinate at time t.
For each time, the trajectory characteristics of the pedestrian P1 corresponding to the time and the trajectory characteristics of other associated objects (e.g., P2) are included.
Output of the module: may be the updated trajectory information of the target pedestrian P1, i.e., the sequence {h'_t^i | t = 1, …, T_obs}. That is, the spatial conversion module outputs a piece of trajectory information that is the result of updating the trajectory information of pedestrian P1, and the updated sequence may be referred to as the spatial feature vector.
Internal processing of the modules: an attention mechanism is used.
Taking one of the times as an example, please refer to fig. 3, which illustrates a spatial graph corresponding to one time. Four nodes are illustrated in the spatial graph, different nodes representing different pedestrians: h_t^1 is the trajectory feature (obtained by converting the trajectory coordinate of the pedestrian) corresponding to the target pedestrian P1 at time t, h_t^2 represents the trajectory feature corresponding to the associated pedestrian P2 at time t, and so on, up to h_t^4, the trajectory feature corresponding to the associated pedestrian P4 at time t. The distance between each associated pedestrian and the target pedestrian P1 may be less than the predetermined distance d, so connecting edges are established between the four nodes.
First, according to the trajectory feature at time t in the trajectory information of each pedestrian in fig. 3, the query vector q_i, key vector k_i and value vector v_i corresponding to that pedestrian are obtained. For example, the trajectory feature of pedestrian P1 at time t is used to obtain the q_i, k_i and v_i corresponding to pedestrian P1 at time t; similarly, the three vectors of the other associated pedestrians at that time are calculated in the same manner:

q_i = f_Q(h_t^i), k_i = f_K(h_t^i), v_i = f_V(h_t^i) ………(5)

wherein, in equation (5), f_Q, f_K and f_V are the functions used in the respective calculations, and i represents the identification of the pedestrian.
Next, based on the three vectors q, k, and v calculated above, transfer information transferred from the node corresponding to the associated pedestrian to the node corresponding to the target pedestrian is set, which can be used to represent the influence of each associated pedestrian on feature update of the target pedestrian, as shown in equation (6) below:
m_{j→i} = ⟨q_i, k_j⟩ ………(6)
then, the following equation (7) illustrates the calculation of the attention parameter corresponding to the target pedestrian node:
Att(i) = softmax([m_{j→i}]_{j∈Nb(i)∪{i}} / √d_k) · [v_j]_{j∈Nb(i)∪{i}} ………(7)
finally, based on the above formula (7), the trajectory characteristics of the target pedestrian node at that moment are updated:
ĥ_t^i = Att(i) + h_t^i ………(8)

h'_t^i = f_out(ĥ_t^i) + ĥ_t^i ………(9)

As shown in the above equations (8) and (9), i represents the identification of the target pedestrian P1, h_t^i is the trajectory feature of pedestrian P1 at that time, and Nb(i) is the set of neighbor nodes (i.e., associated objects) of pedestrian P1; h'_t^i is the updated trajectory feature of pedestrian P1.
As shown in equation (8), the transfer information matrix, which includes the information transferred to the target node of the target pedestrian P1 by each neighbor node and the information transferred by the target node to itself, is scaled by √d_k and subjected to softmax processing; the result of the softmax processing is then multiplied by the value matrix, which comprises the value vectors v of each neighbor node and of the target node. The data thus obtained may be referred to as the attention parameter, i.e., the term preceding the + sign in equation (8). The attention parameter is then added to h_t^i to obtain the output result of equation (8).
According to equation (9), the output result of equation (8) is further processed by f_out and then added to the output result of equation (8) to obtain the updated trajectory feature h'_t^i of the target pedestrian, where f_out may be a fully connected layer.
The trajectory feature of the target object at each time can be updated according to the above method, and updated trajectory information of the target object is finally obtained, and the updated sequence can be referred to as a spatial feature vector. For example, for a certain time in the first time period, the updated trajectory feature of the target pedestrian at the certain time can be obtained through an attention mechanism according to the trajectory features of the target pedestrian and the related pedestrians at the certain time, the whole of the updated trajectory feature respectively corresponding to the target pedestrian at each time in the first time period is the spatial feature vector of the target pedestrian, and the spatial feature vector may be a 32-dimensional vector.
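The per-time node update of equations (5)-(9) can be sketched roughly as follows for one target node and its neighbor set Nb(i); the matrix shapes and random weights are assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def spatial_update(h_i, h_neighbors, Wq, Wk, Wv, W_out):
    # h_i: (D,) target feature at time t; h_neighbors: (M, D) features of Nb(i)
    nodes = np.vstack([h_i, h_neighbors])  # target node together with Nb(i)
    q = h_i @ Wq                           # equation (5): query of the target
    k, v = nodes @ Wk, nodes @ Wv          # keys and values of all nodes
    m = k @ q                              # equation (6): transfer information
    att = softmax(m / np.sqrt(q.shape[-1])) @ v   # equation (7)
    h_hat = att + h_i                      # equation (8): residual with h_i
    return h_hat @ W_out + h_hat           # equation (9): f_out plus residual

rng = np.random.default_rng(4)
D = 32
h_i, h_nb = rng.normal(size=D), rng.normal(size=(3, D))
Wq, Wk, Wv, W_out = (rng.normal(size=(D, D)) * 0.1 for _ in range(4))
h_updated = spatial_update(h_i, h_nb, Wq, Wk, Wv, W_out)
print(h_updated.shape)   # → (32,)
```

Running this update once per time step over the first period yields the spatial feature vector of the target pedestrian.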
Optionally, the attention parameter may also be calculated by a multi-head attention mechanism; after the multi-head attention processing, the result is likewise added to h_t^i to obtain the update, which is not described in detail again.
Please continue to refer to fig. 2: the spatial conversion module obtains the spatial feature vector, and the temporal conversion module obtains the temporal feature vector; an object feature vector may then be obtained according to the temporal feature vector and the spatial feature vector. For example, the temporal feature vector and the spatial feature vector may be processed through the fully connected layer FC to obtain the object feature vector corresponding to the target object, i.e., the output result of encoder 1.
The output result of encoder 1 is the updated trajectory information of the target object, and in this embodiment of the present disclosure the output result is further processed by encoder 2. Encoder 2 may likewise comprise a temporal conversion module and a spatial conversion module, but in encoder 2 these two modules may be arranged in series: for example, the result output by encoder 1 may first be processed by the spatial conversion module in encoder 2, and the spatial feature vector output by the spatial conversion module is then processed by the temporal conversion module.
Specifically, the processing of the temporal conversion module and the spatial conversion module in encoder 2 is the same as in encoder 1, and is only briefly described here. The updated trajectory information of pedestrian P1 output by encoder 1 is input to the spatial conversion module in encoder 2, which, for each time, again updates the trajectory feature of pedestrian P1 at that time with the features of the associated pedestrians of P1 through the attention mechanism. The spatial feature vector output by the spatial conversion module in encoder 2 is then input to the temporal conversion module in encoder 2, which performs the time-dimension transformation on the trajectory information of pedestrian P1 to obtain a temporal feature vector. In the framework of fig. 2, the temporal feature vector output by the temporal conversion module in encoder 2 is the finally obtained updated trajectory information of pedestrian P1, called the object feature vector.
In the system architecture of fig. 2, by adding encoder 2, the extraction of the factors influencing the trajectory prediction of pedestrian P1 can be made more accurate and comprehensive. Taking the spatial conversion module in encoder 2 as an example: assume that for a first time in the first period (which may be any time in that period), the target pedestrian has at least one first associated object at the first time, and each first associated object itself also has at least one first object at the first time, where there is an association relationship between the first object and the first associated object, for example, the distance between them is within a preset distance range. In the spatial conversion module of the preceding encoder 1, at the first time, not only is the trajectory feature of pedestrian P1 updated by the features of each first associated object of the target pedestrian P1, but the trajectory feature of each first associated object is simultaneously updated according to the trajectory features of that first associated object and its at least one first object. Thus, at the first time, the updated trajectory feature of the target object at the first time and the updated trajectory feature of at least one first associated object at the first time can be obtained. Then, according to the updated trajectory feature of the target object at the first time and the updated trajectory feature of the at least one first associated object at the first time, the trajectory feature of the target object, updated again at the first time, may be obtained through the attention mechanism.
For example, take node h_t^2 in fig. 3, which corresponds to one of the first associated objects of the target pedestrian P1. The trajectory feature of node h_t^1, corresponding to the target pedestrian P1, has first been updated based on the respective first associated objects h_t^2, h_t^3 and h_t^4. At the same time, according to the trajectory features of the first objects (not shown) associated with node h_t^2, the trajectory feature of node h_t^2 is also updated with the attention mechanism, so that the updated trajectory feature of node h_t^2 is obtained; the updated trajectory features of h_t^3 and h_t^4 can be obtained in the same way. The updated trajectory features of these first associated objects thus have introduced into them the influence of their associated first objects on the trajectory prediction of the first associated objects. Then, when the spatial conversion module in encoder 2 updates the trajectory feature of the target pedestrian P1 again, it can continue to update the updated trajectory feature of P1 according to the updated trajectory features h_t^2, h_t^3 and h_t^4 of the associated objects of P1, which corresponds to indirectly taking into account the trajectory influence of the first objects on P1.
In addition, with continued reference to fig. 2, the network architecture further includes a memory (graph memory), which may be an external memory independent of the prediction network, and the object feature vector of the target object output by the time conversion module in the encoder2 may be stored in the memory.
For example, assuming that the current time is t-1 (predicted is the trajectory coordinates at time t), the object feature vectors for the [1, t-1] period may be stored into the memory. When the track coordinate at the time t +1 is predicted subsequently, the track coordinate at the time t can be coded to obtain a corresponding current track characteristic, the historical track characteristics at each time within the [1, t-1] period can be directly read and used by a memory, and the current track characteristic at the time t and each historical track characteristic within the [1, t-1] period can form track information of the target object.
As can be seen from the above example, the trajectory characteristics of the target object at each time in the first time period may be partly obtained by encoding and partly obtained by reading from the memory. It may be assumed that the track coordinates of the target object at a second time (for example, the time t) in the first time period are encoded, so as to obtain a track feature corresponding to the second time; and reading track characteristics respectively corresponding to the target objects at a third moment in the first period (for example, each moment in the [1, t-1] period). Wherein the number of the second time instants or the third time instants may be at least one. In addition, the specific implementation may have various modifications, for example, the memory may store the track characteristics at each time, or the memory may not store the track characteristics at any time, or of course, a part of the track characteristics may be stored in the memory, and another part of the track characteristics may need to be encoded.
Alternatively, when the input of the encoder 1 is track data in the [1, t ] period, the track information may be obtained by FC as shown in fig. 2, and each historical track feature in the [1, t-1] period in the track information is replaced by the corresponding embedding read by the memory. By using the sequences in the memory for prediction, the prediction effect can be better.
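A toy sketch of the memory behavior described above: features of already-encoded times are read back from the store, and only the missing (second) times are freshly encoded and written. The class and function names are hypothetical:

```python
class GraphMemory:
    """Minimal stand-in for the external memory in fig. 2: it caches the
    per-time trajectory features produced by earlier prediction rounds."""
    def __init__(self):
        self._store = {}             # time step -> trajectory feature

    def write(self, t, feature):
        self._store[t] = feature

    def read(self, t):
        return self._store.get(t)    # None if t was never encoded before

def trajectory_info(times, memory, encode):
    # reuse cached features where available; encode only the missing times
    feats = []
    for t in times:
        f = memory.read(t)
        if f is None:                # e.g. the current (second) time t
            f = encode(t)
            memory.write(t, f)       # keep it for the next prediction round
        feats.append(f)
    return feats

mem = GraphMemory()
mem.write(1, "feat_1"); mem.write(2, "feat_2")   # from the previous prediction
out = trajectory_info([1, 2, 3], mem, lambda t: f"feat_{t}")
print(out)   # → ['feat_1', 'feat_2', 'feat_3']
```

In the actual network the stored entries would be the embedding vectors rather than strings, but the read-else-encode control flow is the same.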
Finally, the object feature vector output by encoder 2 may be processed by a decoder, which may be a fully connected layer FC, and reduced to a 2-dimensional vector, obtaining the trajectory coordinate of the target pedestrian P1 at time T_obs+1.
Similarly, the trajectory coordinates at time T_obs+1 of the other associated pedestrians in the same scene as the target pedestrian can also be predicted by the same prediction method; that is, the trajectory coordinate predictions at time T_obs+1 of all pedestrians in the same scene can be carried out in parallel at the same time, and the network can finally output the trajectory coordinates at time T_obs+1 of each pedestrian in the same scene. The newly predicted trajectory coordinates can be written back and stored in the history record for use in the next prediction, of the trajectory coordinates at time T_obs+2.
In addition, the network architecture of fig. 2 is only an example, and there may be a variety of practical implementations as long as feature extraction in two dimensions of time and space is performed in information prediction based on population association and an attention mechanism is used to extract features. Several variations are exemplified as follows, but not limited to the following variations, and the time conversion module and the space conversion module can be freely combined in specific implementations without limiting the number of modules and their combination:
For example, at least one of the time conversion module and the space conversion module in fig. 2 may adopt attention; for instance, the space conversion module may adopt a GAT (Graph Attention Network) while the time conversion module adopts the manner mentioned in the embodiments of the present disclosure.
For another example, the space transformation module in fig. 2 adopts the method mentioned in the embodiment of the present disclosure, and the time transformation module adopts an LSTM (Long Short-Term Memory) network method. As another example, the encoder in the architecture of fig. 2 may also have only encoder 1, with encoder2 removed; alternatively, the encoder has only encoder2, with encoder 1 removed. Also for example, the encoder2 shown in fig. 2 may have only a spatial conversion module, with the temporal conversion module being eliminated. The external memory is also optionally provided.
The encoder and decoder described above in fig. 2 are trained first and then used for information prediction. In the network training stage, the prediction information of the target object has a predicted value and a true value; a loss function is calculated from the predicted value and the true value, and the network parameters are adjusted in reverse according to the loss function value.
The information prediction method of the above embodiment can be applied to various scenes after obtaining the predicted trajectory:
For example, after the predicted trajectory of the target object is obtained through prediction, if the actual trajectory of the target object does not match the predicted trajectory, it is determined that the behavior of the target object is abnormal. A mismatch means that the actual trajectory differs from the predicted trajectory, including the case where the deviation between the actual trajectory and the predicted trajectory is large; for example, the two trajectories are determined to be mismatched when the deviation between them exceeds a threshold. The trajectory deviation can be measured as the distance between the actual trajectory and the predicted trajectory using the following indices: ADE (Average Displacement Error) or FDE (Final Displacement Error), and whether the degree of deviation between the two trajectories is large can be judged by setting a threshold on these indices. One example of a practical application: if a pedestrian P2 is predicted to turn left at an intersection but the actual trajectory deviates markedly from this prediction, the pedestrian may be determined to be at risk of behavioral anomaly; when the behavior of a pedestrian is found to be abnormal, it indicates that the pedestrian may be an illegal actor (such as a thief).
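The ADE/FDE indices mentioned above can be computed as follows; the threshold value is an illustrative assumption:

```python
import math

def ade(pred, actual):
    """Average Displacement Error: mean point-wise distance over the track."""
    return sum(math.dist(p, a) for p, a in zip(pred, actual)) / len(pred)

def fde(pred, actual):
    """Final Displacement Error: distance between the two end points."""
    return math.dist(pred[-1], actual[-1])

pred   = [(0.0, 0.0), (3.0, 4.0)]   # predicted trajectory
actual = [(0.0, 0.0), (0.0, 0.0)]   # observed trajectory
print(ade(pred, actual))   # → 2.5
print(fde(pred, actual))   # → 5.0
threshold = 1.0
print(ade(pred, actual) > threshold)   # mismatch → possible behavior anomaly
```

A deployment would tune the threshold per scene; the comparison itself is the "large deviation" test described in the text.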
For another example, after the predicted trajectory of the target object is predicted, the path planning process is performed based on the predicted trajectory of the target object. For example, when the intelligent robot is assisted to walk by itself, after the track of a pedestrian opposite to the intelligent robot is predicted, the robot can decide the next action route of the robot according to the predicted track of the pedestrian, for example, the robot can correct the walking route of the robot to prevent the robot from colliding with the predicted pedestrian. In addition, the method can also be applied to other intelligent driving equipment, and the intelligent driving equipment can correct or plan the next step of driving route of the intelligent driving equipment according to the predicted pedestrian track so as to avoid collision with the pedestrian.
The embodiment of the disclosure provides a track prediction device, which can execute the method of any embodiment of the disclosure. The apparatus is briefly described below, and the specific processing of its various modules may be combined with reference to method embodiments. As shown in fig. 4, the apparatus may include: an information acquisition module 41, an information transformation module 42, a vector acquisition module 43, and an information prediction module 44.
The information acquisition module 41 is configured to respectively acquire trajectory information of a target object and of an associated object in a first time period.
The information transformation module 42 is configured to perform a time-dimension information transformation on the trajectory information of the target object to obtain a temporal feature vector of the target object, and to perform a space-dimension information transformation based on the trajectory information of the target object and the associated object to obtain a spatial feature vector of the target object; at least one of the time-dimension and space-dimension information transformations adopts an attention mechanism.
The vector obtaining module 43 is configured to obtain an object feature vector of the target object according to the temporal feature vector and the spatial feature vector.
The information prediction module 44 is configured to predict trajectory information of the target object in a second time period based on the object feature vector, where the second time period follows the first time period in time sequence.
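The four modules above form a pipeline: encode trajectory features, transform them in the time and space dimensions, fuse the two resulting vectors, and decode a future trajectory. The sketch below is purely illustrative and is not the patented implementation: all shapes are assumed, and random matrices stand in for trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, D = 8, 3, 16  # timesteps in the first period; 1 target + 2 associated objects; feature dim

# Hypothetical output of the information acquisition module:
# feats[t, i] is object i's trajectory feature at time t; index 0 is the target object.
feats = rng.standard_normal((T, N, D))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    """Plain scaled dot-product self-attention over the first axis of x."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores) @ v

Wq, Wk, Wv = (rng.standard_normal((D, D)) for _ in range(3))

# Information transformation module, time dimension: attend over the target's own timesteps.
temporal = self_attention(feats[:, 0], Wq, Wk, Wv)                               # (T, D)

# Information transformation module, space dimension: at each time, attend over all
# objects present at that time, and keep the target object's updated feature.
spatial = np.stack([self_attention(feats[t], Wq, Wk, Wv)[0] for t in range(T)])  # (T, D)

# Vector obtaining module: splice the temporal and spatial feature vectors.
obj_vec = np.concatenate([temporal, spatial], axis=-1)                           # (T, 2*D)

# Information prediction module: decode second-period (x, y) coordinates with an
# assumed linear decoder.
W_dec = rng.standard_normal((2 * D, 2))
pred_coords = obj_vec @ W_dec                                                    # (T, 2)
```

In a trained system the projection and decoder matrices would be learned, and the decoder would typically be autoregressive rather than a single linear map.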
In some embodiments, the trajectory information of the first time period includes trajectory features respectively corresponding to the times in the first time period. When the information transformation module 42 is configured to perform the space-dimension information transformation based on the trajectory information of the target object and the associated object to obtain the spatial feature vector of the target object, it is configured to: for each time in the first time period, obtain an updated trajectory feature of the target object at that time through an attention mechanism, according to the trajectory feature of the target object at that time and the trajectory feature of at least one associated object of the target object at that time; the updated trajectory features corresponding to the target object at the respective times in the first time period collectively serve as the spatial feature vector of the target object.
In some embodiments, when the information transformation module 42 is configured to obtain the updated trajectory feature of the target object at a given time through an attention mechanism, it is configured to: obtain a query vector corresponding to the target object at that time according to the trajectory feature of the target object at that time, and obtain a key vector corresponding to each associated object at that time according to the trajectory feature of that associated object at that time; determine the transfer information transferred to the target object by each associated object according to the obtained query vector of the target object and the key vectors of the associated objects; obtain attention parameters corresponding to the target object according to the obtained transfer information of each associated object at that time; and update the trajectory feature of the target object at that time based on the attention parameters to obtain the updated trajectory feature.
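The query/key/transfer-information scheme described here matches the shape of standard dot-product attention. The following single-timestep sketch is an assumed reading of the embodiment, with random matrices in place of learned projections and a residual update as one common, but here assumed, choice of update rule.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 8
target = rng.standard_normal(D)       # target object's trajectory feature at time t
assoc = rng.standard_normal((4, D))   # trajectory features of 4 associated objects at time t

Wq, Wk, Wv = (rng.standard_normal((D, D)) for _ in range(3))

q = target @ Wq        # query vector from the target's trajectory feature
keys = assoc @ Wk      # one key vector per associated object
values = assoc @ Wv    # transfer information each associated object can pass on

# Attention parameters: how strongly each associated object transfers to the target.
scores = keys @ q / np.sqrt(D)
weights = np.exp(scores - scores.max())
weights /= weights.sum()

# Update the target's trajectory feature with the aggregated transfer information
# (residual update assumed for illustration).
updated = target + weights @ values
```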
In some embodiments, the information transformation module 42 is further configured to: for a first associated object of the target object at a first time in the first time period, obtain an updated trajectory feature of the first associated object at that time through an attention mechanism, according to the trajectory feature of the first associated object and the trajectory feature of at least one first object, where an association relationship exists between the first object and the first associated object; and obtain a re-updated trajectory feature of the target object at the first time through an attention mechanism, according to the updated trajectory feature of the target object at the first time and the updated trajectory feature of the at least one first associated object at the first time.
In some embodiments, the distance between the first object and the first associated object is within a preset distance range.
In some embodiments, the trajectory information of the first time period includes trajectory features respectively corresponding to the times in the first time period. When the information transformation module 42 is configured to perform the time-dimension information transformation on the trajectory information of the target object, it is configured to obtain the temporal feature vector of the target object through an attention mechanism, according to the trajectory features of the target object corresponding to the respective times in the trajectory information.
In some embodiments, when the information transformation module 42 is configured to obtain the temporal feature vector of the target object and the time-dimension information transformation adopts a self-attention mechanism, the module performs the time-dimension information transformation together with multi-head attention processing on the trajectory information of the target object to obtain the temporal feature vector of the target object; and/or, when the module is configured to obtain the spatial feature vector of the target object and the space-dimension information transformation adopts a self-attention mechanism, the module performs the space-dimension information transformation together with multi-head attention processing based on the trajectory information of the target object and the associated object to obtain the spatial feature vector of the target object.
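Multi-head processing splits the feature dimension into several heads, runs attention independently in each, and concatenates the results. The time-dimension case can be sketched as follows; head count, dimensions, and the output projection are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
T, D, H = 6, 16, 4               # timesteps, model dimension, number of heads
x = rng.standard_normal((T, D))  # target object's trajectory features over the first period

def softmax(s):
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

Wq, Wk, Wv, Wo = (rng.standard_normal((D, D)) for _ in range(4))
q, k, v = x @ Wq, x @ Wk, x @ Wv

dh = D // H  # per-head dimension
heads = []
for h in range(H):
    sl = slice(h * dh, (h + 1) * dh)
    # Each head attends over all timesteps using its own slice of q/k/v.
    scores = q[:, sl] @ k[:, sl].T / np.sqrt(dh)
    heads.append(softmax(scores) @ v[:, sl])

# Concatenate the heads and project back to the model dimension.
time_feature = np.concatenate(heads, axis=-1) @ Wo  # (T, D) temporal feature vector
```

The space-dimension case is analogous, with attention taken over the objects present at one time instead of over the timesteps.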
In some embodiments, the vector obtaining module 43 is specifically configured to: splice (concatenate) the temporal feature vector and the spatial feature vector, and obtain the object feature vector from the spliced vector.
In some embodiments, when the vector obtaining module 43 is configured to obtain the object feature vector of the target object according to the temporal feature vector and the spatial feature vector, it is configured to: perform a space-dimension information transformation on the temporal feature vector output by the time-dimension information transformation, and take the resulting spatial feature vector as the object feature vector; and/or perform a time-dimension information transformation on the spatial feature vector output by the space-dimension information transformation, and take the resulting temporal feature vector as the object feature vector.
In some embodiments, when the information acquisition module 41 is configured to acquire the trajectory information of the target object in the first time period, it is configured to: acquire the trajectory feature of the target object at each time in the first time period, and obtain the trajectory information of the target object in the first time period from the acquired trajectory features.
In some embodiments, when the information acquisition module 41 is configured to acquire the trajectory feature of the target object at each time in the first time period, it is configured to: encode the trajectory coordinates of the target object at a second time in the first time period to obtain the trajectory feature corresponding to the second time; and/or read, from a memory, the trajectory feature corresponding to the target object at a third time in the first time period.
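The two branches, encoding fresh coordinates versus reading cached features from memory, can be sketched as below. The linear-plus-ReLU encoder and the dictionary cache are illustrative assumptions, not the patented encoding.

```python
import numpy as np

rng = np.random.default_rng(3)
D = 16
W_embed = rng.standard_normal((2, D))  # assumed learnable embedding matrix for (x, y)
b_embed = np.zeros(D)

# Branch 1: encode the trajectory coordinates at the "second time" into a feature.
coord = np.array([3.2, 1.5])                          # (x, y) of the target object
feature = np.maximum(coord @ W_embed + b_embed, 0.0)  # linear encoding + ReLU (assumed)

# Branch 2: features computed at earlier times can be stored and later read back
# from memory for the "third time", avoiding recomputation.
cache = {}
cache[("target", 0)] = feature
reused = cache[("target", 0)]
```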
In some embodiments, the target object is a target pedestrian; the track information of the target object in the second time period comprises at least one of the following items: the track coordinates of the target pedestrian at each moment in the second time period and the movement trend of the target pedestrian in the second time period; the associated object is a pedestrian having an association relationship with the target pedestrian in the same scene.
In some embodiments, the above apparatus may be configured to perform any of the methods described above, and for brevity, the description is omitted here.
As shown in fig. 5, an embodiment of the present disclosure also provides an electronic device, which includes a memory 51, a processor 52, and an internal bus 53. The memory 51 is configured to store computer-readable instructions, and the processor 52 is configured to call the instructions to implement the trajectory prediction method of any embodiment of this specification. For example, by calling the instructions in the memory, the processor 52 may execute the functions of the modules of the trajectory prediction apparatus in the embodiments of the disclosure: the information acquisition module's function of acquiring trajectory information, the information transformation module's function of transforming the trajectory information in the time and space dimensions, and the functions of the vector obtaining module and the information prediction module for performing the trajectory prediction. Of course, the processor 52 may also implement other functions of these modules provided in the embodiments of the present disclosure, which will not be described in detail here.
The disclosed embodiments also provide a computer-readable storage medium on which a computer program is stored, where the computer program is executed by a processor to implement the trajectory prediction method of any one of the embodiments of the present specification.
One skilled in the art will appreciate that one or more embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present disclosure also provide a computer-readable storage medium on which a computer program may be stored; when executed by a processor, the computer program implements the steps of the trajectory prediction method described in any of the embodiments of the present disclosure.
Wherein, the "and/or" described in the embodiments of the present disclosure means having at least one of the two, for example, "multiple and/or B" includes three schemes: poly, B, and "poly and B". The embodiments in the disclosure are described in a progressive manner, and the same and similar parts among the embodiments can be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the data processing apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to part of the description of the method embodiment.
The foregoing description of specific embodiments of the present disclosure has been described. Other embodiments are within the scope of the following claims. In some cases, the acts or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Embodiments of the subject matter and functional operations described in this disclosure may be implemented in: digital electronic circuitry, tangibly embodied computer software or firmware, computer hardware including the structures disclosed in this disclosure and their structural equivalents, or a combination of one or more of them. Embodiments of the subject matter described in this disclosure can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions, encoded on a tangible, non-transitory program carrier for execution by, or to control the operation of, data processing apparatus. Alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode and transmit information to suitable receiver apparatus for execution by the data processing apparatus. The computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
The processes and logic flows described in this disclosure can be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Computers suitable for executing computer programs include, for example, general and/or special purpose microprocessors, or any other type of central processing unit. Generally, a central processing unit will receive instructions and data from a read-only memory and/or a random access memory. The basic components of a computer include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer does not necessarily have such a device. Further, the computer may be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a Universal Serial Bus (USB) flash drive, to name a few.
Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., an internal hard disk or a removable disk), magneto-optical disks, and CD ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
Although this disclosure contains many specific implementation details, these should not be construed as limiting the scope of any disclosure or of what may be claimed, but rather as merely describing features of particular embodiments of the disclosure. Certain features that are described in this disclosure in the context of separate embodiments can also be implemented in combination in a single embodiment. In other instances, features described in connection with one embodiment may be implemented as discrete components or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Thus, particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. Further, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some implementations, multitasking and parallel processing may be advantageous.
The above description covers only preferred embodiments of the present disclosure and is not intended to limit its scope; the scope of the present disclosure is defined by the appended claims.

Claims (15)

1. A trajectory prediction method, characterized in that the method comprises:
respectively acquiring track information of a target object and an associated object in a first period;
performing time dimension information transformation on the track information of the target object to obtain a time characteristic vector of the target object, and performing space dimension information transformation on the basis of the track information of the target object and the associated object to obtain a space characteristic vector of the target object; wherein at least one of the information transformation of the time dimension and the space dimension adopts an attention mechanism;
obtaining an object feature vector of a target object according to the time feature vector and the space feature vector;
and predicting the track information of the target object in a second time interval on the basis of the object feature vector, wherein the second time interval is located after the first time interval in time sequence.
2. The method of claim 1, wherein the trajectory information of the first time period comprises: trajectory features respectively corresponding to the times in the first time period;
the information transformation of the spatial dimension is performed based on the track information of the target object and the associated object to obtain the spatial feature vector of the target object, and the method comprises the following steps:
for each moment in the first time interval, obtaining an updated trajectory feature of the target object at the moment through an attention mechanism according to the trajectory feature of the target object at the moment and the trajectory feature of at least one associated object of the target object at the moment;
and the updated trajectory features respectively corresponding to the target object at the times in the first time period collectively serve as the spatial feature vector of the target object.
3. The method of claim 2, wherein the obtaining the updated trajectory feature of the target object at the time point through the attention mechanism comprises:
obtaining a query vector corresponding to the target object at the moment according to the track characteristics of the target object at the moment, and obtaining a key vector corresponding to the associated object at the moment according to the track characteristics of the associated object at the moment;
determining transfer information transferred to the target object by each associated object according to the obtained query vector of the target object and the key vector of the associated object;
obtaining attention parameters corresponding to the target object according to the obtained transfer information of each associated object corresponding to the target object at the moment;
and updating the track characteristics of the target object at the moment based on the attention parameter to obtain the updated track characteristics.
4. The method of claim 2, further comprising:
for a first associated object of the target object at a first time in the first time period, obtaining an updated trajectory feature of the first associated object at the time through an attention mechanism according to the trajectory feature of the first associated object and a trajectory feature of at least one first object, wherein an association relationship exists between the first object and the first associated object;
and obtaining the updated trajectory feature of the target object after being updated again at the first moment through an attention mechanism according to the updated trajectory feature of the target object at the first moment and the updated trajectory feature of at least one first associated object at the first moment.
5. The method of claim 4, wherein a distance between the first object and the first associated object is within a preset distance range.
6. The method according to any one of claims 1 to 5, wherein the trajectory information of the first time period comprises: trajectory features respectively corresponding to the times in the first time period; and the performing time-dimension information transformation on the trajectory information of the target object comprises:
and obtaining the updated time characteristic vector of the target object through an attention mechanism according to the track characteristics of the target object in the track information respectively corresponding to each moment.
7. The method according to claim 1, wherein in a case that the information transformation in the time dimension adopts a self-attention mechanism, the performing the information transformation in the time dimension on the trajectory information of the target object to obtain a time feature vector of the target object includes:
carrying out time dimension information transformation and multi-head attention mechanism processing on the track information of the target object to obtain a time characteristic vector of the target object; and/or, under the condition that the information transformation of the spatial dimension adopts a self-attention mechanism, performing the information transformation of the spatial dimension based on the trajectory information of the target object and the associated object to obtain a spatial feature vector of the target object, including:
and performing information transformation of spatial dimensions and processing of a multi-head attention mechanism on the basis of the track information of the target object and the associated object to obtain a spatial feature vector of the target object.
8. The method according to claim 1, wherein the obtaining an object feature vector of the target object according to the temporal feature vector and the spatial feature vector comprises:
stitching the temporal feature vector and the spatial feature vector;
and obtaining the object feature vector according to the vector obtained after splicing.
9. The method according to claim 1, wherein the obtaining an object feature vector of the target object according to the temporal feature vector and the spatial feature vector comprises:
performing information transformation of a spatial dimension on the time characteristic vector output by the information transformation of the temporal dimension, and taking the obtained spatial characteristic vector as the object characteristic vector;
and/or performing time-dimension information transformation on the space characteristic vector output by the space-dimension information transformation to obtain a time characteristic vector as the object characteristic vector.
10. The method of claim 1, wherein the obtaining trajectory information of the target object in the first period of time comprises:
and acquiring the track characteristics of the target object at each moment in the first time period, and acquiring the track information of the target object in the first time period according to the acquired track characteristics.
11. The method of claim 10, wherein the first time period comprises at least one of a second time and a third time;
the acquiring of the trajectory feature of the target object at each moment in the first time period includes:
encoding the track coordinate of the target object at a second moment in the first time interval to obtain a track characteristic corresponding to the second moment; and/or reading track features respectively corresponding to the target objects at a third moment in the first time interval from a memory.
12. The method according to any one of claims 1 to 11,
the target object is a target pedestrian;
the track information of the target object in the second time period comprises at least one of the following items: the track coordinates of the target pedestrian at each moment in the second time period and the movement trend of the target pedestrian in the second time period;
the associated object is a pedestrian having an association relationship with the target pedestrian in the same scene.
13. A trajectory prediction device, characterized in that the device comprises:
the information acquisition module is used for respectively acquiring the track information of the target object and the associated object in a first time period;
the information transformation module is used for carrying out time dimension information transformation on the track information of the target object to obtain a time characteristic vector of the target object, and carrying out space dimension information transformation on the basis of the track information of the target object and the associated object to obtain a space characteristic vector of the target object; wherein at least one of the information transformation of the time dimension and the space dimension adopts an attention mechanism;
the vector obtaining module is used for obtaining an object feature vector of the target object according to the time feature vector and the space feature vector;
and the information prediction module is used for predicting the track information of the target object in a second time interval on the basis of the object feature vector, wherein the second time interval is positioned after the first time interval in time sequence.
14. An electronic device, comprising: a memory for storing computer readable instructions, a processor for invoking the computer instructions to implement the method of any of claims 1-12.
15. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of any one of claims 1 to 12.
CN202010685801.1A 2020-07-16 2020-07-16 Track prediction method, track prediction device, electronic equipment and medium Active CN111798492B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010685801.1A CN111798492B (en) 2020-07-16 2020-07-16 Track prediction method, track prediction device, electronic equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010685801.1A CN111798492B (en) 2020-07-16 2020-07-16 Track prediction method, track prediction device, electronic equipment and medium

Publications (2)

Publication Number Publication Date
CN111798492A true CN111798492A (en) 2020-10-20
CN111798492B CN111798492B (en) 2024-04-19

Family

ID=72807412

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010685801.1A Active CN111798492B (en) 2020-07-16 2020-07-16 Track prediction method, track prediction device, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN111798492B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112677993A (en) * 2021-01-05 2021-04-20 北京三快在线科技有限公司 Model training method and device
CN113291321A (en) * 2021-06-16 2021-08-24 苏州智加科技有限公司 Vehicle track prediction method, device, equipment and storage medium
CN113316788A (en) * 2021-04-20 2021-08-27 深圳市锐明技术股份有限公司 Method and device for predicting pedestrian motion trail, electronic equipment and storage medium
CN113888601A (en) * 2021-10-26 2022-01-04 北京易航远智科技有限公司 Target trajectory prediction method, electronic device, and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170060810A1 (en) * 2012-12-13 2017-03-02 Eagle Harbor Holdings, LLC. System and method for the operation of an automotive vehicle system with modeled sensors
WO2018187252A1 (en) * 2017-04-06 2018-10-11 Hrl Laboratories, Llc Explicit prediction of adversary movements with canonical correlation analysis
CN109409499A (en) * 2018-09-20 2019-03-01 北京航空航天大学 One kind being based on deep learning and the modified track restoration methods of Kalman filtering
US10345106B1 (en) * 2015-10-29 2019-07-09 National Technology & Engineering Solutions Of Sandia, Llc Trajectory analysis with geometric features
US20200076840A1 (en) * 2018-09-05 2020-03-05 Oracle International Corporation Malicious activity detection by cross-trace analysis and deep learning
CN110895879A (en) * 2019-11-26 2020-03-20 浙江大华技术股份有限公司 Method and device for detecting co-running vehicle, storage medium and electronic device
CN111401233A (en) * 2020-03-13 2020-07-10 商汤集团有限公司 Trajectory prediction method, apparatus, electronic device, and medium
CN111400620A (en) * 2020-03-27 2020-07-10 东北大学 User trajectory position prediction method based on space-time embedded Self-orientation


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112677993A (en) * 2021-01-05 2021-04-20 北京三快在线科技有限公司 Model training method and device
CN113316788A (en) * 2021-04-20 2021-08-27 深圳市锐明技术股份有限公司 Method and device for predicting pedestrian motion trail, electronic equipment and storage medium
CN113291321A (en) * 2021-06-16 2021-08-24 苏州智加科技有限公司 Vehicle track prediction method, device, equipment and storage medium
CN113888601A (en) * 2021-10-26 2022-01-04 北京易航远智科技有限公司 Target trajectory prediction method, electronic device, and storage medium
CN113888601B (en) * 2021-10-26 2022-05-24 北京易航远智科技有限公司 Target trajectory prediction method, electronic device, and storage medium

Also Published As

Publication number Publication date
CN111798492B (en) 2024-04-19

Similar Documents

Publication Publication Date Title
CN111401233A (en) Trajectory prediction method, apparatus, electronic device, and medium
CN111798492A (en) Trajectory prediction method, apparatus, electronic device, and medium
Sadeghian et al. Sophie: An attentive gan for predicting paths compliant to social and physical constraints
Zhou et al. Deep alignment network based multi-person tracking with occlusion and motion reasoning
JP6625220B2 (en) Method and system for detecting the action of an object in a scene
US8917907B2 (en) Continuous linear dynamic systems
WO2018059300A1 (en) Method and device for predicting walking behaviour, data processing device and electronic apparatus
CN113326835B (en) Action detection method and device, terminal equipment and storage medium
CN112734808B (en) Trajectory prediction method for vulnerable road users in vehicle driving environment
Sheng et al. Siamese denoising autoencoders for joints trajectories reconstruction and robust gait recognition
CN111292352B (en) Multi-target tracking method, device, equipment and storage medium
Rosas-Arias et al. FASSD-Net: Fast and accurate real-time semantic segmentation for embedded systems
CN112720453A (en) Method and apparatus for training manipulation skills of a robotic system
Hoy et al. Learning to predict pedestrian intention via variational tracking networks
Cetintas et al. Unifying short and long-term tracking with graph hierarchies
CN114511751A (en) Unsupervised training of video feature extractor
CN111291695B (en) Training method and recognition method for recognition model of personnel illegal behaviors and computer equipment
KR20220065672A (en) Deep smartphone sensors fusion for indoor positioning and tracking
Siddiqi et al. Human activity recognition using Gaussian mixture hidden conditional random fields
CN115690153A (en) Agent trajectory prediction method and system
Ess et al. Improved multi-person tracking with active occlusion handling
CN113205072A (en) Object association method and device and electronic equipment
KR102529876B1 (en) A Self-Supervised Sampler for Efficient Action Recognition, and Surveillance Systems with Sampler
CN113916223B (en) Positioning method and device, equipment and storage medium
CN114792320A (en) Trajectory prediction method, trajectory prediction device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant