CN111798492B - Track prediction method, track prediction device, electronic equipment and medium - Google Patents

Track prediction method, track prediction device, electronic equipment and medium

Info

Publication number
CN111798492B
CN111798492B (application CN202010685801.1A)
Authority
CN
China
Prior art keywords
track
target object
information
moment
feature vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010685801.1A
Other languages
Chinese (zh)
Other versions
CN111798492A (en)
Inventor
余存俊
马骁
赵海宇
伊帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sensetime International Pte Ltd
Original Assignee
Sensetime International Pte Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sensetime International Pte Ltd filed Critical Sensetime International Pte Ltd
Priority to CN202010685801.1A priority Critical patent/CN111798492B/en
Publication of CN111798492A publication Critical patent/CN111798492A/en
Application granted granted Critical
Publication of CN111798492B publication Critical patent/CN111798492B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30196 Human being; Person
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30241 Trajectory

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Traffic Control Systems (AREA)

Abstract

Embodiments of the present disclosure provide a track prediction method, a track prediction device, electronic equipment and a medium, wherein the track prediction method comprises the following steps: respectively acquiring track information of a target object and of an associated object in a first period; performing information transformation in the time dimension on the track information of the target object to obtain a time feature vector of the target object, and performing information transformation in the space dimension on the track information of the target object and the associated object to obtain a space feature vector of the target object, wherein at least one of the information transformations in the time dimension and the space dimension employs an attention mechanism; obtaining an object feature vector of the target object according to the time feature vector and the space feature vector; and predicting track information of the target object in a second period based on the object feature vector.

Description

Track prediction method, track prediction device, electronic equipment and medium
Technical Field
The disclosure relates to machine learning technology, and in particular relates to a track prediction method, a track prediction device, electronic equipment and a medium.
Background
In information prediction, the prediction for a single object may be affected by various factors, and how to predict such information more accurately remains an open research problem. Taking pedestrian trajectory prediction as an example, the analysis and understanding of pedestrian walking behavior is an important research direction in computer vision and intelligent video surveillance, and walking-behavior models have important applications in many fields, such as walking-behavior prediction and pedestrian detection and tracking. Modeling human walking behavior is a complex problem in which many important internal and external factors need to be considered.
Disclosure of Invention
The embodiment of the disclosure at least provides a track prediction method, a track prediction device, electronic equipment and a medium.
In a first aspect, a track prediction method is provided, the method comprising:
track information of the target object and the associated object in a first period is respectively obtained;
Performing time dimension information transformation on the track information of the target object to obtain a time feature vector of the target object, and performing space dimension information transformation on the track information of the target object and the track information of the related object to obtain a space feature vector of the target object; at least one of the information transformations of the time dimension and the space dimension adopts an attention mechanism;
obtaining an object feature vector of the target object according to the time feature vector and the space feature vector;
track information of the target object in a second period is predicted based on the object feature vector, wherein the second period is positioned after the first period in time sequence.
In some embodiments, the trajectory information of the first period includes: track features corresponding to each moment in the first period respectively; the performing spatial dimension information transformation based on the track information of the target object and the associated object to obtain a spatial feature vector of the target object, including: for each moment in the first period, according to the track characteristics of the target object at the moment and the track characteristics of at least one associated object of the target object at the moment, obtaining updated track characteristics of the target object at the moment through an attention mechanism; and the whole of the updated track characteristics corresponding to each moment of the target object in the first period is taken as a space characteristic vector of the target object.
In some embodiments, the obtaining, by an attention mechanism, an updated track characteristic of the target object at the time includes: obtaining a query vector corresponding to a target object at the moment according to the track characteristic of the target object at the moment, and obtaining a key vector corresponding to the associated object at the moment according to the track characteristic of the associated object at the moment; determining transfer information transferred to the target object by each associated object according to the obtained query vector of the target object and the key vector of the associated object; obtaining attention parameters corresponding to the target object according to the obtained transfer information of each associated object corresponding to the target object at the moment; and updating the track characteristics of the target object at the moment based on the attention parameter to obtain the updated track characteristics.
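The query/key/transfer-information update described above resembles scaled dot-product attention. The following is a minimal sketch of that per-moment update for one target object; the feature dimension D, the projection matrices Wq/Wk/Wv, and the residual form are illustrative assumptions, not the patent's actual parameters.

```python
import numpy as np

def spatial_attention_update(target_feat, assoc_feats, Wq, Wk, Wv):
    """Update the target's track feature at one moment from the track
    features of its associated objects via dot-product attention."""
    q = target_feat @ Wq                    # query vector of the target object
    K = assoc_feats @ Wk                    # key vectors of the associated objects
    V = assoc_feats @ Wv                    # transfer information of each associated object
    scores = K @ q / np.sqrt(q.shape[-1])   # relevance of each associated object
    attn = np.exp(scores - scores.max())
    attn /= attn.sum()                      # attention parameters (softmax)
    return target_feat + attn @ V           # updated track feature (residual update)

rng = np.random.default_rng(0)
D = 16
target = rng.standard_normal(D)             # track feature of the target at time t
assoc = rng.standard_normal((3, D))         # three associated objects at time t
Wq, Wk, Wv = (rng.standard_normal((D, D)) * 0.1 for _ in range(3))
updated = spatial_attention_update(target, assoc, Wq, Wk, Wv)
print(updated.shape)  # (16,)
```

Repeating this update at every moment of the first period and stacking the results yields the space feature vector described above.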
In some embodiments, the method further comprises: for a first associated object of the target object at a first moment in the first period, according to the track characteristics of the first associated object and the track characteristics of at least one first object, obtaining updated track characteristics of the first associated object at the moment through an attention mechanism, wherein an association relationship exists between the first object and the first associated object; and obtaining the updated track characteristics of the target object after being updated again at the first moment through an attention mechanism according to the updated track characteristics of the target object at the first moment and the updated track characteristics of at least one first associated object at the first moment.
In some embodiments, the distance between the first object and the first associated object is within a preset distance range.
In some embodiments, the trajectory information of the first period includes: track features corresponding to each moment in the first period respectively; the performing information transformation of the time dimension on the track information of the target object includes: and obtaining updated time feature vectors of the target object through an attention mechanism according to the track features corresponding to each moment in the track information of the target object.
In some embodiments, in the case where the information transformation in the time dimension adopts a self-attention mechanism, the performing the information transformation in the time dimension on the track information of the target object to obtain a time feature vector of the target object includes: performing time dimension information transformation and multi-head attention mechanism processing on the track information of the target object to obtain a time feature vector of the target object; and/or, in the case that the spatial dimension information transformation adopts a self-attention mechanism, performing the spatial dimension information transformation based on the track information of the target object and the related object to obtain a spatial feature vector of the target object, where the method includes: and carrying out information transformation of space dimension and processing of a multi-head attention mechanism based on the track information of the target object and the related object to obtain the space feature vector of the target object.
In some embodiments, the obtaining the object feature vector of the target object according to the temporal feature vector and the spatial feature vector includes: splicing the time feature vector and the space feature vector; and obtaining the object feature vector according to the vector obtained after the splicing.
In some embodiments, the obtaining the object feature vector of the target object according to the temporal feature vector and the spatial feature vector includes: performing information transformation of space dimension on the time feature vector output by the information transformation of the time dimension, wherein the obtained space feature vector is used as the object feature vector; and/or performing information transformation of the time dimension on the spatial feature vector output by the information transformation of the space dimension, wherein the obtained time feature vector is used as the object feature vector.
In some embodiments, the acquiring the track information of the target object in the first period includes: and acquiring the track characteristics of the target object at each moment in the first period, and acquiring track information of the target object in the first period according to the acquired track characteristics.
In some embodiments, the first time period includes at least one of a second time and a third time; the obtaining the track characteristic of the target object at each moment in the first period of time includes: encoding the track coordinates of the target object at a second moment in the first period to obtain track characteristics corresponding to the second moment; and/or reading track features corresponding to the target objects at the third moment in the first period from a memory.
In some embodiments, the target object is a target pedestrian; the track information of the target object in the second period comprises at least one of the following: track coordinates of the target pedestrian at each moment in the second period, and movement trend of the target pedestrian in the second period; the association object is a pedestrian having an association relationship with the target pedestrian in the same scene.
In a second aspect, there is provided a trajectory prediction device, the device comprising:
the information acquisition module is used for respectively acquiring track information of the target object and the associated object in a first period;
the information transformation module is used for carrying out information transformation of time dimension on the track information of the target object to obtain a time feature vector of the target object, and carrying out information transformation of space dimension on the track information of the target object and the track information of the related object to obtain a space feature vector of the target object; at least one of the information transformations of the time dimension and the space dimension adopts an attention mechanism;
The vector obtaining module is used for obtaining an object feature vector of the target object according to the time feature vector and the space feature vector;
And the information prediction module is used for predicting and obtaining track information of the target object in a second period based on the object feature vector, wherein the second period is positioned after the first period in time sequence.
In some embodiments, the information transformation module, when used for performing information transformation of spatial dimensions based on the track information of the target object and the associated object, obtains a spatial feature vector of the target object, includes: for each moment in the first period, according to the track characteristics of the target object at the moment and the track characteristics of at least one associated object of the target object at the moment, obtaining updated track characteristics of the target object at the moment through an attention mechanism; and the whole of the updated track characteristics corresponding to each moment of the target object in the first period is taken as a space characteristic vector of the target object. The track information of the first period includes: the track features respectively correspond to each moment in the first period.
In some embodiments, the information transformation module, when configured to obtain, by an attention mechanism, an updated trajectory characteristic of the target object at the time, comprises: obtaining a query vector corresponding to a target object at the moment according to the track characteristic of the target object at the moment, and obtaining a key vector corresponding to the associated object at the moment according to the track characteristic of the associated object at the moment; determining transfer information transferred to the target object by each associated object according to the obtained query vector of the target object and the key vector of the associated object; obtaining attention parameters corresponding to the target object according to the obtained transfer information of each associated object corresponding to the target object at the moment; and updating the track characteristics of the target object at the moment based on the attention parameter to obtain the updated track characteristics.
In some embodiments, the information transformation module is further configured to, for a first associated object of the target object at a first time in the first period, obtain, by an attention mechanism, an updated track characteristic of the first associated object at the time according to a track characteristic of the first associated object and a track characteristic of at least one first object, where an association relationship exists between the first object and the first associated object; and obtaining the updated track characteristics of the target object after being updated again at the first moment through an attention mechanism according to the updated track characteristics of the target object at the first moment and the updated track characteristics of at least one first associated object at the first moment.
In some embodiments, the distance between the first object and the first associated object is within a preset distance range.
In some embodiments, the information transformation module, when used for performing the information transformation of the track information of the target object in the time dimension, includes: and obtaining the updated time feature vector of the target object through an attention mechanism according to the track features of the target object at each corresponding moment in the track information of the target object. The track information of the first period includes: the track features respectively correspond to each moment in the first period.
In some embodiments, the information transformation module, when used to obtain the temporal feature vector of the target object, includes: under the condition that the information transformation of the time dimension adopts a self-attention mechanism, carrying out the information transformation of the time dimension and the processing of a multi-head attention mechanism on the track information of the target object to obtain a time feature vector of the target object; and/or, the information transformation module, when used for obtaining the space feature vector of the target object, comprises: under the condition that the information transformation of the space dimension adopts a self-attention mechanism, the information transformation of the space dimension and the processing of a multi-head attention mechanism are carried out based on the track information of the target object and the related object, and the space feature vector of the target object is obtained.
In some embodiments, the vector obtaining module is specifically configured to: splicing the time feature vector and the space feature vector; and obtaining the object feature vector according to the vector obtained after the splicing.
In some embodiments, the vector obtaining module, when configured to obtain an object feature vector of the target object according to the temporal feature vector and the spatial feature vector, includes: performing information transformation of space dimension on the time feature vector output by the information transformation of the time dimension, wherein the obtained space feature vector is used as the object feature vector; and/or performing information transformation of the time dimension on the spatial feature vector output by the information transformation of the space dimension, wherein the obtained time feature vector is used as the object feature vector.
In some embodiments, the information acquisition module, when configured to acquire track information of the target object in the first period, includes: and acquiring the track characteristics of the target object at each moment in the first period, and acquiring track information of the target object in the first period according to the acquired track characteristics.
In some embodiments, the information acquisition module, when configured to acquire a trajectory characteristic of the target object at each time in the first period, includes: encoding the track coordinates of the target object at a second moment in the first period to obtain track characteristics corresponding to the second moment; and/or reading track features corresponding to the target objects at the third moment in the first period from a memory.
In some embodiments, the target object is a target pedestrian; the track information of the target object in the second period comprises at least one of the following: track coordinates of the target pedestrian at each moment in the second period, and movement trend of the target pedestrian in the second period; the association object is a pedestrian having an association relationship with the target pedestrian in the same scene.
In a third aspect, there is provided an electronic device, the device comprising: a memory and a processor, wherein the memory is configured to store computer-readable instructions, and the processor is configured to call the computer instructions to implement the track prediction method according to any embodiment of the present disclosure.
In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements a trajectory prediction method according to any one of the embodiments of the present disclosure.
According to the track prediction method, track prediction device, electronic equipment and medium of the present disclosure, by transforming the track information in the time and space dimensions and adopting an attention mechanism for the transformation, more accurate and richer information is fused into the transformed object feature vector, so that the prediction result of the track prediction is more accurate.
Drawings
In order to more clearly illustrate the technical solutions of one or more embodiments of the present disclosure or of the related art, the drawings required in the description of the embodiments or the related art are briefly introduced below. It is apparent that the drawings described below show only some embodiments of the present disclosure, and that other drawings may be obtained from them by those of ordinary skill in the art without inventive effort.
FIG. 1 illustrates a flow chart of a trajectory prediction method provided by at least one embodiment of the present disclosure;
FIG. 2 illustrates a network architecture diagram of one trajectory prediction provided by at least one embodiment of the present disclosure;
FIG. 3 illustrates a feature update schematic of one spatial dimension provided by at least one embodiment of the present disclosure;
FIG. 4 illustrates a block diagram of a trajectory prediction device provided by at least one embodiment of the present disclosure;
fig. 5 illustrates a schematic diagram of an electronic device provided in at least one embodiment of the present disclosure.
Detailed Description
In order that those skilled in the art will better understand the technical solutions in one or more embodiments of the present disclosure, the technical solutions in one or more embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings in one or more embodiments of the present disclosure, and it is apparent that the described embodiments are only some embodiments of the present disclosure, not all embodiments. All other embodiments, which may be made by one of ordinary skill in the art based on one or more embodiments of the present disclosure without inventive faculty, are intended to be within the scope of the present disclosure.
When predicting information of a certain object in a next period based on information of the object in a previous period, such information prediction may be affected not only by history information of the object itself in the previous period but also by history information of other objects in the same previous period. Therefore, the historical information of the object and other objects can be integrated to predict, so that a more accurate prediction result can be obtained.
That is, by adopting the technical solution provided by the present disclosure, object information of a certain object in a later period can be predicted from the history information generated in an earlier period. The earlier period may be understood as a period of time that includes the current time; the later period may be understood as temporally adjacent to the earlier period, i.e., the next period of time following the current time, or one or more moments within that next period.
For example, in the purchase recommendation field, a target user's tendency to purchase at a certain moment in time in the next period may be predicted from the target user's historical purchase record and its associated user's (e.g., the target user's friends or relatives) historical purchase record, thereby recommending to the target user items of which the target user would be interested with a high probability.
For another example, in the field of track prediction, the track of an object may relate to its own historical track, or may relate to the track of other related objects (for example, other objects that are relatively close to the object) in the same walking scene, so that the track coordinates of the object at a certain moment in the next period can be predicted according to the historical track of the object and its related objects.
The information prediction method provided in the present disclosure is described below by taking pedestrian trajectory prediction as an example, but it can be understood that the method is equally applicable to other scenarios. Fig. 1 shows a flowchart of a track prediction method provided in at least one embodiment of the present disclosure, where, as shown in fig. 1, the method may include the following processes:
in step 100, track information of the target object and the associated object in the first period is obtained respectively.
In this embodiment, the track information of the target object, i.e. the object to be predicted currently, includes not only the information corresponding to the current time t, but also the information of each time of a predetermined period of time before the current time t, for example, the predetermined period of time may be within one hour before the current time t.
For example, from the historical trajectory data of a pedestrian P1 within [1, T_obs], where [1, T_obs] may be referred to as the first period, the trajectory coordinates of the pedestrian P1 at time T_obs + 1 can be predicted. Here, P1 is called the target object for which trajectory prediction is to be performed, and T_obs is the current time. The historical trajectory data may include the trajectory coordinates corresponding to each time; for example, the trajectory coordinates (x_t^i, y_t^i) represent the position of pedestrian i at time t, with abscissa x and ordinate y, where i is the pedestrian identifier. The historical trajectory data may comprise a plurality of times t, and the trajectory coordinates corresponding to the respective times t form a trajectory coordinate sequence.
In a practical implementation, a video is obtained first, whose duration corresponds to [1, T_obs]. The video may include a plurality of image frames, each of which may contain the target object, i.e., pedestrian P1. The video may be preprocessed by a detection-and-tracking algorithm to extract the trajectory coordinates of pedestrian P1 in each image frame, finally obtaining the historical trajectory data of the target object in the video.
In addition, the trajectory information may be information obtained by encoding the historical trajectory data. For example, the trajectory sequence of a user may be {p_1, p_2, ..., p_t}, where p_t = (x_t, y_t) is the user's trajectory coordinate at the current time t, p_1 = (x_1, y_1) is the user's trajectory coordinate at historical time 1, and so on. The trajectory coordinate at each time may be encoded (embedded) into a vector, for example by a fully connected layer (FC); accordingly, the trajectory coordinate at each time can be encoded into a corresponding vector. The vector obtained by encoding a trajectory coordinate may be called a track feature, and the whole of the track features at the respective times of the first period may be called the track information of the first period.
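The coordinate-encoding step described above can be sketched as follows. The feature dimension D, the ReLU nonlinearity, and the specific weights are assumptions for illustration; the patent only specifies that a fully connected layer embeds each coordinate into a vector.

```python
import numpy as np

def embed_trajectory(coords, W, b):
    """coords: (T, 2) sequence of (x, y) trajectory coordinates.
    Returns (T, D) track features, one per time step."""
    return np.maximum(coords @ W + b, 0.0)  # fully connected layer + ReLU

rng = np.random.default_rng(1)
T, D = 8, 32                                 # 8 time steps in the first period
coords = rng.standard_normal((T, 2))         # {p_1, ..., p_T}
W = rng.standard_normal((2, D)) * 0.1
b = np.zeros(D)
track_info = embed_trajectory(coords, W, b)  # "track information" of the first period
print(track_info.shape)  # (8, 32)
```

The same encoding would be applied to each associated object's coordinate sequence to obtain its track information.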
The associated object is an object associated with the target object. At each time, at least one associated object of the target object at that time may be acquired; such an object may be, for example, an object in the same scene as the target object. For example, still taking the above video as an example, if both pedestrian P2 and pedestrian P1 appear in an image frame at a certain time of the video and the distance between them is smaller than a predetermined distance d, pedestrian P2 can be considered an associated object of P1. The historical trajectory data of the associated object is also a trajectory coordinate sequence. The associated object is acquired because its information may influence the information prediction of the target object; for example, when pedestrian P2 is close to pedestrian P1, the walking trajectory of P1 is influenced by the trajectory of P2.
As above, the historical trajectory data of the target object and the associated objects at each time are obtained. Assuming that the number of people in the scene at each time is N, (x_t^i, y_t^i) represents the trajectory coordinates of pedestrian i at time t; the target object and its associated objects total N pedestrians, and the trajectory coordinates of each person may be two-dimensional coordinates (x, y) representing the position in the scene. The trajectory coordinates of an associated object at each time in the first period can likewise be encoded into track features, thereby obtaining the trajectory information of the associated object in the first period.
In step 102, performing information transformation of time dimension on track information of the target object to obtain a time feature vector of the target object, and performing information transformation of space dimension on track information of the target object and related objects to obtain a space feature vector of the target object; wherein at least one of the information transformations in the temporal and spatial dimensions employs an attention mechanism.
In this embodiment, on one hand, the track information is transformed by using the attention mechanism; on the other hand, both temporal and spatial transformations are considered.
The temporal information transformation may be to obtain an updated sequence through an attention mechanism according to the track features corresponding to the respective moments in the track information of the object, where the updated sequence may be referred to as a temporal feature vector of the object.
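The temporal transformation just described can be sketched as self-attention over the object's own time steps: each moment's track feature attends to every other moment of the same sequence. The shared projection matrices, the residual form, and the dimensions are illustrative assumptions.

```python
import numpy as np

def temporal_self_attention(feats, Wq, Wk, Wv):
    """feats: (T, D) track features of one object over the first period.
    Returns the updated (T, D) sequence, i.e. the time feature vector."""
    Q, K, V = feats @ Wq, feats @ Wk, feats @ Wv
    scores = Q @ K.T / np.sqrt(feats.shape[-1])   # (T, T) attention over time
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)      # row-wise softmax
    return feats + attn @ V                       # updated sequence (residual)

rng = np.random.default_rng(2)
T, D = 8, 32
feats = rng.standard_normal((T, D))               # track information of the object
Wq, Wk, Wv = (rng.standard_normal((D, D)) * 0.1 for _ in range(3))
time_feat = temporal_self_attention(feats, Wq, Wk, Wv)
print(time_feat.shape)  # (8, 32)
```

A multi-head variant, as mentioned below, would run several such attention maps in parallel with separate projections and concatenate the results.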
The spatial information transformation may be a process of performing an attention mechanism once for each time. For example, at least one associated object of the object can be obtained at each moment, and likewise, the track coordinate of each associated object at the moment can also be encoded into a track feature, so that the updated track feature of the object can be obtained through the processing of the attention mechanism according to the track features of the object and the associated object. The corresponding updated track feature can be obtained at each moment, so that the whole track information of the object can be updated, and the updated track information can be called as a space feature vector.
In addition, the attention mechanism adopted by the transformation of the time dimension and the space dimension can be a self-attention mechanism (self-attention), or a multi-head attention mechanism is adopted to continue processing after self-attention processing.
In step 104, an object feature vector of the target object is obtained according to the temporal feature vector and the spatial feature vector.
In this step, the obtained temporal feature vector and spatial feature vector may be combined to obtain an object feature vector of the target object, where the object feature vector may be updated track information obtained after the above-mentioned temporal and spatial dimension processing.
Alternatively, the above-mentioned transformations in time and space dimensions may be performed in parallel and then fused together, or may be performed in series. For example, the track information of the object may be transformed in a time dimension and transformed in a space dimension respectively, and then the transformed results are fused together, for example, the time feature vector and the space feature vector are spliced and then processed by a full connection layer, so as to obtain the object feature vector. For example, the transformation of the time dimension may be performed first, and the information transformation of the space dimension may be continued on the time feature vector, and the obtained space feature vector may be used as the object feature vector. Or firstly, performing space dimension transformation, and performing time dimension information transformation on the space feature vector to obtain a time feature vector serving as the object feature vector. Of course, other fusion approaches of temporal and spatial transformations may also be employed.
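The two fusion strategies described above (parallel splicing followed by a full connection layer, or serial application of the two transforms) can be sketched as follows. This is an illustrative sketch only; the function names and toy transforms are hypothetical, not part of the disclosure.

```python
def fuse_parallel(temporal_vec, spatial_vec, fc):
    # parallel branch: splice (concatenate) the temporal and spatial
    # feature vectors, then pass them through a full connection layer
    return fc(temporal_vec + spatial_vec)

def fuse_serial(track_info, temporal_transform, spatial_transform):
    # serial branch: time-dimension transform first, then the space-dimension
    # transform applied to the resulting temporal feature vector
    return spatial_transform(temporal_transform(track_info))
```

The serial order may also be reversed (space first, then time), as the text notes.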
In step 106, track information of the target object in a second period is predicted based on the object feature vector, the second period being located temporally after the first period.
In this step, track information of the target object in the second period may be predicted based on the object feature vector of the target object in the first period obtained in the above step. The second period may be a next period of time adjacent to the first period of time in time series or belong to one or more moments in the next period of time.
In addition, the target object of the track prediction may be a pedestrian or other types of objects. When the target object is a pedestrian, the association object may be a pedestrian having an association relationship (e.g., a distance within a preset distance range) with the target pedestrian in the same scene, and the predicted trajectory information of the pedestrian in the second period may include at least one of: the track coordinates of the target pedestrian in at least one moment in the second period, or the movement trend of the target pedestrian in the second period.
According to the information prediction method above, the track information is transformed in both the time and space dimensions, and an attention mechanism is adopted for the transformation, so that the transformed object feature vector fuses more accurate and richer information, making the prediction result more accurate.
The following describes, by way of example, how the track coordinate of a target pedestrian P1 at time T obs +1 may be predicted by the track prediction method of the present disclosure, based on the historical track data of the target pedestrian P1 and its associated objects, using the network architecture shown in fig. 2. It should be noted that the network architecture shown in fig. 2 is only an example; the embodiments are not limited thereto, and the following embodiments provide several alternative network variants.
As shown in fig. 2, the network employs two encoders, encoder 1 (encoder 1) and encoder 2 (encoder 2), and a decoder (decoder). The historical track data of the target pedestrian and its associated objects in [1, T obs ] may first be input into the encoder 1. The following embodiment takes the prediction of the track coordinate of one person, i.e., the target pedestrian, at time T obs +1 as an example to describe the track prediction method of the present disclosure. It should be noted, however, that in practical implementation the track coordinates at time T obs +1 of every pedestrian in the scene, such as the associated objects of the target pedestrian, may also be predicted at the same time, i.e., the track predictions of the pedestrians may be performed in parallel by the same prediction method. The network output in fig. 2 may thus include not only the track coordinate of the target pedestrian P1 at time T obs +1, but also the track coordinates of the other pedestrians in the same scene at time T obs +1.
The historical track data of the target pedestrian P1 may be encoded into track information of P1 by a full connection layer FC (fully connected layer) in the encoder 1. This track information is a sequence that includes a track feature corresponding to the current moment and to each historical moment, i.e., the track coordinate of each moment is converted into a corresponding track feature. For example, the track coordinate at time t1 can be converted (embedded) by the FC into a corresponding track feature; the track coordinate at time t2 can likewise be converted by the FC into a corresponding further track feature. Similarly, the historical track data of the associated objects of the target pedestrian can be converted into corresponding track information through the full connection layer FC in the same manner.
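The embedding step above can be sketched as one linear layer mapping a 2-D coordinate to a track feature. The weights, bias and 4-dimensional output below are toy values chosen for illustration (the embodiments mention 32-dimensional features, and real weights would be learned).

```python
def fc_embed(coord, W, b):
    # one linear (fully connected) layer mapping a 2-D coordinate (x, y)
    # to a d-dimensional track feature; W is d rows of 2 weights each
    return [sum(w[k] * coord[k] for k in range(2)) + b[j]
            for j, w in enumerate(W)]

# toy 4-dimensional embedding with fixed (not learned) weights
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
b = [0.0, 0.0, 0.5, 0.0]
feature = fc_embed((2.0, 3.0), W, b)  # -> [2.0, 3.0, 5.5, -1.0]
```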
With continued reference to fig. 2, the encoder 1 may include a spatial conversion module (Spatial Transformer) and a temporal conversion module (Temporal Transformer), the numbers of which are not limited in this embodiment; in this implementation, the encoder 1 includes one spatial conversion module and one temporal conversion module. The spatial conversion module may be configured to perform information transformation of the spatial dimension based on the track information of the target object and its associated objects, updating the track information of the target object to finally obtain the spatial feature vector of the target object. The temporal conversion module can be used to perform information transformation of the time dimension on the track information of the target object to obtain the temporal feature vector of the target object. The above-mentioned temporal feature vector and spatial feature vector are both sequences obtained after updating the track information of the target object, and are described as follows:
Processing of the time conversion module:
The processing of the time conversion module updates the track information of the target pedestrian P1 using only the track information of the pedestrian P1 itself; the track information of the associated objects is not considered, i.e., this module processes the information of a single pedestrian.
The input, output and processing of the time conversion module are respectively as follows:
[ input to module ]: the track features corresponding to each moment in the track information of the target pedestrian P1, i.e., the sequence {h_1^i, h_2^i, …, h_T^i}, where i identifies the pedestrian P1 (the pedestrian P1 and its associated objects total N pedestrians), 1 to T represent the moments, T is the current moment T obs, and each h_t^i is the track feature corresponding to moment t, a vector obtained by embedding (converting) the track coordinate of that moment.
[ output of module ]: the updated track information of the target pedestrian P1, i.e., the sequence {h_1^i', h_2^i', …, h_T^i'}. That is, the time conversion module outputs a piece of track information; the sequence is an update of the input sequence, and the updated sequence may be called the temporal feature vector.
[ Internal processing of modules ]: an attention mechanism is employed.
Firstly, a query matrix Q, a key matrix K and a value matrix V corresponding to the pedestrian P1 can be obtained according to the track information of the pedestrian P1:
Q_i=f_Q(h_i), K_i=f_K(h_i), V_i=f_V(h_i)………………(1)
In formula (1), f_Q, f_K and f_V are the functions used in the matrix calculation respectively, h_i is the sequence of track features of the pedestrian P1, and i represents the identity of the pedestrian P1.
Next, from the above calculated matrices, the attention parameter Att corresponding to the pedestrian P1 is calculated by the self-attention mechanism, as shown in the following formula (2):
Att(Q_i,K_i,V_i)=softmax(Q_iK_i^T/√d_k)·V_i………………(2)
where d_k is the dimension of the query matrix Q.
Then, the attention parameter Att calculated as described above may be added to the initial track information of the pedestrian P1 (i.e., the sequence input to the module) to obtain a first parameter. The first parameter is processed by a full connection layer to obtain a second parameter. The second parameter is added to the first parameter to finally obtain the updated track information, namely the temporal feature vector. The temporal feature vector may be a 32-dimensional vector.
The track information of the pedestrian P1 is processed through the attention mechanism, and the attention mechanism brings track characteristics at all moments in the sequence into calculation, so that the time dependence and the association relation among the track characteristics in the historical time period can be better learned, and the prediction effect of predicting according to the time characteristic vector can be improved.
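The temporal processing described above (formulas (1)–(2) followed by the residual addition, full connection layer, and second residual addition) can be sketched in pure Python as below. The helper names, toy dimensions, identity projections and identity FC are illustrative assumptions; in practice the projection weights would be learned.

```python
import math

def matmul(A, B):
    # multiply an m-by-k matrix by a k-by-n matrix (lists of rows)
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def softmax(row):
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]

def self_attention(H, Wq, Wk, Wv):
    # formula (1): Q, K, V from the T x d track-feature sequence H;
    # formula (2): Att = softmax(Q K^T / sqrt(d_k)) V
    Q, K, V = matmul(H, Wq), matmul(H, Wk), matmul(H, Wv)
    d_k = len(Q[0])
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(Q, transpose(K))]
    return matmul([softmax(row) for row in scores], V)

def temporal_module(H, Wq, Wk, Wv, fc):
    # Att + input -> first parameter; FC(first) -> second parameter;
    # first + second -> updated track information (temporal feature vector)
    att = self_attention(H, Wq, Wk, Wv)
    first = [[h + a for h, a in zip(hr, ar)] for hr, ar in zip(H, att)]
    second = [fc(row) for row in first]
    return [[f + s for f, s in zip(fr, sr)] for fr, sr in zip(first, second)]

# toy run: one time step, 2-dim features, identity projections, identity FC
I2 = [[1.0, 0.0], [0.0, 1.0]]
out = temporal_module([[1.0, 2.0]], I2, I2, I2, lambda r: list(r))
```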
Alternatively, after the processing by the self-attention mechanism, the calculation may be continued by a multi-head attention mechanism, as in the following formula (3) and formula (4):
MultiHead(Q_i,K_i,V_i)=f_O(concat(Head_1,…,Head_m))………………(3)
Head_j=Att_j(Q_i,K_i,V_i)………………(4)
wherein f_O is a full connection layer used for fusing the multiple heads.
The MultiHead obtained by formula (3) may be referred to as the multi-head attention parameter. After the multi-head attention parameter is obtained, it may similarly be added to the initial track information of the pedestrian P1 to obtain a first parameter; the first parameter is processed by a full connection layer to obtain a second parameter; and the second parameter is added to the first parameter to finally obtain the updated track information, namely the temporal feature vector.
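The multi-head step of formulas (3)–(4) can be sketched as follows: the feature dimension is split across heads, each head runs its own attention, and the concatenated head outputs are fused by f_O. The splitting scheme, head count and identity f_O are illustrative assumptions.

```python
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [x / s for x in e]

def head_attention(Q, K, V):
    # one head, as in formula (4): softmax(q . k / sqrt(d_k)) weighted sum of v
    d_k = len(Q[0])
    out = []
    for q in Q:
        w = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K])
        out.append([sum(wi * v[j] for wi, v in zip(w, V)) for j in range(len(V[0]))])
    return out

def multi_head(Q, K, V, n_heads, f_o):
    # split the feature dimension across heads, run each head, then
    # concatenate and fuse with the full connection layer f_O (formula (3))
    d = len(Q[0]) // n_heads
    heads = []
    for h in range(n_heads):
        sl = slice(h * d, (h + 1) * d)
        heads.append(head_attention([q[sl] for q in Q],
                                    [k[sl] for k in K],
                                    [v[sl] for v in V]))
    concat = [sum((heads[h][t] for h in range(n_heads)), []) for t in range(len(Q))]
    return [f_o(row) for row in concat]
```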
Further, as shown in fig. 2, the track information of the target pedestrian P1 input to the time conversion module can also be obtained by means of a memory. The track information obtained and stored in the memory during a previous prediction can be read out for use; likewise, the track information obtained in the current prediction can be stored in the memory for use in the next prediction, as will be described in detail later.
Processing of the space conversion module:
The processing of the space conversion module is to comprehensively consider the track information of the target pedestrian P1 and the track information of the related object thereof, and update the track information of the pedestrian P1.
The input, output and processing of the space conversion module are respectively as follows:
[ input to module ]: the track information corresponding to the target pedestrian P1 and its associated objects, for example the track information of pedestrian P1, pedestrian P2 and pedestrian P3, where pedestrian P2 and pedestrian P3 are both associated objects of P1. Each piece of track information includes the track features corresponding to the respective moments in the period [1, T obs], i.e., the sequence {h_1^i, h_2^i, …, h_T^i}, where i is a pedestrian identifier (the pedestrian P1 and its associated objects total N pedestrians), T is the current moment T obs, and each h_t^i is the track feature obtained by converting the track coordinate of moment t.
For each moment, the trajectory characteristics of the pedestrian P1 corresponding to that moment, as well as the trajectory characteristics of other associated objects (e.g., P2), are included.
[ output of module ]: the updated track information of the target pedestrian P1, i.e., the sequence {h_1^i', h_2^i', …, h_T^i'}. That is, the spatial conversion module outputs a piece of track information; the sequence is the updated result of the track information of the pedestrian P1, and the updated sequence may be called the spatial feature vector.
[ Internal processing of modules ]: an attention mechanism is employed.
Taking one of the moments as an example, please refer to fig. 3, which illustrates the space diagram corresponding to one moment. Four nodes are illustrated in the space diagram; different nodes represent different pedestrians. h_t^1 represents the track feature corresponding to the target pedestrian P1 at moment t (converted from the track coordinate of the pedestrian at moment t), h_t^2 represents the track feature corresponding to the associated pedestrian P2 at moment t, and similarly h_t^4 represents the track feature corresponding to the associated pedestrian P4 at moment t. The distance between each associated pedestrian and the target pedestrian P1 is less than the predetermined distance d, so connecting edges are established between the four nodes.
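The construction of the space diagram (an edge between any two pedestrians whose distance at that moment is below d) can be sketched as below; the function names are hypothetical.

```python
import math

def build_edges(positions, d):
    # positions: {pedestrian_id: (x, y)} at one moment; an edge connects
    # two pedestrians whose Euclidean distance is below the threshold d
    edges = set()
    ids = sorted(positions)
    for idx, i in enumerate(ids):
        for j in ids[idx + 1:]:
            if math.dist(positions[i], positions[j]) < d:
                edges.add((i, j))
    return edges

def neighbours(pid, edges):
    # Nb(pid): the associated objects of pedestrian pid in the space diagram
    return {b if a == pid else a for a, b in edges if pid in (a, b)}
```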
First, a query (query) vector q_i, a key (key) vector k_i and a value (value) vector v_i corresponding to each pedestrian are obtained from that pedestrian's track feature at moment t in fig. 3:
q_i=f_Q(h_t^i), k_i=f_K(h_t^i), v_i=f_V(h_t^i)………………(5)
For example, q_i, k_i and v_i corresponding to the pedestrian P1 at moment t are obtained according to the track feature of the pedestrian P1 at moment t; the three vectors of the other associated pedestrians at that moment are calculated in the same manner. In formula (5), f_Q, f_K and f_V are the functions used in the above calculation respectively, and i represents the identity of a pedestrian.
Then, according to the above-calculated vectors q, k and v, the transfer information transferred from the node corresponding to an associated pedestrian j to the node corresponding to the target pedestrian i may be set as shown in the following formula (6), where the transfer information may be used to represent the influence of each associated pedestrian on the feature update of the target pedestrian:
m_(j→i)=q_i·k_j………………(6)
Then, the calculation of the attention parameter corresponding to the target pedestrian node is exemplified by the following formula (7):
Att(i)=softmax([m_(j→i)/√d_k]_(j∈Nb(i)∪{i}))·V………………(7)
where V is the matrix formed by the value vectors v_j of the neighbour nodes and of the target node itself, and d_k is the dimension of the query vector.
Finally, based on the formula (7), the track feature of the target pedestrian node at that moment is updated:
h̃_i=h_i+Att(i)………………(8)
h_i'=h̃_i+f_out(h̃_i)………………(9)
As shown in the above formulas (8) and (9), i represents the identity of the target pedestrian P1, h_i is the track feature of the pedestrian P1 at that moment, and Nb(i) is the set of neighbour nodes (i.e., associated objects) of the pedestrian P1. h_i' is the updated track feature of the pedestrian P1.
According to formula (7), softmax processing is performed on the transfer information matrix scaled by 1/√d_k; the transfer information matrix comprises the information transferred by each neighbour node to the target node of the target pedestrian P1 as well as the information transferred by the target node to itself. The result of the softmax processing is multiplied by the value matrix, which comprises the value vectors v of each neighbour node and of the target node itself. The resulting data is the attention parameter Att(i). According to formula (8), the attention parameter is added to h_i to obtain the output result of formula (8).
According to the formula (9), the output result of the formula (8) is further processed by f out and then added with the output result of the formula (8), so as to obtain updated track characteristics h i' of the target pedestrian. Wherein f out may be a fully connected layer.
The track feature of the target object at each moment can be updated according to the above method, so as to finally obtain the updated track information of the target object; the updated sequence may be called the spatial feature vector. For example, for a certain moment in the first period, according to the track features of the target pedestrian and the associated pedestrians at that moment, the updated track feature of the target pedestrian at that moment can be obtained through the attention mechanism; the whole of the updated track features of the target pedestrian corresponding to each moment in the first period constitutes the spatial feature vector of the target pedestrian, which may be a 32-dimensional vector.
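The per-moment spatial update of formulas (6)–(9) can be sketched as follows. The function names are hypothetical, f_Q/f_K/f_V/f_out are passed in as plain callables, and the 1/√d_k scaling is placed before the softmax following the usual scaled dot-product convention.

```python
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [x / s for x in e]

def spatial_update(h, neighbour_feats, f_q, f_k, f_v, f_out):
    # h: track feature of the target node at moment t;
    # neighbour_feats: track features of its neighbour set Nb(i)
    q = f_q(h)
    nodes = [h] + neighbour_feats          # target node plus its neighbours
    d_k = len(q)
    msgs = [sum(a * b for a, b in zip(q, f_k(n))) for n in nodes]   # formula (6)
    w = softmax([m / math.sqrt(d_k) for m in msgs])                 # formula (7)
    att = [sum(wi * f_v(n)[j] for wi, n in zip(w, nodes)) for j in range(len(h))]
    first = [a + b for a, b in zip(h, att)]                          # formula (8)
    return [a + b for a, b in zip(first, f_out(first))]              # formula (9)
```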
Optionally, the attention parameter calculated above may also be further calculated by a multi-head attention mechanism, for example, the attention parameter is processed by the multi-head attention mechanism and added to h i to obtain Att (i), which will not be described in detail.
With continued reference to fig. 2, the spatial conversion module obtains the spatial feature vector and the temporal conversion module obtains the temporal feature vector; the object feature vector may then be obtained according to the temporal feature vector and the spatial feature vector. For example, the temporal feature vector and the spatial feature vector may be processed through the full connection layer FC to obtain the object feature vector corresponding to the target object, i.e., the output result of the encoder 1.
The output result of the encoder 1 is the updated track information of the target object; according to an embodiment of the present disclosure, this output result is further processed by the encoder 2. The encoder 2 may also comprise a temporal conversion module and a spatial conversion module; in the encoder 2 these two modules may be connected in series. For example, the result output by the encoder 1 may first be processed by the spatial conversion module in the encoder 2, and the spatial feature vector output by the spatial conversion module is then processed by the temporal conversion module.
Specifically, the processing of the temporal conversion module and the spatial conversion module in the encoder 2 is the same as that in the encoder 1, and only a brief description is given here: for example, the updated trajectory information of the pedestrian P1 output by the encoder 1 is input to a spatial conversion module in the encoder 2, which also updates the trajectory characteristics of the pedestrian P1 at each moment by the attention mechanism with the characteristics of the associated pedestrian of the pedestrian P1. The spatial feature vector output by the spatial conversion module in the encoder 2 is continuously input to the temporal conversion module in the encoder 2, and the temporal conversion module performs temporal dimension conversion on the track information of the pedestrian P1 to obtain a temporal feature vector. In the architecture of fig. 2, the temporal feature vector output by the temporal conversion module in the encoder 2 is the updated trajectory information of the pedestrian P1 finally obtained, which is called an object feature vector.
In the system architecture of fig. 2, the addition of the encoder 2 makes the extraction of the factors influencing the track prediction of the pedestrian P1 more accurate and comprehensive. Take the spatial conversion module in the encoder 2, compared with the spatial conversion module in the preceding encoder 1, as an example. Assume that at a first moment in the first period (which may be any moment), the target pedestrian has at least one first associated object, and each first associated object itself has at least one first object at the first moment; there is an association relationship between the first object and the first associated object, for example the distance between them is within the preset distance range. At the first moment, not only is the track feature of the pedestrian P1 updated by the features of each first associated object of the target pedestrian P1, but the track features of the first associated objects are also updated at the same time by their own first objects. Thus, at the first moment, the updated track feature of the target object and the updated track feature of at least one first associated object can be obtained. Then, according to the updated track feature of the target object at the first moment and the updated track feature of at least one first associated object at the first moment, the track feature of the target object updated again at the first moment can be obtained through an attention mechanism.
For example, take the node h_t^2 in fig. 3; it is one of the first associated objects of the target pedestrian P1. First, the track feature of the node h_t^1 corresponding to the target pedestrian P1 is updated according to the first associated objects h_t^2, h_t^3 and h_t^4. At the same time, the track feature of h_t^2 is also updated through an attention mechanism according to the track features of its own associated first objects (first objects not shown), obtaining its updated track feature; the updated track features of h_t^3 and h_t^4 can be obtained in the same way. The updated track features of these first associated objects have introduced the influence of their associated first objects on the track prediction of the first associated objects. Then, when the spatial conversion module in the encoder 2 continues to update the track feature of the target pedestrian P1, it can continue the update according to the updated track features of the associated objects h_t^2, h_t^3 and h_t^4 of the target pedestrian P1, which is equivalent to indirectly taking into account the influence of the first objects on the track of P1.
In addition, referring to fig. 2, the network architecture further includes a memory (graph memory), which may be an external memory independent of the prediction network, and into which the object feature vector of the target object output by the time conversion module in the encoder 2 may be stored.
For example, assuming that the current moment is t-1 (what is predicted is the track coordinate of moment t), the object feature vector for the [1, t-1] period may be stored in the memory. When the track coordinate of moment t+1 is subsequently predicted, the track coordinate of moment t can be encoded to obtain the corresponding current track feature, while the historical track features of all moments in the period [1, t-1] can be read directly from the memory; the current track feature of moment t and the historical track features of [1, t-1] together form the track information of the target object.
As can be seen from the above example, the track features of the target object at the moments in the first period may be partly encoded and partly read from the memory. It may be assumed that the track coordinate of the target object at a second moment in the first period (for example, moment t) is encoded, so as to obtain the track feature corresponding to the second moment, while the track features corresponding to the target object at third moments in the first period (for example, the moments in the [1, t-1] period) are read from the memory. The number of second moments or third moments may each be at least one. In addition, various alternatives are possible in the implementation: the track feature of every moment may be stored in the memory, or no track feature may be stored in the memory, or a part of the track features may be stored in the memory while another part needs to be encoded.
Alternatively, when the input of the encoder 1 is track data in the [1, t ] period, track information may be obtained by FC as shown in fig. 2, and each history track feature in the [1, t-1] period in the track information may be replaced by a corresponding embedding read by the memory. By adopting the sequences in the memory for prediction, the prediction effect can be better.
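The memory-assisted construction of the track information above can be sketched as follows: moments already cached are read from the memory, and only new moments are encoded. The class and function names are hypothetical (the text only calls it a "graph memory").

```python
class GraphMemory:
    # sketch of the external memory: caches the track feature of each
    # moment so the next prediction reads [1, t-1] instead of re-encoding
    def __init__(self):
        self._store = {}

    def write(self, t, feature):
        self._store[t] = feature

    def read(self, t):
        return self._store.get(t)

def track_features(coords_by_time, encode, memory):
    # encode only the moments not yet in memory; read the rest
    feats = []
    for t in sorted(coords_by_time):
        cached = memory.read(t)
        if cached is None:
            cached = encode(coords_by_time[t])
            memory.write(t, cached)
        feats.append(cached)
    return feats
```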
Finally, the object feature vector output by the encoder 2 is passed through a decoder (decoder), which may be a full connection layer FC that reduces the dimension to a 2-dimensional vector, so as to obtain the track coordinate of the target pedestrian P1 at time T obs+1.
Similarly, the track coordinates at time T obs+1 of the other associated pedestrians in the same scene as the target pedestrian can be predicted by the same prediction method; i.e., the track coordinates of all pedestrians in the same scene at time T obs+1 can be predicted simultaneously in parallel, and finally the network can output the track coordinates of all pedestrians in the same scene at time T obs+1. The newly predicted track coordinates may be returned and saved in the historical track data for use the next time, when the track coordinates at time T obs+2 are predicted.
In addition, the network architecture of fig. 2 is only an example, and many variations are possible in practical implementation, as long as feature extraction in the two dimensions of time and space is performed in the prediction of information based on group association, and an attention mechanism is used to extract features. The following are several example modifications, without being limited thereto; in a specific implementation, the time conversion module and the space conversion module may be freely combined, without limiting the number of modules or their combination modes:
For example, at least one of the temporal conversion module and the spatial conversion module in fig. 2 may employ an attention mechanism: for instance, the spatial conversion module employs GAT (Graph Attention Network, a graph attention network), while the temporal conversion module employs the manner mentioned in the embodiments of the present disclosure.
For another example, the spatial conversion module in fig. 2 adopts the manner mentioned in the embodiments of the present disclosure, and the temporal conversion module adopts an LSTM (Long Short-Term Memory) network manner. For another example, the encoder in the architecture of fig. 2 may also be only encoder 1, with encoder 2 removed; or the encoder has only encoder 2, with encoder 1 removed. Also for example, in the encoder 2 shown in fig. 2, only the spatial conversion module may be present, with the temporal conversion module removed. External memory is also optionally provided.
The encoder and decoder of fig. 2 are first trained and then applied to information prediction. In the network training stage, the predicted information of the target object has a predicted value and a true value; a loss function is calculated according to the predicted value and the true value, and the network parameters are adjusted by back-propagation according to the loss function value.
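As an illustrative sketch of the training loss, assume a mean-squared-error loss between the predicted and true coordinates (the disclosure does not fix the particular loss function; the choice here is an assumption):

```python
def mse_loss(pred_coords, true_coords):
    # mean squared error between predicted and ground-truth 2-D coordinates;
    # its value would drive back-propagation during network training
    n = len(pred_coords)
    return sum((px - tx) ** 2 + (py - ty) ** 2
               for (px, py), (tx, ty) in zip(pred_coords, true_coords)) / n
```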
The information prediction method of the above embodiment can be applied to various scenarios after obtaining the predicted track:
For example, after the predicted track of the target object is obtained, if the actual track of the target object does not match the predicted track, it is determined that the target object behaves abnormally. A mismatch means that the actual track differs from the predicted track, including the case where the actual track deviates significantly from the predicted track; for example, when it is determined that the two tracks deviate greatly, they are determined to be mismatched. The deviation between the two tracks may be measured as the distance between the actual track and the predicted track by the following indices: ADE (Average Displacement Error) or FDE (Final Displacement Error), and whether the degree of deviation of the two tracks is large can be determined by setting a threshold for these indices. One example of a practical application: a pedestrian P2 is predicted to turn left at an intersection, but in fact turns right at the intersection; it may then be determined that the pedestrian's behaviour is abnormal. After a pedestrian is found to be abnormal, it may indicate that the pedestrian is an illegal actor (such as a thief).
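The ADE/FDE-based mismatch check described above can be sketched as below; the thresholds and function names are illustrative.

```python
import math

def ade(actual, predicted):
    # Average Displacement Error: mean point-wise distance over the track
    d = [math.dist(a, p) for a, p in zip(actual, predicted)]
    return sum(d) / len(d)

def fde(actual, predicted):
    # Final Displacement Error: distance at the last moment only
    return math.dist(actual[-1], predicted[-1])

def is_abnormal(actual, predicted, ade_thresh, fde_thresh):
    # the actual track deviating greatly from the predicted track is
    # flagged as abnormal behaviour
    return ade(actual, predicted) > ade_thresh or fde(actual, predicted) > fde_thresh
```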
For another example, after the predicted track of the target object is obtained, path planning is performed based on the predicted track of the target object. For example, when assisting an intelligent robot in autonomous walking, after the track of an oncoming pedestrian is predicted, the robot can determine its own next walking path according to the predicted track of that pedestrian; for instance, the robot can modify its own walking path to prevent a collision with the predicted pedestrian position. The method can likewise be applied to other intelligent mobile devices, which can correct or plan their next route according to the predicted pedestrian track so as to avoid collisions with pedestrians.
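A minimal sketch of such path correction, under a hypothetical avoidance rule (not specified by the disclosure): if a planned waypoint falls within a safety radius of the pedestrian's predicted position at the same step, the waypoint is offset sideways.

```python
import math

def adjust_path(waypoints, predicted_track, safe_radius, sidestep=(0.0, 1.0)):
    # waypoints: the robot's planned positions per step;
    # predicted_track: the pedestrian's predicted positions per step
    adjusted = []
    for wp, ped in zip(waypoints, predicted_track):
        if math.dist(wp, ped) < safe_radius:
            wp = (wp[0] + sidestep[0], wp[1] + sidestep[1])
        adjusted.append(wp)
    return adjusted
```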
Embodiments of the present disclosure provide a trajectory prediction apparatus that may perform the method of any of the embodiments of the present disclosure. The apparatus is briefly described below, and specific processing of its various modules may be combined with reference to the method embodiments. As shown in fig. 4, the apparatus may include: an information acquisition module 41, an information transformation module 42, a vector acquisition module 43, and an information prediction module 44.
The information acquisition module 41 is configured to acquire track information of the target object and the associated object in a first period respectively;
The information transformation module 42 is configured to perform information transformation of a time dimension on track information of the target object to obtain a time feature vector of the target object, and perform information transformation of a space dimension based on track information of the target object and related objects to obtain a space feature vector of the target object; at least one of the information transformations of the time dimension and the space dimension adopts an attention mechanism;
A vector obtaining module 43, configured to obtain an object feature vector of a target object according to the temporal feature vector and the spatial feature vector;
An information prediction module 44, configured to predict, based on the object feature vector, track information of the target object in a second period, where the second period is located after the first period in time sequence.
In some embodiments, the information transformation module 42, when configured to perform spatial dimension information transformation based on the track information of the target object and the associated object, includes: for each moment in the first period, according to the track characteristics of the target object at the moment and the track characteristics of at least one associated object of the target object at the moment, obtaining updated track characteristics of the target object at the moment through an attention mechanism; and the whole of the updated track characteristics corresponding to each moment of the target object in the first period is taken as a space characteristic vector of the target object. The track information of the first period includes: the track features respectively correspond to each moment in the first period.
In some embodiments, the information transformation module 42, when configured to obtain, by an attention mechanism, an updated trajectory characteristic of the target object at the time, comprises: obtaining a query vector corresponding to a target object at the moment according to the track characteristic of the target object at the moment, and obtaining a key vector corresponding to the associated object at the moment according to the track characteristic of the associated object at the moment; determining transfer information transferred to the target object by each associated object according to the obtained query vector of the target object and the key vector of the associated object; obtaining attention parameters corresponding to the target object according to the obtained transfer information of each associated object corresponding to the target object at the moment; and updating the track characteristics of the target object at the moment based on the attention parameter to obtain the updated track characteristics.
In some embodiments, the information transformation module 42 is further configured to obtain, for a first associated object of the target object at a first time in the first period, an updated track feature of the first associated object at the time through an attention mechanism according to a track feature of the first associated object and a track feature of at least one first object, where an association relationship exists between the first object and the first associated object; and obtaining the updated track characteristics of the target object after being updated again at the first moment through an attention mechanism according to the updated track characteristics of the target object at the first moment and the updated track characteristics of at least one first associated object at the first moment.
In some embodiments, the distance between the first object and the first associated object is within a preset distance range.
In some embodiments, the information transformation module 42, when used for the information transformation of the trajectory information of the target object in the time dimension, includes: and obtaining the updated time feature vector of the target object through an attention mechanism according to the track features of the target object at each corresponding moment in the track information of the target object. The track information of the first period includes: the track features respectively correspond to each moment in the first period.
In some embodiments, the information transformation module 42, when configured to obtain the temporal feature vector of the target object, includes: under the condition that the information transformation of the time dimension adopts a self-attention mechanism, carrying out the information transformation of the time dimension and the processing of a multi-head attention mechanism on the track information of the target object to obtain a time feature vector of the target object; and/or, the information transformation module, when used for obtaining the space feature vector of the target object, comprises: under the condition that the information transformation of the space dimension adopts a self-attention mechanism, the information transformation of the space dimension and the processing of a multi-head attention mechanism are carried out based on the track information of the target object and the related object, and the space feature vector of the target object is obtained.
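The combination of time-dimension self-attention with a multi-head mechanism mentioned above can be sketched as follows. This is an illustrative sketch under assumed shapes: the random per-head projections and head count are placeholders, not the patent's trained parameters.

```python
import numpy as np

def multi_head_temporal_attention(X, num_heads):
    """Self-attention over the T per-moment track features of one object
    (time dimension), split across num_heads heads and re-concatenated."""
    T, d = X.shape
    dh = d // num_heads
    rng = np.random.default_rng(1)
    heads = []
    for h in range(num_heads):
        Wq, Wk, Wv = (rng.normal(size=(d, dh)) * 0.1 for _ in range(3))
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(dh)              # (T, T) moment-to-moment scores
        A = np.exp(scores - scores.max(axis=-1, keepdims=True))
        A /= A.sum(axis=-1, keepdims=True)          # row-wise softmax
        heads.append(A @ V)                          # (T, dh) per-head output
    return np.concatenate(heads, axis=-1)            # (T, d) temporal feature vector

X = np.random.default_rng(2).normal(size=(6, 16))   # 6 moments, feature dim 16
H = multi_head_temporal_attention(X, num_heads=4)
```

The same multi-head pattern applies to the spatial-dimension case, with attention taken over associated objects at one moment instead of over moments of one object.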
In some embodiments, the vector obtaining module 43 is specifically configured to: splicing the time feature vector and the space feature vector; and obtaining the object feature vector according to the vector obtained after the splicing.
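The splice-then-transform step above can be sketched as follows; the linear projection `W` applied after concatenation is a hypothetical choice for illustration, as the embodiment does not fix how the spliced vector is reduced to the object feature vector.

```python
import numpy as np

def fuse_by_concat(temporal_vec, spatial_vec, W):
    """Splice the time feature vector and the space feature vector, then apply
    a (hypothetical) linear projection to obtain the object feature vector."""
    fused = np.concatenate([temporal_vec, spatial_vec], axis=-1)
    return fused @ W

rng = np.random.default_rng(3)
t_vec = rng.normal(size=16)                 # time feature vector
s_vec = rng.normal(size=16)                 # space feature vector
W = rng.normal(size=(32, 16)) * 0.1         # projection back to feature dim
obj_vec = fuse_by_concat(t_vec, s_vec, W)
```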
In some embodiments, the vector obtaining module 43, when configured to obtain an object feature vector of the target object according to the temporal feature vector and the spatial feature vector, includes: performing information transformation of space dimension on the time feature vector output by the information transformation of the time dimension, wherein the obtained space feature vector is used as the object feature vector; and/or performing information transformation of the time dimension on the spatial feature vector output by the information transformation of the space dimension, wherein the obtained time feature vector is used as the object feature vector.
In some embodiments, the information obtaining module 41, when configured to obtain the track information of the target object in the first period, includes: and acquiring the track characteristics of the target object at each moment in the first period, and acquiring track information of the target object in the first period according to the acquired track characteristics.
In some embodiments, the information obtaining module 41, when configured to obtain the track feature of the target object at each time in the first period, includes: encoding the track coordinates of the target object at a second moment in the first period to obtain track characteristics corresponding to the second moment; and/or reading track features corresponding to the target objects at the third moment in the first period from a memory.
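The two branches above, encoding a coordinate at one moment versus reading an already-computed feature from memory, can be sketched together as follows. The tanh embedding and the dictionary cache are illustrative assumptions; the patent does not specify the encoder or the memory layout.

```python
import numpy as np

_feature_cache = {}   # memory of track features already computed at earlier moments

def get_track_feature(obj_id, t, xy, W, b):
    """Return the track feature of object obj_id at moment t: encode the track
    coordinate on first use, or read the cached feature on later uses."""
    key = (obj_id, t)
    if key not in _feature_cache:
        # second-moment case: encode the coordinate into a track feature
        _feature_cache[key] = np.tanh(W @ np.asarray(xy) + b)
    # third-moment case: the feature is read from memory
    return _feature_cache[key]

rng = np.random.default_rng(4)
W, b = rng.normal(size=(8, 2)), rng.normal(size=8)
f1 = get_track_feature("ped_1", 0, (3.0, 4.0), W, b)   # encoded
f2 = get_track_feature("ped_1", 0, (3.0, 4.0), W, b)   # read from memory
```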
In some embodiments, the target object is a target pedestrian; the track information of the target object in the second period comprises at least one of the following: track coordinates of the target pedestrian at each moment in the second period, and movement trend of the target pedestrian in the second period; the association object is a pedestrian having an association relationship with the target pedestrian in the same scene.
In some embodiments, the above apparatus may be used to perform any of the corresponding methods described above, and for brevity, will not be described in detail herein.
As shown in fig. 5, the embodiments of the present disclosure also provide an electronic device that includes a memory 51, a processor 52, and an internal bus 53. The memory 51 is configured to store computer readable instructions, and the processor 52 is configured to invoke the computer readable instructions to implement the track prediction method according to any embodiment of the present disclosure. For example, by calling the instructions in the memory, the processor 52 may execute the functions of each module in the track prediction apparatus in the embodiments of the present disclosure: it may implement the function of the information acquisition module to acquire track information, the function of the information transformation module to perform information transformation of the track information in the time and space dimensions, and the functions of the vector acquisition module and the information prediction module to perform track prediction. Of course, the processor 52 may also implement other functions of the information acquisition module, the information transformation module, the vector acquisition module, and the information prediction module provided in the embodiments of the present disclosure, which will not be described in detail.
The disclosed embodiments also provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the trajectory prediction method of any of the embodiments of the present specification.
One skilled in the art will appreciate that one or more embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Moreover, one or more embodiments of the present disclosure may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The embodiments of the present disclosure also provide a computer-readable storage medium, on which a computer program may be stored, which, when executed by a processor, implements the steps of the trajectory prediction method described in any embodiment of the present disclosure.
Wherein "and/or" as described in embodiments of the present disclosure means at least one of the two; for example, "A and/or B" includes three schemes: A, B, and "A and B". The various embodiments in this disclosure are described in a progressive manner; identical and similar parts of the various embodiments may be referred to one another, and each embodiment is mainly described in terms of its differences from the other embodiments. In particular, for the data processing apparatus embodiments, the description is relatively simple, as they are substantially similar to the method embodiments; for relevant parts, refer to the description of the method embodiments.
The foregoing has described certain embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the acts or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
Embodiments of the subject matter and functional operations described in this disclosure may be implemented in the following: digital electronic circuitry, tangibly embodied computer software or firmware, computer hardware including the structures disclosed in this disclosure and structural equivalents thereof, or a combination of one or more of them. Embodiments of the subject matter described in this disclosure can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible, non-transitory program carrier for execution by, or to control the operation of, data processing apparatus. Alternatively or additionally, the program instructions may be encoded on a manually-generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode and transmit information to suitable receiver apparatus for execution by data processing apparatus. The computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
The processes and logic flows described in this disclosure can be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).
Computers suitable for executing computer programs include, for example, general purpose and/or special purpose microprocessors, or any other type of central processing unit. Typically, the central processing unit will receive instructions and data from a read only memory and/or a random access memory. The essential elements of a computer include a central processing unit for carrying out or executing instructions and one or more memory devices for storing instructions and data. Typically, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic disks, magneto-optical disks, or optical disks, etc. However, a computer does not have to have such a device. Furthermore, the computer may be embedded in another device, such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a Universal Serial Bus (USB) flash drive, to name a few.
Computer readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices including, for example, semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., an internal hard disk or removable disk), magneto-optical disks, and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
Although this disclosure contains many specific implementation details, these should not be construed as limiting the scope of any disclosure or the scope of what is claimed, but rather as primarily describing features of particular embodiments of the particular disclosure. Certain features that are described in this disclosure in the context of separate embodiments can also be implemented in combination in a single embodiment. On the other hand, the various features described in the individual embodiments may also be implemented separately in the various embodiments or in any suitable subcombination. Furthermore, although features may be acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, although operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Thus, particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. Furthermore, the processes depicted in the accompanying drawings are not necessarily required to be in the particular order shown, or sequential order, to achieve desirable results. In some implementations, multitasking and parallel processing may be advantageous.
The foregoing description of the preferred embodiment(s) of the present disclosure is merely intended to illustrate the embodiment(s) of the present disclosure, and any modifications, equivalents, improvements, etc. that fall within the spirit and principles of the embodiment(s) of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (13)

1. A method of trajectory prediction, the method comprising:
track information of the target object and the associated object in a first period is respectively obtained;
Performing time dimension information transformation on the track information of the target object to obtain a time feature vector of the target object, and performing space dimension information transformation on the track information of the target object and the track information of the related object to obtain a space feature vector of the target object; at least one of the information transformations of the time dimension and the space dimension adopts an attention mechanism;
Obtaining an object feature vector of the target object according to the time feature vector and the space feature vector;
predicting track information of the target object in a second period based on the object feature vector, wherein the second period is positioned after the first period in time sequence;
the track information of the first period includes: track features corresponding to each moment in the first period respectively;
The performing spatial dimension information transformation based on the track information of the target object and the associated object to obtain a spatial feature vector of the target object, including:
For each moment in the first period, according to the track characteristics of the target object at the moment and the track characteristics of at least one associated object of the target object at the moment, obtaining updated track characteristics of the target object at the moment through an attention mechanism;
the whole of the updated track characteristics of the target object corresponding to each moment in the first period is used as a space characteristic vector of the target object;
the obtaining the updated track characteristic of the target object at the moment through the attention mechanism comprises the following steps:
Obtaining a query vector corresponding to a target object at the moment according to the track characteristic of the target object at the moment, and obtaining a key vector corresponding to the associated object at the moment according to the track characteristic of the associated object at the moment;
Determining transfer information transferred to the target object by each associated object according to the obtained query vector of the target object and the key vector of the associated object;
Obtaining attention parameters corresponding to the target object according to the obtained transfer information of each associated object corresponding to the target object at the moment;
And updating the track characteristics of the target object at the moment based on the attention parameter to obtain the updated track characteristics.
2. The method according to claim 1, wherein the method further comprises:
For a first associated object of the target object at a first moment in the first period, according to the track characteristics of the first associated object and the track characteristics of at least one first object, obtaining updated track characteristics of the first associated object at the moment through an attention mechanism, wherein an association relationship exists between the first object and the first associated object;
And obtaining the updated track characteristics of the target object after being updated again at the first moment through an attention mechanism according to the updated track characteristics of the target object at the first moment and the updated track characteristics of at least one first associated object at the first moment.
3. The method of claim 2, wherein a distance between the first object and the first associated object is within a preset distance range.
4. A method according to any one of claims 1 to 3, wherein the trajectory information for the first period of time comprises: track features corresponding to each moment in the first period respectively; the performing information transformation of the time dimension on the track information of the target object includes:
And obtaining updated time feature vectors of the target object through an attention mechanism according to the track features corresponding to each moment in the track information of the target object.
5. The method according to claim 1, wherein, in the case where the information transformation in the time dimension adopts a self-attention mechanism, the performing the information transformation in the time dimension on the track information of the target object, to obtain a time feature vector of the target object, includes:
Performing time dimension information transformation and multi-head attention mechanism processing on the track information of the target object to obtain a time feature vector of the target object; and/or, in the case that the spatial dimension information transformation adopts a self-attention mechanism, performing the spatial dimension information transformation based on the track information of the target object and the related object to obtain a spatial feature vector of the target object, where the method includes:
And carrying out information transformation of space dimension and processing of a multi-head attention mechanism based on the track information of the target object and the related object to obtain the space feature vector of the target object.
6. The method according to claim 1, wherein the obtaining the object feature vector of the target object from the temporal feature vector and the spatial feature vector comprises:
splicing the time feature vector and the space feature vector;
and obtaining the object feature vector according to the vector obtained after the splicing.
7. The method according to claim 1, wherein the obtaining the object feature vector of the target object from the temporal feature vector and the spatial feature vector comprises:
Performing information transformation of space dimension on the time feature vector output by the information transformation of the time dimension, wherein the obtained space feature vector is used as the object feature vector;
And/or performing information transformation of the time dimension on the spatial feature vector output by the information transformation of the space dimension, wherein the obtained time feature vector is used as the object feature vector.
8. The method of claim 1, wherein the obtaining track information of the target object during the first period of time comprises:
And acquiring the track characteristics of the target object at each moment in the first period, and acquiring track information of the target object in the first period according to the acquired track characteristics.
9. The method of claim 8, wherein the first period of time comprises at least one of a second time and a third time;
the obtaining the track characteristic of the target object at each moment in the first period of time includes:
Encoding the track coordinates of the target object at a second moment in the first period to obtain track characteristics corresponding to the second moment; and/or reading track features corresponding to the target objects at the third moment in the first period from a memory.
10. The method of claim 1, wherein,
The target object is a target pedestrian;
The track information of the target object in the second period comprises at least one of the following: track coordinates of the target pedestrian at each moment in the second period, and movement trend of the target pedestrian in the second period;
The association object is a pedestrian having an association relationship with the target pedestrian in the same scene.
11. A trajectory prediction device, the device comprising:
the information acquisition module is used for respectively acquiring track information of the target object and the associated object in a first period;
the information transformation module is used for carrying out information transformation of time dimension on the track information of the target object to obtain a time feature vector of the target object, and carrying out information transformation of space dimension on the track information of the target object and the track information of the related object to obtain a space feature vector of the target object; at least one of the information transformations of the time dimension and the space dimension adopts an attention mechanism;
The vector obtaining module is used for obtaining an object feature vector of the target object according to the time feature vector and the space feature vector;
The information prediction module is used for predicting and obtaining track information of the target object in a second period based on the object feature vector, wherein the second period is positioned after the first period in time sequence;
The information transformation module, when performing information transformation of spatial dimensions based on the track information of the target object and the related object, obtains a spatial feature vector of the target object, includes: for each moment in the first period, according to the track characteristics of the target object at the moment and the track characteristics of at least one associated object of the target object at the moment, obtaining updated track characteristics of the target object at the moment through an attention mechanism; the whole of the updated track characteristics of the target object corresponding to each moment in the first period is used as a space characteristic vector of the target object; wherein the track information of the first period includes: track features corresponding to each moment in the first time period respectively;
The information transformation module, when used for obtaining the updated track characteristics of the target object at the moment through the attention mechanism, comprises the following steps: obtaining a query vector corresponding to a target object at the moment according to the track characteristic of the target object at the moment, and obtaining a key vector corresponding to the associated object at the moment according to the track characteristic of the associated object at the moment; determining transfer information transferred to the target object by each associated object according to the obtained query vector of the target object and the key vector of the associated object; obtaining attention parameters corresponding to the target object according to the obtained transfer information of each associated object corresponding to the target object at the moment; and updating the track characteristics of the target object at the moment based on the attention parameter to obtain the updated track characteristics.
12. An electronic device, comprising: a memory for storing computer readable instructions, a processor for invoking the computer instructions to implement the method of any of claims 1 to 10.
13. A computer readable storage medium having stored thereon a computer program, characterized in that the program, when executed by a processor, implements the method of any of claims 1 to 10.
CN202010685801.1A 2020-07-16 2020-07-16 Track prediction method, track prediction device, electronic equipment and medium Active CN111798492B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010685801.1A CN111798492B (en) 2020-07-16 2020-07-16 Track prediction method, track prediction device, electronic equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010685801.1A CN111798492B (en) 2020-07-16 2020-07-16 Track prediction method, track prediction device, electronic equipment and medium

Publications (2)

Publication Number Publication Date
CN111798492A CN111798492A (en) 2020-10-20
CN111798492B true CN111798492B (en) 2024-04-19

Family

ID=72807412

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010685801.1A Active CN111798492B (en) 2020-07-16 2020-07-16 Track prediction method, track prediction device, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN111798492B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112677993A (en) * 2021-01-05 2021-04-20 北京三快在线科技有限公司 Model training method and device
CN113316788A (en) * 2021-04-20 2021-08-27 深圳市锐明技术股份有限公司 Method and device for predicting pedestrian motion trail, electronic equipment and storage medium
CN113291321A (en) * 2021-06-16 2021-08-24 苏州智加科技有限公司 Vehicle track prediction method, device, equipment and storage medium
CN113888601B (en) * 2021-10-26 2022-05-24 北京易航远智科技有限公司 Target trajectory prediction method, electronic device, and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018187252A1 (en) * 2017-04-06 2018-10-11 Hrl Laboratories, Llc Explicit prediction of adversary movements with canonical correlation analysis
CN109409499A (en) * 2018-09-20 2019-03-01 北京航空航天大学 One kind being based on deep learning and the modified track restoration methods of Kalman filtering
US10345106B1 (en) * 2015-10-29 2019-07-09 National Technology & Engineering Solutions Of Sandia, Llc Trajectory analysis with geometric features
CN110895879A (en) * 2019-11-26 2020-03-20 浙江大华技术股份有限公司 Method and device for detecting co-running vehicle, storage medium and electronic device
CN111401233A (en) * 2020-03-13 2020-07-10 商汤集团有限公司 Trajectory prediction method, apparatus, electronic device, and medium
CN111400620A (en) * 2020-03-27 2020-07-10 东北大学 User trajectory position prediction method based on space-time embedded Self-orientation

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170060810A1 (en) * 2012-12-13 2017-03-02 Eagle Harbor Holdings, LLC. System and method for the operation of an automotive vehicle system with modeled sensors
US11082438B2 (en) * 2018-09-05 2021-08-03 Oracle International Corporation Malicious activity detection by cross-trace analysis and deep learning

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10345106B1 (en) * 2015-10-29 2019-07-09 National Technology & Engineering Solutions Of Sandia, Llc Trajectory analysis with geometric features
WO2018187252A1 (en) * 2017-04-06 2018-10-11 Hrl Laboratories, Llc Explicit prediction of adversary movements with canonical correlation analysis
CN109409499A (en) * 2018-09-20 2019-03-01 北京航空航天大学 One kind being based on deep learning and the modified track restoration methods of Kalman filtering
CN110895879A (en) * 2019-11-26 2020-03-20 浙江大华技术股份有限公司 Method and device for detecting co-running vehicle, storage medium and electronic device
CN111401233A (en) * 2020-03-13 2020-07-10 商汤集团有限公司 Trajectory prediction method, apparatus, electronic device, and medium
CN111400620A (en) * 2020-03-27 2020-07-10 东北大学 User trajectory position prediction method based on space-time embedded Self-orientation

Also Published As

Publication number Publication date
CN111798492A (en) 2020-10-20

Similar Documents

Publication Publication Date Title
CN111798492B (en) Track prediction method, track prediction device, electronic equipment and medium
WO2021180130A1 (en) Trajectory prediction
CN112714896B (en) Self-aware vision-text common ground navigation agent
Zhou et al. Deep alignment network based multi-person tracking with occlusion and motion reasoning
US11176402B2 (en) Method and device for identifying object
Sheng et al. Siamese denoising autoencoders for joints trajectories reconstruction and robust gait recognition
CN101142586A (en) Method of performing face recognition
CN113538506A (en) Pedestrian trajectory prediction method based on global dynamic scene information depth modeling
US11590651B2 (en) Method and device for training manipulation skills of a robot system
CN112257659B (en) Detection tracking method, device and medium
CN112597984B (en) Image data processing method, image data processing device, computer equipment and storage medium
Zhao et al. When video classification meets incremental classes
CN111508000B (en) Deep reinforcement learning target tracking method based on parameter space noise network
KR20220065672A (en) Deep smartphone sensors fusion for indoor positioning and tracking
Wang et al. Plug-and-play: Improve depth prediction via sparse data propagation
Cetintas et al. Unifying short and long-term tracking with graph hierarchies
CN103218663A (en) Information processing apparatus, information processing method, and program
Hu et al. Stdformer: Spatial-temporal motion transformer for multiple object tracking
CN111626098B (en) Method, device, equipment and medium for updating parameter values of model
CN114399901B (en) Method and equipment for controlling traffic system
CN116030077A (en) Video salient region detection method based on multi-dataset collaborative learning
KR102323671B1 (en) Method and apparatus for detecting abnormal objects in video
KR20230057765A (en) Multi-object tracking apparatus and method based on self-supervised learning
Kao et al. A Posture Features Based Pedestrian Trajectory Prediction with LSTM
CN112862007A (en) Commodity sequence recommendation method and system based on user interest editing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant