CN114581487B - Pedestrian trajectory prediction method, device, electronic equipment and computer program product
- Publication number: CN114581487B (application CN202210255592.6A)
- Authority: CN (China)
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T 7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments (G06T 7/00 Image analysis; G06T 7/20 Analysis of motion)
- G06N 3/084: Backpropagation, e.g. using gradient descent (G06N 3/02 Neural networks; G06N 3/08 Learning methods)
- G06T 2207/20081: Training; Learning (G06T 2207/20 Special algorithmic details)
- G06T 2207/30241: Trajectory (G06T 2207/30 Subject of image)
Abstract
The present disclosure provides a pedestrian trajectory prediction method, including: acquiring observation track information of at least one pedestrian in a scene, and converting the observation track information of each pedestrian into self-view track information under that pedestrian's own viewing angle; acquiring the motion trend features of each pedestrian based on the self-view track information, and acquiring the interaction features of each pedestrian with the other pedestrians; generating future position information of each pedestrian under the self-view angle based at least on the motion trend features and the interaction features; and generating at least one future track of each pedestrian under the self-view angle based at least on that future position information, and converting the future track from the self-view frame into the world coordinate system. The disclosure also provides a pedestrian trajectory prediction apparatus, an electronic device, a readable storage medium, and a computer program product.
Description
Technical Field
The present disclosure relates to the fields of computer vision and autonomous driving, and in particular to a pedestrian trajectory prediction method, apparatus, electronic device, storage medium, and computer program product based on multi-view transformation.
Background
The trajectory prediction task is currently applied mainly in autonomous driving: predicting the trajectories of the other traffic participants in a driving scene is important for achieving higher levels of unmanned driving. Improvements in the performance of autonomous-driving perception systems, together with advances in deep learning on time-series models, have laid the foundation for trajectory prediction research. The perception system obtains the historical position information of each target through various sensors and feeds it to a prediction model, which predicts the future trajectories of the other targets in the traffic scene. More accurate predictions can serve the control and decision-making systems of an autonomous vehicle, better ensuring the safety of vehicles and pedestrians and improving the efficiency of road traffic.
At present, pedestrian trajectory prediction methods mainly extract sequence features with deep learning models such as recurrent neural networks (RNN), long short-term memory networks (LSTM) and gated recurrent units (GRU), embedding these components into encoder-decoder network structures to predict future trajectories. Many difficulties in pedestrian trajectory prediction remain unsolved, and the learning ability and prediction accuracy of existing model methods still need further improvement.
The following typical trajectory prediction methods exist in the prior art.
The first scheme is as follows: "Social LSTM: Human Trajectory Prediction in Crowded Spaces", a classic LSTM-based, data-driven trajectory prediction method. The historical trajectory sequence of each pedestrian is the input of an LSTM, and at each LSTM iteration the output passes through a designed Social pooling layer that characterizes and integrates the interactions with the surrounding pedestrians; the output feature vector serves as the hidden-state input at the next time step. The scheme is very classic, but the LSTM model has many parameters, and a pooling layer of poor processing efficiency is added at every iteration, so with simple trajectory information as the only input, the pooling layer struggles to learn useful interaction features, and training and parameter updating are costly. The trajectory representation is also single: only the raw trajectory sequence is fed in.
Scheme II: the paper "Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks", published at the CVPR conference in 2018, realizes trajectory prediction with a generative adversarial network. It adopts the common encoder-decoder architecture of sequence models, uses LSTM as the time-series feature extraction component embedded in the encoding parts of the generator and the discriminator, feeds the trajectory sequences of each scene into the encoder, and uses the extracted sequence features as one part of the subsequent feature stack. Meanwhile, the interaction information among different pedestrian trajectories is characterized through a designed social pooling layer and used as the other part of the feature stack. Both parts are sent to the decoder for trajectory prediction. The scheme has a clear structure, but the trajectory feature extraction is single, the prediction accuracy drops after interactive pooling is added, and the designed characterization shows that the captured pedestrian interaction features are not prominent enough. Moreover, because noise participates in generating multiple trajectories, the decoder cannot ensure the stability of the generated tracks.
The third scheme is as follows: the paper "SR-LSTM: State Refinement for LSTM towards Pedestrian Trajectory Prediction", published at the CVPR conference in 2019, adopts an LSTM-based state refinement module. LSTM modules perform feature extraction on the trajectory sequence and generate the future trajectory; after sequence feature extraction, a state refinement model extracts the mutual interaction influence among pedestrians by estimating the intentions of the neighboring pedestrians around the predicted target, jointly and iteratively updating the neighbors' states through a designed message-passing mechanism and a social-aware selection mechanism. The message passing and state refinement adopted by the scheme are very novel, but the state refinement module is relatively complex in design, so updating the many model parameters is inefficient, and the generalization of the interaction modeling may be poor.
And the fourth scheme is as follows: patent document CN112766561A proposes a trajectory prediction method based on an attention mechanism and a generative adversarial network. LSTM is the main sequence-feature-extraction component, with an attention pooling module added in the encoder and decoder. To characterize the motion influence among pedestrians, it also considers the velocity vector, distance vector and motion-vector included angle of pedestrian motion, combines these into a feature vector, and sends it to the attention module for weight assignment. The scheme uses an attention mechanism to obtain pedestrian trajectory interaction features, but stacking several manually designed vector features may introduce information redundancy among them and reduce the generalization ability of the model. Moreover, using a parameter-heavy LSTM as the decoder may incur high computational cost when the feature dimension is too large.
Schemes one and three perform interaction characterization among different pedestrians with a designed pooling layer and state refinement module respectively. These modules are ingenious, but the feature vectors must participate in updating at every iteration, so the time and space complexity of the models is high; meanwhile, the trajectory is represented only by a simple coordinate sequence as a single input, making it hard to improve the generalization and representation capability. Scheme two builds on scheme one by generating trajectories with a generative adversarial network, improving the diversity of trajectory generation, simplifying the social pooling layer and removing the per-iteration pooling, which improves both the speed and the accuracy of the model; but its interaction representation is simple, and when there are too many pedestrians in the scene, the interactive influence of some of them on the target is inevitably ignored. Scheme four builds on scheme two by characterizing pedestrian interaction with manually designed motion vectors and vector included angles of different pedestrians, further refining the interaction features; but because of the large information redundancy among the multiple features, the generalization and representation capability of the model is hard to guarantee. Furthermore, schemes two and four cannot ensure stability when generating multiple trajectories.
In summary, current pedestrian trajectory prediction methods mainly have the following problems:
(1) Generalization needs improvement. Autonomous driving requires more accurate prediction, better scene understanding and faster processing, yet the mainstream methods understand the scene in a single way: they treat all trajectory coordinate sequences under one unified standard in the world coordinate system and view the sequence information from one consistent perspective. Improving prediction capability also requires mining, to a greater extent, the implicit features carried by the historical trajectory information.
(2) The encoder and decoder should be designed with as few parameters as possible while maintaining accuracy, thereby reducing the training overhead and inference time of the model.
(3) When the model generates multiple trajectories, the participation of noise means the stability of trajectory generation cannot be ensured.
Disclosure of Invention
To solve at least one of the above technical problems, the present disclosure provides a pedestrian trajectory prediction method, apparatus, electronic device, storage medium, and computer program product based on multi-view transformation.
The aim is a general pedestrian trajectory prediction method with strong generalization capability that can complete multi-trajectory prediction in complex pedestrian interaction scenes. First, for problem (1), scene understanding and generalization, the disclosure provides a multi-view coordinate-system transformation: a self-view coordinate system is established for each pedestrian in the scene according to its own motion characteristics, and each pedestrian observes the trajectories of the others under its unique self-view coordinate system, so the features implied by the trajectory information can be mined to a greater extent. For problem (2), excessive model complexity and parameters, a multi-head attention mechanism extracts the trajectory-sequence features, mining representations of the sequence in different latent spaces through multiple attention heads; network parameters are greatly reduced while performance is maintained, lowering the cost of network training. Finally, for problem (3), the instability of adversarial generation in multi-trajectory prediction: noise must be added when generating multiple trajectories, but randomness can produce partially unacceptable trajectories that do not reflect the pedestrian's main movement intention. For this, the disclosure constrains the error distribution of the generated trajectories and accumulates the loss only over a selected subset of trajectory errors, ensuring the diversity of trajectory generation while improving the stability of the generator's predictions.
The pedestrian trajectory prediction method and device based on multi-view transformation, the electronic device and the storage medium are realized by the following technical scheme.
According to an aspect of the present disclosure, there is provided a pedestrian trajectory prediction method, including:
acquiring observation track information of at least one pedestrian in a scene, and converting the observation track information of each pedestrian into self-view track information of each pedestrian under a self-view angle;
acquiring the motion trend characteristics of each pedestrian based on the self-view track information of each pedestrian, and acquiring the interaction characteristics of each pedestrian and other pedestrians;
generating future position information of each pedestrian under the self-view angle at least based on the motion trend characteristics of each pedestrian and the interaction characteristics of each pedestrian and other pedestrians; and
generating at least one future track of each pedestrian under the self view angle at least based on the future position information of each pedestrian under the self view angle, and converting the future track under the self view angle into the future track under the world coordinate system.
According to the pedestrian trajectory prediction method of at least one embodiment of the present disclosure, acquiring observation trajectory information of at least one pedestrian of a scene includes:
acquiring position information of each pedestrian in a world coordinate system from the picture and/or the video of the scene, and generating the observation track information; and

extracting the observation track information of the pedestrians whose length is greater than or equal to a preset length in the scene, together with the corresponding identification numbers of the pedestrians.
According to the pedestrian trajectory prediction method of at least one embodiment of the present disclosure, converting the observation trajectory information of each pedestrian into self-perspective trajectory information of each pedestrian at a self-perspective includes:
generating a coordinate system transformation matrix for the corresponding pedestrian based on the observation track information of the pedestrians whose length is greater than or equal to the preset length in the scene; and

converting, based on the coordinate transformation matrix, the observation track information of the pedestrian into self-view observation track information under a self-view angle.
According to the pedestrian trajectory prediction method of at least one embodiment of the present disclosure, acquiring a motion trend characteristic of each pedestrian based on the self-perspective trajectory information of each pedestrian, and acquiring an interaction characteristic of each pedestrian and other pedestrians, includes:
embedding time information into the self-view observation track information of each pedestrian under the self-view angle to obtain a feature vector $\xi_{obs}$;

respectively acquiring the last observed position information of every other pedestrian under the self-view coordinate system of each pedestrian, so as to obtain the interaction features with every other pedestrian; and

extracting, based on a fully connected layer and a multi-head attention mechanism, the time-domain implicit feature $e_{att}$ of the feature vector $\xi_{obs}$, i.e. the motion trend feature.
According to the pedestrian trajectory prediction method of at least one embodiment of the present disclosure, the method of obtaining the motion trend characteristic of each pedestrian based on the self-view trajectory information of each pedestrian and obtaining the interaction characteristic of each pedestrian and other pedestrians further includes:
and carrying out fusion processing on the motion trend characteristics of each pedestrian, the interaction characteristics of each pedestrian and other pedestrians and Gaussian noise to generate a fusion characteristic vector.
According to the pedestrian trajectory prediction method of at least one embodiment of the present disclosure, embedding time information into the self-view observation track information of each pedestrian to obtain the feature vector $\xi_{obs}$ includes:

performing dimension-raising processing on the self-view observation track information of each pedestrian, so that its position representation is raised from two dimensions to a preset dimension D, obtaining for each pedestrian a feature vector $e_{obs}$ representing the pedestrian's spatial position, and embedding a position code to generate the feature vector $\xi_{obs}$, wherein the feature vector $e_{obs}$ is obtained through a fully connected layer by the following formula:

$$e_{obs} = \phi(X_{view}; W_g)$$

where $\phi(\cdot)$ denotes a fully connected layer, $W_g$ is the weight of the fully connected layer, and $X_{view}$ is each pedestrian's own observation trajectory sequence under its self-view angle;

wherein embedding the position code to generate the feature vector $\xi_{obs}$ includes:

adding the feature vector $e_{obs}$ and the feature vector $p_{obs}$ to obtain the feature vector $\xi_{obs}$;

wherein the feature vector $p_{obs}$, which provides a distinct code for each frame and each dimension, is obtained from a position-encoding matrix PE, where PE is a two-dimensional matrix, $PE(\cdot)$ denotes indexing a parameter in the matrix, the matrix has the same size as the feature vector $e_{obs}$, $t$ denotes the time step of the sequence, and $d$ denotes each of the D dimensions.
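A minimal sketch of this position-code embedding follows. The patent only states that PE is a two-dimensional matrix indexed by time step t and dimension d; the concrete sinusoidal form below (the standard transformer encoding) is an assumption:

```python
import torch

def positional_encoding(obs_len, D):
    """Position code p_obs with the same shape as e_obs: (obs_len, D)."""
    t = torch.arange(obs_len, dtype=torch.float32).unsqueeze(1)        # (obs_len, 1)
    div = 10000.0 ** (torch.arange(0, D, 2, dtype=torch.float32) / D)  # per-pair frequency
    pe = torch.zeros(obs_len, D)
    pe[:, 0::2] = torch.sin(t / div)   # even dimensions
    pe[:, 1::2] = torch.cos(t / div)   # odd dimensions
    return pe

# xi_obs = e_obs + positional_encoding(e_obs.shape[-2], e_obs.shape[-1])
```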
According to the pedestrian trajectory prediction method of at least one embodiment of the present disclosure, respectively acquiring the last observed position information of every other pedestrian under the self-view coordinate system of each pedestrian, to obtain the interaction features with every other pedestrian, includes:

performing dimension-raising processing on the last observed position information of every other pedestrian under the self-view coordinate system of each pedestrian, so that its position representation is raised from two dimensions to the preset dimension D, obtaining the interaction feature vector $e_{act}$ of each pedestrian with every other pedestrian, the interaction feature vector $e_{act}$ being obtained by the following formula:

$$e_{act} = \phi(X_{view}; W_g)$$

where $X_{view}$ is the observation trajectory sequence of the other pedestrians under each pedestrian's self-view angle.
According to the pedestrian trajectory prediction method of at least one embodiment of the present disclosure, extracting the time-domain implicit feature $e_{att}$ of the feature vector $\xi_{obs}$ based on fully connected layers and the multi-head attention mechanism includes:

sending the feature vector $\xi_{obs}$ into a plurality of fully connected layers for dimension transformation to generate the multi-head attention inputs Query, Key and Value, where the Query, Key and Value are each a linear mapping of the feature vector $\xi_{obs}$, used for mining the law of pedestrian position variation with time.

According to the pedestrian trajectory prediction method of at least one embodiment of the present disclosure, four fully connected layers are used to perform the dimension transformation on the feature vector $\xi_{obs}$:

$$\mathrm{Query}^{l} = \phi(\xi_{obs}; W_a^{l}), \quad \mathrm{Key}^{l} = \phi(\xi_{obs}; W_a^{l}), \quad \mathrm{Value}^{l} = \phi(\xi_{obs}; W_a^{l})$$

where $l$ denotes the label of the fully connected layer of each of the four self-attention heads, $W_a^{l}$ denotes the weights of the four fully connected layers, and the subscript $a$ identifies the fully connected layer.
According to the pedestrian trajectory prediction method of at least one embodiment of the disclosure, extracting the time-domain implicit feature $e_{att}$ of the trajectory sequence $\xi_{obs}$ with the multi-head attention mechanism includes extraction using the following formulas:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{T}}{\sqrt{d_k}}\right) V$$

$$\mathrm{head}_{l} = \mathrm{Attention}(\mathrm{Query}^{l}, \mathrm{Key}^{l}, \mathrm{Value}^{l}), \quad l = 1, 2, 3, 4$$

$$\mathrm{MultiHead} = \phi(\mathrm{Concat}(\mathrm{head}_1, \mathrm{head}_2, \dots, \mathrm{head}_l); W_b)$$

where softmax is a normalization function, $d_k$ is the Key dimension, the superscript $T$ denotes matrix transposition, the Concat operation connects multiple vectors, $W_b$ denotes the weight parameters in the fully connected network $\phi(\cdot)$, and the feature representation extracted by MultiHead is the time-domain implicit feature $e_{att}$.
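A compact PyTorch sketch of the four-head self-attention these formulas describe; the module layout and sizes are illustrative, not the patent's exact implementation:

```python
import torch
import torch.nn as nn

class TrendEncoder(nn.Module):
    """Extracts the time-domain implicit feature e_att from xi_obs."""
    def __init__(self, D=64, heads=4):
        super().__init__()
        self.heads, self.d_k = heads, D // heads
        self.q = nn.Linear(D, D)    # the W_a projections for Query/Key/Value
        self.k = nn.Linear(D, D)
        self.v = nn.Linear(D, D)
        self.out = nn.Linear(D, D)  # W_b, applied after Concat of the heads

    def forward(self, xi_obs):      # xi_obs: (batch, obs_len, D)
        b, t, D = xi_obs.shape
        split = lambda x: x.view(b, t, self.heads, self.d_k).transpose(1, 2)
        q, k, v = split(self.q(xi_obs)), split(self.k(xi_obs)), split(self.v(xi_obs))
        att = torch.softmax(q @ k.transpose(-2, -1) / self.d_k ** 0.5, dim=-1)
        heads = (att @ v).transpose(1, 2).reshape(b, t, D)  # Concat(head_1..head_4)
        return self.out(heads)      # e_att
```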
According to the pedestrian trajectory prediction method of at least one embodiment of the present disclosure, fusing the motion trend feature of each pedestrian, the interaction features with other pedestrians, and Gaussian noise to generate a fused feature vector includes:

performing a Concat operation on the time-domain implicit feature $e_{att}$, the interaction feature vector $e_{act}$ and the Gaussian noise vector Z to obtain the final encoder output $e_{encoder}$, i.e. the fused feature vector.
According to the pedestrian trajectory prediction method of at least one embodiment of the present disclosure, generating the future position information of each pedestrian under the self-view angle based at least on the motion trend feature of each pedestrian and the interaction features of each pedestrian with other pedestrians includes:

sending the final output $e_{encoder}$ into a decoder, and outputting, in an iterative manner based on $e_{encoder}$ and each pedestrian's existing predicted trajectory feature $e_{decoder}$, the future trajectory sequence positions, i.e. the future position information under each pedestrian's self-view angle.
According to the pedestrian trajectory prediction method of at least one embodiment of the present disclosure, sending the final output $e_{encoder}$ into the decoder and iteratively outputting the future trajectory sequence positions based on $e_{encoder}$ and each pedestrian's existing predicted trajectory feature $e_{decoder}$ includes, in each iteration:

performing an encoding operation on each pedestrian's existing predicted trajectory sequence to obtain the encoded feature $\xi_{pred}$;

obtaining the existing predicted trajectory feature $e_{decoder}$ based on the encoded feature $\xi_{pred}$;

performing feature extraction with a self-attention mechanism based on $e_{encoder}$ and the current $e_{decoder}$ to obtain the current feature vector $e_{deatt}$:

$$e_{deatt} = \mathrm{softmax}\!\left(\frac{e_{decoder}\, e_{encoder}^{T}}{\sqrt{d_{encoder}}}\right) e_{encoder}$$

where $d_{encoder}$ is the dimension of $e_{encoder}$ and the superscript $T$ denotes matrix transposition; and

obtaining, based on the current $e_{deatt}$, the new position point coordinates $\hat{Y}^{p_t}$ by the following formula:

$$\hat{Y}^{p_t} = \phi(e_{deatt}; W_y)$$

where $\hat{Y}^{p_t}$ denotes the predicted coordinates at the iterated frame time $p_t$, $W_y$ is the fully-connected-layer parameter, and the subscript $y$ identifies the fully connected layer.

According to the pedestrian trajectory prediction method of at least one embodiment of the present disclosure, the new position point coordinates are stored into the existing predicted trajectory sequence, so that $e_{decoder}$ is recalculated at the next iteration and a new position prediction is made.
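A sketch of this decoding loop in PyTorch. The embed module stands in for the $\xi_{pred}$/$e_{decoder}$ encoding of the track predicted so far, out_layer stands in for the $W_y$ fully connected layer, and the zero seed point is an assumption:

```python
import torch

def decode_future(e_encoder, embed, out_layer, pred_len):
    """Iteratively emit future self-view positions.

    e_encoder: (batch, obs_len, d_enc) fused encoder output.
    embed:     maps the predicted track so far to e_decoder features.
    out_layer: e.g. nn.Linear(d_enc, 2), producing a coordinate from e_deatt.
    """
    d_enc = e_encoder.shape[-1]
    pred = torch.zeros(e_encoder.shape[0], 1, 2)  # seed point (assumed)
    for _ in range(pred_len):
        e_decoder = embed(pred)                   # encode existing prediction
        att = torch.softmax(
            e_decoder @ e_encoder.transpose(-2, -1) / d_enc ** 0.5, dim=-1)
        e_deatt = att @ e_encoder                 # attention over encoder output
        new_pt = out_layer(e_deatt[:, -1:])       # next coordinate (batch, 1, 2)
        pred = torch.cat([pred, new_pt], dim=1)   # store; re-encoded next step
    return pred[:, 1:]                            # drop the seed point
```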
According to the pedestrian trajectory prediction method of at least one embodiment of the present disclosure, an error loss function is used for training in a training process.
According to the pedestrian trajectory prediction method of at least one embodiment of the present disclosure, generating at least one future trajectory of each pedestrian under the self-view angle based at least on the future position information of each pedestrian under the self-view angle includes:

during training, generating k trajectories for each pedestrian and, when calculating the loss, accumulating only n of the k trajectory errors as the error and updating the parameters of the encoder-decoder model.
According to the pedestrian trajectory prediction method of at least one embodiment of the present disclosure, in the training process, the error distribution of the generated trajectories is constrained: a threshold interval is set according to the mean $\mu$ and the standard deviation $\sigma$ of the error distribution of the k generated trajectories, so as to penalize trajectories whose randomness is too large.
According to the pedestrian trajectory prediction method of at least one embodiment of the present disclosure, the n trajectories are selected by:

keeping the n trajectories whose error values fall within the threshold interval of the corresponding distribution and discarding the trajectories whose error values exceed the threshold interval; the errors of the discarded trajectories are not accumulated, and the errors of the n trajectories within the threshold interval are accumulated and averaged as the error of the encoder-decoder model.
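A sketch of this loss selection, assuming the threshold interval is [mu - sigma, mu + sigma]; the patent fixes the interval from the mean and standard deviation of the k trajectory errors but does not spell out its exact width here:

```python
import torch

def constrained_variety_loss(y_true, y_preds):
    """y_true: (pred_len, 2) ground truth; y_preds: (k, pred_len, 2) samples.

    Discard trajectories whose error leaves the threshold interval and
    average the n remaining errors as the encoder-decoder model error."""
    per_track = ((y_preds - y_true) ** 2).sum(-1).sqrt().mean(-1)  # (k,) errors
    mu, sigma = per_track.mean(), per_track.std()
    keep = (per_track >= mu - sigma) & (per_track <= mu + sigma)   # assumed interval
    return per_track[keep].mean()                                  # mean of n kept errors
```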
According to another aspect of the present disclosure, there is provided a pedestrian trajectory prediction device including:
the first data processing module acquires observation track information of at least one pedestrian in a scene, and converts the observation track information of each pedestrian into self-view track information of each pedestrian under a self-view angle;
the encoder acquires the motion trend characteristics of each pedestrian based on the self-view track information of each pedestrian and acquires the interaction characteristics of each pedestrian and other pedestrians;
a decoder which generates future position information of each pedestrian under a self-view angle based on at least the motion trend characteristics of each pedestrian and the interaction characteristics of each pedestrian and other pedestrians;
a generator that generates at least one future trajectory for each pedestrian at its own perspective based at least on the future position information of each pedestrian at its own perspective; and
a second data processing module that converts the future trajectory under the self-perspective to a future trajectory under a world coordinate system.
According to still another aspect of the present disclosure, there is provided a pedestrian trajectory prediction device including:
an image acquisition device that acquires images and/or videos of a scene;
the first data processing module is used for acquiring observation track information of at least one pedestrian in the image and/or the video of the scene and converting the observation track information of each pedestrian into self-view track information of each pedestrian under a self-view angle;
the encoder acquires the motion trend characteristics of each pedestrian based on the self-view track information of each pedestrian and acquires the interaction characteristics of each pedestrian and other pedestrians;
a decoder which generates future position information of each pedestrian under a self-view angle based on at least the motion trend characteristics of each pedestrian and the interaction characteristics of each pedestrian and other pedestrians;
a generator that generates at least one future trajectory for each pedestrian at its own perspective based at least on the future position information of each pedestrian at its own perspective; and
a second data processing module that converts the future trajectory under the self-perspective to a future trajectory under a world coordinate system.
According to still another aspect of the present disclosure, there is provided an electronic device including:
a memory storing execution instructions; and a processor executing execution instructions stored by the memory to cause the processor to perform any of the methods described above.
According to yet another aspect of the present disclosure, there is provided a readable storage medium having stored therein execution instructions, which when executed by a processor, are configured to implement the method of any one of the above.
According to yet another aspect of the present disclosure, there is provided a computer program product comprising computer programs/instructions which, when executed by a processor, implement the pedestrian trajectory prediction method of any one of the above.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate exemplary embodiments of the disclosure and together with the description serve to explain the principles of the disclosure.
Fig. 1 is a flowchart illustrating a pedestrian trajectory prediction method according to an embodiment of the present disclosure.
Fig. 2 is a process flow of a preferred embodiment of step S110 in fig. 1.
Fig. 3 is a schematic diagram of the transformation of the world coordinate system and the self-perspective coordinate system of each pedestrian according to an embodiment of the present disclosure.
Fig. 4 is a process flow of a preferred embodiment of step S120 in fig. 1.
FIG. 5 is a schematic diagram of trajectory error selection according to an embodiment of the present disclosure.
Fig. 6 is a schematic structural diagram of a pedestrian trajectory prediction apparatus employing a hardware implementation of a processing system according to an embodiment of the present disclosure.
Description of the reference numerals
1000. Pedestrian trajectory prediction device
1002. First data processing module
1004. Encoder
1006. Decoder
1008. Generator
1010. Second data processing module
1100. Bus
1200. Processor
1300. Memory
1400. Other circuits.
Detailed Description
The present disclosure will be described in further detail with reference to the drawings and embodiments. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant matter and not restrictive of the disclosure. It should be further noted that, for the convenience of description, only the portions relevant to the present disclosure are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict. Technical solutions of the present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Unless otherwise indicated, the illustrated exemplary embodiments/examples are to be understood as providing exemplary features of various details of some ways in which the technical concepts of the present disclosure may be practiced. Accordingly, unless otherwise indicated, features of the various embodiments may be additionally combined, separated, interchanged, and/or rearranged without departing from the technical concept of the present disclosure.
The use of cross-hatching and/or shading in the drawings is generally used to clarify the boundaries between adjacent components. As such, unless otherwise specified, the presence or absence of cross-hatching or shading does not convey or indicate any preference or requirement for a particular material, material property, size, proportion, commonality among the illustrated components and/or any other characteristic, attribute, property, etc., of a component. Further, in the drawings, the size and relative sizes of components may be exaggerated for clarity and/or descriptive purposes. While example embodiments may be practiced differently, the specific process sequence may be performed in a different order than that described. For example, two processes described consecutively may be performed substantially simultaneously or in reverse order to that described. In addition, like reference numerals denote like parts.
When an element is referred to as being "on" or "on," "connected to" or "coupled to" another element, it can be directly on, connected or coupled to the other element or intervening elements may be present. However, when an element is referred to as being "directly on," "directly connected to" or "directly coupled to" another element, there are no intervening elements present. For purposes of this disclosure, the term "connected" may refer to physically, electrically, etc., and may or may not have intermediate components.
The terminology used herein is for the purpose of describing particular embodiments and is not intended to be limiting. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Furthermore, when the terms "comprises" and/or "comprising" and variations thereof are used in this specification, the presence of stated features, integers, steps, operations, elements, components and/or groups thereof are stated but does not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof. It is also noted that, as used herein, the terms "substantially," "about," and other similar terms are used as approximate terms and not as degree terms, and as such, are used to interpret inherent deviations in measured values, calculated values, and/or provided values that would be recognized by one of ordinary skill in the art.
The pedestrian trajectory prediction method and the pedestrian trajectory prediction apparatus of the present disclosure are explained in detail below with reference to fig. 1 to 6.
Fig. 1 is a flowchart illustrating a pedestrian trajectory prediction method according to an embodiment of the present disclosure. Referring to fig. 1, a pedestrian trajectory prediction method S100 includes:
S110, acquiring observation track information of at least one pedestrian in a scene, and converting the observation track information of each pedestrian into self-view track information of each pedestrian under a self-view angle;

S120, acquiring the motion trend characteristics of each pedestrian based on the self-view track information of each pedestrian, and acquiring the interaction characteristics of each pedestrian and other pedestrians;

S130, generating future position information of each pedestrian under the self-view angle at least based on the motion trend characteristics of each pedestrian and the interaction characteristics of each pedestrian and other pedestrians; and

S140, generating at least one future track of each pedestrian under the self-view angle at least based on the future position information of each pedestrian under the self-view angle, and converting the future track under the self-view angle into the future track under the world coordinate system.
The scene may be a traffic scene in a field of view of the image capturing device during an automatic driving process (or a non-automatic driving process) of the vehicle. The image capture device may capture images and/or video. The image pickup device may be various types of image pickup apparatuses that can be provided to the automatic driving system of various vehicles. Wherein, the observation track information is track information under a world coordinate system.
In step S110, various types of data sets may be preprocessed to obtain, for each scene, the trajectory sequences satisfying the length requirement together with the position and identification (ID) number at each time; a transformation matrix from the world coordinate system to each pedestrian's self-view coordinate system is calculated one by one from each pedestrian's historical observed trajectory, and each pedestrian's trajectory coordinates are converted into its own coordinate system, yielding trajectory sequences under multiple viewing angles.

In step S120, preferably, the trajectory coordinates of each pedestrian under its self-view angle are fed into a multi-head-attention-based encoder, and the extracted features represent the motion trend; the last observed position coordinates of the other pedestrians in each pedestrian's own coordinate system are encoded through a fully connected layer, and the extracted features serve as the multi-view interaction features; the motion trend features, the multi-view interaction features and noise are then fused.

In step S130, the fused feature is fed as a whole into a multi-head-attention-based decoder, which iteratively outputs the future trajectory positions. During training, the training error is constrained to improve the stability of multi-trajectory generation.

In step S140, a plurality of trajectories is generated at a time, preferably by the generator, and each pedestrian's trajectory in its self-view coordinate system is converted back to the world coordinate system, giving the final prediction result.
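Taken together, steps S110 to S140 form one encoder-decoder pipeline. The following is a minimal end-to-end sketch in PyTorch; the callables to_self_view, from_self_view, encoder and decoder are hypothetical stand-ins for the modules described above, not the patent's actual implementation:

```python
import torch

def predict_trajectories(world_tracks, to_self_view, from_self_view,
                         encoder, decoder, k=20, noise_dim=16):
    """Illustrative end-to-end flow of steps S110-S140.

    world_tracks: (n_peds, obs_len, 2) observed positions in world coordinates.
    Returns k candidate future tracks per pedestrian, back in world coordinates.
    """
    # S110: transform every pedestrian's track into its own self-view frame.
    self_tracks, transforms = to_self_view(world_tracks)

    # S120: motion-trend feature e_att (multi-head attention over the sequence)
    # and interaction feature e_act (the others' last observed positions).
    e_att, e_act = encoder(self_tracks)

    futures = []
    for _ in range(k):                                    # S140: k candidate tracks
        z = torch.randn(e_att.shape[0], noise_dim)        # Gaussian noise for diversity
        e_encoder = torch.cat([e_att, e_act, z], dim=-1)  # feature fusion
        future_self = decoder(e_encoder)                  # S130: iterative decoding
        futures.append(from_self_view(future_self, transforms))
    return torch.stack(futures, dim=1)                    # (n_peds, k, pred_len, 2)
```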
According to the pedestrian trajectory prediction method S100 of the preferred embodiment of the present disclosure, in the above embodiment, acquiring the observation trajectory information of at least one pedestrian of the scene includes:
S111, acquiring position information of each pedestrian in a world coordinate system from the picture and/or video of the scene, and generating the observation track information; and

S112, extracting the observation track information of the pedestrians whose length is greater than or equal to a preset length in the scene, together with the identification numbers (IDs) of the corresponding pedestrians.
Wherein the preset length described above may be represented by a frame length.
In step S112, the data set acquired from the picture and/or video of the scene (the observation trajectory information of each pedestrian together with the pedestrian's identification number (ID)) preferably contains observation trajectory sequences covering both the observation length and the prediction length.
For scene C, the observation trajectory sequence $X_i^C = \{(x_i^t, y_i^t) \mid t = 1, \dots, t_{obs}\}$ denotes the trajectory positions (trajectory coordinates) over the observed time steps; the real trajectory sequence input during the model training described below, $Y_i^C = \{(x_i^t, y_i^t) \mid t = t_{obs}+1, \dots, t_{pred}\}$, denotes the trajectory positions over the future time steps; and the predicted trajectory sequence is denoted $\hat{Y}_i^C$.

$X$, $Y$ and $\hat{Y}$ respectively denote the pedestrian trajectory sequences over all scenes, where the superscript $C$ denotes the C-th scene, the subscript $i$ denotes the i-th of the n pedestrians, the superscript $t$ denotes the corresponding time frame, $obs$ denotes the observation-sequence frame length, and $pred$ denotes the prediction-sequence frame length.

During model training, the length of a training data sequence is at least $obs + pred$; during prediction inference, the sequence length is at least $obs$. The identification numbers of the pedestrians are stored per scene as $ped\_ID$, where $ped\_ID$ is the unique ID number of each pedestrian, used to distinguish the trajectory information of different pedestrians.
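A sketch of this preprocessing step, assuming each scene is given as flat (frame_id, ped_id, x, y) records; the record layout and helper name are assumptions:

```python
import numpy as np

def extract_tracks(records, obs=8, pred=12, training=True):
    """Keep only pedestrians whose track meets the required length and
    return their coordinate sequences together with their ped_IDs.

    records: iterable of (frame_id, ped_id, x, y) tuples for one scene.
    """
    min_len = obs + pred if training else obs       # length rule stated above
    by_ped = {}
    for frame_id, ped_id, x, y in records:
        by_ped.setdefault(ped_id, []).append((frame_id, x, y))
    tracks, ped_ids = [], []
    for ped_id, pts in by_ped.items():
        pts.sort()                                  # order by frame index
        if len(pts) >= min_len:                     # discard too-short tracks
            tracks.append(np.array(pts, float)[:min_len, 1:])  # keep (x, y)
            ped_ids.append(ped_id)
    return np.stack(tracks), ped_ids                # (n_peds, min_len, 2), IDs
```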
For the pedestrian trajectory prediction method S100 of each of the above embodiments, preferably, converting the observation trajectory information of each pedestrian into the self-perspective trajectory information at the self-perspective of each pedestrian includes:
S113, generating a coordinate system transformation matrix for the corresponding pedestrian based on the observation track information of the pedestrians whose length is greater than or equal to the preset length in the scene; and

S114, converting, based on the coordinate transformation matrix, the observation track information of the pedestrian into self-view observation track information under the self-view angle.
By generating a coordinate system change matrix for pedestrians whose observation trajectory information in the scene meets a preset length, the observation trajectory information of each pedestrian is converted into self-view trajectory information, and a plurality of pieces of self-view trajectory information can be generated.
According to one embodiment of the present disclosure, a coordinate system transformation matrix (preferably a rotation-and-translation matrix) from the world coordinate system to each pedestrian's self-view coordinate system is calculated according to the historical observed trajectories $X_i^C$ of the different pedestrians in scene C; fig. 3 is a conversion schematic of the world coordinate system and each pedestrian's self-view coordinate system.

Preferably, the rotation-and-translation matrix is obtained as follows:

the pedestrian's self-view coordinate system takes the observation starting point (the pedestrian's initial position) as the origin, takes the vector from the pedestrian's starting position to its last observed position as the x axis of the self-view coordinate system, and takes the direction perpendicular to the x axis as the other coordinate axis y;

a translation-and-rotation transformation T from the world coordinate system to the self-view coordinate system is constructed:

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta_i & \sin\theta_i \\ -\sin\theta_i & \cos\theta_i \end{bmatrix} \begin{bmatrix} x - \Delta x_i \\ y - \Delta y_i \end{bmatrix}$$

where $\Delta x_i$ and $\Delta y_i$ are the differences between the origin of the i-th pedestrian's own coordinate system and the origin of the world coordinate system in the scene, and $\theta_i$ is the counterclockwise rotation angle between the i-th pedestrian's view coordinate system and the world coordinate system.

More preferably, $\Delta x_i$, $\Delta y_i$ and $\theta_i$ are obtained by the following formulas:

$$\Delta x_i = x_i^{1}, \qquad \Delta y_i = y_i^{1}, \qquad \theta_i = \arctan\!\left(\frac{y_i^{obs} - y_i^{1}}{x_i^{obs} - x_i^{1}}\right)$$

Based on the transformation matrix T, the pedestrian trajectories in scene C can be transformed from the world coordinate system to the respective self-view coordinate systems. The observation trajectory sequence of the i-th pedestrian in scene C under its self-view coordinate system ($t = 1, \dots, obs$) is denoted $X_{view,i}^C$, the future real trajectory coordinates ($t = obs+1, \dots, pred$) are denoted $Y_{view,i}^C$, and the inferred predicted trajectory sequence (i.e. the future trajectory sequence) $\hat{Y}_{view,i}^C$ is likewise saved into the same scene C.
Fig. 2 shows the processing flow of the preferred embodiment of step S110 described above.
For the pedestrian trajectory prediction method S100 of each of the above embodiments, preferably, S120, obtaining the motion trend feature of each pedestrian based on the self-view trajectory information of each pedestrian and obtaining the interaction features of each pedestrian with other pedestrians, with reference to fig. 4, includes:

S121, embedding time information into the self-view observation track information of each pedestrian under the self-view angle to obtain the feature vector $\xi_{obs}$;

S122, respectively acquiring the last observed position information of every other pedestrian under the self-view coordinate system of each pedestrian, so as to obtain the interaction features with every other pedestrian; and

S123, extracting, based on a fully connected layer and the multi-head attention mechanism, the time-domain implicit feature $e_{att}$ of the feature vector $\xi_{obs}$, i.e. the motion trend feature.
The time-domain implicit feature $e_{att}$ is a feature representation of the pedestrian's spatial position variation extracted by the multiple attention heads.
More preferably, it further comprises:
and S124, fusing the motion trend characteristics of each pedestrian, the interaction characteristics of each pedestrian and other pedestrians and Gaussian noise to generate a fused characteristic vector.
In the present embodiment, future position information at the self-perspective of each pedestrian is generated based on the fusion feature vector.
Preferably, in step S121, during the training process and the prediction process, each batch of input trajectory sequences (self-perspective observed trajectory information) needs to be trajectories in the same scene, and for the input trajectory sequences, the coordinate information is first expanded from two dimensions to a higher dimension D through the full connection layer, so as to fully represent the spatial position information of the pedestrian in the higher dimension.
Preferably, in the pedestrian trajectory prediction method S100 of each of the above embodiments, S121, embedding time information into the self-view observation track information of each pedestrian to obtain the feature vector $\xi_{obs}$, includes:

performing dimension-raising processing on the self-view observation track information of each pedestrian, so that the position representation of the self-view observation track information (i.e. the input observation track sequence) is raised from two dimensions to the preset dimension D, obtaining for each pedestrian the feature vector $e_{obs}$ representing the pedestrian's spatial position, and embedding the position code to generate the feature vector $\xi_{obs}$, wherein the feature vector $e_{obs}$ is obtained through the fully connected layer by the following formula:

$$e_{obs} = \phi(X_{view}; W_g)$$

where $\phi(\cdot)$ denotes a fully connected layer, $W_g$ is the weight of the fully connected layer, and $X_{view}$ is each pedestrian's own observation trajectory sequence under its self-view angle;

wherein embedding the position code to generate the feature vector $\xi_{obs}$ includes:

adding the feature vector $e_{obs}$ and the feature vector $p_{obs}$ to obtain the feature vector $\xi_{obs}$;

wherein the feature vector $p_{obs}$, the feature vector for different frames and different dimensions, is obtained from the position-encoding matrix PE, where PE is a two-dimensional matrix, $PE(\cdot)$ denotes indexing a parameter in the matrix (all indexed entries are learnable weights), the matrix has the same size as the feature vector $\xi_{obs}$, $t$ denotes the time step (i.e. frame position) of the sequence, and $d$ denotes each of the D dimensions.

Through the above steps of the present embodiment, the spatial position information representation $\xi_{obs}$ of each pedestrian's observation trajectory sequence carrying the temporal position code is obtained.
According to the pedestrian trajectory prediction method S100 of the preferred embodiment of the present disclosure, in step S122, acquiring the last observed position information of every other pedestrian under the self-view coordinate system of each pedestrian, to obtain the interaction features with every other pedestrian, includes:

performing dimension-raising processing on the last observed position information of the other pedestrians under the self-view coordinate system of each pedestrian, so that the position representation of the last observed position information is raised from two dimensions to the preset dimension D, obtaining the interaction feature vector $e_{act}$ of each pedestrian with every other pedestrian, the interaction feature vector $e_{act}$ being obtained by the following formula:

$$e_{act} = \phi(X_{view}; W_g)$$

where $X_{view}$ is the observation trajectory sequence of the other pedestrians under each pedestrian's self-view angle.

In this embodiment, the interaction feature vector $e_{act}$ is preferably generated in the same manner as the feature vector $e_{obs}$ described above, i.e. with the same processing procedure.
For the pedestrian trajectory prediction method S100 of each of the above embodiments, preferably, S123, extracting the time-domain implicit feature $e_{att}$ of the feature vector $\xi_{obs}$ based on fully connected layers and the multi-head attention mechanism, includes:

sending the feature vector $\xi_{obs}$ into a plurality of fully connected layers for dimension transformation to generate the multi-head attention inputs Query, Key and Value, where the Query, Key and Value are each a linear mapping of the feature vector $\xi_{obs}$, used for mining the law of pedestrian position variation with time.

According to a preferred embodiment of the present disclosure, four fully connected layers are used to perform the dimension transformation on the feature vector $\xi_{obs}$:

$$\mathrm{Query}^{l} = \phi(\xi_{obs}; W_a^{l}), \quad \mathrm{Key}^{l} = \phi(\xi_{obs}; W_a^{l}), \quad \mathrm{Value}^{l} = \phi(\xi_{obs}; W_a^{l})$$

where $l$ denotes the label of the fully connected layer of each of the four self-attention heads, $W_a^{l}$ denotes the weights of the four fully connected layers, and the subscript $a$, which has no other meaning, identifies the fully connected layer; the fully-connected-layer parameters for Query/Key/Value are all denoted $W_a$.
For the pedestrian trajectory prediction method S100 of each of the above embodiments, preferably, the multi-head attention mechanism extraction trajectory sequence ξ in step S123 obs Time domain implicit feature e of att The method comprises the following steps:
extraction is performed using the following formulas:
Attention(Query_l, Key_l, Value_l) = softmax(Query_l · Key_l^T / √d_k) · Value_l
head_l = Attention(Query_l, Key_l, Value_l), l = 1, 2, 3, 4
MultiHead = φ(Concat(head_1, head_2, head_3, head_4); W_b)
wherein softmax is a normalization function, d_k is the dimension of Key, the Concat operation is used to concatenate multiple vectors, W_b denotes the weight parameters of the fully connected network φ(·), the feature representation extracted by MultiHead is the time-domain implicit feature e_att, and the superscript T denotes matrix transposition.
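A minimal PyTorch sketch of the four-head extraction above follows; per the description, each head l uses a single projection W_a^l shared by its query, key and value (the class name and dimension are assumptions):

```python
import math

class TemporalMultiHeadAttention(nn.Module):
    """Sketch of e_att: per-head shared Q/K/V projection of xi_obs, scaled
    dot-product attention, then Concat and phi(.; W_b)."""

    def __init__(self, dim: int = 64, num_heads: int = 4):
        super().__init__()
        self.proj = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_heads))
        self.out = nn.Linear(num_heads * dim, dim)   # phi(.; W_b)

    def forward(self, xi_obs: torch.Tensor) -> torch.Tensor:
        # xi_obs: (N, obs_len, D)
        heads = []
        for phi_a in self.proj:
            q = k = v = phi_a(xi_obs)                # Query_l = Key_l = Value_l
            scores = q @ k.transpose(-2, -1) / math.sqrt(k.size(-1))
            heads.append(torch.softmax(scores, dim=-1) @ v)   # head_l
        return self.out(torch.cat(heads, dim=-1))    # MultiHead -> e_att
```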
For the method S100 for predicting a pedestrian trajectory according to each of the above embodiments, preferably, the step S124 of generating a fusion feature vector by fusing the motion trend feature of each pedestrian, the interaction feature with other pedestrians, and the gaussian noise includes:
performing a Concat operation on the time-domain implicit feature e_att, the interaction feature vector e_act and the Gaussian-noise vector Z to obtain the final output e_encoder of the encoder, i.e., the fused feature vector.
The Gaussian-noise vector Z is a vector sampled from a Gaussian distribution and is used when multi-trajectory prediction is performed.
In this embodiment, the representation e_att of the spatial position change of the pedestrian, the representation e_act of pedestrian interaction, and the Gaussian-noise vector Z are fused by the Concat operation, so that the laws governing pedestrian motion trajectories in the scene can be captured more comprehensively.
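The Concat fusion may be sketched as follows; pooling the per-neighbour interaction features to one vector, taking the last frame of e_att, and the noise dimension are assumptions made here only to fix the shapes:

```python
def fuse_features(e_att: torch.Tensor, e_act: torch.Tensor,
                  noise_dim: int = 16) -> torch.Tensor:
    """Sketch of e_encoder: Concat of motion-trend feature, pooled
    interaction feature and a freshly sampled Gaussian-noise vector Z."""
    e_att_last = e_att[:, -1, :]                      # (N, D) last-frame trend
    e_act_pooled = e_act.mean(dim=1)                  # (N, D) pooled neighbours
    z = torch.randn(e_att_last.size(0), noise_dim)    # Gaussian noise Z
    return torch.cat([e_att_last, e_act_pooled, z], dim=-1)  # e_encoder
```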
For the pedestrian trajectory prediction method S100 of each of the above embodiments, preferably, the step S130 of generating future position information of each pedestrian at the self-view angle based on at least the motion trend characteristics of each pedestrian and the interaction characteristics of each pedestrian and other pedestrians includes:
sending the final output e_encoder of the encoder into a decoder, and outputting future trajectory sequence positions, i.e., the future position information under the self view angle of each pedestrian, in an iterative manner.
With regard to the pedestrian trajectory prediction method S100 of each of the above embodiments, generating future position information at the self-perspective of each pedestrian based on at least the motion trend characteristics of each pedestrian and the interaction characteristics of each pedestrian and other pedestrians includes:
sending the final output e_encoder into a decoder, and outputting future trajectory sequence positions, i.e., the future position information under the self view angle of each pedestrian, in an iterative manner based on e_encoder and the existing predicted trajectory feature e_decoder of each pedestrian.
For the pedestrian trajectory prediction method S100 of the above embodiment, preferably, sending the final output e_encoder into the decoder and outputting future trajectory sequence positions in an iterative manner based on e_encoder and the existing predicted trajectory feature e_decoder of each pedestrian, wherein each iteration comprises:
performing a coding operation on the existing predicted trajectory sequence of each pedestrian to obtain the coding feature ξ_pred;
obtaining the existing predicted trajectory feature e_decoder based on the coding feature ξ_pred;
performing feature extraction by the attention mechanism based on e_encoder and the current e_decoder to obtain the current feature vector e_deatt, which from the quantities described here takes the form:
e_deatt = softmax(e_decoder · e_encoder^T / √d_encoder) · e_encoder
wherein d_encoder is the dimension of e_encoder, the superscript T denotes the matrix transpose, and the current e_decoder serves as the query; and,
For the pedestrian trajectory prediction method S100 of the above embodiment, preferably, the new position point coordinates p̂_t are obtained based on the current e_deatt by the following formula:
p̂_t = φ(e_deatt; W_y)
wherein p̂_t denotes the predicted coordinates at frame time t of the iteration, W_y is the weight of the fully connected layer, and the subscript y is merely an identifier of the fully connected layer with no other meaning.
For the pedestrian trajectory prediction method S100 of the above embodiment, preferably, the new position point coordinates p̂_t are stored into the predicted trajectory sequence so as to be re-encoded into e_decoder at the next iteration for a new position prediction. Preferably, during training, the coding at each frame time only considers the trajectory position information of the previous frame time, and the trajectory position information of the remaining frame times is set to zero; during prediction, the coordinates at the initial prediction frame time are initialized to zero, a new coordinate for the next frame time is predicted at each step, and the prediction sequence and the coding features are updated at the same time.
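The iteration may be sketched end to end as below. This is an inference-style roll-out; the module arguments stand in for the coding operation and the layer φ(·; W_y), and with a single fused vector e_encoder the softmax attention weight is trivially 1, so the formula is mirrored mainly for shape fidelity:

```python
def decode_iteratively(e_encoder: torch.Tensor, embed: nn.Module,
                       out_layer: nn.Linear, pred_len: int = 12) -> torch.Tensor:
    """Sketch of the decoder loop: re-encode the known prefix of the predicted
    sequence (unknown frames zeroed), attend against e_encoder, emit p_hat_t."""
    n, d = e_encoder.shape
    points = []                                        # predicted points so far
    for t in range(pred_len):
        pred = torch.zeros(n, pred_len, 2)             # unknown frames set to zero
        if points:
            pred[:, :t, :] = torch.stack(points, dim=1)
        xi_pred = embed(pred)                          # coding feature xi_pred
        e_decoder = xi_pred[:, max(t - 1, 0), :]       # previous frame's feature
        scores = (e_decoder * e_encoder).sum(-1, keepdim=True) / d ** 0.5
        e_deatt = torch.softmax(scores, dim=-1) * e_encoder  # trivial weight here
        points.append(out_layer(e_deatt))              # p_hat_t = phi(e_deatt; W_y)
    return torch.stack(points, dim=1)                  # (N, pred_len, 2)
```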
Preferably, training is performed using an error loss function.
Preferably, the following error loss function (the standard generative adversarial network loss) is used:
min_G max_D V(D, G) = E_{x~p_data(x)}[log D(x)] + E_{z~p(z)}[log(1 − D(G(z)))]
wherein G is the generator, representing the internal modules of the trajectory-generation process, x ~ p_data(x) denotes samples from the real data distribution, z ~ p(z) denotes the data distribution from which the generator generates, and D denotes the discriminator, which is specifically expressed as:
D(z) = φ(z; W_z)
wherein z denotes the generated data, i.e., the predicted trajectory generated by the generator, and W_z denotes the network parameters of the fully connected layer network.
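A sketch of the single-layer discriminator D(z) = φ(z; W_z) and the adversarial losses may look as follows; the flattened-trajectory input size and the binary-cross-entropy formulation are assumptions consistent with the standard GAN loss named above:

```python
class Discriminator(nn.Module):
    """Sketch of D(z) = phi(z; W_z): one fully connected layer scoring a
    flattened predicted (or real) trajectory as real/fake."""

    def __init__(self, pred_len: int = 12):
        super().__init__()
        self.fc = nn.Linear(pred_len * 2, 1)   # W_z

    def forward(self, traj: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.fc(traj.flatten(1)))   # realness in (0, 1)

def adversarial_losses(disc: Discriminator, real: torch.Tensor,
                       fake: torch.Tensor):
    """Standard GAN losses: D maximizes log D(x) + log(1 - D(G(z)))."""
    bce = nn.BCELoss()
    ones, zeros = torch.ones(len(real), 1), torch.zeros(len(fake), 1)
    d_loss = bce(disc(real), ones) + bce(disc(fake.detach()), zeros)
    g_loss = bce(disc(fake), ones)               # generator tries to fool D
    return d_loss, g_loss
```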
With regard to the pedestrian trajectory prediction method S100 of each of the above embodiments, preferably, generating at least one future trajectory of each pedestrian under the self view angle based at least on the future position information of each pedestrian under the self view angle in S140 includes:
during training, generating k trajectories for each pedestrian, and, when calculating the loss, accumulating and averaging the errors of n of these trajectories as the error used to update the parameters of the encoder-decoder model.
A person skilled in the art can select the values of k and n appropriately; in general, n is smaller than k.
In the prediction process, the trajectories only need to be generated directly by the generator, and a person skilled in the art can set the number of generated trajectories (k) to complete the prediction of the future trajectories of each pedestrian under the self view angle.
The future trajectory of each pedestrian needs to be transformed from the self-view coordinate system back to the world coordinate system to obtain the final output, i.e., the predicted trajectory sequence under the world coordinate system.
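The inverse coordinate transformation may be sketched as below, assuming the self-view frame was obtained by a per-pedestrian rotation about the pedestrian's last observed world position (the exact construction of the transformation matrix is as described earlier in the disclosure):

```python
def self_view_to_world(traj_view: torch.Tensor, rot: torch.Tensor,
                       origin: torch.Tensor) -> torch.Tensor:
    """Sketch of the inverse map: if x_view = R (x_world - origin), then
    x_world = R^T x_view + origin (row-vector form: x_view @ R + origin)."""
    # traj_view: (N, pred_len, 2); rot: (N, 2, 2); origin: (N, 2)
    return traj_view @ rot + origin.unsqueeze(1)
```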
For the pedestrian trajectory prediction method S100 of each of the above embodiments, preferably, in the training process, the error distribution after multi-trajectory generation is constrained: a threshold interval is set according to the mean μ and the standard deviation σ of the error distribution of the multiple trajectories, so as to embody a penalty on excessive randomness of the trajectories;
wherein μ is the mean of the error distribution of the k generated trajectories and σ is the standard deviation of the distribution.
Preferably, the n trajectories described above are selected as follows:
selecting the n trajectories whose error values fall within the threshold interval of the distribution and discarding the trajectories whose error values exceed the threshold interval; the errors of the discarded trajectories are not included in the accumulation, while the errors of the n trajectories whose error values lie within the threshold interval are accumulated and averaged as the error of the encoder-decoder model.
FIG. 5 is a schematic diagram of trajectory error selection according to an embodiment of the present disclosure. Referring to FIG. 5, for a prediction result with k = 8, for example, only the n = 4 trajectories in the shaded portion are selected to calculate the accumulated error. This selection of the n trajectories ensures the stability of trajectory prediction.
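This selection may be sketched as follows; the half-width lam of the threshold interval around the mean is an assumption here, as the disclosure only specifies that the interval is set from the mean and standard deviation:

```python
def averaged_selected_error(errors: torch.Tensor, lam: float = 1.0) -> torch.Tensor:
    """Sketch: keep per-trajectory errors inside [mu - lam*sigma, mu + lam*sigma]
    and accumulate-and-average them as the encoder-decoder model error."""
    # errors: (k,) errors of the k generated trajectories for one pedestrian
    mu, sigma = errors.mean(), errors.std()
    keep = (errors >= mu - lam * sigma) & (errors <= mu + lam * sigma)
    return errors[keep].mean()   # discarded trajectories contribute nothing
```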
The steps of the pedestrian trajectory prediction method of each of the above embodiments may be implemented by executing a computer program.
By the pedestrian trajectory prediction method based on multi-view coordinate system transformation and the multi-head attention mechanism, a unique self-view coordinate system is established for each pedestrian appearing in the scene. The trajectory sequences in the world coordinate system are converted into the respective self-view coordinate systems to simulate the self-movement intentions and social intentions of pedestrians; the sequence features of the trajectories in the multi-view coordinate systems are extracted with the multi-head attention mechanism; and, on the basis of a generative adversarial network architecture, the multi-trajectory errors are selectively superposed during training to ensure stability of multi-trajectory generation. A complete pedestrian trajectory prediction method with high computational efficiency, outstanding generalization capability and strong stability is thus finally formed.
The pedestrian trajectory prediction method of the present disclosure is a trajectory representation method based on multi-view coordinate system transformation. Based on the self-movement trend of each pedestrian, a unique self-view coordinate system is established for each pedestrian, and the trajectory coordinates are converted into that coordinate system. This simulates the self-centered viewpoint each pedestrian takes while moving, better matches pedestrian behavior habits, enables the pedestrian's own movement trend to be mined from the trajectory sequence, and improves the generalization capability and prediction accuracy of the model.
Further, the trajectory coordinate changes of other pedestrians are observed under the self coordinate system of each pedestrian, simulating how each pedestrian observes others from its own viewpoint while moving, which better matches actual social behavior. Using the features so extracted as the representation of interaction influence makes the representation of pedestrian-pedestrian interaction more reasonable and can improve trajectory prediction accuracy.
According to the pedestrian trajectory prediction method of the present disclosure, after multi-trajectory generation, the trajectories whose error values lie within the specified interval of the distribution of the generated multi-trajectory errors are selected, and their errors are accumulated and averaged as the loss error of the model, followed by back propagation and training. This reflects the main movement intention of pedestrians while preserving the diversity of trajectory generation, and improves the stability of model prediction.
In the present disclosure, various modules may serve as the time-series extraction module in the encoder-decoder; for example, an LSTM or a GRU can be used as an alternative to the multi-head attention of this scheme, although with a certain loss in speed.
In the present disclosure, the multi-view coordinate system may also be established by using the observation track sequence information to calculate the orientation angle and then rotating the world coordinate system counterclockwise or clockwise.
The present disclosure also provides a pedestrian trajectory prediction device.
According to one embodiment of the present disclosure, the pedestrian trajectory prediction apparatus 1000 includes:
the first data processing module 1002, the first data processing module 1002 acquires observation trajectory information of at least one pedestrian in a scene, and converts the observation trajectory information of each pedestrian into self-view trajectory information of each pedestrian at a self-view angle;
the encoder 1004 is used for acquiring the motion trend characteristics of each pedestrian and the interaction characteristics of each pedestrian and other pedestrians based on the self-view track information of each pedestrian;
the decoder 1006, the decoder 1006 generates future position information of each pedestrian under the self-view angle based on at least the motion trend characteristics of each pedestrian and the interaction characteristics of each pedestrian and other pedestrians;
a generator 1008, the generator 1008 generating at least one future trajectory of each pedestrian under the self view angle based at least on the future position information of each pedestrian under the self view angle; and
the second data processing module 1010, the second data processing module 1010 converts the future trajectory in the self-perspective into the future trajectory in the world coordinate system.
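Wiring the modules 1002–1010 together, the device's data flow can be summarized with the hypothetical components sketched earlier (shapes, dimensions and the one-vector encoder output remain assumptions):

```python
class TrajectoryPredictionPipeline(nn.Module):
    """Sketch of the device: view transform (1002) -> encoder (1004) ->
    decoder (1006) -> generator's k samples (1008) -> inverse transform (1010)."""

    def __init__(self, dim: int = 64, obs_len: int = 8, pred_len: int = 12):
        super().__init__()
        d_enc = dim * 2 + 16                                    # e_att ++ e_act ++ Z
        self.embed = TrajectoryEmbedding(obs_len, dim)
        self.attn = TemporalMultiHeadAttention(dim)             # e_att
        self.inter = InteractionEmbedding(dim)                  # e_act
        self.embed_pred = TrajectoryEmbedding(pred_len, d_enc)  # decoder-side coding
        self.head = nn.Linear(d_enc, 2)                         # phi(.; W_y)
        self.pred_len = pred_len

    def forward(self, x_world, rot, origin, last_pos_view, k: int = 8):
        # module 1002: x_view = R (x_world - origin), row-vector form
        x_view = (x_world - origin.unsqueeze(1)) @ rot.transpose(-2, -1)
        e_att = self.attn(self.embed(x_view))
        e_act = self.inter(last_pos_view)
        trajs = []
        for _ in range(k):                                      # k noise samples
            e_enc = fuse_features(e_att, e_act)                 # fresh Gaussian Z
            traj_view = decode_iteratively(e_enc, self.embed_pred,
                                           self.head, self.pred_len)
            trajs.append(self_view_to_world(traj_view, rot, origin))  # module 1010
        return torch.stack(trajs, dim=1)                        # (N, k, pred_len, 2)
```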
The pedestrian trajectory prediction apparatus 1000 may be implemented by a software architecture.
Fig. 6 shows the structure of a pedestrian trajectory prediction apparatus 1000 employing a hardware implementation of a processing system.
The apparatus may include corresponding means for performing each or several of the steps of the flowcharts described above. Thus, each step or several steps in the above-described flow charts may be performed by a respective module, and the apparatus may comprise one or more of these modules. The modules may be one or more hardware modules specifically configured to perform the respective steps, or implemented by a processor configured to perform the respective steps, or stored within a computer-readable medium for implementation by a processor, or by some combination.
Referring to fig. 6, the hardware architecture may be implemented using a bus architecture. The bus architecture may include any number of interconnecting buses and bridges depending on the specific application of the hardware and the overall design constraints. The bus 1100 couples various circuits including the one or more processors 1200, the memory 1300, and/or the hardware modules together. The bus 1100 may also connect various other circuits 1400, such as peripherals, voltage regulators, power management circuits, external antennas, and the like.
The bus 1100 may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only a single connection line is shown, but this does not mean there is only one bus or one type of bus.
Any process or method descriptions in flowcharts or otherwise described herein may be understood as representing modules, segments, or portions of code that include one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present disclosure includes other implementations in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those skilled in the art. The processor performs the various methods and processes described above. For example, method embodiments in the present disclosure may be implemented as a software program tangibly embodied in a machine-readable medium, such as a memory. In some embodiments, some or all of the software program may be loaded and/or installed via the memory and/or a communication interface. When the software program is loaded into the memory and executed by the processor, one or more steps of the methods described above may be performed. Alternatively, in other embodiments, the processor may be configured to perform one of the methods described above in any other suitable manner (e.g., by means of firmware).
The logic and/or steps represented in the flowcharts or otherwise described herein may be embodied in any readable storage medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions.
For the purposes of this description, a "readable storage medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the readable storage medium include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the readable storage medium may even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, for instance via optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a memory.
It should be understood that portions of the present disclosure may be implemented in hardware, software, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps of the method implementing the above embodiments may be implemented by hardware that is related to instructions of a program, and the program may be stored in a readable storage medium, and when executed, the program may include one or a combination of the steps of the method embodiments.
In addition, each functional unit in the embodiments of the present disclosure may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a separate product, may also be stored in a readable storage medium. The storage medium may be a read-only memory, a magnetic or optical disk, or the like.
A pedestrian trajectory prediction device 1000 according to still another embodiment of the present disclosure includes:
an image acquisition device, the image acquisition device acquiring images and/or videos of a scene;
the first data processing module 1002, the first data processing module 1002 obtains observation trajectory information of at least one pedestrian in an image and/or a video of a scene, and converts the observation trajectory information of each pedestrian into self-view trajectory information of each pedestrian at a self-view angle;
the encoder 1004 is used for acquiring the motion trend characteristics of each pedestrian and the interaction characteristics of each pedestrian and other pedestrians based on the self-view track information of each pedestrian;
the decoder 1006, the decoder 1006 generates future position information of each pedestrian under the self-view angle based on at least the motion trend characteristics of each pedestrian and the interaction characteristics of each pedestrian and other pedestrians;
a generator 1008, the generator 1008 generating at least one future trajectory of each pedestrian under the self view angle based at least on the future position information of each pedestrian under the self view angle; and
the second data processing module 1010, the second data processing module 1010 converts the future trajectory in the self-perspective into the future trajectory in the world coordinate system.
The pedestrian trajectory prediction device 1000 according to the present embodiment may further include an image capturing device in addition to the pedestrian trajectory prediction devices 1000 according to the above-described embodiments.
The pedestrian trajectory prediction device (and the pedestrian trajectory prediction method) of the present disclosure can be used in a perception module for security or autonomous driving; by predicting the future trajectories of pedestrians in road traffic, it can improve the decision and planning capability of vehicles, thereby ensuring road safety and efficiency.
The present disclosure also provides an electronic device, including: a memory storing execution instructions; and a processor or other hardware module that executes the execution instructions stored in the memory, such that the processor or other hardware module performs the above-described pedestrian trajectory prediction method.
The disclosure also provides a readable storage medium, in which an execution instruction is stored, and the execution instruction is executed by a processor to implement the above-mentioned pedestrian trajectory prediction method.
In the description of the present specification, reference to the description of "one embodiment/implementation", "some embodiments/implementations", "an example", "a specific example", or "some examples", etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment/implementation or example is included in at least one embodiment/implementation or example of the present application. In this specification, the schematic representations of the above terms do not necessarily refer to the same embodiment/implementation or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments/implementations or examples. In addition, those skilled in the art may combine the various embodiments/implementations or examples and the features thereof described in this specification, provided they do not conflict with one another.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
It will be understood by those skilled in the art that the foregoing embodiments are merely for clarity of illustration of the disclosure and are not intended to limit the scope of the disclosure. Other variations or modifications may occur to those skilled in the art, based on the foregoing disclosure, and are still within the scope of the present disclosure.
Claims (6)
1. A pedestrian trajectory prediction method is characterized by comprising the following steps:
acquiring observation track information of at least one pedestrian in a scene, and converting the observation track information of each pedestrian into self-view track information of each pedestrian under a self-view;
acquiring the motion trend characteristics of each pedestrian based on the self-view track information of each pedestrian, and acquiring the interaction characteristics of each pedestrian and other pedestrians;
generating future position information of each pedestrian under a self-view angle at least based on the motion trend characteristics of each pedestrian and the interaction characteristics of each pedestrian and other pedestrians; and
generating at least one future track of each pedestrian under the self view angle at least based on the future position information of each pedestrian under the self view angle, and converting the future track under the self view angle into the future track under a world coordinate system;
wherein converting the observation trajectory information of each pedestrian into the self-view trajectory information under the self view angle of each pedestrian comprises: generating a coordinate system transformation matrix for the corresponding pedestrian based on the observation trajectory information of a pedestrian in the scene whose length is greater than or equal to a preset length, and converting the observation trajectory information of the pedestrian into self-view observation trajectory information under the self view angle based on the coordinate transformation matrix;
wherein acquiring the motion trend feature of each pedestrian based on the self-view trajectory information of each pedestrian comprises: embedding time information into the self-view observation trajectory information of each pedestrian under the self view angle to obtain a feature vector; respectively acquiring the last observed position information of each other pedestrian under the self-view coordinate system of each pedestrian so as to obtain the interaction features with each other pedestrian; and extracting the time-domain implicit feature, i.e., the motion trend feature, of the feature vector based on fully connected layers and a multi-head attention mechanism;
wherein acquiring the interaction features of each pedestrian with other pedestrians comprises: respectively performing dimension-raising processing on the last observed position information of each other pedestrian under the self-view coordinate system of each pedestrian, so that the position expression of the last observed position information of each other pedestrian is raised from two dimensions to a preset dimension, thereby obtaining the interaction feature vector of each pedestrian with each other pedestrian;
wherein generating the future position information under the self view angle of each pedestrian based at least on the motion trend feature of each pedestrian and the interaction features of each pedestrian with other pedestrians comprises: generating the future position information under the self view angle of each pedestrian based on a fused feature vector, wherein the fused feature vector is generated by performing fusion processing on the motion trend feature of each pedestrian, the interaction features of each pedestrian with other pedestrians, and Gaussian noise.
2. The method for predicting pedestrian trajectories according to claim 1, wherein the step of generating a fusion feature vector by fusing the motion tendency feature of each pedestrian, the interaction feature of each pedestrian with other pedestrians, and gaussian noise comprises:
performing a Concat operation on the motion trend feature, the interaction features and the Gaussian noise to obtain the final output of the encoder, i.e., the fused feature vector.
3. A pedestrian trajectory prediction device characterized by comprising:
the first data processing module acquires observation track information of at least one pedestrian in a scene and converts the observation track information of each pedestrian into self-view track information of each pedestrian under a self-view angle;
the encoder acquires the motion trend characteristics of each pedestrian based on the self-view track information of each pedestrian and acquires the interaction characteristics of each pedestrian and other pedestrians;
the decoder generates future position information of each pedestrian under the self view angle at least based on the motion trend characteristics of each pedestrian and the interaction characteristics of each pedestrian and other pedestrians;
a generator that generates at least one future trajectory of each pedestrian under the self view angle based at least on the future position information of each pedestrian under the self view angle; and
the second data processing module converts the future track under the self view angle into the future track under a world coordinate system;
wherein converting the observation trajectory information of each pedestrian into the self-view trajectory information under the self view angle of each pedestrian comprises: generating a coordinate system transformation matrix for the corresponding pedestrian based on the observation trajectory information of a pedestrian in the scene whose length is greater than or equal to a preset length, and converting the observation trajectory information of the pedestrian into self-view observation trajectory information under the self view angle based on the coordinate transformation matrix;
wherein acquiring the motion trend feature of each pedestrian based on the self-view trajectory information of each pedestrian comprises: embedding time information into the self-view observation trajectory information of each pedestrian under the self view angle to obtain a feature vector; respectively acquiring the last observed position information of each other pedestrian under the self-view coordinate system of each pedestrian so as to obtain the interaction features with each other pedestrian; and extracting the time-domain implicit feature, i.e., the motion trend feature, of the feature vector based on fully connected layers and a multi-head attention mechanism;
wherein acquiring the interaction features of each pedestrian with other pedestrians comprises: respectively performing dimension-raising processing on the last observed position information of each other pedestrian under the self-view coordinate system of each pedestrian, so that the position expression of the last observed position information of each other pedestrian is raised from two dimensions to a preset dimension, thereby obtaining the interaction feature vector of each pedestrian with each other pedestrian;
wherein generating the future position information under the self view angle of each pedestrian based at least on the motion trend feature of each pedestrian and the interaction features of each pedestrian with other pedestrians comprises: generating the future position information under the self view angle of each pedestrian based on a fused feature vector, wherein the fused feature vector is generated by performing fusion processing on the motion trend feature of each pedestrian, the interaction features of each pedestrian with other pedestrians, and Gaussian noise.
4. An electronic device, comprising:
a memory storing execution instructions; and
a processor executing execution instructions stored by the memory to cause the processor to perform the pedestrian trajectory prediction method of claim 1 or 2.
5. A readable storage medium having stored therein executable instructions for implementing the pedestrian trajectory prediction method of claim 1 or 2 when executed by a processor.
6. A computer program product comprising computer programs/instructions, characterized in that said computer programs/instructions, when executed by a processor, implement the pedestrian trajectory prediction method of claim 1 or 2 above.