CN113568410B - Heterogeneous intelligent body track prediction method, system, equipment and medium - Google Patents

Heterogeneous intelligent body track prediction method, system, equipment and medium

Info

Publication number
CN113568410B
Authority
CN
China
Prior art keywords
heterogeneous
category
interaction
attention
level
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110866999.8A
Other languages
Chinese (zh)
Other versions
CN113568410A (en
Inventor
王乐
郑方
周三平
陈仕韬
辛景民
郑南宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ningbo Shun'an Artificial Intelligence Research Institute
Xian Jiaotong University
Original Assignee
Ningbo Shun'an Artificial Intelligence Research Institute
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ningbo Shun'an Artificial Intelligence Research Institute and Xian Jiaotong University
Priority to CN202110866999.8A
Publication of CN113568410A
Application granted
Publication of CN113568410B
Legal status: Active

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0221 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0276 Control of position or course in two dimensions specially adapted to land vehicles using signals provided by a source external to the vehicle

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a heterogeneous agent trajectory prediction method, system, device, and medium. The method comprises the following steps: based on the trajectory points and categories of the heterogeneous agents, building graph representations of the agents' categories and spatio-temporal trajectories to obtain a category interaction graph and a spatial interaction graph; learning heterogeneous attention on the category interaction graph to obtain category-level heterogeneous attention; obtaining agent-level heterogeneous attention from the spatial interaction graph and the category-level heterogeneous attention; and, based on the agent-level heterogeneous attention, modeling the interaction information and the temporal trend of the trajectories to obtain the heterogeneous agent trajectory prediction result. The method can effectively model the interactions and trajectory trends of heterogeneous agents of multiple categories and can significantly improve trajectory prediction accuracy.

Description

Heterogeneous intelligent body track prediction method, system, equipment and medium
Technical Field
The invention belongs to the technical field of computer vision and relates to trajectory prediction, in particular to a heterogeneous agent trajectory prediction method, system, device, and medium.
Background
The purpose of trajectory prediction is to predict the future trajectory sequences of agents in a traffic scene from their observed trajectories. Trajectory prediction is a challenging computer vision problem with many real-world applications (e.g., autonomous driving, anomaly detection, and motion recognition).
Currently, the challenges that limit prediction accuracy stem largely from the complex interactions between agents. Recent advances in this area fall into two main categories:
(1) Graph-based methods, which construct a spatial graph at each time step and aggregate the features of adjacent nodes;
(2) Recurrent-neural-network-based methods, which model the trajectory of each agent with a recurrent neural network (RNN, LSTM) and extract the hidden states of agents in the surrounding area.
However, the above methods have limitations:
(1) Graph-based methods use only pairwise relationships between nodes, so information from other nodes is mixed together and relayed indirectly. Real-world traffic interactions are considerably more complex and include, for example, multi-party relationships (relationships among three or more agents). In other words, these methods are limited by an inflexible number of interaction participants.
(2) Recurrent-neural-network-based methods consider only local relationships between agents inside a manually defined surrounding region, and potential interaction participants outside that region are ignored outright. In other words, these methods are limited by the manual selection of interacting agents.
In addition, most existing methods focus only on trajectory prediction in homogeneous scenes, such as scenes containing only pedestrians or only automobiles, and neglect trajectory prediction in heterogeneous scenes, such as scenes that mix pedestrians, cars, bicycles, trucks, and carts. The latter case is the more realistic one. Because different categories of agents have different motion patterns (such as speed, following distance, and response to interactions), and because people react differently to different categories of agents, trajectory prediction for heterogeneous agents is more challenging than for homogeneous agents.
In view of the foregoing, a new heterogeneous agent trajectory prediction method, system, device, and medium is needed.
Disclosure of Invention
The invention aims to provide a heterogeneous agent trajectory prediction method, system, device, and medium that solve one or more of the technical problems described above. To address the limits that existing methods place on the number, distance, and category of interaction participants, the invention provides an effective unlimited neighborhood interaction network. The network gathers information from all potential agents affected by the same interaction and models the different categories of agents in a heterogeneous scene separately, so that it can effectively model the interactions and trajectory trends of heterogeneous agents of multiple categories and significantly improve trajectory prediction accuracy.
To achieve the above purpose, the invention adopts the following technical scheme:
The heterogeneous agent trajectory prediction method of the invention comprises the following steps:
based on the trajectory points and categories of the heterogeneous agents, building graph representations of the agents' categories and spatio-temporal trajectories to obtain a category interaction graph and a spatial interaction graph;
learning heterogeneous attention on the category interaction graph to obtain category-level heterogeneous attention;
obtaining agent-level heterogeneous attention from the spatial interaction graph and the category-level heterogeneous attention;
and, based on the agent-level heterogeneous attention, modeling the interaction information and the temporal trend of the trajectories to obtain the heterogeneous agent trajectory prediction result.
A further improvement of the invention is that the step of building graph representations of the categories and spatio-temporal trajectories of the heterogeneous agents, based on their trajectory points and categories, to obtain the category interaction graph and the spatial interaction graph specifically comprises:
1.1 Taking the trajectory points of the heterogeneous agents in a traffic scene and their categories as input, and using the trajectory points of all heterogeneous agent instances as instance graph nodes to obtain as many spatial interaction graphs as there are time frames;
using the trajectories of all heterogeneous agent instances of the same category as high-level category graph nodes to obtain as many category interaction graphs as there are time frames;
1.2 Building, from the spatial interaction graph, an adjacency matrix of the heterogeneous agent instances in the spatial dimension, where the adjacency matrix is set to a fully connected matrix that represents the mutual correlation among all heterogeneous agent instances; applying a normalized Laplacian transformation to the adjacency matrix to obtain a normalized Laplacian matrix;
1.3 Completing the unbalanced heterogeneous agent instance data in the category interaction graph by zero padding.
A further improvement of the invention is that the step of learning heterogeneous attention on the category interaction graph comprises:
2.1 Obtaining the category features of each category on the constructed category interaction graph, and obtaining the interaction weights among categories through a pooling operation; obtaining the embedding of each category through a linear projection;
2.2 Concatenating the embeddings of any two categories to obtain a fused embedding, and obtaining the category-category attention vector in time frame t through a graph attention mechanism;
2.3 Adjusting the category-category attention weights with a learnable weight vector and applying a nonlinear activation function to obtain an overall attention score that measures the interaction between each category and the other categories;
2.4 Normalizing the attention weights between any two categories to obtain the final category-category interactions, which serve as the edges of the category interaction graph.
A further improvement of the invention is that the step of obtaining agent-level heterogeneous attention from the spatial interaction graph and the category-level heterogeneous attention specifically comprises:
3.1 Using a distance-based method: initializing the spatial edges from the relative distances between the corresponding heterogeneous agent instances, and then obtaining a Laplacian-normalized interaction matrix through a Laplacian transformation;
3.2 Using a learning-based method: multiplying the fused features of the heterogeneous agent instances element-wise by the Laplacian-normalized interaction matrix to obtain the instance-instance interaction attention matrix;
3.3 Defining all heterogeneous agent instances involved in one interaction as unlimited neighbors; using asymmetric convolution to adaptively capture the interactions among an indefinite number of heterogeneous agent instances, thereby obtaining the information of all heterogeneous agent instances involved in the same interaction;
3.4 Using a padding operation to ensure that the output size equals the input size; aggregating global spatial interaction information by repeatedly applying the asymmetric convolution; and obtaining the instance-level interaction attention of the heterogeneous agents by fusing the unlimited neighborhood interactions with the category-category interactions.
A further improvement of the invention is that, in learning heterogeneous attention on the category interaction graph to obtain category-level heterogeneous attention,
the attention is constructed by the following formulas:
Figure BDA0003187654000000041
Figure BDA0003187654000000042
Figure BDA0003187654000000043
Figure BDA0003187654000000044
where (Figure BDA0003187654000000045) denotes the category feature embedding; W_e is a trainable parameter; Padding denotes the fill operation; (Figure BDA0003187654000000046) is a category graph node; φ is a linear projection function; δ is a nonlinear activation function; μ_c is a learnable attention weight vector for category c; (Figure BDA0003187654000000047) is the attention score of category c_1 toward category c_2; ∥ is the concatenation operation; (Figure BDA0003187654000000048) is the attention of category c_1 toward category c_2; (Figure BDA0003187654000000049) is the total attention obtained for category c; max-pooling is the maximum pooling operation; and (Figure BDA00031876540000000410) is the final attention after weight assignment.
A further improvement of the invention is that, in obtaining agent-level heterogeneous attention from the spatial interaction graph and the category-level heterogeneous attention,
the attention is constructed by the following formulas:
Figure BDA00031876540000000411
Figure BDA0003187654000000051
where (Figure BDA0003187654000000052) is the inverse of the n-th power of the degree matrix, E_t is the edge matrix of the instance spatio-temporal interaction graph, R_t is the Laplacian matrix, and ATT_t is the interaction information matrix of the agent instances.
A further improvement of the invention is that, in modeling the interaction information and the temporal trend of the trajectories based on the agent-level heterogeneous attention,
the modeling formulas for the interaction information and the temporal trend of the trajectories are as follows:
Figure BDA0003187654000000053
HT = TCN(H_t)
Figure BDA0003187654000000054
where H^l is the interaction information at layer l of the graph convolutional network, TCN is a temporal convolutional network that extracts the trend of the trajectory over time, HT is the final output, and L_i is the resulting bivariate Gaussian mixture distribution function, which is used to fit the distribution of the agents' future trajectories.
The heterogeneous agent trajectory prediction system of the invention comprises:
a graph characterization module, configured to build graph representations of the categories and spatio-temporal trajectories of the heterogeneous agents, based on their trajectory points and categories, to obtain a category interaction graph and a spatial interaction graph;
a first heterogeneous attention acquisition module, configured to learn heterogeneous attention on the category interaction graph to obtain category-level heterogeneous attention;
a second heterogeneous attention acquisition module, configured to obtain agent-level heterogeneous attention from the spatial interaction graph and the category-level heterogeneous attention;
and a prediction result acquisition module, configured to model the interaction information and the temporal trend of the trajectories, based on the agent-level heterogeneous attention, to obtain the heterogeneous agent trajectory prediction result.
An electronic device of the invention includes a processor and a memory, where the processor is configured to execute a computer program stored in the memory to implement any of the heterogeneous agent trajectory prediction methods of the invention.
A computer-readable storage medium of the invention stores at least one instruction that, when executed by a processor, implements any of the heterogeneous agent trajectory prediction methods of the invention.
Compared with the prior art, the invention has the following beneficial effects:
To address the poor interaction modeling achieved with LSTM hidden states in the prior art, the method constructs a heterogeneous spatio-temporal graph: it converts the input trajectory positions into spatio-temporal data and represents the category information of the agents. The heterogeneous spatio-temporal graph has two levels of nodes: low-level nodes represent agents and high-level nodes represent categories. The edges of the graph represent the spatial, temporal, and category associations between nodes. A spatial association is an interaction between agents and is directed, that is, the method treats the attention between agents as asymmetric. A temporal association represents the continuity of the same agent across time points. A category association covers both the association between the two levels of nodes within one category of agents and the association between high-level nodes of different categories. The heterogeneous spatio-temporal graph thus models the scene of each frame as a graph and learns the spatial interactions of the agents on that graph; the spatial information of successive frames is linked over time to form a temporal graph, from which the motion continuity of the agents is learned. In this way, interactions are modeled over the whole scene and over all agents.
To address the gradient explosion and gradient vanishing problems caused by LSTM in the prior art, the invention models trajectories jointly with a graph convolutional network (GCN) and a temporal convolutional network (TCN), which avoids these problems during neural network training.
To address the prior-art limitation that interactions are restricted by the number and range of traffic agents, the invention models agent interactions with small asymmetric convolution kernels together with large-size pooling and padding operations. Interactions are therefore not limited by the number of agents or by their spatio-temporal range, the shortcomings of LSTM and GCN are avoided, and unlimited neighborhood interaction is achieved.
To address the insufficient capacity of GCN and other graph neural network models to represent heterogeneous nodes in the prior art, the invention uses a hierarchical graph attention module that learns both the attention among categories and the attention among agents, which helps to model the interactions among agents. The high-level attention module constructs the interactions among categories, and the low-level attention module works with the unrestricted neighborhood interaction module to model the interactions among agents. The interactions of heterogeneous agents are thus modeled well. The invention also uses a Gaussian mixture model to distinguish the different motion patterns and influences of different agents.
Drawings
To illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. It will be apparent to those of ordinary skill in the art that the drawings described below show some embodiments of the invention and that other drawings may be derived from them without undue effort.
Fig. 1 is a flow chart of a heterogeneous-scene trajectory prediction method that fuses interaction information within an unrestricted neighborhood, according to an embodiment of the invention;
Fig. 2 is a schematic diagram of the trajectory prediction results of the method of the invention in the world coordinate system of the nuScenes dataset;
Fig. 3 is a schematic diagram of the long-duration stitched trajectory prediction results of the method in real scenes of the nuScenes dataset, according to an embodiment of the invention;
Fig. 4 is a schematic diagram of the hierarchical interaction attention prediction results of the method of the invention, according to an embodiment of the invention.
Detailed Description
To make the purposes, technical effects, and technical solutions of the embodiments of the invention clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some of the embodiments of the invention. Other embodiments that those of ordinary skill in the art can derive from the disclosed embodiments without undue burden fall within the scope of the invention.
Referring to Fig. 1, the heterogeneous-scene trajectory prediction method that fuses interaction information within an unrestricted neighborhood, according to an embodiment of the invention, includes the following steps:
Step 1: Build graph representations of the spatio-temporal trajectories and categories of the heterogeneous agents, based on their trajectory points and categories, to obtain a spatial interaction graph and a category interaction graph.
Specifically, Step 1 includes:
1) The trajectory points of the heterogeneous agents in a traffic scene and their categories are taken as input. The trajectory points of all heterogeneous agent instances are used as instance graph nodes, which yields as many spatial interaction graphs as there are time frames. The trajectories of all agent instances of the same category are used as high-level category graph nodes, which yields as many category interaction graphs as there are time frames;
2) An adjacency matrix of the instances in the spatial dimension is built from the spatial interaction graph and is set to a fully connected matrix that represents the mutual correlation among all instances. A normalized Laplacian transformation is applied to the adjacency matrix to obtain a normalized Laplacian matrix. This amounts to a graph Fourier transform: the eigenvalues and eigenvectors of the graph Laplacian are used to study the properties of the graph, so that nodes can exchange information to some degree even though the graph has no translation invariance;
3) Unbalanced instance data in the category interaction graph are completed by zero padding, so that the category interaction graph of each time frame keeps a relatively complete structure, and a connecting channel to the instance interaction graph is designed to transmit information.
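The graph construction in Step 1 can be illustrated with a minimal NumPy sketch. It assumes the common GCN-style symmetric normalization D^{-1/2}(A + I)D^{-1/2}; the patent's exact normalization formula is not given in the text, so this choice, the function names `normalized_adjacency` and `pad_category_nodes`, and the array shapes are illustrative assumptions rather than the patented implementation.

```python
import numpy as np

def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} (assumed form)."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops
    deg = a_hat.sum(axis=1)                 # degrees are > 0 because of self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def pad_category_nodes(feats_by_cat: dict, max_instances: int) -> np.ndarray:
    """Zero-pad each category's (n_instances, dim) features to a common size.

    Assumes every category has at most `max_instances` instances.
    Categories are ordered alphabetically by name.
    """
    dim = next(iter(feats_by_cat.values())).shape[1]
    out = np.zeros((len(feats_by_cat), max_instances, dim))
    for i, name in enumerate(sorted(feats_by_cat)):
        f = feats_by_cat[name]
        out[i, : f.shape[0]] = f            # remaining rows stay zero
    return out

# Fully connected instance graph for one time frame with 4 agents.
adj = np.ones((4, 4)) - np.eye(4)
norm = normalized_adjacency(adj)

# Unbalanced categories: 3 cars, 1 pedestrian, 2-D features.
padded = pad_category_nodes(
    {"car": np.ones((3, 2)), "pedestrian": np.ones((1, 2))}, max_instances=3
)
```

One such normalized matrix and one padded category tensor would be built per time frame.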
Step 2: Learn heterogeneous attention on the category interaction graph to obtain category-level heterogeneous attention.
Specifically, Step 2 includes:
1) To construct the interactions between categories, the category features of each category are obtained on the constructed spatio-temporal category graph, and the interaction weights between categories are then obtained from them through a pooling operation. A linear projection yields the embedding of each category,
Figure BDA0003187654000000081
that is, the embedding of category c at time step t;
2) The embeddings of any two categories are concatenated to obtain a fused embedding. The category-category attention vector in time frame t is obtained through a graph attention mechanism;
3) The category-category attention weights are adjusted with a learnable weight vector and passed through a nonlinear activation function, and a global attention score is then obtained to measure the interaction between one category and the other categories;
4) The attention weights between any two categories are normalized to obtain the final category-category interactions, which serve as the edges of the spatial category interaction graph. The weights of the spatial category edges represent the category-category interactions, and the resulting interaction values are later assigned to the instance edges.
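The four sub-steps of Step 2 resemble a standard graph-attention computation, which can be sketched as follows. This is a hedged illustration only: the order of pooling and projection, the LeakyReLU activation, the softmax normalization, and the names `category_attention`, `w_embed`, and `mu` are assumptions, since the patent's formulas are not reproduced in the text.

```python
import numpy as np

def category_attention(cat_feats, w_embed, mu, slope=0.2):
    """Category-category attention for one time frame (illustrative).

    cat_feats: (C, N, F) zero-padded instance features per category
    w_embed:   (F, D) linear projection (assumed shape)
    mu:        (2 * D,) learnable attention weight vector (assumed shape)
    Returns a (C, C) matrix whose rows sum to 1.
    """
    pooled = cat_feats.max(axis=1)              # (C, F): max-pool over instances
    emb = pooled @ w_embed                      # (C, D): category embeddings
    c = emb.shape[0]
    # Concatenate the embeddings of every ordered pair (c1, c2): (C, C, 2D).
    pair = np.concatenate(
        [np.repeat(emb[:, None, :], c, axis=1),
         np.repeat(emb[None, :, :], c, axis=0)],
        axis=-1,
    )
    scores = pair @ mu                          # (C, C) raw attention scores
    scores = np.where(scores > 0, scores, slope * scores)   # LeakyReLU
    scores = scores - scores.max(axis=1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)     # row-wise softmax

rng = np.random.default_rng(0)
att = category_attention(
    rng.normal(size=(3, 4, 5)),   # 3 categories, up to 4 instances, 5 features
    rng.normal(size=(5, 6)),
    rng.normal(size=12),
)
```

Note that max-pooling over zero-padded rows can be affected by the padding when all real features are negative; a masked pooling would avoid this.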
Step 3: Obtain agent-level heterogeneous attention by learning heterogeneous attention on the spatial interaction graph.
Specifically, Step 3 includes:
1) Using a distance-based method: the spatial edges are initialized from the relative distances between the corresponding instances, and a normalized interaction matrix is then obtained through a Laplacian transformation;
2) Using a learning-based method: the instance fusion features obtained in Step 2 are multiplied element-wise by the Laplacian-normalized interaction matrix obtained in 1) to obtain the instance-instance interaction attention matrix.
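A minimal sketch of the two parts of Step 3 follows. The inverse-distance edge weight and the symmetric D^{-1/2} E D^{-1/2} normalization are assumptions chosen for illustration, as are the function names; the text only states that edges are initialized from relative distance and then Laplacian-normalized.

```python
import numpy as np

def distance_edges(pos, eps=1e-6):
    """pos: (N, 2) positions in one frame.

    Edge weight = 1 / Euclidean distance (assumed kernel); zero on the diagonal.
    """
    diff = pos[:, None, :] - pos[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    edges = np.zeros_like(dist)
    mask = dist > eps
    edges[mask] = 1.0 / dist[mask]
    return edges

def laplacian_normalize(edges):
    """Symmetric normalization D^{-1/2} E D^{-1/2} (assumed form)."""
    deg = edges.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    return edges * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def instance_attention(pos, learned):
    """Element-wise product of learned (N, N) features and the normalized edges."""
    return learned * laplacian_normalize(distance_edges(pos))

# Two agents 2 m apart; `learned` stands in for the fused instance features.
pos = np.array([[0.0, 0.0], [2.0, 0.0]])
learned = np.array([[2.0, 3.0], [4.0, 5.0]])
att = instance_attention(pos, learned)
```

Because `learned` need not be symmetric, the resulting attention matrix can be asymmetric even though the distance-based matrix is symmetric.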
Step 4: Model the interactions of the unrestricted neighborhood.
Specifically, Step 4 includes:
1) All heterogeneous agents involved in one interaction are defined as unlimited neighbors, regardless of how many agents there are and how far apart they are. Asymmetric convolution is used to adaptively capture the interactions among an indefinite number of heterogeneous agents, so that the information of all heterogeneous agents involved in the same interaction is acquired at once;
2) A padding operation is used to ensure that the output size equals the input size. Global spatial interaction information is aggregated by repeatedly applying the asymmetric convolution. The final instance-level interaction attention of the heterogeneous agents is obtained by fusing the unlimited neighborhood interactions with the category-category interactions.
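The padded, repeated asymmetric convolution described in Step 4 can be sketched in plain NumPy. This is an assumption-laden illustration: the kernels are fixed vectors rather than learned multi-channel filters, ReLU stands in for the unspecified nonlinear activation, and the names `conv1d_same`, `asymmetric_conv_block`, and `unrestricted_neighborhood` are hypothetical.

```python
import numpy as np

def conv1d_same(mat, kernel, axis):
    """Zero-padded 1-D cross-correlation along `axis`; output shape == input shape.

    Assumes an odd kernel length.
    """
    pad = len(kernel) // 2
    width = [(0, 0), (0, 0)]
    width[axis] = (pad, pad)
    padded = np.pad(mat, width)                 # constant zero padding
    out = np.zeros_like(mat, dtype=float)
    n = mat.shape[axis]
    for offset, w in enumerate(kernel):
        idx = [slice(None), slice(None)]
        idx[axis] = slice(offset, offset + n)
        out += w * padded[tuple(idx)]
    return out

def asymmetric_conv_block(att, k_row, k_col):
    """A (1 x k) convolution along rows plus a (k x 1) convolution along columns."""
    mixed = conv1d_same(att, k_row, axis=1) + conv1d_same(att, k_col, axis=0)
    return np.maximum(mixed, 0.0)               # ReLU (assumed activation)

def unrestricted_neighborhood(att, k_row, k_col, depth):
    """Stack `depth` blocks so the receptive field grows across the whole matrix."""
    h = att.astype(float)
    for _ in range(depth):
        h = asymmetric_conv_block(h, k_row, k_col)
    return h

att = np.eye(5)
k = [1.0, 1.0, 1.0]
h1 = unrestricted_neighborhood(att, k, k, depth=1)
h2 = unrestricted_neighborhood(att, k, k, depth=2)
```

With an identity input and length-3 kernels, one block reaches only adjacent entries, while two blocks already reach entries two positions away, which illustrates how repetition aggregates information beyond a fixed neighborhood.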
Step 5: Infer the distribution function of the future trajectory points.
Specifically, Step 5 includes:
1) The interaction attention matrix of the heterogeneous agents obtained in Step 3 and Step 4 is used as the edges of the characterization graph constructed in Step 1;
2) A graph convolution operation extracts and models the interactions of the heterogeneous agents, and a temporal convolution operation extracts and models the movement trend of their historical trajectories;
3) The spatial and temporal information obtained in 1) and 2) is compressed along the channel dimension to obtain a well-formed bivariate Gaussian mixture distribution, which is used to predict the future trajectories of the heterogeneous agents.
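To make the output of Step 5 concrete, the sketch below evaluates the negative log-likelihood of ground-truth points under a single two-dimensional (bivariate) Gaussian parameterized by five output channels. It is an assumption that the channels are laid out as (mu_x, mu_y, log sigma_x, log sigma_y, atanh rho); the mixture over several components mentioned in the text is not shown, and the function name is hypothetical.

```python
import numpy as np

def bivariate_gaussian_nll(raw, target):
    """Mean negative log-likelihood of `target` under a bivariate Gaussian.

    raw:    (..., 5) network outputs (assumed channel layout, see above)
    target: (..., 2) ground-truth (x, y) positions
    """
    mu = raw[..., 0:2]
    sx = np.exp(raw[..., 2])            # exp keeps the standard deviations positive
    sy = np.exp(raw[..., 3])
    rho = np.tanh(raw[..., 4])          # tanh keeps the correlation in (-1, 1)
    dx = (target[..., 0] - mu[..., 0]) / sx
    dy = (target[..., 1] - mu[..., 1]) / sy
    one_minus_rho2 = 1.0 - rho ** 2
    z = dx ** 2 + dy ** 2 - 2.0 * rho * dx * dy
    log_pdf = (-z / (2.0 * one_minus_rho2)
               - np.log(2.0 * np.pi * sx * sy * np.sqrt(one_minus_rho2)))
    return -log_pdf.mean()

raw = np.zeros((3, 5))                  # zero mean, unit variance, no correlation
near = bivariate_gaussian_nll(raw, np.zeros((3, 2)))
far = bivariate_gaussian_nll(raw, np.ones((3, 2)))
```

Minimizing such a loss during training pushes the predicted distribution toward the observed future positions; targets farther from the predicted mean yield a larger loss.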
In Step 1, the spatio-temporal and category characterization graph G_stc of the heterogeneous agents expresses three relationships:
1) The spatial interactions of the heterogeneous agents within a given time frame: each node of the graph is the trajectory coordinate point of a heterogeneous agent in that time frame, and each edge is the spatial interaction between two heterogeneous agents and is a directed edge;
2) The representation of the heterogeneous agents in the time dimension: each node of the graph is a trajectory coordinate point of a heterogeneous agent over time, and each edge is the temporal continuity of the trajectory of the same heterogeneous agent;
3) The interactions at the category abstraction level: each node of the graph is the fusion of the spatio-temporal features of all agents of the same category, and the edges fall into two types. The first type interconnects category nodes and represents abstract category-category interaction. The second type points from the category level to the instance level and is characterized by the collective interaction features extracted at the category level.
In Step 2, the interaction attention between categories is constructed by the following formulas:
Figure BDA0003187654000000101
Figure BDA0003187654000000102
Figure BDA0003187654000000103
Figure BDA0003187654000000104
where (Figure BDA0003187654000000105) denotes the category feature embedding; W_e is a trainable parameter; Padding denotes the fill operation; (Figure BDA0003187654000000106) is a category graph node; φ is a linear projection function; δ is a nonlinear activation function; μ_c is a learnable attention weight vector for category c; (Figure BDA0003187654000000107) is the attention score of category c_1 toward category c_2; ∥ is the concatenation operation; (Figure BDA0003187654000000108) is the attention of category c_1 toward category c_2; (Figure BDA0003187654000000109) is the total attention obtained for category c; max-pooling is the maximum pooling operation; and (Figure BDA00031876540000001010) is the final attention after weight assignment.
In step 3, the attention of the instance agents is constructed as follows:
Figure BDA00031876540000001011
Figure BDA0003187654000000111
where the first formula is the Laplacian normalization, (Figure BDA0003187654000000112) is the inverse of the n-th power of the degree matrix, $E_t$ is the edge matrix of the instance spatio-temporal interaction graph, $R_t$ is the Laplacian matrix, and $ATT_t$ is the interaction information matrix of the instance agents.
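The normalization described above can be illustrated with a minimal Python sketch. The exact exponent applied to the degree matrix is not recoverable from the formula image, so the sketch assumes the symmetric form $D^{-1/2}(A + I)D^{-1/2}$ that is common in graph convolutional networks. The function name `normalize_adjacency`, the added self-loops, and the row-sum degree are illustrative assumptions, not taken from the patent.

```python
import math

def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square, non-negative adjacency matrix.

    Assumption: symmetric normalization with self-loops; the patent's exact
    exponent on the degree matrix is not known.
    """
    n = len(adj)
    # Add self-loops so that every node has a non-zero degree.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # The degree of node i is the sum of row i.
    deg = [sum(row) for row in a_hat]
    inv_sqrt = [1.0 / math.sqrt(d) for d in deg]
    return [[inv_sqrt[i] * a_hat[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]
```

Because the spatial edges are directed, the input matrix does not have to be symmetric; the sketch only requires non-negative weights.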
In step 4, the unrestricted-neighborhood interaction is formulated as follows:
$h_t = \delta(\mathrm{Conv}(ATT_t))$
Figure BDA0003187654000000113
where the first formula applies a nonlinear activation function to the result of the asymmetric convolution of the interaction attention matrix, and $F_t$ denotes the instance interaction attention after the category attention weights have been assigned to it.
In step 5, the interaction information and the temporal trend of the trajectory are modeled as follows:
Figure BDA0003187654000000114
$HT = \mathrm{TCN}(H_t)$
Figure BDA0003187654000000115
where $H_l$ is the interaction information of the $l$-th layer of the graph convolutional network, TCN is a temporal convolutional network that extracts the trend of the trajectory over time, $HT$ is the final output, and $L_i$ is the resulting bivariate Gaussian mixture distribution function, which is used to fit the distribution of the future agent trajectories.
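To make the role of the bivariate Gaussian concrete, the following Python sketch evaluates the negative log-likelihood of one ground-truth point under a single bivariate Gaussian component. It is an illustration under stated assumptions: the patent's loss formula is only available as an image, the mixture weights are left out, and the function and parameter names are hypothetical.

```python
import math

def bivariate_gaussian_nll(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Negative log-likelihood of (x, y) under one bivariate Gaussian.

    Assumption: a single component is shown; a mixture would add
    per-component weights, which are omitted here.
    """
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y
    one_minus_rho2 = 1.0 - rho * rho
    # Quadratic (Mahalanobis) term of the density exponent.
    quad = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / (2.0 * one_minus_rho2)
    # Log of the normalizing constant 2*pi*sigma_x*sigma_y*sqrt(1 - rho^2).
    log_norm = math.log(2.0 * math.pi * sigma_x * sigma_y * math.sqrt(one_minus_rho2))
    return quad + log_norm
```

Minimizing this quantity over the predicted time steps pulls the predicted means toward the observed positions while keeping the predicted variances calibrated.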
Most existing techniques follow one of two main routes. First, because trajectory prediction is essentially a sequence-generation task, time-series prediction models represented by LSTM are often used, with the hidden states of the LSTM modeling the interactions between agents. The limitation of this scheme is that LSTM models agent interactions poorly: it can only model interactions within a local range, and representing agent interactions through LSTM hidden states lacks a sound theoretical basis. In addition, LSTM itself suffers from technical difficulties such as vanishing and exploding gradients.
The second route uses graph-based neural network models such as the graph convolutional network (GCN) or the graph attention network (GAT). A node of the graph naturally represents an agent, and an edge naturally represents a relationship between agents. The limitations of this scheme are as follows. First, GCN suffers from over-smoothing, i.e., node values irreversibly tend toward the same value as the network is deepened, so only a shallow network can be used. Physically, a shallow network models one-to-one interactions between agents, whereas real interactions are not confined to pairs of agents. In addition, the nodes of GCN and similar models can only represent nodes of the same nature; for heterogeneous nodes their representation capability is insufficient and the modeling effect is poor.
Therefore, the invention provides an unrestricted-neighborhood interaction trajectory prediction method based on a heterogeneous spatio-temporal graph.
To address the poor interaction modeling of LSTM hidden states in the first mainstream method, the invention constructs a heterogeneous spatio-temporal graph: the input trajectory positions are converted into spatio-temporal data, the category information of the agents is characterized, and the heterogeneous spatio-temporal graph is then built. The graph contains two levels of nodes: low-level nodes represent agents and high-level nodes represent categories. The edges of the graph represent the spatial, temporal and category associations of the nodes. The spatial association is the interaction between agents and is directed, i.e., the method assumes that attention between agents is not symmetric. The temporal association represents the continuity of the same agent between time points. The category association includes both the association between the two levels of nodes within the same category of agents and the association between high-level nodes of different categories. The heterogeneous spatio-temporal graph thus models the scene of each frame as a graph and learns the spatial interaction of the agents from it, while the spatial information of successive frames is linked in time to form a temporal graph from which the continuity of agent motion is learned. In this way, interaction modeling is realized for the whole scene and for all agents.
To address the exploding and vanishing gradients that LSTM suffers from in the first mainstream method, the invention models jointly with a graph convolutional network (GCN) and a temporal convolutional network (TCN), which avoids these problems during neural network training.
To address the problem that interaction in the second mainstream method is limited by the number and range of traffic agents, the invention models agent interaction with small asymmetric convolution kernels together with large-size pooling and padding operations, so that interaction is not limited by the number of agents or by the spatio-temporal range. This avoids the shortcomings of LSTM and GCN and realizes interaction over an unrestricted neighborhood.
To address the insufficient capability of graph neural network models such as GCN to represent nodes of heterogeneous nature in the second mainstream method, the invention uses a hierarchical graph attention module that learns both the attention between categories and the attention between agents, which helps model the interaction among agents. The high-level attention module constructs the interactions among categories, and the low-level attention module cooperates with the unrestricted-neighborhood interaction module to model the interactions among agents. The interaction of heterogeneous agents is thereby modeled well. The invention uses a Gaussian mixture model to distinguish the different motion modes and the influence among different agents.
Example 1
The heterogeneous-scene trajectory prediction method of this embodiment, which aggregates interaction information within an unrestricted neighborhood, uses an unrestricted-neighborhood interaction module to generate the fused features of all heterogeneous agents participating in an interaction at the same time. Because it adopts an adaptive asymmetric convolution network, it is applicable to any number of agents and to interaction areas of any range. In addition, a hierarchical graph attention module is proposed to capture category-category interactions that guide the instance-instance interactions. Finally, trajectory trend information is extracted by a temporal convolutional network, and the parameters of a Gaussian mixture model are estimated to generate future trajectories. Extensive experiments on benchmark datasets (nuScenes, ApolloScape, SDD, etc.) show that the method of the invention improves significantly on state-of-the-art methods.
The heterogeneous-scene trajectory prediction method of this embodiment, based on aggregating interaction information within an unrestricted neighborhood, comprises the following steps:
Step 1: characterize the spatio-temporal trajectories and categories of the heterogeneous agents as graphs:
1) The trajectory points and categories of all agents in the complex traffic scene are characterized as graphs: the instance-graph nodes are the trajectory points of all heterogeneous agent instances, which yields as many spatial interaction graphs as there are time frames. The nodes of the high-level category graph are the trajectories of all agent instances of the same category, which yields as many category interaction graphs as there are time frames;
2) An adjacency matrix is established in the spatial dimension: the adjacency matrix is built from the spatial interaction graph of the instances and is fully connected, representing the correlation among all instances. A normalized Laplacian transform is applied to the adjacency matrix to obtain the normalized Laplacian matrix, i.e., the graph Fourier transform is performed, so that the properties of the graph can be studied through the eigenvalues and eigenvectors of its Laplacian matrix and the nodes can exchange information to some degree on a non-Euclidean structure that lacks translation invariance;
3) A stable structure and a message-passing channel are established: the unbalanced instance data in the category interaction graph are completed by zero filling, so that the category interaction graph of each time frame keeps a relatively complete structure, and a channel connecting the category interaction graph and the instance interaction graph is designed to pass information.
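The zero-filling completion in item 3) can be sketched as follows in Python. The sketch assumes that the instances of each category in one time frame are stored as a list of coordinate pairs and that every category is padded to the size of the largest category; the function name `pad_class_instances` and the dictionary layout are hypothetical.

```python
def pad_class_instances(instances_by_class, feature_dim=2):
    """Zero-pad every category to the instance count of the largest category.

    Assumption: instances_by_class maps a category label to a list of
    feature vectors (here 2-D trajectory points) for one time frame.
    """
    max_count = max((len(v) for v in instances_by_class.values()), default=0)
    padded = {}
    for label, points in instances_by_class.items():
        # Copy the real instances, then append all-zero rows up to max_count.
        rows = [list(p) for p in points]
        rows += [[0.0] * feature_dim for _ in range(max_count - len(points))]
        padded[label] = rows
    return padded
```

After padding, every category contributes a tensor of the same shape, so the category graph keeps the same structure in each time frame regardless of how many instances are present.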
Step 2: learn heterogeneous attention through the category interaction graph:
1) The category features of each category are obtained on the constructed spatio-temporal category graph, and on this basis the interaction weights among categories are obtained through a pooling operation. To handle the unbalanced number of instances in different scenes, a filling operation aligns the number of instances to the same value. The embedding of each category is then obtained by a linear projection;
2) The embeddings of any two categories are concatenated to obtain the fused embedding of the category pair. The category-category attention vector in time frame t is obtained through a graph attention mechanism;
3) The category-category attention weights are adjusted by a learnable weight vector and activated with a nonlinear function, and a global attention score is then obtained to measure the interaction between one category and the others;
4) The attention weights between any two categories are normalized to obtain the final category-category interaction, which serves as the edges of the spatial category interaction graph. The weights of these edges represent the category-category interaction, and the resulting values are later assigned to the instance edges.
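The four items above follow the general pattern of graph attention, which the Python sketch below illustrates. It is a sketch under assumptions, not the patent's formula (which is only available as images): the nonlinear activation is taken to be LeakyReLU, the normalization is taken to be a row-wise softmax, and a single shared weight vector `mu` is used instead of one per category. All names are hypothetical.

```python
import math

def leaky_relu(v, slope=0.2):
    """LeakyReLU activation (assumed choice of nonlinearity)."""
    return v if v > 0 else slope * v

def category_attention(embeddings, mu):
    """Return a row-normalized category-to-category attention matrix.

    embeddings: list of per-category embedding vectors of length d.
    mu: weight vector of length 2*d applied to the concatenated pair.
    """
    n = len(embeddings)
    # Score for (c1, c2): activation of mu . [h_c1 || h_c2]; list "+" concatenates.
    scores = [[leaky_relu(sum(w * f for w, f in zip(mu, embeddings[i] + embeddings[j])))
               for j in range(n)] for i in range(n)]
    attention = []
    for row in scores:
        m = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)
        attention.append([e / total for e in exps])
    return attention
```

Because the concatenation is ordered, the score of category $c_1$ toward $c_2$ generally differs from the score of $c_2$ toward $c_1$, which matches the directed, asymmetric attention described in the text.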
Step 3: learn heterogeneous attention through the instance spatio-temporal interaction graph:
1) Distance-based method: the spatial edges are initialized from the relative distance between the corresponding instances, and a normalized interaction matrix is then obtained through the Laplacian transform;
2) Learning-based method: the instance fusion features obtained in step 2 are multiplied point by point with the Laplacian-normalized interaction matrix obtained in 1) to obtain the instance-instance interaction attention matrix.
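The two items above can be sketched in Python as follows. The text only states that edges are initialized from relative distance, so the inverse-distance weighting used here is an assumption, as are the function names; the Laplacian normalization that sits between the two items is left out to keep the sketch short.

```python
import math

def distance_edges(points, eps=1e-6):
    """Initialize spatial edges as inverse distances (assumed weighting).

    points: list of (x, y) positions of the agent instances in one frame.
    The diagonal is left at zero; eps avoids division by zero.
    """
    n = len(points)
    edges = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                edges[i][j] = 1.0 / (math.dist(points[i], points[j]) + eps)
    return edges

def pointwise_product(a, b):
    """Element-wise (Hadamard) product of two equally sized matrices."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
```

With this weighting, closer instances start with stronger edges, and the learned fusion features then rescale each edge individually.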
Step 4: model the interactions of the unrestricted neighborhood:
1) All heterogeneous agents involved in one interaction are defined as the unrestricted neighborhood, regardless of the number of agents and the distance between them. An asymmetric convolution adaptively captures the interactions among a variable number of heterogeneous agents and acquires the information of all heterogeneous agents involved in the same interaction at once, wherein the asymmetric convolution comprises convolutions with kernel sizes of 3 by 1, 3 by 3, 2 by 1 and 1 by 1;
2) A padding operation ensures that the output size is the same as the input size. Global spatial interaction information is aggregated by repeatedly applying the asymmetric convolution. The final instance-level interaction attention of the heterogeneous agents is obtained by fusing the unrestricted-neighborhood interaction with the category-category level interaction.
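The size-preserving asymmetric convolution described above can be sketched in plain Python as follows. The sketch assumes a single channel, zero padding, a 3 by 1 kernel followed by a 1 by 3 kernel, and ReLU as the nonlinear activation; these choices and the function names are illustrative assumptions rather than the patent's exact configuration.

```python
def conv2d_same(matrix, kernel):
    """2-D cross-correlation with zero padding so the output size equals the input size."""
    rows, cols = len(matrix), len(matrix[0])
    kh, kw = len(kernel), len(kernel[0])
    top, left = (kh - 1) // 2, (kw - 1) // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    rr, cc = r + i - top, c + j - left
                    # Positions outside the matrix count as zero padding.
                    if 0 <= rr < rows and 0 <= cc < cols:
                        acc += kernel[i][j] * matrix[rr][cc]
            out[r][c] = acc
    return out

def asymmetric_block(matrix, col_kernel, row_kernel):
    """Apply a k-by-1 kernel, then a 1-by-k kernel, then ReLU (assumed activation)."""
    mixed = conv2d_same(conv2d_same(matrix, col_kernel), row_kernel)
    return [[max(0.0, v) for v in row] for row in mixed]
```

Because the output keeps the N by N shape of the attention matrix for any number of agents N, the block can be stacked repeatedly, and each repetition widens the set of agents whose information reaches a given entry.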
Step 5: infer the distribution function of the future trajectory points:
1) The interaction attention matrix of the heterogeneous agents obtained in steps 3 and 4 is used as the edges of the graph constructed in step 1;
2) The interactions of the heterogeneous agents are extracted and modeled with a graph convolution operation, and the historical movement trend of their trajectories is extracted and modeled with a temporal convolution operation, wherein the graph convolution has 1 layer and the temporal convolution has 5 layers;
3) The spatial and temporal information obtained in 1) and 2) is compressed along the channel dimension to obtain a valid bivariate Gaussian mixture distribution from which the future trajectories of the heterogeneous agents are predicted, wherein the channel compression uses a convolution with a 1 by 1 kernel.
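A 1 by 1 convolution is a linear map across channels applied independently at each position, which the Python sketch below shows for one position. The mapping of the five output channels to Gaussian parameters through `exp` and `tanh` is a common convention in trajectory prediction and is an assumption here; the patent does not state how validity of the distribution is enforced. All names are hypothetical.

```python
import math

def pointwise_conv(features, weights, bias):
    """1x1 convolution at one position: a linear map from C_in to C_out channels."""
    return [sum(w * f for w, f in zip(row, features)) + b for row, b in zip(weights, bias)]

def to_gaussian_params(raw):
    """Map 5 raw channels to bivariate Gaussian parameters (assumed convention)."""
    mu_x, mu_y, s_x, s_y, r = raw
    return {
        "mu_x": mu_x,
        "mu_y": mu_y,
        "sigma_x": math.exp(s_x),  # exp keeps the standard deviation positive
        "sigma_y": math.exp(s_y),
        "rho": math.tanh(r),       # tanh keeps the correlation inside (-1, 1)
    }
```

For example, `to_gaussian_params(pointwise_conv(h, W, b))` would turn the feature vector `h` of one agent at one future time step into the parameters of its predicted position distribution.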
Example 2
Existing methods suffer from two problems: interaction is limited by the number and range of traffic agents, and the interaction attention of heterogeneous agents is not modeled. To solve them, this embodiment provides a heterogeneous agent trajectory prediction algorithm with unrestricted-neighborhood interaction and hierarchical graph attention. The algorithm effectively models the interaction among heterogeneous agents, obtains historical trajectory trends from the temporal continuity of the trajectories, and significantly improves the accuracy of heterogeneous trajectory prediction.
The heterogeneous-scene trajectory prediction method of this embodiment, based on aggregating interaction information within an unrestricted neighborhood, comprises the following steps:
Step 1: characterize the spatio-temporal trajectories and categories of the heterogeneous agents as graphs:
1) The trajectory points of the heterogeneous agents in a traffic scene and their categories are taken as input. The trajectory points of all heterogeneous agent instances serve as instance-graph nodes, which yields as many spatial interaction graphs as there are time frames. The trajectories of all agent instances of the same category serve as high-level category-graph nodes, which yields as many category interaction graphs as there are time frames;
2) The adjacency matrix of the instances in the spatial dimension is established from their spatial interaction graph and is set to be fully connected, representing the correlation among all instances. A normalized Laplacian transform is applied to the adjacency matrix to obtain the normalized Laplacian matrix, i.e., the graph Fourier transform is performed, so that the properties of the graph can be studied through the eigenvalues and eigenvectors of its Laplacian matrix and the nodes can exchange information to some degree on a structure that lacks translation invariance;
3) The unbalanced instance data in the category interaction graph are completed by zero filling, so that the category interaction graph of each time frame keeps a relatively complete structure, and a channel connecting it to the instance interaction graph is designed to pass information.
Step 2: learn heterogeneous attention through the category interaction graph:
1) To construct the interaction among categories, the category features of each category are obtained on the constructed spatio-temporal category graph, and the interaction weights among categories are then obtained through a pooling operation. The embedding $H_t^c$ of each category, i.e., the embedding of category c at time step t, is obtained by a linear projection;
2) The embeddings of any two categories are concatenated to obtain a fused embedding. The category-category attention vector in time frame t is obtained through a graph attention mechanism;
3) The category-category attention weights are adjusted by a learnable weight vector and activated with a nonlinear function, and a global attention score is then obtained to measure the interaction between one category and the other categories;
4) The attention weights between any two categories are normalized to obtain the final category-category interaction, which serves as the edges of the spatial category interaction graph. The weights of these edges represent the category-category interaction, and the resulting values are later assigned to the instance edges.
Step 3: learn heterogeneous attention through the instance spatio-temporal interaction graph:
1) Distance-based method: the spatial edges are initialized from the relative distance between the corresponding instances, and a normalized interaction matrix is then obtained through the Laplacian transform;
2) Learning-based method: the instance fusion features obtained in step 2 are multiplied point by point with the Laplacian-normalized interaction matrix obtained in 1) to obtain the instance-instance interaction attention matrix.
Step 4: model the interactions of the unrestricted neighborhood:
1) All heterogeneous agents involved in one interaction are defined as the unrestricted neighborhood, regardless of the number of agents and the distance between them. An asymmetric convolution adaptively captures the interactions among a variable number of heterogeneous agents and acquires the information of all heterogeneous agents involved in the same interaction at once;
2) A padding operation ensures that the output size is the same as the input size. Global spatial interaction information is aggregated by repeatedly applying the asymmetric convolution. The final instance-level interaction attention of the heterogeneous agents is obtained by fusing the unrestricted-neighborhood interaction with the category-category level interaction.
Step 5: infer the distribution function of the future trajectory points:
1) The interaction attention matrix of the heterogeneous agents obtained in steps 3 and 4 is used as the edges of the characterization graph constructed in step 1;
2) The interactions of the heterogeneous agents are extracted and modeled with a graph convolution operation, and the historical movement trend of their trajectories is extracted and modeled with a temporal convolution operation;
3) The spatial and temporal information obtained in 1) and 2) is compressed along the channel dimension to obtain a valid bivariate Gaussian mixture distribution from which the future trajectories of the heterogeneous agents are predicted.
Table 1 compares the experimental results of this method with those of other methods on the Argoverse, nuScenes and ApolloScape datasets. The experiments use the average displacement error and the final displacement error (ADE/FDE) as evaluation metrics, i.e., the average error between 20 sampled trajectories and the ground-truth trajectory over all time points, and the average error between 20 sampled trajectories and the ground-truth trajectory at the last time point. The lower the two metrics, the better the result.
Table 1. Experimental results of this method on the Argoverse, nuScenes and ApolloScape datasets
Figure BDA0003187654000000181
All methods observe 2 seconds of trajectory and predict the following 3 seconds. The ApolloScape dataset uses weighted ADE and FDE metrics, with weights of 0.20, 0.58 and 0.22 for vehicles, pedestrians and bicycles, respectively. Methods marked with "×" use additional scene context information. The method outperforms all of the compared advanced methods.
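The metrics can be written down as a short Python sketch. It computes ADE and FDE for one predicted trajectory against its ground truth and combines per-category values with the ApolloScape weights quoted above. The function names and the dictionary keys are hypothetical, and the aggregation over the 20 sampled trajectories is deliberately left out because the text does not specify it in detail.

```python
import math

def ade_fde(pred, truth):
    """Return (ADE, FDE) for one predicted trajectory.

    pred, truth: equally long lists of (x, y) points.
    ADE is the mean Euclidean error over all time points,
    FDE is the Euclidean error at the last time point.
    """
    errors = [math.dist(p, t) for p, t in zip(pred, truth)]
    return sum(errors) / len(errors), errors[-1]

# Category weights for the weighted ApolloScape metrics, as given in the text.
APOLLOSCAPE_WEIGHTS = {"vehicle": 0.20, "pedestrian": 0.58, "bicycle": 0.22}

def weighted_metric(per_class, weights=APOLLOSCAPE_WEIGHTS):
    """Weighted sum of a per-category metric (ADE or FDE)."""
    return sum(weights[c] * per_class[c] for c in weights)
```

For instance, per-category ADE values of 1.0, 2.0 and 3.0 for vehicles, pedestrians and bicycles combine to 0.20 × 1.0 + 0.58 × 2.0 + 0.22 × 3.0 = 2.02.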
Table 2 shows the experimental results of the method on the SDD dataset; the settings are the same as for Table 1.
Table 2. Experimental results of this method on the SDD dataset
Figure BDA0003187654000000182
The comparison with previous methods is made on the SDD benchmark dataset, which mainly contains pedestrian trajectories. Performance is assessed with the ADE/FDE metrics (lower is better). Methods marked with "×" use additional simulated data.
Fig. 2 shows the trajectory prediction results of the method in the world coordinate system of the nuScenes dataset. The method captures the various interactions between heterogeneous agents as well as the continuity of their trajectories. The predicted traffic-agent trajectories in the figure include turning trajectories, trajectories of agents moving in parallel in the same direction, interactions between two non-adjacent agents, and collective interactions involving a group of agents of different categories. The method captures the trajectory movement of the agents well in these different states.
Fig. 3 shows long-duration, stitched trajectory prediction results of the method in real scenes of the nuScenes dataset. The method successfully predicts the real trajectories of heterogeneous agents traveling for a long time in real scenes, which indicates a high prediction accuracy.
Fig. 4 shows the hierarchical interaction attention predicted by the method. The method learns a good asymmetric interaction attention among heterogeneous agents. The figure shows the category attention (first column) and the individual attention (second column) for the attention maps of two real scenes (first row and second row). The numerals 1, 2 and 3 on each picture denote the agent categories automobile, bicycle and pedestrian, respectively, and color encodes the attention weight. For example, in the upper-left attention map, the black square (row 3, column 1) represents the attention of category 3 (pedestrian) toward category 1 (vehicle). It can be seen that the method infers the asymmetric category attention successfully and computes the attention of all agents under its guidance.
In summary, the invention discloses a heterogeneous-scene trajectory prediction method based on aggregating interaction information within an unrestricted neighborhood, which belongs to the field of computer vision. The method finally uses spatio-temporal graph convolution to predict the Gaussian mixture distribution function of the heterogeneous agent trajectories over a future period.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, one skilled in the art may make modifications and equivalents to the specific embodiments of the present invention, and any modifications and equivalents not departing from the spirit and scope of the present invention are within the scope of the claims of the present invention.

Claims (6)

1. The heterogeneous intelligent body track prediction method is characterized by comprising the following steps of:
respectively carrying out graph characterization on the category and the space-time track of the heterogeneous intelligent agent based on the track points and the category of the heterogeneous intelligent agent to obtain a category interaction graph and a space interaction graph;
learning heterogeneous attentiveness based on the category interaction diagram to obtain category-level heterogeneous attentiveness;
Based on the space interaction diagram and the class-level heterogeneous attentions, heterogeneous intelligent-level heterogeneous attentions are obtained;
modeling interaction information and modeling the time trend of the track based on the heterogeneous attention of the heterogeneous intelligent body level to obtain a heterogeneous intelligent body track prediction result;
the step of learning heterogeneous attentions based on the category interaction graph to obtain category-level heterogeneous attentions specifically comprises the following steps:
2.1 Acquiring category characteristics of each category on the constructed category interaction graph, and obtaining interaction weights among the categories through pooling operation; obtaining an embedded value of each category through linear projection;
2.2 Connecting the embedded values of any two categories through the embedded value of each category to obtain a fusion embedded value, and obtaining a category-category attention vector in a time frame t through a graph attention mechanism;
2.3 Adjusting category-category attention weights by a learnable weight vector, activating with a nonlinear function to obtain overall attention scores for measuring interactions between each category and other categories;
2.4 Normalizing the obtained attention weight between any two categories to obtain a final category-category interaction as the side of the category interaction diagram;
The step of obtaining the heterogeneous attention of the heterogeneous intelligent agent level based on the space interaction diagram and the heterogeneous attention of the class level specifically comprises the following steps:
3.1 Using a distance-based method: initializing a space edge through the relative distance between corresponding heterogeneous intelligent agent examples, and then obtaining a Laplacian normalized interaction matrix through Laplacian transformation;
3.2 Using a learning-based approach: multiplying the fusion characteristics of the heterogeneous intelligent body instance by the Laplacian normalized interaction matrix point by point to obtain an attention matrix of the interaction of the heterogeneous intelligent body instance and the heterogeneous intelligent body instance;
3.3 Defining all heterogeneous intelligent agent instances involved in one interaction as infinite neighbors; adaptively capturing interactions between an indefinite number of heterogeneous intelligent agent instances using asymmetric convolution, obtaining information of all heterogeneous intelligent agent instances involved in the same interaction;
3.4 Using a fill operation to ensure that the output size is the same as the input size; aggregating global spatial interaction information by repeatedly computing an asymmetric convolution; obtaining the interaction attention of the heterogeneous intelligent body at the instance level by fusing the interaction of the infinite neighborhood and the category-category level;
in the process of learning heterogeneous attention based on the category interaction diagram to obtain category-level heterogeneous attention,
the attention construction formula is as follows:
Figure FDA0004092564110000021
Figure FDA0004092564110000022
Figure FDA0004092564110000023
Figure FDA0004092564110000024
where (Figure FDA0004092564110000025) denotes the category feature embedding, $W_e$ is a trainable parameter, padding denotes the fill operation, (Figure FDA0004092564110000026) is a category-graph node, $\phi$ is a linear projection function, $\delta$ is a nonlinear activation function, $\mu_c$ is a learnable attention weight vector for category $c$, (Figure FDA0004092564110000027) denotes the attention score of category $c_1$ toward category $c_2$, $\|$ is the concatenation operation, (Figure FDA0004092564110000028) is the attention of category $c_1$ toward category $c_2$, (Figure FDA0004092564110000029) is the total attention obtained for category $c$, max-pooling is the maximum pooling operation, and (Figure FDA00040925641100000210) denotes the final attention after weight assignment;
in the process of obtaining the heterogeneous attention of the heterogeneous intelligent body level based on the space interaction diagram and the heterogeneous attention of the class level,
the attention construction formula is as follows:
Figure FDA00040925641100000211
Figure FDA00040925641100000212
where (Figure FDA0004092564110000031) is the inverse of the n-th power of the degree matrix, $E_t$ is the edge matrix of the instance spatio-temporal interaction graph, $R_t$ is the Laplacian matrix, and $ATT_t$ is the interaction information matrix of the instance agents.
2. The heterogeneous intelligent agent trajectory prediction method according to claim 1, wherein the step of obtaining the category interaction graph and the spatial interaction graph specifically comprises:
1.1) taking the trajectory points of the heterogeneous intelligent agents in a traffic scene and their categories as input, and taking the trajectory points of all heterogeneous intelligent agent instances as instance graph nodes, to obtain spatial interaction graphs equal in number to the time frames;
taking the trajectories of all heterogeneous intelligent agent instances of the same category as nodes of the high-level category graph, to obtain category interaction graphs equal in number to the time frames;
1.2) establishing an adjacency matrix of the heterogeneous intelligent agent instances in the spatial dimension from their spatial interaction graph, wherein the adjacency matrix is set as a fully connected matrix representing the mutual correlation among all heterogeneous intelligent agent instances; applying a normalized Laplacian transform to the adjacency matrix to obtain a normalized Laplacian matrix;
1.3) for unbalanced heterogeneous intelligent agent instance data in the category interaction graph, performing a completion operation by zero padding.
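Steps 1.2) and 1.3) can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the toy frame data, and the choice of GCN-style symmetric normalization D^{-1/2} A D^{-1/2} as the reading of "normalized Laplacian transform" are all assumptions made for illustration.

```python
import math

def fully_connected_adjacency(n):
    """All-ones n x n matrix: every instance is linked to every instance."""
    return [[1.0] * n for _ in range(n)]

def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2} (an assumed choice)."""
    inv_sqrt = [1.0 / math.sqrt(sum(row)) if sum(row) > 0 else 0.0 for row in adj]
    n = len(adj)
    return [[inv_sqrt[i] * adj[i][j] * inv_sqrt[j] for j in range(n)]
            for i in range(n)]

def zero_pad_categories(groups, point_dim=2):
    """Pad each category with zero points up to the largest category size."""
    width = max(len(points) for points in groups.values())
    return {
        category: [list(p) for p in points]
        + [[0.0] * point_dim for _ in range(width - len(points))]
        for category, points in groups.items()
    }

# One time frame with made-up data: two pedestrians and one vehicle.
frame = {"pedestrian": [[0.0, 0.0], [1.0, 0.5]], "vehicle": [[4.0, 2.0]]}
lap = normalize_adjacency(fully_connected_adjacency(3))
padded = zero_pad_categories(frame)
```

With a fully connected graph every row of the normalized matrix sums to 1, and after padding every category holds the same number of points, so the categories can be stacked into one tensor per time frame.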
3. The heterogeneous intelligent agent trajectory prediction method according to claim 1, wherein, in modeling the interaction information and the temporal trend of the trajectories based on the heterogeneous attention at the heterogeneous intelligent agent level,
the interaction information and the temporal trend of the trajectories are modeled according to the following formulas:
Figure FDA0004092564110000032
HT = TCN(H_t)
Figure FDA0004092564110000033
where H_l is the interaction information of the l-th layer of the graph convolution network, TCN is a temporal convolution network that extracts the trend of the trajectory over time, HT is the final output, and L_i is the resulting bivariate Gaussian mixture distribution function, used to fit the distribution of the agents' future trajectories.
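The distribution-fitting step can be illustrated with the negative log-likelihood of a bivariate Gaussian, a loss commonly used when a network outputs distribution parameters for future positions. This is a sketch under stated assumptions: the claim's own formula is available only as an image, so the single-component form below, the function name, and the parameter layout (mean, standard deviations, correlation) are hypothetical.

```python
import math

def bivariate_gaussian_nll(point, mean, sigma, rho):
    """Negative log-likelihood of a 2-D point under a bivariate Gaussian.

    point, mean: (x, y) pairs; sigma: (sigma_x, sigma_y), both > 0;
    rho: correlation coefficient with -1 < rho < 1.
    """
    dx = (point[0] - mean[0]) / sigma[0]
    dy = (point[1] - mean[1]) / sigma[1]
    one_minus_rho2 = 1.0 - rho * rho
    z = dx * dx + dy * dy - 2.0 * rho * dx * dy
    log_norm = math.log(2.0 * math.pi * sigma[0] * sigma[1] * math.sqrt(one_minus_rho2))
    return log_norm + z / (2.0 * one_minus_rho2)

# Loss for one observed future position against one predicted distribution
# (made-up numbers).
loss = bivariate_gaussian_nll((1.2, 0.4), (1.0, 0.5), (0.3, 0.3), 0.1)
```

Summing this quantity over agents and predicted time steps would give a training loss; a mixture would add component weights and a log-sum-exp over components.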
4. A heterogeneous intelligent agent trajectory prediction system, comprising:
a graph characterization module, configured to construct graph representations of the categories and the spatio-temporal trajectories of the heterogeneous intelligent agents, based on their trajectory points and categories, to obtain a category interaction graph and a spatial interaction graph;
a first heterogeneous attention acquisition module, configured to learn heterogeneous attention based on the category interaction graph to obtain category-level heterogeneous attention;
a second heterogeneous attention acquisition module, configured to obtain heterogeneous attention at the heterogeneous intelligent agent level based on the spatial interaction graph and the category-level heterogeneous attention;
a prediction result acquisition module, configured to model interaction information and the temporal trend of the trajectories based on the heterogeneous attention at the heterogeneous intelligent agent level, to obtain a heterogeneous intelligent agent trajectory prediction result;
wherein the step of learning heterogeneous attention based on the category interaction graph to obtain category-level heterogeneous attention specifically comprises:
2.1) acquiring the category features of each category on the constructed category interaction graph, and obtaining the interaction weights among the categories through a pooling operation; obtaining an embedding of each category through linear projection;
2.2) concatenating the embeddings of any two categories to obtain a fused embedding, and obtaining a category-category attention vector in time frame t through a graph attention mechanism;
2.3) adjusting the category-category attention weights by a learnable weight vector, and applying a nonlinear activation function to obtain an overall attention score measuring the interaction between each category and the other categories;
2.4) normalizing the attention weights between any two categories to obtain the final category-category interactions, which serve as the edges of the category interaction graph;
the step of obtaining the heterogeneous attention at the heterogeneous intelligent agent level based on the spatial interaction graph and the category-level heterogeneous attention specifically comprises:
3.1) using a distance-based method: initializing the spatial edges by the relative distances between the corresponding heterogeneous intelligent agent instances, and then obtaining a Laplacian-normalized interaction matrix through a Laplacian transform;
3.2) using a learning-based method: multiplying the fused features of the heterogeneous intelligent agent instances point by point with the Laplacian-normalized interaction matrix to obtain an attention matrix of instance-instance interactions;
3.3) defining all heterogeneous intelligent agent instances involved in one interaction as an infinite neighborhood; adaptively capturing interactions among an indefinite number of heterogeneous intelligent agent instances using symmetric convolution, to obtain information on all heterogeneous intelligent agent instances involved in the same interaction;
3.4) using a padding operation to ensure that the output size equals the input size; aggregating global spatial interaction information by repeatedly computing an asymmetric convolution; obtaining the instance-level interaction attention of the heterogeneous intelligent agents by fusing the infinite-neighborhood interactions and the category-category-level interactions;
in learning heterogeneous attention based on the category interaction graph to obtain the category-level heterogeneous attention,
the attention is constructed according to the following formulas:
Figure FDA0004092564110000051
Figure FDA0004092564110000052
Figure FDA0004092564110000053
Figure FDA0004092564110000054
where
Figure FDA0004092564110000055
represents the category feature embedding, W_e is a trainable parameter, padding denotes the padding operation,
Figure FDA0004092564110000056
is a category graph node, φ is a linear projection function, δ is a nonlinear activation function, μ_c is a learnable attention weight vector for category c,
Figure FDA0004092564110000057
represents the attention score of category c_1 for category c_2, ∥ is the concatenation operation,
Figure FDA0004092564110000058
is the attention of category c_1 for category c_2,
Figure FDA0004092564110000059
is the total attention obtained for category c, max-pooling is the max pooling operation, and
Figure FDA00040925641100000510
represents the final attention after weight assignment;
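Steps 2.1) to 2.4) can be illustrated with a short sketch in the style of a graph attention layer. It is not the claimed formula, which is available only as an image: the linear projection φ is omitted, the nonlinearity is assumed to be a leaky ReLU, normalization is assumed to be a softmax over target categories, and all names and numbers are hypothetical.

```python
import math

def max_pool(points):
    """Element-wise maximum over a category's (zero-padded) instance features."""
    return [max(column) for column in zip(*points)]

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def category_attention(category_feats, mu):
    """Score each ordered pair (c1, c2) and normalize over c2.

    category_feats: {category: feature vector of length d}
    mu: {category: weight vector of length 2 * d}, standing in for mu_c.
    """
    names = list(category_feats)
    attention = {}
    for c1 in names:
        scores = []
        for c2 in names:
            fused = category_feats[c1] + category_feats[c2]  # list concatenation
            scores.append(leaky_relu(sum(w * x for w, x in zip(mu[c1], fused))))
        attention[c1] = dict(zip(names, softmax(scores)))
    return attention

# Made-up category features obtained by pooling zero-padded instance points.
feats = {
    "pedestrian": max_pool([[0.0, 0.0], [1.0, 0.5]]),
    "vehicle": max_pool([[4.0, 2.0], [0.0, 0.0]]),
}
weights = {"pedestrian": [0.1, 0.2, -0.1, 0.3], "vehicle": [0.0, 0.1, 0.2, -0.2]}
edges = category_attention(feats, weights)
```

Each row of `edges` sums to 1, so the values can be read as the weights on the edges of the category interaction graph for one time frame.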
in obtaining the heterogeneous attention at the heterogeneous intelligent agent level based on the spatial interaction graph and the category-level heterogeneous attention,
the attention is constructed according to the following formulas:
Figure FDA0004092564110000061
Figure FDA0004092564110000062
where
Figure FDA0004092564110000063
is the inverse of the n-th power of the degree matrix, E_t is the edge matrix of the instance spatio-temporal interaction graph, R_t is the Laplacian matrix, and ATT_t is the instance-level agent interaction information matrix.
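Steps 3.1) and 3.2) can be illustrated as follows. This is a sketch under stated assumptions, not the claimed formulas, which are available only as images: the inverse-distance edge weighting, the added self-loops, the exponent -1/2 on the degree matrix, and the `learned_scores` matrix standing in for the fused instance features are all hypothetical.

```python
import math

def inverse_distance_edges(positions):
    """E[i][j] = 1 / distance(i, j) for i != j, and 0 on the diagonal."""
    n = len(positions)
    edges = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d = math.dist(positions[i], positions[j])
                edges[i][j] = 1.0 / d if d > 0 else 0.0
    return edges

def symmetric_normalize(edges):
    """D^{-1/2} (E + I) D^{-1/2}, with D the degree matrix of E + I."""
    n = len(edges)
    with_loops = [[edges[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                  for i in range(n)]
    inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in with_loops]
    return [[inv_sqrt[i] * with_loops[i][j] * inv_sqrt[j] for j in range(n)]
            for i in range(n)]

def pointwise_attention(learned_scores, normalized):
    """Element-wise product of learned scores and the normalized matrix."""
    return [[s * r for s, r in zip(score_row, norm_row)]
            for score_row, norm_row in zip(learned_scores, normalized)]

# Three instances in one time frame (made-up positions).
positions = [[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]]
e_t = inverse_distance_edges(positions)
r_t = symmetric_normalize(e_t)
att_t = pointwise_attention([[1.0] * 3 for _ in range(3)], r_t)
```

Nearby instances receive larger edge weights than distant ones, and the element-wise product lets a learned term rescale each pairwise weight without changing the matrix shape.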
5. An electronic device comprising a processor and a memory, the processor configured to execute a computer program stored in the memory to implement the heterogeneous intelligent agent trajectory prediction method of any one of claims 1 to 3.
6. A computer readable storage medium storing at least one instruction that when executed by a processor implements the heterogeneous intelligent agent trajectory prediction method of any one of claims 1 to 3.
CN202110866999.8A 2021-07-29 2021-07-29 Heterogeneous intelligent body track prediction method, system, equipment and medium Active CN113568410B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110866999.8A CN113568410B (en) 2021-07-29 2021-07-29 Heterogeneous intelligent body track prediction method, system, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110866999.8A CN113568410B (en) 2021-07-29 2021-07-29 Heterogeneous intelligent body track prediction method, system, equipment and medium

Publications (2)

Publication Number Publication Date
CN113568410A CN113568410A (en) 2021-10-29
CN113568410B true CN113568410B (en) 2023-05-12

Family

ID=78169240

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110866999.8A Active CN113568410B (en) 2021-07-29 2021-07-29 Heterogeneous intelligent body track prediction method, system, equipment and medium

Country Status (1)

Country Link
CN (1) CN113568410B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114821812B (en) * 2022-06-24 2022-09-13 西南石油大学 Deep learning-based skeleton point action recognition method for pattern skating players
CN115273029B (en) * 2022-07-25 2024-06-14 上海人工智能创新中心 Method for predicting movement of intelligent body based on heterogeneous graph convolution network
CN115221971B (en) * 2022-07-28 2024-06-14 上海人工智能创新中心 Trajectory prediction method based on heterogeneous graph
CN115694697A (en) * 2022-09-28 2023-02-03 东南大学 Machine learning-based space-time domain prediction channel modeling method

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112215423A (en) * 2020-10-13 2021-01-12 西安交通大学 Pedestrian trajectory prediction method and system based on trend guiding and sparse interaction

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10328947B1 (en) * 2018-04-20 2019-06-25 Lyft, Inc. Transmission schedule segmentation and prioritization
US11307584B2 (en) * 2018-09-04 2022-04-19 Skydio, Inc. Applications and skills for an autonomous unmanned aerial vehicle
CN110580740B (en) * 2019-08-27 2021-08-20 清华大学 Multi-agent cooperative three-dimensional modeling method and device
CN112084427A (en) * 2020-09-15 2020-12-15 辽宁工程技术大学 Interest point recommendation method based on graph neural network
CN112905900B (en) * 2021-04-02 2023-11-17 辽宁工程技术大学 Collaborative filtering recommendation method based on graph convolution attention mechanism

Also Published As

Publication number Publication date
CN113568410A (en) 2021-10-29

Similar Documents

Publication Publication Date Title
CN113568410B (en) Heterogeneous intelligent body track prediction method, system, equipment and medium
Pan et al. Cross-view semantic segmentation for sensing surroundings
Xu et al. Trajectory prediction for heterogeneous traffic-agents using knowledge correction data-driven model
Du et al. A general pipeline for 3d detection of vehicles
Choi et al. Looking to relations for future trajectory forecast
Fan et al. Learning collision-free space detection from stereo images: Homography matrix brings better data augmentation
Fernando et al. Deep inverse reinforcement learning for behavior prediction in autonomous driving: Accurate forecasts of vehicle motion
Lyu et al. Robot path planning by leveraging the graph-encoded Floyd algorithm
Li et al. Dual-view 3d object recognition and detection via lidar point cloud and camera image
CN110837778A (en) Traffic police command gesture recognition method based on skeleton joint point sequence
CN103003846B (en) Articulation region display device, joint area detecting device, joint area degree of membership calculation element, pass nodular region affiliation degree calculation element and joint area display packing
Bešić et al. Dynamic object removal and spatio-temporal RGB-D inpainting via geometry-aware adversarial learning
Hu et al. How simulation helps autonomous driving: A survey of sim2real, digital twins, and parallel intelligence
Achaji et al. Is attention to bounding boxes all you need for pedestrian action prediction?
Wang et al. Multi-information-based convolutional neural network with attention mechanism for pedestrian trajectory prediction
CN115690153A (en) Intelligent agent track prediction method and system
Yang et al. PTPGC: Pedestrian trajectory prediction by graph attention network with ConvLSTM
Ding et al. Simultaneous body part and motion identification for human-following robots
Duan et al. A semantic robotic grasping framework based on multi-task learning in stacking scenes
Jin et al. Graph neural network based relation learning for abnormal perception information detection in self-driving scenarios
Zhou et al. Static-dynamic global graph representation for pedestrian trajectory prediction
Sadid et al. Dynamic Spatio-temporal Graph Neural Network for Surrounding-aware Trajectory Prediction of Autonomous Vehicles
Rao et al. Spatio-temporal look-ahead trajectory prediction using memory neural network
Jing et al. Learning to explore informative trajectories and samples for embodied perception
Schmeckpeper et al. Object-centric video prediction without annotation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant