CN113568410B

CN113568410B - Heterogeneous agent trajectory prediction method, system, device and medium

Info

Publication number: CN113568410B
Application number: CN202110866999.8A
Authority: CN
Inventors: 王乐; 郑方; 周三平; 陈仕韬; 辛景民; 郑南宁
Original assignee: Ningbo Shun'an Artificial Intelligence Research Institute; Xian Jiaotong University
Current assignee: Ningbo Shun'an Artificial Intelligence Research Institute; Xian Jiaotong University
Priority date: 2021-07-29
Filing date: 2021-07-29
Publication date: 2023-05-12
Anticipated expiration: 2041-07-29
Also published as: CN113568410A

Abstract

The invention discloses a heterogeneous agent trajectory prediction method, system, device and medium. The method comprises the following steps: based on the trajectory points and categories of heterogeneous agents, constructing graph representations of the agents' categories and spatio-temporal trajectories to obtain a category interaction graph and a spatial interaction graph; learning heterogeneous attention on the category interaction graph to obtain category-level heterogeneous attention; obtaining agent-level heterogeneous attention from the spatial interaction graph and the category-level heterogeneous attention; and, based on the agent-level heterogeneous attention, modeling the interaction information and the temporal trend of the trajectories to obtain the trajectory prediction result for the heterogeneous agents. The method effectively models the interactions and trajectory trends of many types of heterogeneous agents and significantly improves trajectory prediction accuracy.

Description

Heterogeneous agent trajectory prediction method, system, device and medium

Technical Field

The invention belongs to the technical field of computer vision and relates to trajectory prediction; in particular, it relates to a heterogeneous agent trajectory prediction method, system, device and medium.

Background

Trajectory prediction aims to predict the future trajectory sequences of agents in traffic scenes from their observed trajectories. It is a challenging computer vision problem with many real-world applications, such as autonomous driving, anomaly detection and motion recognition.

At present, the main obstacle to prediction accuracy is the complex interaction between agents. Recent advances in modeling these interactions fall into two main categories:

(1) Graph-based methods construct a spatial graph at each time step and aggregate the features of neighboring nodes;

(2) Recurrent-neural-network-based methods model each agent's trajectory with a recurrent neural network (RNN, LSTM) and extract the hidden states of agents in a surrounding region.

However, these methods have limitations:

(1) Graph-based methods use only pairwise relationships between nodes; information from other nodes is mixed in only indirectly, by being relayed through intermediate nodes. Real-world traffic interactions are far more complex than this and include group relationships among three or more agents. That is, this approach is limited by an inflexible number of interaction partners.

(2) RNN-based methods consider only local relations between agents within a manually defined surrounding region and directly ignore potential interaction participants outside that region. That is, this approach is limited by the manual selection of interacting agents.

In addition, most existing methods focus only on trajectory prediction in homogeneous scenes, such as scenes containing only pedestrians or only cars, and neglect heterogeneous scenes containing pedestrians, cars, bicycles, trucks, carts and the like. The heterogeneous case is the more realistic one. Because different classes of agents differ in motion pattern (for example speed, longitudinal spacing and response to interactions), and because people react differently to different classes of agents, trajectory prediction for heterogeneous agents is more challenging than for homogeneous agents.

In view of the above, new heterogeneous agent trajectory prediction methods, systems, devices and media are needed.

Disclosure of Invention

The invention aims to provide a heterogeneous agent trajectory prediction method, system, device and medium that solve one or more of the above technical problems. To address the limitations of existing methods on the number, distance and category of interaction participants, an effective unlimited neighborhood interaction network is provided: it extracts information from all potential agents affected by the same interaction and, at the same time, models the different types of agents in a heterogeneous scene separately. As a result, the interactions and trajectory trends of various types of heterogeneous agents can be modeled effectively, and trajectory prediction accuracy is significantly improved.

To achieve the above purpose, the invention adopts the following technical solution:

The invention discloses a heterogeneous agent trajectory prediction method comprising the following steps:

based on the trajectory points and categories of the heterogeneous agents, constructing graph representations of the agents' categories and spatio-temporal trajectories, respectively, to obtain a category interaction graph and a spatial interaction graph;

learning heterogeneous attention on the category interaction graph to obtain category-level heterogeneous attention;

obtaining agent-level heterogeneous attention based on the spatial interaction graph and the category-level heterogeneous attention;

and, based on the agent-level heterogeneous attention, modeling the interaction information and the temporal trend of the trajectories to obtain the trajectory prediction result for the heterogeneous agents.

In a further improvement of the invention, the step of constructing graph representations of the categories and spatio-temporal trajectories of the heterogeneous agents, based on their trajectory points and categories, to obtain the category interaction graph and the spatial interaction graph specifically comprises:

1.1 Taking the trajectory points of the heterogeneous agents in a traffic scene and their categories as input, and using the trajectory points of all heterogeneous agent instances as instance graph nodes to obtain as many spatial interaction graphs as there are time frames;

using the trajectories of all heterogeneous agent instances of the same category as nodes of a high-level category graph to obtain as many category interaction graphs as there are time frames;

1.2 Establishing, from the spatial interaction graph of the heterogeneous agent instances, an adjacency matrix of the instances in the spatial dimension, where the adjacency matrix is set to a fully connected matrix representing the mutual correlation among all instances; and applying a normalized Laplacian transformation to the adjacency matrix to obtain a normalized Laplacian matrix;

1.3 Completing the unbalanced heterogeneous agent instance data in the category interaction graph by zero filling.
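Steps 1.1 and 1.3 can be pictured with a small NumPy sketch, assuming the observed trajectories arrive as a (T, N, 2) array of (x, y) points and the classes as integer ids. The function name `build_category_graph` and the slot layout are hypothetical illustrations, not the patented implementation: each category row holds its agents in the first slots and zero-fills the rest, and a mask records which slots are real.

```python
import numpy as np

def build_category_graph(traj, categories, num_categories):
    """Group agent trajectories by category and zero-pad each group (assumed layout).

    traj:           (T, N, 2) track points of N agents over T observed frames.
    categories:     length-N integer array; categories[i] is agent i's class id.
    num_categories: number of classes C.
    Returns (padded, mask): padded has shape (T, C, max_count, 2) with the agents
    of class c in the first slots of row c and zeros elsewhere; mask (C, max_count)
    is True for real agents and False for zero-filled slots.
    """
    traj = np.asarray(traj, dtype=float)
    categories = np.asarray(categories)
    T = traj.shape[0]
    counts = np.bincount(categories, minlength=num_categories)
    max_count = int(counts.max())
    padded = np.zeros((T, num_categories, max_count, 2))
    mask = np.zeros((num_categories, max_count), dtype=bool)
    for c in range(num_categories):
        idx = np.flatnonzero(categories == c)      # agents belonging to class c
        padded[:, c, : len(idx)] = traj[:, idx]    # copy their tracks, rest stays 0
        mask[c, : len(idx)] = True
    return padded, mask
```

With two pedestrians and one car, for example, the car row gets one real slot and one zero-filled slot, so every frame's category graph has the same shape.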

In a further improvement of the invention, the step of learning heterogeneous attention on the category interaction graph specifically comprises:

2.1 Obtaining the category features of each category on the constructed category interaction graph and deriving the interaction weights between categories through a pooling operation; obtaining an embedding of each category through a linear projection;

2.2 Concatenating the embeddings of any two categories to obtain a fused embedding, and obtaining the category-category attention vector in time frame t through a graph attention mechanism;

2.3 Adjusting the category-category attention weights with a learnable weight vector and applying a nonlinear activation to obtain an overall attention score that measures the interaction between each category and the other categories;

2.4 Normalizing the attention weights between every two categories to obtain the final category-category interactions, which serve as the edges of the category interaction graph.
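Steps 2.1 to 2.4 resemble a graph attention (GAT) layer applied to category nodes. The sketch below is one possible reading, assuming max-pooling over the zero-padded agent slots, a single projection matrix `W` standing in for φ, a per-category score vector `mu[c]` standing in for μ_c, LeakyReLU as the nonlinearity δ, and a row-wise softmax as the normalization; all of these choices and names are assumptions.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    """Nonlinear activation for the raw attention scores (assumed choice)."""
    return np.where(x > 0, x, slope * x)

def category_attention(class_feats, W, mu):
    """Category-category attention for one time frame (hypothetical sketch).

    class_feats: (C, M, F) features of up to M zero-padded agents per category.
    W:           (F, D) linear projection shared by all categories.
    mu:          (C, 2 * D) learnable attention vector of each category.
    Returns a (C, C) matrix; row c1 holds the attention of c1 toward every c2.
    """
    pooled = class_feats.max(axis=1)            # 2.1 max-pool over agents -> (C, F)
    emb = pooled @ W                            # 2.1 category embeddings -> (C, D)
    C = emb.shape[0]
    # 2.2 concatenate the embeddings of every ordered pair (c1, c2) -> (C, C, 2D)
    pairs = np.concatenate(
        [np.repeat(emb[:, None, :], C, axis=1),
         np.repeat(emb[None, :, :], C, axis=0)],
        axis=-1,
    )
    # 2.3 score each pair with the vector of the attending category c1
    scores = leaky_relu(np.einsum("ijk,ik->ij", pairs, mu))
    # 2.4 softmax-normalize each row so the weights of category c1 sum to 1
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)
```

One caveat of this reading: max-pooling over zero-filled slots lets the padding win whenever all real features of a class are negative, so a masked pooling may be preferable.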

In a further improvement of the invention, the step of obtaining agent-level heterogeneous attention based on the spatial interaction graph and the category-level heterogeneous attention specifically comprises:

3.1 Distance-based part: initializing the spatial edges from the relative distances between the corresponding heterogeneous agent instances, and then obtaining a Laplacian-normalized interaction matrix through the Laplacian transformation;

3.2 Learning-based part: multiplying the fused features of the heterogeneous agent instances element-wise by the Laplacian-normalized interaction matrix to obtain an instance-instance interaction attention matrix;

3.3 Defining all heterogeneous agent instances involved in one interaction as an unlimited neighborhood; adaptively capturing the interactions among an arbitrary number of instances with asymmetric convolution, so that information from all instances involved in the same interaction is obtained;

3.4 Using a padding operation to keep the output size equal to the input size; aggregating global spatial interaction information by repeatedly applying the asymmetric convolution; and obtaining the instance-level interaction attention of the heterogeneous agents by fusing the unlimited neighborhood interactions with the category-category interactions.
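Steps 3.3 and 3.4 can be sketched as stacked, "same"-padded convolutions over the N × N interaction attention matrix. Here "asymmetric convolution" is read as a k × 1 kernel followed by a 1 × k kernel, with ReLU as the activation; that interpretation, the function names and the kernel values are assumptions. Each layer widens the receptive field by k − 1 entries along each axis, so with enough layers every output entry depends on the whole matrix regardless of the number of agents N, while the padding keeps the output N × N.

```python
import numpy as np

def conv1d_same(mat, kernel, axis):
    """Slide a 1-D kernel along one axis of a 2-D matrix, zero-padded to keep its shape."""
    kernel = np.asarray(kernel, dtype=float)
    k = len(kernel)
    pad = [(0, 0), (0, 0)]
    pad[axis] = (k // 2, k - 1 - k // 2)       # padding keeps output size == input size
    padded = np.pad(np.asarray(mat, dtype=float), pad)
    out = np.zeros(np.shape(mat))
    for offset, w in enumerate(kernel):
        window = [slice(None), slice(None)]
        window[axis] = slice(offset, offset + np.shape(mat)[axis])
        out += w * padded[tuple(window)]
    return out

def unlimited_neighborhood(att, col_kernel, row_kernel, layers):
    """Repeat a k x 1 then a 1 x k convolution, each layer followed by ReLU (assumed)."""
    h = np.asarray(att, dtype=float)
    for _ in range(layers):
        h = conv1d_same(h, col_kernel, axis=0)  # k x 1 kernel: mixes rows (agents i)
        h = conv1d_same(h, row_kernel, axis=1)  # 1 x k kernel: mixes columns (agents j)
        h = np.maximum(h, 0.0)                  # nonlinear activation
    return h
```

With k = 3 and five agents, one layer lets entry (0, 0) see only agents 0 and 1, whereas four layers let it respond to the pair (4, 4); no neighborhood radius has to be chosen by hand.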

In a further improvement of the invention, in the process of learning heterogeneous attention on the category interaction graph to obtain category-level heterogeneous attention, the attention construction formula involves the following terms:

the category feature embedding; W_e, the trainable parameters; padding, the fill operation; the category graph node; φ, a linear projection function; δ, a nonlinear activation function; μ_c, the learnable attention weight vector of category c; the attention score of category c₁ toward category c₂; ||, the concatenation operation; the attention of category c₁ to category c₂; the total attention obtained for category c; max-pooling, the max pooling operation; and the final attention after weight assignment.

In a further improvement of the invention, in the process of obtaining agent-level heterogeneous attention based on the spatial interaction graph and the category-level heterogeneous attention, the attention construction formula involves the following terms:

the negative power of the degree matrix used as the normalization factor; E_t, the edge matrix of the instance spatio-temporal interaction graph; R_t, the Laplacian matrix; and ATT_t, the instance-agent interaction information matrix.
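Read with the standard symmetric normalization, which is an assumption because the exponent of the degree matrix is not legible here, and writing M_t as a hypothetical symbol for the fused instance features of step 3.2, the relations among D_t, E_t, R_t and ATT_t would take the following form, where ⊙ denotes element-wise multiplication:

```latex
D_t[i,i] = \sum_{j} E_t[i,j], \qquad
R_t = D_t^{-1/2}\, E_t\, D_t^{-1/2}, \qquad
ATT_t = M_t \odot R_t
```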

In a further improvement of the invention, in the process of modeling the interaction information and the temporal trend of the trajectories based on the agent-level heterogeneous attention, the modeling formulas involve the terms defined below, and the temporal trend of the trajectory is given by:

HT = TCN(H_t)

where H^l is the interaction information of the l-th layer of the graph convolutional network, TCN is the temporal convolutional network that extracts the trend of the trajectory over time, HT is the final output, and L^i is the resulting bivariate Gaussian mixture distribution function, which is used to fit the distribution of the agents' future trajectories.

The invention discloses a heterogeneous agent trajectory prediction system, comprising:

a graph representation module, configured to construct graph representations of the categories and spatio-temporal trajectories of heterogeneous agents, based on their trajectory points and categories, to obtain a category interaction graph and a spatial interaction graph;

a first heterogeneous attention module, configured to learn heterogeneous attention on the category interaction graph to obtain category-level heterogeneous attention;

a second heterogeneous attention module, configured to obtain agent-level heterogeneous attention based on the spatial interaction graph and the category-level heterogeneous attention;

and a prediction result module, configured to model the interaction information and the temporal trend of the trajectories based on the agent-level heterogeneous attention to obtain the trajectory prediction result for the heterogeneous agents.

An electronic device of the invention includes a processor and a memory, where the processor executes a computer program stored in the memory to implement any heterogeneous agent trajectory prediction method of the invention.

A computer-readable storage medium of the invention stores at least one instruction that, when executed by a processor, implements any heterogeneous agent trajectory prediction method of the invention.

Compared with the prior art, the invention has the following beneficial effects:

To address the poor interaction modeling of agents by LSTM hidden states in the prior art, the method constructs a heterogeneous spatio-temporal graph: it converts the input trajectory positions into spatio-temporal data, represents the category information of the agents, and then builds the heterogeneous spatio-temporal graph. The graph contains two levels of nodes: low-level nodes represent agents and high-level nodes represent categories. Its edges represent the spatial/temporal and category associations between nodes. A spatial association is an interaction between agents and is directed; that is, the method treats the attention between agents as asymmetric. A temporal association represents the continuity of the same agent across time points, and category associations include both the links between the two node levels within the same class of agents and the links between high-level nodes of different classes. The heterogeneous spatio-temporal graph thus models the scene in each frame as a graph from which the agents' spatial interactions are learned, while the per-frame spatial information, chained over time, forms a temporal graph from which the agents' motion continuity is learned. In this way, interactions are modeled across the whole scene and for all agents.

To address gradient explosion, gradient vanishing and related problems caused by LSTMs in the prior art, the invention models jointly with a graph convolutional network (GCN) and a temporal convolutional network (TCN), which avoids these problems during neural network training.

To address the limitation of prior-art interactions by the number and range of traffic agents, the invention models agent interactions with small asymmetric convolution kernels together with large-size pooling and padding operations. The interaction is therefore not limited by the number of agents or their spatio-temporal range, which avoids the shortcomings of LSTM and GCN and realizes unlimited neighborhood interaction.

To address the insufficient capacity of GCN and other graph neural network models to represent nodes of heterogeneous types in the prior art, the invention uses a hierarchical graph attention module that learns the attention between categories and between agents, which helps model the interactions among agents. The high-level attention module constructs interactions between categories, and the low-level attention module works with the unlimited neighborhood interaction module to model the interactions between agents. In this way, the interactions of heterogeneous agents are modeled well. The invention also uses a Gaussian mixture model to distinguish the different motion modes of agents and the different influences between them.

Drawings

To illustrate the embodiments of the invention or the prior-art technical solutions more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a flow chart of a trajectory prediction method for heterogeneous scenes that fuses unlimited neighborhood interaction information, according to an embodiment of the invention;

Fig. 2 is a schematic diagram of trajectory prediction results of the method of the invention in the world coordinate system of the nuScenes dataset;

Fig. 3 is a schematic diagram of long-horizon stitched trajectory prediction results of the method in real scenes of the nuScenes dataset, according to an embodiment of the invention;

Fig. 4 is a schematic diagram of hierarchical interaction attention results of the method, according to an embodiment of the invention.

Detailed Description

To make the purposes, technical effects and technical solutions of the embodiments of the invention clearer, the technical solutions are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the invention. Other embodiments obtained by a person of ordinary skill in the art from the disclosed embodiments without creative effort fall within the scope of the invention.

Referring to Fig. 1, the trajectory prediction method for heterogeneous scenes that fuses unlimited neighborhood interaction information, according to an embodiment of the invention, comprises the following steps:

Step 1: construct graph representations of the spatio-temporal trajectories and categories of the heterogeneous agents, based on their trajectory points and categories, to obtain a spatial interaction graph and a category interaction graph.

Specifically, step 1 includes:

1) The trajectory points of the heterogeneous agents in a traffic scene and their categories are taken as input. The trajectory points of all agent instances serve as instance graph nodes, giving as many spatial interaction graphs as there are time frames. The trajectories of all instances of the same category serve as high-level category graph nodes, giving as many category interaction graphs as there are time frames;

2) An adjacency matrix of the instances in the spatial dimension is established from their spatial interaction graph and set to a fully connected matrix representing the mutual correlation among all instances. A normalized Laplacian transformation is applied to this adjacency matrix to obtain the normalized Laplacian matrix. This amounts to a graph Fourier transform: the eigenvalues and eigenvectors of the graph's Laplacian matrix are used to study the properties of the graph, so that nodes on a graph, which lacks translation invariance, can still exchange information to a certain degree;

3) Unbalanced instance data in the category interaction graph are completed by zero filling, so that the category interaction graph of each time frame keeps a relatively complete structure, and a connecting channel to the instance interaction graph is designed to pass information.
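The graph Fourier view mentioned in 2) can be made concrete: the symmetric normalized Laplacian L = I − D^{-1/2} A D^{-1/2} is eigendecomposed, and a node signal is projected onto its eigenvectors. In this sketch the adjacency is the fully connected graph without self-loops, which is an assumption about how "fully connected" is realized; the function names are hypothetical.

```python
import numpy as np

def normalized_laplacian(adj):
    """Symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    nz = deg > 0                               # isolated nodes keep a zero factor
    d_inv_sqrt[nz] = deg[nz] ** -0.5
    return np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]

def graph_fourier(signal, laplacian):
    """Graph Fourier transform: coefficients of a node signal in the Laplacian eigenbasis."""
    eigvals, eigvecs = np.linalg.eigh(laplacian)   # L is symmetric; eigenvalues ascending
    return eigvals, eigvecs.T @ signal
```

For four fully connected agents the eigenvalues are 0 and 4/3 (three times), and a signal that is equal on all agents has energy only in the zero-frequency component.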

Step 2: learn heterogeneous attention on the category interaction graph to obtain category-level heterogeneous attention.

Specifically, step 2 includes:

1) To construct the interactions between categories, the category features of each category are obtained on the constructed spatio-temporal category graph, and a pooling operation on these features yields the interaction weights between categories. A linear projection then gives the embedding of each category, that is, the embedding of category c at time step t;

2) The embeddings of any two categories are concatenated to obtain a fused embedding, and a graph attention mechanism gives the category-category attention vector in time frame t;

3) The category-category attention weights are adjusted by a learnable weight vector and passed through a nonlinear activation, giving an overall attention score that measures the interaction between one category and the other categories;

4) The attention weights between every two categories are normalized to obtain the final category-category interactions, which serve as the edges of the spatial category interaction graph. The weights of these spatial category edges represent category-category interactions, and the resulting interaction values are later assigned to the instance edges.
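Assigning the category-category weights to instance edges, as described in 4) and denoted F_t in step 4, can be sketched as an index lookup: agent pair (i, j) inherits the weight between the category of agent i and the category of agent j. The lookup and the element-wise fusion with an instance attention matrix are one assumed realization; the function names are hypothetical.

```python
import numpy as np

def category_to_instance(cat_att, categories):
    """Expand a (C, C) category attention matrix to an (N, N) agent-pair matrix.

    Entry (i, j) is cat_att[categories[i], categories[j]].
    """
    categories = np.asarray(categories)
    return np.asarray(cat_att)[np.ix_(categories, categories)]

def fuse_attention(inst_att, cat_att, categories):
    """Weight each agent-agent attention entry by its category-category attention (assumed fusion)."""
    return np.asarray(inst_att) * category_to_instance(cat_att, categories)
```

Because the lookup is driven by class labels, the category matrix stays C × C no matter how many agents of each class appear in a frame.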

Step 3: obtain agent-level heterogeneous attention by learning heterogeneous attention on the spatial interaction graph.

Specifically, step 3 includes:

1) Distance-based part: the spatial edges are initialized from the relative distances between the corresponding instances, and a normalized interaction matrix is then obtained through the Laplacian transformation;

2) Learning-based part: the instance fusion features obtained in step 2 are multiplied element-wise by the Laplacian-normalized interaction matrix obtained in 1) above to obtain the instance-instance interaction attention matrix.
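Parts 1) and 2) can be sketched for a single frame, assuming inverse Euclidean distance as the distance-based edge initialization (a common choice, not stated in the text), self-loops so that every degree is positive, and the symmetric normalization D^{-1/2} E D^{-1/2}. `fused` is a hypothetical N × N matrix standing in for the instance fusion features; because it need not be symmetric, the resulting attention can be directed even though the distance edges are symmetric.

```python
import numpy as np

def distance_edges(positions, eps=1e-6):
    """Initialize spatial edges from relative distances (inverse distance, assumed).

    positions: (N, 2) agent coordinates in one frame. Closer agents get larger
    weights; the diagonal is zero.
    """
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return np.where(dist > eps, 1.0 / np.maximum(dist, eps), 0.0)

def instance_attention(positions, fused):
    """Normalize the distance edges and gate them element-wise with fused features."""
    E = distance_edges(positions) + np.eye(len(positions))    # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(E.sum(axis=1))
    R = d_inv_sqrt[:, None] * E * d_inv_sqrt[None, :]           # D^-1/2 E D^-1/2
    return fused * R                                            # element-wise product
```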

Step 4: model the interactions of the unlimited neighborhood.

Specifically, step 4 includes:

1) All heterogeneous agents involved in one interaction are defined as an unlimited neighborhood, regardless of how many agents there are or how far apart they are. Asymmetric convolution adaptively captures the interactions among an arbitrary number of heterogeneous agents, so that information from all agents involved in the same interaction is acquired at once;

2) A padding operation keeps the output size equal to the input size. Global spatial interaction information is aggregated by repeatedly applying the asymmetric convolution. The final instance-level interaction attention of the heterogeneous agents is obtained by fusing the unlimited neighborhood interactions with the category-category interactions.

Step 5: infer the distribution function of future trajectory points.

Specifically, step 5 includes:

1) The interaction attention matrix of the heterogeneous agents obtained in steps 3 and 4 is used as the edges of the representation graph constructed in step 1;

2) A graph convolution operation extracts and models the interactions of the heterogeneous agents, and a temporal convolution operation extracts and models the movement trend of their historical trajectories;

3) The spatial and temporal information obtained in 1) and 2) is channel-compressed into a plausible bivariate Gaussian mixture distribution, which is used to predict the future trajectories of the heterogeneous agents.
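Step 5 can be sketched with one graph convolution per frame, H_t ← ReLU(A_t H_t W), using the interaction attention matrices as A_t, followed by a causal temporal convolution along the frame axis. The kernel size, activation and function names are assumptions; a real TCN would be deeper and would output the Gaussian mixture channels rather than the features themselves.

```python
import numpy as np

def graph_conv(H, A, W):
    """One graph convolution per frame: ReLU(A_t @ H_t @ W).

    H: (T, N, F) node features, A: (T, N, N) edge matrices, W: (F, D) shared weights.
    """
    return np.maximum(np.einsum("tij,tjf,fd->tid", A, H, W), 0.0)

def temporal_conv(H, kernel):
    """Causal 1-D convolution along time for every node and channel.

    H: (T, N, D). kernel[-1] weights the current frame, kernel[-2] the previous
    one, and so on; frames before t = 0 are treated as zeros.
    """
    K = len(kernel)
    padded = np.concatenate([np.zeros((K - 1,) + H.shape[1:]), H], axis=0)
    out = np.zeros(H.shape)
    for k, w in enumerate(kernel):
        out += w * padded[k:k + H.shape[0]]
    return out
```

Replacing the LSTM recurrence with convolutions of this kind is what the text credits for avoiding vanishing and exploding gradients, since no state is propagated step by step through time.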

In step 1, the spatio-temporal and category representation graph G_stc of the heterogeneous agents expresses three relationships: 1) the spatial interactions among heterogeneous agents within a time frame, where each node is the trajectory coordinate point of an agent in that frame and each edge is the spatial interaction between two agents; these edges are directed; 2) the representation of the heterogeneous agents in the time dimension, where each node is a trajectory coordinate point of an agent over time and each edge represents the temporal continuity of the same agent's trajectory; 3) interactions at the category abstraction level, where each node fuses the spatio-temporal features of all agents of the same class, and the edges fall into two types: edges interconnecting category nodes, which represent abstract category-category interactions, and edges pointing from the category level to the instance level, which carry the collective interaction features extracted at the category level.
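The three relations of G_stc suggest a simple container. The class below is a hypothetical layout for illustration only: dense per-frame matrices for the directed spatial edges and the category edges, temporal edges implied by agent identity across frames, and category-to-instance edges implied by the class labels.

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class HeteroSpatioTemporalGraph:
    """Container for the three relations of G_stc (hypothetical layout).

    nodes:         (T, N, 2) track points, one instance node per agent per frame.
    categories:    (N,) class id of each agent.
    spatial_edges: (T, N, N) directed agent-to-agent weights; [t, i, j] need not
                   equal [t, j, i].
    class_edges:   (T, C, C) category-to-category weights.
    """
    nodes: np.ndarray
    categories: np.ndarray
    spatial_edges: np.ndarray
    class_edges: np.ndarray

    def temporal_edges(self):
        """List the (t, i) -> (t + 1, i) links that encode track continuity."""
        T, N = self.nodes.shape[:2]
        return [((t, i), (t + 1, i)) for t in range(T - 1) for i in range(N)]

    def class_to_instance_edges(self):
        """Category node c points to every instance node of class c, as (c, i) pairs."""
        return [(int(c), i) for i, c in enumerate(self.categories)]
```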

In step 2, the inter-category interaction attention formula involves the following terms:

the attention of category c₁ to category c₂; the total attention obtained for category c; max-pooling, the max pooling operation; and the final attention after weight assignment.

In step 3, the attention of the instance agents is constructed by formulas of which the first represents the Laplacian normalization.

In step 4, the unlimited neighborhood interaction formula is:

h_t = δ(Conv(ATT_t))

where a nonlinear activation function δ is applied after an asymmetric convolution of the interaction attention matrix ATT_t, and F_t denotes the assignment of category attention weights to the instance interaction attention.

In step 5, the formulas for modeling the interaction information and the temporal trend of the trajectory include:

HT = TCN(H_t)

Most prior art follows one of two main routes. In the first, because trajectory prediction is essentially a sequence generation task, time-series prediction models represented by LSTM are often used, with the LSTM hidden states modeling the interactions between agents. The limitation of this route is that LSTM models agent interactions poorly: it can only model interactions within a local range, and using LSTM hidden states to represent agent interactions lacks a sound theoretical basis. In addition, LSTM itself suffers from technical problems such as gradient vanishing and gradient explosion.

The second route uses graph-based neural network models, such as graph convolutional networks (GCN) or graph attention networks (GAT). A graph node naturally represents an agent, and a graph edge naturally represents a relationship between agents. This route has the following limitations. First, GCN suffers from over-smoothing: as the network deepens, node representations tend to become indistinguishable. Because of this limitation, only shallow GCNs can be used, which in practice means modeling one-to-one interactions between agents, whereas real interactions do not occur only between pairs of agents. Second, GCN and similar models can only represent nodes of the same type; their capacity to represent heterogeneous nodes is insufficient, and the modeling effect is poor.
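The over-smoothing effect can be reproduced in a few lines: repeatedly averaging node features over a connected graph, which is what a stack of weight-free, activation-free graph convolutions does, drives all nodes toward the same value. The path graph, feature values and function names are arbitrary illustrations.

```python
import numpy as np

def propagate(features, adj, layers):
    """Repeated mean aggregation H <- D^{-1} (A + I) H, with no weights or activations."""
    A = np.asarray(adj, dtype=float) + np.eye(len(adj))   # add self-loops
    P = A / A.sum(axis=1, keepdims=True)                   # row-normalized averaging
    H = np.asarray(features, dtype=float)
    for _ in range(layers):
        H = P @ H
    return H

def spread(H):
    """Largest gap between node values, maximized over feature channels."""
    return float((H.max(axis=0) - H.min(axis=0)).max())
```

On a four-node path with features 0, 1, 2, 3, the gap between nodes shrinks from 3 to 2 after one layer and becomes negligible after many layers, so deep stacks lose the differences between agents.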

Therefore, the invention provides an unlimited neighborhood interaction trajectory prediction method based on a heterogeneous spatio-temporal graph.

To address the poor interaction modeling of agents by LSTM hidden states in the first mainstream route, the invention constructs a heterogeneous spatio-temporal graph: the input trajectory positions are converted into spatio-temporal data, the category information of the agents is represented, and the heterogeneous spatio-temporal graph is then built. The graph contains two levels of nodes: low-level nodes represent agents and high-level nodes represent categories. Its edges represent the spatial/temporal and category associations between nodes. A spatial association is an interaction between agents and is directed; that is, the method treats the attention between agents as asymmetric. A temporal association represents the continuity of the same agent across time points, and category associations include both the links between the two node levels within the same class of agents and the links between high-level nodes of different classes. The heterogeneous spatio-temporal graph thus models the scene in each frame as a graph from which the agents' spatial interactions are learned, while the per-frame spatial information, chained over time, forms a temporal graph from which the agents' motion continuity is learned. In this way, interactions are modeled across the whole scene and for all agents.

Aiming at the problems of gradient explosion, gradient disappearance and the like caused by LSTM (localized surface acoustic wave) existing in the first mainstream method, the invention uses a graph rolling network (GCN) and a Time Convolution Network (TCN) to jointly model so as to solve the problems of gradient explosion, gradient disappearance and the like in the neural network training process.

In the second mainstream approach, interaction modeling is limited by the number and range of traffic agents. To address this, the invention models agent interactions with small asymmetric convolution kernels combined with large-size pooling and padding operations. As a result, interactions are not limited by the number of agents or by their spatio-temporal range, and the drawbacks of LSTM and GCN are avoided. This achieves unrestricted-neighborhood interaction.

In the first mainstream approach, graph neural network models such as GCN have limited capacity to represent the heterogeneous properties of nodes. To address this, the invention uses a hierarchical graph attention module that learns both inter-category attention and inter-agent attention, which helps model interactions among agents.

- The high-level attention module models interactions between categories.
- The low-level attention module works together with the unrestricted neighborhood interaction module to model interactions between individual agents.

Together, these modules model interactions among heterogeneous agents effectively. The invention also uses a Gaussian mixture model to distinguish different motion modes and the different influences that agents exert on each other.

Example 1

This embodiment provides a heterogeneous-scene trajectory prediction method based on aggregating interaction information over unrestricted neighborhoods. Its unrestricted neighborhood interaction module generates, in a single pass, the fused features of all heterogeneous agents participating in an interaction. Because the module uses an adaptive asymmetric convolutional network, it applies to any number of agents and to interaction regions of any size.

A hierarchical graph attention module captures category-category interactions, which then guide instance-instance interactions. Finally, a temporal convolutional network extracts trajectory trend information, and the parameters of a Gaussian mixture model are estimated to generate future trajectories. Extensive experiments on benchmark datasets (nuScenes, ApolloScape, SDD, etc.) show that the method significantly outperforms state-of-the-art methods.

This embodiment discloses a heterogeneous-scene trajectory prediction method based on aggregating interaction information over unrestricted neighborhoods. The method comprises the following steps:

Step 1: construct graph representations of the spatio-temporal trajectories and categories of the heterogeneous agents:

1) Represent the trajectory points and categories of all agents in a complex traffic scene as graphs. The nodes of the instance graph are the trajectory points of all heterogeneous agent instances, which gives one spatial interaction graph per time frame. The nodes of the high-level category graph are the trajectories of all agent instances of the same category, which gives one category interaction graph per time frame;

2) Build the adjacency matrix in the spatial dimension. The adjacency matrix is fully connected and represents the relations among all instances in the instance-level spatial interaction graph. Normalizing the adjacency matrix yields the normalized Laplacian matrix, which is the basis of the graph Fourier transform. The properties of the graph can then be studied through the eigenvalues and eigenvectors of its Laplacian. This lets nodes exchange information to some degree on a non-Euclidean structure that lacks translation invariance;

3) Establish a stable structure and a message-passing channel. The imbalanced instance data in the category interaction graph are zero-padded, so that the category interaction graph of every time frame keeps a relatively complete structure. A connecting channel between the category interaction graph and the instance interaction graph passes information between the two.
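Item 2) above can be illustrated with a small NumPy sketch. It builds a fully connected adjacency matrix, forms the symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}, and uses its eigenvectors as a graph Fourier basis. Two details are assumptions, not taken from the text: the adjacency is taken as unweighted with no self-loops, and the helper names are hypothetical. Zero-padded instances from item 3) would appear as isolated nodes, so the sketch guards against zero degree.

```python
import numpy as np

def fully_connected_adjacency(n):
    # Assumption: unweighted full connectivity between n instances, no self-loops.
    return np.ones((n, n)) - np.eye(n)

def normalized_laplacian(adj):
    """Symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg, dtype=float)
    nonzero = deg > 0                      # zero-padded nodes are isolated
    inv_sqrt[nonzero] = 1.0 / np.sqrt(deg[nonzero])
    d = np.diag(inv_sqrt)
    return np.eye(adj.shape[0]) - d @ adj @ d

n = 4
lap = normalized_laplacian(fully_connected_adjacency(n))
eigvals, eigvecs = np.linalg.eigh(lap)     # eigenvectors form the graph Fourier basis
signal = np.array([1.0, 2.0, 0.5, -1.0])   # e.g. one coordinate per instance
spectrum = eigvecs.T @ signal              # graph Fourier transform
recovered = eigvecs @ spectrum             # inverse transform
```

For a complete graph on n nodes, the normalized Laplacian has eigenvalue 0 once and n/(n-1) with multiplicity n-1. For n = 4, `eigvals` is therefore approximately [0, 4/3, 4/3, 4/3].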

Step 2: learning heterogeneous attention through a category interaction graph:

1) Obtain the features of each category on the constructed spatio-temporal category graph, then derive inter-category interaction weights through a pooling operation. Because the number of instances differs across scenes, a padding operation aligns every category to the same number of instances. A linear projection then gives the embedding of each category;

2) With the category embeddings from the linear projection, concatenate the embeddings of every pair of categories to obtain pairwise fused embeddings. Then compute the category-category attention vector at time frame t with a graph attention mechanism;

3) Scale the category-category attention with a learnable weight vector and activate it with a nonlinear function. The result is a global attention score that measures how strongly one category interacts with another.
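Step 2 can be sketched in NumPy in the style of a graph attention (GAT) layer. The text specifies padding, pooling, a linear projection, pairwise concatenation, a learnable weight vector, a nonlinearity, and normalization (claim item 2.4). The sketch assumes masked mean pooling, LeakyReLU, and a row-wise softmax for the unspecified pieces. All function names and sizes are hypothetical.

```python
import numpy as np

def pad_instances(per_class):
    """Zero-pad each category's (n_c, F) instance array to a common n_max."""
    n_max = max(x.shape[0] for x in per_class)
    feat_dim = per_class[0].shape[1]
    padded = np.zeros((len(per_class), n_max, feat_dim))
    mask = np.zeros((len(per_class), n_max), dtype=bool)
    for c, x in enumerate(per_class):
        padded[c, : x.shape[0]] = x
        mask[c, : x.shape[0]] = True
    return padded, mask

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def category_attention(padded, mask, W, a):
    """Category-category attention for one time frame.

    padded: (C, N, F), mask: (C, N), W: (F, D), a: (2D,) learnable vector.
    Returns a (C, C) row-normalized, generally asymmetric attention matrix.
    """
    counts = np.maximum(mask.sum(axis=1, keepdims=True), 1)
    pooled = (padded * mask[..., None]).sum(axis=1) / counts  # masked mean pooling
    H = pooled @ W                                           # category embeddings
    D = H.shape[1]
    # a^T [H_i || H_j] splits into a_left . H_i + a_right . H_j.
    scores = leaky_relu((H @ a[:D])[:, None] + (H @ a[D:])[None, :])
    scores = scores - scores.max(axis=1, keepdims=True)        # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)        # softmax over j

rng = np.random.default_rng(0)
# Hypothetical frame: 5 vehicles, 2 bicycles, 3 pedestrians, 2-D features each.
per_class = [rng.normal(size=(5, 2)), rng.normal(size=(2, 2)), rng.normal(size=(3, 2))]
padded, mask = pad_instances(per_class)
W = rng.normal(size=(2, 8))
a = rng.normal(size=16)
att = category_attention(padded, mask, W, a)
```

Splitting the weight vector into left and right halves produces a score matrix that is not symmetric. This matches the directed attention described earlier: in general, `att[i, j]` differs from `att[j, i]`.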

Step 3: learn heterogeneous attention through the instance spatio-temporal interaction graphs:

Step 4: model interactions over unrestricted neighborhoods:

1) All heterogeneous agents involved in an interaction are defined as unrestricted neighbors, regardless of how many agents there are and how far apart they are. Asymmetric convolutions adaptively capture interactions among an arbitrary number of heterogeneous agents, and the information of all agents involved in the same interaction is acquired in one pass. The convolutions use kernel sizes of 3×1, 3×3, 2×1 and 1×1;
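The claim that kernels of fixed size handle any number of agents can be checked with a minimal sketch. It applies the four kernel shapes listed above to an N×N instance-instance relation map, zero-padding so the output keeps the input size (as in claim item 3.4), and repeats the stack so the receptive field grows over the whole map. Several details are assumptions for illustration: the relation-map input, the sequential kernel order, all-ones weights in place of learned ones, and the omission of nonlinearities. The helper names are hypothetical.

```python
import numpy as np

# Kernel shapes listed in Step 4 (rows x cols).
KERNEL_SHAPES = [(3, 1), (3, 3), (2, 1), (1, 1)]

def conv2d_same(x, k):
    """2-D cross-correlation with zero padding so the output keeps x's shape."""
    kh, kw = k.shape
    # Split the padding so even-sized kernels (e.g. 2x1) also keep the size.
    top, left = (kh - 1) // 2, (kw - 1) // 2
    xp = np.pad(x, ((top, kh - 1 - top), (left, kw - 1 - left)))
    out = np.zeros(x.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * xp[i : i + x.shape[0], j : j + x.shape[1]]
    return out

def unrestricted_neighborhood(rel, kernels, n_repeats):
    """Repeatedly apply the kernel stack to an N x N relation map.

    The kernels do not depend on N, so any number of agents is accepted;
    each repeat widens the receptive field until it covers the whole map.
    """
    x = rel.astype(float)
    for _ in range(n_repeats):
        for k in kernels:
            x = conv2d_same(x, k)
    return x

kernels = [np.ones(s) for s in KERNEL_SHAPES]   # stand-ins for learned weights
rel = np.zeros((10, 10))
rel[0, 0] = 1.0                                 # a single pairwise relation
out = unrestricted_neighborhood(rel, kernels, n_repeats=9)
```

After nine repeats, the single relation at (0, 0) has spread to every entry of the 10×10 map. The same kernels also run unchanged on a 3×3 map.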

Step 5: inferring a distribution function of future track points:

1) Use the heterogeneous-agent interaction attention matrices obtained in Steps 3 and 4 as the edges of the graph constructed in Step 1;

2) Extract and model the interactions of the heterogeneous agents with a graph convolution, and extract and model their historical trajectory movement trends with a temporal convolution. The graph convolution has 1 layer, and the temporal convolution has 5 layers;

3) Compress the channels of the spatial and temporal information from 1) and 2) to obtain a valid bivariate Gaussian mixture distribution, which is used to predict the future trajectories of the heterogeneous agents. The channel compression uses a convolution with a 1×1 kernel.
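Item 3) can be sketched as a NumPy output head. A 1×1 convolution over channels is a per-position linear map, so it is written here as a matrix product. The sketch assumes K mixture components, each with six raw outputs: a weight logit, two means, two log standard deviations, and a correlation. Each raw output is then mapped to its valid range. The number of components, the parameter layout, and the names are hypothetical choices, not the patent's specification.

```python
import numpy as np

def gmm_head(features, W, b, n_modes):
    """Map (T, N, C) features to bivariate GMM parameters via a 1x1 conv.

    W has shape (C, 6 * n_modes); a 1x1 conv is features @ W + b per position.
    """
    raw = features @ W + b                             # (T, N, 6K)
    raw = raw.reshape(*raw.shape[:-1], n_modes, 6)     # (T, N, K, 6)
    logits = raw[..., 0]
    mu = raw[..., 1:3]                                 # means (x, y)
    sigma = np.exp(raw[..., 3:5])                      # positive std devs
    rho = np.tanh(raw[..., 5])                         # correlation in (-1, 1)
    logits = logits - logits.max(axis=-1, keepdims=True)
    pi = np.exp(logits)
    pi = pi / pi.sum(axis=-1, keepdims=True)           # mixture weights sum to 1
    return pi, mu, sigma, rho

rng = np.random.default_rng(0)
T_pred, n_agents, channels, K = 6, 4, 16, 3            # hypothetical sizes
features = rng.normal(size=(T_pred, n_agents, channels))
W = 0.1 * rng.normal(size=(channels, 6 * K))
b = np.zeros(6 * K)
pi, mu, sigma, rho = gmm_head(features, W, b, K)
```

The exp, tanh, and softmax mappings are one common way to make any raw channel values produce a well-formed distribution. This is the sense in which the compressed output yields a valid bivariate Gaussian mixture.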

Example 2

This example applies the heterogeneous-scene trajectory prediction method based on aggregating interaction information over unrestricted neighborhoods. It provides a heterogeneous agent trajectory prediction algorithm with unrestricted neighborhood interaction and hierarchical graph attention. The algorithm addresses two problems of existing methods:

- interactions are limited by the number and range of traffic agents;
- attention between heterogeneous agents is not modeled.

The algorithm models interactions among heterogeneous agents effectively. It also captures historical trajectory trends from the temporal continuity of the trajectories, which significantly improves the accuracy of heterogeneous trajectory prediction.

Step 2: learning heterogeneous attention through a category interaction graph:

1) To model interactions among categories, obtain the features of each category on the constructed spatio-temporal category graph, then derive inter-category interaction weights through a pooling operation. A linear projection gives the embedding $H_t^{c}$ of each category, i.e., the embedding of category c at time step t;

Step 4: model interactions over unrestricted neighborhoods:

1) All heterogeneous agents involved in an interaction are defined as unrestricted neighbors, regardless of how many agents there are and how far apart they are. Asymmetric convolutions adaptively capture interactions among an arbitrary number of heterogeneous agents, and the information of all agents involved in the same interaction is acquired in one pass;

Step 5: inferring a distribution function of future track points:

Table 1 compares this method with other methods on the Argoverse, nuScenes and ApolloScape datasets. The evaluation metrics are average displacement error (ADE) and final displacement error (FDE):

- ADE is the average error between the 20 sampled trajectories and the ground-truth trajectory over all time steps.
- FDE is the average error between the 20 sampled trajectories and the ground-truth trajectory at the final time step.

Lower values of both metrics indicate better performance.

Table 1 Experimental results of this method on the Argoverse, nuScenes and ApolloScape datasets

All methods observe 2 seconds of trajectory and predict the next 3 seconds. On the ApolloScape dataset, weighted ADE and FDE metrics are used, with weights of 0.20, 0.58 and 0.22 for vehicles, pedestrians and bicycles, respectively. Methods marked "×" use additional scene context information. The proposed method outperforms all current state-of-the-art methods.
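The metrics described above can be sketched in NumPy. The sketch computes per-sample ADE and FDE as Euclidean (L2) errors. It offers two ways to combine the samples: the averaging described in the text, and the minimum over samples (best-of-K) that many benchmarks report. It also combines per-class values with the ApolloScape weights stated above. The function names, and the use of L2 distance, are assumptions for illustration.

```python
import numpy as np

def ade_fde(samples, gt):
    """Per-sample ADE and FDE.

    samples: (K, T, 2) sampled future trajectories; gt: (T, 2) ground truth.
    ADE = mean L2 error over the T predicted steps; FDE = L2 error at step T.
    """
    dist = np.linalg.norm(samples - gt[None], axis=-1)   # (K, T)
    return dist.mean(axis=1), dist[:, -1]

def aggregate(per_sample, mode="mean"):
    # "mean" follows the averaging described above; "min" is the best-of-K
    # convention used by many benchmarks.
    return float(per_sample.mean() if mode == "mean" else per_sample.min())

# ApolloScape class weights stated above.
APOLLO_WEIGHTS = {"vehicle": 0.20, "pedestrian": 0.58, "bicycle": 0.22}

def weighted_metric(per_class):
    """Combine per-class ADE (or FDE) values with the ApolloScape weights."""
    return sum(APOLLO_WEIGHTS[c] * v for c, v in per_class.items())

gt = np.zeros((6, 2))
samples = np.stack([gt + [3.0, 4.0], gt + [0.0, 1.0]])  # constant errors of 5 and 1
ade, fde = ade_fde(samples, gt)
```

Both samples have constant offsets from the ground truth, so their ADE and FDE are equal: [5, 1]. The mean aggregate is 3 and the best-of-K aggregate is 1.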

Table 2 shows the experimental results of this method on the SDD dataset. The settings are the same as for Table 1.

Table 2 Experimental results of this method on the SDD dataset

The method is compared with previous methods on the SDD benchmark dataset, which mainly contains pedestrian trajectories. Performance is assessed with the ADE/FDE metrics (lower is better). Methods marked "×" use additional simulated data.

Fig. 2 shows trajectory predictions of the method in the world coordinate system of the nuScenes dataset. The method captures a variety of interactions among heterogeneous agents as well as trajectory continuity. The predicted traffic-agent trajectories in the figure include:

- turning trajectories;
- parallel trajectories heading in the same direction;
- interactions between two non-adjacent agents;
- collective interactions within a group of agents of different categories.

The method captures agent motion well in each of these situations.

Fig. 3 shows long-horizon stitched trajectory predictions of the method in real nuScenes scenes. The method successfully predicts the real trajectories of heterogeneous agents traveling over long periods, which indicates high prediction accuracy.

Fig. 4 shows the hierarchical interaction attention predicted by the method. The method learns asymmetric interaction attention among heterogeneous agents. In the figure, the first column shows category attention, the second column shows individual attention, and each row is one of two real scenes.

In each panel, the numerals 1, 2 and 3 denote the agent categories car, bicycle and pedestrian, respectively, and color represents the attention weight. For example, in the upper-left attention map, the black square (row 3, column 1) is the attention of category 3 (pedestrian) to category 1 (vehicle). The maps are asymmetric. This shows that the method infers category attention and, guided by it, computes the overall agent attention.

In summary, the invention discloses a heterogeneous-scene trajectory prediction method based on aggregating interaction information over unrestricted neighborhoods, in the field of computer vision. Finally, the method uses spatio-temporal graph convolution to predict the Gaussian mixture distribution of each heterogeneous agent's trajectory over a future period.

It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, one skilled in the art may make modifications and equivalents to the specific embodiments of the present invention, and any modifications and equivalents not departing from the spirit and scope of the present invention are within the scope of the claims of the present invention.

Claims

1. A heterogeneous intelligent agent trajectory prediction method, characterized by comprising the following steps:

modeling interaction information and the temporal trend of the trajectories based on the agent-level heterogeneous attention, to obtain a heterogeneous intelligent agent trajectory prediction result;

wherein the step of learning heterogeneous attention based on the category interaction graph to obtain category-level heterogeneous attention specifically comprises:

2.4 normalizing the attention weights obtained between every two categories to obtain the final category-category interactions, which serve as the edges of the category interaction graph;

wherein the step of obtaining agent-level heterogeneous attention based on the spatial interaction graph and the category-level heterogeneous attention specifically comprises:

3.4 using a padding operation to keep the output size equal to the input size; aggregating global spatial interaction information by repeatedly applying asymmetric convolutions; and obtaining the instance-level interaction attention of the heterogeneous agents by fusing the unrestricted-neighborhood interactions with the category-category interactions;

wherein, in learning heterogeneous attention based on the category interaction graph to obtain category-level heterogeneous attention,

the attention construction formula is as follows:

where the terms denote, respectively, the attention of category $c_1$ to category $c_2$ and the final attention after weight assignment;

and wherein, in obtaining agent-level heterogeneous attention based on the spatial interaction graph and the category-level heterogeneous attention,

the attention construction formula is as follows:

where

2. The heterogeneous intelligent agent trajectory prediction method according to claim 1, wherein the step of obtaining the category interaction graph and the spatial interaction graph specifically comprises:

3. The method according to claim 1, wherein, in modeling interaction information and the temporal trend of the trajectories based on the agent-level heterogeneous attention,

$H_T = \mathrm{TCN}(H_t)$

4. A heterogeneous intelligent agent trajectory prediction system, comprising:

a prediction result acquisition module, configured to model interaction information and the temporal trend of the trajectories based on the agent-level heterogeneous attention, to obtain a heterogeneous intelligent agent trajectory prediction result;

the attention construction formula is as follows:

where the terms denote, respectively, the attention of category $c_1$ to category $c_2$ and the final attention after weight assignment;

the attention construction formula is as follows:

where

5. An electronic device comprising a processor and a memory, the processor configured to execute a computer program stored in the memory to implement the heterogeneous intelligent agent trajectory prediction method of any one of claims 1 to 3.

6. A computer readable storage medium storing at least one instruction that when executed by a processor implements the heterogeneous intelligent agent trajectory prediction method of any one of claims 1 to 3.