CN113159403A

CN113159403A - Method and device for predicting pedestrian track at intersection

Info

Publication number: CN113159403A
Application number: CN202110387223.8A
Authority: CN
Inventors: 李建波; 吕志强; 李浩然; 王玥
Original assignee: Qingdao University
Current assignee: Qingdao University
Priority date: 2021-04-13
Filing date: 2021-04-13
Publication date: 2021-07-23
Anticipated expiration: 2041-04-13
Also published as: CN113159403B

Abstract

The application discloses a method and a device for predicting pedestrian tracks at an intersection, wherein the method comprises the steps of obtaining pedestrian track data of a preset intersection in a preset time period, wherein the pedestrian track data are longitude and latitude position data, and the preset intersection is an intersection of at least two roads; converting and standardizing the pedestrian track data according to a preset coordinate system to obtain a pedestrian track sequence, wherein the preset coordinate system takes the center of the intersection as the origin of the coordinate system, and determines an X axis and a Y axis by the central lines of a plurality of roads corresponding to the intersection; inputting the pedestrian track sequence into a preset track prediction model to obtain a pedestrian track prediction result, wherein the prediction result is the walking direction of a pedestrian at a preset intersection in a future time period, and the preset track prediction model is a model obtained by obtaining a training sample according to historical pedestrian track data of the preset intersection and training through a time convolution neural network. The method and the device solve the problem that the existing pedestrian track prediction mode is high in complexity.

Description

Method and device for predicting pedestrian track at intersection

Technical Field

The application relates to the field of smart cities, in particular to a method and a device for predicting pedestrian tracks at intersections.

Background

Pedestrian trajectory prediction is an important research field for smart cities and benefits urban trip planning, traffic congestion relief, urban commercial planning and the like. Pedestrian trajectories are broadly divided into long-term trajectories based on Location-Based Social Networks (LBSN) and short-term trajectories based on continuous positions. An LBSN is a socially oriented network built on intelligent terminal devices; it is an online platform on which users share information such as interests, hobbies, status and activities. An LBSN provides location-based services that allow users to share their locations, and detailed information about those locations, within the social network. Long-term pedestrian trajectory prediction is based on the pedestrian's check-in data; by combining implicit factors such as interests and hobbies, it mines latent patterns and regularities in the data to predict where the pedestrian is likely to appear in the future. Short-term trajectory prediction mainly performs a series of spatiotemporal feature calculations on the pedestrian's continuous change of position over a short period of time. A vehicle's trajectory on the road is constrained by the road layout and traffic signs, so its route is highly regular, which makes it easy for a model to compute spatiotemporal relationships. In real life, however, a pedestrian's trajectory is influenced in complex and changeable ways by surrounding pedestrians and nearby static obstacles, and is less constrained by traffic regulations, which makes short-term pedestrian trajectory prediction more challenging.

Currently, there are prediction methods based on DeepMove, prediction methods based on LSTPM (which consists of a non-local network for computing long-term preference features and a modified recurrent neural network), and prediction methods based on ARNN (in which the data are divided into two types, handled respectively by an attention layer and a recurrent layer composed of LSTM). In actually predicting pedestrian trajectories, the inventors found that trajectory prediction can be regarded as a sequence generation task, that is, predicting the future trajectory from past positions. Recurrent neural network models are often used to learn general human movement and predict future trajectories. A simple recurrent neural network cannot handle the exponential explosion or vanishing of weights during recursion (the vanishing gradient problem) and has difficulty capturing long-term temporal dependencies; combining it with long short-term memory (LSTM) networks largely solves this problem. A recurrent neural network can describe dynamic temporal behavior because, unlike a feed-forward neural network, which accepts inputs of a more specific structure, it passes state cyclically through its own network and can therefore accept a broader range of time-series inputs, as shown in FIG. 1. However, the recurrent neural network has a fatal problem: each element in the sequence is directly related to all elements that precede it, so its complexity can grow explosively as the computation proceeds.

In summary, the existing prediction method of the pedestrian track has the problem of high complexity.

Disclosure of Invention

The main purpose of the present application is to provide a method and a device for predicting pedestrian trajectories at intersections, which solve the problem that the existing prediction methods for pedestrian trajectories are still complicated.

To achieve the above object, according to a first aspect of the present application, a method for intersection pedestrian trajectory prediction is provided.

The method for predicting the pedestrian track at the intersection comprises the following steps:

acquiring pedestrian track data of a preset intersection within a preset time period, wherein the pedestrian track data are longitude and latitude position data, and the preset intersection is an intersection of at least two roads;

converting and standardizing the pedestrian track data according to a preset coordinate system to obtain a pedestrian track sequence, wherein the preset coordinate system takes the center of the intersection as the origin of the coordinate system, and determines an X axis and a Y axis by the central lines of a plurality of roads corresponding to the intersection;

inputting the pedestrian track sequence into a preset track prediction model to obtain a pedestrian track prediction result, wherein the prediction result is the walking direction of a pedestrian at a preset intersection in a future time period, and the preset track prediction model is a model obtained by obtaining a training sample according to historical pedestrian track data of the preset intersection and training through a time convolution neural network.

Optionally, the inputting the pedestrian trajectory sequence into the preset trajectory prediction model to obtain the result of the pedestrian trajectory prediction includes:

performing matrix conversion, linearization processing and convolution processing on the pedestrian trajectory sequence to obtain pedestrian trajectory feature data containing each node feature in the pedestrian trajectory sequence and correlation features among the nodes;

and inputting the pedestrian track characteristic data, the distance auxiliary sequence and the time auxiliary sequence into a time convolution neural network, and obtaining the walking direction of the pedestrian after the processing of linearization and normalization functions.

Optionally, the step of converting and standardizing the pedestrian trajectory data according to a preset coordinate system to obtain a pedestrian trajectory sequence includes:

and converting the pedestrian trajectory data according to a preset coordinate system and processing the pedestrian trajectory data by a Z-Score standardization method to obtain a pedestrian trajectory sequence.

Optionally, the normalization function is a Log Softmax-based normalization function.

Optionally, the method further includes:

acquiring historical pedestrian track data of the preset intersection;

and training a model based on the historical pedestrian trajectory data and the time convolution neural network to obtain the preset trajectory prediction model.

Optionally, the preset intersection is a crossroads (four-way intersection), the preset coordinate system takes the center of the crossroads as the origin of the coordinate system, and the central lines of the four roads corresponding to the crossroads are taken as the X and Y axes.

Optionally, the preset intersection is not a crossroads, the preset coordinate system takes the center of the intersection as the origin of the coordinate system, and the central line of one of the roads corresponding to the intersection is taken as the X axis or the Y axis.

In order to achieve the above object, according to a second aspect of the present application, a device for intersection pedestrian trajectory prediction is provided.

The device for predicting pedestrian tracks at intersections comprises:

the first acquisition module is used for acquiring pedestrian track data of a preset intersection within a preset time period, wherein the pedestrian track data are longitude and latitude position data, and the preset intersection is an intersection of at least two roads;

the preprocessing module is used for converting and standardizing the pedestrian track data according to a preset coordinate system to obtain a pedestrian track sequence, wherein the preset coordinate system takes the center of the intersection as the origin of the coordinate system, and determines an X axis and a Y axis by the central lines of a plurality of roads corresponding to the intersection;

the prediction module is used for inputting the pedestrian track sequence into a preset track prediction model to obtain a pedestrian track prediction result, the prediction result is the walking direction of a pedestrian at a preset intersection in a future time period, and the preset track prediction model is a model obtained by obtaining training samples according to historical pedestrian track data of the preset intersection and training through a time convolution neural network.

Optionally, the prediction module includes:

the processing unit is used for carrying out matrix conversion, linearization processing and convolution processing on the pedestrian track sequence to obtain pedestrian track characteristic data containing each node characteristic in the pedestrian track sequence and correlation characteristics among nodes;

and the prediction unit is used for inputting the pedestrian track characteristic data, the distance auxiliary sequence and the time auxiliary sequence into the time convolution neural network and obtaining the walking direction of the pedestrian after the processing of linearization and normalization functions.

Optionally, the preprocessing module is further configured to:

Optionally, the apparatus further comprises:

the second acquisition module is used for acquiring historical pedestrian track data of the preset intersection;

and the training module is used for training a model based on the historical pedestrian trajectory data and the time convolution neural network to obtain the preset trajectory prediction model.

In order to achieve the above object, according to a third aspect of the present application, there is provided a computer-readable storage medium storing computer instructions for causing the computer to execute the method for intersection pedestrian trajectory prediction of any one of the above first aspects.

In order to achieve the above object, according to a fourth aspect of the present application, there is provided an electronic apparatus comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to cause the at least one processor to perform the method of intersection pedestrian trajectory prediction of any one of the first aspects above.

In the embodiment of the application, in the method and the device for predicting the pedestrian track at the intersection, pedestrian track data of a preset intersection within a preset time period are obtained, wherein the pedestrian track data are longitude and latitude position data, and the preset intersection is an intersection of at least two roads; converting and standardizing the pedestrian track data according to a preset coordinate system to obtain a pedestrian track sequence, wherein the preset coordinate system takes the center of the intersection as the origin of the coordinate system, and determines an X axis and a Y axis by the central lines of a plurality of roads corresponding to the intersection; inputting the pedestrian track sequence into a preset track prediction model to obtain a pedestrian track prediction result, wherein the prediction result is the walking direction of a pedestrian at a preset intersection in a future time period, and the preset track prediction model is a model obtained by obtaining a training sample according to historical pedestrian track data of the preset intersection and training through a time convolution neural network. It can be seen that, in the embodiment of the present application, the walking direction of the pedestrian at the intersection is predicted according to the pedestrian track, and the existing data regression mode (based on the recurrent neural network mode) is converted into the mode of data regression and classification fusion. Specifically, when prediction is performed, the used preset trajectory prediction model is obtained based on time convolution neural network training, and the model fuses implicit factors such as trajectory data, distance sequences, time sequences and the like to realize a multi-modal data prediction task, so that the traditional recurrent neural network is replaced, and the complexity of time sequence calculation can be reduced.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this application, serve to provide a further understanding of the application and to enable other features, objects, and advantages of the application to be more apparent. The drawings and their description illustrate the embodiments of the invention and do not limit it. In the drawings:

FIG. 1 is a schematic diagram of the relationship between elements in a sequence of a conventional recurrent neural network;

FIG. 2 is a flow chart of a method for pedestrian trajectory prediction at an intersection according to an embodiment of the present application;

FIG. 3 is a schematic diagram of a preset coordinate system provided in accordance with an embodiment of the present application;

FIG. 4 is a schematic diagram of another preset coordinate system provided in accordance with an embodiment of the present application;

FIG. 5 is a diagram illustrating causal convolution and dilated (hole) convolution in a TCN architecture;

FIG. 6 is a flow chart of another method for predicting pedestrian trajectories at intersections according to an embodiment of the present application;

FIG. 7 is a schematic diagram illustrating a distribution of results predicted in various ways according to an embodiment of the present application;

FIG. 8 is a schematic diagram of ROC curves for models provided according to embodiments of the present application;

FIG. 9 is a schematic diagram of F1 Score index variation of each model training process provided according to an embodiment of the present application;

FIG. 10 is a schematic illustration of model parameters for models provided in accordance with an embodiment of the present application;

FIG. 11 is a block diagram of an apparatus for pedestrian trajectory prediction at an intersection according to an embodiment of the present application;

FIG. 12 is a block diagram of another device for predicting pedestrian trajectories at intersections according to an embodiment of the present application.

Detailed Description

In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be used. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.

First, it should be noted that the method for predicting pedestrian tracks at intersections in the embodiment of the present application can predict the walking direction of a pedestrian in the near future from the pedestrian's track in the current time period, so that the prediction can be applied to intelligent traffic. For example, pedestrians or vehicles can be warned in advance on the basis of the predicted walking direction, thereby reducing accidents; in addition, the pedestrian flow in each walking direction can be predicted from the predicted walking directions, so that traffic light durations can be adjusted and traffic made more intelligent.

According to an embodiment of the present application, there is provided a method for predicting pedestrian trajectories at an intersection, as shown in fig. 2, the method includes the following steps:

S101, acquiring pedestrian track data of a preset intersection in a preset time period.

The preset intersection is an intersection of at least two roads, such as a T-shaped intersection, a three-way intersection, or an intersection with multiple traveling directions. The embodiment of the present application does not limit the number of roads that meet at the intersection. The models corresponding to different preset intersections differ somewhat. The main differences lie in the data preprocessing stage, where a different preset coordinate system is selected, and in the prediction results, which comprise different categories of walking direction; these differences are explained in more detail in the subsequent steps.

The embodiment of the present application mainly targets the prediction of short-term pedestrian trajectories, so the preset time period is short, for example 20 s or 30 s. The preset time period is consistent with the one used to select the training data when training the preset trajectory prediction model in the subsequent step. That is, whatever length of time period the training samples cover, the pedestrian trajectory used for prediction covers a time period of the same length.

The pedestrian trajectory data is longitude and latitude position data, and the longitude and latitude data can be acquired through positioning equipment.

S102, converting and standardizing the pedestrian track data according to a preset coordinate system to obtain a pedestrian track sequence.

In practical applications, the accuracy of directly acquired longitude and latitude position data is insufficient for predicting the short-distance pedestrian tracks that correspond to a short time period. For example, for two position points that are actually 7 meters apart, the difference in the longitude and latitude position data shows only when the values are accurate to the 7th or 8th decimal place. Prediction made directly on longitude and latitude position data is therefore inaccurate, and accurate prediction becomes feasible once such small differences are suitably amplified. The acquired longitude and latitude position data therefore need to be preprocessed first.

Specifically, the preprocessing in this embodiment comprises converting the pedestrian trajectory data according to a preset coordinate system and standardizing it. The preset coordinate system takes the center of the intersection as its origin and determines the X axis and the Y axis from the central lines of the roads corresponding to the intersection. It should be noted that the origin is determined in the same way for all types of preset intersection, whereas the X and Y axes are determined differently. Specifically, as shown in FIG. 3, if the preset intersection is a crossroads (four-way intersection), the central lines of the four roads corresponding to it are taken as the X and Y axes. As shown in FIG. 4, if the preset intersection is not a crossroads, the preset coordinate system takes the center of the intersection as its origin and the central line of one of the roads corresponding to the intersection as the X axis or the Y axis.

The coordinates of the pedestrian track are converted into the preset coordinate system to obtain two sequences, {X} and {Y}, where {X} is the set of abscissas of the track in the coordinate system and {Y} is the set of ordinates. The conversion is followed by standardization. The standardization method may be Z-Score, Min-Max or Decimal Scaling, but Z-Score proved more effective in practical experiments. Z-Score standardization is suited to cases where the maximum and minimum of the data are unknown or where outliers fall outside the expected range of values. Unlike vehicle movement, human behavior is complex and changeable, and pedestrians sometimes walk along curves or take shortcuts, so Z-Score standardization suits the pedestrian trajectory prediction scenario.
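The preprocessing of step S102 can be illustrated with a minimal sketch. It is not the implementation of the application: the equirectangular approximation, the function names `to_local_xy` and `z_score`, the parameter `x_axis_bearing_deg` (compass bearing of the road central line chosen as the X axis) and the sample coordinates are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (assumed constant)


def to_local_xy(lat, lon, center_lat, center_lon, x_axis_bearing_deg=90.0):
    """Map a (lat, lon) fix to metres in a frame centred on the intersection.

    x_axis_bearing_deg is the compass bearing (degrees clockwise from north)
    of the road central line used as the X axis; 90 means X points east.
    """
    # Equirectangular approximation: adequate over the short distances
    # spanned by a single intersection.
    east = (math.radians(lon - center_lon) * EARTH_RADIUS_M
            * math.cos(math.radians(center_lat)))
    north = math.radians(lat - center_lat) * EARTH_RADIUS_M
    # Rotate the east/north offsets so the chosen central line becomes X.
    b = math.radians(x_axis_bearing_deg)
    x = east * math.sin(b) + north * math.cos(b)
    y = -east * math.cos(b) + north * math.sin(b)
    return x, y


def z_score(values):
    """Z-Score standardization: subtract the mean, divide by the std. dev."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical intersection centre and track of (latitude, longitude) fixes.
center = (36.0, 120.0)
track = [(36.00001, 120.00001), (36.00002, 120.00003), (36.00004, 120.00004)]

xy = [to_local_xy(lat, lon, *center) for lat, lon in track]
xs = z_score([p[0] for p in xy])  # standardized sequence {X}
ys = z_score([p[1] for p in xy])  # standardized sequence {Y}
```

Working in metres relative to the intersection center turns differences that are tiny in degrees into values of ordinary magnitude before standardization, which is the amplification the preprocessing step is meant to provide.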

S103, inputting the pedestrian track sequence into a preset track prediction model to obtain a pedestrian track prediction result, wherein the prediction result is the walking direction of the pedestrian at the preset intersection in the future time period.

The preset track prediction model is obtained in advance by deriving training samples from historical pedestrian track data of the preset intersection and training a time convolution neural network on them. Different types of preset intersections yield different types of preset track prediction models. In addition, pedestrian track conditions may differ considerably between intersections, so preset intersections of the same type but at different locations may also correspond to different preset track prediction models. The training principle is the same for all models; only the training samples differ.

Specifically, each training sample comprises pedestrian trajectory data within a preset time period and the walking direction of the pedestrian after that period. The walking direction is labeled in this embodiment as follows: as shown in FIG. 3, there are four walking directions corresponding to the four roads, the walking direction is determined by the final position of the pedestrian after passing through the intersection, and the roads may be labeled 1, 2, 3 and 4 respectively. If there are three walking directions, they may be labeled 1, 2 and 3. Before a training sample is input into the time convolution neural network for training, it likewise undergoes the preprocessing of step S102 to obtain the corresponding pedestrian trajectory sequence; matrix conversion, linearization processing and convolution processing are then performed on the pedestrian trajectory sequence to obtain pedestrian trajectory feature data containing the feature of each node in the sequence and the correlation features between nodes. The pedestrian trajectory feature data, the distance auxiliary sequence and the time auxiliary sequence are then input into the time convolution neural network for training, finally yielding a preset trajectory prediction model whose input is a pedestrian trajectory sequence and whose output is a walking direction label. Model training follows the same process as the subsequent model-based prediction of the walking direction, so reference may be made to the prediction process below, and the details are not repeated here.

It should be noted that, in the embodiment of the present application, the conventional recurrent neural network is replaced by a time convolution neural network (TCN), which draws on the massively parallel processing of convolutional neural networks: it maps a multidimensional matrix into a time sequence, obtains a sufficiently large receptive field through a multilayer network, and then performs deep-network parallel processing. FIG. 5 shows the causal convolution and the dilated (hole) convolution in the TCN architecture. As can be seen from FIG. 5, the value of each layer depends only on historical values of the previous layer, which is the characteristic of causal convolution. Each layer samples the information of the previous layer with skips, and the dilation factor d grows layer by layer as a power of 2, which is the characteristic of dilated convolution. The embodiment of the application draws on this way of computing spatial correlation and temporal dependency, fuses multiple implicit factors to enrich the spatiotemporal features of the data, and reduces the complexity of time-series computation.
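How quickly the receptive field grows when the dilation factor doubles per layer can be checked with a short calculation. The sketch below assumes one causal convolution of kernel size K per layer and dilation d = 2^l at layer l (counting from 0); the function name `receptive_field` is illustrative and does not come from the application. Under these assumptions a stack of L layers sees 1 + (K - 1)(2^L - 1) input positions.

```python
def receptive_field(kernel_size, num_layers):
    """Number of input time steps that influence one output of the top layer.

    Assumes one causal convolution per layer with dilation 2**layer.
    """
    dilations = [2 ** layer for layer in range(num_layers)]
    return 1 + (kernel_size - 1) * sum(dilations)


# Receptive field of stacks of increasing depth with kernel size 2.
fields = [receptive_field(2, depth) for depth in range(1, 5)]
print(fields)
```

The receptive field doubles with each added layer here, so a trajectory of a few dozen nodes can be covered with only a handful of layers. Real TCN residual blocks often contain two convolutions each, which would change the count.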

Specifically, inputting the pedestrian trajectory sequence into a preset trajectory prediction model to obtain a pedestrian trajectory prediction result comprises the following steps:

1) performing matrix conversion, linearization processing and convolution processing on the pedestrian trajectory sequence to obtain pedestrian trajectory feature data containing each node feature in the pedestrian trajectory sequence and correlation features among the nodes;

Specifically, matrix conversion of the pedestrian trajectory sequence means concatenating the standardized sequences {X} and {Y} obtained in the previous step into a data matrix. Linearization processing is then performed to match the input dimension of Conv1d, whose local connectivity allows it to extract local features of the data. The convolution process performs template matching in each local region of the matrix. The pooling operation of Conv1d is a down-sampling process; the relative relationships between different features play an important role, and by capturing translation relationships and controlling overfitting, pooling increases the generalization ability of the model. Through the matrix conversion, linearization processing and convolution processing, pedestrian trajectory feature data containing the feature of each node in the pedestrian trajectory sequence and the correlation features between nodes are obtained.
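The shapes involved in matrix conversion, linearization and one-dimensional convolution can be followed in a small pure-Python sketch. The helper names `concatenate`, `linear` and `conv1d`, the fixed weights and the single output channel are assumptions made for illustration; in a trained model the weights are learned, and a deep learning framework would normally be used.

```python
def concatenate(xs, ys):
    """Stack {X} and {Y} into a T x 2 matrix with one row per node."""
    if len(xs) != len(ys):
        raise ValueError("X and Y sequences must have the same length")
    return [[x, y] for x, y in zip(xs, ys)]


def linear(rows, weight, bias):
    """Apply W r + b to every row r; weight has shape out_dim x in_dim."""
    return [
        [sum(w * v for w, v in zip(w_row, row)) + b_j
         for w_row, b_j in zip(weight, bias)]
        for row in rows
    ]


def conv1d(rows, kernel):
    """Valid 1-D convolution along time with a single output channel.

    kernel has shape kernel_size x channels, so each output value mixes the
    features of kernel_size neighbouring nodes (a local correlation).
    """
    k = len(kernel)
    out = []
    for t in range(len(rows) - k + 1):
        window = rows[t:t + k]
        out.append(sum(w * v
                       for k_row, row in zip(kernel, window)
                       for w, v in zip(k_row, row)))
    return out


xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 0.5, 1.0, 1.5]
matrix = concatenate(xs, ys)                      # 4 nodes x 2 coordinates
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]     # illustrative, not learned
hidden = linear(matrix, weight, [0.0, 0.0, 0.0])  # 4 nodes x 3 channels
kernel = [[-1.0, -1.0, -1.0], [1.0, 1.0, 1.0]]    # difference of neighbours
features = conv1d(hidden, kernel)                 # 3 local features
```

With this difference kernel every output compares a node with its predecessor, which is one simple example of a correlation feature between nodes.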

2) inputting the pedestrian track characteristic data, the distance auxiliary sequence and the time auxiliary sequence into the time convolution neural network, and obtaining the walking direction of the pedestrian after processing by linearization and normalization functions.

The distance auxiliary sequence and the time auxiliary sequence are sequences composed of auxiliary factors. An auxiliary factor is calculated from the difference between the first node and each other node in the pedestrian track sequence, as shown in formula 1:

$$A_i = a_i - a_0, \quad i \in [0, \mathrm{len}(a)] \tag{1}$$

where $A$ denotes the auxiliary factor sequence, $a_i$ denotes the (i+1)-th node, $a_0$ denotes the 1st node, and $\mathrm{len}(a)$ denotes the length of the pedestrian trajectory sequence, i.e., the number of nodes.

It should be noted that formula 1 gives the general way of generating an auxiliary factor sequence, and the specific calculation of each kind of auxiliary factor needs to be adapted accordingly. For the distance auxiliary sequence in this application, the distance difference between nodes (the distance auxiliary factor) can be calculated according to the following formula 2:

$$d_i = \sqrt{(x_i - x_0)^2 + (y_i - y_0)^2} \tag{2}$$

where $d$ denotes the distance auxiliary sequence, $d_i$ denotes the i-th distance auxiliary factor in that sequence, namely the distance between the (i+1)-th node and the 1st node, $x_i$ and $y_i$ denote the abscissa and ordinate of the (i+1)-th node, and $x_0$ and $y_0$ denote the abscissa and ordinate of the 1st node.

In addition, the auxiliary factors in the present application also include a time auxiliary factor; each time auxiliary factor in the time auxiliary sequence can be calculated by reference to formula 1.
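The two auxiliary sequences can be sketched as follows. This is an illustration under stated assumptions rather than the code of the application: the distance factor is taken to be the Euclidean distance from the first node, the time factor is taken to be the elapsed time since the first node as in formula 1, and the function names and sample values are hypothetical.

```python
import math


def auxiliary_sequence(a):
    """Formula 1: difference between every node and the first node."""
    return [a_i - a[0] for a_i in a]


def distance_auxiliary_sequence(xs, ys):
    """Distance of every node from the first node (Euclidean, assumed)."""
    return [math.hypot(x - xs[0], y - ys[0]) for x, y in zip(xs, ys)]


# Hypothetical track in intersection coordinates with timestamps in seconds.
xs = [0.0, 3.0, 6.0]
ys = [0.0, 4.0, 8.0]
timestamps = [100.0, 101.0, 102.5]

distance_aux = distance_auxiliary_sequence(xs, ys)
time_aux = auxiliary_sequence(timestamps)
```

Both sequences start at zero and have one entry per node, so they line up with the trajectory features when all three are passed to the network.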

After the distance auxiliary sequence and the time auxiliary sequence are determined as described above, the pedestrian trajectory feature data, the distance auxiliary sequence and the time auxiliary sequence are input into the time convolution neural network. The time convolution neural network comprises a plurality of residual blocks, which are linked by shortcut connections so that the network is easier to optimize. If an objective function h(x) is to be approximated by a nonlinear unit f(x, θ), the objective function can be split into an identity function x and a residual function h(x) - x. According to the universal approximation theorem, a nonlinear unit composed of a neural network has sufficient capacity to approximate the original objective function. The original problem is thus transformed into letting the nonlinear unit f(x, θ) approximate the residual function h(x) - x, and approximating h(x) with f(x, θ) + x. A multi-layer residual structure has also been shown to benefit the nonlinear processing of deep neural networks, whereas a single-layer residual structure does not perform well. A residual block in the time convolution neural network mainly comprises two processes: causal convolution and dilated (hole) convolution.

The formula for causal convolution (Causal Convolution) is shown in formula 3, where {b_1, b_2, ..., b_t} is the input sequence, {c_1, c_2, ..., c_t} is the hidden-layer output sequence, and {f_1, f_2, ..., f_K} is the filter. Causal convolution attends only to historical information and ignores future information: c_t is derived only from b_t and the data before it. The larger K is, the more historical information can be traced back. If the original input sequence of the current layer is [0, i], i.e., a sequence of i + 1 nodes with subscripts from 0 to i, then the original input sequence of the next layer becomes [0, i + 1], i.e., a sequence of i + 2 nodes with subscripts from 0 to i + 1.
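Formula 3 is not reproduced in this text. As a sketch only, the commonly used form of causal convolution is assumed here: each output c_t is a weighted sum of the K most recent inputs up to and including b_t, with zeros padded on the left. The function name `causal_conv1d` is illustrative.

```python
def causal_conv1d(b, f):
    """c[t] = sum_k f[k] * b[t - (K - 1) + k], with zeros padded on the left.

    b: input sequence, f: filter of size K. The last filter tap aligns with b[t],
    so no output ever depends on an input that comes after it.
    """
    K = len(f)
    padded = [0.0] * (K - 1) + list(b)
    return [sum(f[k] * padded[t + k] for k in range(K)) for t in range(len(b))]

b = [1.0, 2.0, 3.0, 4.0]
c = causal_conv1d(b, [1.0, 1.0])

# Changing the last input must leave all earlier outputs untouched.
c_changed = causal_conv1d([1.0, 2.0, 3.0, 99.0], [1.0, 1.0])
```

The output has the same length as the input, and only the final element differs between `c` and `c_changed`, which is the causality property described above.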

The formula for dilated (hole) convolution is shown in formula 4, where d denotes the dilation (hole) factor, which grows as a power of 2 with the depth of the network; increasing d or K enlarges the receptive field. The result of each layer at time t can be computed only from [0, t], i.e., data from 0 to t, which embodies the idea of causal convolution. The result at each time step is obtained by sampling values of the previous layer at intervals given by the dilation factor, which embodies the idea of dilated convolution.
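Formula 4 is likewise not reproduced in this text. The sketch below assumes the commonly used form of dilated causal convolution, in which filter taps are spaced d steps apart, and assumes one convolution per layer with dilations 1, 2, 4, ... when estimating the receptive field. Both function names are illustrative.

```python
def dilated_causal_conv1d(b, f, d):
    """c[t] = sum_k f[k] * b[t - (K - 1 - k) * d]; taps before index 0 count as zero."""
    K = len(f)
    out = []
    for t in range(len(b)):
        acc = 0.0
        for k in range(K):
            src = t - (K - 1 - k) * d  # index of the input this tap reads
            if src >= 0:
                acc += f[k] * b[src]
        out.append(acc)
    return out

def receptive_field(K, num_layers):
    """Inputs visible to one output after stacking layers with dilation 2**l.

    Assumption: a single convolution of filter size K per layer.
    """
    return 1 + (K - 1) * sum(2 ** l for l in range(num_layers))

c = dilated_causal_conv1d([1, 2, 3, 4, 5], [1, 1], d=2)
rf = receptive_field(K=2, num_layers=3)
```

With d = 1 the function reduces to ordinary causal convolution, and the receptive-field helper makes concrete how increasing either d (through depth) or K widens the history each output can see.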

After the multi-layer residual structure, the output undergoes linearization (Linearization) and then normalization. In this embodiment the normalization uses the Log Softmax function, which adds one Log operation on top of the conventional Softmax function, as shown in formula 5. This avoids data overflow and underflow, speeds up computation and improves numerical stability.

where m_i is the probability value of the class with the highest probability in the classification result, and m_j is the sum of the probabilities classified into each class in the classification result. Specifically, when the preset intersection is a crossroads, the classification result (i.e., the final walking direction of each pedestrian track) has four categories, namely 1, 2, 3 and 4; for a given pedestrian track, m_i is the probability of the most probable of the 4 classes, and m_j is the sum of the probabilities classified into each of the 4 classes.

After Log Softmax, the final output is obtained. When the preset intersection is a crossroads, the output for each pedestrian track is one of 1, 2, 3 and 4, and the walking direction of the pedestrian is then determined from the correspondence between 1, 2, 3, 4 and the roads.
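Formula 5 is not reproduced in this text. As a sketch, the standard definition of Log Softmax is assumed, log(exp(m_i) / Σ_j exp(m_j)), computed with the usual max-shift so that large scores do not overflow. The helper `predict_direction` and the sample scores are illustrative.

```python
import math

def log_softmax(scores):
    """Numerically stable log-softmax using the max-shift (log-sum-exp) trick."""
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_sum for s in scores]

def predict_direction(scores):
    """Return the 1-based class (1..4 at a crossroads) with the largest log-probability."""
    logp = log_softmax(scores)
    return 1 + max(range(len(logp)), key=logp.__getitem__)

# Scores this large would overflow a naive exp(); the shifted form handles them.
scores = [1000.0, 1001.0, 999.0, 1002.0]
logp = log_softmax(scores)
direction = predict_direction(scores)
```

Because the logarithm is monotonic, the class chosen from the log-probabilities is the same one Softmax would choose, which is why the Log operation can be added without changing the predicted walking direction.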

In addition, over the whole prediction process the coordinates of future positions are in fact predicted from the pedestrian track: predicting future positions yields the future track, and the predicted track determines the final walking direction. The position prediction is thus an intermediate result, and the final result is the walking direction. The embodiment of the application converts the existing data regression mode into a mode that fuses data regression and classification. The preset track prediction model is a deep spatio-temporal model that fuses track data with implicit factors such as distance differences and time differences to carry out a multi-modal data prediction task. A convolutional neural network computes the spatial correlation features of the data (the pedestrian track feature data, i.e., the features of each node and the correlation features between nodes obtained by the convolution processing described above), and the time convolutional neural network (TCN) computes the temporal dependency of the data.

From the above description, it can be seen that in the method for predicting pedestrian tracks at an intersection according to the embodiment of the present application, pedestrian track data at a preset intersection and in a preset time period are obtained, where the pedestrian track data are longitude and latitude position data, and the preset intersection is an intersection of at least two roads; converting and standardizing the pedestrian track data according to a preset coordinate system to obtain a pedestrian track sequence, wherein the preset coordinate system takes the center of the intersection as the origin of the coordinate system, and determines an X axis and a Y axis by the central lines of a plurality of roads corresponding to the intersection; inputting the pedestrian track sequence into a preset track prediction model to obtain a pedestrian track prediction result, wherein the prediction result is the walking direction of a pedestrian at a preset intersection in a future time period, and the preset track prediction model is a model obtained by obtaining a training sample according to historical pedestrian track data of the preset intersection and training through a time convolution neural network. It can be seen that, in the embodiment of the present application, the walking direction of the pedestrian at the intersection is predicted according to the pedestrian track, and the existing data regression mode (recurrent neural network) is converted into a mode of data regression and classification fusion. Specifically, when prediction is performed, the used preset trajectory prediction model is obtained based on time convolution neural network training, and the model fuses implicit factors such as trajectory data, distance sequences, time sequences and the like to realize a multi-modal data prediction task, so that the traditional recurrent neural network is replaced, and the complexity of time sequence calculation can be reduced.

Further, an embodiment of the present application provides another method for predicting pedestrian tracks at an intersection, shown in fig. 6 and described taking a crossroads as an example. In fig. 6, the input data are the original longitude and latitude position data of pedestrian tracks within a preset time period, which are processed into a pedestrian track sequence. Spatial Conv processing, comprising matrix conversion (correlation Sequence), linearization (Linearization) and convolution (Conv1d) of the pedestrian track sequence, then yields pedestrian track feature data containing the features of each node in the sequence and the correlation features between the nodes. The pedestrian track feature data and the two auxiliary sequences (Distance Sequence, Time Sequence) are then input together into the time convolutional neural network (Temporal Conv), which comprises a multi-layer residual structure, for training. Its output is processed by Log Softmax, finally giving the walking direction prediction result, one of '1, 2, 3 and 4', each of which corresponds to one walking direction at the crossroads.

Further, in addition to or as a refinement of the above embodiment, the following contents are also included:

in order to verify the effect of the prediction method of the present application, comparative analysis was performed with other various existing methods, specifically as follows:

The performance of the prediction method of the present application and of existing methods based on other models is compared using four evaluation indexes: Accuracy, Precision, Recall and F1 Score. Accuracy is the traditional evaluation index for classification problems and represents the percentage of all samples that the model predicts correctly; Precision represents the probability that a sample predicted to be positive is actually positive; Recall represents the probability that an actual positive sample is predicted to be positive. Precision and Recall sometimes trade off against each other, i.e., when Precision is high, Recall tends to fall; where both must be taken into account, the most common approach is F1 Score. The following table shows how the four evaluation indexes are calculated:

Evaluation index	Calculation
Accuracy	(TP+TN)/(TP+TN+FP+FN)
Precision	TP/(TP+FP)
Recall	TP/(TP+FN)
F1 Score	(2×Precision×Recall)/(Precision+Recall)

TP: the number of positive-class samples that the model predicts as positive; FN: the number of positive-class samples that the model predicts as negative; FP: the number of negative-class samples that the model predicts as positive; TN: the number of negative-class samples that the model predicts as negative.
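The four indexes can be computed directly from these counts. The sketch below treats one walking-direction class as positive and all others as negative (one-vs-rest); the text does not state how the indexes are aggregated over the four classes, so that choice and the sample labels are assumptions for illustration.

```python
def binary_counts(y_true, y_pred, positive):
    """Count TP, FN, FP, TN with `positive` as the positive class (one-vs-rest)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fn, fp, tn

def metrics(tp, fn, fp, tn):
    """Accuracy, Precision, Recall and F1 Score as defined in the table above."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

y_true = [1, 2, 3, 4, 4, 4, 1, 2]  # hypothetical ground-truth directions
y_pred = [1, 2, 4, 4, 4, 3, 1, 1]  # hypothetical predicted directions
counts = binary_counts(y_true, y_pred, positive=4)
acc, prec, rec, f1 = metrics(*counts)
```

The guards against a zero denominator are a design choice of this sketch, returning 0.0 when a class is never predicted or never present.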

After the evaluation indexes were determined, the same pedestrian track data were predicted using 3 common recurrent neural networks (RNN, LSTM, GRU) and 5 existing models dedicated to pedestrian prediction (whose structures are described below), and the results are shown in the following table.

The structures of the five models are as follows:

Fuzzy-LSTM: this model uses two LSTM modules to compute the periodicity and the proximity of the track, respectively. The outputs of the two LSTM modules are finally merged.

Encoder-Decoder: the main idea of this model is based on a recurrent neural network that encodes an input sequence into a fixed-length vector and then decodes the vector into an output sequence.

AttentGAN: this model adds the Attention mechanism in front of an Encoder-Decoder structure composed of LSTMs.

ARNN: the data are divided into two types, handled by the Attention layer and by the Recurrent layer composed of LSTM, respectively. The output of the Attention layer is one of the inputs to the LSTM.

DeepMove: track data are first processed by a multi-modal embedding layer, and the track features are then computed by a GRU and an Attention mechanism.

From the results in the above table, it can be seen that the method of the present application has the highest accuracy among the compared models and performs best on all four evaluation indexes. Compared with the conventional recurrent neural networks (RNN, LSTM and GRU), the evaluation indexes are improved by up to 84.08%. This comparison shows that a calculation mode combining spatial dependency and temporal dependency improves model performance markedly over one that computes temporal dependency only. Compared with the dedicated deep learning models (Fuzzy-LSTM, Encoder-Decoder, AttentGAN, DeepMove), the evaluation indexes are improved by up to 29.86%. This comparison with models of other dedicated structures demonstrates the performance advantage of combining a convolutional neural network with a TCN in track prediction. In the method of the present application, the accuracy of the model can be improved by increasing the number of TCN network layers, but at the cost of training speed. In a practical application scenario, the trade-off between training speed and model accuracy should be weighed.

In addition, the prediction results of the better-performing models are compared by the distribution of their predictions, as shown in fig. 7, where a represents ARNN, b represents DeepMove, and c represents the present application. The experimental results show that all three models predict the fourth category better, because in the actual data set the fourth category has more data than the other categories and its sample features are rich. The method of the present application maintains good accuracy even on the first category, for which data are scarce. The method makes full use of the historical information of the track: all historical information takes part in the computation of the temporal convolution layer, which fully mines the position information from the pedestrian's starting point to the center point of the intersection.

In addition, to show the accuracy of the method of the present application more intuitively, ROC curves are used. The closer an ROC curve lies to the upper-left corner, the higher the recall of the model, and the point on the ROC curve nearest the upper-left corner has the fewest classification errors. To compare the performance of different models, the ROC curve of each model is plotted in the same coordinate system, as shown in fig. 8. The method of the present application, DeepMove and ARNN have a clear precision advantage, because the structure of a spatio-temporal model preserves both spatial correlation and temporal dependency. The method of the present application uses a dynamically changing receptive field and longer-term historical information, so that it captures the spatio-temporal and recursive relationships among the nodes of a pedestrian track more accurately.

As mentioned above, the dilated convolution mechanism and the purely convolutional structure of the TCN greatly improve model convergence speed during training. Most existing dedicated models are designed on the basis of traditional recurrent networks and cannot avoid the step-by-step computation of a recurrent network. The variation of the F1 Score of these models and of the model of the present application during training is shown in fig. 9. As fig. 9 shows, the performance of AttenGAN, ARNN and DeepMove changes slowly in the early stage of training and reaches its optimum only at the end of the whole training process. The strong convergence capability of the model of the present application lets it reach its best performance early in training, and no overfitting occurs at any stage of training. The experimental results for the 3-layer and 5-layer structures of the present application show that, within a certain range, increasing the number of TCN network layers improves the performance and convergence speed of the model without affecting its high robustness.

Because deep learning relies on tens of millions of network parameters taking part in the computation, it suffers from complex network structures, a large amount of computation and low speed, and is difficult to port to embedded devices. As network models become deeper and their parameters more numerous, reducing model size and computational cost becomes important. The causal convolution mechanism of the present application traces back and fuses historical information at the feature level, unlike DeepMove, which processes historical information only at the data level and consequently has a very large number of parameters. The model in the present embodiment is a lightweight, fully convolutional model. Compared with the other dedicated pedestrian prediction models, it has the fewest parameters, as shown in fig. 10.

Finally, beneficial effects of the method for predicting the pedestrian track at the intersection are summarized as follows:

1. converting the existing data regression mode into a data regression and classification fusion mode;

2. a deep space-time model is provided, implicit factors such as trajectory data, distance difference values and time difference values are fused by the model to realize a multi-modal data prediction task, and complexity of time sequence calculation is reduced.

It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowcharts, in some cases, the steps illustrated or described may be performed in an order different than presented herein.

According to an embodiment of the present application, there is also provided an apparatus for implementing intersection pedestrian trajectory prediction in the foregoing method embodiment, as shown in fig. 11, the apparatus includes:

the first acquisition module 21 is configured to acquire pedestrian trajectory data at a preset intersection within a preset time period, where the pedestrian trajectory data is longitude and latitude position data, and the preset intersection is an intersection of at least two roads;

the preprocessing module 22 is configured to convert and standardize the pedestrian trajectory data according to a preset coordinate system to obtain a pedestrian trajectory sequence, where the preset coordinate system uses the center of the intersection as an origin of the coordinate system, and determines an X axis and a Y axis with center lines of multiple roads corresponding to the intersection;

the prediction module 23 is configured to input the pedestrian trajectory sequence into a preset trajectory prediction model to obtain a pedestrian trajectory prediction result, where the prediction result is a walking direction of a pedestrian at a preset intersection in a future time period, and the preset trajectory prediction model is a model obtained by obtaining a training sample according to historical pedestrian trajectory data of the preset intersection and training the training sample through a time convolution neural network.

From the above description, it can be seen that in the device for predicting pedestrian tracks at an intersection according to the embodiment of the present application, pedestrian track data at a preset intersection and in a preset time period are obtained, where the pedestrian track data are longitude and latitude position data, and the preset intersection is an intersection of at least two roads; converting and standardizing the pedestrian track data according to a preset coordinate system to obtain a pedestrian track sequence, wherein the preset coordinate system takes the center of the intersection as the origin of the coordinate system, and determines an X axis and a Y axis by the central lines of a plurality of roads corresponding to the intersection; inputting the pedestrian track sequence into a preset track prediction model to obtain a pedestrian track prediction result, wherein the prediction result is the walking direction of a pedestrian at a preset intersection in a future time period, and the preset track prediction model is a model obtained by obtaining a training sample according to historical pedestrian track data of the preset intersection and training through a time convolution neural network. It can be seen that, in the embodiment of the present application, the walking direction of the pedestrian at the intersection is predicted according to the pedestrian track, and the existing data regression mode (recurrent neural network) is converted into a mode of data regression and classification fusion. Specifically, when prediction is performed, the used preset trajectory prediction model is obtained based on time convolution neural network training, and the model fuses implicit factors such as trajectory data, distance sequences, time sequences and the like to realize a multi-modal data prediction task, so that the traditional recurrent neural network is replaced, and the complexity of time sequence calculation can be reduced.

Further, as shown in fig. 12, the prediction module 23 includes:

the processing unit 231 is configured to perform matrix conversion, linearization processing and convolution processing on the pedestrian trajectory sequence to obtain pedestrian trajectory feature data including features of each node in the pedestrian trajectory sequence and correlation features between the nodes;

and the prediction unit 232 is configured to input the pedestrian trajectory feature data, the distance auxiliary sequence and the time auxiliary sequence into the time convolutional neural network, and obtain the walking direction of the pedestrian after the processing of the linearization and normalization functions.

Further, the preprocessing module 22 is further configured to:

Further, the normalization function is a Log Softmax-based normalization function.

Further, as shown in fig. 12, the apparatus further includes:

the second obtaining module 24 is configured to obtain historical pedestrian trajectory data of the preset intersection;

and the training module 25 is configured to perform model training based on the historical pedestrian trajectory data and the time convolution neural network to obtain the preset trajectory prediction model.

Further, the preset intersection is a crossroads, the preset coordinate system takes the center of the crossroads as the origin of the coordinate system, and the center lines of the four roads corresponding to the crossroads are taken as the X and Y axes.

Further, the preset intersection is not a crossroads, the preset coordinate system takes the center of the intersection as the origin of the coordinate system, and the center line of one of the roads corresponding to the intersection is taken as the X axis or the Y axis.

Specifically, the specific process of implementing the functions of each unit and module in the device in the embodiment of the present application may refer to the related description in the method embodiment, and is not described herein again.

According to an embodiment of the present application, there is further provided a computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions for causing the computer to execute the method for predicting pedestrian trajectories at intersections in the above method embodiment.

According to an embodiment of the present application, there is also provided an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to cause the at least one processor to perform the method of intersection pedestrian trajectory prediction in the above method embodiments.

It will be apparent to those skilled in the art that the modules or steps of the present application described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and they may alternatively be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, or fabricated separately as individual integrated circuit modules, or fabricated as a single integrated circuit module from multiple modules or steps. Thus, the present application is not limited to any specific combination of hardware and software.

The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. A method of intersection pedestrian trajectory prediction, the method comprising:

2. The method for predicting pedestrian trajectories at intersections according to claim 1, wherein the inputting the pedestrian trajectory sequence into a preset trajectory prediction model to obtain the pedestrian trajectory prediction result comprises:

3. The method for predicting pedestrian trajectories at intersections according to claim 2, wherein the step of converting and standardizing the pedestrian trajectory data according to a preset coordinate system to obtain a pedestrian trajectory sequence comprises:

4. The method of intersection pedestrian trajectory prediction of claim 2, wherein the normalization function is a Log Softmax-based normalization function.

5. The method of intersection pedestrian trajectory prediction of claim 1, further comprising:

acquiring historical pedestrian track data of the preset intersection;

6. The method for predicting pedestrian trajectories at intersections according to claim 1, wherein the preset intersection is a crossroads, the preset coordinate system takes the center of the crossroads as the origin of the coordinate system, and the center lines of the four roads corresponding to the crossroads are taken as the X and Y axes.

7. The method for predicting pedestrian trajectories at intersections according to claim 1, wherein the preset intersection is not a crossroads, the preset coordinate system takes the center of the intersection as the origin of the coordinate system, and the center line of one of the roads corresponding to the intersection is taken as the X axis or the Y axis.

8. An apparatus for intersection pedestrian trajectory prediction, the apparatus comprising:

9. A computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of intersection pedestrian trajectory prediction of any one of claims 1 to 7.

10. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to cause the at least one processor to perform the method of intersection pedestrian trajectory prediction of any one of claims 1 to 7.