CN110516613A - A Pedestrian Trajectory Prediction Method from the First View - Google Patents

A Pedestrian Trajectory Prediction Method from the First View

Info

Publication number
CN110516613A
CN110516613A
Authority
CN
China
Prior art keywords
layer
motion
camera
future
pedestrian
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910807214.2A
Other languages
Chinese (zh)
Other versions
CN110516613B (en)
Inventor
刘洪波
李伯林
江同棒
张博
汪大峰
戴光耀
李科
林正奎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian Maritime University
Original Assignee
Dalian Maritime University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian Maritime University filed Critical Dalian Maritime University
Priority to CN201910807214.2A priority Critical patent/CN110516613B/en
Publication of CN110516613A publication Critical patent/CN110516613A/en
Application granted granted Critical
Publication of CN110516613B publication Critical patent/CN110516613B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

本发明公开了一种第一视角下的行人轨迹预测方法,采用编解码结构结合循环卷积网络来预测第一视角下行人轨迹策略。原始图像经过编码得到的行人轨迹信息的特征向量,然后进行解码特征向量,预测出未来的行人的轨迹信息。在公共数据集和自己采集到的数据集里,本发明都会准确的预测出多个行人的未来10帧的轨迹信息,最终预测轨迹和最终实际轨迹之间的L2距离误差提高到40,比现有方法提高了30个像素精度。本发明提出了预测行人轨迹的时空卷积循环网络方法,利用一维卷积进行编解码处理,通过时空卷积网络预测,在目前的相关方法中,实现较简单、数据获取和处理清晰、简洁,实用性强。

The invention discloses a pedestrian trajectory prediction method from the first-person perspective, which adopts an encoder-decoder structure combined with a recurrent convolutional network to predict pedestrian trajectories in first-person view. The original images are encoded into a feature vector of pedestrian trajectory information, and this feature vector is then decoded to predict the future trajectories of the pedestrians. On both public datasets and a dataset collected by the inventors, the invention accurately predicts the trajectories of multiple pedestrians over the next 10 frames; the L2 distance error between the final predicted trajectory and the final actual trajectory improves to 40, about 30 pixels more accurate than existing methods. The invention proposes a spatio-temporal convolutional recurrent network method for predicting pedestrian trajectories, which uses one-dimensional convolutions for encoding and decoding and makes its predictions through the spatio-temporal convolutional network. Among current related methods it is comparatively simple to implement, its data acquisition and processing are clear and concise, and it is highly practical.

Description

一种第一视角下的行人轨迹预测方法A Pedestrian Trajectory Prediction Method from the First View

技术领域Technical Field

本发明涉及一种行人轨迹预测方法,特别是一种第一视角下的行人轨迹预测方法。The invention relates to a method for predicting pedestrian trajectories, in particular to a method for predicting pedestrian trajectories from a first perspective.

背景技术Background Art

在自动驾驶和机器人技术蓬勃发展的今天，通过车载摄像头获取车辆周围环境信息，预测视频中行人轨迹信息，控制车辆驾驶行为，做出更加合理的路径规划以进行障碍物、行人规避，是一个十分重要的任务。Today, with autonomous driving and robotics developing rapidly, it is a very important task to obtain information about the vehicle's surroundings through the on-board camera, predict the trajectories of pedestrians in the video, control the vehicle's driving behaviour, and plan more reasonable paths to avoid obstacles and pedestrians.

非第一视角，如监控摄像头下行人的轨迹预测不必考虑摄像头自身的运动对于行人轨迹预测的影响，比如监控摄像头里行人检测框在视频中越来越大表明行人走向摄像头。但第一视角区别监控视频等固定拍摄的视频，机器人或拍摄者自身的运动，直接影响视频中行人信息的获取和预测。第一视角属于运动视角，拍摄者自身也在运动，这种运动会影响对于行人的未来行为的判断，比如在第一视角下，行人越来越大，那么就不能确定行人是向摄像头运动还是摄像头在靠近行人，行人轨迹预测也是不准确的。From a non-first-person perspective, such as a fixed surveillance camera, pedestrian trajectory prediction does not need to account for the influence of the camera's own motion: for example, a pedestrian detection frame growing larger in surveillance footage simply indicates that the pedestrian is walking toward the camera. The first-person perspective, however, differs from fixed-shot video such as surveillance footage, because the motion of the robot or of the person filming directly affects how pedestrian information in the video is acquired and predicted. The first-person perspective is a moving viewpoint: the camera wearer is also in motion, and this motion interferes with judging a pedestrian's future behaviour. For instance, if a pedestrian appears larger and larger in a first-person view, one cannot tell whether the pedestrian is moving toward the camera or the camera is approaching the pedestrian, so the pedestrian trajectory prediction becomes inaccurate.

发明内容Summary of the Invention

为解决现有技术存在的上述问题,本发明要提出一种能提升第一视角下行人的轨迹预测精度的第一视角下的行人轨迹预测方法。In order to solve the above-mentioned problems in the prior art, the present invention proposes a pedestrian trajectory prediction method in the first perspective that can improve the trajectory prediction accuracy of pedestrians in the first perspective.

本发明的思路是：基于编解码结构的模型，通过引入行人位置信息、自我运动历史信息和循环卷积网络预测行人未来的轨迹，并通过加入自我运动的未来信息来有效的提升视频中行人未来轨迹预测的精度。The idea of the present invention is: based on an encoder-decoder model, the future trajectory of pedestrians is predicted by introducing pedestrian position information, ego-motion history information and a recurrent convolutional network, and the accuracy of predicting future pedestrian trajectories in the video is effectively improved by adding the future information of the ego-motion.

为了实现上述目的,本发明的技术方案如下:一种第一视角下的行人轨迹预测方法,包括以下步骤:In order to achieve the above object, the technical solution of the present invention is as follows: a pedestrian trajectory prediction method under the first viewing angle, comprising the following steps:

A、网络编码器编码得到轨迹特征A. Network encoder encoding to obtain trajectory features

A1、行人头部佩戴或手持运动摄像头,实时获取第一视角下的录取的视频;A1. Pedestrians wear or hold motion cameras on their heads to obtain recorded videos from the first perspective in real time;

A2、把视频按照每秒k帧的帧率分成若干幅图像,k的范围是5~20;A2. Divide the video into several images at a frame rate of k frames per second, where k ranges from 5 to 20;
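For illustration, a minimal Python sketch of this frame-rate down-sampling step, assuming OpenCV is available and the recording is a local video file (the function name sample_frames is hypothetical):

```python
import cv2

def sample_frames(video_path, k=10):
    """Read a first-person video and keep roughly k frames per second."""
    cap = cv2.VideoCapture(video_path)
    src_fps = cap.get(cv2.CAP_PROP_FPS) or 30.0   # fall back to 30 fps if unknown
    step = max(int(round(src_fps / k)), 1)        # keep every `step`-th frame
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            frames.append(frame)
        idx += 1
    cap.release()
    return frames
```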

A3、通过处理器处理步骤A2中分好的图像，经过以下步骤获取行人位置特征向量：A3. The processor processes the images obtained in step A2 and derives the pedestrian position feature vector through the following steps:

A31、通过标记工具对图像中行人进行标记,标记出行人检测框;A31. Use the marking tool to mark the pedestrians in the image, and mark the pedestrian detection frame;

A32、通过时间窗口采样算法对步骤A31中标记的行人检测框进行矫正。由于图像空间中坐标原点在图像的左上角，横轴坐标x值从左向右递增，纵轴坐标y值从上到下递增，所以取行人检测框左上角位置信息(x_i^min, y_i^min)^T及右下角位置信息(x_i^max, y_i^max)^T作为行人轨迹数据。将连续n帧所包括的所有行人的轨迹序列作为一组训练样本，n的范围是10~20，每个行人的训练样本记作L_in：A32. Correct the pedestrian detection frames marked in step A31 with a time-window sampling algorithm. Since the coordinate origin of the image space is at the upper-left corner of the image, with the horizontal x value increasing from left to right and the vertical y value increasing from top to bottom, the upper-left corner position (x_i^min, y_i^min)^T and the lower-right corner position (x_i^max, y_i^max)^T of the pedestrian detection frame are taken as the pedestrian trajectory data. The trajectory sequences of all pedestrians contained in n consecutive frames form one group of training samples, with n in the range 10-20; the training sample of each pedestrian is denoted L_in:

其中，l_i = (x_i^min, y_i^min, x_i^max, y_i^max) ∈ R^4，i的取值范围是t_current−T_his~t_current。t_current为当前时刻，T_his表示历史帧范围，T_his取值为5~20。Wherein, l_i = (x_i^min, y_i^min, x_i^max, y_i^max) ∈ R^4, and i ranges from t_current−T_his to t_current. t_current is the current moment, T_his denotes the historical frame range, and T_his takes a value of 5-20.
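A short NumPy sketch of assembling the sliding-window training samples L_in described in step A32, assuming the per-frame boxes of one pedestrian are already available as an array (the helper name make_samples is illustrative):

```python
import numpy as np

def make_samples(boxes, T_his=10):
    """boxes: array of shape (num_frames, 4) holding
    (x_min, y_min, x_max, y_max) for one pedestrian per frame.
    Returns sliding windows of length T_his, one training sample L_in per window."""
    samples = []
    for t in range(T_his, len(boxes) + 1):
        samples.append(boxes[t - T_his:t])        # frames t-T_his .. t-1
    return np.stack(samples) if samples else np.empty((0, T_his, 4))

# e.g. samples = make_samples(boxes, T_his=10)  -> shape (num_samples, T_his, 4)
```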

A33、构建行人位置特征提取卷积网络，对行人位置和检测框大小处理得到行人位置特征向量L_in^F：A33. Construct a pedestrian position feature extraction convolutional network and process the pedestrian positions and detection-frame sizes to obtain the pedestrian position feature vector L_in^F:

L_in^F = (lf_1, ..., lf_m),

其中，lf_i表示行人位置特征的第i个特征值。where lf_i denotes the i-th feature value of the pedestrian position feature.

所述行人位置特征提取卷积网络结构采用4层结构，先把输入数据L_in输入第一层Conv1d一维卷积层，第一层Conv1d一维卷积层输出结果输入第二层Conv1d一维卷积层，第二层Conv1d一维卷积层输出结果输入第三层Conv1d一维卷积层，第三层Conv1d一维卷积层输出结果输入第四层Conv1d一维卷积层，第四层Conv1d一维卷积层输出结果得到特征向量L_in^F；每层的输出结果都进行BN批量归一化处理并经过Relu激活函数激活。The pedestrian position feature extraction convolutional network has a 4-layer structure: the input data L_in is fed into the first Conv1d one-dimensional convolutional layer, the output of the first Conv1d layer is fed into the second Conv1d layer, the output of the second Conv1d layer is fed into the third Conv1d layer, and the output of the third Conv1d layer is fed into the fourth Conv1d layer, whose output gives the feature vector L_in^F. The output of every layer is batch-normalized (BN) and activated by the ReLU activation function.
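A minimal PyTorch sketch of such a four-layer Conv1d encoder with batch normalization and ReLU after every layer; the channel widths and kernel sizes are not specified in the patent and are assumed here:

```python
import torch
import torch.nn as nn

class TrajEncoder(nn.Module):
    """Four stacked Conv1d layers, each followed by BatchNorm and ReLU,
    mapping the (x_min, y_min, x_max, y_max) sequence to a feature map."""
    def __init__(self, in_ch=4, hid=32, out_ch=64):
        super().__init__()
        layers = []
        chans = [in_ch, hid, hid, hid, out_ch]          # illustrative channel sizes
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            layers += [nn.Conv1d(c_in, c_out, kernel_size=3, padding=1),
                       nn.BatchNorm1d(c_out),
                       nn.ReLU(inplace=True)]
        self.net = nn.Sequential(*layers)

    def forward(self, x):          # x: (batch, 4, T_his)
        return self.net(x)         # (batch, out_ch, T_his)
```

The same architecture can be reused for the ego-motion history and future ego-motion feature networks of steps A42 and A62, only with a 6-channel input.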

A4、获取摄像头自我运动历史特征向量;A4. Obtain the camera self-motion history feature vector;

A41、通过Structure From Motion算法获得当前帧图像相对于前一帧图像的摄像头自我运动信息。所述摄像头自我运动信息包括摄像头自身旋转信息的欧拉角r_t ∈ R^3和速度信息v_t ∈ R^3。所述欧拉角包括偏航角ψ、滚转角Φ和俯仰角θ，所述速度信息包括摄像头即时速度在三维坐标轴上的投影v_x, v_y, v_z。摄像头自我运动历史特征向量记作E_H；A41. Obtain the camera self-motion information of the current frame relative to the previous frame through the Structure From Motion algorithm. The camera self-motion information includes the Euler angles r_t ∈ R^3 of the camera's own rotation and the velocity information v_t ∈ R^3. The Euler angles comprise the yaw angle ψ, roll angle Φ and pitch angle θ, and the velocity information comprises the projections v_x, v_y, v_z of the camera's instantaneous velocity on the three coordinate axes. The camera self-motion history feature vector is denoted E_H;

其中，e_t = (r_t^T, v_t^T)^T ∈ R^6，t的取值范围是t_current−T_his~t_current。Wherein, e_t = (r_t^T, v_t^T)^T ∈ R^6, and t ranges from t_current−T_his to t_current.
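A hedged sketch of assembling e_t from the relative camera pose returned by an SfM / visual-odometry backend (the backend itself, the Euler-angle axis order, and the frame interval dt are assumptions, not taken from the patent):

```python
import numpy as np
from scipy.spatial.transform import Rotation

def ego_motion_vector(R_rel, t_rel, dt):
    """Build e_t = (psi, phi, theta, vx, vy, vz) from the relative pose
    (R_rel, t_rel) between two consecutive frames, as produced by an
    SfM / visual-odometry backend (the backend is assumed to exist)."""
    # axis convention "zxy" is an assumption: yaw about z, roll about x, pitch about y
    yaw, roll, pitch = Rotation.from_matrix(R_rel).as_euler("zxy")
    v = np.asarray(t_rel, dtype=float) / dt            # instantaneous velocity vx, vy, vz
    return np.concatenate([[yaw, roll, pitch], v])      # shape (6,)
```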

A42、构建4层摄像头自我运动历史特征提取卷积网络，提取摄像头自我运动历史信息E_H的特征，得到摄像头自我运动历史特征向量E_H^F；A42. Construct a 4-layer convolutional network for camera self-motion history feature extraction, extract the features of the camera self-motion history information E_H, and obtain the camera self-motion history feature vector E_H^F;

E_H^F = (ef_1, ..., ef_n)

其中，ef_i表示摄像头自我运动历史特征的第i个特征值。where ef_i denotes the i-th feature value of the camera self-motion history feature.

所述摄像头自我运动历史特征提取卷积网络结构采用行人位置特征提取卷积网络相同的结构。The convolutional network structure for extracting self-motion history features of the camera adopts the same structure as the pedestrian position feature extraction convolutional network.

A5、将L_in^F和E_H^F两个向量首尾相连连接在一起，得到特征向量LE^F：A5. Concatenate the two vectors L_in^F and E_H^F end to end to obtain the feature vector LE^F:

LE^F = (lf_1, ..., lf_m, ef_1, ..., ef_n)

A6、获取摄像头自我运动未来特征向量;A6. Obtain the future feature vector of the camera's self-motion;

A61、采用与步骤A41相同的方法，通过Structure From Motion算法得到运动摄像头自我运动未来的T_future帧运动信息，范围是5~20，表示运动摄像头未来打算去往何方，记作E_Fur；A61. Using the same method as step A41, obtain through the Structure From Motion algorithm the motion information of the moving camera's own future T_future frames (T_future ranges from 5 to 20), which indicates where the moving camera intends to go in the future, and denote it E_Fur;

A62、构建摄像头自我运动未来特征提取卷积网络CNN，提取摄像头未来自我运动信息E_Fur的特征，得到摄像头自我运动未来特征向量E_Fur^F：A62. Construct a convolutional network CNN for camera self-motion future feature extraction, extract the features of the camera's future self-motion information E_Fur, and obtain the camera self-motion future feature vector E_Fur^F:

E_Fur^F = (eff_1, ..., eff_n)

其中，eff_i表示摄像头自我运动未来特征的第i个特征值。where eff_i denotes the i-th feature value of the camera self-motion future feature.

所述摄像头未来自我运动特征提取卷积网络结构与步骤A42的摄像头自我运动历史特征提取卷积网络结构相同。The structure of the convolutional network for extracting the camera's future ego-motion features is the same as the convolutional network structure for extracting the camera's ego-motion history features in step A42.

B、网络解码器解码预测行人未来轨迹B. The network decoder decodes and predicts the future trajectory of pedestrians

在网络解码器结构中对网络编码器输出的行人位置信息特征和自我运动信息特征进行解码,通过反卷积网络得到行人未来轨迹。为了提高预测精度,加入能代表未来走向的运动摄像头自身的未来运动信息,具体步骤如下:In the network decoder structure, the pedestrian position information features and self-motion information features output by the network encoder are decoded, and the future trajectory of pedestrians is obtained through the deconvolution network. In order to improve the prediction accuracy, the future motion information of the motion camera itself that can represent the future trend is added. The specific steps are as follows:

B1、构建标准循环神经网络RNN,单元数为n;B1. Construct a standard recurrent neural network RNN with n units;

B2、将特征向量LE^F和E_Fur^F作为循环神经网络RNN的两个输入，得到网络的输出为预测序列L_out。B2. Take the feature vectors LE^F and E_Fur^F as the two inputs of the recurrent neural network RNN; the output of the network is the prediction sequence L_out.
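A minimal PyTorch sketch of step B2. The patent does not spell out how the two inputs are wired into the RNN; one plausible reading, shown here, is to repeat LE^F over the future horizon and concatenate it with the future ego-motion feature at every time step:

```python
import torch
import torch.nn as nn

class TrajRNN(nn.Module):
    """Standard RNN that fuses the encoded history feature LE^F with the
    future ego-motion feature E_Fur^F.  The exact wiring of the two inputs
    is an assumption; here they are concatenated per future time step."""
    def __init__(self, le_dim, ef_dim, hidden=128):
        super().__init__()
        self.rnn = nn.RNN(le_dim + ef_dim, hidden, batch_first=True)

    def forward(self, le_f, e_fur_f):
        # le_f:    (batch, T_future, le_dim)  history feature repeated per step
        # e_fur_f: (batch, T_future, ef_dim)  future ego-motion feature
        x = torch.cat([le_f, e_fur_f], dim=-1)
        out, _ = self.rnn(x)                  # L_out: (batch, T_future, hidden)
        return out
```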

B3、构建解码轨迹序列反卷积网络;B3. Construct a deconvolution network for decoding trajectory sequences;

解码轨迹序列反卷积网络结构采用4层结构，先将预测序列L_out输入第一层Conv1d一维卷积层，第一层Conv1d一维卷积层输出结果输入第二层Conv1d一维卷积层，第二层Conv1d一维卷积层输出结果输入第三层Conv1d一维卷积层，前三层的输出结果都是先进行BN批量归一化处理并经过Relu激活函数激活。最后把第三层Conv1d一维卷积层输出结果输入第四层Conv1d一维卷积层。The deconvolution network for decoding the trajectory sequence has a 4-layer structure: the prediction sequence L_out is fed into the first Conv1d one-dimensional convolutional layer, the output of the first Conv1d layer is fed into the second Conv1d layer, and the output of the second Conv1d layer is fed into the third Conv1d layer; the outputs of the first three layers are each batch-normalized (BN) and activated by the ReLU activation function. Finally, the output of the third Conv1d layer is fed into the fourth Conv1d layer.
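A PyTorch sketch of this four-layer decoder, with BN + ReLU after the first three layers and a plain final layer; channel sizes are assumed:

```python
import torch.nn as nn

class TrajDecoder(nn.Module):
    """Decoder with four Conv1d layers; BN + ReLU after the first three,
    a plain final layer that outputs the 4 box coordinates per future frame."""
    def __init__(self, in_ch=128, hid=64, out_ch=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_ch, hid, 3, padding=1), nn.BatchNorm1d(hid), nn.ReLU(inplace=True),
            nn.Conv1d(hid, hid, 3, padding=1),   nn.BatchNorm1d(hid), nn.ReLU(inplace=True),
            nn.Conv1d(hid, hid, 3, padding=1),   nn.BatchNorm1d(hid), nn.ReLU(inplace=True),
            nn.Conv1d(hid, out_ch, 3, padding=1),
        )

    def forward(self, l_out):      # l_out: (batch, in_ch, T_future)
        return self.net(l_out)     # L_pre: (batch, 4, T_future)
```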

B4、对预测序列L_out输入步骤B3构建好的反卷积网络，得到行人未来的检测框大小信息和轨迹信息L_pre：B4. Feed the prediction sequence L_out into the deconvolution network constructed in step B3 to obtain the pedestrian's future detection-frame size and trajectory information L_pre:

其中，l_i = (x_i^min, y_i^min, x_i^max, y_i^max) ∈ R^4，i的取值范围是t_current+1~t_current+T_future，t_current为当前时刻，T_future为预测的未来帧数，范围是5~20。Wherein, l_i = (x_i^min, y_i^min, x_i^max, y_i^max) ∈ R^4, i ranges from t_current+1 to t_current+T_future, t_current is the current moment, and T_future is the number of predicted future frames, in the range 5-20.

与现有技术相比,本发明具有以下有益效果:Compared with the prior art, the present invention has the following beneficial effects:

1、本发明采用编解码结构结合循环卷积网络来预测第一视角下行人轨迹策略。原始图像经过步骤A编码得到的行人轨迹信息的特征向量，在步骤B中进行解码特征向量，预测出未来的行人的轨迹信息。在公共数据集和自己采集到的数据集里，本发明都会准确的预测出多个行人的未来10帧的轨迹信息，最终预测轨迹和最终实际轨迹之间的L2距离误差提高到40，比现有方法提高了30个像素精度。1. The present invention uses an encoder-decoder structure combined with a recurrent convolutional network to predict pedestrian trajectories in the first-person view. The feature vector of pedestrian trajectory information obtained by encoding the original images in step A is decoded in step B to predict the future trajectory information of the pedestrians. On both public datasets and a self-collected dataset, the invention accurately predicts the trajectories of multiple pedestrians over the next 10 frames; the L2 distance error between the final predicted trajectory and the final actual trajectory improves to 40, about 30 pixels more accurate than existing methods.
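For clarity, the reported error can be read as the pixel L2 distance between the predicted and ground-truth boxes at the last predicted frame; a small sketch of that metric (whether the patent measures box centres or corners is not stated, so centres are an assumption):

```python
import numpy as np

def final_l2_error(pred_boxes, true_boxes):
    """L2 distance (pixels) between the centres of the predicted and
    ground-truth box at the last predicted frame."""
    def centre(b):                      # b = (x_min, y_min, x_max, y_max)
        return np.array([(b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0])
    return float(np.linalg.norm(centre(pred_boxes[-1]) - centre(true_boxes[-1])))
```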

2、本发明提出了预测行人轨迹的时空卷积循环网络方法，利用一维卷积进行编解码处理，通过时空卷积网络预测，在目前的相关方法中，实现较简单、数据获取和处理清晰、简洁，实用性强。2. The present invention proposes a spatio-temporal convolutional recurrent network method for predicting pedestrian trajectories, which uses one-dimensional convolutions for encoding and decoding and makes its predictions through the spatio-temporal convolutional network. Among current related methods, it is comparatively simple to implement, its data acquisition and processing are clear and concise, and it is highly practical.

附图说明Description of the Drawings

本发明共有附图6张,其中:The present invention has 6 accompanying drawings, wherein:

图1是标记工具标记后的图像。Figure 1 is the image after marking with the marking tool.

图2是标记了t0时刻行人的历史和未来位置和检测框信息。Figure 2 marks the pedestrian's historical and future positions and detection-frame information at time t_0.

图3是t0+10时刻行人的历史和未来位置和检测框信息。Figure 3 shows the pedestrian's historical and future positions and detection-frame information at time t_0+10.

图4是本发明的流程图。Fig. 4 is a flowchart of the present invention.

图5是本发明中使用的卷积网络结构图。FIG. 5 is a structural diagram of a convolutional network used in the present invention.

图6是本发明中使用的反卷积网络结构图。Fig. 6 is a structure diagram of a deconvolution network used in the present invention.

具体实施方式Detailed Description of the Embodiments

下面结合附图对本发明进行进一步地描述。按照图4所示的流程对第一视角图像进行计算，首先用运动摄像头在运动中获取摄像头图像，将n帧图片作为第一视角行人预测的原始图像。按照本发明的步骤A31对原始图像进行处理得到标记后的图像，如图1所示。此处，根据标记工具的精度需要对标记结果进行矫正。The present invention is further described below with reference to the accompanying drawings. The first-person-view images are processed according to the flow shown in Figure 4: first, a moving camera captures images while in motion, and n frames are taken as the original images for first-person-view pedestrian prediction. The original images are processed according to step A31 of the present invention to obtain the marked images, as shown in Figure 1. Here, the marking results need to be corrected according to the precision of the marking tool.

按照本发明的步骤A、B得到轨迹预测结果。为了直观显示预测效果，将预测轨迹、真实轨迹和历史轨迹均标记到图像上。假定图2是t0时刻的图像，在图2上用三角形标识标记行人在t0时刻之后10秒轨迹预测结果、用四角星形标识标记该行人的t0时刻之后10秒真实轨迹、用菱形标识标记t0时刻之前10秒真实历史轨迹，如图2所示。图3是t0+10时刻的图像。经过对比图2和图3可以看到，图2中三角形标识代表的t0时刻行人轨迹预测结果和菱形标识代表的未来真实行人轨迹行进趋势一致，并且两条轨迹坐标点偏差值很小。在图3中方框中心是t0+10时刻的行人所在真实位置中心点，该点已经在图2中t0时刻上的预测轨迹中预测出，即图2中三角形标识轨迹最左边一个三角形点。分析预测结果可以看到根据本发明的方法能够准确地预测出行人的未来轨迹。According to steps A and B of the present invention, the trajectory prediction result is obtained. To visualise the prediction effect, the predicted trajectory, the real trajectory and the historical trajectory are all drawn on the image. Assume that Figure 2 is the image at time t_0; in Figure 2 the triangle markers denote the predicted trajectory of the pedestrian for the 10 seconds after t_0, the four-pointed-star markers denote the pedestrian's real trajectory for the 10 seconds after t_0, and the diamond markers denote the real historical trajectory for the 10 seconds before t_0, as shown in Figure 2. Figure 3 is the image at time t_0+10. Comparing Figure 2 with Figure 3 shows that the pedestrian trajectory predicted at time t_0, represented by the triangle markers in Figure 2, is consistent with the direction of the future real pedestrian trajectory represented by the diamond markers, and the deviation between the coordinate points of the two trajectories is very small. In Figure 3 the centre of the box is the centre of the pedestrian's real position at time t_0+10, which was already predicted in the trajectory predicted at time t_0 in Figure 2, namely the leftmost triangle point of the triangle-marked trajectory in Figure 2. Analysing the prediction results shows that the method of the present invention can accurately predict the future trajectory of pedestrians.
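A small matplotlib sketch of overlaying the three trajectories on a frame with the marker convention of Figures 2-3 (the helper name and the use of box centres as trajectory points are illustrative):

```python
import matplotlib.pyplot as plt

def draw_trajectories(image, history, prediction, ground_truth):
    """Overlay the three trajectories on a frame: diamonds = history,
    triangles = prediction, stars = ground truth.  Each trajectory is a
    list of (x, y) centre points in image coordinates."""
    plt.imshow(image)
    plt.plot(*zip(*history),      "D-", label="history")
    plt.plot(*zip(*prediction),   "^-", label="prediction")
    plt.plot(*zip(*ground_truth), "*-", label="ground truth")
    plt.legend()
    plt.axis("off")
    plt.show()
```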

本发明不局限于本实施例,任何在本发明披露的技术范围内的等同构思或者改变,均列为本发明的保护范围。The present invention is not limited to this embodiment, and any equivalent ideas or changes within the technical scope disclosed in the present invention are listed in the protection scope of the present invention.

Claims (1)

1.一种第一视角下的行人轨迹预测方法，其特征在于：包括以下步骤：1. A pedestrian trajectory prediction method under a first viewing angle, characterized in that it comprises the following steps:
A、网络编码器编码得到轨迹特征A. Network encoder encoding to obtain trajectory features
A1、行人头部佩戴或手持运动摄像头，实时获取第一视角下的录取的视频；A1. Pedestrians wear or hold motion cameras on their heads to obtain recorded videos from the first perspective in real time;
A2、把视频按照每秒k帧的帧率分成若干幅图像，k的范围是5~20；A2. Divide the video into several images at a frame rate of k frames per second, where k ranges from 5 to 20;
A3、通过处理器处理步骤A2中分好的图像，经过以下步骤获取行人位置特征向量：A3. The processor processes the images obtained in step A2 and derives the pedestrian position feature vector through the following steps:
A31、通过标记工具对图像中行人进行标记，标记出行人检测框；A31. Use the marking tool to mark the pedestrians in the image, and mark the pedestrian detection frame;
A32、通过时间窗口采样算法对步骤A31中标记的行人检测框进行矫正；由于图像空间中坐标原点在图像的左上角，横轴坐标x值从左向右递增，纵轴坐标y值从上到下递增，所以取行人检测框左上角位置信息(x_i^min, y_i^min)^T及右下角位置信息(x_i^max, y_i^max)^T作为行人轨迹数据；将连续n帧所包括的所有行人的轨迹序列作为一组训练样本，n的范围是10~20，每个行人的训练样本记作L_in：A32. Correct the pedestrian detection frames marked in step A31 with a time-window sampling algorithm; since the coordinate origin of the image space is at the upper-left corner of the image, with the horizontal x value increasing from left to right and the vertical y value increasing from top to bottom, the upper-left corner position (x_i^min, y_i^min)^T and the lower-right corner position (x_i^max, y_i^max)^T of the pedestrian detection frame are taken as the pedestrian trajectory data; the trajectory sequences of all pedestrians contained in n consecutive frames form one group of training samples, with n in the range 10-20, and the training sample of each pedestrian is denoted L_in:
其中，l_i = (x_i^min, y_i^min, x_i^max, y_i^max) ∈ R^4，i的取值范围是t_current−T_his~t_current；t_current为当前时刻，T_his表示历史帧范围，T_his取值为5~20；wherein l_i = (x_i^min, y_i^min, x_i^max, y_i^max) ∈ R^4, i ranges from t_current−T_his to t_current; t_current is the current moment, T_his denotes the historical frame range, and T_his takes a value of 5-20;
A33、构建行人位置特征提取卷积网络，对行人位置和检测框大小处理得到行人位置特征向量L_in^F：A33. Construct a pedestrian position feature extraction convolutional network and process the pedestrian positions and detection-frame sizes to obtain the pedestrian position feature vector L_in^F:
L_in^F = (lf_1, ..., lf_m),
其中，lf_i表示行人位置特征的第i个特征值；where lf_i denotes the i-th feature value of the pedestrian position feature;
所述行人位置特征提取卷积网络结构采用4层结构，先把输入数据L_in输入第一层Conv1d一维卷积层，第一层Conv1d一维卷积层输出结果输入第二层Conv1d一维卷积层，第二层Conv1d一维卷积层输出结果输入第三层Conv1d一维卷积层，第三层Conv1d一维卷积层输出结果输入第四层Conv1d一维卷积层，第四层Conv1d一维卷积层输出结果得到特征向量L_in^F；每层的输出结果都进行BN批量归一化处理并经过Relu激活函数激活；The pedestrian position feature extraction convolutional network has a 4-layer structure: the input data L_in is fed into the first Conv1d one-dimensional convolutional layer, the output of the first Conv1d layer is fed into the second Conv1d layer, the output of the second Conv1d layer is fed into the third Conv1d layer, and the output of the third Conv1d layer is fed into the fourth Conv1d layer, whose output gives the feature vector L_in^F; the output of every layer is batch-normalized (BN) and activated by the ReLU activation function;
A4、获取摄像头自我运动历史特征向量；A4. Obtain the camera self-motion history feature vector;
A41、通过Structure From Motion算法获得当前帧图像相对于前一帧图像的摄像头自我运动信息；所述摄像头自我运动信息包括摄像头自身旋转信息的欧拉角r_t ∈ R^3和速度信息v_t ∈ R^3；所述欧拉角包括偏航角ψ、滚转角φ和俯仰角θ，所述速度信息包括摄像头即时速度在三维坐标轴上的投影v_x, v_y, v_z；摄像头自我运动历史特征向量记作E_H；A41. Obtain the camera self-motion information of the current frame relative to the previous frame through the Structure From Motion algorithm; the camera self-motion information includes the Euler angles r_t ∈ R^3 of the camera's own rotation and the velocity information v_t ∈ R^3; the Euler angles comprise the yaw angle ψ, roll angle φ and pitch angle θ, and the velocity information comprises the projections v_x, v_y, v_z of the camera's instantaneous velocity on the three coordinate axes; the camera self-motion history feature vector is denoted E_H;
其中，e_t = (r_t^T, v_t^T)^T ∈ R^6，t的取值范围是t_current−T_his~t_current；wherein e_t = (r_t^T, v_t^T)^T ∈ R^6, and t ranges from t_current−T_his to t_current;
A42、构建4层摄像头自我运动历史特征提取卷积网络，提取摄像头自我运动历史信息E_H的特征，得到摄像头自我运动历史特征向量E_H^F；A42. Construct a 4-layer convolutional network for camera self-motion history feature extraction, extract the features of the camera self-motion history information E_H, and obtain the camera self-motion history feature vector E_H^F;
E_H^F = (ef_1, ..., ef_n)
其中，ef_i表示摄像头自我运动历史特征的第i个特征值；where ef_i denotes the i-th feature value of the camera self-motion history feature;
所述摄像头自我运动历史特征提取卷积网络结构采用行人位置特征提取卷积网络相同的结构；The convolutional network for camera self-motion history feature extraction adopts the same structure as the pedestrian position feature extraction convolutional network;
A5、将L_in^F和E_H^F两个向量首尾相连连接在一起，得到特征向量LE^F：A5. Concatenate the two vectors L_in^F and E_H^F end to end to obtain the feature vector LE^F:
LE^F = (lf_1, ..., lf_m, ef_1, ..., ef_n)
A6、获取摄像头自我运动未来特征向量；A6. Obtain the future feature vector of the camera's self-motion;
A61、采用与步骤A41相同的方法，通过Structure From Motion算法得到运动摄像头自我运动未来的T_future帧运动信息，表示运动摄像头未来打算去往何方，记作E_Fur；A61. Using the same method as step A41, obtain through the Structure From Motion algorithm the motion information of the moving camera's own future T_future frames, which indicates where the moving camera intends to go in the future, and denote it E_Fur;
A62、构建摄像头自我运动未来特征提取卷积网络CNN，提取摄像头未来自我运动信息E_Fur的特征，得到摄像头自我运动未来特征向量E_Fur^F：A62. Construct a convolutional network CNN for camera self-motion future feature extraction, extract the features of the camera's future self-motion information E_Fur, and obtain the camera self-motion future feature vector E_Fur^F:
E_Fur^F = (eff_1, ..., eff_n)
其中，eff_i表示摄像头自我运动未来特征的第i个特征值；where eff_i denotes the i-th feature value of the camera self-motion future feature;
所述摄像头未来自我运动特征提取卷积网络结构与步骤A42的摄像头自我运动历史特征提取卷积网络结构相同；The convolutional network for the camera's future self-motion feature extraction has the same structure as the convolutional network for camera self-motion history feature extraction in step A42;
B、网络解码器解码预测行人未来轨迹B. The network decoder decodes and predicts the future trajectory of pedestrians
在网络解码器结构中对网络编码器输出的行人位置信息特征和自我运动信息特征进行解码，通过反卷积网络得到行人未来轨迹；为了提高预测精度，加入能代表未来走向的运动摄像头自身的未来运动信息，具体步骤如下：In the network decoder, the pedestrian position features and self-motion features output by the network encoder are decoded, and the future pedestrian trajectory is obtained through the deconvolution network; to improve the prediction accuracy, the future motion information of the moving camera itself, which represents the future direction of travel, is added; the specific steps are as follows:
B1、构建标准循环神经网络RNN，单元数为n；B1. Construct a standard recurrent neural network RNN with n units;
B2、将特征向量LE^F和E_Fur^F作为循环神经网络RNN的两个输入，得到网络的输出为预测序列L_out；B2. Take the feature vectors LE^F and E_Fur^F as the two inputs of the recurrent neural network RNN; the output of the network is the prediction sequence L_out;
B3、构建解码轨迹序列反卷积网络；B3. Construct a deconvolution network for decoding the trajectory sequence;
解码轨迹序列反卷积网络结构采用4层结构，先将预测序列L_out输入第一层Conv1d一维卷积层，第一层Conv1d一维卷积层输出结果输入第二层Conv1d一维卷积层，第二层Conv1d一维卷积层输出结果输入第三层Conv1d一维卷积层，前三层的输出结果都是先进行BN批量归一化处理并经过Relu激活函数激活；最后把第三层Conv1d一维卷积层输出结果输入第四层Conv1d一维卷积层；The deconvolution network for decoding the trajectory sequence has a 4-layer structure: the prediction sequence L_out is fed into the first Conv1d one-dimensional convolutional layer, the output of the first Conv1d layer is fed into the second Conv1d layer, and the output of the second Conv1d layer is fed into the third Conv1d layer; the outputs of the first three layers are each batch-normalized (BN) and activated by the ReLU activation function; finally, the output of the third Conv1d layer is fed into the fourth Conv1d layer;
B4、对预测序列L_out输入步骤B3构建好的反卷积网络，得到行人未来的检测框大小信息和轨迹信息L_pre：B4. Feed the prediction sequence L_out into the deconvolution network constructed in step B3 to obtain the pedestrian's future detection-frame size and trajectory information L_pre:
其中，l_i = (x_i^min, y_i^min, x_i^max, y_i^max) ∈ R^4，i的取值范围是t_current+1~t_current+T_future，t_current为当前时刻，T_future为预测的未来帧数，范围是5~20。Wherein, l_i = (x_i^min, y_i^min, x_i^max, y_i^max) ∈ R^4, i ranges from t_current+1 to t_current+T_future, t_current is the current moment, and T_future is the number of predicted future frames, in the range 5-20.
CN201910807214.2A 2019-08-29 2019-08-29 Method for predicting pedestrian track at first view angle Active CN110516613B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910807214.2A CN110516613B (en) 2019-08-29 2019-08-29 Method for predicting pedestrian track at first view angle

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910807214.2A CN110516613B (en) 2019-08-29 2019-08-29 Method for predicting pedestrian track at first view angle

Publications (2)

Publication Number Publication Date
CN110516613A true CN110516613A (en) 2019-11-29
CN110516613B CN110516613B (en) 2023-04-18

Family

ID=68629021

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910807214.2A Active CN110516613B (en) 2019-08-29 2019-08-29 Method for predicting pedestrian track at first view angle

Country Status (1)

Country Link
CN (1) CN110516613B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114116944A (en) * 2021-11-30 2022-03-01 重庆七腾科技有限公司 Trajectory prediction method and device based on time attention convolution network
CN114581487A (en) * 2021-08-02 2022-06-03 北京易航远智科技有限公司 Pedestrian trajectory prediction method and device, electronic equipment and computer program product

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160379074A1 (en) * 2015-06-25 2016-12-29 Appropolis Inc. System and a method for tracking mobile objects using cameras and tag devices
CN108875588A (en) * 2018-05-25 2018-11-23 武汉大学 Across camera pedestrian detection tracking based on deep learning
CN109063581A (en) * 2017-10-20 2018-12-21 奥瞳系统科技有限公司 Enhanced Face datection and face tracking method and system for limited resources embedded vision system

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160379074A1 (en) * 2015-06-25 2016-12-29 Appropolis Inc. System and a method for tracking mobile objects using cameras and tag devices
CN109063581A (en) * 2017-10-20 2018-12-21 奥瞳系统科技有限公司 Enhanced Face datection and face tracking method and system for limited resources embedded vision system
CN108875588A (en) * 2018-05-25 2018-11-23 武汉大学 Across camera pedestrian detection tracking based on deep learning

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
JO SUNYOUNG 等: ""Doppler Channel Series Prediction Using Recurrent Neural Networks"" *
张德正 等: ""基于深度卷积长短时神经网络的视频帧预测"" *
韩昭蓉 等: ""基于Bi-LSTM模型的轨迹异常点检测算法"" *
高玄 等: ""基于图像处理的人群行为识别方法综述"" *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114581487A (en) * 2021-08-02 2022-06-03 北京易航远智科技有限公司 Pedestrian trajectory prediction method and device, electronic equipment and computer program product
CN114581487B (en) * 2021-08-02 2022-11-25 北京易航远智科技有限公司 Pedestrian trajectory prediction method, device, electronic equipment and computer program product
CN114116944A (en) * 2021-11-30 2022-03-01 重庆七腾科技有限公司 Trajectory prediction method and device based on time attention convolution network
CN114116944B (en) * 2021-11-30 2024-06-11 重庆七腾科技有限公司 Track prediction method and device based on time attention convolution network

Also Published As

Publication number Publication date
CN110516613B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
US11763466B2 (en) Determining structure and motion in images using neural networks
US20220417590A1 (en) Electronic device, contents searching system and searching method thereof
WO2021139484A1 (en) Target tracking method and apparatus, electronic device, and storage medium
CN107274433B (en) Target tracking method, device and storage medium based on deep learning
WO2020173226A1 (en) Spatial-temporal behavior detection method
CN102982598B (en) Video people counting method and system based on single camera scene configuration
Manglik et al. Forecasting time-to-collision from monocular video: Feasibility, dataset, and challenges
JP7427614B2 (en) sensor calibration
KR102207195B1 (en) Apparatus and method for correcting orientation information from one or more inertial sensors
CN104899590B (en) A method and system for following an unmanned aerial vehicle visual target
CN103440667B (en) The automaton that under a kind of occlusion state, moving target is stably followed the trail of
CN108010067A (en) A kind of visual target tracking method based on combination determination strategy
CN104700088B (en) A kind of gesture track recognition method under the follow shot based on monocular vision
CN104091349A (en) Robust target tracking method based on support vector machine
Saric et al. Warp to the future: Joint forecasting of features and feature motion
CN110516613B (en) Method for predicting pedestrian track at first view angle
CN104036243A (en) Behavior recognition method based on light stream information
CN110781962A (en) Target detection method based on lightweight convolutional neural network
CN113312973A (en) Method and system for extracting features of gesture recognition key points
CN106296743A (en) A kind of adaptive motion method for tracking target and unmanned plane follow the tracks of system
CN113686314A (en) Monocular water surface target segmentation and monocular distance measurement method of shipborne camera
CN104574443B (en) The cooperative tracking method of moving target between a kind of panoramic camera
Łysakowski et al. Real-time onboard object detection for augmented reality: Enhancing head-mounted display with yolov8
CN106327528A (en) Moving object tracking method and operation method of unmanned aerial vehicle
CN104036238B (en) The method of the human eye positioning based on active light

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant