CN112800879B - Vehicle-mounted video-based front vehicle position prediction method and prediction system - Google Patents
Vehicle-mounted video-based front vehicle position prediction method and prediction system
- Publication number
- CN112800879B CN202110051940.3A
- Authority
- CN
- China
- Prior art keywords
- vehicle
- sequence
- optical flow
- front vehicle
- vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000000034 method Methods 0.000 title claims abstract description 50
- 230000003287 optical effect Effects 0.000 claims abstract description 120
- 238000001514 detection method Methods 0.000 claims abstract description 12
- 239000013598 vector Substances 0.000 claims description 127
- 239000011159 matrix material Substances 0.000 claims description 43
- 238000013519 translation Methods 0.000 claims description 27
- 238000012549 training Methods 0.000 claims description 26
- 238000012360 testing method Methods 0.000 claims description 21
- 230000004927 fusion Effects 0.000 claims description 16
- 230000000306 recurrent effect Effects 0.000 claims description 13
- 238000013528 artificial neural network Methods 0.000 claims description 12
- 230000008569 process Effects 0.000 claims description 11
- 230000006870 function Effects 0.000 claims description 10
- 230000009466 transformation Effects 0.000 claims description 7
- 230000001186 cumulative effect Effects 0.000 claims description 6
- 238000000605 extraction Methods 0.000 claims description 6
- 238000013527 convolutional neural network Methods 0.000 claims description 4
- 238000011176 pooling Methods 0.000 claims description 4
- 238000005070 sampling Methods 0.000 claims description 4
- 238000012795 verification Methods 0.000 claims description 4
- 238000000354 decomposition reaction Methods 0.000 claims description 3
- 238000012545 processing Methods 0.000 claims description 2
- 238000006243 chemical reaction Methods 0.000 claims 3
- 238000010586 diagram Methods 0.000 description 14
- 206010039203 Road traffic accident Diseases 0.000 description 5
- 238000013135 deep learning Methods 0.000 description 5
- 238000004364 calculation method Methods 0.000 description 3
- 238000010200 validation analysis Methods 0.000 description 3
- 230000004913 activation Effects 0.000 description 2
- 238000004519 manufacturing process Methods 0.000 description 2
- 238000013507 mapping Methods 0.000 description 2
- 238000012935 Averaging Methods 0.000 description 1
- 230000006399 behavior Effects 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 238000001914 filtration Methods 0.000 description 1
- 230000007774 longterm Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/29—Graphical models, e.g. Bayesian networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Molecular Biology (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a method for predicting the position of the vehicle ahead based on vehicle-mounted video, which includes: constructing a vehicle position prediction model based on an encoder-decoder framework, which predicts the position and scale of the preceding vehicle from the historical bounding boxes of the preceding vehicle, the optical flow inside those bounding boxes, and the predicted motion information of the ego vehicle; constructing a sample set and training the vehicle position prediction model; acquiring vehicle-mounted video; performing vehicle detection and tracking on the video frames and computing optical flow to obtain the bounding box sequence and the optical flow sequence of the preceding vehicle; predicting the motion information of the ego vehicle to form a motion prediction sequence; extracting, from the T video frames before the current time t, the bounding boxes of the preceding vehicle and the optical flow inside them, and, from the △ video frames after t, the predicted motion information of the ego vehicle, and feeding them into the vehicle position prediction model to obtain the bounding box sequence of the preceding vehicle in the △ video frames after t, thereby predicting the position and scale of the preceding vehicle. The method relies only on video captured by a dash cam and can predict the position and scale of the preceding vehicle in real time.
Description
Technical Field
The invention belongs to the technical field of driver assistance, and in particular relates to a method and system for predicting the position of the vehicle ahead based on vehicle-mounted video.
Background Art
With the continuous development of society, family cars have become widespread. Along with the convenience that cars bring come many problems, such as frequent traffic accidents, harsh road conditions and pollution of the environment. These problems threaten people's lives and property, traffic accidents above all, so safe driving has become an urgent public need. Traffic accidents are often caused by a driver failing to react in time to the behavior of other traffic participants on the road. Dash cams are now used by a large number of car owners and record video and sound throughout the journey. If the position of the vehicle ahead could be predicted in real time from the dash-cam video, the driver would have enough time to avoid an accident, but current dash cams do not provide such a function.
The vehicle position prediction methods proposed at home and abroad can be roughly divided into two categories: traditional methods and deep-learning-based methods.
Traditional vehicle position prediction methods, such as Bayesian filtering, have a structure that is too simple to analyse complex vehicle motion patterns and often cannot produce good long-term predictions. Dynamic Bayesian networks use a graphical model to describe the latent factors that determine a vehicle's trajectory and explicitly model the physical process that generates it. Although they can address the above problems, the model structure is fixed by the designer's intuition and cannot capture the variety of dynamic traffic scenes, so performance in real traffic is limited; moreover, their computational complexity is too high to meet real-time prediction requirements.
In recent years, deep-learning-based methods have shown powerful capabilities in image processing, and many researchers have applied recurrent neural network structures and their variants to the task of vehicle position prediction. These methods train deep network models on a vehicle's past driving data and achieve good prediction results in their respective application scenarios. However, they have two problems: first, the past driving data must be captured by multiple sensors mounted on the vehicle, which is not common on today's production vehicles; second, they can only predict the pixel position of the vehicle ahead, not its scale.
The present invention, in contrast, predicts the position and scale of the vehicle ahead in real time using only the image information captured by a dash cam, giving the driver enough time to avoid traffic accidents, and can therefore be applied well in practice.
Summary of the Invention
Purpose of the invention: in view of the problems in the prior art, the present invention provides a method for predicting the position of the vehicle ahead based on vehicle-mounted video. The method predicts the position and scale of the vehicle ahead in real time using only the video captured by a dash cam, gives the driver enough time to avoid traffic accidents, and can be applied well in practice.
Technical solution: in one aspect, the present invention discloses a method for predicting the position of the vehicle ahead based on vehicle-mounted video, comprising a training stage and a prediction stage, wherein the training stage comprises:
S1. Construct a vehicle position prediction model based on an encoder-decoder framework. The model predicts the bounding boxes of the vehicle ahead at times t+1, t+2, …, t+△ after the current time t from the bounding boxes of the vehicle ahead at times t-0, t-1, …, t-(T-1) before the current time t, the optical flow inside those bounding boxes, and the motion information of the ego vehicle at times t+1, t+2, …, t+△ after the current time t;
The input of the vehicle position prediction model comprises: the bounding box sequence B of the vehicle ahead and the optical flow sequence F inside its bounding boxes, taken from the video frames at the T times before the current time t, and the motion prediction sequence M of the ego vehicle for the video frames at the △ times after the current time t;
The output of the vehicle position prediction model is the predicted bounding box sequence Y of the vehicle ahead in the video frames at the △ times after the current time t;
The vehicle position prediction model comprises a preceding-vehicle bounding box encoder, a preceding-vehicle optical flow encoder, a feature fusion unit and a preceding-vehicle position prediction decoder;
The preceding-vehicle bounding box encoder encodes the bounding box sequence B of the vehicle ahead to obtain its time-series feature vector;
The preceding-vehicle optical flow encoder encodes the optical flow sequence F inside the bounding boxes of the vehicle ahead to obtain its motion feature vector;
The feature fusion unit concatenates the time-series feature vector and the motion feature vector into the fused feature vector of the vehicle ahead;
The preceding-vehicle position prediction decoder decodes the fused feature vector according to the motion prediction sequence M of the ego vehicle to obtain the predicted bounding boxes of the vehicle ahead in the video frames at the △ times after the current time t;
S2. Construct a sample set and train the vehicle position prediction model, comprising:
S2-1. Collect a number of vehicle-mounted video clips of duration s in which a preceding vehicle is visible, sample the video frames of each clip, and determine the bounding box sequence Btr of the vehicle ahead, the optical flow sequence Ftr inside its bounding boxes, and the motion prediction sequence Mtr of the ego vehicle at the corresponding times, which together form the sample set;
S2-2. Divide the sample set into a training set and a validation set; set the learning rate σ and the batch size N;
S2-3. The training process uses the Adam optimizer, and the number of training batches N′ is determined from the number of training samples and N; for each clip in the training samples, the Btr and Ftr corresponding to the video frames of the first s′ of the clip, together with the Mtr corresponding to the video frames of the last s″, are used as the input of the vehicle position prediction model, and the Btr corresponding to the video frames of the last s″ is used as the output; the model is trained, the model parameters are saved, and the validation set is used to verify the prediction accuracy of the model; s′+s″=s;
S2-4. Select the model parameters with the highest prediction accuracy over the N′ training batches as the parameters of the vehicle position prediction model;
The prediction stage comprises:
A camera capable of capturing the vehicle ahead is mounted on the vehicle, and the video data collected by the camera while the vehicle is driving is acquired;
Vehicle detection and tracking are performed on every frame of the video to obtain the bounding box sequence of each preceding vehicle, which is stored in Btest(i), where i is the index of the preceding vehicle; at the same time the optical flow inside the bounding boxes is computed and stored in Ftest(i); the motion information of the ego vehicle in future frames is obtained and stored in the sequence Mtest;
A first sliding window of length T is applied to the sequences Btest(i) and Ftest(i), and a second sliding window of length △ is applied to the sequence Mtest, so as to extract the bounding boxes of vehicle i and the optical flow inside them in the T video frames before the current time t, and the predicted motion information of the ego vehicle in the △ video frames after the current time t; these are fed into the trained vehicle position prediction model to obtain the bounding box sequence of the preceding vehicle i in the △ video frames after the current time t, Y′(i)=[Y′t+1(i), Y′t+2(i), …, Y′t+δ(i), …, Y′t+△(i)], and the position of the bounding box of the preceding vehicle i relative to the video frame at the current time is computed, where Btest,t+0(i) is the bounding box of the preceding vehicle i at the current time t; 1≤δ≤△;
The predicted trajectory of the preceding vehicle i is obtained from the centers of the bounding boxes in Y′(i), and its scale is obtained from the widths and heights of the bounding boxes in Y′(i).
The bounding box sequence of the vehicle ahead is computed in the following steps:
A.1. Perform vehicle detection on the video frames of T consecutive times to obtain the bounding boxes of all vehicles in each frame;
A.2. Use a multi-object tracking algorithm to track the vehicle bounding boxes obtained in step A.1, give the same vehicle the same index in different frames, and arrange the bounding boxes in chronological order to form the bounding box sequence B of the vehicle ahead over the T times.
The optical flow sequence inside the bounding boxes of the vehicle ahead is computed in the following steps:
B.1. For the video frames of T consecutive times, compute the optical flow between each frame and the previous frame to obtain the optical flow map of each frame; the two-dimensional optical flow vector at the j-th pixel of the optical flow map is Ij=(uj, vj), where uj and vj are the vertical and horizontal components of the flow vector respectively;
B.2. From the optical flow map of the frame at time t-τ, crop the region covered by the bounding box of the vehicle ahead in that frame and scale it to a preset uniform size, giving the in-box optical flow map at time t-τ; arranged in chronological order, these form the optical flow sequence F inside the bounding boxes of the vehicle ahead over the T times, where t-τ denotes the τ-th time before time t, 0≤τ<T.
The motion prediction sequence of the ego vehicle is computed in the following steps:
C.1. For the video frames at times t-0, t-1, …, t-(T-1) before the current time t, compute the camera rotation matrix Rt-τ and translation vector Vt-τ between the video frames Pt-τ-1 and Pt-τ at adjacent times, forming the rotation matrix sequence RS and the translation vector sequence VS, 0≤τ<T; this comprises steps C.1-1 to C.1-2:
C.1-1. Compute the essential matrix E using the eight-point method, as follows:
C.1-1-1. Use the SURF algorithm to extract the feature points of Pt-τ-1 and Pt-τ, and select the 8 best-matching pairs of feature points (al, a′l), l=1,2,…,8; where al and a′l are the coordinates, on the normalized plane, of the pixel positions of the l-th matched pair in video frames Pt-τ-1 and Pt-τ respectively, al=[xl, yl, 1]T, a′l=[x′l, y′l, 1]T; al and a′l are both 3×1 matrices, where T denotes the matrix transpose;
C.1-1-2. Combine the 8 pairs of matched feature points into the 3×8 matrices a and a′:
Establish the epipolar constraint from a and a′:
aTEa′ = 0
Solving the above system of equations gives the essential matrix E, a 3×3 matrix;
C.1-2. Perform a singular value decomposition of E to obtain the camera rotation matrix Rt-τ and translation vector Vt-τ, where Rt-τ is a 3×3 matrix and Vt-τ is a 3-dimensional column vector;
This finally gives the rotation matrix sequence of the T video frames before time t, RS={Rt-(T-1), …, Rt-τ, …, Rt-1, Rt-0}, and the translation vector sequence of the T video frames before time t, VS={Vt-(T-1), …, Vt-τ, …, Vt-1, Vt-0};
C.2. For the camera rotation matrices and translation vectors in the RS and VS obtained in C.1, compute for each Rt-τ and Vt-τ the cumulative value with respect to the previous time, the cumulative values being denoted R′t-τ and V′t-τ;
C.3. Pass the R′t-0 and V′t-0 finally computed in C.2 to the camera rotation matrix and translation vector at the next time, as follows:
Rt+1 = R′t-0
Vt+1 = V′t-0
C.4. Append the Rt+1 and Vt+1 obtained in C.3 to the end of the rotation matrix sequence RS and translation vector sequence VS obtained in C.1, and repeat C.2 and C.3 until all rotation matrices {Rt+1, Rt+2, …, Rt+δ, …, Rt+△} and all translation vectors {Vt+1, Vt+2, …, Vt+δ, …, Vt+△} of the △ video frames after time t are obtained, 1≤δ≤△;
C.5. Compute the motion vectors of the ego vehicle at the △ times after the current time t, forming its motion prediction sequence M={Mt+1, Mt+2, …, Mt+δ, …, Mt+△}; this comprises steps C.5-1 to C.5-2:
C.5-1. Extract the rotation angles of the camera about the x, y and z axes from the rotation matrix Rt+δ and represent them as a 3-dimensional row vector ψt+δ, where:
rjk denotes the value in the j-th row and k-th column of the rotation matrix Rt+δ, j,k∈{1,2,3}; atan2() and atan() both denote the arctangent function, but the result of atan2() takes values in (-π, π], which distinguishes it from atan();
C.5-2. Concatenate the vector ψt+δ with the translation vector Vt+δT converted to a three-dimensional row vector, forming a 6-dimensional row vector Mt+δ: Mt+δ=[ψt+δ, Vt+δT];
This finally gives the motion prediction sequence of the ego vehicle, M={Mt+1, Mt+2, …, Mt+δ, …, Mt+△};
C.6. Pass M through a fully connected layer FC4 to transform the dimension of all its motion vectors.
The preceding-vehicle bounding box encoder comprises an encoding gated recurrent neural network GRUb and a first fully connected layer FC1; the inputs of GRUb are the bounding box Bt-τ at each time in the bounding box sequence B of the vehicle ahead and the hidden state vector passed down from GRUb at the previous time, and its output is the encoding of the bounding box of the vehicle ahead at the current time; FC1 applies a dimension transformation to the final output of GRUb to obtain the time-series feature vector of the vehicle ahead at the current time t.
The preceding-vehicle optical flow encoder comprises a CNN-based motion feature extraction network FEN and a second fully connected layer FC2; the input of the FEN is the optical flow sequence F inside the bounding boxes of the vehicle ahead, and its output is the encoding of the in-box optical flow at the current time; the FEN is based on the ResNet50 architecture and comprises, connected in sequence, a convolution layer conv1, a ReLU layer, a max-pooling layer maxPool and 4 residual learning blocks; the number of input channels of conv1 is 2m, where m is the number of optical flow maps sampled from the optical flow sequence F, i.e. m optical flow maps are uniformly sampled from F; each of the 4 residual learning blocks has a three-layer structure, i.e. 3 convolution layers and ReLU layers connected in series;
The optical flow sequence F inside the bounding boxes of the vehicle ahead is uniformly sampled into m optical flow maps, whose vertical and horizontal components form 2m optical flow components that are fed into the FEN; the output of the FEN is the motion feature of the in-box optical flow maps at the current time;
FC2 applies a dimension transformation to the motion feature output by the FEN to obtain the motion feature vector of the vehicle ahead at the current time t.
The preceding-vehicle position prediction decoder comprises a decoding gated recurrent neural network GRUd and a third fully connected layer FC3; the inputs of GRUd are the fusion vector Mht+δ of the predicted ego-motion Mt+δ at time t+δ and the hidden state vector passed down from GRUd at the previous time, together with that previous hidden state vector, 1≤δ≤△; its output is the decoded bounding box of the vehicle ahead at time t+δ; FC3 applies a dimension transformation to obtain the bounding box of the vehicle ahead at time t+δ.
In another aspect, the present invention also discloses a prediction system implementing the above method for predicting the position of the vehicle ahead based on vehicle-mounted video, comprising:
a vehicle position prediction model based on an encoder-decoder framework, which predicts the bounding boxes of the vehicle ahead at times t+1, t+2, …, t+△ after the current time t from the bounding boxes of the vehicle ahead at times t-0, t-1, …, t-(T-1) before the current time t, the optical flow inside those bounding boxes, and the motion information of the ego vehicle at times t+1, t+2, …, t+△ after the current time t;
the vehicle position prediction model comprises a preceding-vehicle bounding box encoder, a preceding-vehicle optical flow encoder, a feature fusion unit and a preceding-vehicle position prediction decoder;
the preceding-vehicle bounding box encoder encodes the bounding box sequence B of the vehicle ahead to obtain its time-series feature vector;
the preceding-vehicle optical flow encoder encodes the optical flow sequence F inside the bounding boxes of the vehicle ahead to obtain its motion feature vector;
the feature fusion unit concatenates the time-series feature vector and the motion feature vector into the fused feature vector of the vehicle ahead;
the preceding-vehicle position prediction decoder decodes the fused feature vector according to the motion prediction sequence M of the ego vehicle to obtain the predicted bounding boxes of the vehicle ahead in the video frames at the △ times after the current time t;
a vehicle bounding box acquisition module, configured to acquire the bounding box sequence B of the vehicle ahead in the vehicle-mounted video;
a vehicle bounding box optical flow acquisition module, configured to acquire the optical flow sequence F inside the bounding boxes of the vehicle ahead in the vehicle-mounted video;
an ego-vehicle motion information prediction module, configured to predict the motion information of the ego vehicle at future times, forming the ego-vehicle motion prediction sequence M.
Beneficial effects: the method for predicting the position of the vehicle ahead disclosed in the present invention has the following advantages: 1. it is based only on the video images captured by a dash cam, which effectively solves the problem that other existing methods rely on multiple sensors to obtain information and therefore have limited applicability on current production vehicles; 2. it uses a deep learning network model based on an encoder-decoder framework, which can predict not only the position of the vehicle ahead but also its scale, significantly improving prediction performance.
Description of the Drawings
Figure 1 is a flow chart of the method for predicting the position of the vehicle ahead based on vehicle-mounted video disclosed in the present invention;
Figure 2 is a schematic diagram of vehicle detection and tracking in video frames;
Figure 3 is a schematic diagram of the optical flow extraction method for adjacent frames;
Figure 4 is a schematic diagram of the structure of the vehicle position prediction model;
Figure 5 is a schematic diagram of the structure of the GRU;
Figure 6 is a schematic diagram of the structure of the motion feature extraction network;
Figure 7 is a schematic diagram of the sliding windows;
Figure 8 is a schematic diagram of the prediction results in the embodiment;
Figure 9 is a schematic diagram of the structure of the system for predicting the position of the vehicle ahead based on vehicle-mounted video disclosed in the present invention.
Detailed Description
The present invention is further explained below with reference to the accompanying drawings and specific embodiments.
As shown in Figure 1, the present invention discloses a method for predicting the position of the vehicle ahead based on vehicle-mounted video, comprising a training stage and a prediction stage, wherein the training stage comprises:
S1. Construct a vehicle position prediction model based on an encoder-decoder framework. The model predicts the bounding boxes of the vehicle ahead at times t+1, t+2, …, t+△ after the current time t from the bounding boxes of the vehicle ahead at times t-0, t-1, …, t-(T-1) before the current time t, the optical flow inside those bounding boxes, and the motion information of the ego vehicle at times t+1, t+2, …, t+△ after the current time t;
In this embodiment, T=20 and △=40;
The input of the vehicle position prediction model comprises: the bounding box sequence B of the vehicle ahead and the optical flow sequence F inside its bounding boxes, taken from the video frames at the T times before the current time t, and the motion prediction sequence M of the ego vehicle for the video frames at the △ times after the current time t;
where B=[Bt-0, Bt-1, …, Bt-τ, …, Bt-(T-1)], and Bt-τ denotes the bounding box of the vehicle ahead in the video frame at the τ-th time before time t; the bounding box is represented by the horizontal and vertical coordinates xt-τ, yt-τ of its center point and its width wt-τ and height ht-τ, i.e. Bt-τ=(xt-τ, yt-τ, wt-τ, ht-τ); 0≤τ<T;
In the present invention, the bounding box sequence of the vehicle ahead is computed in the following steps:
A.1. Perform vehicle detection on the video frames of T consecutive times to obtain the bounding boxes of all vehicles in each frame;
In this embodiment, vehicle detection uses a detection model built on Mask-RCNN and trained on the COCO dataset; its output is the vehicle bounding boxes in the image, each represented by a 4-dimensional vector; the images in the video are uniformly scaled to 1024×1024 before being fed into Mask-RCNN.
A.2. Use a multi-object tracking algorithm to track the vehicle bounding boxes obtained in step A.1, give the same vehicle the same index in different frames, and arrange the boxes in chronological order to form the bounding box sequence B of the vehicle ahead over the T times. In this embodiment the Sort algorithm is used for multi-object tracking; Sort is an online, real-time multi-object tracking algorithm suitable for tracking vehicles in vehicle-mounted video. Figure 2 is a schematic diagram of vehicle detection and tracking in video frames: three vehicles are detected in two video frames taken at different times, and the same vehicles are given the indices 1, 2 and 3 respectively.
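A minimal sketch of this detection-and-tracking step is given below, assuming torchvision's pretrained Mask R-CNN (trained on COCO, as in this embodiment) and a SORT-style tracker exposed as a hypothetical `Sort` class with an `update()` method; the class name, the `sort` module and the score threshold are illustrative assumptions, not the exact implementation of the patent.

```python
import numpy as np
import torch
import torchvision
from sort import Sort  # assumption: a SORT implementation exposing update() is available

# Pretrained Mask R-CNN (COCO) used as the vehicle detector; COCO label 3 is "car".
detector = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True).eval()
tracker = Sort()

def detect_vehicles(frame_tensor, score_thresh=0.7):
    """frame_tensor: float tensor (3, H, W) in [0, 1]; returns an Nx4 array [x1, y1, x2, y2]."""
    with torch.no_grad():
        out = detector([frame_tensor])[0]
    keep = (out["labels"] == 3) & (out["scores"] > score_thresh)
    return out["boxes"][keep].cpu().numpy()

def track_frame(frame_tensor):
    boxes = detect_vehicles(frame_tensor)
    dets = np.hstack([boxes, np.ones((len(boxes), 1))]) if len(boxes) else np.empty((0, 5))
    tracks = tracker.update(dets)  # the same vehicle keeps the same id across frames
    # Convert to the (x_center, y_center, w, h) representation B_{t-tau} used in the text.
    return {int(t[4]): ((t[0] + t[2]) / 2, (t[1] + t[3]) / 2, t[2] - t[0], t[3] - t[1])
            for t in tracks}
```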
F=[Ft-0, Ft-1, …, Ft-τ, …, Ft-(T-1)], where Ft-τ denotes the optical flow map inside the bounding box of the vehicle ahead in the video frame at the τ-th time before time t, Ft-τ={(ut-τ(p), vt-τ(p))}, and (ut-τ(p), vt-τ(p)) is the two-dimensional optical flow vector at the p-th pixel of that optical flow map;
The optical flow sequence inside the bounding boxes of the vehicle ahead is computed in the following steps:
B.1. For the video frames of T consecutive times, compute the optical flow between each frame and the previous frame to obtain the optical flow map of each frame; in this embodiment the FlowNet2 algorithm is used to compute the optical flow of adjacent frames; the two-dimensional optical flow vector at the j-th pixel of the optical flow map is Ij=(uj, vj), where uj and vj are the vertical and horizontal components of the flow vector respectively, as shown in Figure 3.
B.2. From the optical flow map of the frame at time t-τ, crop the region covered by the bounding box of the vehicle ahead in that frame and scale it to a preset uniform size, giving the in-box optical flow map at time t-τ; arranged in chronological order, these form the optical flow sequence F inside the bounding boxes of the vehicle ahead over the T times, where t-τ denotes the τ-th time before time t, 0≤τ<T. In this embodiment the in-box optical flow maps are uniformly scaled to 224×224.
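A sketch of step B.2 under the assumption that the dense flow of each frame pair is already available as an (H, W, 2) array (e.g. produced by a FlowNet2 implementation); cropping and scaling use OpenCV, and the 224×224 target size follows this embodiment.

```python
import cv2
import numpy as np

def crop_flow_to_bbox(flow, bbox, out_size=224):
    """flow: (H, W, 2) array of (u, v) per pixel; bbox: (xc, yc, w, h) of the preceding vehicle."""
    H, W = flow.shape[:2]
    xc, yc, w, h = bbox
    x1, y1 = max(int(xc - w / 2), 0), max(int(yc - h / 2), 0)
    x2, y2 = min(int(xc + w / 2), W), min(int(yc + h / 2), H)
    patch = flow[y1:y2, x1:x2].astype(np.float32)    # flow covered by the bounding box
    return cv2.resize(patch, (out_size, out_size))   # (224, 224, 2): vertical/horizontal parts

# Building F for the T frames before time t (flow_maps[k] is the flow of frame k vs. frame k-1):
# F = [crop_flow_to_bbox(flow_maps[t - tau], B[t - tau]) for tau in range(T)]
```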
While driving, besides the motion of the vehicles in the scene ahead, the ego vehicle itself is also moving; to predict the motion of the vehicle ahead, the motion of the ego vehicle must also be predicted.
The motion information prediction sequence of the ego vehicle is computed in the following steps:
C.1. For the video frames at times t-0, t-1, …, t-(T-1) before the current time t, compute the camera rotation matrix Rt-τ and translation vector Vt-τ between the video frames Pt-τ-1 and Pt-τ at adjacent times, forming the rotation matrix sequence RS and the translation vector sequence VS, 0≤τ<T; this comprises steps C.1-1 to C.1-2:
C.1-1. Compute the essential matrix E using the eight-point method, as follows:
C.1-1-1. Use the SURF algorithm to extract the feature points of Pt-τ-1 and Pt-τ, and select the 8 best-matching pairs of feature points (al, a′l), l=1,2,…,8; where al and a′l are the coordinates, on the normalized plane, of the pixel positions of the l-th matched pair in video frames Pt-τ-1 and Pt-τ respectively, al=[xl, yl, 1]T, a′l=[x′l, y′l, 1]T; al and a′l are both 3×1 matrices, where T denotes the matrix transpose;
C.1-1-2. Combine the 8 pairs of matched feature points into the 3×8 matrices a and a′:
Establish the epipolar constraint from a and a′:
aTEa′ = 0
Solving the above system of equations gives the essential matrix E, a 3×3 matrix;
C.1-2. Perform a singular value decomposition of E to obtain the camera rotation matrix Rt-τ and translation vector Vt-τ, where Rt-τ is a 3×3 matrix and Vt-τ is a 3-dimensional column vector;
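The per-frame ego-motion estimation of steps C.1-1 and C.1-2 can be sketched with OpenCV as below. Assumptions: SURF requires the opencv-contrib package; the eight-point method is applied through `cv2.findFundamentalMat(..., cv2.FM_8POINT)` followed by E = KᵀFK, and `cv2.recoverPose` performs the SVD-based decomposition; the camera intrinsics K are assumed known (working on the normalized plane is equivalent).

```python
import cv2
import numpy as np

def relative_camera_motion(prev_gray, curr_gray, K):
    """Estimate the rotation matrix R and translation vector V between consecutive frames."""
    surf = cv2.xfeatures2d.SURF_create(400)          # SURF lives in opencv-contrib-python
    kp1, des1 = surf.detectAndCompute(prev_gray, None)
    kp2, des2 = surf.detectAndCompute(curr_gray, None)
    matches = sorted(cv2.BFMatcher(cv2.NORM_L2).match(des1, des2), key=lambda m: m.distance)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches[:8]])   # 8 best-matching pairs
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches[:8]])
    F, _ = cv2.findFundamentalMat(pts1, pts2, cv2.FM_8POINT)       # eight-point method
    E = K.T @ F @ K                                                # essential matrix
    _, R, V, _ = cv2.recoverPose(E, pts1, pts2, K)                 # SVD-based decomposition
    return R, V                                                    # R: 3x3, V: 3x1 (up to scale)
```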
This finally gives the rotation matrix sequence of the T video frames before time t, RS={Rt-(T-1), …, Rt-τ, …, Rt-1, Rt-0}, and the translation vector sequence of the T video frames before time t, VS={Vt-(T-1), …, Vt-τ, …, Vt-1, Vt-0};
C.2. For the camera rotation matrices and translation vectors in the RS and VS obtained in C.1, compute for each Rt-τ and Vt-τ the cumulative value with respect to the previous time, the cumulative values being denoted R′t-τ and V′t-τ;
C.3. Pass the R′t-0 and V′t-0 finally computed in C.2 to the camera rotation matrix and translation vector at the next time, as follows:
Rt+1 = R′t-0
Vt+1 = V′t-0
C.4. Append the Rt+1 and Vt+1 obtained in C.3 to the end of the rotation matrix sequence RS and translation vector sequence VS obtained in C.1, and repeat C.2 and C.3 until all rotation matrices {Rt+1, Rt+2, …, Rt+δ, …, Rt+△} and all translation vectors {Vt+1, Vt+2, …, Vt+δ, …, Vt+△} of the △ video frames after time t are obtained, 1≤δ≤△;
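The formulas of C.2 are not reproduced in this text, so the following sketch of the accumulation-and-extrapolation loop in C.2-C.4 assumes the usual pose-chaining convention R′ = R·R′prev, V′ = R·V′prev + V; this convention and the loop structure are assumptions about the elided formulas, not the patent's literal definition.

```python
import numpy as np

def extrapolate_ego_motion(RS, VS, delta):
    """RS, VS: per-frame relative rotations/translations of the T past frames (oldest first).
    Returns the predicted motions of the next `delta` frames, mirroring steps C.2-C.4."""
    RS, VS = list(RS), list(VS)
    preds_R, preds_V = [], []
    for _ in range(delta):
        # C.2: accumulate each motion with the accumulated value of the previous time.
        R_acc, V_acc = np.eye(3), np.zeros((3, 1))
        for R, V in zip(RS, VS):
            R_acc = R @ R_acc                 # assumed chaining convention (see lead-in)
            V_acc = R @ V_acc + V
        preds_R.append(R_acc)                 # C.3: the last accumulated value becomes
        preds_V.append(V_acc)                 #      the next frame's predicted motion
        RS.append(R_acc)                      # C.4: append the prediction and repeat
        VS.append(V_acc)
    return preds_R, preds_V
```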
C.5. Compute the motion vectors of the ego vehicle at the △ times after the current time t, forming its motion prediction sequence M={Mt+1, Mt+2, …, Mt+δ, …, Mt+△}; this comprises steps C.5-1 to C.5-2:
C.5-1. Extract the rotation angles of the camera about the x, y and z axes from the rotation matrix Rt+δ and represent them as a 3-dimensional row vector ψt+δ, where:
rjk denotes the value in the j-th row and k-th column of the rotation matrix Rt+δ, j,k∈{1,2,3}; atan2() and atan() both denote the arctangent function, but the result of atan2() takes values in (-π, π], which distinguishes it from atan();
C.5-2. Concatenate the vector ψt+δ with the translation vector Vt+δT converted to a three-dimensional row vector, forming a 6-dimensional row vector Mt+δ: Mt+δ=[ψt+δ, Vt+δT];
This finally gives the motion prediction sequence of the ego vehicle, M={Mt+1, Mt+2, …, Mt+δ, …, Mt+△};
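A sketch of step C.5; the patent's exact angle-extraction formulas are not reproduced here, so the conversion below uses a common ZYX Euler convention with atan2 as an assumed stand-in.

```python
import numpy as np

def motion_vector(R, V):
    """Build the 6-D motion vector M_{t+delta} = [psi_{t+delta}, V_{t+delta}^T] of step C.5."""
    psi_x = np.arctan2(R[2, 1], R[2, 2])                      # rotation about the x axis
    psi_y = np.arctan2(-R[2, 0], np.hypot(R[2, 1], R[2, 2]))  # rotation about the y axis
    psi_z = np.arctan2(R[1, 0], R[0, 0])                      # rotation about the z axis
    return np.concatenate([[psi_x, psi_y, psi_z], V.reshape(-1)])

# M = np.stack([motion_vector(R, V) for R, V in zip(preds_R, preds_V)])   # shape (delta, 6)
```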
C.6. Pass M through a fully connected layer FC4 to transform the dimension of all its motion vectors so that it matches the dimension of the hidden state vector passed down by the decoding gated recurrent neural network GRUd at the previous time. In this embodiment the output dimension of this fully connected layer is 512.
The output of the vehicle position prediction model is the predicted bounding box sequence Y of the vehicle ahead in the video frames at the △ times after the current time t, Y=[Yt+1, Yt+2, …, Yt+δ, …, Yt+△], where Yt+δ denotes the predicted bounding box of the vehicle ahead in the video frame at the δ-th time after time t; the bounding box is represented by the horizontal and vertical coordinates of its center point and its width and height, i.e. Yt+δ=(xt+δ, yt+δ, wt+δ, ht+δ);
As shown in Figure 4, the vehicle position prediction model comprises a preceding-vehicle bounding box encoder 1-1, a preceding-vehicle optical flow encoder 1-2, a feature fusion unit 1-3 and a preceding-vehicle position prediction decoder 1-4;
The preceding-vehicle bounding box encoder 1-1 encodes the bounding box sequence B of the vehicle ahead to obtain its time-series feature vector.
The preceding-vehicle bounding box encoder relies mainly on a gated recurrent unit (GRU) for encoding. A GRU can retain only the information relevant to the prediction and forget irrelevant data. Its structure is shown in Figure 5: the inputs are the input Int at the current time and the hidden state vector ht-1 passed down from the GRU at the previous time; ht-1 represents what the GRU, through its internal gate structure, regards as the useful information of the input sequence at past times, and in the present invention this hidden state vector represents the position and scale information of the vehicle ahead over the past time period. Combining Int and ht-1, the GRU outputs the hidden state vector ht of the current time; the forward propagation is computed as:
zt = σ(Wz·[ht-1, Int])
rt = σ(Wr·[ht-1, Int])
h̃t = tanh(Wh̃·[rt⊙ht-1, Int])
ht = (1-zt)⊙ht-1 + zt⊙h̃t
where zt is the output of the update gate, σ() is the sigmoid function, Wz is the weight parameter of the update gate, rt is the output of the reset gate, Wr is the weight parameter of the reset gate, h̃t is the candidate output at the current time, tanh() is the hyperbolic tangent function, Wh̃ is the weight parameter of the candidate value, and [,] denotes the concatenation of two vectors. The above set of formulas is abbreviated as ht=GRUc(U, V), where c denotes the specific application, U is the input of GRUc at the current time, and V is the weight parameter of GRUc.
The preceding-vehicle bounding box encoder comprises the encoding gated recurrent neural network GRUb and a first fully connected layer FC1; the inputs of GRUb are the bounding box Bt-τ at each time in the bounding box sequence B of the vehicle ahead and the hidden state vector passed down from GRUb at the previous time, and its output is the encoding of the bounding box of the vehicle ahead at the current time; FC1 applies a dimension transformation to the final output of GRUb to obtain the time-series feature vector of the vehicle ahead at the current time t.
The structure of the encoding gated recurrent neural network GRUb follows the abbreviated form above, where φ() denotes a linear mapping using the ReLU activation function and θb denotes the weight parameter V of GRUb. In this embodiment the dimension of the GRUb output is 512, and FC1 transforms this dimension to 256, i.e. the time-series feature vector has dimension 256.
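A sketch of the bounding-box encoder GRUb + FC1 in PyTorch, with the dimensions of this embodiment (512-D hidden state, 256-D output); the module and layer names are illustrative.

```python
import torch
import torch.nn as nn

class BBoxEncoder(nn.Module):
    def __init__(self, hidden=512, out_dim=256):
        super().__init__()
        self.gru_b = nn.GRUCell(input_size=4, hidden_size=hidden)  # B_{t-tau} = (x, y, w, h)
        self.fc1 = nn.Linear(hidden, out_dim)

    def forward(self, B):                        # B: (batch, T, 4), oldest frame first
        h = B.new_zeros(B.size(0), self.gru_b.hidden_size)
        for tau in range(B.size(1)):             # feed the T past bounding boxes in order
            h = self.gru_b(B[:, tau], h)         # hidden state carries past position/scale info
        return torch.relu(self.fc1(h))           # time-series feature vector, (batch, 256)
```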
The preceding-vehicle optical flow encoder 1-2 encodes the optical flow sequence F inside the bounding boxes of the vehicle ahead to obtain its motion feature vector.
The preceding-vehicle optical flow encoder comprises a CNN-based motion feature extraction network FEN and a second fully connected layer FC2; the input of the FEN is the optical flow sequence F inside the bounding boxes of the vehicle ahead, and its output is the encoding of the in-box optical flow at the current time; as shown in Figure 6, the FEN is based on the ResNet50 architecture and comprises, connected in sequence, a convolution layer conv1, a ReLU layer, a max-pooling layer maxPool and 4 residual learning blocks, as shown in Figure 6-(a); the number of input channels of conv1 is 2m, where m is the number of optical flow maps sampled from the optical flow sequence F, i.e. m optical flow maps are uniformly sampled from F; in this embodiment m=10; each of the 4 residual learning blocks has a three-layer structure, i.e. 3 convolution layers Conv2 and ReLU layers connected in series, as shown in Figure 6-(b).
The optical flow sequence F inside the bounding boxes of the vehicle ahead is uniformly sampled into m optical flow maps, and the vertical and horizontal components of each flow map are treated as its two channels. The vertical and horizontal components of the m flow maps form 2m optical flow components that are fed into the FEN, whose output is the motion feature of the in-box optical flow maps at the current time; in this embodiment the motion feature extracted by the FEN has dimension 2048, and FC2 transforms this dimension to 256, giving the 256-dimensional motion feature vector of the vehicle ahead at the current time t.
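A sketch of the FEN + FC2: a ResNet-50 whose first convolution is replaced so that it accepts 2m channels (m=10 sampled flow maps, each contributing a u and a v channel), producing the 2048-D pooled feature that FC2 reduces to 256; the conv1 hyperparameters simply mirror the standard ResNet-50 stem and are assumptions.

```python
import torch
import torch.nn as nn
import torchvision

class FlowEncoder(nn.Module):
    def __init__(self, m=10, out_dim=256):
        super().__init__()
        backbone = torchvision.models.resnet50()
        backbone.conv1 = nn.Conv2d(2 * m, 64, kernel_size=7, stride=2, padding=3, bias=False)
        backbone.fc = nn.Identity()              # keep the 2048-D pooled feature
        self.fen = backbone
        self.fc2 = nn.Linear(2048, out_dim)

    def forward(self, F):                         # F: (batch, 2m, 224, 224) stacked u/v maps
        return torch.relu(self.fc2(self.fen(F)))  # motion feature vector, (batch, 256)
```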
The feature fusion unit 1-3 concatenates the time-series feature vector and the motion feature vector of the vehicle ahead into its fused feature vector, which represents the bounding box history and the optical flow history, i.e. the position, scale, appearance and motion information of the vehicle ahead at different times in the past period; in this embodiment the fused feature vector has 512 dimensions.
The preceding-vehicle position prediction decoder 1-4 decodes the fused feature vector according to the motion prediction sequence M of the ego vehicle to obtain the predicted bounding boxes of the vehicle ahead in the video frames at the △ times after the current time t.
The preceding-vehicle position prediction decoder comprises the decoding gated recurrent neural network GRUd and a third fully connected layer FC3; the inputs of GRUd are the fusion vector Mht+δ of the predicted ego-motion Mt+δ at time t+δ and the hidden state vector passed down from GRUd at the previous time, together with that previous hidden state vector, 1≤δ≤△; its output is the decoded bounding box of the vehicle ahead at time t+δ; FC3 applies a dimension transformation, converting the output to a 4-dimensional vector, to obtain the bounding box of the vehicle ahead at time t+δ.
The structure of the decoding gated recurrent neural network GRUd follows the same abbreviated form, where θd is the weight parameter V of GRUd.
In this embodiment, the fusion vector Mht+δ is computed as follows: the 6-dimensional vector Mt+δ is transformed into a 512-dimensional vector by a fourth fully connected layer FC4; the hidden state vector passed down from GRUd at the previous time is linearly mapped using the ReLU activation function; the two resulting vectors are added and averaged, giving the 512-dimensional fusion vector Mht+δ, where Average() denotes adding two vectors and taking their average.
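A sketch of the decoder GRUd + FC3 with the fusion of the lifted ego-motion vector; initialising the GRUd hidden state with the 512-D fused feature and applying ReLU to the previous hidden state before averaging are assumptions about details the text states only loosely.

```python
import torch
import torch.nn as nn

class PositionDecoder(nn.Module):
    def __init__(self, hidden=512):
        super().__init__()
        self.fc4 = nn.Linear(6, hidden)           # lift M_{t+delta} to the hidden dimension
        self.gru_d = nn.GRUCell(hidden, hidden)
        self.fc3 = nn.Linear(hidden, 4)           # (x, y, w, h) of the predicted bounding box

    def forward(self, H, M):                      # H: (batch, 512) fused feature, M: (batch, delta, 6)
        h = H                                     # fused bbox/flow history initialises the state
        boxes = []
        for step in range(M.size(1)):
            m = self.fc4(M[:, step])
            mh = (torch.relu(h) + m) / 2          # fusion vector Mh_{t+delta}: Average of the two
            h = self.gru_d(mh, h)
            boxes.append(self.fc3(h))             # decoded bounding box at time t+delta
        return torch.stack(boxes, dim=1)          # (batch, delta, 4)
```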
S2. Construct a sample set and train the vehicle position prediction model, comprising:
S2-1. Collect a number of vehicle-mounted video clips of duration s in which a preceding vehicle is visible, sample the video frames of each clip, and determine the bounding box sequence Btr of the vehicle ahead, the optical flow sequence Ftr inside its bounding boxes, and the motion information sequence Mtr of the ego vehicle at the corresponding times, which together form the sample set;
S2-2. Divide the sample set into a training set and a validation set; set the learning rate σ and the batch size N;
S2-3. The training process uses the Adam optimizer, and the number of training batches N′ is determined from the number of training samples and N; for each clip in the training samples, the Btr and Ftr corresponding to the video frames of the first s′ of the clip, together with the Mtr corresponding to the video frames of the last s″, are used as the input of the vehicle position prediction model, and the Btr corresponding to the video frames of the last s″ is used as the output; the model is trained, the model parameters are saved, and the validation set is used to verify the prediction accuracy of the model; s′+s″=s;
S2-4. Select the model parameters with the highest prediction accuracy over the N′ training batches as the parameters of the vehicle position prediction model;
In this embodiment, 1000 video clips are collected, each 3 seconds long at 20 frames per second; the bounding box of a vehicle in the following 2 seconds is predicted from its bounding boxes in the preceding 1 second. The training set makes up 70% of the sample set and the validation set 30%. The training process uses the Adam optimizer with a fixed learning rate of 0.0005 and a batch size of 64, for a total of 40 training batches. During training, the difference between the actual bounding box sequence of the vehicle and the bounding boxes Y in the prediction result is computed using the smooth L1 loss function; the error is back-propagated and the final network weight parameters are optimized and saved, where |·| denotes the norm of a vector.
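A sketch of the training loop with the hyperparameters of this embodiment (Adam, learning rate 0.0005, batch size 64, 40 epochs, smooth L1 loss); `train_loader` is an assumed DataLoader yielding (Btr, Ftr, Mtr, future Btr) batches, and the three modules come from the sketches above.

```python
import torch
import torch.nn as nn

bbox_enc, flow_enc, decoder = BBoxEncoder(), FlowEncoder(), PositionDecoder()
params = list(bbox_enc.parameters()) + list(flow_enc.parameters()) + list(decoder.parameters())
optimizer = torch.optim.Adam(params, lr=5e-4)          # fixed learning rate of this embodiment
criterion = nn.SmoothL1Loss()                          # smooth L1 between predicted and true boxes

for epoch in range(40):
    for B, F, M, Y_true in train_loader:               # assumed DataLoader over the sample set
        H = torch.cat([bbox_enc(B), flow_enc(F)], dim=1)   # 256 + 256 -> 512-D fused feature
        Y_pred = decoder(H, M)
        loss = criterion(Y_pred, Y_true)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```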
The prediction stage comprises:
A camera capable of capturing the vehicle ahead is mounted on the vehicle, and the video data collected by the camera while the vehicle is driving is acquired;
Vehicle detection and tracking are performed on every frame of the video to obtain the bounding box sequence of each preceding vehicle, which is stored in Btest(i), where i is the index of the preceding vehicle; at the same time the optical flow inside the bounding boxes is computed and stored in Ftest(i); the motion information of the ego vehicle in future frames is obtained and stored in the sequence Mtest;
A first sliding window SW-1 of length T is applied to the sequences Btest(i) and Ftest(i), and a second sliding window SW-2 of length △ is applied to the sequence Mtest, so as to extract the bounding boxes of vehicle i and the optical flow inside them in the T video frames before the current time t, and the predicted motion information of the ego vehicle in the △ video frames after the current time t; these are fed into the trained vehicle position prediction model to obtain the bounding box sequence of the preceding vehicle i in the △ video frames after the current time t, Y′(i)=[Y′t+1(i), Y′t+2(i), …, Y′t+δ(i), …, Y′t+△(i)], and the position of the bounding box of the preceding vehicle i relative to the video frame at the current time is computed, where Btest,t+0(i) is the bounding box of the preceding vehicle i at the current time t; 1≤δ≤△. The sliding windows are shown in Figure 7; as time progresses, both windows advance by one frame and the position of the vehicle ahead is predicted for the next time.
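A sketch of the sliding-window inference; `model` bundles the trained encoders and decoder, and since the relative-position formula is not reproduced in this text, the conversion back to absolute boxes assumes a simple additive offset with respect to Btest,t+0(i), which is an illustrative choice.

```python
import numpy as np

def predict_vehicle(i, t, B_test, F_test, M_test, model, T=20, delta=40):
    B_win = B_test[i][t - T + 1 : t + 1]      # first sliding window: T past bounding boxes
    F_win = F_test[i][t - T + 1 : t + 1]      # T past in-box optical flow maps
    M_win = M_test[t + 1 : t + 1 + delta]     # second sliding window: delta predicted ego motions
    Y = model(B_win, F_win, M_win)            # delta predicted boxes relative to the current frame
    # Assumed conversion back to absolute boxes (see lead-in); centers give the trajectory,
    # widths/heights give the scale of the preceding vehicle.
    return [np.asarray(Y[d]) + np.asarray(B_test[i][t]) for d in range(delta)]
```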
The predicted trajectory of the preceding vehicle i is obtained from the centers of the bounding boxes in Y′(i); the scale of the preceding vehicle i is obtained from the widths and heights of the bounding boxes in Y′(i).
In this embodiment, the prediction result is displayed in the video frame at the current time, as shown in Figure 8.
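A minimal sketch of such an overlay, assuming (x1, y1, x2, y2) pixel-coordinate boxes and the predicted center points from the step above (the colors and the helper name `draw_prediction` are illustrative assumptions):

```python
import cv2
import numpy as np

# Hedged sketch: draw the predicted boxes and trajectory of vehicle i on the
# current frame.
def draw_prediction(frame, pred_boxes, pred_centers):
    for box in pred_boxes:
        x1, y1, x2, y2 = (int(v) for v in box)
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 1)
    pts = np.asarray(pred_centers, dtype=np.int32).reshape(-1, 1, 2)
    cv2.polylines(frame, [pts], isClosed=False, color=(0, 0, 255), thickness=2)
    return frame
```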
As shown in Figure 9, the present invention also discloses a prediction system that implements the above vehicle-mounted video-based method for predicting the position of the vehicle ahead, comprising:
a vehicle position prediction model 1 based on an encoder-decoder framework, used to predict the bounding boxes of the preceding vehicle at times t+1, t+2, …, t+Δ after the current time t from the bounding boxes of the preceding vehicle at times t−0, t−1, …, t−(T−1) before the current time t, the optical flow inside those bounding boxes, and the motion information of the own vehicle at times t+1, t+2, …, t+Δ after the current time t;
The vehicle position prediction model comprises: a preceding vehicle bounding box encoder 1-1, a preceding vehicle optical flow encoder 1-2, a feature fusion unit 1-3, and a preceding vehicle position prediction decoder 1-4 (a hedged sketch of this structure follows the module list below);
The preceding vehicle bounding box encoder encodes the bounding box sequence B of the preceding vehicle to obtain the time series feature vector of the preceding vehicle;
The preceding vehicle optical flow encoder encodes the optical flow sequence F inside the bounding boxes of the preceding vehicle to obtain the motion feature vector of the preceding vehicle;
The feature fusion unit concatenates the time series feature vector and the motion feature vector of the preceding vehicle into the fused feature vector of the preceding vehicle;
The preceding vehicle position prediction decoder decodes the fused feature vector according to the own-vehicle motion information prediction sequence M to obtain the predicted bounding boxes of the preceding vehicle in the video frames at the Δ times after the current time t;
a vehicle bounding box acquisition module 2, used to obtain the bounding box sequence B of the preceding vehicle in the vehicle-mounted video;
a vehicle bounding box optical flow acquisition module 3, used to obtain the optical flow sequence F inside the bounding box of the preceding vehicle in the vehicle-mounted video;
an own-vehicle motion information prediction module 4, used to predict the motion information of the own vehicle at future times, forming the own-vehicle motion prediction sequence M.
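The hedged sketch referenced above is given here in PyTorch. The GRU encoders, layer sizes, box and flow input dimensions, and the feedback of the previous predicted box into the decoder are all illustrative assumptions rather than the patent's parameters; the optical flow input is assumed to have already been reduced to a fixed-length vector per frame.

```python
import torch
import torch.nn as nn

class FrontVehiclePredictor(nn.Module):
    """Hedged sketch of the encoder-decoder structure: bounding box encoder
    (1-1), optical flow encoder (1-2), feature fusion (1-3), decoder (1-4)."""

    def __init__(self, box_dim=4, flow_dim=128, ego_dim=3, hidden=128):
        super().__init__()
        self.box_encoder = nn.GRU(box_dim, hidden, batch_first=True)     # 1-1
        self.flow_encoder = nn.GRU(flow_dim, hidden, batch_first=True)   # 1-2
        self.decoder = nn.GRUCell(ego_dim + box_dim, 2 * hidden)         # 1-4
        self.out = nn.Linear(2 * hidden, box_dim)

    def forward(self, boxes, flows, ego, horizon):
        # boxes: (batch, T, box_dim); flows: (batch, T, flow_dim)
        # ego:   (batch, horizon, ego_dim) -- predicted own-vehicle motion M
        _, h_box = self.box_encoder(boxes)     # time series feature vector
        _, h_flow = self.flow_encoder(flows)   # motion feature vector
        h = torch.cat([h_box[-1], h_flow[-1]], dim=-1)   # feature fusion (1-3)

        y = boxes[:, -1]                       # start from the last observed box
        outputs = []
        for step in range(horizon):
            h = self.decoder(torch.cat([ego[:, step], y], dim=-1), h)
            y = self.out(h)                    # predicted box at time t + step + 1
            outputs.append(y)
        return torch.stack(outputs, dim=1)     # (batch, horizon, box_dim)
```

Decoding is conditioned on the own-vehicle motion sequence at every step, which mirrors the role of M in the description; whether the previously predicted box is also fed back is an assumption of this sketch.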
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110051940.3A CN112800879B (en) | 2021-01-15 | 2021-01-15 | Vehicle-mounted video-based front vehicle position prediction method and prediction system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112800879A CN112800879A (en) | 2021-05-14 |
CN112800879B true CN112800879B (en) | 2022-08-26 |
Family
ID=75811025
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110051940.3A Active CN112800879B (en) | 2021-01-15 | 2021-01-15 | Vehicle-mounted video-based front vehicle position prediction method and prediction system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112800879B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113610900B (en) * | 2021-10-11 | 2022-02-15 | 深圳佑驾创新科技有限公司 | Method and device for predicting scale change of vehicle tail sequence and computer equipment |
CN114255450A (en) * | 2022-01-01 | 2022-03-29 | 南昌智能新能源汽车研究院 | A near-field vehicle jamming behavior prediction method based on forward panoramic images |
CN114445606A (en) * | 2022-01-29 | 2022-05-06 | 北京精英路通科技有限公司 | Method and device for capturing license plate image, electronic equipment and storage medium |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108846854A (en) * | 2018-05-07 | 2018-11-20 | 中国科学院声学研究所 | A kind of wireless vehicle tracking based on motion prediction and multiple features fusion |
CN111914664A (en) * | 2020-07-06 | 2020-11-10 | 同济大学 | Vehicle multi-target detection and trajectory tracking method based on re-identification |
CN111931905A (en) * | 2020-07-13 | 2020-11-13 | 江苏大学 | Graph convolution neural network model and vehicle track prediction method using same |
Non-Patent Citations (1)
Title |
---|
Vehicle behavior detection method based on a hybrid CNN-LSTM model; Wang Shuo et al.; Intelligent Computer and Applications; 2020-02-01 (No. 02); full text *
Also Published As
Publication number | Publication date |
---|---|
CN112800879A (en) | 2021-05-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112800879B (en) | Vehicle-mounted video-based front vehicle position prediction method and prediction system | |
Huang et al. | End-to-end autonomous driving decision based on deep reinforcement learning | |
CN109740419A (en) | A Video Action Recognition Method Based on Attention-LSTM Network | |
Bai et al. | Deep learning based motion planning for autonomous vehicle using spatiotemporal LSTM network | |
CN111027461B (en) | Vehicle track prediction method based on multi-dimensional single-step LSTM network | |
CN104506800B (en) | The alert camera scene synthesis of the multi-direction electricity of one kind and comprehensive monitoring and controlling method and device | |
CN110516633B (en) | Lane line detection method and system based on deep learning | |
CN110599521B (en) | Method and prediction method for generating trajectory prediction model of vulnerable road users | |
CN111292366A (en) | Visual driving ranging algorithm based on deep learning and edge calculation | |
CN115829171A (en) | Pedestrian trajectory prediction method combining space information and social interaction characteristics | |
Wang et al. | Adversarial learning for joint optimization of depth and ego-motion | |
CN116740424A (en) | Transformer-based time series point cloud 3D target detection | |
CN112818935B (en) | Multi-lane congestion detection and duration prediction method and system based on deep learning | |
CN109919107B (en) | Traffic police gesture recognition method based on deep learning and unmanned vehicle | |
Lee et al. | Ev-reconnet: Visual place recognition using event camera with spiking neural networks | |
Wen-juan et al. | Application of vision sensing technology in urban intelligent traffic control system | |
CN118314530B (en) | Video anti-tailing method based on abnormal event detection | |
CN117058474B (en) | Depth estimation method and system based on multi-sensor fusion | |
Lee et al. | Low computational vehicle lane changing prediction using drone traffic dataset | |
CN117649491A (en) | Real test scene virtual reconstruction method for ice and snow aerial photographing driving data | |
CN114998402B (en) | Monocular depth estimation method and device for pulse camera | |
CN116797640A (en) | A depth and 3D key point estimation method for intelligent accompanying patrol vehicles | |
CN116934977A (en) | A visual three-dimensional perception method and system based on three-dimensional occupancy prediction and neural rendering | |
Ayalew et al. | Self-Supervised Representation Learning for Motion Control of Autonomous Vehicles | |
Lyu et al. | Sensor Fusion and Motion Planning with Unified Bird’s-Eye View Representation for End-to-end Autonomous Driving |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||