CN112800879B - Vehicle-mounted video-based front vehicle position prediction method and prediction system - Google Patents
Vehicle-mounted video-based front vehicle position prediction method and prediction system
- Publication number
- CN112800879B (application CN202110051940.3A)
- Authority
- CN
- China
- Prior art keywords
- vehicle
- sequence
- front vehicle
- optical flow
- frame
- Prior art date
- Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/29—Graphical models, e.g. Bayesian networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a method for predicting the position of a preceding vehicle based on vehicle-mounted video, which comprises the following steps: constructing a vehicle position prediction model based on an encoding-decoding framework, which predicts the position and scale of the preceding vehicle from historical data of the bounding box of the preceding vehicle and of the optical flow within that bounding box, together with predicted motion information of the own vehicle; constructing a sample set and training the vehicle position prediction model; acquiring vehicle-mounted video; performing vehicle detection and tracking on the video frames and calculating optical flow to obtain the bounding box sequence and optical flow sequence of the preceding vehicle; predicting the motion information of the own vehicle to form a motion prediction sequence; extracting the bounding boxes of the preceding vehicle and the optical flows within them in the T video frames before the current time t, together with the predicted motion information of the own vehicle in the Δ video frames after t, and inputting them into the vehicle position prediction model to obtain the bounding box sequence of the preceding vehicle in the Δ video frames after t, thereby predicting the position and scale of the preceding vehicle. Based only on the video information captured by a dashboard camera, the method can predict the position and scale of the preceding vehicle in real time.
Description
Technical Field
The invention belongs to the technical field of driver assistance, and particularly relates to a method and a system for predicting the position of a preceding vehicle based on vehicle-mounted video.
Background
With the continuous development of society, automobiles have become widespread in China. Alongside the convenience they bring, many problems follow, such as frequent traffic safety accidents, harsh road driving environments, and pollution of the ecological environment. These problems threaten people's lives and property, traffic accidents above all, so safe driving has become an urgent public need. Traffic accidents often occur because a driver cannot respond in time to the behavior of other traffic participants on the road; meanwhile, dashboard cameras are now used by a large number of vehicle owners and can record video and sound throughout the entire driving process.
Existing vehicle position prediction methods, proposed both domestically and abroad, can be roughly divided into two categories: traditional methods and deep learning-based methods.
Traditional vehicle position prediction methods such as Bayesian filtering have overly simple structures, cannot model complex vehicle motion patterns, and often perform poorly on long-term prediction. Dynamic Bayesian networks can alleviate these problems by describing the latent factors that determine a vehicle's trajectory with a graphical model and by explicitly modeling the physical process that generates the trajectory; however, a model structure fixed by the designer's intuition is not sufficient to capture the variety of dynamic traffic scenes, performance in real traffic scenes is limited, and the high computational complexity cannot meet the requirement of real-time prediction.
In recent years, deep learning methods have shown great capability in image processing, and many researchers have applied the recurrent neural network structure and its variants to the vehicle position prediction task. These methods use the vehicle's past driving data, train deep learning network models, and achieve good prediction results in their respective application scenarios. However, these studies have two problems: first, the vehicle's past driving data must be captured by various sensors mounted on the vehicle, which are not common on today's production vehicles; second, they can only predict the pixel position of the preceding vehicle and cannot predict its scale.
The invention predicts the position and scale of the preceding vehicle in real time based only on the image information captured by a dashboard camera, so that the driver has enough time to avoid traffic accidents while driving, and the method can be better applied to real-world scenes.
Disclosure of Invention
The purpose of the invention is as follows: aiming at the problems in the prior art, the invention provides a method for predicting the position of a preceding vehicle based on vehicle-mounted video, which can predict the position and scale of the preceding vehicle in real time based only on the video captured by a dashboard camera, so that the driver has enough time to avoid traffic accidents while driving, and which can be better applied to real-world scenes.
The technical scheme is as follows: the invention discloses a vehicle-mounted video-based front vehicle position prediction method, which comprises a training stage and a prediction stage, wherein the training stage comprises the following steps:
S1, constructing a vehicle position prediction model based on an encoding-decoding framework, wherein the vehicle position prediction model is used for predicting the bounding boxes of the preceding vehicle at times t+1, t+2, …, t+Δ after the current time t according to the bounding boxes of the preceding vehicle at times t-0, t-1, …, t-(T-1) before the current time t, the optical flows within those bounding boxes, and the motion information of the own vehicle at times t+1, t+2, …, t+Δ after the current time t;
the input of the vehicle position prediction model includes: the bounding box sequence B of the preceding vehicle and the optical flow sequence F within the bounding box of the preceding vehicle in the video frames of the T times before the current time t, and the motion prediction sequence M of the own vehicle in the video frames of the Δ times after the current time t;
the output of the vehicle position prediction model is the predicted bounding box sequence Y of the preceding vehicle in the video frame images of the Δ times after the current time t;
the vehicle position prediction model comprises: a preceding vehicle bounding box encoder, a preceding vehicle optical flow encoder, a feature fusion unit, and a preceding vehicle position prediction decoder;
the preceding vehicle bounding box encoder is used for encoding the bounding box sequence B of the preceding vehicle to obtain a time-series feature vector of the preceding vehicle;
the preceding vehicle optical flow encoder is used for encoding the optical flow sequence F within the bounding box of the preceding vehicle to obtain a motion feature vector of the preceding vehicle;
the feature fusion unit concatenates the time-series feature vector and the motion feature vector of the preceding vehicle into a fused feature vector of the preceding vehicle;
the preceding vehicle position prediction decoder decodes the fused feature vector according to the motion prediction sequence M of the own vehicle to obtain the predicted bounding boxes of the preceding vehicle in the video frames of the Δ times after the current time t;
S2, constructing a sample set and training the vehicle position prediction model, comprising the following steps:
S2-1, collecting a plurality of vehicle-mounted video clips of duration s in which a preceding vehicle is visible, sampling the video frames in each video clip, and determining, for the sampled video frames, the bounding box sequence B_tr of the preceding vehicle, the optical flow sequence F_tr within the bounding box, and the motion prediction sequence M_tr of the own vehicle at the times corresponding to the video frames, to form a sample set;
S2-2, dividing the sample set into a training set and a verification set, and setting a learning rate σ and a batch size N;
S2-3, adopting an Adam optimizer in the training process and determining the number of training batches N' according to the number of samples in the training set and N; taking the B_tr and F_tr corresponding to the video frames of the first s' of each video clip in a training sample, together with the M_tr corresponding to the video frames of the last s'', as the input of the vehicle position prediction model, and the B_tr corresponding to the video frames of the last s'' as the output; training the model, storing the model parameters, and verifying the prediction accuracy of the model with the verification set; s' + s'' = s;
S2-4, selecting the model parameters with the highest prediction accuracy among the N' batches of training as the parameters of the vehicle position prediction model;
the prediction phase comprises:
A camera capable of capturing the preceding vehicle is mounted on the own vehicle, and the video data collected by the camera while the vehicle is driving is obtained;
vehicle detection and tracking are performed on each frame of the video to obtain the bounding box sequence of each preceding vehicle, which is stored in B_test(i), where i is the index of the preceding vehicle; at the same time the optical flow within the bounding box is calculated and stored in F_test(i); the motion information of the own vehicle in future frames is obtained and stored in the sequence M_test;
a first sliding window of length T is applied to the sequences B_test(i) and F_test(i), and a second sliding window of length Δ is applied to the sequence M_test, to extract, respectively, the bounding boxes of vehicle i and the optical flows within them in the T video frames before the current time t, and the predicted motion information of the own vehicle in the Δ video frames after the current time t; these are input into the trained vehicle position prediction model to obtain the bounding box sequence Y'(i) = [Y'_{t+1}(i), Y'_{t+2}(i), …, Y'_{t+δ}(i), …, Y'_{t+Δ}(i)] of the preceding vehicle i in the Δ video frames after the current time t, and the position of the predicted bounding boxes relative to the bounding box of the preceding vehicle i in the video frame at the current time is calculated, where B_{test,t+0}(i) is the bounding box of the preceding vehicle i at the current time t and 1 ≤ δ ≤ Δ;
the predicted trajectory of the preceding vehicle i is obtained from the centers of the bounding boxes in Y'(i), and the scale of the preceding vehicle i is obtained from the widths and heights of the bounding boxes in Y'(i).
The surrounding frame sequence of the front vehicle is calculated by adopting the following steps:
a.1, carrying out vehicle detection on video frame images at continuous T moments to obtain surrounding frames of all vehicles in each frame image;
and A.2, tracking the vehicle enclosure frame obtained in the step A.1 by adopting a multi-target tracking algorithm, giving the same number to the same vehicle in different frames, and forming a front vehicle enclosure frame sequence B of T moments according to a time sequence.
The optical flow sequence within the bounding box of the preceding vehicle is calculated by the following steps:
B.1, calculating, for the video images at the T consecutive times, the optical flow between each frame and the image of the previous frame, to obtain the optical flow map corresponding to each frame; the two-dimensional optical flow vector at the j-th pixel of the optical flow map is I_j = (u_j, v_j), where u_j and v_j are the vertical and horizontal components of the optical flow vector, respectively;
B.2, cropping, from the optical flow map corresponding to the image at time t-τ, the region covered by the bounding box of the preceding vehicle in that image, and scaling it to a preset uniform size to obtain the optical flow map within the bounding box at time t-τ; the optical flow sequence F within the bounding box of the preceding vehicle over the T times is formed in chronological order, where t-τ denotes the τ-th time before time t and 0 ≤ τ < T.
The motion prediction sequence of the own vehicle is calculated by the following steps:
C.1, for the video frames at times t-0, t-1, …, t-(T-1) before the current time t, calculating the camera rotation matrix R_{t-τ} and translation vector V_{t-τ} between video frames P_{t-τ-1} and P_{t-τ} at adjacent times, to form a rotation matrix sequence RS and a translation vector sequence VS, where 0 ≤ τ < T, specifically comprising steps C.1-1 to C.1-2:
C.1-1, calculating the essential matrix E by the eight-point method, comprising:
C.1-1-1, extracting feature points of P_{t-τ-1} and P_{t-τ} by the Surf algorithm and selecting the 8 best-matched pairs of feature points (a_l, a'_l), l = 1, 2, …, 8; where a_l and a'_l denote the coordinates, on the normalized plane, of the pixel positions of the l-th pair of matched feature points in video frames P_{t-τ-1} and P_{t-τ}, respectively, a_l = [x_l, y_l, 1]^T, a'_l = [x'_l, y'_l, 1]^T; a_l and a'_l are each 3×1 matrices, where T denotes the matrix transpose;
C.1-1-2, combining the 8 pairs of matched feature points into 3×8 matrices a and a':
a^T E a' = 0
Solving this system of equations yields the essential matrix E, where E is a 3×3 matrix;
C.1-2, performing singular value decomposition on E to obtain the camera rotation matrix R_{t-τ} and translation vector V_{t-τ}, where R_{t-τ} is a 3×3 matrix and V_{t-τ} is a 3-dimensional column vector;
Finally the rotation matrix sequence RS = {R_{t-(T-1)}, …, R_{t-τ}, …, R_{t-1}, R_{t-0}} and the translation vector sequence VS = {V_{t-(T-1)}, …, V_{t-τ}, …, V_{t-1}, V_{t-0}} of the T video frames before time t are obtained;
C.2, for the camera rotation matrices and translation vectors in the RS and VS obtained in C.1, calculating the cumulative value of each R_{t-τ} and V_{t-τ} with that of the previous time, the cumulative values being denoted R'_{t-τ} and V'_{t-τ};
C.3, taking the R'_{t-0} and V'_{t-0} finally obtained in C.2 as the rotation matrix and translation vector passed to the camera at the next time, as follows:
R_{t+1} = R'_{t-0}
V_{t+1} = V'_{t-0}
C.4, appending the R_{t+1} and V_{t+1} obtained in C.3 to the end of the rotation matrix sequence RS and translation vector sequence VS obtained in C.1, respectively, and continuing to perform C.2 and C.3 until all rotation matrices {R_{t+1}, R_{t+2}, …, R_{t+δ}, …, R_{t+Δ}} and all translation vectors {V_{t+1}, V_{t+2}, …, V_{t+δ}, …, V_{t+Δ}} of the Δ video frames after time t are obtained, 1 ≤ δ ≤ Δ;
C.5, calculating the motion vectors of the own vehicle at the Δ times after the current time t to form the motion prediction sequence M = {M_{t+1}, M_{t+2}, …, M_{t+δ}, …, M_{t+Δ}} of the own vehicle, specifically comprising steps C.5-1 to C.5-2:
C.5-1, extracting from the rotation matrix R_{t+δ} the rotation angle information of the camera about the x, y and z axes and representing it as a 3-dimensional row vector ψ_{t+δ}, where r_jk denotes the value in the j-th row and k-th column of the rotation matrix R_{t+δ}, j, k ∈ {1, 2, 3}; atan2() and atan() both denote arctangent functions, but the result of atan2() lies in (-π, π] while the result of atan() lies in (-π/2, π/2);
C.5-2, concatenating the vector ψ_{t+δ} with the translation vector V_{t+δ}^T converted into a three-dimensional row vector, to form a 6-dimensional row vector M_{t+δ}: M_{t+δ} = [ψ_{t+δ}, V_{t+δ}^T];
Finally the motion prediction sequence M = {M_{t+1}, M_{t+2}, …, M_{t+δ}, …, M_{t+Δ}} of the own vehicle is obtained;
C.6, passing M through a fully connected layer FC_4 to transform the dimension of all of its motion vectors.
The preceding vehicle bounding box encoder comprises an encoding gated recurrent neural network GRU_b and a first fully connected layer FC_1; the input of GRU_b is the bounding box B_{t-τ} at each time in the bounding box sequence B of the preceding vehicle together with the hidden state vector passed down by GRU_b at the previous time, and its output is the encoding result of the bounding box of the preceding vehicle at the current time; FC_1 performs a dimension transformation on the final output of GRU_b to obtain the time-series feature vector of the preceding vehicle at the current time t.
The preceding vehicle optical flow encoder comprises a CNN-based motion feature extraction network FEN and a second fully connected layer FC_2; the input of the FEN is the optical flow sequence F within the bounding box of the preceding vehicle, and its output is the encoding result of the optical flow within the bounding box of the preceding vehicle at the current time; the FEN is based on the ResNet50 architecture and comprises a convolution layer conv1, a Relu layer, a max pooling layer maxPool, and 4 residual learning blocks connected in sequence; conv1 has 2m input channels, where m is the number of optical flow maps sampled from the optical flow sequence F, i.e., m optical flow maps are uniformly sampled from F; the 4 residual learning blocks all have a three-layer structure, i.e., each residual learning block consists of convolutional network layers and Relu layers connected in series;
m optical flow maps are uniformly sampled from the optical flow sequence F within the bounding box of the preceding vehicle; their vertical and horizontal components form 2m optical flow components, which are input into the FEN, and the output of the FEN is the motion feature of the optical flow maps within the bounding box of the preceding vehicle at the current time;
FC_2 performs a dimension transformation on the motion feature output by the FEN to obtain the motion feature vector of the preceding vehicle at the current time t.
The preceding vehicle position prediction decoder comprises a decoding gated recurrent neural network GRU_d and a third fully connected layer FC_3; the input of GRU_d is the fusion vector Mh_{t+δ} of the predicted value M_{t+δ} of the own-vehicle motion information at time t+δ and the hidden state vector passed down by GRU_d at the previous time, together with the hidden state vector passed down by GRU_d at the previous time, 1 ≤ δ ≤ Δ; its output is the decoding result of the bounding box of the preceding vehicle at time t+δ; FC_3 performs a dimension transformation on this decoding result to obtain the bounding box of the preceding vehicle at time t+δ.
On the other hand, the invention also discloses a prediction system for implementing the above vehicle-mounted video-based method for predicting the position of a preceding vehicle, which comprises:
a vehicle position prediction model based on an encoding-decoding framework, used for predicting the bounding boxes of the preceding vehicle at times t+1, t+2, …, t+Δ after the current time t according to the bounding boxes of the preceding vehicle at times t-0, t-1, …, t-(T-1) before the current time t, the optical flows within those bounding boxes, and the motion information of the own vehicle at times t+1, t+2, …, t+Δ after the current time t;
the vehicle position prediction model comprises: a preceding vehicle bounding box encoder, a preceding vehicle optical flow encoder, a feature fusion unit, and a preceding vehicle position prediction decoder;
the preceding vehicle bounding box encoder is used for encoding the bounding box sequence B of the preceding vehicle to obtain the time-series feature vector of the preceding vehicle;
the preceding vehicle optical flow encoder is used for encoding the optical flow sequence F within the bounding box of the preceding vehicle to obtain the motion feature vector of the preceding vehicle;
the feature fusion unit concatenates the time-series feature vector and the motion feature vector of the preceding vehicle into the fused feature vector of the preceding vehicle;
the preceding vehicle position prediction decoder decodes the fused feature vector according to the motion prediction sequence M of the own vehicle to obtain the predicted bounding boxes of the preceding vehicle in the video frames of the Δ times after the current time t;
a vehicle bounding box acquisition module, used for acquiring the bounding box sequence B of the preceding vehicle in the vehicle-mounted video;
a vehicle bounding box optical flow acquisition module, used for acquiring the optical flow sequence F within the bounding box of the preceding vehicle in the vehicle-mounted video;
and an own-vehicle motion information prediction module, used for predicting the motion information of the own vehicle at future times to form the own-vehicle motion prediction sequence M.
Beneficial effects: the method for predicting the position of a preceding vehicle disclosed by the invention has the following advantages: 1. it is based only on the video image information captured by a dashboard camera, which effectively solves the problem of low applicability on current production vehicles caused by other existing methods' reliance on various sensors to obtain information; 2. it adopts a deep learning network model based on an encoding-decoding framework, which can predict not only the position of the preceding vehicle but also its scale, significantly improving prediction performance.
Drawings
FIG. 1 is a flow chart of the vehicle-mounted video-based method for predicting the position of a preceding vehicle according to the present invention;
FIG. 2 is a schematic illustration of video frame vehicle detection tracking;
FIG. 3 is a schematic diagram of an optical flow extraction method for adjacent frames;
FIG. 4 is a schematic diagram of a vehicle position prediction model;
FIG. 5 is a schematic diagram of a GRU structure;
FIG. 6 is a schematic diagram of a motion feature extraction network;
FIG. 7 is a schematic view of a sliding window;
FIG. 8 is a diagram illustrating predicted results in an example;
fig. 9 is a schematic structural diagram of a vehicle-mounted video-based front vehicle position prediction system disclosed in the invention.
Detailed Description
The invention is further elucidated with reference to the drawings and the detailed description.
As shown in FIG. 1, the invention discloses a method for predicting the position of a vehicle ahead based on a vehicle-mounted video, which comprises a training stage and a prediction stage, wherein the training stage comprises the following steps:
S1, constructing a vehicle position prediction model based on an encoding-decoding framework, wherein the vehicle position prediction model is used for predicting the bounding boxes of the preceding vehicle at times t+1, t+2, …, t+Δ after the current time t according to the bounding boxes of the preceding vehicle at times t-0, t-1, …, t-(T-1) before the current time t, the optical flows within those bounding boxes, and the motion information of the own vehicle at times t+1, t+2, …, t+Δ after the current time t;
in the present embodiment, T is 20, and Δ is 40;
the input of the vehicle position prediction model includes: the bounding box sequence B of the preceding vehicle and the optical flow sequence F within the bounding box of the preceding vehicle in the video frames of the T times before the current time t, and the motion prediction sequence M of the own vehicle in the video frames of the Δ times after the current time t;
where B = [B_{t-0}, B_{t-1}, …, B_{t-τ}, …, B_{t-(T-1)}], and B_{t-τ} is the bounding box of the preceding vehicle in the video frame at the τ-th time before time t; the bounding box is represented by the horizontal and vertical coordinates x_{t-τ}, y_{t-τ} of its center point and its width w_{t-τ} and height h_{t-τ}, i.e., B_{t-τ} = (x_{t-τ}, y_{t-τ}, w_{t-τ}, h_{t-τ}); 0 ≤ τ < T;
In the invention, the surrounding frame sequence of the front vehicle is calculated by adopting the following steps:
A.1, vehicle detection is performed on the video frame images at the T consecutive times to obtain the bounding boxes of all vehicles in each frame image;
In this embodiment, a vehicle detection model built on Mask-RCNN is used for vehicle detection; the model is trained on the COCO data set, and its output is the vehicle bounding boxes in an image, each represented by a 4-dimensional vector; the video images are uniformly scaled to 1024 × 1024 before being input to Mask-RCNN.
A.2, the vehicle bounding boxes obtained in step A.1 are tracked by a multi-target tracking algorithm, the same vehicle is given the same number in different frames, and the bounding box sequence B of the preceding vehicle over the T times is formed in chronological order. In this embodiment, multi-target tracking is performed with the Sort algorithm, an online real-time multi-target tracking algorithm suitable for tracking vehicles in vehicle-mounted video. FIG. 2 is a schematic diagram of video frame vehicle detection and tracking: 3 vehicles are detected in two video frames at different times, and the same vehicles are numbered 1, 2, and 3 in both frames.
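As an illustration of step A.1, the following sketch shows per-frame vehicle detection with a COCO-pretrained Mask R-CNN; torchvision's detector is used here only as a readily available stand-in for the detection model of the embodiment, the score threshold and vehicle label set are assumptions, and the Sort tracking step is omitted.

```python
# Hedged sketch of step A.1: detect vehicles in one frame with a COCO-pretrained
# Mask R-CNN (torchvision model used as a stand-in) and convert the detections
# into the (cx, cy, w, h) bounding-box form B_{t-τ} used in the text.
import torch
import torchvision

detector = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT").eval()
VEHICLE_LABELS = {3, 6, 8}  # COCO category ids for car, bus, truck

def detect_vehicles(frame, score_thresh=0.7):
    """frame: float tensor (3, H, W) in [0, 1]; returns a list of (cx, cy, w, h) boxes."""
    with torch.no_grad():
        out = detector([frame])[0]
    boxes = []
    for box, label, score in zip(out["boxes"], out["labels"], out["scores"]):
        if int(label) in VEHICLE_LABELS and float(score) >= score_thresh:
            x1, y1, x2, y2 = box.tolist()
            boxes.append(((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1))
    return boxes
```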
F = [F_{t-0}, F_{t-1}, …, F_{t-τ}, …, F_{t-(T-1)}], where F_{t-τ} is the optical flow map within the bounding box of the preceding vehicle in the video frame at the τ-th time before time t, F_{t-τ} = {(u_{t-τ}(p), v_{t-τ}(p))}, and (u_{t-τ}(p), v_{t-τ}(p)) is the two-dimensional optical flow vector at the p-th pixel of the optical flow map;
the optical flow sequence in the front vehicle surrounding frame is calculated by adopting the following steps:
b.1, calculating the optical flow of each frame and the previous frame of image of the frame of the video images at the continuous T moments to obtain an optical flow graph corresponding to each frame of image; in the embodiment, the FlowNet2 algorithm is adopted to calculate the optical flow of adjacent frames; the two-dimensional optical flow vector of the jth pixel point in the optical flow graph is as follows: i is j =(u j ,v j ),u j ,v j Vertical and horizontal components of the optical flow vector, respectively; as shown in fig. 3.
And B.2, intercepting a covering part of the front vehicle surrounding frame in the image at the T-T moment from a light flow graph corresponding to the image at the T-T moment, zooming to a preset uniform size to obtain a light flow graph in the surrounding frame at the T-T moment, and forming a light flow sequence F in the front vehicle surrounding frame at the T moments according to a time sequence, wherein the T-T represents the T-th moment before the moment T, and T is more than or equal to 0 and less than T. In this embodiment, the optical flow maps within the bounding box are uniformly scaled to 224 x 224.
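A minimal sketch of step B.2 follows, assuming a dense flow map of shape (H, W, 2) is available; OpenCV's Farneback flow is shown only as an easily reproducible substitute for FlowNet2, and the 224×224 size follows the embodiment.

```python
# Hedged sketch of step B.2: compute a dense flow map for adjacent frames
# (Farneback used as a substitute for FlowNet2), crop the region covered by the
# preceding vehicle's bounding box, and resize it to 224x224.
import cv2

def flow_in_bbox(prev_gray, curr_gray, bbox, out_size=224):
    """bbox = (cx, cy, w, h) in pixels; returns an (out_size, out_size, 2) flow patch."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    cx, cy, w, h = bbox
    x1, y1 = max(int(cx - w / 2), 0), max(int(cy - h / 2), 0)
    x2, y2 = int(cx + w / 2), int(cy + h / 2)
    patch = flow[y1:y2, x1:x2]
    return cv2.resize(patch, (out_size, out_size), interpolation=cv2.INTER_LINEAR)
```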
During driving, in addition to the motion of the vehicles in the scene ahead, the own vehicle itself is also moving; the motion of the own vehicle must therefore be predicted in order to predict the motion of the preceding vehicle.
The motion information prediction sequence of the own vehicle is calculated by the following steps:
C.1, for the video frames at times t-0, t-1, …, t-(T-1) before the current time t, calculating the camera rotation matrix R_{t-τ} and translation vector V_{t-τ} between video frames P_{t-τ-1} and P_{t-τ} at adjacent times, to form a rotation matrix sequence RS and a translation vector sequence VS, where 0 ≤ τ < T, specifically comprising steps C.1-1 to C.1-2:
C.1-1, calculating the essential matrix E by the eight-point method, comprising:
C.1-1-1, extracting feature points of P_{t-τ-1} and P_{t-τ} by the Surf algorithm and selecting the 8 best-matched pairs of feature points (a_l, a'_l), l = 1, 2, …, 8; where a_l and a'_l denote the coordinates, on the normalized plane, of the pixel positions of the l-th pair of matched feature points in video frames P_{t-τ-1} and P_{t-τ}, respectively, a_l = [x_l, y_l, 1]^T, a'_l = [x'_l, y'_l, 1]^T; a_l and a'_l are each 3×1 matrices, where T denotes the matrix transpose;
C.1-1-2, combining the 8 pairs of matched feature points into 3×8 matrices a and a':
a^T E a' = 0
Solving this system of equations yields the essential matrix E, where E is a 3×3 matrix;
C.1-2, performing singular value decomposition on E to obtain the camera rotation matrix R_{t-τ} and translation vector V_{t-τ}, where R_{t-τ} is a 3×3 matrix and V_{t-τ} is a 3-dimensional column vector;
Finally the rotation matrix sequence RS = {R_{t-(T-1)}, …, R_{t-τ}, …, R_{t-1}, R_{t-0}} and the translation vector sequence VS = {V_{t-(T-1)}, …, V_{t-τ}, …, V_{t-1}, V_{t-0}} of the T video frames before time t are obtained;
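As a sketch of step C.1, the relative camera pose between adjacent frames can be recovered with OpenCV; ORB features are used here because SURF requires the contrib build, RANSAC replaces the plain eight-point solution, and the camera intrinsic matrix K is assumed known, so this is only an approximation of the procedure above.

```python
# Hedged sketch of step C.1: match feature points between adjacent frames,
# estimate the essential matrix E, and decompose it into the camera rotation
# matrix R_{t-τ} and translation vector V_{t-τ}.
import cv2
import numpy as np

def relative_pose(prev_gray, curr_gray, K):
    orb = cv2.ORB_create(2000)                      # SURF needs opencv-contrib; ORB as a stand-in
    kp1, des1 = orb.detectAndCompute(prev_gray, None)
    kp2, des2 = orb.detectAndCompute(curr_gray, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
    matches = sorted(matches, key=lambda m: m.distance)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
    _, R, V, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R, V                                     # 3x3 rotation, 3x1 translation
```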
C.2, for the camera rotation matrices and translation vectors in the RS and VS obtained in C.1, calculating the cumulative value of each R_{t-τ} and V_{t-τ} with that of the previous time, the cumulative values being denoted R'_{t-τ} and V'_{t-τ};
C.3, taking the R'_{t-0} and V'_{t-0} finally obtained in C.2 as the rotation matrix and translation vector passed to the camera at the next time, as follows:
R_{t+1} = R'_{t-0}
V_{t+1} = V'_{t-0}
C.4, appending the R_{t+1} and V_{t+1} obtained in C.3 to the end of the rotation matrix sequence RS and translation vector sequence VS obtained in C.1, respectively, and continuing to perform C.2 and C.3 until all rotation matrices {R_{t+1}, R_{t+2}, …, R_{t+δ}, …, R_{t+Δ}} and all translation vectors {V_{t+1}, V_{t+2}, …, V_{t+δ}, …, V_{t+Δ}} of the Δ video frames after time t are obtained, 1 ≤ δ ≤ Δ;
C.5, calculating the motion vectors of the own vehicle at the Δ times after the current time t to form the motion prediction sequence M = {M_{t+1}, M_{t+2}, …, M_{t+δ}, …, M_{t+Δ}} of the own vehicle, specifically comprising steps C.5-1 to C.5-2:
C.5-1, extracting from the rotation matrix R_{t+δ} the rotation angle information of the camera about the x, y and z axes and representing it as a 3-dimensional row vector ψ_{t+δ}, where r_jk denotes the value in the j-th row and k-th column of the rotation matrix R_{t+δ}, j, k ∈ {1, 2, 3}; atan2() and atan() both denote arctangent functions, but the result of atan2() lies in (-π, π] while the result of atan() lies in (-π/2, π/2);
C.5-2, concatenating the vector ψ_{t+δ} with the translation vector V_{t+δ}^T converted into a three-dimensional row vector, to form a 6-dimensional row vector M_{t+δ}: M_{t+δ} = [ψ_{t+δ}, V_{t+δ}^T];
Finally the motion prediction sequence M = {M_{t+1}, M_{t+2}, …, M_{t+δ}, …, M_{t+Δ}} of the own vehicle is obtained;
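A minimal sketch of steps C.5-1 and C.5-2 follows; since the exact angle formulas are not reproduced in the text above, the common Z-Y-X (atan2-based) rotation-matrix-to-Euler-angle convention is assumed.

```python
# Hedged sketch of steps C.5-1/C.5-2: extract x/y/z rotation angles ψ_{t+δ} from
# the rotation matrix R_{t+δ} with atan2 and concatenate them with the
# translation vector V_{t+δ} into the 6-dimensional motion vector M_{t+δ}.
import numpy as np

def motion_vector(R, V):
    """R: 3x3 rotation matrix; V: length-3 translation vector."""
    psi_x = np.arctan2(R[2, 1], R[2, 2])
    psi_y = np.arctan2(-R[2, 0], np.sqrt(R[2, 1] ** 2 + R[2, 2] ** 2))
    psi_z = np.arctan2(R[1, 0], R[0, 0])
    psi = np.array([psi_x, psi_y, psi_z])        # 3-d row vector ψ_{t+δ}
    return np.concatenate([psi, np.ravel(V)])    # 6-d row vector M_{t+δ}
```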
C.6, M is passed through a fully connected layer FC_4 to transform the dimension of all of its motion vectors so that it matches the dimension of the hidden state vector passed down by the decoding gated recurrent neural network GRU_d at the previous time; in this embodiment the output dimension of this fully connected layer is 512.
The output of the vehicle position prediction model is the predicted bounding box sequence Y of the preceding vehicle in the video frame images of the Δ times after the current time t, Y = [Y_{t+1}, Y_{t+2}, …, Y_{t+δ}, …, Y_{t+Δ}]; Y_{t+δ} denotes the predicted bounding box of the preceding vehicle in the video frame image at the δ-th time after time t, represented by the horizontal and vertical coordinates of its center point and its width and height, i.e., Y_{t+δ} = (x_{t+δ}, y_{t+δ}, w_{t+δ}, h_{t+δ});
As shown in FIG. 4, the vehicle position prediction model comprises: a preceding vehicle bounding box encoder 1-1, a preceding vehicle optical flow encoder 1-2, a feature fusion unit 1-3, and a preceding vehicle position prediction decoder 1-4;
The preceding vehicle bounding box encoder 1-1 encodes the bounding box sequence B of the preceding vehicle to obtain the time-series feature vector of the preceding vehicle.
The preceding vehicle bounding box encoder mainly uses a gated recurrent unit (GRU) network for encoding. A GRU retains only the information relevant for prediction and forgets irrelevant data. Its structure is shown in FIG. 5: the inputs are the input In_t at the current time and the hidden state vector h_{t-1} passed down by the GRU at the previous time, where h_{t-1} represents the position and scale information of the preceding vehicle over the past time period. Combining In_t and h_{t-1}, the GRU outputs the hidden state vector h_t at the current time. The whole forward propagation process is computed as:
z_t = σ(W_z · [h_{t-1}, In_t])
r_t = σ(W_r · [h_{t-1}, In_t])
h̃_t = tanh(W · [r_t ⊙ h_{t-1}, In_t])
h_t = (1 − z_t) ⊙ h_{t-1} + z_t ⊙ h̃_t
where z_t is the output of the update gate, σ() is the sigmoid function, W_z is the weight parameter of the update gate, r_t is the output of the reset gate, W_r is the weight parameter of the reset gate, h̃_t is the candidate output to be determined at the current time, tanh() is the hyperbolic tangent function, W is the weight parameter of the value to be determined, [,] denotes the concatenation of two vectors, and ⊙ denotes element-wise multiplication. This group of formulas is abbreviated as h_t = GRU_c(U, h_{t-1}; V), where c denotes the specific application, U is the input value of GRU_c at the current time, and V is the weight parameter of GRU_c.
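The GRU recurrence above can be written out directly; the sketch below implements the standard update-gate/reset-gate equations (torch.nn.GRUCell provides the same recurrence with bias terms), and the weight shapes are assumptions for illustration.

```python
# Hedged sketch of one GRU forward step with update gate z_t, reset gate r_t
# and candidate state, following the standard formulation assumed above.
import torch

def gru_step(in_t, h_prev, W_z, W_r, W):
    """in_t: input In_t; h_prev: h_{t-1}; each W acts on the concatenation [h_{t-1}, In_t]."""
    joint = torch.cat([h_prev, in_t], dim=-1)
    z_t = torch.sigmoid(joint @ W_z)                                   # update gate
    r_t = torch.sigmoid(joint @ W_r)                                   # reset gate
    h_cand = torch.tanh(torch.cat([r_t * h_prev, in_t], dim=-1) @ W)   # candidate output
    return (1 - z_t) * h_prev + z_t * h_cand                           # hidden state h_t
```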
The preceding vehicle bounding box encoder comprises an encoding gated recurrent neural network GRU_b and a first fully connected layer FC_1; the input of GRU_b is the bounding box B_{t-τ} at each time in the bounding box sequence B of the preceding vehicle together with the hidden state vector passed down by GRU_b at the previous time, and its output is the encoding result of the bounding box of the preceding vehicle at the current time; FC_1 performs a dimension transformation on the final output of GRU_b to obtain the time-series feature vector of the preceding vehicle at the current time t.
The structure of the encoding gated recurrent neural network GRU_b follows the abbreviated form above, where φ() denotes a linear mapping using the ReLU activation function and θ_b denotes the weight parameter V of GRU_b. In this embodiment, the hidden state vector of GRU_b has dimension 512, and FC_1 transforms its final output to dimension 256, i.e., the time-series feature vector of the preceding vehicle has dimension 256.
The preceding vehicle optical flow encoder 1-2 encodes the optical flow sequence F within the bounding box of the preceding vehicle to obtain the motion feature vector of the preceding vehicle.
The preceding vehicle optical flow encoder comprises a CNN-based motion feature extraction network FEN and a second fully connected layer FC_2; the input of the FEN is the optical flow sequence F within the bounding box of the preceding vehicle, and its output is the encoding result of the optical flow within the bounding box of the preceding vehicle at the current time; as shown in FIG. 6, the FEN is based on the ResNet50 architecture and comprises a convolution layer conv1, a Relu layer, a max pooling layer maxPool, and 4 residual learning blocks connected in sequence, as shown in FIG. 6-(a); conv1 has 2m input channels, where m is the number of optical flow maps sampled from the optical flow sequence F, i.e., m optical flow maps are uniformly sampled from F, and in this embodiment m is 10; the 4 residual learning blocks all have a three-layer structure, i.e., each residual learning block consists of 3 convolutional network layers Conv2 and Relu layers connected in series, as shown in FIG. 6-(b).
m optical flow maps are uniformly sampled from the optical flow sequence F within the bounding box of the preceding vehicle, and the vertical and horizontal components of each optical flow map are treated as its two channels. The vertical and horizontal components of the m optical flow maps form 2m optical flow components, which are input into the FEN; the output of the FEN is the motion feature of the optical flow maps within the bounding box of the preceding vehicle at the current time. In this embodiment, the motion feature extracted by the FEN has 2048 dimensions, and FC_2 transforms it to dimension 256, yielding the 256-dimensional motion feature vector of the preceding vehicle at the current time t.
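The FEN can be sketched by adapting a standard ResNet-50: its first convolution is replaced so that it accepts 2m flow channels and its classification head is removed so that the pooled 2048-dimensional feature is exposed, which FC_2 then reduces to 256 dimensions; using torchvision's ResNet-50 as the backbone is an assumption made for illustration.

```python
# Hedged sketch of the motion feature extraction network FEN (ResNet-50 backbone
# with a 2m-channel first convolution) followed by FC_2.
import torch.nn as nn
import torchvision

class FlowEncoder(nn.Module):
    def __init__(self, m=10, out_dim=256):
        super().__init__()
        backbone = torchvision.models.resnet50(weights=None)
        backbone.conv1 = nn.Conv2d(2 * m, 64, kernel_size=7, stride=2,
                                   padding=3, bias=False)   # conv1 with 2m input channels
        backbone.fc = nn.Identity()                          # keep the 2048-d pooled feature
        self.fen = backbone
        self.fc2 = nn.Linear(2048, out_dim)                  # FC_2

    def forward(self, flows):
        """flows: (batch, 2m, 224, 224) stacked components of the m sampled flow maps."""
        return self.fc2(self.fen(flows))                     # 256-d motion feature vector
```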
The feature fusion unit 1-3 concatenates the time-series feature vector and the motion feature vector of the preceding vehicle into the fused feature vector of the preceding vehicle, which represents the historical bounding box information and the historical optical flow information of the vehicle, i.e., information on the position, scale, appearance, and motion of the preceding vehicle at different times in the past time period; in this embodiment, the fused feature vector is a 512-dimensional vector.
The preceding vehicle position prediction decoder 1-4 decodes the fused feature vector according to the motion prediction sequence M of the own vehicle to obtain the predicted bounding boxes of the preceding vehicle in the video frames of the Δ times after the current time t.
The preceding vehicle position prediction decoder comprises a decoding gated recurrent neural network GRU_d and a third fully connected layer FC_3; the input of GRU_d is the fusion vector Mh_{t+δ} of the predicted value M_{t+δ} of the own-vehicle motion information at time t+δ and the hidden state vector passed down by GRU_d at the previous time, together with the hidden state vector passed down by GRU_d at the previous time, 1 ≤ δ ≤ Δ; its output is the decoding result of the bounding box of the preceding vehicle at time t+δ; FC_3 performs a dimension transformation on this decoding result, converting it into a 4-dimensional vector, to obtain the bounding box of the preceding vehicle at time t+δ.
The structure of the decoding gated recurrent neural network GRU_d follows the abbreviated form above, where θ_d is the weight parameter V of GRU_d.
In this embodiment, the fusion vector Mh_{t+δ} is calculated as follows: the 6-dimensional vector M_{t+δ} is transformed into a 512-dimensional vector by a fourth fully connected layer FC_4, this vector is linearly mapped using the ReLU activation function, and the linearly mapped vector is added to the hidden state vector passed down by GRU_d at the previous time to obtain the 512-dimensional fusion vector Mh_{t+δ}, where Average() denotes averaging the two vectors after addition.
S2, constructing a sample set and training the vehicle position prediction model, comprising the following steps:
S2-1, collecting a plurality of vehicle-mounted video clips of duration s in which a preceding vehicle is visible, sampling the video frames in each video clip, and determining, for the sampled video frames, the bounding box sequence B_tr of the preceding vehicle, the optical flow sequence F_tr within the bounding box, and the motion information sequence M_tr of the own vehicle at the times corresponding to the video frames, to form a sample set;
S2-2, dividing the sample set into a training set and a verification set, and setting a learning rate σ and a batch size N;
S2-3, adopting an Adam optimizer in the training process and determining the number of training batches N' according to the number of samples in the training set and N; taking the B_tr and F_tr corresponding to the video frames of the first s' of each video clip in a training sample, together with the M_tr corresponding to the video frames of the last s'', as the input of the vehicle position prediction model, and the B_tr corresponding to the video frames of the last s'' as the output; training the model, storing the model parameters, and verifying the prediction accuracy of the model with the verification set; s' + s'' = s;
S2-4, selecting the model parameters with the highest prediction accuracy among the N' batches of training as the parameters of the vehicle position prediction model;
In this embodiment, 1000 video clips are collected, each 3 seconds long at 20 frames per second, and the bounding boxes of the vehicle in the following 2 seconds are predicted from the bounding boxes of the vehicle in the first 1 second; the training set accounts for 70% of the sample set and the verification set for 30%. The training process uses an Adam optimizer with a fixed learning rate of 0.0005 and a batch size of 64, for a total of 40 batches. During training, the difference between the actual bounding box sequence of the vehicle and the bounding box sequence Y in the prediction result is measured with a smooth L1 loss function, the error is back-propagated for optimization, and the final network weight parameters are stored; in the loss function, || denotes the modulus of a vector.
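A training-loop sketch matching the settings above (Adam, learning rate 0.0005, batch size 64, smooth L1 loss between predicted and ground-truth box sequences) is given below; `model` and the data loader are placeholders for the encoders and decoder described earlier, and interpreting the 40 training rounds as epochs is an assumption.

```python
# Hedged sketch of the training procedure with a smooth L1 loss and Adam.
import torch
import torch.nn as nn

def train(model, loader, epochs=40, lr=5e-4, device="cpu"):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.SmoothL1Loss()
    model.to(device).train()
    for _ in range(epochs):
        for B_tr, F_tr, M_tr, Y_gt in loader:   # past boxes, past flows, future ego-motion, future boxes
            Y_pred = model(B_tr.to(device), F_tr.to(device), M_tr.to(device))
            loss = criterion(Y_pred, Y_gt.to(device))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```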
The prediction phase comprises:
A camera capable of capturing the preceding vehicle is mounted on the own vehicle, and the video data collected by the camera while the vehicle is driving is obtained;
vehicle detection and tracking are performed on each frame of the video to obtain the bounding box sequence of each preceding vehicle, which is stored in B_test(i), where i is the index of the preceding vehicle; at the same time the optical flow within the bounding box is calculated and stored in F_test(i); the motion information of the own vehicle in future frames is obtained and stored in the sequence M_test;
a first sliding window SW-1 of length T is applied to the sequences B_test(i) and F_test(i), and a second sliding window SW-2 of length Δ is applied to the sequence M_test, to extract, respectively, the bounding boxes of vehicle i and the optical flows within them in the T video frames before the current time t, and the predicted motion information of the own vehicle in the Δ video frames after the current time t; these are input into the trained vehicle position prediction model to obtain the bounding box sequence Y'(i) = [Y'_{t+1}(i), Y'_{t+2}(i), …, Y'_{t+δ}(i), …, Y'_{t+Δ}(i)] of the preceding vehicle i in the Δ video frames after the current time t, and the position of the predicted bounding boxes relative to the bounding box of the preceding vehicle i in the video frame at the current time is calculated, where B_{test,t+0}(i) is the bounding box of the preceding vehicle i at the current time t and 1 ≤ δ ≤ Δ; the sliding windows are shown in FIG. 7. As time goes on, the two sliding windows each move forward one step, and the position of the preceding vehicle at the next time is predicted.
The predicted trajectory of the preceding vehicle i is obtained from the centers of the bounding boxes in Y'(i), and the scale of the preceding vehicle i is obtained from the widths and heights of the bounding boxes in Y'(i).
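The sliding-window inference can be sketched as below; treating the network output as offsets that are added back to the bounding box B_{test,t+0}(i) at the current time is an assumption about the relative-position formula that is not reproduced in the text, and the deque-based buffers are illustrative only.

```python
# Hedged sketch of the prediction-stage sliding windows: keep the last T boxes
# and flow patches of vehicle i, feed them with the next Δ ego-motion vectors to
# the trained model, and shift the predictions back to absolute coordinates.
import collections
import torch

T = 20  # window length; Δ = 40 future frames in the embodiment
B_window = collections.deque(maxlen=T)   # bounding boxes (cx, cy, w, h) of vehicle i, oldest first
F_window = collections.deque(maxlen=T)   # flow patches inside those boxes

def predict_future_boxes(model, M_future):
    """M_future: (Δ, 6) predicted ego-motion vectors for the next Δ frames."""
    B = torch.stack(list(B_window)).unsqueeze(0)   # (1, T, 4)
    F = torch.stack(list(F_window)).unsqueeze(0)   # (1, T, 2m, 224, 224)
    M = M_future.unsqueeze(0)                      # (1, Δ, 6)
    with torch.no_grad():
        Y_rel = model(B, F, M).squeeze(0)          # (Δ, 4) predictions relative to the current box
    return Y_rel + B_window[-1]                    # absolute bounding box sequence Y'(i)
```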
In this embodiment, the prediction result is displayed in the video frame at the current time, as shown in fig. 8.
As shown in fig. 9, the present invention also discloses a prediction system for implementing the method for predicting a position of a vehicle ahead based on a vehicle-mounted video, including:
a vehicle position prediction model 1 based on an encoding-decoding framework, used for predicting the bounding boxes of the preceding vehicle at times t+1, t+2, …, t+Δ after the current time t according to the bounding boxes of the preceding vehicle at times t-0, t-1, …, t-(T-1) before the current time t, the optical flows within those bounding boxes, and the motion information of the own vehicle at times t+1, t+2, …, t+Δ after the current time t;
the vehicle position prediction model comprises: a preceding vehicle bounding box encoder 1-1, a preceding vehicle optical flow encoder 1-2, a feature fusion unit 1-3, and a preceding vehicle position prediction decoder 1-4;
the preceding vehicle bounding box encoder encodes the bounding box sequence B of the preceding vehicle to obtain the time-series feature vector of the preceding vehicle;
the preceding vehicle optical flow encoder encodes the optical flow sequence F within the bounding box of the preceding vehicle to obtain the motion feature vector of the preceding vehicle;
the feature fusion unit concatenates the time-series feature vector and the motion feature vector of the preceding vehicle into the fused feature vector of the preceding vehicle;
the preceding vehicle position prediction decoder decodes the fused feature vector according to the motion prediction sequence M of the own-vehicle motion information to obtain the predicted bounding boxes of the preceding vehicle in the video frames of the Δ times after the current time t;
a vehicle bounding box acquisition module 2, used for acquiring the bounding box sequence B of the preceding vehicle in the vehicle-mounted video;
a vehicle bounding box optical flow acquisition module 3, used for acquiring the optical flow sequence F within the bounding box of the preceding vehicle in the vehicle-mounted video;
and an own-vehicle motion information prediction module 4, used for predicting the motion information of the own vehicle at future times to form the own-vehicle motion prediction sequence M.
Claims (10)
1. A vehicle-mounted video-based front vehicle position prediction method comprises a training phase and a prediction phase, and is characterized in that the training phase comprises the following steps:
S1, constructing a vehicle position prediction model based on an encoding-decoding framework, wherein the vehicle position prediction model is used for predicting the bounding boxes of the preceding vehicle at times t+1, t+2, …, t+Δ after the current time t according to the bounding boxes of the preceding vehicle at times t-0, t-1, …, t-(T-1) before the current time t, the optical flows within those bounding boxes, and the motion information of the own vehicle at times t+1, t+2, …, t+Δ after the current time t;
the input of the vehicle position prediction model includes: the bounding box sequence B of the preceding vehicle and the optical flow sequence F within the bounding box of the preceding vehicle in the video frames of the T times before the current time t, and the motion prediction sequence M of the own vehicle in the video frames of the Δ times after the current time t;
the output of the vehicle position prediction model is the predicted bounding box sequence Y of the preceding vehicle in the video frame images of the Δ times after the current time t;
the vehicle position prediction model comprises: a preceding vehicle bounding box encoder, a preceding vehicle optical flow encoder, a feature fusion unit, and a preceding vehicle position prediction decoder;
the preceding vehicle bounding box encoder is used for encoding the bounding box sequence B of the preceding vehicle to obtain a time-series feature vector of the preceding vehicle;
the preceding vehicle optical flow encoder is used for encoding the optical flow sequence F within the bounding box of the preceding vehicle to obtain a motion feature vector of the preceding vehicle;
the feature fusion unit concatenates the time-series feature vector and the motion feature vector of the preceding vehicle into a fused feature vector of the preceding vehicle;
the preceding vehicle position prediction decoder decodes the fused feature vector according to the motion prediction sequence M of the own vehicle to obtain the predicted bounding boxes of the preceding vehicle in the video frames of the Δ times after the current time t;
S2, constructing a sample set and training the vehicle position prediction model, comprising the following steps:
S2-1, collecting a plurality of vehicle-mounted video clips of duration s in which a preceding vehicle is visible, sampling the video frames in each video clip, and determining, for the sampled video frames, the bounding box sequence B_tr of the preceding vehicle, the optical flow sequence F_tr within the bounding box, and the motion prediction sequence M_tr of the own vehicle at the times corresponding to the video frames, to form a sample set;
S2-2, dividing the sample set into a training set and a verification set, and setting a learning rate σ and a batch size N;
S2-3, adopting an Adam optimizer in the training process and determining the number of training batches N' according to the number of samples in the training set and N; taking the B_tr and F_tr corresponding to the video frames of the first s' of each video clip in a training sample, together with the M_tr corresponding to the video frames of the last s'', as the input of the vehicle position prediction model, and the B_tr corresponding to the video frames of the last s'' as the output; training the model, storing the model parameters, and verifying the prediction accuracy of the model with the verification set; s' + s'' = s;
S2-4, selecting the model parameters with the highest prediction accuracy among the N' batches of training as the parameters of the vehicle position prediction model;
the prediction phase comprises:
a camera capable of capturing the preceding vehicle is mounted on the own vehicle, and the video data collected by the camera while the vehicle is driving is obtained;
vehicle detection and tracking are performed on each frame of the video to obtain the bounding box sequence of each preceding vehicle, which is stored in B_test(i), where i is the index of the preceding vehicle; at the same time the optical flow within the bounding box is calculated and stored in F_test(i); the motion information of the own vehicle in future frames is obtained and stored in the sequence M_test;
a first sliding window of length T is applied to the sequences B_test(i) and F_test(i), and a second sliding window of length Δ is applied to the sequence M_test, to extract, respectively, the bounding boxes of vehicle i and the optical flows within them in the T video frames before the current time t, and the predicted motion information of the own vehicle in the Δ video frames after the current time t; these are input into the trained vehicle position prediction model to obtain the bounding box sequence Y'(i) = [Y'_{t+1}(i), Y'_{t+2}(i), …, Y'_{t+δ}(i), …, Y'_{t+Δ}(i)] of the preceding vehicle i in the Δ video frames after the current time t, and the position of the predicted bounding boxes relative to the bounding box of the preceding vehicle i in the video frame at the current time is calculated, where B_{test,t+0}(i) is the bounding box of the preceding vehicle i at the current time t and 1 ≤ δ ≤ Δ;
the predicted trajectory of the preceding vehicle i is obtained from the centers of the bounding boxes in Y'(i), and the scale of the preceding vehicle i is obtained from the widths and heights of the bounding boxes in Y'(i).
2. A preceding vehicle position prediction method according to claim 1, characterized in that the sequence of bounding boxes of the preceding vehicle is calculated using the steps of:
a.1, carrying out vehicle detection on video frame images at continuous T moments to obtain surrounding frames of all vehicles in each frame image;
and A.2, tracking the vehicle enclosure frame obtained in the step A.1 by adopting a multi-target tracking algorithm, giving the same number to the same vehicle in different frames, and forming a front vehicle enclosure frame sequence B of T moments according to a time sequence.
3. The preceding vehicle position prediction method according to claim 1, characterized in that the optical flow sequence within the bounding box of the preceding vehicle is calculated by the following steps:
B.1, calculating, for the video images at the T consecutive times, the optical flow between each frame and the image of the previous frame, to obtain the optical flow map corresponding to each frame; the two-dimensional optical flow vector at the j-th pixel of the optical flow map is I_j = (u_j, v_j), where u_j and v_j are the vertical and horizontal components of the optical flow vector, respectively;
B.2, cropping, from the optical flow map corresponding to the image at time t-τ, the region covered by the bounding box of the preceding vehicle in that image, and scaling it to a preset uniform size to obtain the optical flow map within the bounding box at time t-τ; the optical flow sequence F within the bounding box of the preceding vehicle over the T times is formed in chronological order, where t-τ denotes the τ-th time before time t and 0 ≤ τ < T.
4. The preceding vehicle position prediction method according to claim 1, characterized in that the motion prediction sequence of the own vehicle is calculated by using:
c.1, calculating the video frames at T-0, T-1, … and T- (T-1) before the current time TAdjacent moment video frame P t-τ-1 And P t-τ Camera rotation matrix R t-τ And a translation vector V t-τ Forming a rotation matrix sequence RS and a translation vector sequence VS, and the value is more than or equal to 0 and less than or equal to tau<T, specifically comprising the steps C.1-1 to C.1-2:
C.1-1, calculating the essential matrix E using the eight-point method, comprising the following steps:
C.1-1-1, extracting feature points of P_{t−τ−1} and P_{t−τ} using the SURF algorithm and selecting the 8 best-matched pairs of feature points (a_l, a′_l), l = 1, 2, …, 8; where a_l and a′_l are the normalized-plane coordinates of the l-th pair of matched feature points in video frames P_{t−τ−1} and P_{t−τ}, respectively, with a_l = [x_l, y_l, 1]^T and a′_l = [x′_l, y′_l, 1]^T; a_l and a′_l are each 3 × 1 matrices, where T denotes the matrix transpose;
C.1-1-2, stacking the 8 pairs of matched feature points into 3 × 8 matrices A and A′ and imposing the constraint

A^T E A′ = 0

solving this system of equations yields the essential matrix E, which is a 3 × 3 matrix;
C.1-2, performing singular value decomposition on E to obtain the camera rotation matrix R_{t−τ} and translation vector V_{t−τ}, where R_{t−τ} is a 3 × 3 matrix and V_{t−τ} is a 3-dimensional column vector;
finally obtaining the rotation matrix sequence RS = {R_{t−(T−1)}, …, R_{t−τ}, …, R_{t−1}, R_{t−0}} and the translation vector sequence VS = {V_{t−(T−1)}, …, V_{t−τ}, …, V_{t−1}, V_{t−0}} of the T video frames before time t;
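Steps C.1-1 and C.1-2 can be approximated with OpenCV's two-view geometry routines. The sketch below uses SIFT instead of SURF (SURF is not shipped in all OpenCV builds) and `cv2.findEssentialMat`/`cv2.recoverPose` in place of a hand-rolled eight-point solve and SVD, so it is an analogous implementation rather than the literal claimed procedure; `K` is the camera intrinsic matrix.

```python
import cv2
import numpy as np

def relative_pose(img_prev, img_curr, K):
    """Match feature points between adjacent frames, estimate the essential matrix E,
    and decompose it into a rotation matrix R and translation vector V."""
    detector = cv2.SIFT_create()
    kp1, des1 = detector.detectAndCompute(img_prev, None)
    kp2, des2 = detector.detectAndCompute(img_curr, None)
    matches = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(des1, des2)
    matches = sorted(matches, key=lambda m: m.distance)[:50]        # keep best matches
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    E, _ = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)   # essential matrix
    _, R, V, _ = cv2.recoverPose(E, pts1, pts2, K)                  # decomposition of E
    return R, V                                                     # 3x3 rotation, 3x1 translation
```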
C.2, for the camera rotation matrices and translation vectors in the RS and VS obtained in C.1, calculating the cumulative value of each R_{t−τ} and V_{t−τ} with that of the previous time, the cumulative values being denoted R′_{t−τ} and V′_{t−τ};
C.3, taking the R′_{t−0} and V′_{t−0} finally obtained in C.2 as the rotation matrix and translation vector of the camera at the next time, as given by the following formulas:

R_{t+1} = R′_{t−0}
V_{t+1} = V′_{t−0}
C.4, appending the R_{t+1} and V_{t+1} obtained in C.3 to the end of the rotation matrix sequence RS and the translation vector sequence VS obtained in C.1, respectively, and continuing to execute C.2 and C.3 until all rotation matrices {R_{t+1}, R_{t+2}, …, R_{t+δ}, …, R_{t+△}} and all translation vectors {V_{t+1}, V_{t+2}, …, V_{t+δ}, …, V_{t+△}} of the △ video frames after time t are obtained, with 1 ≤ δ ≤ △;
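The loop structure of steps C.2-C.4 is sketched below. Because the accumulation formula of step C.2 appears only as an image in the original claim, it is left here as a user-supplied callable `accumulate(RS, VS)`; the function only illustrates how the extrapolated pose is appended and the process repeated for △ future frames.

```python
def extrapolate_ego_motion(RS, VS, accumulate, Delta):
    """Repeatedly accumulate the pose sequences (C.2), take the last cumulative value
    as the pose of the next future frame (C.3), append it and iterate (C.4) until
    Delta future rotations and translations are available."""
    RS, VS = list(RS), list(VS)
    future_R, future_V = [], []
    for _ in range(Delta):
        R_next, V_next = accumulate(RS, VS)   # cumulative value at tau = 0 (formula not reproduced)
        RS.append(R_next)                     # append and repeat
        VS.append(V_next)
        future_R.append(R_next)
        future_V.append(V_next)
    return future_R, future_V                 # {R_{t+1..t+Delta}}, {V_{t+1..t+Delta}}
```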
C.5, calculating the motion vector of the own vehicle at each of the △ times after the current time t to form the own-vehicle motion prediction sequence M = {M_{t+1}, M_{t+2}, …, M_{t+δ}, …, M_{t+△}}; this specifically comprises steps C.5-1 to C.5-2:
C.5-1, extracting from the rotation matrix R_{t+δ} the rotation angle information of the camera about the x, y and z axes and representing it as a 3-dimensional row vector ψ_{t+δ}, where r_{jk} denotes the value in the j-th row and k-th column of R_{t+δ}, j, k ∈ {1, 2, 3}; atan2() and atan() both denote arctangent functions, but the range of atan() is (−π/2, π/2) while the range of atan2() is (−π, π];
C.5-2, concatenating the vector ψ_{t+δ} with the translation vector V_{t+δ}^T, converted into a three-dimensional row vector, to form the 6-dimensional row vector M_{t+δ}: M_{t+δ} = [ψ_{t+δ}, V_{t+δ}^T];
finally obtaining the own-vehicle motion prediction sequence M = {M_{t+1}, M_{t+2}, …, M_{t+δ}, …, M_{t+△}};
C.6, passing M through a fully connected layer FC_4 to transform the dimension of all its motion vectors.
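Step C.5 maps each future rotation matrix to angles about the x, y and z axes via atan2 and concatenates them with the translation. The sketch below uses a standard ZYX extraction; since the patent's exact angle formula is shown only as an image, this particular convention is an assumption.

```python
import numpy as np

def motion_vector(R, V):
    """Extract x/y/z rotation angles from rotation matrix R with atan2 (standard ZYX
    convention, assumed) and concatenate them with translation V into a 6-D row vector M."""
    theta_x = np.arctan2(R[2, 1], R[2, 2])                       # rotation about x
    theta_y = np.arctan2(-R[2, 0], np.hypot(R[2, 1], R[2, 2]))   # rotation about y
    theta_z = np.arctan2(R[1, 0], R[0, 0])                       # rotation about z
    psi = np.array([theta_x, theta_y, theta_z])                  # 3-D row vector psi_{t+delta}
    return np.concatenate([psi, np.asarray(V).ravel()])          # M_{t+delta} = [psi, V^T]
```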
5. The preceding vehicle position prediction method according to claim 1, characterized in that the preceding vehicle bounding box encoder comprises an encoding gated recurrent neural network GRU_b and a first fully connected layer FC_1; the input of GRU_b is the bounding box B_{t−τ} at each time in the preceding vehicle bounding box sequence B together with the hidden state vector passed down by GRU_b at the previous time, and its output is the encoding result of the preceding vehicle bounding box at the current time; FC_1 performs dimension conversion on the final output of GRU_b to obtain the time-series feature vector of the preceding vehicle at the current time t.
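A minimal PyTorch sketch of the claim 5 encoder (GRU_b followed by FC_1); the hidden and output dimensions are illustrative assumptions.

```python
import torch.nn as nn

class BoxEncoder(nn.Module):
    """GRU_b consumes the bounding box at each past time step; FC_1 converts the final
    hidden state into the time-series feature vector of the preceding vehicle."""
    def __init__(self, box_dim=4, hidden_dim=128, feat_dim=256):
        super().__init__()
        self.gru_b = nn.GRU(box_dim, hidden_dim, batch_first=True)
        self.fc_1 = nn.Linear(hidden_dim, feat_dim)

    def forward(self, boxes):            # boxes: (batch, T, 4)
        _, h_last = self.gru_b(boxes)    # final hidden state, (1, batch, hidden_dim)
        return self.fc_1(h_last[-1])     # time-series feature vector, (batch, feat_dim)
```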
6. The preceding vehicle position prediction method according to claim 1, characterized in that the preceding vehicle optical flow encoder comprises a CNN-based motion feature extraction network FEN and a second fully connected layer FC_2; the input of the FEN is the optical flow sequence F within the preceding vehicle bounding box, and its output is the optical flow encoding result within the preceding vehicle bounding box at the current time; the FEN is based on the ResNet50 framework and comprises a convolutional layer conv1, a Relu layer, a max-pooling layer maxPool and 4 residual learning blocks connected in sequence; the number of input channels of conv1 is 2m, where m is the number of optical flow maps sampled from the optical flow sequence F, i.e., m optical flow maps are uniformly sampled from F; the 4 residual learning blocks each adopt a three-layer structure, i.e., each residual learning block consists of convolutional layers and Relu layers connected in series;
the vertical and horizontal components of the m optical flow maps uniformly sampled from the optical flow sequence F within the preceding vehicle bounding box form 2m optical flow components, which are input into the FEN; the output of the FEN is the motion feature of the in-box optical flow of the preceding vehicle at the current time.
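A minimal PyTorch sketch of the claim 6 optical flow encoder: a ResNet50 backbone whose first convolution accepts 2m channels, followed by FC_2. Reusing the full torchvision ResNet50 (rather than re-implementing the conv1/Relu/maxPool/residual-block stack) and the chosen m and output size are assumptions for illustration.

```python
import torch.nn as nn
from torchvision.models import resnet50

class FlowEncoder(nn.Module):
    """FEN: ResNet50-based feature extractor with a 2m-channel first convolution,
    followed by a fully connected layer FC_2."""
    def __init__(self, m=8, feat_dim=256):
        super().__init__()
        backbone = resnet50(weights=None)
        backbone.conv1 = nn.Conv2d(2 * m, 64, kernel_size=7, stride=2,
                                   padding=3, bias=False)   # 2m-channel input (u and v of m flow maps)
        backbone.fc = nn.Identity()                          # keep the 2048-d pooled feature
        self.fen = backbone
        self.fc_2 = nn.Linear(2048, feat_dim)

    def forward(self, flows):            # flows: (batch, 2m, H, W)
        return self.fc_2(self.fen(flows))
```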
7. The preceding vehicle position prediction method according to claim 1, characterized in that the preceding vehicle position prediction decoder comprises a decoding gated recurrent neural network GRU_d and a third fully connected layer FC_3; the input of GRU_d is the fusion vector Mh_{t+δ} of the predicted own-vehicle motion information M_{t+δ} at time t+δ and the hidden state vector passed down by GRU_d at the previous time, with 1 ≤ δ ≤ △, and its output is the decoding result of the preceding vehicle bounding box at time t+δ; FC_3 performs dimension conversion on this decoding result to obtain the preceding vehicle bounding box at time t+δ.
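A minimal PyTorch sketch of the claim 7 decoder: at each future step the ego-motion prediction is fused with the previous hidden state and fed to a GRU cell, and FC_3 maps the output to a bounding box. Fusing by concatenation and the stated dimensions are assumptions.

```python
import torch
import torch.nn as nn

class BoxDecoder(nn.Module):
    """GRU_d decodes future bounding boxes step by step from the ego-motion predictions
    M_{t+delta} fused with its own hidden state; FC_3 converts each output to a box."""
    def __init__(self, motion_dim=6, hidden_dim=512, box_dim=4):
        super().__init__()
        self.gru_d = nn.GRUCell(motion_dim + hidden_dim, hidden_dim)
        self.fc_3 = nn.Linear(hidden_dim, box_dim)

    def forward(self, motions, h0):                  # motions: (batch, Delta, 6); h0: (batch, hidden)
        h, boxes = h0, []
        for delta in range(motions.size(1)):
            fused = torch.cat([motions[:, delta], h], dim=1)   # fusion vector Mh_{t+delta}
            h = self.gru_d(fused, h)
            boxes.append(self.fc_3(h))               # predicted box at time t + delta + 1
        return torch.stack(boxes, dim=1)             # (batch, Delta, 4)
```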
8. A preceding vehicle position prediction system based on vehicle-mounted video, characterized by comprising:
a vehicle position prediction model based on an encoder-decoder framework, used for predicting the bounding boxes of the preceding vehicle at times t+1, t+2, …, t+△ after the current time t from the bounding boxes of the preceding vehicle at times t−0, t−1, …, t−(T−1) before the current time t, the optical flow within those bounding boxes, and the motion information of the own vehicle at times t+1, t+2, …, t+△ after the current time t;
the vehicle position prediction model comprises: a preceding vehicle bounding box encoder, a preceding vehicle optical flow encoder, a feature fusion unit and a preceding vehicle position prediction decoder;
the preceding vehicle bounding box encoder is used for encoding the preceding vehicle bounding box sequence B to obtain the time-series feature vector of the preceding vehicle;
the preceding vehicle optical flow encoder is used for encoding the optical flow sequence F within the preceding vehicle bounding box to obtain the motion feature vector of the preceding vehicle;
the feature fusion unit connects the time-series feature vector and the motion feature vector of the preceding vehicle into the fused feature vector of the preceding vehicle;
the preceding vehicle position prediction decoder decodes the fused feature vector according to the own-vehicle motion prediction sequence M to obtain the predicted bounding boxes of the preceding vehicle in the video frames at the △ times after the current time t;
a vehicle bounding box acquisition module, used for acquiring the bounding box sequence B of the preceding vehicle in the vehicle-mounted video;
a vehicle bounding box optical flow acquisition module, used for acquiring the optical flow sequence F within the preceding vehicle bounding box in the vehicle-mounted video;
and an own-vehicle motion information prediction module, used for predicting the motion information of the own vehicle at future times to form the own-vehicle motion prediction sequence M.
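Putting the modules of claim 8 together, a minimal sketch of the overall encoder-decoder assembly might look as follows, reusing the BoxEncoder, FlowEncoder and BoxDecoder sketches above; the linear projection of the fused feature to the decoder's initial hidden state is an illustrative assumption.

```python
import torch
import torch.nn as nn

class VehiclePositionPredictor(nn.Module):
    """Encoders produce the temporal and motion feature vectors, the fusion unit
    concatenates them, and the decoder uses the ego-motion prediction sequence M
    to emit the future bounding boxes of the preceding vehicle."""
    def __init__(self, box_enc, flow_enc, decoder, fused_dim=512, hidden_dim=512):
        super().__init__()
        self.box_enc, self.flow_enc, self.decoder = box_enc, flow_enc, decoder
        self.to_hidden = nn.Linear(fused_dim, hidden_dim)   # project fused feature to decoder state

    def forward(self, boxes, flows, ego_motion):
        q_t = self.box_enc(boxes)                  # time-series feature vector
        o_t = self.flow_enc(flows)                 # motion feature vector
        fused = torch.cat([q_t, o_t], dim=1)       # fused feature vector of the preceding vehicle
        return self.decoder(ego_motion, self.to_hidden(fused))
```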
9. The preceding vehicle position prediction system according to claim 8, characterized in that the preceding vehicle bounding box encoder comprises an encoding gated recurrent neural network GRU_b and a first fully connected layer FC_1; the input of GRU_b is the bounding box B_{t−τ} at each time in the preceding vehicle bounding box sequence B together with the hidden state vector passed down by GRU_b at the previous time, and its output is the encoding result of the preceding vehicle bounding box at the current time; FC_1 performs dimension conversion on the final output of GRU_b to obtain the time-series feature vector of the preceding vehicle at the current time t.
10. The preceding vehicle position prediction system according to claim 8, characterized in that the preceding vehicle optical flow encoder comprises a CNN-based motion feature extraction network FEN and a second fully connected layer FC_2; the input of the FEN is the optical flow sequence F within the preceding vehicle bounding box, and its output is the optical flow encoding result within the preceding vehicle bounding box at the current time; the FEN is based on the ResNet50 framework and comprises a convolutional layer conv1, a Relu layer, a max-pooling layer maxPool and 4 residual learning blocks connected in sequence; the number of input channels of conv1 is 2m, where m is the number of optical flow maps sampled from the optical flow sequence F, i.e., m optical flow maps are uniformly sampled from F; the 4 residual learning blocks each adopt a three-layer structure, i.e., each residual learning block consists of convolutional layers and Relu layers connected in series;
the vertical and horizontal components of the m optical flow maps uniformly sampled from the optical flow sequence F within the preceding vehicle bounding box form 2m optical flow components, which are input into the FEN; the output of the FEN is the motion feature of the in-box optical flow of the preceding vehicle at the current time.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110051940.3A CN112800879B (en) | 2021-01-15 | 2021-01-15 | Vehicle-mounted video-based front vehicle position prediction method and prediction system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112800879A CN112800879A (en) | 2021-05-14 |
CN112800879B true CN112800879B (en) | 2022-08-26 |
Family
ID=75811025
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110051940.3A Active CN112800879B (en) | 2021-01-15 | 2021-01-15 | Vehicle-mounted video-based front vehicle position prediction method and prediction system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112800879B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113610900B (en) * | 2021-10-11 | 2022-02-15 | 深圳佑驾创新科技有限公司 | Method and device for predicting scale change of vehicle tail sequence and computer equipment |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108846854A (en) * | 2018-05-07 | 2018-11-20 | 中国科学院声学研究所 | A kind of wireless vehicle tracking based on motion prediction and multiple features fusion |
CN111914664A (en) * | 2020-07-06 | 2020-11-10 | 同济大学 | Vehicle multi-target detection and track tracking method based on re-identification |
CN111931905A (en) * | 2020-07-13 | 2020-11-13 | 江苏大学 | Graph convolution neural network model and vehicle track prediction method using same |
Non-Patent Citations (1)
Title |
---|
Vehicle behavior detection method based on a hybrid CNN and LSTM model; Wang Shuo et al.; Intelligent Computer and Applications; 2020-02-01 (No. 02); full text *
Also Published As
Publication number | Publication date |
---|---|
CN112800879A (en) | 2021-05-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Yang et al. | Top-view trajectories: A pedestrian dataset of vehicle-crowd interaction from controlled experiments and crowded campus | |
CN109740419A (en) | A kind of video behavior recognition methods based on Attention-LSTM network | |
Piccoli et al. | Fussi-net: Fusion of spatio-temporal skeletons for intention prediction network | |
Bai et al. | Deep learning based motion planning for autonomous vehicle using spatiotemporal LSTM network | |
CN109910909A (en) | A kind of interactive prediction technique of vehicle track net connection of more vehicle motion states | |
CN110599521B (en) | Method for generating trajectory prediction model of vulnerable road user and prediction method | |
CN104506800A (en) | Scene synthesis and comprehensive monitoring method and device for electronic police cameras in multiple directions | |
CN108267123A (en) | A kind of double-current vehicle-mounted pedestrian vehicle Forecasting Methodology based on bounding box and range prediction | |
CN113592905B (en) | Vehicle driving track prediction method based on monocular camera | |
CN111292366A (en) | Visual driving ranging algorithm based on deep learning and edge calculation | |
CN114820708A (en) | Peripheral multi-target trajectory prediction method based on monocular visual motion estimation, model training method and device | |
CN112800879B (en) | Vehicle-mounted video-based front vehicle position prediction method and prediction system | |
CN117274749A (en) | Fused 3D target detection method based on 4D millimeter wave radar and image | |
CN113435356B (en) | Track prediction method for overcoming observation noise and perception uncertainty | |
CN114299473A (en) | Driver behavior identification method based on multi-source information fusion | |
CN117058474B (en) | Depth estimation method and system based on multi-sensor fusion | |
CN114620059B (en) | Automatic driving method, system thereof and computer readable storage medium | |
CN117516581A (en) | End-to-end automatic driving track planning system, method and training method integrating BEVFomer and neighborhood attention transducer | |
CN112733734A (en) | Traffic abnormal event detection method based on combination of Riemann manifold characteristics and LSTM network | |
Lee et al. | Low computational vehicle lane changing prediction using drone traffic dataset | |
Wang et al. | An end-to-end auto-driving method based on 3D LiDAR | |
CN115512323A (en) | Vehicle track prediction method in automatic driving field of vision based on deep learning | |
Wang et al. | LSTM-based prediction method of surrounding vehicle trajectory | |
Liu et al. | End-to-end control of autonomous vehicles based on deep learning with visual attention | |
CN111242044A (en) | Night unmanned vehicle scene prediction method based on ConvLSTM dual-channel coding network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||