Technical Field
With the continuous development of the economy and technology and the steady rise in technological level, people's quality of life improves day by day. With the development of machine vision technology in recent years, machine vision has been widely applied in intelligent transportation systems, for example to monitor traffic events and traffic flow, detect road surface defects, and automatically navigate intelligent vehicles. As an important component of intelligent transportation systems, the intelligent safety vehicle is a current research hotspot: an intelligent vehicle uses intelligent algorithms to understand its immediate surroundings and thereby safeguards safe driving.
Vehicle tracking is a newer technical field that follows on from vehicle detection; the two are closely connected yet distinct. According to the positional relationship between the camera and the target, tracking methods can be divided into target tracking under a static background and target tracking under a dynamic background.
⑴ Method for tracking targets in a static background:
Target tracking in a static background means that the camera is fixed in a certain orientation and the field of view it observes is static. Background subtraction and Gaussian background modeling are commonly employed. The background subtraction method first takes the difference between the current (foreground) image and the background image, which yields the target objects that have entered the field of view. To describe the target, its size is usually expressed by the number of pixels in its connected region, by the aspect ratio of the target region, and so on; the target's position can be located by projection. Gaussian background modeling models the background: generally, K Gaussian models (typically 3 to 5) represent the features of each pixel in the image, and the Gaussian mixture model is updated after each new frame is obtained. An image (called the foreground image) is then read from the video stream, and each pixel of the current image is matched against the Gaussian mixture model; if the match succeeds, the pixel is judged to be a background point, otherwise it is judged to be a foreground point.
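For illustration only (OpenCV's mixture-of-Gaussians background model is an assumption here, not part of the described methods), a minimal sketch of Gaussian background modeling as just described might look like this:

```cpp
#include <opencv2/opencv.hpp>

int main() {
    cv::VideoCapture cap("traffic.avi");  // hypothetical input video path
    // Mixture-of-Gaussians background model: each pixel is described by up to
    // K Gaussians, matching the K (typically 3 to 5) models described above.
    cv::Ptr<cv::BackgroundSubtractorMOG2> mog2 =
        cv::createBackgroundSubtractorMOG2(/*history=*/500,
                                           /*varThreshold=*/16.0,
                                           /*detectShadows=*/false);
    cv::Mat frame, foreground;
    while (cap.read(frame)) {
        // A pixel matching one of the background Gaussians becomes 0 (background);
        // an unmatched pixel becomes 255 (foreground), i.e. a moving target.
        mog2->apply(frame, foreground);
        cv::imshow("foreground", foreground);
        if (cv::waitKey(30) == 27) break;  // stop on ESC
    }
    return 0;
}
```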
⑵ Method for tracking targets in a dynamic background:
Target tracking in a dynamic background means that the camera is not fixed in one position and its observed field of view is not static, i.e., the background also moves relative to the camera. For this situation, a detection-based tracking method is usually adopted: the target is detected in each single frame, achieving the effect of continuous tracking.
Common tracking algorithms include Kalman filtering, particle filtering, TLD, CT, and MOSSE:
⑴ The Kalman filter is an algorithm that uses the state equation of a linear system to perform optimal estimation of the system state from input and output observation data.
⑵ The idea of the particle filter (PF) is based on Monte Carlo methods: particle sets are used to represent probabilities, and the approach can be applied to any form of state-space model. Its core idea is to express the distribution by random state particles drawn from the posterior probability; it is a sequential importance sampling (SIS) method.
⑶ TLD (Tracking-Learning-Detection) is designed to cope with the shape changes, illumination changes, scale changes, occlusion, and similar difficulties of the tracked target during long-term tracking; its detector localizes the detected features (which represent the target object) and continuously corrects the tracker as needed.
⑷ CT (Compressive Tracking) uses, according to compressed sensing theory, a very sparse measurement matrix satisfying the RIP condition to project the feature space of the original image into a low-dimensional compressed subspace, which retains the information of the high-dimensional image feature space well.
⑸ MOSSE (Minimum Output Sum of Squared Error filter) is a good real-time tracking method that uses correlation filtering to minimize the output sum of squared error, and it has a certain robustness to illumination, scale, pose, and non-rigid deformation.
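For reference, in the standard MOSSE formulation (Bolme et al., summarized here as background rather than quoted from this document), the filter $H$ minimizes the sum of squared errors between the actual and desired correlation outputs, which has a closed-form solution in the frequency domain:

$\min_{H^{*}} \sum_i \left| F_i \odot H^{*} - G_i \right|^{2} \;\Rightarrow\; H^{*} = \dfrac{\sum_i G_i \odot F_i^{*}}{\sum_i F_i \odot F_i^{*}}$

where $F_i$ and $G_i$ are the Fourier transforms of the training patches and of their desired Gaussian responses, $\odot$ denotes element-wise multiplication, and $*$ the complex conjugate. The filters used in the steps below are of this form.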
Disclosure of Invention
The invention aims to provide a rapid and accurate vehicle tracking method.
The technical scheme adopted by the invention to solve the above technical problem is a vehicle multi-scale tracking method based on a fast feature pyramid. Given that a target was detected in the previous frame, with $(x_0, y_0)$ the center point of the target region detected in the previous frame, the target is tracked in the current frame by the following specific method:
step 1, calculating the aggregate channel features (ACF) of the current frame image at the original scale to obtain the feature map at the original scale:
1-1, converting the current frame image into the LUV color space to obtain the L, U, V three-channel features;
1-2, solving the gradient map of the LUV image to obtain the HOG (histogram of oriented gradients) features;
1-3, concatenating the L, U, V three-channel features and the HOG channel features in each orientation to obtain the aggregate channel features of the current frame at the original scale;
step 2, calculating the feature pyramid:
according to the various scales of the feature pyramid, resampling the original-scale feature map according to the energy statistics relationship among the channels of the aggregate channel features to obtain the feature map at each corresponding scale; the ratios of the scale of each resampled feature map to the scale of the original-scale feature map are $s_1, s_2, \ldots, s_n$, where $n$ is the total number of distinct scales; the feature pyramid sampling includes both upsampling and downsampling;
step 3, calculating the tracking results at the different scales:
3-1, the width and height of the feature channels of the original-scale feature map are $W_0$ and $H_0$ respectively; let the center point of the target region of the original-scale feature map be $(x_0, y_0)$; the center positions of the corresponding target regions after each feature pyramid sampling are $(x_0 s_1, y_0 s_1), (x_0 s_2, y_0 s_2), \ldots, (x_0 s_n, y_0 s_n)$;
3-2 calculate convolution filter:
for the case where the previous frame was the first time the target was detected:
under each feature pyramid sampling, a Gaussian distribution whose peak lies at the center position of the target region is generated as the desired output response $G_1, G_2, \ldots, G_n$ of the target region, and the filter coefficients of the convolution filter are initialized as

$H_i^{*} = \dfrac{G_i \odot F_i^{*}}{F_i \odot F_i^{*}}, \quad i = 1, 2, \ldots, n$

where $F_i$ is the Fourier transform of the target region, $G_i$ is the Gaussian distribution centered on the target region, $\odot$ denotes element-wise multiplication, and the superscript $*$ denotes the complex conjugate;
for the case where the previous frame was not the first time the target was detected:
the convolution filter is updated by adopting the following strategy:
$H_i^{t} = \eta\,\dfrac{G_i \odot F_i^{*}}{F_i \odot F_i^{*}} + (1 - \eta)\,H_i^{t-1}$

where $\eta$ is the weight coefficient and $H_i^{t-1}$ is the filter coefficient of the convolution filter of the previous frame;
3-3, inputting the resampled feature maps of all scales into the convolution filters, whose output responses are $g_1, g_2, \ldots, g_n$ respectively, and obtaining the tracking value of each scale separately as

$\mathrm{PSR}_i = \dfrac{g_{\max} - \mu_s}{\sigma_s}, \quad i = 1, 2, \ldots, n$

where $g_{\max}$ is the peak of the output response at the current scale, and $\mu_s$ and $\sigma_s$ are respectively the mean and standard deviation of the output response at the current scale;
3-4, taking the maximum of the tracking values $\mathrm{PSR}_1, \mathrm{PSR}_2, \ldots, \mathrm{PSR}_n$ as the tracking result PSR; when the PSR is larger than the set threshold, tracking is considered successful, tracking continues on the next frame, and the tracking result is mapped back to the original scale according to the corresponding scale ratio to determine the target region; when the PSR is smaller than the set threshold, tracking is considered to have failed and the current tracking target is cancelled.
The invention replaces the grayscale feature in the MOSSE algorithm with the more discriminative ACF feature, and uses a fast feature pyramid to approximate the features at adjacent scales, so that tracking results at different scales can be computed quickly. For the ACF features, the captured RGB vehicle image is first converted to Luv space to obtain the L, U, V channels; a gradient image is then computed on this basis to obtain the gradient channels; concatenating these channel features forms the ACF feature. The fast pyramid approximates the scaled version of the original feature image through statistics. Unlike traditional image scaling, which must compute the scaled result with means such as nearest-neighbor or linear interpolation at a far higher computational cost, the fast pyramid computes the image at the scaled scale directly from the original-scale image according to the energy statistics relationship among the channel scalings.
The method has the advantage that the vehicle tracking algorithm based on aggregate channel features exploits not only the global information of multiple channels but also the local information of the vehicle in each channel, improving the robustness of vehicle tracking; moreover, the fast feature pyramid enables multi-scale tracking while guaranteeing real-time performance.
Detailed description of the preferred embodiments
For convenience in describing the present disclosure, some terms will be described first.
Luv channel features. The LUV color space, in full the CIE 1976 (L*, u*, v*) color space (also called CIELUV), where L denotes luminance and u and v denote chromaticity, was adopted by the International Commission on Illumination (CIE) in 1976 for its perceptual uniformity; it is obtained by a simple transformation of the CIE XYZ space. A similar color space is CIELAB. For typical images, u and v range from -100 to +100, and the luminance L from 0 to 100.
Gradient channel feature. The gradient channel feature is the gradient map of an image and reflects the edge information of the target. Various operators can compute the gradient, such as the Prewitt and Sobel operators; however, the simple operator [-1 0 1] performs better. The gradient is used to describe the edges of the vehicle image. Since the Luv and RGB channels differ only by a linear change, for convenience the gradient map of the image can be computed on the Luv channels once they are obtained.
Histogram of gradients. For the histogram of oriented gradients, HOG, the image is first divided into small connected regions called cell units. The gradient or edge-orientation histogram of the pixels within each cell unit is then accumulated, and finally these histograms are combined to form the feature descriptor. To improve performance, the local histograms can also be contrast-normalized over a larger region of the image, called a block, by computing the density of the histograms within the block and normalizing each cell unit in the block according to this density. This normalization gives better invariance to illumination changes and shadows.
The method of the invention is implemented on a C++ platform; the specific steps are as follows:
step 1, calculating the aggregate channel features (ACF) of the current frame image at the original scale to obtain the feature map at the original scale:
1-1, converting the current frame image into the LUV color space to obtain the L, U, V three-channel features;
The image captured by the camera is generally an RGB image, which is not conducive to color clustering. To describe the grayscale and chromaticity information of the vehicle well, the RGB image must be converted to an LUV image. The specific method is as follows:
First, the RGB image is converted to CIE XYZ:

$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = \begin{bmatrix} 0.412453 & 0.357580 & 0.180423 \\ 0.212671 & 0.715160 & 0.072169 \\ 0.019334 & 0.119193 & 0.950227 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix}$ (1)

Then CIE XYZ is converted to Luv:

$L = \begin{cases} 116\,(Y/Y_n)^{1/3} - 16, & Y/Y_n > (6/29)^3 \\ (29/3)^3\,Y/Y_n, & Y/Y_n \le (6/29)^3 \end{cases}$ (2)

$u = 13L\,(u' - u_n')$ (3)

$v = 13L\,(v' - v_n')$ (4)

where $u' = \dfrac{4X}{X + 15Y + 3Z}$, $v' = \dfrac{9Y}{X + 15Y + 3Z}$, $Y_n$ is the luminance of the reference white point, and $(u_n', v_n')$ are its chromaticity coordinates.
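A minimal sketch of this conversion; using OpenCV's built-in converter is an assumption for illustration (any implementation of equations (1)-(4) would serve). Note that OpenCV stores camera frames in BGR channel order:

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Convert a camera frame to the Luv color space and split it into the
// L, u, v channel images used as the first three ACF channels.
void rgbToLuvChannels(const cv::Mat& bgrFrame, std::vector<cv::Mat>& luv) {
    cv::Mat luvImage;
    // cvtColor applies the RGB -> CIE XYZ -> Luv transform of eqs. (1)-(4).
    cv::cvtColor(bgrFrame, luvImage, cv::COLOR_BGR2Luv);
    cv::split(luvImage, luv);  // luv[0] = L, luv[1] = u, luv[2] = v
}
```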
1-2, solving the gradient map of the LUV image to obtain the HOG (histogram of oriented gradients) features;
There are many ways to compute the gradient, for example the Prewitt operators

$\begin{bmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{bmatrix}$ and $\begin{bmatrix} -1 & -1 & -1 \\ 0 & 0 & 0 \\ 1 & 1 & 1 \end{bmatrix}$

and the Sobel operators

$\begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}$ and $\begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}$.

However, filtering with the simplest operators $[-1\ 0\ 1]$ and $[-1\ 0\ 1]^{T}$, as adopted here, gives better results.
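A sketch of this gradient computation, again assuming OpenCV; the [-1 0 1] operator and its transpose are applied by linear filtering:

```cpp
#include <opencv2/opencv.hpp>

// Gradient magnitude and orientation of one channel using the simple
// [-1 0 1] operator (horizontal) and its transpose (vertical).
void computeGradient(const cv::Mat& channel, cv::Mat& magnitude, cv::Mat& angle) {
    cv::Mat dx, dy;
    cv::Mat kx = (cv::Mat_<float>(1, 3) << -1.f, 0.f, 1.f);  // [-1 0 1]
    cv::Mat ky = kx.t();                                     // its transpose
    cv::filter2D(channel, dx, CV_32F, kx);
    cv::filter2D(channel, dy, CV_32F, ky);
    // Per-pixel magnitude and orientation; angle returned in degrees (0-360).
    cv::cartToPolar(dx, dy, magnitude, angle, /*angleInDegrees=*/true);
}
```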
1-3 sampling and normalization
Since the gradient histogram assigns 4 × 4 cells to 6 orientations, the side length of the gradient histogram is 1/4 that of the original image. To keep the dimensions of all channels consistent, the Luv channel images and the gradient image must therefore be downsampled; this sampling does not affect the detection result. Bilinear interpolation is used in the sampling process to obtain a better result.
To suppress the influence of noise in the gradient computation, a normalization operation must be applied to the gradient map. The normalization operations are L1-norm, L2-norm, and L1-sqrt:
L1-norm: $v \rightarrow v / (\|v\|_1 + \varepsilon)$ (5)

L2-norm: $v \rightarrow v / \sqrt{\|v\|_2^2 + \varepsilon^2}$ (6)

L1-sqrt: $v \rightarrow \sqrt{v / (\|v\|_1 + \varepsilon)}$ (7)
where $\varepsilon$ is a very small number (e.g., 0.01), $v$ is the gradient vector, $\|\cdot\|_1$ denotes the one-norm, and $\|\cdot\|_2$ denotes the two-norm. In this example, the L2-norm is used.
After the gradient map is obtained, the orientation of each pixel within a 4 × 4 cell casts a vote as a gradient element in the orientation histogram, forming the histogram of oriented gradients. The orientation bins are evenly divided over 0-180 degrees or 0-360 degrees, and to reduce aliasing the gradient votes are bilinearly interpolated, in both orientation and position, between the centers of neighboring bins. The weight of a vote is computed from the gradient magnitude and can be taken as the magnitude itself, its square, or its square root; practice has shown that using the gradient magnitude itself as the voting weight works best.
Because of local illumination changes and changes in foreground-background contrast, the gradient intensity varies over a very large range, so local contrast normalization of the gradient is needed. Specifically, the cell units are grouped into larger spatial blocks, and contrast normalization is performed on each block, using the same normalization as in equations (5)-(7). The final descriptor is the vector formed by the histograms of the cell units of all blocks in the detection window. In fact the blocks overlap, i.e., the histogram of each cell unit is used several times in the final descriptor computation. This approach appears redundant but improves performance significantly.
1-4, concatenating the L, U, V three-channel features, the gradient magnitude channel, and the HOG channel features in each orientation yields the aggregate channel features (ACF) of the current frame at the original scale. If the gradient histogram has six orientations, a total of 10 channels are obtained; these 10 channels are the aggregate channel features. A sketch of how they could be assembled follows below.
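This sketch makes the same OpenCV assumption; rgbToLuvChannels and computeGradient are the hypothetical helpers from the sketches above. For brevity it votes each pixel into a single orientation bin with hard assignment, omitting the bilinear interpolation described in step 1-3:

```cpp
#include <opencv2/opencv.hpp>
#include <algorithm>
#include <cmath>
#include <vector>

// Build the 10 ACF channels: 3 Luv + 1 gradient magnitude + 6 orientations.
std::vector<cv::Mat> computeACF(const cv::Mat& bgrFrame) {
    std::vector<cv::Mat> luv;
    rgbToLuvChannels(bgrFrame, luv);
    cv::Mat gray, mag, angle;
    luv[0].convertTo(gray, CV_32F);     // gradient of the L channel, for brevity
    computeGradient(gray, mag, angle);

    // Six orientation channels over 0-180 degrees (30 degrees per bin);
    // each pixel contributes its gradient magnitude to one bin.
    const int bins = 6;
    std::vector<cv::Mat> hist(bins);
    for (int b = 0; b < bins; ++b) hist[b] = cv::Mat::zeros(mag.size(), CV_32F);
    for (int y = 0; y < mag.rows; ++y)
        for (int x = 0; x < mag.cols; ++x) {
            float a = std::fmod(angle.at<float>(y, x), 180.f);
            int b = std::min(bins - 1, static_cast<int>(a / (180.f / bins)));
            hist[b].at<float>(y, x) = mag.at<float>(y, x);
        }

    std::vector<cv::Mat> channels;
    for (cv::Mat& c : luv) { cv::Mat f; c.convertTo(f, CV_32F); channels.push_back(f); }
    channels.push_back(mag);
    for (cv::Mat& h : hist) channels.push_back(h);

    // Downsample every channel by 4 so all channels match the 4x4-cell grid.
    for (cv::Mat& c : channels)
        cv::resize(c, c, cv::Size(), 0.25, 0.25, cv::INTER_LINEAR);
    return channels;
}
```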
step 2, calculating the feature pyramid:
according to the various scales of the feature pyramid, resampling the original-scale feature map according to the energy statistics relationship among the channels of the aggregate channel features to obtain the feature map at each corresponding scale; the ratios of the scale of each resampled feature map to the scale of the original-scale feature map are $s_1, s_2, \ldots, s_n$, where $n$ is the total number of distinct scales; the feature pyramid sampling includes both upsampling and downsampling;
Upsampling

Let the original image be $I(x, y)$, and let the image obtained by upsampling with factor $k$ be $I_k(x, y) = I(x/k, y/k)$. By the definition of the gradient,

$\nabla I_k(x, y) = \frac{1}{k}\,\nabla I(x/k, y/k)$ (8)

i.e., the gradient at a point of the upsampled image is $1/k$ times that at the corresponding point of the original-scale image. Let $h_q'$ be the $q$-th bin of the gradient magnitude histogram of the upsampled image and $h_q$ that of the original image. The upsampled image contains $k^2$ times as many pixels, each contributing $1/k$ times the gradient magnitude, so the sums of gradients of the original and upsampled images are related through the upsampling factor $k$. In the same way, the gradient orientation is unchanged:

$O_k(x, y) \approx O(x/k, y/k)$ (9)

Thus, from the above definitions, the gradient histogram of the upsampled image satisfies $h_q' \approx k\,h_q$. (10)
Downsampling

Downsampling cannot be derived in the same way as upsampling, because the downsampled image loses high-frequency content and hence energy. Let $I_k$ be the image obtained from the original image $I$ by downsampling with factor $k$; the corresponding histogram relationship is then $h_q' \lesssim h_q / k$, with the loss depending on the image content. However, the ratio between the energy of a channel image and that of its downsampled version depends only on the relative scale and is independent of the particular original image. If $f(I, s_i)$ denotes the energy of the channel after downsampling by scale $s_i$, then

$E[f(I, s_1)/f(I, s_2)] = E[f(I, s_1)]/E[f(I, s_2)] = r(s_1 - s_2)$ (11)

holds. Consequently $E[f(I, s_1)/f(I, s_2)]$ must take the form

$E[f(I, s + s_0)/f(I, s_0)] = a e^{-\lambda s}$ (12)

where $a$ and $\lambda$ can be obtained by statistics over a large number of target images and natural images. Therefore the relationship between the downsampled image and the original image is $f(I, s) = a e^{-\lambda s} f(I, 0)$.
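A sketch of the fast-pyramid resampling implied by $f(I, s) = a e^{-\lambda s} f(I, 0)$: a channel at a new scale is approximated by one resize of the original-scale channel plus an energy correction. The values of $a$ and $\lambda$ below are hypothetical placeholders for the statistics described above:

```cpp
#include <opencv2/opencv.hpp>
#include <cmath>

// Approximate a feature channel at scale ratio r (r < 1: downsampling) without
// recomputing the features: resize once, then correct the channel energy with
// f(I, s) = a * exp(-lambda * s) * f(I, 0), taking s = -log2(r) in octaves.
cv::Mat scaleChannelFast(const cv::Mat& channel, double r,
                         double a = 1.0, double lambda = 0.1) {  // hypothetical fit
    cv::Mat scaled;
    cv::resize(channel, scaled, cv::Size(), r, r, cv::INTER_LINEAR);
    double s = -std::log2(r);               // scale offset in octaves
    scaled *= a * std::exp(-lambda * s);    // energy correction factor
    return scaled;
}
```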
step 3, multi-scale tracking:

The grayscale feature in the original MOSSE tracking algorithm is replaced with the ACF feature: the features of each channel are vectorized to obtain several feature vectors, these are combined into a matrix, and its Fourier transform is taken as $F_i$, which makes the tracking more robust. The tracking results are then computed at the different scales:
3-1, the width and height of the feature channels of the original-scale feature map are $W_0$ and $H_0$ respectively; let the center point of the target region of the original-scale feature map be $(x_0, y_0)$; the center positions of the corresponding target regions after each feature pyramid sampling are $(x_0 s_1, y_0 s_1), (x_0 s_2, y_0 s_2), \ldots, (x_0 s_n, y_0 s_n)$;
3-2 calculate convolution filter:
for the case where the previous frame was the first time the target was detected:
under each feature pyramid sampling, a Gaussian distribution whose peak lies at the center position of the target region is generated as the desired output response $G_1, G_2, \ldots, G_n$ of the target region, and the filter coefficients of the convolution filter are initialized as:

$H_i^{*} = \dfrac{G_i \odot F_i^{*}}{F_i \odot F_i^{*}}, \quad i = 1, 2, \ldots, n$

where $F_i$ is the Fourier transform of the target region, $G_i$ is the Gaussian distribution centered on the target region, $\odot$ denotes element-wise multiplication, and the superscript $*$ denotes the complex conjugate;
for the case where the previous frame was not the first time the target was detected:
the convolution filter is updated by adopting the following strategy:
$H_i^{t} = \eta\,\dfrac{G_i \odot F_i^{*}}{F_i \odot F_i^{*}} + (1 - \eta)\,H_i^{t-1}$

where $\eta$ is the weight coefficient and $H_i^{t-1}$ is the filter coefficient of the convolution filter of the previous frame;
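A frequency-domain sketch of step 3-2 (initialization and update), assuming OpenCV and single-channel CV_32F inputs; MOSSE's usual preprocessing (log transform, cosine window) is omitted for brevity:

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Initialize the filter H* = (G . conj(F)) / (F . conj(F)) for one scale.
// 'patch' is the target-region feature plane, 'gauss' the desired Gaussian
// response whose peak sits at the target center; both CV_32F, same size.
cv::Mat initFilter(const cv::Mat& patch, const cv::Mat& gauss) {
    cv::Mat F, G, num, den;
    cv::dft(patch, F, cv::DFT_COMPLEX_OUTPUT);
    cv::dft(gauss, G, cv::DFT_COMPLEX_OUTPUT);
    cv::mulSpectrums(G, F, num, 0, /*conjB=*/true);  // G . conj(F)
    cv::mulSpectrums(F, F, den, 0, /*conjB=*/true);  // F . conj(F), real-valued
    std::vector<cv::Mat> n, d;
    cv::split(num, n);
    cv::split(den, d);
    cv::Mat denom = d[0] + 1e-5f;                    // regularize to avoid /0
    std::vector<cv::Mat> h = { n[0] / denom, n[1] / denom };
    cv::Mat H;
    cv::merge(h, H);
    return H;                                        // complex filter H*
}

// Update with learning rate eta: H_t = eta * H_new + (1 - eta) * H_{t-1}.
void updateFilter(cv::Mat& H, const cv::Mat& patch, const cv::Mat& gauss, float eta) {
    cv::Mat Hnew = initFilter(patch, gauss);
    H = eta * Hnew + (1.0f - eta) * H;
}
```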
3-3, inputting the resampled feature maps of all scales into the convolution filters, whose output responses are $g_1, g_2, \ldots, g_n$ respectively, and obtaining the tracking value of each scale separately as

$\mathrm{PSR}_i = \dfrac{g_{\max} - \mu_s}{\sigma_s}, \quad i = 1, 2, \ldots, n$

where $g_{\max}$ is the peak of the output response at the current scale, and $\mu_s$ and $\sigma_s$ are respectively the mean and standard deviation of the output response at the current scale;
3-4, taking the maximum of the tracking values $\mathrm{PSR}_1, \mathrm{PSR}_2, \ldots, \mathrm{PSR}_n$ as the tracking result PSR; when the PSR is larger than the set threshold, tracking is considered successful, tracking continues on the next frame, and the tracking result is mapped back to the original scale according to the corresponding scale ratio to determine the target region; when the PSR is smaller than the set threshold, tracking is considered to have failed and the current tracking target is cancelled.
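Finally, a sketch of steps 3-3 and 3-4: computing the PSR of each scale's response and selecting the best scale. The patent does not specify the threshold; 7.0 is a value reported for MOSSE-style trackers and is used here purely as a placeholder:

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Tracking value of one correlation response (step 3-3):
// PSR = (g_max - mu_s) / sigma_s over the response map.
float psr(const cv::Mat& response) {
    double gmax;
    cv::minMaxLoc(response, nullptr, &gmax);
    cv::Scalar mean, stddev;
    cv::meanStdDev(response, mean, stddev);
    return static_cast<float>((gmax - mean[0]) / (stddev[0] + 1e-5));
}

// Step 3-4: keep the scale with the maximum PSR; report failure if even the
// best PSR falls below the threshold.
bool selectScale(const std::vector<cv::Mat>& responses, int& bestScale, float& bestPsr) {
    const float threshold = 7.0f;  // hypothetical; the patent leaves it unspecified
    bestPsr = -1.f;
    for (int i = 0; i < static_cast<int>(responses.size()); ++i) {
        float p = psr(responses[i]);
        if (p > bestPsr) { bestPsr = p; bestScale = i; }
    }
    return bestPsr > threshold;
}
```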