CN103413312A - Video target tracking method based on neighborhood components analysis and scale space theory - Google Patents

Video target tracking method based on neighborhood components analysis and scale space theory

Info

Publication number
CN103413312A
Authority
CN
China
Prior art keywords
target
sample
particle
new
matrix
Prior art date
Legal status
Granted
Application number
CN2013103619324A
Other languages
Chinese (zh)
Other versions
CN103413312B (en)
Inventor
贾静平
夏宏
魏振华
Current Assignee
North China Electric Power University
Original Assignee
North China Electric Power University
Priority date
Filing date
Publication date
Application filed by North China Electric Power University filed Critical North China Electric Power University
Priority to CN201310361932.4A
Publication of CN103413312A
Application granted
Publication of CN103413312B
Legal status: Expired - Fee Related
Anticipated expiration


Landscapes

  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of computer vision and specifically relates to a video target tracking method based on neighborhood component analysis and scale space theory. The invention uses the feature-transformation capability of neighborhood component analysis (NCA) to obtain the optimal features for separating target from background and the best linear classifier for distinguishing target and background pixels in any frame, solving the problem of updating the target's features from the perspective of classifier design. A particle confidence computation method based on the multi-scale normalized Laplacian filter function is proposed; by exploiting the state diversity and convergence properties of particle filtering, it prevents the algorithm from falling into local optima after the target has been occluded while guaranteeing tracking accuracy on the basis of scale space theory. The invention locates the position and scale of the target more accurately, adapts more effectively to changes in target illumination and color, and handles target occlusion robustly.

Figure 201310361932

Description

Video Object Tracking Method Based on Neighborhood Component Analysis and Scale Space Theory

Technical Field

The invention belongs to the technical field of computer vision and specifically relates to a video target tracking method based on neighborhood component analysis and scale space theory.

Background Art

Many applications in computer vision, such as intelligent surveillance, robot vision and human-computer interaction interfaces, require tracking of moving targets in video sequences. Because of the diversity of target appearance and the uncertainty of target motion, achieving robust real-time tracking in a variety of environments, together with a reliable estimate of the target's variable scale as its distance changes, has long been a research focus. Mean shift is the most studied target tracking method in academia in recent years. It is a non-parametric density gradient-ascent algorithm that searches for extrema of a probability density function: following the gradient direction of the target state probability function, it iteratively and locally searches the image for the most likely target state. Algorithms of this type are fast and widely adopted. However, the scale-adaptation mechanism of mean shift methods has always been a key problem. The traditional mean shift algorithm uses a kernel function with fixed bandwidth and cannot adaptively track the scaling of the target, which easily leads to inaccurate localization. When the chosen bandwidth is too large, the feature probability distribution extracted from the candidate region contains background interference, which degrades localization; conversely, when the bandwidth is too small, only a local feature probability distribution of the target is obtained, which likewise causes localization errors.

In 1998, Bradski et al. improved the basic mean shift method by using higher-order moments of the target state probability distribution to obtain a more accurate target scale while searching for the target position, i.e., the scale-adaptive mean shift method (CamShift). However, this method easily fails when the color of the surrounding environment is close to that of the target. Dong Rong (Dong Rong, Li Bo, Chen Qimei. Multi-degree-of-freedom mean-shift tracking algorithm based on SIFT features. Control and Decision, 2012(03): 399-402+407) proposed an algorithm that uses the target scale and orientation obtained from SIFT features to adjust the bandwidth and orientation of the mean shift kernel function, thereby improving the adaptability of the mean shift tracker to changes in target scale. Because the mean shift algorithm itself may converge to a local optimum in the state space, this algorithm still cannot theoretically guarantee tracking accuracy. Qin Jian (Qin Jian, Zeng Xiaoping, Li Yongming. Mean-Shift kernel window-width adaptive algorithm based on boundary force. Journal of Software, 2009(07): p.1726-1734) proposed a boundary-force-based mean shift tracking algorithm that introduces region likelihood to extract local information of the target, compares the region likelihoods between adjacent frames to construct a boundary force, computes the boundary force to obtain the positions of boundary points, and then adaptively updates the kernel bandwidth. For the same reason, being constrained by the shortcomings of the mean shift algorithm itself, this algorithm also cannot theoretically guarantee tracking accuracy. Similar problems exist in other mean-shift-based algorithms (e.g., Wang Yong, Chen Fenxiong, Guo Hongxiang. Offset-corrected kernel-space histogram target tracking. Acta Automatica Sinica, 2012(03): p.430-436). In the granted invention patent with publication number CN101281648A (low-complexity scale-adaptive video target tracking method), the algorithm proposed by the inventors takes the particle filter as its framework and uses the mean shift algorithm in the importance sampling function, which improves sampling efficiency and overcomes the drawback of the single-scale mean shift method converging to a local optimum; however, because it does not consider the adaptive problem of changes in the target's own features, it is difficult to guarantee localization accuracy when the illumination changes significantly.

Neighbourhood Components Analysis (NCA) is a supervised distance-metric learning method proposed by Goldberger (J. Goldberger, S. Roweis, G. Hinton, R. Salakhutdinov. (2005) Neighbourhood Component Analysis. Advances in Neural Information Processing Systems. 17, 513-520.). Its purpose is to learn, on the training set, a linear space transformation matrix that maximizes the average leave-one-out classification performance in the transformed space. It measures the sample data according to a given distance metric and then classifies multiple clusters. Functionally its goal is the same as that of the k-nearest-neighbor algorithm: it directly uses the notion of stochastic nearest neighbors to determine the labeled training samples close to a test sample.

A typical NCA procedure is as follows. Let x_i denote the i-th sample in the training set and c_i its class label; after transformation by the linear space transformation matrix A the samples become AX, and in the transformed space the whole data set is considered as stochastic nearest neighbors. The squared Euclidean distance is used to define the distance between a left-out data point and the other data in the transformed space, and the probability that sample x_j is selected as the nearest neighbor of x_i is defined as

p_ij = exp(−‖Ax_i − Ax_j‖²) / Σ_{k≠i} exp(−‖Ax_i − Ax_k‖²),   p_ii = 0.

p_ij is thus the probability that sample point x_j is the nearest neighbor of x_i. The classification accuracy for x_i is the accuracy over its set of nearest neighbors C_i (C_i = {j | c_i = c_j}); the matrix A that maximizes the classification accuracy is selected as A_new, i.e. A_new = argmax_A Σ_i p_i. For convenience of computation, the objective function f(A) = Σ_i p_i = Σ_i Σ_{j∈C_i} p_ij is rewritten as

g(A) = Σ_i log( Σ_{j∈C_i} p_ij ),

whose gradient is derived as

∂g/∂A = 2A Σ_i ( Σ_k p_ik x_ik x_ik^T − ( Σ_{j∈C_i} p_ij x_ij x_ij^T ) / ( Σ_{j∈C_i} p_ij ) ),

where x_ij = x_i − x_j. A conjugate-gradient multivariate optimization method can then be used to obtain A_new = argmax_A g(A) = argmax_A Σ_i log( Σ_{j∈C_i} p_ij ). A_new maximizes the average leave-one-out classification performance of the training sample set in the transformed space.
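As an illustration of the learning step described above, the following is a minimal NumPy/SciPy sketch of the objective g(A) and its gradient, optimized with a quasi-Newton routine; the function and variable names (nca_objective, X, y, etc.) are illustrative and do not appear in the patent.

```python
import numpy as np
from scipy.optimize import minimize

def nca_objective(A_flat, X, y, dim):
    """Negative NCA objective g(A) and its gradient (for scipy.optimize).

    X: (m, n) training samples, y: (m,) class labels, A: (dim, n) transform."""
    m, n = X.shape
    A = A_flat.reshape(dim, n)
    AX = X @ A.T                                             # transformed samples
    d2 = ((AX[:, None, :] - AX[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                             # enforce p_ii = 0
    expd = np.exp(-d2)
    p = expd / expd.sum(axis=1, keepdims=True)               # p_ij
    same = (y[:, None] == y[None, :])
    p_i = (p * same).sum(axis=1)                             # sum over j in C_i of p_ij
    g = np.log(p_i + 1e-12).sum()

    # dg/dA = 2A * sum_i ( sum_k p_ik x_ik x_ik^T - sum_{j in C_i} p_ij x_ij x_ij^T / p_i )
    grad = np.zeros((n, n))
    for i in range(m):
        xik = X[i] - X                                       # rows are x_i - x_k
        weighted_all = (p[i][:, None] * xik).T @ xik
        mask = same[i]
        weighted_cls = (p[i][mask][:, None] * xik[mask]).T @ xik[mask]
        grad += weighted_all - weighted_cls / (p_i[i] + 1e-12)
    grad = 2.0 * A @ grad
    return -g, -grad.ravel()                                 # minimize the negative objective

# usage: learn a 3x3 transform for 3-dimensional pixel features
# X, y = ...  (2K samples: K from the target, K from the background)
# res = minimize(nca_objective, np.eye(3).ravel(), args=(X, y, 3),
#                jac=True, method='L-BFGS-B')
# A_new = res.x.reshape(3, 3)
```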

Multi-scale normalized Laplacian filtering is the computational formula used in the scale space theory proposed by Lindeberg (Lindeberg, T., "Feature Detection with Automatic Scale Selection", International Journal of Computer Vision, 1998, vol. 30(2), pp. 79-116) to detect grey-level blobs at different scales. It regards a greyscale image as a two-dimensional function f, i.e. f: R² → R. Its linear scale-space representation L is defined as the convolution with a Gaussian kernel g of variable width t: L(·;t) = g(·;t) * f(·), where g(x;t) = (2πt)^(−D/2) exp(−(x₁² + … + x_D²)/(2t)), x = (x₁, …, x_D)^T, and t is called the scale parameter of L. Let L_xx and L_yy denote the second-order partial derivatives of L in the horizontal and vertical directions; the value of the multi-scale normalized Laplacian filter function at (x, y, t) is then determined as Laplacian(x,y,t) = (t(L_xx(x,y) + L_yy(x,y)))². When the greyscale image contains several square grey-level blocks of different sizes, this function attains maxima at the centre of each block under different scale parameters t; by inspecting the scales and positions of these maxima, the centre position and scale of each grey-level block can be determined.
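To make this blob-detection behaviour concrete, the following small sketch (not from the patent) evaluates the normalized Laplacian response over a grid of scales using SciPy's Laplacian-of-Gaussian filter; the relation σ = √t between the Gaussian standard deviation and the scale parameter t is an assumption of this sketch.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def normalized_laplacian_response(image, scales):
    """Return a (len(scales), H, W) stack of (t * (Lxx + Lyy))**2 responses."""
    responses = []
    for t in scales:
        lap = gaussian_laplace(image.astype(float), sigma=np.sqrt(t))  # Lxx + Lyy of L(.;t)
        responses.append((t * lap) ** 2)
    return np.stack(responses)

# usage: locate the scale and position of the strongest blob in a probability map
# resp = normalized_laplacian_response(I_likelihood, scales=np.linspace(4, 400, 50))
# k, y, x = np.unravel_index(resp.argmax(), resp.shape)
```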

Summary of the Invention

The object of the present invention is to propose a video target tracking method based on neighborhood component analysis and scale space theory, addressing the above-mentioned deficiencies of the prior art.

A video target tracking method based on neighborhood component analysis and scale space theory, comprising the following steps:

Step 1: In the first frame, determine, by detection or by manual annotation, the rectangular box in which the target initially lies; obtain the initial state of the target and initialize the particle filter.

The rectangular box in which the target initially lies is determined by a target detection method or by manual annotation. The coordinates of the top-left corner of the target rectangle are (r, c) and the width and height of the rectangle are (w, h). The initial target scale parameter s is computed as:

s = ((13 + (w − 34) × 0.47619))²;

the target centre point is (r + h/2, c + w/2), and the width-to-height ratio of the target is recorded as asr = w/h.

The sampling lower bound of the target's initial state is set to lb, and the sampling upper bound to ub.

N particles are set up to describe the diversity of the target state; the weights of all particles are initialized to the uniform value 1/N, and each component of each particle is initialized to a random vector uniformly distributed in the range [lb, ub].
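The following is a minimal sketch of Step 1 under the parameter choices given later in the embodiment (a three-dimensional state of row, column and scale); variable names such as init_particles are illustrative only.

```python
import numpy as np

def init_particles(r, c, w, h, N=800):
    """Initialize target state and N uniformly weighted particles (row, col, scale)."""
    s = (13.0 + (w - 34.0) * 0.47619) ** 2           # initial scale parameter
    asr = w / h                                       # width-to-height ratio
    center = np.array([r + h / 2.0, c + w / 2.0])
    lb = np.array([r - 3 * h, c - 3 * w, 0.1 * s])    # sampling bounds from the embodiment
    ub = np.array([r + 4 * h, c + 4 * w, 2.0 * s])
    particles = np.random.uniform(lb, ub, size=(N, 3))
    weights = np.full(N, 1.0 / N)
    return particles, weights, s, asr, center
```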

Step 2: Sample inside the rectangular box containing the target and on the background region outside that box in the current frame, to obtain a training set X of 2K samples.

Within the rectangular box containing the target, the target region is sampled using a two-dimensional Gaussian distribution whose mean is the target centre point, yielding K sampling positions; the target pixel features at these K positions form the K target-class samples of the training set. Taking the centre of the minimum ellipse adjoining the target rectangle as the pole and the direction parallel to the target width as the polar axis, polar coordinates are established and random sampling is performed in them: the angle of each sampling point is a random number uniformly distributed in [0, 2π), and the polar radius is a multiple of the polar radius of the point on the ellipse at the same angle, the multiple being the sum of an exponentially distributed random number and a floating-point number greater than 1. This yields K new sampling points lying outside the target region, on the surrounding background; the features of the background pixels at these positions form the K background-class samples of the training set. The 2K sampled points give the training set X.

Step 3: Perform neighborhood component analysis (NCA) on the training set X of 2K samples and solve with a vector BFGS multivariate optimization algorithm to obtain the new linear space transformation matrix A_new; the 2K training samples are then transformed by A_new, giving the transformed training sample set AX.

Step 4: Acquire the next frame, which becomes the current frame. The vectors formed by the pixel features at all positions of this frame constitute the test sample set, which is also transformed by A_new to give the transformed test sample set S_new. The transformed training sample set AX from the previous frame is used to classify the transformed test sample set S_new of this frame, yielding the classification probability p_post of each test sample; taking the probability of belonging to the target class as the pixel value at the position of each test sample gives a new greyscale target probability distribution map I_likelihood.

Step 5: On the target probability distribution map I_likelihood, compute the value at each particle's position of the multi-scale normalized Laplacian filter function centred at that position; the maximum of these values is vmax, and the confidence of each particle is computed on the basis of the normalized distance between its multi-scale normalized Laplacian filter value and the maximum vmax.

Step 6: Update the particle filter and obtain its output state, which gives the new position of the target in the current frame, expressed as a rectangular box. If tracking has not finished, go to Step 2; otherwise stop.
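The patent does not spell out the filter update itself; as one plausible reading, the sketch below implements a standard sequential importance resampling step (weight by confidence, resample, diffuse, output the weighted mean state) and is an assumption layered on top of Steps 5 and 6 rather than the patented procedure.

```python
import numpy as np

def update_particle_filter(particles, confidences, noise_std=(4.0, 4.0, 10.0)):
    """One generic SIR update: reweight by confidence, output mean state, resample, diffuse."""
    w = confidences / (confidences.sum() + 1e-12)
    state = (particles * w[:, None]).sum(axis=0)          # weighted-mean output state
    idx = np.random.choice(len(particles), size=len(particles), p=w)
    resampled = particles[idx] + np.random.normal(0.0, noise_std, particles.shape)
    return state, resampled
```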

The specific procedure for sampling on the background is as follows:

Generate a random number uniformly distributed on the interval [0, 2π) as the angle α in polar coordinates;

Generate an exponentially distributed random number χ with rate parameter λ = 0.5 and compute the polar radius ρ = (χ + β)·ρ_e(α), where ρ_e(α) is the polar radius at angle α of the point on the minimum ellipse adjoining the target rectangle, β > 1 is a parameter controlling the distance from the sampling point to the target edge, and w and h are respectively the width and height of the target;

The features of the pixel at the image coordinates corresponding to (α, ρ), i.e. at polar radius ρ and angle α about the ellipse centre, form one background-class sample; repeating the above process K times yields K background-class samples.
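A minimal sketch of this sampling scheme is given below. For the polar radius of the ellipse it assumes the ellipse inscribed in the target rectangle (semi-axes w/2 and h/2); the patent only specifies the ellipse descriptively, so this choice, like the function name sample_background, is an assumption.

```python
import numpy as np

def sample_background(center_rc, w, h, K=300, beta=1.2, lam=0.5):
    """Sample K background pixel positions around the target in polar coordinates."""
    rc, cc = center_rc
    alpha = np.random.uniform(0.0, 2.0 * np.pi, K)          # uniform angle
    chi = np.random.exponential(1.0 / lam, K)                # exponential radius factor
    a, b = w / 2.0, h / 2.0                                  # assumed ellipse semi-axes
    rho_e = a * b / np.sqrt((b * np.cos(alpha)) ** 2 + (a * np.sin(alpha)) ** 2)
    rho = (chi + beta) * rho_e                               # always outside the ellipse
    rows = np.round(rc + rho * np.sin(alpha)).astype(int)    # polar axis parallel to width
    cols = np.round(cc + rho * np.cos(alpha)).astype(int)
    return rows, cols

# usage (positions should additionally be clipped to the image bounds):
# rows, cols = sample_background((rc, cc), w, h)
# background_samples = image[rows, cols]     # RGB features of the background pixels
```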

The probability p_post of the classification of each test sample is computed as follows:

The dimension of each sample is a set value n, so the i-th sample x_i is expressed as an n-dimensional row vector x_i = (x_i1, x_i2, …, x_in), x_in being the n-th component of the sample. With each sample as one row, the linearly transformed training samples from the target and from the background form a 2K×n matrix AX, whose first K rows come from the target and last K rows from the background. A 2K×2 matrix G is constructed whose rows encode the classes of the corresponding training samples: if the i-th sample belongs to the target, the i-th row of G is (1, 0); if it belongs to the background, the i-th row of G is (0, 1). Similarly, the test sample set S_new forms a (W×H)×n matrix, where W and H are respectively the width and height of the image. With AX as the training sample set and G as the supervision result, the transformed test sample set S_new is classified, yielding the class membership of every sample in S_new as well as its probability p_post of belonging to the target class; replacing the pixel value at each pixel position of the image corresponding to a test sample by the corresponding p_post generates the target probability distribution map I_likelihood.
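The patent does not name the classifier used with AX and G; the sketch below shows one simple way to obtain such per-pixel target probabilities with a k-nearest-neighbour classifier from scikit-learn, and should be read as an illustrative assumption rather than the classifier of the invention.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def target_probability_map(AX, labels, image, A_new, k=5):
    """Classify every pixel of `image` (H x W x n) in the transformed feature space
    and return an H x W map of the probability of belonging to the target class."""
    H, W, n = image.shape
    S = image.reshape(-1, n).astype(float)     # one n-dimensional feature per pixel
    S_new = S @ A_new.T                        # transform the test samples by A_new
    clf = KNeighborsClassifier(n_neighbors=k).fit(AX, labels)   # labels: 1 = target, 0 = background
    p_post = clf.predict_proba(S_new)[:, list(clf.classes_).index(1)]
    return p_post.reshape(H, W)                # target probability distribution map I_likelihood
```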

The specific computation of the multi-scale normalized Laplacian filter function is as follows:

(1) The Gaussian kernel function T: Z×R⁺ → R with continuous scale variable t and discrete space variable n is T(n;t) = e^(−t) I_n(t), where I_n(t) is the modified Bessel function of the first kind; its second-order differential ∂²T/∂n² can be computed by differences: ∂²T/∂n²(n;t) = T(n;t) * (1, −2, 1), where * denotes one-dimensional discrete signal convolution;

(2) Given a scale variable t, each row of the target probability distribution map I_likelihood matrix is convolved with ∂²T/∂n², and each column of the resulting matrix is then convolved with T(n;t); let L_xx(x,y) denote the first result matrix of this second convolution. Similarly, each column of the I_likelihood matrix is convolved with ∂²T/∂n², and each row of the resulting matrix is then convolved with T(n;t); let L_yy(x,y) denote the second result matrix of this second convolution. The value of the multi-scale normalized Laplacian filter function at (x, y, t) is then:

Laplacian(x,y,t) = (t(L_xx(x,y) + L_yy(x,y)))².
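The following sketch (not taken from the patent text, though it follows the formulas above) builds the discrete Gaussian kernel T(n;t) = e^(−t) I_n(t) with SciPy's modified Bessel function, forms its second-difference version by convolution with (1, −2, 1), and evaluates the normalized Laplacian response on a probability map; the kernel half-width of about 3√t is an assumed truncation.

```python
import numpy as np
from scipy.special import ive
from scipy.ndimage import convolve1d

def discrete_gaussian_kernel(t, halfwidth=None):
    """T(n;t) = exp(-t) * I_n(t), evaluated on n = -halfwidth..halfwidth."""
    if halfwidth is None:
        halfwidth = int(np.ceil(3.0 * np.sqrt(t))) + 1
    n = np.arange(-halfwidth, halfwidth + 1)
    return ive(n, t)                     # ive(n, t) = exp(-t) * I_n(t), numerically stable

def normalized_laplacian(I_likelihood, t):
    """(t * (Lxx + Lyy))**2 for the discrete scale-space representation at scale t."""
    t_mask = discrete_gaussian_kernel(t)
    d2t_mask = np.convolve(t_mask, [1.0, -2.0, 1.0], mode='same')   # second difference of T
    Lxx = convolve1d(convolve1d(I_likelihood, d2t_mask, axis=1), t_mask, axis=0)
    Lyy = convolve1d(convolve1d(I_likelihood, d2t_mask, axis=0), t_mask, axis=1)
    return (t * (Lxx + Lyy)) ** 2
```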

The computation of the particle confidence is as follows:

Obtain the maximum vmax among the multi-scale normalized Laplacian filter values of all particles; compute the distance d between each particle's multi-scale normalized Laplacian filter value and vmax, and the variance var of the Laplacian filter values of all particles; the confidence conf of the particle is then obtained as:

conf = exp( −d² / (2·var) ).
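A direct NumPy transcription of this confidence rule might look as follows (the function name particle_confidence is illustrative):

```python
import numpy as np

def particle_confidence(laplacian_values):
    """conf_i = exp(-d_i**2 / (2*var)), with d_i = vmax - value_i over all particles."""
    v = np.asarray(laplacian_values, dtype=float)
    d = v.max() - v
    var = v.var() + 1e-12                 # guard against zero variance
    return np.exp(-d ** 2 / (2.0 * var))
```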

The new position of the target in the current frame, expressed as a rectangular box, is obtained as follows:

Let the state output by the filter be (ṙ, ċ, s), where (ṙ, ċ) is the target centre and s the scale parameter; the width of the target is then computed as

w = (√s − 13)/0.47619 + 34;

the height is h = w/asr, and the coordinates of the top-left corner of the target are (r = ṙ − h/2, c = ċ − w/2).
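For completeness, the inverse mapping from the filter's output state back to a bounding box can be sketched as:

```python
import numpy as np

def state_to_box(state, asr):
    """Convert a (row_center, col_center, scale) state to (r, c, w, h)."""
    rc, cc, s = state
    w = (np.sqrt(s) - 13.0) / 0.47619 + 34.0
    h = w / asr
    return rc - h / 2.0, cc - w / 2.0, w, h
```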

Beneficial effects of the invention: the invention uses feature transformation to obtain the optimal features for separating target and background and obtains the best classifier for distinguishing target and background pixels in any frame, solving the problem of updating target features from the perspective of classifier design. By exploiting the state diversity and convergence properties of particle filtering, it avoids the algorithm falling into local optima after the target is occluded while guaranteeing tracking accuracy on the basis of scale space theory. The danger of traditional algorithms such as mean shift becoming trapped in locally optimal states is avoided, and optimal estimates of the target position and scale are obtained. The invention locates the position and scale of the target more accurately, adapts more effectively to changes in target illumination and color, and handles target occlusion robustly.

Brief Description of the Drawings

Fig. 1 is the workflow diagram of the tracking method of the present invention;

Fig. 2 shows the tracking results of the prior-art CamShift method when the color features of the target change rapidly; (a) shows the method beginning to drift away from the real target at the second frame; (b)-(f) show the tracking results on subsequent representative frames;

Fig. 3 shows the tracking results of the present invention, which adaptively updates the optimal features, when the color features of the target change rapidly;

Fig. 4 shows the tracking results of the prior-art L1APG method when the brightness and scale of the target change simultaneously; (a)(b) show that in the first few frames the method can track the glass bottle used as the target; (c)-(f) show the method deviating in the target height direction when the surface brightness of the target changes;

Fig. 5 shows the tracking results of the present invention, which adaptively updates the optimal features and adjusts the scale, when the brightness and scale of the target change simultaneously; (a) shows the start of tracking; (b)(c) show that the algorithm of the invention still tracks accurately when the target surface darkens and its size shrinks; (d)-(f) show that the algorithm still tracks accurately when the target surface goes from dark to bright and back to dark and its size first grows and then shrinks;

Fig. 6 shows the target probability distribution maps generated by the present invention during tracking; (a)-(f) show the maps corresponding to the frames in Fig. 5;

Fig. 7 is a schematic diagram of the sampling results on the target and the background during tracking; the target region is inside the white square and the background region is outside it; circles denote sampling points of the target region, and x denotes sampling points of the background region;

Fig. 8 is the projection onto the first two dimensions of feature space of the samples of the example of the invention before neighborhood component analysis of the training sample set; circles denote sample points from the target, and x denotes sample points from the background;

Fig. 9 is the projection onto the first two dimensions of the transformed feature space of the samples of the example of the invention after neighborhood component analysis of the training sample set; circles denote sample points from the target, and x denotes sample points from the background;

Fig. 10 shows the comparison between the embodiment of the present invention and the prior-art L1APG method in terms of target width accuracy;

Fig. 11 shows the comparison between the embodiment of the present invention and the prior-art L1APG method in terms of target height accuracy;

Fig. 12 shows the tracking results of the present invention when the target is partially and completely occluded; (a)(b)(c) show the tracking before, during and after the first partial occlusion, respectively; (d)-(f) show the tracking before, during and after the second complete occlusion, respectively.

Detailed Description of the Embodiments

In this embodiment, target tracking is performed on the "glass" video sequence. The design parameters are set as follows: total number of particles N = 800; state-vector dimension stateN = 3, corresponding to the coordinates of the particle's position in the width and height directions and the scale parameter; observation-vector dimension measureN = 3; number of samples drawn from the target region K = 300; maximum number of iterations of the neighborhood component analysis maxIter = 100.

During tracking, changes in the environment may change the brightness and color of the target, so the target's features should be corrected in time to adapt to these changes while tracking proceeds. The invention therefore uses the neighborhood component analysis (NCA) transform to compute a linear transformation matrix that projects the pixel features of target and background into a new space in which the Mahalanobis distance between target-class and background-class samples is maximized, so that a two-class classifier trained on the transformed features has a smaller classification error than one trained on the original features. Transforming the features of every pixel in a new frame by this matrix and classifying them with this classifier yields a better target probability distribution map than the one obtained from the original features. In this map the target appears as a region of higher grey values and the background as darker regions. To further improve accuracy, a particle filter is introduced to estimate the state of the target; following Lindeberg's scale space theory, a particle confidence computation method based on the multi-scale normalized Laplacian filter function is proposed, which avoids the danger of traditional algorithms such as mean shift becoming trapped in locally optimal states while obtaining optimal estimates of the target position and scale.

Compared with the prior art, the invention proposes to use the feature-transformation capability of neighborhood component analysis (NCA) to obtain the optimal features for separating target and background and the best linear classifier for distinguishing target and background pixels in any frame, solving the problem of updating target features from the perspective of classifier design; it also proposes a particle confidence computation method based on the multi-scale normalized Laplacian filter function, which exploits the state diversity and convergence properties of particle filtering to avoid the algorithm falling into local optima after the target is occluded while guaranteeing tracking accuracy on the basis of scale space theory. Compared with the widely used mean shift tracking algorithm and the L1APG tracking algorithm proposed in 2012 (Chenglong, B., et al. Real time robust L1 tracker using accelerated proximal gradient approach. in Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. 2012.), under the same experimental conditions the invention locates the position and scale of the target more accurately, adapts more effectively to changes in target illumination and color, and handles target occlusion robustly.

As shown in Fig. 1, the video target tracking method based on neighborhood component analysis and scale space theory of this embodiment comprises the following steps:

Step 1: In the first frame, determine, by detection or by manual annotation, the rectangular box in which the target initially lies; obtain the initial state of the target and initialize the particle filter.

Initialize the target state: in the first frame of the sequence, the rectangular box containing the target is determined by a target detection method or by manual annotation, giving the top-left corner coordinates of the target rectangle (r, c) and the width and height of the box (w, h); the initial target scale parameter s is then computed by the following formula:

s = ((13 + (w − 34) × 0.47619))²                (1)

where the target centre point is (r + h/2, c + w/2), and the width-to-height ratio of the target is recorded as asr = w/h;

the sampling lower bound of the initial state of each particle is set to lb = [r − 3h, c − 3w, 0.1s], and the sampling upper bound to ub = [r + 4h, c + 4w, 2s];

Step 2: Sample inside the rectangular box containing the target and on the background region outside that box in the current frame, to obtain a training set X of 2K samples;

Pixel positions of the image are sampled from a two-dimensional Gaussian distribution whose mean is the centre of the target rectangle and whose covariance matrix is σ = diag(h/3, w/3); K = 300 pixels lying inside the rectangular box containing the target are selected to form the target-class samples of the training set. Taking the centre of the minimum ellipse adjoining the target rectangle as the pole and the direction parallel to the target width as the polar axis, polar coordinates are established and random sampling is performed in them: the angle of each sampling point is a random number uniformly distributed in [0, 2π), and the polar radius is a multiple of the polar radius of the point on the ellipse at the same angle, the multiple being an exponentially distributed random number plus 1.2; in this way K = 300 pixel positions lying outside the rectangular box containing the target, on the surrounding background, are sampled, and they form the background-class samples;

Step 3: Perform neighborhood component analysis (NCA) on the training set X of 2K samples and solve with a vector BFGS multivariate optimization algorithm to obtain the new linear space transformation matrix A_new; the 2K training samples are then transformed by A_new, giving the transformed training sample set AX;

Neighborhood component analysis is performed on the training set X of 2K samples obtained in Step 2, with the initial linear space transformation matrix A set to the 3×3 identity matrix. The vector BFGS (Broyden-Fletcher-Goldfarb-Shanno) multivariate optimization algorithm is used to solve for A_new, with the maximum number of iterations set to maxIter = 100 and the loop exit condition being that the gradient norm falls below 10⁻³. If the optimization returns successfully with a minimum, the corresponding multivariate parameter is taken as A_new; otherwise A_new = A is set. The 2K training samples are transformed by A_new, giving the transformed training sample set AX = A_new × X';
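A compact sketch of this optimization step, reusing the illustrative nca_objective function shown earlier and SciPy's BFGS routine with the stopping criteria stated above (gradient-norm tolerance 10⁻³, at most 100 iterations), might read:

```python
import numpy as np
from scipy.optimize import minimize

def learn_transform(X, labels, dim=3, max_iter=100, gtol=1e-3):
    """Solve for A_new with BFGS, falling back to the identity if optimization fails."""
    A0 = np.eye(dim)
    res = minimize(nca_objective, A0.ravel(), args=(X, labels, dim),
                   jac=True, method='BFGS',
                   options={'maxiter': max_iter, 'gtol': gtol})
    A_new = res.x.reshape(dim, dim) if res.success else A0
    AX = X @ A_new.T                 # transformed training sample set
    return A_new, AX
```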

Step 4: Acquire the next frame, which becomes the current frame. The three RGB components of every pixel of this frame are arranged in pixel order from left to right and top to bottom, forming a (W×H)×3 matrix S in which each row holds the RGB components of one pixel, W being the image width and H the image height. S is transformed by A_new, giving the transformed test sample set S_new = A_new × S'. With the AX obtained in Step 3 as the training set, S_new is classified into the two classes target and background, giving the probability p_post that each sample of S_new (i.e. each pixel) belongs to the target; replacing the original RGB components by the p_post value of each pixel gives the target probability distribution map I_likelihood of size W×H;

The probability p_post of the classification of each test sample is computed as follows:

Assume the dimension of each sample is n; then the i-th sample x_i can be expressed as an n-dimensional row vector x_i = (x_i1, x_i2, …, x_in), x_in being the n-th component of the sample. With each sample as one row, the linearly transformed training samples from the target and from the background form a 2K×n matrix AX, whose first K rows come from the target and last K rows from the background. A 2K×2 matrix G is constructed whose rows encode the classes of the corresponding training samples: if the i-th sample belongs to the target, the i-th row of G is (1, 0); if it belongs to the background, the i-th row of G is (0, 1). Similarly, the test sample set S_new forms a (W×H)×n matrix, where W and H are respectively the width and height of the image. With AX as the training sample set and G as the supervision result, classifying the transformed test sample set S_new yields not only the class membership of every sample in S_new but also its probability p_post of belonging to the target class; replacing the pixel value at each pixel position of the image corresponding to a test sample by the corresponding p_post generates the target probability distribution map I_likelihood;

Step 5: On the target probability distribution map I_likelihood, compute the value at each particle's position of the multi-scale normalized Laplacian filter function centred at that position; the maximum of these values is vmax, and the confidence of each particle is computed on the basis of the normalized distance between its multi-scale normalized Laplacian filter value and the maximum vmax. Obtain the current state (x, y, t) of the particle and compute, at scale t, the discrete template t_mask of T(n;t) = e^(−t) I_n(t), its size being determined by the scale t; the discrete template of its second-order differential ∂²T/∂n² is computed as d2t_mask = t_mask * (1, −2, 1).

At the scale of the particle, compute the first result matrix L_xx(x,y) and the second result matrix L_yy(x,y) by successively convolving the target distribution map I_likelihood in the two directions with the discrete template d2t_mask of ∂²T/∂n² and the discrete template t_mask of T(n;t).

The value of the multi-scale normalized Laplacian filter function at the particle's position can then be computed by formula (2): Laplacian(x,y,t) = (t(L_xx(x,y) + L_yy(x,y)))²                (2)

Obtain the maximum vmax among the multi-scale normalized Laplacian filter values of all particles, compute the distance d between each particle's multi-scale normalized Laplacian filter value and vmax, and the variance var of the Laplacian filter values of all particles; the confidence conf of the particle is obtained according to formula (3):

conf = exp( −d² / (2·var) )                (3)

Step 6: Update the particle filter and obtain its output state, which gives the new position of the target in the current frame, expressed as a rectangular box. If tracking has not finished, go to Step 2; otherwise stop.

The target rectangular box is obtained as follows:

Let the state output by the filter be (ṙ, ċ, s); the width of the target is then computed by formula (4):

w = (√s − 13)/0.47619 + 34                (4)

the height is h = w/asr, and the coordinates of the top-left corner of the target are (r = ṙ − h/2, c = ċ − w/2).

Following the above steps, target tracking was carried out on several colour test sequences; the tracked targets underwent obvious changes in colour, illumination or size during tracking, and occlusion problems were present.

Subfigures (a)-(f) in Figures 2, 3, 4, 5 and 12 depict the changes in target position and size in chronological order.

Fig. 3 shows the tracking results obtained by this embodiment on the "colour-changing square" sequence; the tracked target is the square, and the tracking result of the algorithm is marked with a light-coloured box. During its motion the "colour-changing square" changes colour rapidly from pure red to pure blue over roughly 60 frames. Fig. 2 shows the complete failure of CamShift, a typical algorithm of the mean shift family: CamShift cannot adapt to such a drastic change in the colour features of the target (the black ellipse in Fig. 2) and, because the square's colour changes rapidly (a change visible on a colour printout but not on an ordinary monochrome printout), it quickly loses the target. The method of the present invention used in this embodiment, because it can find in the current frame the best transformed features for distinguishing target from background, adapts well to this change in the target's colour and tracks the square accurately.

Fig. 5 shows the tracking results obtained by this embodiment on the "glass bottle" sequence; the tracked target is the glass bottle, and the tracking result of the algorithm is marked with a white box. During tracking the glass bottle repeatedly enters and leaves shadows in the background, causing drastic changes in its surface brightness; at the same time, because its distance from the camera changes, the size of the glass bottle in the image also changes markedly. Although the L1APG tracking algorithm published in 2012 can track the glass bottle, it gradually develops a large error in the scale parameter, which ultimately degrades the localization accuracy, as shown in Fig. 4. Fig. 10 and Fig. 11 respectively plot, as functions of the frame number, the true target width and height, the target width and height obtained by this embodiment, and the width and height obtained by the L1APG algorithm. From these curves: the mean error between the width obtained by this embodiment and the true target width is 4.474529, with variance 13.178290, whereas the mean error between the L1APG width and the true width is 8.821549, with variance 106.027808; the mean error between the height obtained by this embodiment and the true target height is 1.009198, with variance 12.256838, whereas the mean error between the L1APG height and the true height is 41.400282, with variance 689.699398, clearly larger than the results of this embodiment. It can be seen that the method of this embodiment tracks the size changes of the target well while adapting to the illumination-induced changes in target brightness, achieving higher localization accuracy. Fig. 6 shows the target probability distribution maps corresponding to the frames of Fig. 5 produced during the processing of this embodiment; the neighborhood component analysis technique highlights the region containing the target while suppressing the background immediately surrounding it, providing the conditions for improved localization accuracy. Fig. 7 is a schematic diagram of the sampling of target and background during tracking in this embodiment; the target region is inside the white square and the background region is outside it. Circles denote sampling points of the target region and x denotes sampling points of the background region. The target sampling points are dense near the target centre and sparse near the target boundary, which effectively reduces the possibility that errors near the boundary introduce interference; the background sampling points are dense in the region around the target and sparse far from it, which ensures that the trained classifier separates the target from its immediate background as well as possible. Fig. 8 shows the projection onto the first two dimensions of feature space of the samples before neighborhood component analysis of the training sample set in this example; circles denote sample points from the target and x denotes sample points from the background. Because the background contains regions whose features are close to those of the target, the two sample sets overlap and are difficult to separate. Fig. 9 shows the projection onto the first two dimensions of the transformed feature space of the samples after neighborhood component analysis; circles denote sample points from the target and x denotes sample points from the background. After the transformation the distance between the two sample sets is clearly larger and they are easy to separate.

Fig. 12 shows the tracking results obtained by this embodiment on a "cyclist" sequence, with the target marked by a white box. The target is occluded by background interference twice in succession, but each time the occlusion ends, the method of this embodiment is able to continue tracking the target, showing that the method copes effectively with target occlusion.

Claims (6)

1. based on the video target tracking method of neighbourhood's constituent analysis and metric space theory, it is characterized in that, this video target tracking method comprises the following steps:
Step 1: in the first frame, by the rectangle frame detected or manually mark is determined the initial place of target, obtain the original state of relevant target, and the initialization particle filter;
By object detection method or manual mark, determine the initial place of target rectangle frame, target rectangle upper left corner point coordinate is (r, c), and the wide and high of rectangle frame is (w, h), and target initial gauges parameter s is calculated acquisition by following formula:
s=((13+(w-34)*0.47619)) 2
Wherein, target's center's point is
Figure FDA0000368657530000011
The width of record object and height ratio are: asr = w h ;
Under the sampling of the original state of target setting, be limited to lb, in sampling, be limited to ub;
Set N particle and describe the diversity of dbjective state, be initialized as the weights of all particles unified
Figure FDA0000368657530000013
Each component of each particle is initialized as to equally distributed random vector in [lb, ub] scope;
Step 2: to sampling in the rectangle frame at target place in present frame and on the background field of the outer rectangular frame at target place, obtain the training set X of 2K sample;
In the rectangle frame at target place, take target's center's point coordinate as the expectation that dimensional Gaussian distributes, sampled in target area, obtain K sampling location, with K the concentrated target class sample of object pixel feature composition training sample at place, K sampling location, the minimum of target rectangle of take is limit in abutting connection with oval center, the direction that is parallel to target width is that pole axis is set up polar coordinates, in polar coordinates, carry out stochastic sampling, the angle of each sampled point is [0, 2 π) equally distributed random number in, utmost point footpath is the multiple in the utmost point footpath of oval upper point under equal angular, this multiple is the random number of an exponential distribution and is greater than 1 floating number sum, sampling obtains K and is positioned at outside target area, new sampled point on background field on every side, feature with the background pixel at new sampling point position place forms K the background classes sample that training sample is concentrated, 2K sample by sampling obtains training set X,
Step 3: the training set X to 2K sample carries out neighbourhood's constituent analysis NCA, and uses vectorial BFGS multivariate optimized algorithm to solve to obtain new linear space shift-matrix A new, 2K training sample is again according to obtaining new linear space shift-matrix A newCarry out conversion, obtain the training sample set AX after conversion;
Step 4: obtain the next frame image, become present frame, the vector that the pixel characteristic of all positions in this two field picture is formed forms the test sample book collection, also according to A newCarry out conversion, obtain the test sample book collection S after conversion newTest sample book collection S after utilizing the training sample set AX after conversion in previous frame to conversion in this frame newClassify, obtain the Probability p of each test sample book classification Post, will belong to the pixel value of the probability of target class as each place, test sample book position, obtain the destination probability distribution plan I of the gray-scale map that a width is new Likelihood
Step 5: at destination probability distribution plan I LikelihoodUpper, calculate the Multi-scale normalized laplacian filter function centered by each place, particle position, the value at place, particle position; Wherein maximal value is vmax, with the Multi-scale normalized laplacian filter functional value of each particle, is the degree of confidence of this particle of basic calculation apart from the normalized distance of maximal value vmax;
Step 6: update the particle filter and obtain the output state of the filter, which gives the new position of the rectangle representing the target in the current frame image; if tracking has not yet finished, go to step 2, otherwise stop.
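For illustration only (not part of the claims), the following Python/NumPy sketch shows one way the initialization of step 1 could be realised; the (row, column, scale) layout assumed for a particle and the helper name init_tracker are choices made here, not taken from the patent.

    import numpy as np

    def init_tracker(r, c, w, h, N, lb, ub, seed=0):
        """Initialize target description and particle set as in step 1 (illustrative)."""
        rng = np.random.default_rng(seed)
        s = 13.0 + (w - 34.0) * 0.47619         # initial scale parameter (formula of step 1)
        center = (r + h / 2.0, c + w / 2.0)     # target centre point
        asr = w / float(h)                       # width-to-height ratio of the target
        weights = np.full(N, 1.0 / N)            # uniform particle weights
        # each particle component drawn uniformly from [lb, ub]; assumed layout (row, col, scale)
        particles = rng.uniform(lb, ub, size=(N, 3))
        return s, center, asr, particles, weights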
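Step 3 relies on neighborhood components analysis solved with BFGS. The sketch below maximizes the standard NCA objective (expected leave-one-out accuracy under soft neighbour assignments) with SciPy's BFGS optimizer; this objective formulation and the numerically approximated gradient are common NCA practice and are assumptions here, since the claim does not reproduce the exact cost function.

    import numpy as np
    from scipy.optimize import minimize

    def nca_objective(A_flat, X, y, n_dims):
        """Negative NCA objective: expected number of correctly classified samples
        under soft (softmax over negative squared distance) neighbour assignments."""
        n_samples, n_features = X.shape
        A = A_flat.reshape(n_dims, n_features)
        AX = X @ A.T                                              # project samples
        d2 = np.square(AX[:, None, :] - AX[None, :, :]).sum(-1)  # pairwise squared distances
        np.fill_diagonal(d2, np.inf)                              # a point is never its own neighbour
        d2 = d2 - d2.min(axis=1, keepdims=True)                   # stabilise the softmax
        p = np.exp(-d2)
        p /= p.sum(axis=1, keepdims=True)                         # soft neighbour probabilities p_ij
        same = (y[:, None] == y[None, :]).astype(float)
        return -(p * same).sum()                                  # maximize -> minimize the negative

    def fit_nca(X, y, n_dims=None):
        """Fit the linear transformation with BFGS (gradient approximated numerically)."""
        n_features = X.shape[1]
        n_dims = n_dims or n_features
        A0 = np.eye(n_dims, n_features).ravel()
        res = minimize(nca_objective, A0, args=(X, y, n_dims), method="BFGS")
        return res.x.reshape(n_dims, n_features)

    # usage on the 2K training samples of step 2 (labels: 1 for target, 0 for background):
    # A_new = fit_nca(X_train, labels)
    # AX = X_train @ A_new.T        # transformed training sample set of step 3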
2. The video target tracking method based on neighborhood components analysis and scale space theory according to claim 1, characterized in that the detailed procedure for sampling on the background is as follows:
Generate a uniformly distributed random number on the interval [0, 2π) as the polar angle α;
Generate a random number χ from the exponential distribution with rate parameter λ = 0.5 and compute the polar radius
ρ = (χ + β) × ρ_e(α)
where ρ_e(α) is the polar radius of the point on the ellipse at the angle α (determined by w and h), β > 1, w and h are respectively the width and height of the target, and β is the parameter controlling the distance of the sampled points from the target edge;
The feature of the pixel at the image coordinate obtained by offsetting the pole by ρ in the direction α forms one background-class sample; repeating the above process K times yields K background-class samples.
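As a minimal sketch of this background sampling, in Python/NumPy, assuming that the "minimum adjacent ellipse" is the ellipse inscribed in the target rectangle (semi-axes w/2 and h/2) and that the polar axis follows the image column direction; both are assumptions, since the claim's radius formula survives only as an image:

    import numpy as np

    def sample_background_points(center_rc, w, h, K, beta=1.5, lam=0.5, seed=0):
        """Draw K background sample positions around the target (illustrative sketch)."""
        rng = np.random.default_rng(seed)
        alpha = rng.uniform(0.0, 2.0 * np.pi, size=K)        # polar angle in [0, 2*pi)
        chi = rng.exponential(scale=1.0 / lam, size=K)        # exponential with rate lambda = 0.5
        # assumed polar radius of the ellipse with semi-axes w/2 (along the polar axis) and h/2
        rho_e = (w * h / 2.0) / np.sqrt((h * np.cos(alpha)) ** 2 + (w * np.sin(alpha)) ** 2)
        rho = (chi + beta) * rho_e                            # multiple greater than 1 of the ellipse radius
        rows = center_rc[0] + rho * np.sin(alpha)             # assumed image-coordinate convention
        cols = center_rc[1] + rho * np.cos(alpha)
        return np.stack([rows, cols], axis=1)                 # K x 2 array of (row, col) positions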
3. The video target tracking method based on neighborhood components analysis and scale space theory according to claim 1, characterized in that the detailed computation of the classification probability p_post of each test sample is as follows:
The dimensionality of each sample is a set value n, and the i-th sample x_i is expressed as the n-dimensional row vector x_i = (x_i1, x_i2, ..., x_in), where x_in is the n-th component of the sample. Taking each sample as one row, the linearly transformed training samples of the target and the background form a 2K × n matrix AX, whose first K rows are target samples and whose last K rows are background samples. A 2K × 2 matrix G is built, each row of which encodes the class of the corresponding training sample: if the i-th sample belongs to the target, the i-th row of G is (1, 0); if the i-th sample belongs to the background, the i-th row of G is (0, 1). Similarly, the test sample set S_new forms a (W·H) × n matrix, where W and H are respectively the width and height of the image. Taking AX as the training sample set and G as the supervision result, classify the transformed test sample set S_new; obtain the class membership of each sample in S_new and its probability p_post of belonging to the target class, replace the pixel value at each pixel position of the image corresponding to the test samples with the corresponding p_post, and generate the target probability distribution map I_likelihood.
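The claim specifies supervised classification of S_new with training set AX and supervision matrix G but leaves the classifier itself open; the sketch below uses a soft nearest-neighbour vote in the transformed space as one plausible reading. All names are illustrative.

    import numpy as np

    def target_probability_map(AX, G, S_new, W, H):
        """Soft nearest-neighbour classification of every pixel feature (illustrative).

        AX    : (2K, n) transformed training samples, first K rows target, last K background
        G     : (2K, 2) one-hot supervision matrix, (1, 0) = target, (0, 1) = background
        S_new : (W*H, n) transformed test samples, one per pixel, ordered row by row
        """
        # pairwise squared distances; for full-resolution images this should be computed in chunks
        d2 = np.square(S_new[:, None, :] - AX[None, :, :]).sum(-1)   # (W*H, 2K)
        wgt = np.exp(-(d2 - d2.min(axis=1, keepdims=True)))          # soft weights, stabilised
        wgt /= wgt.sum(axis=1, keepdims=True)
        p_class = wgt @ G                                             # (W*H, 2) class probabilities
        p_post = p_class[:, 0]                                        # probability of the target class
        return p_post.reshape(H, W)                                   # I_likelihood as an H x W map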
4. The video target tracking method based on neighborhood components analysis and scale space theory according to claim 1, characterized in that the concrete computation of the multi-scale normalized Laplacian filter function is as follows:
(1) The Gaussian kernel function with continuous scale variable t and discrete space variable n, T: Z × R⁺ → R, is T(n; t) = e^(-t) I_n(t), where I_n(t) is the modified Bessel function of the first kind; its second-order difference T_xx(n; t) can be computed by the discrete convolution
T_xx(n; t) = T(n; t) * (1, -2, 1)
where * denotes one-dimensional discrete signal convolution;
(2) Given the scale variable t, convolve every row of the target probability distribution map I_likelihood with T_xx(n; t) and then convolve every column of the resulting matrix with T(n; t), and denote the result of this second convolution by L_xx(x, y); similarly, convolve every column of I_likelihood with T_xx(n; t) and then convolve every row of the resulting matrix with T(n; t), and denote the result of this second convolution by L_yy(x, y). The value of the multi-scale normalized Laplacian filter function at (x, y, t) is:
Laplacian(x, y, t) = (t(L_xx(x, y) + L_yy(x, y)))²
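The kernel T(n; t) = e^(-t) I_n(t) of this claim is exactly what scipy.special.ive(n, t) returns for t > 0, so the filter response can be sketched compactly as below; the truncation radius of the kernel and the use of scipy.ndimage.convolve1d for the row and column convolutions are implementation choices, not part of the claim.

    import numpy as np
    from scipy.special import ive
    from scipy.ndimage import convolve1d

    def discrete_gaussian_kernel(t, radius=None):
        """T(n; t) = exp(-t) * I_n(t), sampled on n = -radius .. radius."""
        radius = radius or int(np.ceil(4.0 * np.sqrt(t)) + 1)   # truncation radius (a choice)
        n = np.arange(-radius, radius + 1)
        return ive(n, t)                                         # exponentially scaled Bessel I_n

    def normalized_laplacian(I_likelihood, t):
        """(t * (L_xx + L_yy))**2 over the probability map, for one scale t."""
        T = discrete_gaussian_kernel(t)
        T_xx = np.convolve(T, [1.0, -2.0, 1.0], mode="same")     # second difference of the kernel
        # L_xx: every row with T_xx (axis 1), then every column with T (axis 0)
        L_xx = convolve1d(convolve1d(I_likelihood, T_xx, axis=1), T, axis=0)
        # L_yy: every column with T_xx (axis 0), then every row with T (axis 1)
        L_yy = convolve1d(convolve1d(I_likelihood, T_xx, axis=0), T, axis=1)
        return (t * (L_xx + L_yy)) ** 2

A particle's response would then be read off this map at the particle's (row, column) position, using the scale t carried in its state.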
5. The video target tracking method based on neighborhood components analysis and scale space theory according to claim 1, characterized in that the computation of the particle confidence is as follows:
Obtain the maximum value vmax among the multi-scale normalized Laplacian filter values of all particles, compute the distance d between each particle's multi-scale normalized Laplacian filter value and vmax, and compute the variance var of the Laplacian filter values of all particles; the confidence conf of the particle is then obtained from the following formula:
conf = e^(-d² / (2 × var)).
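Transcribing this confidence formula directly (assuming the per-particle filter responses have already been collected into an array):

    import numpy as np

    def particle_confidences(responses):
        """responses: array of multi-scale normalized Laplacian values, one per particle."""
        vmax = responses.max()
        d = vmax - responses                       # distance of each value to the maximum
        var = max(responses.var(), 1e-12)          # variance of all responses (guarded against zero)
        return np.exp(-d ** 2 / (2.0 * var))       # conf = exp(-d^2 / (2 * var))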
6. The video target tracking method based on neighborhood components analysis and scale space theory according to claim 1, characterized in that the new position of the rectangle representing the target in the current frame image is obtained as follows:
The state output by the filter is (ṙ, ċ, s), i.e. the coordinates of the target centre and the scale parameter; the width of the target is computed by the following formula:
w = (s - 13) / 0.47619 + 34
the height is h = w / asr, and the coordinates of the target's upper-left corner are (r = ṙ - h/2, c = ċ - w/2).
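Recovering the rectangle from the filter output, using the formulas of this claim as written (the (ṙ, ċ, s) ordering of the state vector is assumed):

    def state_to_rectangle(r_dot, c_dot, s, asr):
        """Convert the filter output state (centre row, centre column, scale) to a rectangle."""
        w = (s - 13.0) / 0.47619 + 34.0   # width from the scale parameter
        h = w / asr                        # height from the recorded width-to-height ratio
        r = r_dot - h / 2.0                # upper-left corner row
        c = c_dot - w / 2.0                # upper-left corner column
        return r, c, w, h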
CN201310361932.4A 2013-08-19 2013-08-19 Video target tracking method based on neighborhood components analysis and scale space theory Expired - Fee Related CN103413312B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310361932.4A CN103413312B (en) 2013-08-19 2013-08-19 Video target tracking method based on neighborhood components analysis and scale space theory

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310361932.4A CN103413312B (en) 2013-08-19 2013-08-19 Video target tracking method based on neighborhood components analysis and scale space theory

Publications (2)

Publication Number Publication Date
CN103413312A true CN103413312A (en) 2013-11-27
CN103413312B CN103413312B (en) 2016-01-20

Family

ID=49606317

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310361932.4A Expired - Fee Related CN103413312B (en) 2013-08-19 2013-08-19 Video target tracking method based on neighborhood components analysis and scale space theory

Country Status (1)

Country Link
CN (1) CN103413312B (en)


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
NAN JIANG et al.: Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference, 25 June 2011 *
JIANG Nan: "Adaptive distance metric and robust video motion analysis", China Doctoral Dissertations Full-text Database *
JIA Jingping et al.: "Adaboost target tracking algorithm", Pattern Recognition and Artificial Intelligence *
JIA Jingping et al.: "Target tracking algorithm based on support vector machine and trust region", Acta Scientiarum Naturalium Universitatis Pekinensis (online preprint) *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104715249A (en) * 2013-12-16 2015-06-17 株式会社理光 Object tracking method and device
CN104715249B (en) * 2013-12-16 2018-06-05 株式会社理光 Object tracking methods and device
CN105321188A (en) * 2014-08-04 2016-02-10 江南大学 Foreground probability based target tracking method
CN106127811A (en) * 2016-06-30 2016-11-16 西北工业大学 Target scale adaptive tracking method based on context
WO2018068718A1 (en) * 2016-10-13 2018-04-19 夏普株式会社 Target tracking method and target tracking device
CN108419249A (en) * 2018-03-02 2018-08-17 中南民族大学 3-D wireless sensor network cluster dividing covering method, terminal device and storage medium
CN108419249B (en) * 2018-03-02 2021-07-02 中南民族大学 Three-dimensional wireless sensor network clustering covering method, terminal equipment and storage medium
CN111105441A (en) * 2019-12-09 2020-05-05 嘉应学院 A Correlation Filtering Target Tracking Algorithm Constrained by Target Information of Previous Frames
CN111105441B (en) * 2019-12-09 2023-05-05 嘉应学院 A Correlation Filtering Target Tracking Method Constrained by Target Information in Previous Frames
CN112614158A (en) * 2020-12-18 2021-04-06 北京理工大学 Sampling frame self-adaptive multi-feature fusion online target tracking method

Also Published As

Publication number Publication date
CN103413312B (en) 2016-01-20

Similar Documents

Publication Publication Date Title
CN103413312B (en) Video target tracking method based on neighborhood components analysis and scale space theory
CN106204572B (en) Depth estimation method of road target based on scene depth mapping
CN101739712B (en) Video-based 3D human face expression cartoon driving method
CN104574445B (en) A kind of method for tracking target
CN101354254B (en) Method for tracking aircraft course
CN108647694A (en) Correlation filtering method for tracking target based on context-aware and automated response
CN110335293A (en) A Long-term Target Tracking Method Based on TLD Framework
CN103985143A (en) Discriminative online target tracking method based on videos in dictionary learning
CN102799900A (en) Target tracking method based on supporting online clustering in detection
CN103778645A (en) Circular target real-time tracking method based on images
CN104268520A (en) Human motion recognition method based on depth movement trail
CN107392210A (en) Target detection tracking method based on TLD algorithm
CN109816693A (en) Anti-occlusion correlation filter tracking method and system/device based on multi-peak response
Sun et al. Visual tracking via joint discriminative appearance learning
CN107886060A (en) Pedestrian's automatic detection and tracking based on video
Ge et al. Tracking video target via particle filtering on manifold
CN106683116A (en) Particle filter integrated tracking method based on support vector machine
Lu et al. Active shape model and its application to face alignment
CN116664628A (en) Target tracking method and device based on feature fusion and loss judgment mechanism
Wang MRCNNAM: Mask Region Convolutional Neural Network Model Based On Attention Mechanism And Gabor Feature For Pedestrian Detection
Lulio et al. JSEG-based image segmentation in computer vision for agricultural mobile robot navigation
Chen et al. EXTRACTION METHOD FOR CENTERLINES OF RICE SEEDLINGS BASED ON FASTSCNN SEMANTIC SEGMENTATION.
Weber et al. Leveraging Saliency-Aware Gaze Heatmaps for Multiperspective Teaching of Unknown Objects
Han et al. Adapting dynamic appearance for robust visual tracking
Tao et al. Scale adaptive kernel correlation filter tracker with multi-feature fusion

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160120

Termination date: 20190819