CN103106667A

CN103106667A - Motion target tracing method towards shielding and scene change

Info

Publication number: CN103106667A
Application number: CN2013100397543A
Authority: CN
Inventors: 房胜; 汴紫涵; 徐田帅; 王飞; 党超
Original assignee: Shandong University of Science and Technology
Current assignee: Shandong University of Science and Technology
Priority date: 2013-02-01
Filing date: 2013-02-01
Publication date: 2013-05-15
Anticipated expiration: 2033-02-01
Also published as: CN103106667B

Abstract

The invention discloses a moving object tracking method oriented to occlusion and scene change, which includes the following steps: a. Foreground motion detection is performed on the input video sequence to extract moving objects; ; If it is not saved, complete the template initialization and SURF feature extraction of the target object according to the area selected by the user, and the initialization of the Kalman filter; c use the method based on the Kalman filter to predict and track the moving target until the end of the video content , go to step e; when occlusion occurs during the tracking process, go to step d; d Use the matching method based on SURF features to determine the tracking object, when the feature matching tends to be stable and the occlusion is judged to be over, re-initialize the Kalman filter and go to step c; e output and save the target object feature information. The invention is a set of complete video moving object tracking method suitable for a fixed-background monocular camera, which can be made into software for easy application.

Description

A Moving Object Tracking Method Oriented to Occlusion and Scene Transformation

技术领域technical field

本发明属于图像处理技术、运动对象的追踪技术领域。具体地说是涉及基于kalman滤波和SURF方法相结合来实现遮挡和场景变换情况下运动对象的快速、准确追踪方法。The invention belongs to the field of image processing technology and tracking technology of moving objects. Specifically, it involves the combination of kalman filter and SURF method to realize the fast and accurate tracking method of moving objects under occlusion and scene change.

背景技术Background technique

目前的视频运动对象追踪方法有如下几种：The current video moving object tracking methods are as follows:

一是基于区域的跟踪方法，其首先分割出每一帧的视频对象，然后建立各分割对象间的对应关系，从而实现对视频对象的跟踪。这种方法对视频对象的分割要求很高，一旦视频片段中的某一帧或某几帧的对象分割错误，则整个视频对象的跟踪就会失败。One is the region-based tracking method, which firstly segments out the video objects of each frame, and then establishes the corresponding relationship between the segmented objects, so as to realize the tracking of the video objects. This method has high requirements on the segmentation of video objects. Once the object segmentation is wrong in a certain frame or several frames in the video clip, the tracking of the entire video object will fail.

二是Graph Cuts方法（又称为Min-Cut/Max-Flow方法），是一种经典的图像分割方法，目前很多图像分割方法都是基于Graph Cuts衍生出来的。由于运动对象的跟踪方法的前一步通常都是运动前景物体的提取，因此这种基于区域的跟踪方法应用比较广泛。但由于这种方法不能很好地分割相互遮挡的物体，因此这种方法在遮挡频繁发生的场景中效果较差。The second is the Graph Cuts method (also known as the Min-Cut/Max-Flow method), which is a classic image segmentation method. At present, many image segmentation methods are derived based on Graph Cuts. Since the first step of the moving object tracking method is usually the extraction of moving foreground objects, this region-based tracking method is widely used. However, since this method cannot segment objects that occlude each other well, this method is less effective in scenes where occlusions occur frequently.

三是基于模型的跟踪方法，目前基于模型的跟踪方法主要分为两类：即基于模型的人体跟踪和基于模型的车辆跟踪。由于该方法的特性，在得到物体2D图像坐标和3D坐标的对应关系后，即使物体发生较大程度的角度变换，也可以利用物体的3D模型进行跟踪。该方法要求先对被跟踪物体进行建模，然后将该模型和视频图像中的内容进行匹配来实现跟踪；且这种方法要求对被跟踪物体有足够的先验知识，才能建立出有效的目标模型。The third is the model-based tracking method. At present, the model-based tracking methods are mainly divided into two categories: model-based human body tracking and model-based vehicle tracking. Due to the characteristics of this method, after obtaining the corresponding relationship between the 2D image coordinates and 3D coordinates of the object, even if the object undergoes a large degree of angular transformation, the 3D model of the object can be used for tracking. This method requires modeling the tracked object first, and then matching the model with the content in the video image to achieve tracking; and this method requires sufficient prior knowledge of the tracked object to establish an effective target Model.

发明内容Contents of the invention

本发明的任务在于提供一种面向遮挡和场景变换的运动对象追踪方法，该方法能够快速、准确地实现对视频中特定运动物体的追踪。The task of the present invention is to provide a moving object tracking method oriented to occlusion and scene change, which can quickly and accurately realize the tracking of a specific moving object in a video.

其技术解决方案是：Its technical solutions are:

一种面向遮挡和场景变换的运动对象追踪方法，包括下列步骤：A moving object tracking method oriented to occlusion and scene transformation, comprising the following steps:

a对输入的视频序列进行前景运动检测，提取运动对象；然后进入步骤b，a Perform foreground motion detection on the input video sequence to extract moving objects; then enter step b,

b如果已经保存有追踪对象的特征，则进入步骤d；如果未保存有追踪对象的特征，则根据用户选定的区域完成对目标物体的模板初始化和SURF特征提取，以及Kalman滤波器的初始化；然后进入步骤c，b If the features of the tracking object have been saved, go to step d; if the features of the tracking object are not saved, complete the template initialization and SURF feature extraction of the target object according to the area selected by the user, and the initialization of the Kalman filter; Then go to step c,

c采用基于Kalman滤波器的方法对运动目标进行预测跟踪，直至视频内容结束，进入步骤e；当追踪过程中发生遮挡时，则进入步骤d；c Use the method based on the Kalman filter to predict and track the moving target until the end of the video content, then enter step e; when occlusion occurs during the tracking process, then enter step d;

d使用基于SURF特征的匹配方法确定追踪对象，在特征匹配趋于稳定并判断遮挡结束时，重新初始化Kalman滤波器后进入步骤c；dUse the SURF feature-based matching method to determine the tracking object. When the feature matching tends to be stable and the occlusion is judged to be over, re-initialize the Kalman filter and enter step c;

e输出并保存目标对象特征信息。e output and save the characteristic information of the target object.

上述步骤a中，建立两个参考帧I_bg(x,y)、I_up(x,y)，I_bg(x,y)为当前场景的背景帧，I_up(x,y)是一个随时间不断更新的参考帧；将当前帧I(x,y)分别与I_bg(x,y)、I_up(x,y)进行差分二值化，得到的结果记为：F_bg(x,y)、F_up(x,y)，根据二者的值分辨出场景中的遗留物和运动物体。In the above step a, two reference frames I _bg (x, y) and I _up (x, y) are established, I _bg (x, y) is the background frame of the current scene, and I _up (x, y) is a random A reference frame that is constantly updated in time; the current frame I(x,y) is differentially binarized with I _bg (x,y) and I _up (x,y) respectively, and the result obtained is recorded as: F _bg (x, y), F _up (x, y), according to the value of the two to distinguish the leftovers and moving objects in the scene.

上述步骤c中，首先对Kalman滤波器进行初始化，然后根据观测到的目标物体状态进行预测跟踪；在跟踪过程中，根据目标对象的轮廓变化情况自适应地更新模板图像，并将有代表性的特征信息进行保存；在追踪过程中，采用基于轮廓相交的判断方法对是否发生遮挡进行建模、分析和判断。In the above step c, the Kalman filter is first initialized, and then predictive tracking is performed according to the observed state of the target object; during the tracking process, the template image is adaptively updated according to the contour changes of the target object, and a representative The feature information is saved; in the tracking process, the judgment method based on the intersection of contours is used to model, analyze and judge whether occlusion occurs.

上述步骤d中，自动搜索视频内容并找到与被跟踪物体特征点匹配最多的前景团块，针对测量误差和噪声引起的错误匹配，在得到SURF特征匹配点对后使用RANSAC算法进行精确匹配并得到图像间的转换的单应性矩阵，在视频中标定目标对象；判断遮挡是否结束与上述判断是否发生遮挡采用相同模型；重新初始化Kalman滤波器采用的方法与上述Kalman滤波器的初始化方法相同。In the above step d, the video content is automatically searched and the foreground blob that matches the most feature points of the tracked object is found. For the wrong match caused by measurement error and noise, after obtaining the SURF feature matching point pair, use the RANSAC algorithm to perform an exact match and get The homography matrix of the conversion between images is used to mark the target object in the video; the same model is used to judge whether the occlusion is over and whether the occlusion occurs above; the method of reinitializing the Kalman filter is the same as the initialization method of the above Kalman filter.

本发明可具有如下有益技术效果：The present invention can have following beneficial technical effect:

本发明采用Kalman滤波器和SURF特征匹配相结合的方法，一方面在无遮挡和场景变换时，Kalman滤波器可以快速的完成预测和追踪，另一方面在发生遮挡和场景变换时，利用SURF特征的尺度不变等特性可以有效解决Kalman滤波器失效情况下的目标跟踪问题；因此具有快速、准确的优点，并且由于可以根据目标轮廓的变化自适应地更新目标模板，鲁棒性也很好。本发明是综合运用遗留物检测算法、Kalman滤波器、SURF特征、遮挡判断等，提出来的一套完整的适用于固定背景单目摄像机的视频运动对象追踪方法，可制成软件，便于应用。The present invention adopts the method of combining Kalman filter and SURF feature matching. On the one hand, when there is no occlusion and scene change, the Kalman filter can quickly complete prediction and tracking; on the other hand, when occlusion and scene change occur, the SURF feature is used The scale invariance and other characteristics of the Kalman filter can effectively solve the target tracking problem under the failure of the Kalman filter; therefore, it has the advantages of fast and accurate, and because it can adaptively update the target template according to the change of the target outline, the robustness is also very good. The present invention is a complete set of video moving object tracking method suitable for fixed background monocular cameras, which is proposed by comprehensively using relic detection algorithm, Kalman filter, SURF feature, occlusion judgment, etc., and can be made into software for easy application.

附图说明Description of drawings

下面结合附图与具体实施方式对本发明作更进一步的说明：Below in conjunction with accompanying drawing and specific embodiment the present invention will be further described:

图1为本发明中的前景运动检测过程示意图。FIG. 1 is a schematic diagram of the foreground motion detection process in the present invention.

图2为本发明中所使用的基本Kalman滤波器工作流程示意图。Fig. 2 is a schematic diagram of the workflow of the basic Kalman filter used in the present invention.

图3为本发明中所使用的SURF与SIFT算法构建的尺度空间对比示意图。Fig. 3 is a schematic diagram of a scale space comparison between SURF and SIFT algorithms used in the present invention.

图4为本发明中所使用的SURF算法在三个方向上的方框滤波示意图。FIG. 4 is a schematic diagram of block filtering in three directions of the SURF algorithm used in the present invention.

图5为本发明的流程框图。Fig. 5 is a flowchart of the present invention.

具体实施方式Detailed ways

为了更好地理解及实现本发明，首先对本发明使用的技术背景介绍如下：In order to better understand and realize the present invention, at first the technical background that the present invention uses is introduced as follows:

一、运动对象检测算法。1. Moving object detection algorithm.

1.时间域差分法1. Time domain difference method

时间域差分就是将视频序列中相邻俩帧图像做差，通过得到的像素差值提取运动对象。这种方法简单方便，适合动态背景下的运动提取。但通过这种方法得到的目标轮廓可能并不完整。例如当运动物体移动十分缓慢，而且本身具有大面积平滑区域时，这样将相邻俩帧图像做差就不能得到重叠的部分，得到的轮廓会出现“空洞”。The time domain difference is to make the difference between two adjacent frames in the video sequence, and extract the moving object through the obtained pixel difference. This method is simple and convenient, and is suitable for motion extraction in dynamic backgrounds. But the target contour obtained by this method may not be complete. For example, when the moving object moves very slowly and has a large area of smooth area, the overlap between two adjacent frame images cannot be obtained, and the obtained contour will appear "hole".

目前一种改进方法是利用三帧差分代替俩帧差分，这样就能较好地检测出中间帧运动目标的轮廓。设视频序列中相邻的三帧图像为：I_t-1(x,y)、I_t(x,y)、I_t+1(x,y)，分别计算相邻俩帧的像素差值：At present, an improved method is to use three-frame difference instead of two-frame difference, so that the outline of the moving target in the middle frame can be better detected. Let the three adjacent frames of images in the video sequence be: I _t-1 (x, y), I _t (x, y), I _t+1 (x, y), respectively calculate the pixel difference between the two adjacent frames :

然后对得到的差值图像进行二值化处理，得到二值化图像：Then binarize the obtained difference image to obtain a binarized image:

对b_(t，t1)(x,y)和b_(t+1,t)(x,y)进行逻辑“与”运算，得到二值图像B_t(x,y):Perform a logical "AND" operation on b _{(t, t1)} (x, y) and b _{(t+1, t)} (x, y) to obtain a binary image B _t (x, y):

${B B}_{t t} ((x x,, y the y)) = = \{\begin{matrix} 11 & {b b}_{((t t,, t t - - 11))} ((x x,, y the y)) \cap \cap {b b}_{((t t + + 11,, t t))} ((x x,, y the y)) = = 11 \\ 00 & {b b}_{((t t,, t t - - 11))} ((x x,, y the y)) \cap \cap {b b}_{((t t + + 11,, t t))} ((x x,, y the y)) = = 00 \end{matrix} - - - - - - ((33))$

最后对得到的二值图像进行腐蚀、膨胀等处理，以消除噪声和“空洞”。Finally, corrode and dilate the obtained binary image to eliminate noise and "holes".

2.光流法2. Optical flow method

光流法检测运动物体的基本原理是：给图像中的每一个像素定义一个速度矢量，如果图像中没有运动物体，则光流矢量在整个图像区域中是连续变化的；如果图像中存在运动物体，则运动物体和背景之间存在相对运动，二者的速度矢量不同，从而检测出运动物体。由于光流的计算方法十分复杂而且计算量很大，因此一般不被实时系统所采用。The basic principle of the optical flow method to detect moving objects is: define a velocity vector for each pixel in the image, if there is no moving object in the image, the optical flow vector changes continuously in the entire image area; if there is a moving object in the image , there is relative motion between the moving object and the background, and the velocity vectors of the two are different, thus detecting the moving object. Because the calculation method of optical flow is very complex and has a large amount of calculation, it is generally not used by real-time systems.

3.背景建模3. Background Modeling

背景建模和时间域差分方法有相似之处，都是做两帧图像的差值，不同的是背景建模方法是将当前帧与参考帧（背景帧）进行差值运算。背景建模方法广泛应用于静止摄像头的运动检测，背景帧的选取是整个算法的关键，背景建模也就是对背景帧的建模，理想状态下的背景帧应该不含运动物体，而且能够按照一定的策略进行更新以适应场景的动态变化，例如光照变化、背景中的树叶摆动、水波荡漾、雨雪飘落等情况。现有的背景建模方法大致可分为六类：增量式高斯平均、时序中值滤波、混合高斯模型、核密度估计、顺序核密度近似和特征背景模型。The background modeling and the time domain difference method have similarities, both of which are the difference between two frames of images. The difference is that the background modeling method is to perform a difference operation between the current frame and the reference frame (background frame). The background modeling method is widely used in the motion detection of still cameras. The selection of the background frame is the key to the whole algorithm. Background modeling is the modeling of the background frame. Ideally, the background frame should contain no moving objects and can be A certain strategy is updated to adapt to the dynamic changes of the scene, such as lighting changes, swaying leaves in the background, rippling water, falling rain and snow, etc. Existing background modeling methods can be roughly divided into six categories: incremental Gaussian averaging, temporal median filtering, mixture of Gaussian models, kernel density estimation, sequential kernel density approximation, and feature background models.

本发明采用的运动检测方法就是一种基于背景建模思想的简单快速方法：遗留物检测法，结合图1。建立两个参考帧I_bg(x,y)、I_up(x,y)。I_bg(x,y)为当前场景的背景帧，I_up(x,y)是一个随时间不断更新的参考帧。将当前帧I(x,y)分别与I_bg(x,y)、I_up(x,y)进行差分二值化，得到的结果记为：F_bg(x,y)、F_up(x,y)并根据二者的值分辨场景中的遗留物和运动物体。The motion detection method adopted in the present invention is a simple and fast method based on the idea of background modeling: remnant object detection method, combined with FIG. 1 . Two reference frames I _bg (x,y) and I _up (x,y) are established. I _bg (x, y) is the background frame of the current scene, and I _up (x, y) is a reference frame that is constantly updated over time. The current frame I(x,y) is differentially binarized with I _bg (x,y) and I _up (x,y) respectively, and the obtained results are recorded as: F _bg (x,y), F _up (x ,y) and distinguish remnants and moving objects in the scene according to the values of the two.

二、基于Kalman滤波器的追踪算法，结合图2。2. Tracking algorithm based on Kalman filter, combined with Figure 2.

1.离散Kalman滤波器1. Discrete Kalman filter

这是本发明使用的在无遮挡、无场景变换时的运动对象追踪方法。基本的卡尔曼滤波是解决线性滤波和预测问题的方法，以均方误差最小为准则，具有简单、快速的特点。This is the moving object tracking method used in the present invention when there is no occlusion and no scene change. The basic Kalman filter is a method to solve linear filtering and prediction problems, with the minimum mean square error as the criterion, and has the characteristics of simplicity and speed.

Kalman滤波器预测过程：Kalman filter prediction process:

${\overset{^^}{X x}}_{k k}^{- -} = = {F f}_{k k} {\overset{^^}{X x}}_{k k - - 11} + + {B B}_{k k - - 11} {u u}_{k k - - 11} - - - - - - ((11))$

${P P}_{k k}^{- -} = = {F f}_{k k - - 11} {P P}_{k k - - 11} {F f}_{k k - - 11}^{T T} + + {Q Q}_{k k - - 11} - - - - - - ((22))$

Kalman滤波器更新过程：Kalman filter update process:

${K K}_{k k} = = {P P}_{k k}^{- -} {H h}_{k k}^{T T} {(({H h}_{k k} {P P}_{k k}^{- -} {H h}_{k k}^{T T} + + {R R}_{k k}))}^{- - 11} - - - - - - ((33))$

${\overset{^^}{X x}}_{k k} = = {\overset{^^}{X x}}_{k k}^{- -} + + {K K}_{k k} (({Z Z}_{k k} - - {H h}_{k k} {\overset{^^}{X x}}_{k k}^{- -})) - - - - - - ((44))$

${P P}_{k k} = = ((I I - - {K K}_{k k} {H h}_{k k})) {P P}_{k k}^{- -} - - - - - - ((55))$

其中F是状态转移矩阵，H是测量矩阵，Z是观测值，B是输入变换矩阵，u是输入值（在有些系统中不需要新的输入值，因此B和u可以省略）。Q和R分别表示状态转移过程和观测过程中噪声向量的方差。

表示k-1时刻对k时刻状态X_k的最佳预测值，

表示用k时刻的观测值Z_k和前一时刻对本时刻的预测值

对X_k所作的状态更新。P是协方差，其上下标的含义与X相同，K表示Kalman增益。where F is the state transition matrix, H is the measurement matrix, Z is the observed value, B is the input transformation matrix, and u is the input value (in some systems no new input value is required, so B and u can be omitted). Q and R represent the variance of the noise vector in the state transition process and observation process, respectively.

Indicates the best predicted value of the state X k at time k-1 to the state X _k at time k,

Indicates that the observed value Z _k at k time and the predicted value of the current time at the previous time

A state update made to X _k . P is the covariance, and its superscript and subscript have the same meaning as X, and K represents the Kalman gain.

2.扩展的Kalman滤波器2. Extended Kalman filter

针对非线性状态空间模型的次优方法，应用最为广泛的一类方法是扩展的Kalman滤波器（extended Kalman filtering，EKF）。EKF的基本思路是首先将非线性系统线性化，然后再进行与线性Kalman滤波器类似的处理。其具体方法是对非线性函数的泰勒展开式进行截断，从而将非线性函数线性化。根据泰勒展开式进行的是一阶还是二阶截取，EKF主要可分为一阶EKF（first order EKF）和二阶EKF（second order EKF）。The most widely used class of suboptimal methods for nonlinear state-space models is the extended Kalman filter (extended Kalman filtering, EKF). The basic idea of EKF is to linearize the nonlinear system first, and then perform similar processing to the linear Kalman filter. The specific method is to truncate the Taylor expansion of the nonlinear function, so as to linearize the nonlinear function. According to whether the Taylor expansion is first-order or second-order interception, EKF can be mainly divided into first-order EKF (first order EKF) and second-order EKF (second order EKF).

虽然扩展的Kalman滤波器在解决非线性模型方面有出色的表现，但它在实际应用中也存在明显的不足：一是将非线性模型进行线性化的过程中可能会产生不稳定的滤波；二是在计算雅克比矩阵的导数时，实现较为复杂；三是现实中的模型函数可能存在不可微的情况，这样就会导致EKF失效。因此在模型的非线性较强以及系统中的噪声为非高斯分布时，EKF的估计精度就会大大降低，最终导致失败。Although the extended Kalman filter has excellent performance in solving nonlinear models, it also has obvious shortcomings in practical applications: first, unstable filtering may occur during the linearization of nonlinear models; The first is that when calculating the derivative of the Jacobian matrix, the implementation is more complicated; the third is that the model function in reality may not be differentiable, which will cause the EKF to fail. Therefore, when the nonlinearity of the model is strong and the noise in the system is non-Gaussian distribution, the estimation accuracy of EKF will be greatly reduced, which eventually leads to failure.

三、SIFT、SURF特征匹配法，结合图3，图中的左图为传统方法中构建的图像金字塔，上一层图像是对前一层图像的下采样；右图为SURF算法中构建尺度空间的方法，图像不变，改变的只是滤波模板的大小。3. SIFT and SURF feature matching methods, combined with Figure 3, the left picture in the figure is the image pyramid constructed in the traditional method, and the upper layer image is a downsampling of the previous layer image; the right picture is the scale space constructed in the SURF algorithm In this method, the image remains unchanged, and only the size of the filter template is changed.

SURF算法是SIFT的改进算法。SURF算法在特征匹配速度上要远远优于SIFT算法，因此SURF算法可以进一步应用到实时的图像匹配场景中去。本发明采用SURF算法进行特征点匹配。The SURF algorithm is an improved algorithm of SIFT. The SURF algorithm is far superior to the SIFT algorithm in terms of feature matching speed, so the SURF algorithm can be further applied to real-time image matching scenarios. The present invention adopts the SURF algorithm to perform feature point matching.

SURF算法分为尺度空间构建、特征点检测、特征描述符生成和特征点匹配四个部分。The SURF algorithm is divided into four parts: scale space construction, feature point detection, feature descriptor generation and feature point matching.

1.尺度空间构建1. Scale space construction

SURF算法的尺度空间构成Scale Space Composition of SURF Algorithm

传统的尺度空间被描述为一个金字塔，高斯卷积核是实现尺度变换的唯一线性核，给定一幅图像I(x,y)，则它的尺度空间定义为：The traditional scale space is described as a pyramid, and the Gaussian convolution kernel is the only linear kernel that realizes scale transformation. Given an image I(x,y), its scale space is defined as:

L(x,y,δ)=G(x,y,δ)*I(x,y) （1）L(x,y,δ)=G(x,y,δ)*I(x,y) (1)

其中G(x,y)是尺度可变高斯函数，where G(x,y) is a scale-variable Gaussian function,

$G G ((x x,, y the y,, δ δ)) = = \frac{11}{{22 πδ πδ}^{22}} {e e}^{- - (({x x}^{22} + + {y the y}^{22})) / / {22 δ δ}^{22} - - - - - - ((22))}$

(x,y)是空间坐标，δ是尺度坐标，δ的大小决定了图像的平滑程度。利用上面的公式，根据图像的大小最终决定金字塔尺度空间的阶数，以及每阶金字塔中包含图像的层数。第一阶金字塔的第一层是原始图像，之后往上的每一层是对前一层进行Laplace变换（高斯卷积，δ值逐渐增大）。从直观上来看，越往上的图像越模糊。(x, y) is the spatial coordinate, δ is the scale coordinate, and the size of δ determines the smoothness of the image. Using the above formula, the order of the pyramid scale space and the number of layers of images contained in each order of the pyramid are finally determined according to the size of the image. The first layer of the first-order pyramid is the original image, and each layer above is the Laplace transform of the previous layer (Gaussian convolution, the δ value gradually increases). From an intuitive point of view, the higher the image is, the blurrier it is.

为了有效地在尺度空间中检测到稳定的关键点，Lowe等人提出了高斯差分尺度空间（DOG scale-space）。To efficiently detect stable keypoints in scale space, Lowe et al. proposed Difference of Gaussian scale-space (DOG scale-space).

D(x,y,δ)=(G(x,y,kδ)-G(x,y,δ))*I(x,y)=L(x,y,kδ)-L(x,y,δ) (3)D(x,y,δ)=(G(x,y,kδ)-G(x,y,δ))*I(x,y)=L(x,y,kδ)-L(x,y ,δ) (3)

DoG金子塔尺度空间的每一层是由高斯金字塔相邻两层相减得到的，因此DoG金字塔与高斯金子塔相比，塔的阶数相同，但每阶塔中的层数减一。Each layer in the scale space of the DoG pyramid is obtained by subtracting two adjacent layers of the Gaussian pyramid. Therefore, compared with the Gaussian pyramid, the order of the DoG pyramid is the same, but the number of layers in each order is reduced by one.

SIFT算法的尺度空间构成Scale Space Composition of SIFT Algorithm

SIFT算法构建尺度空间的方法，其缺点在于每层图像的建立都要依赖于前一层图像，而且图像的尺寸需要重新设置，因此这种方法的运算量较大。The disadvantage of the SIFT algorithm to construct the scale space is that the establishment of each layer of image depends on the image of the previous layer, and the size of the image needs to be reset, so this method has a large amount of calculation.

SURF算法在构建图像金字塔时改变的不是图像尺寸，而是滤波模板的大小。SURF算法在构建尺度空间时可以并行处理，而且不需要对图像进行二次抽样，从而提高了运算速度。SURF算法与SIFT算法构造的尺度空间的差异如图3所示。What the SURF algorithm changes when constructing an image pyramid is not the size of the image, but the size of the filter template. The SURF algorithm can be processed in parallel when constructing the scale space, and does not need to subsample the image, thus improving the operation speed. The difference between the scale space constructed by the SURF algorithm and the SIFT algorithm is shown in Figure 3.

2.特征点检测2. Feature point detection

给定一幅图像I(x,y)，其积分图像为Given an image I(x,y), its integral image is

${I I}_{Σ Σ} ((x x,, y the y)) = = {Σ Σ}_{i i = = 00}^{i i \leq \leq x x} {Σ Σ}_{j j = = 00}^{j j \leq \leq y the y} I I ((i i,, j j)) - - - - - - ((44))$

SURF特征点检测利用Hessian矩阵的行列式判断图像上某点是否为极值点。设f(x,y)是一个二阶可微的函数，则其Hessian矩阵为SURF feature point detection uses the determinant of the Hessian matrix to judge whether a point on the image is an extreme point. Let f(x,y) be a second-order differentiable function, then its Hessian matrix is

$H h ((f f ((x x,, y the y)))) = = [\begin{matrix} \frac{{&PartialD; &PartialD;}^{22} f f}{{&PartialD; &PartialD; x x}^{22}} & \frac{{&PartialD; &PartialD;}^{22} f f}{&PartialD; &PartialD; x x &PartialD; &PartialD; y the y} \\ \frac{{&PartialD; &PartialD;}^{22} f f}{&PartialD; &PartialD; x x &PartialD; &PartialD; y the y} & \frac{{&PartialD; &PartialD;}^{22} f f}{{&PartialD; &PartialD; y the y}^{22}} \end{matrix}] - - - - - - ((55))$

则矩阵H的行列式Then the determinant of matrix H

$det det H h = = \frac{{&PartialD; &PartialD;}^{22} f f}{{&PartialD; &PartialD; x x}^{22}} \frac{{&PartialD; &PartialD;}^{22} f f}{{&PartialD; &PartialD; x x}^{22}} - - {((\frac{{&PartialD; &PartialD;}^{22} f f}{&PartialD; &PartialD; x x &PartialD; &PartialD; y the y}))}^{22} - - - - - - ((66))$

是H的特征值的乘积，若det H<0，则点(x,y)不是局部极值点；若det H>0，则点(x,y)是局部极值点。于是图像I(x,y)在尺度δ下的Hessian矩阵为is the product of the eigenvalues of H, if det H<0, the point (x, y) is not a local extremum point; if det H>0, then the point (x, y) is a local extremum point. Then the Hessian matrix of the image I(x,y) at scale δ is

$H h ((x x,, y the y,, δ δ)) = = [\begin{matrix} {L L}_{xx xx} ((x x,, y the y,, δ δ)) & {L L}_{xy xy} ((x x,, y the y,, δ δ)) \\ {L L}_{xy xy} ((x x,, y the y,δ,δ)) & {L L}_{yy yy} ((x x,, y the y,, δ δ)) \end{matrix}] - - - - - - ((77))$

其中L_xx(x,y,δ)是图像在点(x,y)处与高斯函数二阶偏导

的卷积，类似地可定义L_xy(x,y,δ)和L_yy(x,y,δ)。where L _xx (x,y,δ) is the second-order partial derivative of the image at point (x,y) and the Gaussian function

The convolution of L _xy (x,y,δ) and L _yy (x,y,δ) can be defined similarly.

Bay等人对卷积核进行合理的离散化和裁剪后用方框滤波模板代替，并使用积分图像来加速卷积速度降低计算量。经过裁剪后三个卷积核分别为D_xx、D_yy和D_xy，它们是L_xx、L_yy和L_xy的简化表示。9×9的方框滤波模板如图4所示，对应的二阶高斯滤波尺度因子δ=1.2。Bay et al. performed reasonable discretization and cropping of the convolution kernel and replaced it with a box filter template, and used the integral image to accelerate the convolution speed and reduce the amount of calculation. After cropping, the three convolution kernels are D _xx , D _yy and D _xy , which are simplified representations of L _xx , L _yy and L _xy . The 9×9 box filter template is shown in Figure 4, and the corresponding second-order Gaussian filter scale factor δ=1.2.

由于方框滤波是二阶高斯滤波的近似估计，因此为了弥补用方框滤波代替二阶高斯滤波计算Hessian矩阵行列式值而造成的误差，则有Since the box filter is an approximate estimate of the second-order Gaussian filter, in order to compensate for the error caused by using the box filter instead of the second-order Gaussian filter to calculate the determinant of the Hessian matrix, there is

detH=D_xxD_yy-(0.9D_xy)² (8)detH=D _xx D _yy -(0.9D _xy ) ² (8)

边长为9的卷积核是最小尺度的卷积核,随着尺度的不断增加,卷积核(滤波模板)的大小也成比例的增加。如果滤波模板大小为N*N,则对应尺度δ=N*1.2/9。The convolution kernel with a side length of 9 is the smallest scale convolution kernel. As the scale increases, the size of the convolution kernel (filter template) also increases proportionally. If the filter template size is N*N, the corresponding scale δ=N*1.2/9.

通过Hessian矩阵行列式得到各个尺度的极值点后，每个极值点和其同一尺度的8个相邻点以及它上下两个尺度的各9个点进行比较，当该极值点的值在26个值中最大或最小时，才将该极值点被作为候选特征点。After obtaining the extreme points of each scale through the Hessian matrix determinant, each extreme point is compared with its 8 adjacent points of the same scale and 9 points of its upper and lower scales. When the value of the extreme point When the maximum or minimum of the 26 values is reached, the extreme point is used as a candidate feature point.

最后用M.Brown所提到的方法进行插值运算，得到亚像素精度的特征点位置及所在尺度值。同时去掉对比度低的特征点和不稳定的边缘响应点（因为DoG方法会产生较强的边缘响应），以提高抗噪声能力和增强匹配的稳定性。Finally, the method mentioned by M.Brown is used for interpolation operation to obtain the sub-pixel precision feature point position and scale value. At the same time, low-contrast feature points and unstable edge response points are removed (because the DoG method will produce a strong edge response) to improve the anti-noise ability and enhance the stability of matching.

由于特征点是在图像尺度空间上选取的局部稳定点，所以满足尺度变化情况下特征匹配的要求。Since the feature point is a local stable point selected in the image scale space, it meets the requirement of feature matching under the condition of scale change.

SURF算法之所以比SIFT算法的运算效率高，是因为SURF算法使用了积分图像和Hessian矩阵来加速特征点的检测。The reason why the SURF algorithm is more efficient than the SIFT algorithm is that the SURF algorithm uses the integral image and the Hessian matrix to accelerate the detection of feature points.

3.SURF特征描述符生成3. SURF feature descriptor generation

SURF特征描述子的生成分为两步：主方向确定和构建描述符。The generation of SURF feature descriptors is divided into two steps: determination of main directions and construction of descriptors.

主方向确定。为保证特征点的旋转不变性，在特征点周围6δ（δ为特征点所在尺度）的邻域内按步长δ对像素点采样，并且在采样点处求取边长为4δ的Harr小波卷积核在X和Y方向上的响应值。为使靠近特征点的响应贡献大，远离特征点的响应贡献小，对小波响应按σ=2δ进行高斯加权，然后表示为一个二维坐标上的点，最后可得到所有采样点响应在二维平面上的分布图。然后用一个张角为60°的滑窗按固定步长滑动，每次将60°范围内的响应相加形成新矢量，选择最长矢量的方向为此特征点的主方向The main direction is determined. In order to ensure the rotation invariance of the feature points, the pixel points are sampled according to the step size δ in the neighborhood of 6δ around the feature point (δ is the scale of the feature point), and the Harr wavelet convolution with a side length of 4δ is obtained at the sampling point The response value of the kernel in the X and Y directions. In order to make the contribution of the response close to the feature point large and the contribution of the response far away from the feature point small, Gaussian weighting is performed on the wavelet response according to σ=2δ, and then expressed as a point on a two-dimensional coordinate, and finally all the sample point responses can be obtained in two dimensions Distribution map on a flat surface. Then use a sliding window with an opening angle of 60° to slide at a fixed step size, add the responses within the 60° range each time to form a new vector, and select the direction of the longest vector as the main direction of the feature point

构建描述符。然后以特征点为中心，将坐标轴旋转到主方向。选取边长为20δ的正方形区域，将该窗口分成4×4的子区域，在每一个子区域内计算25个采样点的Harr响应，分别记为d_x和d_y。然后在每个子窗口上以σ=3.3δ的高斯函数加权得到d_x和d_y的累积值，即∑d_x，∑d_y，∑│d_x│，∑│d_y│。每个子区域形成一个四维的新矢量，由于正方形区域包含16个子窗口，则每一个特征点形成16×4=64维的描述符。build descriptor. Then take the feature point as the center and rotate the coordinate axis to the main direction. Select a square area with a side length of 20δ, divide the window into 4×4 sub-areas, and calculate the Harr responses of 25 sampling points in each sub-area, which are denoted as d _x and d _y . Then weighted by a Gaussian function of σ=3.3δ on each sub-window to obtain the cumulative value of d _x and d _y , that is, ∑d _x , ∑d _y , ∑│d _x │, ∑│d _y │. Each sub-region forms a new four-dimensional vector. Since the square region contains 16 sub-windows, each feature point forms a 16×4=64-dimensional descriptor.

通过主方向对齐的方法，SURF特征具有旋转不变性，可用于旋转情况下的特征匹配。Through the method of main direction alignment, SURF features have rotation invariance and can be used for feature matching in the case of rotation.

4.SURF特征点匹配4. SURF feature point matching

特征点的匹配利用特征向量的欧式距离作为两幅图像中关键点的相似性判定度量。欧氏距离的公式为：The matching of feature points uses the Euclidean distance of feature vectors as the similarity judgment measure of key points in two images. The formula for Euclidean distance is:

d=sqrt(∑(x_i1-x_i2)²) (9)d=sqrt(∑(x _i1 -x _i2 ) ² ) (9)

x_i1表示第一幅图像上某一点的第i维坐标，x_i2表示第二幅图像上某一点的第i维坐标。x _i1 represents the i-th dimensional coordinate of a certain point on the first image, and x _i2 represents the i-th dimensional coordinate of a certain point on the second image.

具体做法为：取第一幅图像中的某个关键点，找出第二幅图像中与其欧氏距离最近的前两个关键点。如果在这两个关键点中最近的距离除以次近的距离小于某个阈值，则认为找到一对匹配点。增加这个阈值，匹配点的数目就会增多，但准确率下降；相反降低这个阈值，匹配点的数目就会减少，但准确率会提高。The specific method is: take a certain key point in the first image, and find out the first two key points with the closest Euclidean distance to it in the second image. A pair of matching points is considered to be found if the closest distance divided by the next closest distance among these two keypoints is less than some threshold. If the threshold is increased, the number of matching points will increase, but the accuracy rate will decrease; on the contrary, if the threshold is lowered, the number of matching points will decrease, but the accuracy rate will increase.

5.SURF算法在运动对象追踪中的不足5. Insufficiency of SURF algorithm in moving object tracking

虽然SIFT/SURF特征对图像的尺度缩放、旋转、平移、亮度变化等具有较为稳定的不变性，但将它应用在运动对象追踪中依然存在着不足，主要有以下几个方面：Although the SIFT/SURF feature has relatively stable invariance to image scaling, rotation, translation, brightness changes, etc., there are still deficiencies in applying it to moving object tracking, mainly in the following aspects:

1）由于在构建尺度空间时采用的是基于图像金子塔的方法，因此可能会出现层取得不够紧密从而造成尺度匹配有误差的情况。当原图像本身较小时，构建尺度空间对特征点的提取影响不大。1) Since the method based on the image pyramid is used when constructing the scale space, there may be cases where the layers are not obtained tightly enough, resulting in scale matching errors. When the original image itself is small, the construction of scale space has little effect on the extraction of feature points.

2）在SURF特征点过滤过程中会去除对比度低的点和不稳定的边缘响应点。如果图像内容中存在大片平滑区域，这样这部分内容的特征信息就会被过滤。同样作为重要特征的边缘信息也可能会被省略。2) During the SURF feature point filtering process, points with low contrast and unstable edge response points will be removed. If there is a large smooth area in the image content, the feature information of this part of the content will be filtered. Edge information, which is also an important feature, may also be omitted.

3）SURF特征是图像内容的局部特征，忽略了图像本身的全局信息。3) The SURF feature is a local feature of the image content, ignoring the global information of the image itself.

4）SURF算法在特征点匹配时使用的搜索策略效率不高，而且不能充分利用邻近特征点间的位置关系，从而可能会造成错误匹配。4) The search strategy used by the SURF algorithm in the matching of feature points is not efficient, and it cannot make full use of the positional relationship between adjacent feature points, which may cause wrong matching.

5）SIFT/SURF算法本身只利用了图像的灰度特性，没有考虑到图像本身颜色信息。5) The SIFT/SURF algorithm itself only uses the grayscale characteristics of the image, and does not take into account the color information of the image itself.

6）与kalman滤波器追踪技术相比，SURF算法的运算量要大的多；单纯使用SURF算法用于视频运动对象的追踪很难满足实时性的要求。6) Compared with the kalman filter tracking technology, the calculation amount of the SURF algorithm is much larger; simply using the SURF algorithm for tracking video moving objects is difficult to meet the real-time requirements.

四、单应性矩阵。Fourth, the homography matrix.

在进行目标对象定位时，用到了单应性矩阵的概念。同一空间内两幅图像AB，如果从图像A到图像B之间存在一一对应的映射关系，该映射关系用矩阵表示就是一个单应性矩阵。设两幅图像中某点坐标分别为I(x,y,1)，I'(x',y',1)，单应性矩阵H，则对应的投影关系为：The concept of homography matrix is used in target object location. For two images AB in the same space, if there is a one-to-one mapping relationship between image A and image B, the mapping relationship is represented by a matrix as a homography matrix. Let the coordinates of a point in the two images be I(x,y,1), I'(x',y',1), and the homography matrix H, then the corresponding projection relationship is:

$k k [\begin{matrix} {x x}^{' '} \\ {y the y}^{' '} \\ 11 \end{matrix}] = = H h [\begin{matrix} x x \\ y the y \\ 11 \end{matrix}] = = [\begin{matrix} {h h}_{11} & {h h}_{22} & {h h}_{33} \\ {h h}_{44} & {h h}_{55} & {h h}_{66} \\ {h h}_{77} & {h h}_{88} & {h h}_{99} \end{matrix}] [\begin{matrix} x x \\ y the y \\ 11 \end{matrix}] - - - - - - ((11))$

其中k为比例系数，通常H为自由度为8的变换矩阵，当h₉=1时，由(1)可得：Where k is a proportional coefficient, usually H is a transformation matrix with 8 degrees of freedom, when h ₉ =1, it can be obtained from (1):

h₁x+h₂y+h₃-h₇xx′-h₈yx′-x′=0 (2)h ₁ x+h ₂ y+h ₃ -h ₇ xx′-h ₈ yx′-x′=0 (2)

h₄x+h₅y+h₆-h₇xy-h₈yy-y=0 (3)h ₄ x+h ₅ y+h ₆ -h ₇ xy-h ₈ yy-y=0 (3)

这样只要有4对匹配点的坐标就可以计算出单应性矩阵H。In this way, the homography matrix H can be calculated as long as there are 4 pairs of coordinates of matching points.

结合上述内容以及图1与图5，下面进一步详细说明本发明的技术方案：In combination with the above content and Fig. 1 and Fig. 5, the technical solution of the present invention is further described in detail below:

a对输入的视频序列进行前景运动检测，提取运动对象。a Perform foreground motion detection on the input video sequence to extract moving objects.

在本发明所需要处理的应用场景中，运动对象进入后长时间滞留在同一位置、或者背景中的物体突然移动都是可能出现的情况。本发明采用遗留物检测方法来提取运动对象。In the application scenarios that need to be processed by the present invention, it is possible that a moving object stays at the same position for a long time after entering, or an object in the background moves suddenly. The invention adopts the remnants detection method to extract moving objects.

遗留物检测与运动检测不同，它不仅需要检测出原来场景中不存在的物体，还要判断该物体是否停留在该场景中。Remaining object detection is different from motion detection. It not only needs to detect objects that do not exist in the original scene, but also judges whether the object stays in the scene.

具体方法为：建立两个参考帧I_bg(x,y)、I_up(x,y)。I_bg(x,y)为当前场景的背景帧（不包括运动物体，和一般的背景建模中用的背景帧一致），I_up(x,y)是一个随时间不断更新的参考帧，设当前图像帧为I(x,y)，则更新方法为：The specific method is: establishing two reference frames I _bg (x, y) and I _up (x, y). I _bg (x, y) is the background frame of the current scene (excluding moving objects, which is consistent with the background frame used in general background modeling), and I _up (x, y) is a reference frame that is continuously updated over time. Assuming that the current image frame is I(x,y), the update method is:

I_up(x,y)=(1-α)I_up(x,y)+αI(x,y) (1)I _up (x,y)=(1-α)I _up (x,y)+αI(x,y) (1)

α为更新速度权重，α越大更新速率越快，α越小更新速率越慢。这样如果外界物体停留在场景中，则经过一段时间后该物体就会融入I_up(x，y)中，成为“背景”的一部分。将I(x,y)分别与I_bg(x,y)、I_up(x,y)进行差分二值化，得到的结果记为：F_bg(x,y)、Fup(x,y)。如果点(x,y)处的像素值F_bg(x,y)=1，F_up(x,y)=0则可以判断该点属于遗留物，具体可根据表1中定义的方法分辨出场景中的遗留物和运动物体。α is the update speed weight, the larger α is, the faster the update rate is, and the smaller α is, the slower the update rate is. In this way, if an external object stays in the scene, the object will be integrated into I _up (x, y) after a period of time and become a part of the "background". I(x,y) is differentially binarized with I _bg (x,y) and I _up (x,y) respectively, and the obtained results are recorded as: F _bg (x,y), Fup(x,y) . If the pixel value F _bg (x, y)=1 and F _up (x, y)=0 at the point (x, y), it can be judged that the point belongs to the remnant, which can be distinguished according to the method defined in Table 1 Remnants and moving objects in the scene.

表1遗留物检测方法判决表Table 1 Judgment table of detection methods for residues

F_bg(x,y)F _bg (x,y) F_up(x,y)F _up (x,y) 判断类型judgment type ⅠI 11 11 运动对象moving object ⅡII 11 00 暂时静止对象（遗留物）temporarily stationary objects (relics) ⅢIII 00 11 随机噪声random noise ⅣIV 00 00 场景中静态物体static objects in the scene

b如果已经保存有追踪对象的特征，则直接进入步骤d；如果没有，则根据用户选定的区域完成对目标物体的模板初始化和SURF特征提取，以及Kalman滤波器的初始化。b If the features of the tracked object have been saved, go directly to step d; if not, complete the template initialization and SURF feature extraction of the target object according to the area selected by the user, as well as the initialization of the Kalman filter.

如果在硬盘的指定目录下已经保存有追踪对象的特征，则意味着发生了场景变换，此时首先导入指定目录下保存的目标对象的模板信息和特征信息，然后在视频帧内进行全局搜索匹配，如果某一帧内的运动物体与模板的匹配点数超过某个规定的阈值，则认为目标物体出现在该场景中，此时需要初始化Kalman滤波器进行跟踪处理。If the features of the tracking object have been saved in the specified directory of the hard disk, it means that a scene change has occurred. At this time, first import the template information and feature information of the target object saved in the specified directory, and then perform a global search and match in the video frame , if the number of matching points between the moving object and the template in a certain frame exceeds a specified threshold, the target object is considered to appear in the scene, and the Kalman filter needs to be initialized for tracking processing.

如果在硬盘的指定目录下没有保存追踪对象的相关特征，则需要用户通过鼠标在监控画面上画选目标物体。算法根据画选的目标物体进行模板初始化和SURF特征提取。If the relevant characteristics of the tracking object are not saved in the specified directory of the hard disk, the user needs to draw and select the target object on the monitoring screen with the mouse. The algorithm performs template initialization and SURF feature extraction according to the selected target object.

上述Kalman滤波器的初始化如下列步骤：The initialization of the above-mentioned Kalman filter is as follows:

1）由于本发明实际应用场景的限制，目标物体在场景中的运动速度一般不会发生大的变化，因此本发明采用匀速运动的运动方程对物体进行分析和处理：1) Due to the limitation of the actual application scene of the present invention, the movement speed of the target object in the scene generally does not change greatly, so the present invention adopts the motion equation of uniform motion to analyze and process the object:

x_t=x_t-1+v_x (2)x _t =x _t-1 +v _x (2)

y_t=y_t-1+v_y (3)y _t =y _t-1 +v _y (3)

式(2)和式(3)分别表示物体在x轴和y轴方向上的运动方程，v_x、v_y分别表示物体在两个方向上的速度。在t时刻，一个运动物体的状态表示为X_t=(x_t,y_t,v_x,v_y)^T，观测值为Z_t=(x_t,y_t)^T。Equations (2) and (3) represent the motion equations of the object in the x-axis and y-axis directions respectively, and v _x and v _y represent the velocities of the object in the two directions respectively. At time t, the state of a moving object is expressed as X _t =(x _t ,y _t ,v _x ,v _y ) ^T , and the observed value is Z _t =(x _t ,y _t ) ^T .

2）对照Kalman滤波器模型和运动方程可知系统的状态转移矩阵F和测量矩阵H分别为：2) Compared with the Kalman filter model and the equation of motion, it can be seen that the state transition matrix F and measurement matrix H of the system are respectively:

$F f = = [\begin{matrix} 11 & 00 & 11 & 00 \\ 00 & 11 & 00 & 11 \\ 00 & 00 & 11 & 00 \\ 00 & 00 & 00 & 11 \end{matrix}],,$ $H h = = [\begin{matrix} 11 & 00 & 00 & 00 \\ 00 & 11 & 00 & 00 \end{matrix}] - - - - - - ((44))$

3）根据本发明实际应用场景的相关实验情况和参考资料，本发明中所采用的Kalman滤波器的状态转移过程和观测过程中噪声向量的方差矩阵Q和R的取值分别如下所示：3) According to the relevant experimental conditions and reference materials of the actual application scene of the present invention, the values of the variance matrix Q and R of the noise vector in the state transition process and the observation process of the Kalman filter adopted in the present invention are as follows:

$Q Q = = [\begin{matrix} 0.01 0.01 & 00 & 00 & 00 \\ 00 & 0.01 0.01 & 00 & 00 \\ 00 & 00 & 0.01 0.01 & 00 \\ 00 & 00 & 00 & 0.01 0.01 \end{matrix}],,$ $R R = = [\begin{matrix} 0.2845 0.2845 & 0.0045 0.0045 \\ 0.0045 0.0045 & 0.0455 0.0455 \end{matrix}] - - - - - - ((55))$

4）系统初始状态向量协方差矩阵P定义如下：4) The initial state vector covariance matrix P of the system is defined as follows:

${P P}_{00} = = [\begin{matrix} 100100 & 00 & 00 & 00 \\ 00 & 100100 & 00 & 00 \\ 00 & 00 & 100100 & 00 \\ 00 & 00 & 00 & 100100 \end{matrix}] - - - - - - ((66))$

5）当摄像头的位置调整时，以上矩阵Q、R、P₀的定义可以根据实际情况进行相应调整。对于初始状态

的定义，本文算法允许用户自定义的选择，即用户可以用鼠标划选出某一时刻想跟踪的目标物体，然后系统根据此时该物体左上角顶点坐标的观测位置和速度初始化

5) When the position of the camera is adjusted, the definitions of the above matrices Q, R, and P ₀ can be adjusted accordingly according to the actual situation. for the initial state

The definition of , the algorithm in this paper allows user-defined selection, that is, the user can use the mouse to select the target object to be tracked at a certain moment, and then the system initializes according to the observed position and velocity of the vertex coordinates of the upper left corner of the object at this time

对Kalman滤波器进行初始化之后，系统就可以对目标物体的位置进行预测跟踪了。在跟踪过程中，根据目标对象的轮廓变化情况自适应地更新模板图像，并将有代表性的特征信息进行保存。After initializing the Kalman filter, the system can predict and track the position of the target object. During the tracking process, the template image is updated adaptively according to the contour changes of the target object, and the representative feature information is saved.

本发明根据被跟踪物体的外形变化，实时地判断是否要更新目标模板和SURF特征描述子。此处的外形变化主要是指在摄像头焦距不变的情况下目标因自身运动位置和方向的变化，所造成的图像中目标所占的像素总面积发生的变化。即The present invention judges in real time whether to update the target template and the SURF feature descriptor according to the shape change of the tracked object. The shape change here mainly refers to the change of the total pixel area occupied by the target in the image caused by the change of the target's own moving position and direction when the focal length of the camera remains unchanged. Right now

A_m-A_n|>H (7)A _m -A _n |>H (7)

时，判断目标发生外形变化，Am和An分别表示m和n时刻目标所占的像素面积，H是一阈值，该阈值可根据具体场景进行调整。When it is judged that the shape of the target has changed, Am and An represent the pixel area occupied by the target at m and n respectively, and H is a threshold, which can be adjusted according to the specific scene.

当判断物体所占的像素面积发生变化后，系统继续判断该目标物体的长宽比是否发生变化，如果此时目标物体的长宽比R_t与之前保存的目标模板的长宽比R_m相比发生明显变化，则认为物体发生角度变化，需要更新目标对象的模板信息。After judging that the pixel area occupied by the object changes, the system continues to judge whether the aspect ratio of the target object has changed. If the aspect ratio R _t of the target object is the same as the aspect ratio R _m of the previously saved target template If there is a significant change in the ratio, it is considered that the angle of the object has changed, and the template information of the target object needs to be updated.

R_t-R_m|>H_R，R_t=W_tH_t (8)R _t -R _m |>H _R , R _t =W _t H _t (8)

H_R为判断目标对象长宽比发生较大变化时的阈值，W_t和H_t分别表示t时刻目标对象的宽度和高度H _R is the threshold for judging the large change in the aspect ratio of the target object, W _t and H _t represent the width and height of the target object at time t, respectively

判断遮挡的发生。本发明采用基于轮廓相交的判断方法，该方法快速有效，其需要解决两种情况的遮挡：一是对象间的相互遮挡，二是目标对象被背景遮挡。Determine the occurrence of occlusion. The present invention adopts a judging method based on contour intersection, which is fast and effective, and needs to solve two situations of occlusion: one is mutual occlusion between objects, and the other is target object is occluded by the background.

我们可知当目标对象发生遮挡时，它的轮廓所占的像素面积是迅速增加或减少的。像素面积增加表示发生了对象间的遮挡，像素面积减少表示发生了背景遮挡对象的情况。因此通过运动检测得到运动物体轮廓的二值化图像后，再利用下式进行判断就可以知道遮挡发生的时刻了。We know that when the target object is occluded, the pixel area occupied by its outline increases or decreases rapidly. An increase in pixel area indicates that occlusion between objects has occurred, and a decrease in pixel area indicates that a background occlusion of an object has occurred. Therefore, after the binarized image of the outline of the moving object is obtained through motion detection, the moment when the occlusion occurs can be known by using the following formula to judge.

|S_t-S_t-1|>T (9)|S _t -S _t-1 |>T (9)

S_t和S_t-1分别表示当前时刻和前一时刻目标对象所占的像素面积，T表示设定好的阈值。该阈值需要根据具体场景中摄像头的焦距进行改变。S _t and S _t-1 represent the pixel area occupied by the target object at the current moment and the previous moment respectively, and T represents the set threshold. The threshold needs to be changed according to the focal length of the camera in the specific scene.

d使用基于SURF特征的匹配方法确定追踪对象，在特征匹配趋于稳定并判断遮挡结束时，重新初始化Kalman滤波器后，进入步骤c；d Use the SURF feature-based matching method to determine the tracking object. When the feature matching tends to be stable and the occlusion is judged to be over, re-initialize the Kalman filter and proceed to step c;

通过SURF算法得到的两幅图像之间一些匹配的特征点对并不是完全正确的，它们中存在着因测量误差和噪声引起的错误匹配。因此，本发明在得到SURF特征匹配点对后，使用RANSAC算法进行精确匹配并得到图像间的转换的单应性矩阵，从而在视频中标定目标对象。Some matching feature point pairs between the two images obtained by the SURF algorithm are not completely correct, and there are false matches caused by measurement errors and noises in them. Therefore, after obtaining the SURF feature matching point pairs, the present invention uses the RANSAC algorithm for precise matching and obtains a homography matrix for conversion between images, so as to mark the target object in the video.

RANSAC（random sample consensus）算法是一种估计数学模型参数迭代的方法，其基本思路是通过随机采样和验证的方法求得大部分样本（指相互匹配的特征点对）都能满足的数学模型的参数。The RANSAC (random sample consensus) algorithm is an iterative method for estimating the parameters of a mathematical model. Its basic idea is to obtain a mathematical model that most samples (referring to matching feature point pairs) can satisfy through random sampling and verification. parameter.

本发明运用RANSAC算法的具体步骤如下：The concrete steps that the present invention uses RANSAC algorithm are as follows:

1）首先随机选取四对SURF匹配点作为初始内点集合，通过初始集合内的四对匹配点计算出变换矩阵H。1) First, randomly select four pairs of SURF matching points as the initial set of internal points, and calculate the transformation matrix H through the four pairs of matching points in the initial set.

2）判断内点集合外的点是否可加入该集合。计算I′和HI之间的距离，如果该值小于设定好的阈值，则将该点加入内点集合。2) Determine whether the points outside the set of internal points can be added to the set. Calculate the distance between I' and HI, if the value is less than the set threshold, then add this point to the set of inliers.

3）重复1）和2）步骤N次，选取内点集合中点数最多的那一组集合作为最终需要的匹配点集合。最后根据该集合，运用最小二乘法更新变换矩阵H。3) Repeat steps 1) and 2) N times, and select the set with the largest number of points in the set of interior points as the final set of matching points required. Finally, according to the set, the transformation matrix H is updated using the least square method.

在上述过程中，假设最终得到的匹配点占初始SURF匹配点总数的比例为p，则随机抽取四对匹配点不全是最终正确匹配的概率为1-p⁴，迭代N次初始的四对点都不全是正确匹配点的概率为(1-p⁴)^N，那么得到正确的变换矩阵H的概率P为：In the above process, assuming that the proportion of the final matching points to the total number of initial SURF matching points is p, the probability that four pairs of matching points randomly selected are not all final correct matches is 1-p ⁴ , and the initial four pairs of points are iterated N times The probability that not all of them are correct matching points is (1-p ⁴ ) ^N , then the probability P of getting the correct transformation matrix H is:

P=1-(1-p⁴)^N (10)P=1-(1-p ⁴ ) ^N (10)

在实际应用中，既为了保证较大概率得到变换矩阵H，又使算法的迭代次数N较少，一般取N在10~20之间即可。In practical applications, in order to ensure a higher probability of obtaining the transformation matrix H, and to reduce the number of iterations N of the algorithm, it is generally sufficient to set N between 10 and 20.

在得到变换矩阵H之后，将模板中目标对象的四个顶点映射过去就得到了图像中物体的大体位置，计算方法如下所示：After obtaining the transformation matrix H, map the four vertices of the target object in the template to obtain the general position of the object in the image. The calculation method is as follows:

x_i′=h₁*x_i+h₂*y_i+h₃ (11)x _i ′=h ₁ *x _i +h ₂ *y _i +h ₃ (11)

y_i′=h₄*x_i+h₅*y_i+h₆ (12)y _i ′=h ₄ *x _i +h ₅ *y _i +h ₆ (12)

(x_i′,y_i′)表示视频图像中目标对象第i个顶点的坐标，(x_i,y_i)为模板中目标对象的第i个顶点坐标。(xi _′ , y _i ′) represent the coordinates of the i-th vertex of the target object in the video image, and (xi _, y _i ) are the coordinates of the i-th vertex of the target object in the template.

判断遮挡的结束。该部分用到的方法与判断遮挡是否发生用到方法相同，即判断物体轮廓是否相交。当遮挡过程结束时，二值图像中观察到的目标对象所在团块占据的像素面积会发生明显变化，通过式(9)就可以判断某时刻遮挡是否结束。Determine the end of occlusion. The method used in this part is the same as the method used to determine whether occlusion occurs, that is, to determine whether the object outlines intersect. When the occlusion process ends, the pixel area occupied by the clump of the target object observed in the binary image will change significantly, and it can be judged whether the occlusion ends at a certain moment by formula (9).

遮挡结束时，系统将根据之前搜索匹配的结果，重新初始化Kalman滤波器，然后通过Kalman滤波器不断地预测和更新来继续完成跟踪处理。When the occlusion ends, the system will re-initialize the Kalman filter according to the previous search and matching results, and then continue to complete the tracking process through the continuous prediction and update of the Kalman filter.

e输出并保存目标对象特征信息。e output and save the target object feature information.

根据用户的需要，保存目标对象特征信息到指定目录。According to the user's needs, save the characteristic information of the target object to the specified directory.

本发明的意义在于：Significance of the present invention is:

视频对象的追踪处理是智能视频处理的关键技术，是行为识别、事件识别、身份识别等高级语义操作和处理的前提。在遮挡和场景变换等复杂场景下，快速、准确的实现对视频对象的追踪是国内外研究的热点和难点。近年来，人们对遮挡情况下的跟踪问题进行了大量的研究，目前现有的方法虽然能解决部分问题，但还没有一种方法能很好地解决所有问题。例如，图像分层方法虽然能解决遮挡跟踪问题，但复杂程度高，难以满足实时性的要求；而基于颜色或轮廓的跟踪方法初始化困难、目标模型更新困难，难以用于实际系统中；在同一场景中利用多摄像机联合监控是目前解决遮挡问题的热门方法，但目前该方法与单摄像头下的跟踪方法相比还很不成熟，而且实现成本和复杂度都很高。相比遮挡，场景变换情况下的跟踪更为复杂，目前在该领域国内外可参考的文献资料还较少，主要的解决方法依然是将单场景下的跟踪算法扩展到多场景跟踪。而本发明，不需要多台摄像机，只需要一台摄像机就可以针对固定背景环境下的运动对象进行快速追踪，并且在发生遮挡和场景变换时仍然具有较高的准确率和鲁棒性。The tracking and processing of video objects is the key technology of intelligent video processing, and the premise of advanced semantic operations and processing such as behavior recognition, event recognition, and identity recognition. In complex scenes such as occlusion and scene change, fast and accurate tracking of video objects is a hot and difficult research topic at home and abroad. In recent years, people have done a lot of research on the tracking problem under occlusion. Although the existing methods can solve some problems, there is no one method that can solve all problems well. For example, although the image layering method can solve the problem of occlusion tracking, it is highly complex and difficult to meet the real-time requirements; while the tracking method based on color or outline is difficult to initialize and update the target model, and it is difficult to be used in the actual system; in the same Using multi-camera joint monitoring in the scene is currently a popular method to solve the occlusion problem, but at present, this method is still immature compared with the tracking method under a single camera, and the implementation cost and complexity are high. Compared with occlusion, tracking under scene change is more complicated. At present, there are few reference materials in this field at home and abroad. The main solution is still to extend the tracking algorithm in single scene to multi-scene tracking. However, the present invention does not require multiple cameras, and only needs one camera to quickly track moving objects in a fixed background environment, and still has high accuracy and robustness when occlusions and scene changes occur.

特别说明的是，本发明工作受国家自然科学基金项目(61170253)、山东科技大学信息学院科研创新团队计划支持。In particular, the work of this invention is supported by the National Natural Science Foundation of China (61170253) and the scientific research and innovation team plan of the School of Information, Shandong University of Science and Technology.

上述方式中未述及的有关技术内容采取或借鉴已有技术即可实现。Relevant technical contents not mentioned in the above methods can be realized by adopting or referring to existing technologies.

需要说明的是，在本说明书的教导下本领域技术人员还可以作出这样或那样的容易变化方式，诸如等同方式，或明显变形方式。上述的变化方式均应在本发明的保护范围之内。It should be noted that under the teaching of this specification, those skilled in the art can also make one or another easy change, such as equivalent or obvious deformation. All the above-mentioned variations should fall within the protection scope of the present invention.

Claims

1. A moving object tracking method for occlusion and scene transformation, characterized in that it comprises the following steps:

a Perform foreground motion detection on the input video sequence to extract moving objects; then enter step b,

b If the features of the tracking object have been saved, go to step d; if the features of the tracking object are not saved, complete the template initialization and SURF feature extraction of the target object according to the area selected by the user, and the initialization of the Kalman filter; Then go to step c,

c Use the method based on the Kalman filter to predict and track the moving target until the end of the video content, then enter step e; when occlusion occurs during the tracking process, then enter step d;

dUse the SURF feature-based matching method to determine the tracking object. When the feature matching tends to be stable and the occlusion is judged to be over, re-initialize the Kalman filter and enter step c;

e output and save the target object feature information.

2. a kind of occlusion-oriented and scene change-oriented moving object tracking method according to claim 1, is characterized in that:

In the above step a, two reference frames I _bg (x, y) and I _up (x, y) are established, I _bg (x, y) is the background frame of the current scene, and I _up (x, y) is a random A reference frame that is constantly updated in time; the current frame I(x,y) is differentially binarized with I _bg (x,y) and I _up (x,y) respectively, and the result obtained is recorded as: F _bg (x, y), F _up (x, y), according to the value of the two to distinguish the leftovers and moving objects in the scene.

3. A kind of moving object tracking method oriented to occlusion and scene change according to claim 1 or 2, characterized in that:

In the above step c, the Kalman filter is first initialized, and then predictive tracking is performed according to the observed state of the target object; during the tracking process, the template image is adaptively updated according to the contour changes of the target object, and a representative The feature information is saved; in the tracking process, the judgment method based on the intersection of contours is used to model, analyze and judge whether occlusion occurs.

4. a kind of occlusion-oriented and scene change-oriented moving object tracking method according to claim 3, is characterized in that:

In the above step d, the video content is automatically searched and the foreground blob that matches the most feature points of the tracked object is found. For the wrong match caused by measurement error and noise, after obtaining the SURF feature matching point pair, use the RANSAC algorithm to perform an exact match and get The homography matrix of the conversion between images is used to mark the target object in the video; the same model is used to judge whether the occlusion is over and whether the occlusion occurs above; the method of reinitializing the Kalman filter is the same as the initialization method of the above Kalman filter.