CN103699908A

CN103699908A - Joint reasoning-based video multi-target tracking method

Info

Publication number: CN103699908A
Application number: CN201410016404.XA
Authority: CN
Inventors: 张辰元; 蔡岭; 张颖华; 赵宇明; 胡福乔
Original assignee: Shanghai Jiao Tong University
Current assignee: Shanghai Jiao Tong University
Priority date: 2014-01-14
Filing date: 2014-01-14
Publication date: 2014-04-02
Anticipated expiration: 2034-01-14
Also published as: CN103699908B

Abstract

A video multi-target tracking method based on joint reasoning in the field of video processing technology, by first reading in a frame image of a video file and performing image rasterization on it, and then using an online detector and a KLT tracking algorithm as a tracker Calibrate the candidate positions of the target, screen the results separately and synthesize the results, then quantify and score the candidate position results obtained, and finally use the joint function to describe the target tracking situation and use the optimal solution based on the joint function as the position of the target in this frame , that is, to achieve target tracking. The present invention can solve the processing method for the combination of detection and tracking algorithms and the processing of multi-target interrelationships in the tracking technology under multi-target tracking, and uses the joint function to describe the relationship between multiple targets, which not only solves the problem of fusion of detection and tracking results, but also Also considering the overall situation, the relationship between each objective is synthesized, and the global optimal solution is obtained.

Description

Multi-object Tracking Method for Video Based on Joint Reasoning

技术领域technical field

本发明涉及的是一种视频处理技术领域的方法，具体是一种基于联合推理的视频多目标跟踪方法。The invention relates to a method in the technical field of video processing, in particular to a video multi-target tracking method based on joint reasoning.

背景技术Background technique

随着摄像头设备的发展和普及，视频跟踪在生产和生活中占有了越来越重要的地位。特别是在视频监控系统中，跟踪算法可以有效的降低人工成本，节约时间。然而由于跟踪技术本身的不完善以及复杂多变的跟踪环境，影响了跟踪结果的准确性，使跟踪算法的应用受到了限制。With the development and popularization of camera equipment, video tracking plays an increasingly important role in production and life. Especially in video surveillance systems, tracking algorithms can effectively reduce labor costs and save time. However, due to the imperfection of the tracking technology itself and the complex and changeable tracking environment, the accuracy of the tracking results is affected, and the application of the tracking algorithm is limited.

视频中的目标跟踪是非常有挑战性的课题，因为跟踪过程中有很多的不确定因素，比如：复杂的背景，若背景与被跟踪目标相似，会干扰跟踪算法对目标位置的判断；明显的物体遮挡、目标外形的快速变化，这些都会导致目标在画面中的外观有明显的变化，导致算法丢失被跟踪目标。对于多目标跟踪而言，除了上述的问题，各个目标之间的相似性、相互作用和相互遮挡都会给正确跟踪带来难度。Target tracking in video is a very challenging subject, because there are many uncertain factors in the tracking process, such as: complex background, if the background is similar to the tracked target, it will interfere with the tracking algorithm's judgment of the target position; obvious Object occlusion and rapid changes in the shape of the target will cause obvious changes in the appearance of the target in the screen, causing the algorithm to lose the tracked target. For multi-target tracking, in addition to the above-mentioned problems, the similarity, interaction and mutual occlusion between various targets will bring difficulties to correct tracking.

针对这些问题一般的处理方法往往是采用检测算法与跟踪算法相结合的方法，也就是利用检测算法来提高最终跟踪的效果。然而，在检测算法的正确性难以保证的前提下，是否能提高跟踪率仍是个问题。The general solution to these problems is often to use a combination of detection algorithms and tracking algorithms, that is, to use detection algorithms to improve the final tracking effect. However, under the premise that the correctness of the detection algorithm is difficult to guarantee, whether it can improve the tracking rate is still a problem.

经过现有技术的文献检索发现，很多学者利用自适应学习的方法来确保检测器的正确性，从而处理检测结果与跟踪结果互相协调的问题，比如Zdenek等在《IEEE Transaction on PatternAnalysis and Machine Intelligence》（2012年第34期第1409-1422页）上发表的“Tracking-Learning-Detection”，该文利用了“跟踪-学习-检测（TLD）”的框架结构，利用跟踪与检测相结合的方法处理跟踪问题。具体的说，首先，输入一帧图像，图像栅格化后，得到大量的图像区块；然后，跟踪器筛选出符合要求的图像区块，检测器也标定所有在外观上可能成为目标的图像区块；最后，采用跟踪环节和检测环节相结合的方法，当跟踪器失败时可以利用检测器的结果重新初始化跟踪器，同时利用跟踪结果扩充训练样本集在线训练检测器，提高检测器的精度。从该技术公开的实验结果可见，TLD在长时间跟踪上有很好的效果。不过该方法仍然存在很多限制：1）只适用于单目标跟踪2）如果目标的外观发生了较大的变化，或者发生了完全遮挡，该方法效果不好，这是因为在这种情况下在线检测器不能正确给出目标的可能位置。Through the literature search of the existing technology, it is found that many scholars use the method of adaptive learning to ensure the correctness of the detector, so as to deal with the problem of coordination between the detection results and the tracking results. For example, Zdenek et al. in "IEEE Transaction on Pattern Analysis and Machine Intelligence" (2012 Issue 34, Pages 1409-1422) published "Tracking-Learning-Detection", which uses the framework structure of "Tracking-Learning-Detection (TLD)" and uses the method of combining tracking and detection to deal with Track issues. Specifically, first, a frame of image is input, and a large number of image blocks are obtained after the image is rasterized; then, the tracker screens out the image blocks that meet the requirements, and the detector also marks all images that may become targets in appearance block; finally, adopt the method of combining the tracking link and the detection link. When the tracker fails, the result of the detector can be used to reinitialize the tracker, and at the same time, the tracking result can be used to expand the training sample set to train the detector online to improve the accuracy of the detector. . From the experimental results published by this technology, it can be seen that TLD has a good effect on long-term tracking. However, this method still has many limitations: 1) It is only suitable for single target tracking 2) If the appearance of the target changes greatly, or if a complete occlusion occurs, this method does not work well, because in this case the online The detector cannot correctly give the possible position of the object.

中国专利文献号CN103176185A申请公布日:2013.06.26，公开了一种用于检测道路障碍物的方法及系统，该技术基于视频摄像装置的第一障碍物检测模型，基于视频摄像装置和毫米波雷达的第二障碍物检测模型，基于三维激光雷达和红外线摄像装置的第三障碍物检测模型，并且通过基于粗糙集的模糊神经网络算法使得所述多个模型形成互补检测，从而实时获取道路障碍物的特征信息。但该技术设备成本高，也不能对障碍物做有效跟踪从而结合历史信息做出跟准确的检测结果。Chinese Patent Document No. CN103176185A Application Publication Date: 2013.06.26, discloses a method and system for detecting road obstacles, the technology is based on the first obstacle detection model of the video camera device, based on the video camera device and millimeter wave radar The second obstacle detection model based on the three-dimensional laser radar and the third obstacle detection model based on the infrared camera device, and through the rough set-based fuzzy neural network algorithm, the multiple models form complementary detection, so as to obtain road obstacles in real time feature information. However, the equipment cost of this technology is high, and it cannot effectively track obstacles so as to combine historical information to make more accurate detection results.

发明内容Contents of the invention

本发明针对现有技术存在的上述不足，提出一种基于联合推理的视频多目标跟踪方法，能够解决多目标跟踪下跟踪技术中对于检测跟踪算法结合的处理方法以及多目标相互关系的处理，利用联合函数来描述多目标之间关系，不仅解决了检测与跟踪的结果融合问题，同时也从全局考虑，综合了每个目标之间的关系，得出了全局最优解。Aiming at the above-mentioned deficiencies existing in the prior art, the present invention proposes a video multi-target tracking method based on joint reasoning, which can solve the processing method for the combination of detection and tracking algorithms and the processing of multi-target interrelationships in the tracking technology under multi-target tracking. The joint function is used to describe the relationship between multiple targets, which not only solves the problem of fusion of detection and tracking results, but also considers the overall situation, synthesizes the relationship between each target, and obtains the global optimal solution.

本发明是通过以下技术方案实现的，本发明通过首先读入视频文件的一帧图像并对其进行图像栅格化处理，然后采用在线检测器以及作为跟踪器的KLT跟踪算法标定目标的候选位置，分别筛选后综合结果，其次将得出的候选位置结果进行量化评分，最后利用联合函数来描述目标跟踪情况并将基于联合函数的最优解作为目标在这一帧的位置，即实现目标跟踪。The present invention is achieved through the following technical solutions. The present invention firstly reads in a frame image of a video file and performs image rasterization processing on it, and then uses an online detector and the KLT tracking algorithm as a tracker to demarcate the candidate position of the target , respectively screen and synthesize the results, and then quantify and score the candidate position results obtained, and finally use the joint function to describe the target tracking situation and use the optimal solution based on the joint function as the position of the target in this frame, that is, to achieve target tracking .

当所述的图像为视频文件的第一帧，则在图像栅格化处理前先进行初始化操作，具体为：手动输入需要跟踪的目标个数，然后手动框出目标。When the image is the first frame of the video file, an initialization operation is performed before rasterizing the image, specifically: manually input the number of targets to be tracked, and then manually frame the targets.

所述的初始化操作是指：初始化检测器与跟踪器，通过random fern在线学习的方法更新检测器，同时基于KLT算法的跟踪器也会在目标范围内选出特征点，用于下一帧的目标跟踪。The initialization operation refers to: initialize the detector and the tracker, update the detector through the method of random fern online learning, and at the same time, the tracker based on the KLT algorithm will also select feature points within the target range for the next frame. Target Tracking.

所述的图像栅格化处理，是指：用大小不一的滑动窗口扫描整帧图像得到不同位置大小不同的图像块，用来作为候选目标，具体为：首先根据初始化目标的大小，等比得出一系列不同尺度大小的滑动窗口，尺度变换的比例范围是1.2^-10～1.2¹⁰；每个滑动窗口依次按照从左到右从上到下的顺序遍历整幅图像，滑动窗口位移大小为窗口大小的0.1。这样，就可以得到很多不同位置大小不同的图像块。The image rasterization process refers to: scanning the whole frame of image with sliding windows of different sizes to obtain image blocks of different sizes in different positions, which are used as candidate targets, specifically: firstly, according to the size of the initialization target, equal A series of sliding windows of different scales are obtained, and the scaling range of scale transformation is 1.2 ^-10 ~ 1.2 ¹⁰ ; each sliding window traverses the entire image in sequence from left to right and top to bottom, and the displacement of the sliding window is 0.1 of the window size. In this way, many image blocks with different sizes at different positions can be obtained.

所述的候选位置通过以下方式得到：The candidate positions are obtained in the following ways:

首先，计算图像块密度方差，当图像块密度方差过小，则被排除，被跟踪目标模板图像在第一帧初始化时取得，图像块方具体差计算公式为：

其中：p_i指的是第i幅图像块的灰度图像，E()指的是求平均函数，当

则第i幅图像块就被保留，其中：l为固定参数，表示模板图像的方差。First, calculate the image block density variance. If the image block density variance is too small, it will be excluded. The target template image to be tracked is obtained when the first frame is initialized. The image block square variance calculation formula is:

Among them: p _i refers to the grayscale image of the i-th image block, E() refers to the averaging function, when

Then the i-th image block is reserved, wherein: l is a fixed parameter, Indicates the variance of the template image.

然后，对输入图像区块提取二值特征，利用random fern算法在线训练得到的分类器估计每一个通过密度方差判断的图像块与被检测目标的相似度，相似度判断公式为：P(c₁|x)=其中：c₁表示训练类别，这里训练时只有两种类别，与被检测目标相似用c₁表示，与被检测目标不相似，用c₀表示；P_i(c₁|x)表示第i颗fern得到的后验概率。Then, binary features are extracted from the input image block, and the classifier obtained by online training using the random fern algorithm is used to estimate the similarity between each image block judged by the density variance and the detected target. The similarity judgment formula is: P(c ₁ |x)= Among them: c ₁ represents the training category. There are only two categories during training here. It is represented by c ₁ if it is similar to the detected target, and it is represented by c ₀ if it is not similar to the detected target; P _i (c ₁ |x) represents the ith object The posterior probability obtained by fern.

最后，将所有ferns得到的后验概率进行平均，得到最终的后验概率值，当相似度P(c₁|x)>50%，则输入图像块与被检测目标相似，保留该图像块。Finally, the posterior probability obtained by all ferns is averaged to obtain the final posterior probability value. When the similarity P(c ₁ |x)>50%, the input image block is similar to the detected target, and the image block is retained.

所述的标定是指：在图像的第一帧时做KLT算法初始化处理，不做跟踪，之后每一帧都从上一帧目标位置中利用KLT算法选取被跟踪目标特征，在当前帧中找到与之相对应的特征区域；然后跟踪器根据图像块的每个图像块内部的特征点个数来决定是否保留，若一个图像块区域内的特征点个数超过了一定的经验阈值，那么该图像块就被判定为候选状态而被保留。The calibration refers to: do the KLT algorithm initialization process in the first frame of the image, do not do tracking, and then use the KLT algorithm to select the tracked target feature from the target position of the previous frame in each frame, and find it in the current frame. The corresponding feature area; then the tracker decides whether to keep it according to the number of feature points in each image block of the image block. If the number of feature points in an image block area exceeds a certain empirical threshold, then the The image block is judged as a candidate state and retained.

所述的量化评分是指：提取候选图像块的Haar特征，通过级联Adaboost分类器评价候选位置的真实性，进行量化的评价：

其中：

的意思是被检测目标停在了级联Adaboost分类器的第

层，

即代表了被检测矩形框

的量化评分；Described quantitative score refers to: extract the Haar feature of candidate image block, evaluate the authenticity of candidate position by cascading Adaboost classifier, carry out the evaluation of quantification:

in:

means the detected target stopped at the first cascaded Adaboost classifier

layer,

That is, it represents the detected rectangular frame

quantitative score;

其中：代表了级联分类器中的一个弱分类器，s(L)代表了第L层的一系列弱分类器；当函数F_i大于某个经验阈值，那么就可以通过这一层，反之则不能通过；所述的弱分类器权重w_i,l是通过AdaBoost的学习方法学习得出的。

in: Represents a weak classifier in the cascade classifier, s(L) represents a series of weak classifiers in the L-th layer; when the function F _i is greater than a certain empirical threshold, then can pass this layer, and vice versa; the weak classifier weights w _i,l are learned through the learning method of AdaBoost.

所述的联合函数由各目标之间的空间位置关系以及各目标候选位置的评分组成，通过构建该联合函数模型，求函数的最优解，就能得出最佳的候选位置作为跟踪的结果。The joint function is composed of the spatial position relationship between each target and the scoring of each target candidate position. By constructing the joint function model and finding the optimal solution of the function, the best candidate position can be obtained as the tracking result .

所述的空间位置关系是指，在多目标的情况下，也可以认为是在相邻帧之间各目标之间的位置关系变化也不显著。那么上一帧目标之间的相互位置也可以作为参考帮助提高本帧的跟踪结果。基于这一原则，那t时刻，第i个目标与第N个目标之间的位置关系概率

描述为：The spatial positional relationship mentioned above means that in the case of multiple targets, it can also be considered that the positional relationship between the targets does not change significantly between adjacent frames. Then the mutual position between the targets in the previous frame can also be used as a reference to help improve the tracking result of this frame. Based on this principle, at time t, the positional relationship probability between the i-th target and the N-th target is

described as:

$d (x_{i}^{t}, x_{N}^{t}) = e^{- β_{1} {(Δv)}^{2} - β_{2} {(Δθ)}^{2}},$ 其中： $Δν = ν (x_{i}^{t}, x_{N}^{t}) - ν (x_{i}^{t - 1}, x_{N}^{t - 1}), Δθ = θ (x_{i}^{t}, x_{N}^{t}) -$

函数ν和函数θ分别代表了在欧式空间上的距离和角度。β₁和β₂起到了平衡距离和角度权重的作用。函数

描述了t时刻目标i与目标N之间的空间位置信息，包括了距离的变化与角度的变化。

d (x_{i}^{t}, x_{N}^{t}) = e^{- β_{1} {(Δv)}^{2} - β_{2} {(Δθ)}^{2}},

in:

Δν = ν (x_{i}^{t}, x_{N}^{t}) - ν (x_{i}^{t - 1}, x_{N}^{t - 1}), Δθ = θ (x_{i}^{t}, x_{N}^{t}) -

Function ν and function θ represent distance and angle in Euclidean space, respectively. β ₁ and β ₂ play the role of balancing distance and angle weights. function

It describes the spatial position information between target i and target N at time t, including the change of distance and angle.

所述的联合函数模型是指，所有目标的联合概率， $p (x_{1}^{t}, . . . . . ., x_{N}^{t}) &Proportional; \exp (Σ_{i = 1}^{N} g (x_{i}^{t}) + γ Σ_{i = 1}^{N - 1} d (x_{i}^{t}, x_{N}^{t})),$ 其中：γ是权重参数，函数

代表了t时刻目标i的状态空间置信度，函数

描述了t时刻目标i与目标N之间的空间位置信息，包括了距离的变化与角度的变化。The joint function model refers to the joint probability of all targets,

p (x_{1}^{t}, . . . . . ., x_{N}^{t}) &Proportional; \exp (Σ_{i = 1}^{N} g (x_{i}^{t}) + γ Σ_{i = 1}^{N - 1} d (x_{i}^{t}, x_{N}^{t})),

Where: γ is the weight parameter, the function

Represents the state space confidence of target i at time t, the function

所述的最优解利用标准置信传播算法来有效解决多目标跟踪问题，把各目标之间的关系用树结构来描述；当被跟踪目标为马尔科夫随机场中的节点，则随机选定一个目标为根节点，其余均为子节点，向根节点传递信息，把最后一个目标定义为根节点，其他的作为树结构中的子节点；这样构造子节点向根节点传递信息的信息传递函数：

每一个叶节点都传递信息至根节点处，那么根节点即被跟踪目标N节点处的置信度可以表示为：

当

取得最大值时，便是该帧多目标跟踪的跟踪结果。The optimal solution uses the standard belief propagation algorithm to effectively solve the multi-target tracking problem, and describes the relationship between the targets with a tree structure; when the tracked target is a node in the Markov random field, it is randomly selected One target is the root node, and the rest are child nodes, and transmit information to the root node. The last target is defined as the root node, and the others are the child nodes in the tree structure; in this way, the information transfer function for child nodes to transmit information to the root node is constructed :

Each leaf node transmits information to the root node, then the confidence of the root node, which is the tracked target N node, can be expressed as:

when

When the maximum value is obtained, it is the tracking result of multi-target tracking in this frame.

技术效果technical effect

尽管在线检测器可以迅速的检测到目标的候选位置，但是产生较多的错误结果，虚警率较高。其次，在线检测器只对外观变化缓慢的目标敏感。在训练样本少的情况下，对于旋转问题、非刚性目标等，该方法是不稳定的。与在线检测器相比，离线检测更精确但速度较慢。为了平衡这两种检测器的作用，所以本发明运用在线检测来标定目标候选位置，并且利用离线检测来对候选位置评分，兼顾了两种技术的优点。Although the online detector can quickly detect the candidate position of the target, it produces more erroneous results and has a higher false alarm rate. Second, online detectors are only sensitive to objects whose appearance changes slowly. In the case of few training samples, the method is unstable for rotation problems, non-rigid targets, etc. Compared with online detectors, offline detection is more accurate but slower. In order to balance the functions of these two detectors, the present invention uses online detection to mark the target candidate positions, and uses offline detection to score the candidate positions, taking into account the advantages of the two technologies.

本发明不仅平衡了离线检测以及在线检测，使得检测器有更好的鲁棒性，能达到更好的检测结果，用来提高最终的跟踪效果。而且提出了利用联合函数来描述目标跟踪情况，从而整合了多目标间的结构信息，同时也有效融合了检测器和跟踪器的结果。与一般跟踪方法相比，本发明做到了从全局的角度来解决跟踪问题，能够更好的解决多目标跟踪过程中的遮挡问题与外观变化等问题，使其在日常生活和生产中都有广泛的应用前景。The invention not only balances off-line detection and on-line detection, makes the detector have better robustness, can achieve better detection results, and is used to improve the final tracking effect. Moreover, a joint function is proposed to describe the target tracking situation, which integrates the structural information between multiple targets, and also effectively fuses the results of the detector and tracker. Compared with the general tracking method, the present invention solves the tracking problem from a global perspective, and can better solve problems such as occlusion and appearance changes in the process of multi-target tracking, making it widely used in daily life and production application prospects.

附图说明Description of drawings

图1本发明实施例的处理流程示意图。Fig. 1 is a schematic diagram of the processing flow of the embodiment of the present invention.

图2本发明应用实例示意图1。Fig. 2 is a schematic diagram 1 of an application example of the present invention.

图3本发明应用实例示意图2。Fig. 3 is a schematic diagram 2 of an application example of the present invention.

具体实施方式Detailed ways

下面对本发明的实施例作详细说明，本实施例在以本发明技术方案为前提下进行实施，给出了详细的实施方式和具体的操作过程，但本发明的保护范围不限于下述的实施例。The embodiments of the present invention are described in detail below. This embodiment is implemented on the premise of the technical solution of the present invention, and detailed implementation methods and specific operating procedures are provided, but the protection scope of the present invention is not limited to the following implementation example.

实施例1Example 1

本实施例的实施环境为，对于可移动或固定摄像头，在室内或室外采集得来视频，可用本发明做跟踪处理。具体使用方法为，首先，在需要监控跟踪的场所架好可固定的或者可移动的摄像头做采集数据之用；然后，采集得来的视频信息输入电脑，用本发明进行分析，本发明在视频流第一帧会要求输入需要被跟踪目标个数，人工用矩形框出被跟踪目标；最后，本发明会自动在视频流中准确跟踪并且标出被选出目标。The implementation environment of this embodiment is that the present invention can be used for tracking processing for videos collected indoors or outdoors by movable or fixed cameras. The specific method of use is as follows, at first, set up a fixed or movable camera in the place where monitoring and tracking is required to collect data; The first frame of the stream will require input of the number of targets to be tracked, and the tracked targets will be framed manually with a rectangle; finally, the present invention will automatically and accurately track and mark the selected targets in the video stream.

如图1所示，本实施例包括以下操作步骤：首先读取输入视频的一帧图像，若是第一帧，做初始化设置，；之后，该帧图像栅格化处理后，输入到在线检测器与跟踪器标定出候选目标位置；然后，将候选目标位置输入评分器，利用离线分类器给各个候选位置做一个量化的评分；最后，将量化的候选位置信息以及各目标空间结构信息输入联合函数，通过最优解计算输出各个目标的跟踪结果。As shown in Figure 1, this embodiment includes the following steps: first read a frame image of the input video, if it is the first frame, do initialization settings; afterward, after the frame image is rasterized, it is input to the online detector Mark the candidate target positions with the tracker; then, input the candidate target positions into the scorer, and use the offline classifier to make a quantitative score for each candidate position; finally, input the quantified candidate position information and the spatial structure information of each target into the joint function , output the tracking results of each target through the optimal solution calculation.

如图1所示，本实施例首先用户输入待跟踪视频，读取视频的一帧图像，在视频第一帧时进行初始化处理，由用户自行选择待跟踪目标，可以是一个或者多个。初始目标选择完成之后，本发明自动对目标进行跟踪。As shown in Figure 1, in this embodiment, the user first inputs the video to be tracked, reads a frame of the video, and performs initialization processing at the first frame of the video, and the user selects the target to be tracked, which can be one or more. After the initial target selection is completed, the present invention automatically tracks the target.

接下来，对该帧图像进行栅格化，得到很多大小不一的图像块作为初步的候选位置。再将栅格化图像分别输入在线检测器以及跟踪器中。在线检测器首先排除密度方差过小的图像块，即包含信息量较少的图像块，然后程序提取剩余图像块的特征，通过在线训练的检测器保留符合要求的图像块作为候选目标位置；跟踪器中，程序首先从上一帧的目标结果中选取关键点，然后在本帧中找到相对应的关键点，本发明会根据实验中定的经验阈值，通过分析每个图像块中被跟踪到的关键点的个数，来决定保留那些图像块作为目标候选位置。Next, the frame image is rasterized to obtain many image blocks of different sizes as preliminary candidate positions. The rasterized images are then fed into the online detector and tracker respectively. The online detector first excludes image blocks with too small density variance, that is, image blocks containing less information, and then the program extracts the features of the remaining image blocks, and the online trained detector retains the image blocks that meet the requirements as candidate target positions; tracking In the device, the program first selects key points from the target result of the previous frame, and then finds the corresponding key points in this frame. The present invention will analyze the tracked points in each image block according to the empirical threshold value determined in the experiment. The number of key points to determine which image blocks to keep as target candidate positions.

得到目标候选位置后，本程序会用离线分类器对每个候选图像块量化评分，再结合各个目标之间的空间关系模型，输入联合函数来描述所有目标的跟踪情况。程序把联合函数的最优解作为最后的跟踪结果输出。得到当前帧的跟踪结果后，程序还会对在线检测器做训练，来保证下一帧检测结果的稳定性。After obtaining the target candidate positions, this program will use an offline classifier to quantify and score each candidate image block, and then combine the spatial relationship model between each target to input a joint function to describe the tracking situation of all targets. The program outputs the optimal solution of the joint function as the final tracking result. After getting the tracking result of the current frame, the program will also train the online detector to ensure the stability of the detection result of the next frame.

当视频图像的每一帧都遍历之后，本实施例结束。After each frame of the video image is traversed, this embodiment ends.

如图2所示，本实施例应用实例示意图。本实例中，上文公式中提到的参数设置为：λ=10，γ=60，β₁=0.1，β₂=2，场景中4个目标同时从镜头的左侧走向右侧，目标不仅相互之间靠得很近，会产生相互遮挡，而且4个被跟踪目标的外形也颇为相似，对跟踪带来一定的难度。本方法由于运用了结构性推理函数，从全局角度考虑所有目标跟踪结果从而求得最优解，降低了目标之间相互遮挡的影响，frame#187中绿色目标在跟踪的过程中也没有受到周围相似物体的影响，很准确的对4个目标同时进行跟踪。从frame#220中可以明显看出，本方法在该场景下有很好的跟踪效果。As shown in FIG. 2 , it is a schematic diagram of an application example of this embodiment. In this example, the parameters mentioned in the above formula are set as: λ=10, γ=60, β ₁ =0.1, β ₂ =2, 4 targets in the scene move from the left side of the lens to the right side at the same time, and the targets not only Being very close to each other will cause mutual occlusion, and the appearance of the four tracked targets is quite similar, which brings certain difficulties to tracking. Due to the use of structural reasoning functions, this method considers all target tracking results from a global perspective to obtain the optimal solution, which reduces the influence of mutual occlusion between targets, and the green target in frame#187 is not affected by the surrounding during the tracking process. Due to the influence of similar objects, it can accurately track 4 targets at the same time. It can be clearly seen from frame#220 that this method has a good tracking effect in this scene.

如图3所示，也是本发明应用实例示意图。本实例中，上文公式中提到的参数设置为：λ=10，γ=60，β₁=0.1，β₂=2，为了对本方法的跟踪模型有直接的认知，对结构性的物体做了跟踪测试。图3所示的测试视频是自行录制的，被跟踪目标为人脸部的眼、鼻和嘴角。人脸可以被看做是一个结构性物体，由固定的5个部分构成，以鼻子为中心，四周有两个眼睛和嘴角。本方法提出的算法可以很好的处理该遮挡问题，即使有瞬间的遮挡，本文跟踪模型也没有丢失目标。这是因为本文提出的结构性推理联合概率函数利用了各个目标之间的相互关系的信息，而不是只考虑单个目标的概率，在跟踪器和检测器丢失目标时，仍然可以根据目标之间的相互空间关系来推测出被跟踪目标的位置。As shown in FIG. 3 , it is also a schematic diagram of an application example of the present invention. In this example, the parameters mentioned in the above formula are set as: λ=10, γ=60, β ₁ =0.1, β ₂ =2. In order to have a direct understanding of the tracking model of this method, the structural objects Did a track test. The test video shown in Figure 3 is self-recorded, and the tracked targets are the eyes, nose and mouth corners of the human face. The human face can be regarded as a structural object, consisting of five fixed parts, with the nose as the center, two eyes and the corners of the mouth around it. The algorithm proposed by this method can handle the occlusion problem very well. Even if there is an instantaneous occlusion, the tracking model in this paper does not lose the target. This is because the joint probability function of structural inference proposed in this paper utilizes the information of the interrelationship between various targets instead of only considering the probability of a single target. The mutual spatial relationship is used to infer the position of the tracked target.

Claims

1. A video multi-target tracking method based on joint reasoning, characterized in that by first reading in a frame image of a video file and carrying out image rasterization processing to it, then using online detector and KLT tracking as tracker The algorithm calibrates the candidate position of the target, screens the results separately and synthesizes the results, then quantifies and scores the candidate position results obtained, and finally uses the joint function to describe the target tracking situation and takes the optimal solution based on the joint function as the target in this frame. position, that is, to achieve target tracking.

2. method according to claim 1, it is characterized in that, described initialization operation refers to: initialize detector and tracker, update detector by the method for random fern online learning, simultaneously the tracker based on KLT algorithm will also Feature points are selected within the target range for target tracking in the next frame.

3. The method according to claim 1, wherein said image rasterization processing refers to: scanning the whole frame of image with sliding windows of different sizes to obtain image blocks of different sizes in different positions, which are used as Candidate targets, specifically: firstly, according to the size of the initialization target, a series of sliding windows of different scales are obtained in proportion, and the scale range of scale transformation is 1.2 ^-10 ~ 1.2 ¹⁰ ; The entire image is traversed in order from top to bottom, and the sliding window displacement size is 0.1 of the window size, that is, many image blocks with different sizes at different positions are obtained.

4. method according to claim 1, is characterized in that, described candidate position obtains by following way:

First, calculate the image block density variance. If the image block density variance is too small, it will be excluded. The target template image to be tracked is obtained when the first frame is initialized. The image block square variance calculation formula is: Among them: p _i refers to the grayscale image of the i-th image block, E() refers to the averaging function, when

Then the i-th image block is reserved, wherein: l is a fixed parameter,

Indicates the variance of the template image;

Then, binary features are extracted from the input image block, and the classifier obtained by online training using the random fern algorithm is used to estimate the similarity between each image block judged by the density variance and the detected target. The similarity judgment formula is: Pc ₁ |x )=

Among them: c ₁ represents the training category. There are only two categories during training here. It is represented by c ₁ if it is similar to the detected target, and it is represented by c ₀ if it is not similar to the detected target; P _i (c ₁ |x) represents the ith object The posterior probability obtained by fern;

Finally, the posterior probability obtained by all ferns is averaged to obtain the final posterior probability value. When the similarity P(c ₁ |x)>50%, the input image block is similar to the detected target, and the image block is retained.

5. The method according to claim 1, characterized in that, said calibration refers to: do KLT algorithm initialization processing in the first frame of the image, do not track, and then each frame from the previous frame target position In the method, the KLT algorithm is used to select the features of the tracked target, and the corresponding feature area is found in the current frame; then the tracker decides whether to keep it according to the number of feature points inside each image block of the image block, if an image block area If the number of feature points in the image exceeds a certain empirical threshold, then the image block is judged as a candidate state and is retained.

6. method according to claim 1, it is characterized in that, described quantitative scoring refers to: extract the Haar feature of candidate image block, evaluate the authenticity of candidate position by cascade Adaboost classifier, carry out the evaluation of quantification:

in:

means the detected target

stopped at the first cascaded Adaboost classifier

layer,

That is, it represents the detected rectangular frame

quantitative score;

in:

Represents a weak classifier in the cascade classifier, s(L) represents a series of weak classifiers in the L-th layer; when the function F _i is greater than a certain empirical threshold, then

can pass this layer, and vice versa; the weak classifier weights w _i,l are learned through the learning method of AdaBoost.

7. method according to claim 1, it is characterized in that, described joint function is made up of the spatial position relationship between each target and the scoring of each target candidate position, by building this joint function model, seek the optimum of function solution, the best candidate position can be obtained as the tracking result.

8. The method according to claim 7, wherein the spatial positional relationship means that, in the case of multiple targets, the mutual positions between targets in the previous frame can also be used as a reference to help improve the tracking of this frame As a result, at time t, the positional relationship probability between the i-th target and the N-th target

described as:

d (x_{i}^{t}, x_{N}^{T}) = e^{- β_{1} {(Δv)}^{2} - β_{2} {(Δθ)}^{2}},

in:

Δν = ν (x_{i}^{t}, x_{N}^{t}) - ν (x_{i}^{t - 1}, x_{N}^{t - 1}), Δθ = θ (x_{i}^{t}, x_{N}^{t}) -

Function ν and function θ respectively represent the distance and angle in Euclidean space; β ₁ and β ₂ play a role in balancing the weight of distance and angle; the function

9. method according to claim 7, is characterized in that, described joint function model refers to, the joint probability of all targets,

Among them: ε and γ are weight parameters, and the function

Represents the state space confidence of target i at time t, the function

10. The method according to claim 7, characterized in that, said optimal solution utilizes a standard belief propagation algorithm to effectively solve the multi-target tracking problem, and describes the relationship between each target with a tree structure; when being tracked The target is a node in the Markov random field, randomly select a target as the root node, and the rest are child nodes, and transmit information to the root node, define the last target as the root node, and the others as child nodes in the tree structure node; thus construct the information transfer function for the child node to transfer information to the root node:

Each leaf node transmits information to the root node, then the confidence of the root node, which is the tracked target N node, can be expressed as: when