KR101681104B1

KR101681104B1 - A multiple object tracking method with partial occlusion handling using salient feature points

Info

Publication number: KR101681104B1
Application number: KR1020150097839A
Authority: KR
Inventors: 이석룡
Original assignee: 한국외국어대학교 연구산학협력단
Priority date: 2015-07-09
Filing date: 2015-07-09
Publication date: 2016-11-30

Abstract

The present invention relates to a main feature point based multi-object tracking method for extracting a salient feature point (SFP) from each object and simultaneously tracking a plurality of objects in the image based on the feature point, wherein (a) Extracting major feature points in a corresponding frame of the image and calculating a minimum bounding rectangle of each object including all the major feature points of each object; (c) predicting positions of major feature points of a next frame; (d) determining whether the main feature points of the predicted next frame of each object are erroneously or normally tracked, using the outlier analysis; (e) calculating a minimum bounding rectangle of a next frame of each object using the main feature points of the next frame of the normal track of each object; And (f) using the minimum bounding rectangle of the next frame of each object to modify the main traits of each object.
By using the method described above, multiple SFPs can be tracked individually in a video frame, and if misclosed by the occlusion, multiple objects can be successfully tracked by utilizing the relative positions of the correctly tracked other SFPs, Tracking accuracy can be achieved.

Description

[0001] The present invention relates to a method for tracking a multi-object based on a feature point in a video object having a partial occlusion,

The present invention relates to a multi-object tracking method based on a feature point, which tracks multiple objects moving in a continuous image when the object is partially obscured by a background object or when two or more objects overlap each other.

More particularly, the present invention relates to a method for extracting a salient feature point (SFP) from each object when a partial occlusion phenomenon occurs between objects in an image, and simultaneously tracking a plurality of objects in the image based on the feature point. And more particularly, to a multi-object tracking method based on feature points.

Visual object tracking is an important and complex task in the field of computer vision. The method has various applications such as automatic object detection, object monitoring, motion analysis, and human computer interaction [Non-Patent Documents 1-4]. For example, automated surveillance systems play an important role in plant, school, traffic, hospital, bank monitoring, and other areas, including object detection, tracking, and event analysis, according to various needs.

In a video stream, a portion of an object may not be visible to a person by occlusion. Blurring has been recognized as one of the major challenges in visual object tracking as tracking accuracy is seriously degraded.

A person can recognize an object even if it is partially hidden. If an object is only partially visible, the human brain can reconstruct the entire object using inference based on knowledge of the visible part of the object and the overall structure of the object. For example, in FIG. 1 (a), the whole body of a human being is seen, whereas in FIG. 1 (b), only a part of the body is shown by an obstacle. Despite the presence of obstacles, a person can predict the size and shape of an object by gauging the posture of the visible part.

Sophisticated techniques may be needed to implement tracking systems similar to human object recognition mechanisms in the event of occlusion. Multiple object tracking, which tracks multiple objects at the same time when an occlusion of an object occurs in the image, has been recognized as a difficult problem in the field of computer vision, and research has been conducted steadily.

Many existing tracking methods can accurately track multiple objects when multiple objects are clearly separated from each other and the colors of multiple objects are not very similar to the background. Otherwise, it may fail, and the object tracker may incorrectly trace anywhere else in the moving object or background.

Existing tracking methods will be described in more detail.

Object tracking is largely divided into single object tracking and multiple object tracking. Tracking a single object or a few isolated objects is relatively easy when tracking occlusion and / or tracking multiple objects under harsh background conditions. The target object being tracked may be masked by a background object or other target object. On the other hand, a variety of single visual approaches and multiple visual approaches have been proposed to track obscured target objects.

Multiple visual approaches [Non-Patent Literature 5-8] use information obtained from one or more cameras to reconstruct 3D spatial information by reducing hidden portions. However, such a configuration in which an image of the same scene is shot with several cameras may not be actually possible.

Existing single visual approaches can effectively track isolated objects, while occlusion phenomena, especially inter-object occlusion, are likely to cause severe obstruction and failure of multi-object tracking. If there is a change in the appearance of an object, for example, if the shape of the object changes by some rotation, most algorithms will not be able to accurately track the object. In the present invention, a multi-object tracking method in case of occlusion using a single visual approach is discussed.

To date, much work has been done to track moving objects, and a variety of techniques have been used for effective tracking, including object detection, display, and tracking algorithms [9, 10]. Various algorithms such as Kalman Filter [Non-Patent Document 11-13], mean shift [Non-Patent Document 14-17], and particle filter [Non-Patent Document 18-27] It is proposed to track moving objects as a single visual approach.

Kalman filters are widely applied to object tracking applications, but require linear models for state dynamics that are not guaranteed in all scenarios [Non-patent Document 28]. Beymer and Konolige [Non-Patent Document 11] proposed a method of detecting the position and velocity of an object using a linear velocity model, and estimating the position and velocity of the object during occlusion by using a Kalman filter. In this method, the obscured object is tracked as a new object after the occlusion. Rowe et al. [Non-Patent Document 12] proposed a block-based color histogram matching algorithm in which tracking is performed by three steps-object detection, low-level tracking, and high-level tracking for a multi-object tracking using a Kalman filter. This method is difficult in a hybrid environment and requires initialization of many parameters that can cause misclassification. Chang et al. [Non-Patent Document 13] displayed the object using the center of the object and the boundary rectangle of the object, and tracked the object using the Kalman filter. A new large bounding rectangle is created that combines the obscured objects to track objects during occlusion. This approach can recognize blindness, but does not track individual objects that are obscured.

The mean shift (MS) tracking algorithm is widely used because of simplicity and efficiency. In the MS tracking method, the target object is mainly depicted as a weighted [non-patent document 14] or characteristic [non-patent document 15] histogram of pixels on the bounding rectangle of the object or object. The objects are then retrieved in the video frame through template matching using various vector similarity measures such as the Bhattacharyya coefficient [29] or Kullback-Leibler divergence [30]. Ning et al. [Non-Patent Document 16] proposed a corrected background weighted histogram (CBWH) in which the target position estimation is improved by reducing the relevance of the background information. However, when the colors of objects and backgrounds are similar, this algorithm may not maintain consistency. Zhao et al. Used 3D human models, color histograms, and foreground and background appearance models to track multiple people in crowded scenes [Non-Patent Document 17]. This method is very sophisticated and results in computational overhead.

A particle filter (PF), also known as a sequential Monte Carlo method (non-patent reference 31), is the best known algorithm for generating a posterior probability density function (pdf) using the propagation rule of state density [Non-patent Document 32]. This algorithm produces better results than other algorithms, especially in nonlinear environments. Non-Patent Document 18], Outline [Non-Patent Document 19], and Color Information [Non-Patent Document 20] methods are mainly used for the algorithm based on the particle filter. While it is not easy to find exact coordinates in the object coordinate method, approaches based on contour and color information may fail to track the object when the background is similar to the object.

Jin et al. [Non-Patent Document 21] divides a human body into three parts, each part is represented by a combination of a color histogram and a histogram of oriented gradient descriptor, and each part is individually tracked. This proposal does not track multiple objects and does not resolve blindness. Chang and Ansari [Non-Patent Document 22] used an elliptical model and gradient estimation. The proposed KPF (kernel particle filter) here suggests that it has better performance than the conventional PF, but it does not track multiple target objects and does not provide any mechanism to solve the blur. R. Cabido et al. [Non-Patent Document 23] proposed a tracking algorithm combining a particle filter and an imitation algorithm. It can track multiple objects, but does not address blindness. Liu and Sun [Non-Patent Document 24] used PF to track a target object represented by a rectangle. Traceability is improved by an incremental likelihood function which is a combination of histogram and Bhattacharyya similarity calculations. This method is very fast, but less accurate. Yang et al. [Non-Patent Document 25] proposed a method based on particle filtering for multi-object tracking including pseudo random sampling. This method does not solve the partial shape change or change of the tracked object. Cai et al [Non-Patent Document 26] proposed a particle filter approach combined with a mean shift algorithm, while Okuma et al. [Non-Patent Document 27] used an increased particle filter. Both methods use color histograms to represent the target model and are effective in tracking people and other non-rigid objects. However, a large number of particles are needed for pre-training.

[Non-Patent Document 1] E. Amer, E. Dubois and A. Mitiche, "Real-time system for high-level video representations: application to video surveillance," SPIE International Symposium on Electronic Imaging, Conf. on Visual Communication and Image Processing (VCIP), pp. 530-541, 2003. [Non-patent Document 2] M. Breitenstein, F. Reichlin, B. Leibe, E. Koller-Meier and L. Gool, "Online Multi-Person Tracking- by-Detection Form a Single Uncalibrated Camera," IEEE Transactions on PAMI, pp. 1820 - 1833, 2010. [Non-Patent Document 3] A. Ess, B. Leibe, K. Schindler and L. Van Gool, "Robust multi-Person tracking from a mobile platform," IEEE Transactions on PAMI, pp. 1831-1846, 2009. [Non-Patent Document 4] P. K. Sahoo, J. Sheu and K. Y. Hsieh, "Target tracking and boundary node selection algorithms for wireless sensor networks for internet devices," Information Sciences, vol. 230, pp. 21-38, 2013. [Non-Patent Document 5] S. L. Dockstader and A. M. Tekalp, "Multiple camera tracking of interacting and occluded human motion.", In Proceedings of IEEE, 89, 2001. [Non-Patent Document 6] Y. L. Hou and G. K. H. Pang, "People Counting and Human Detection in a Challenging Situation," IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, pp. 24-33, 2011. [Non-Patent Document 7] K. Chen, C. Lai, P. Lee, C. Chen and Y. Hung, "Adaptive learning for target tracking and true linking discovering across multiple non-overlapping cameras," IEEE Transactions on Multimedia, 2011 . [Non-Patent Document 8] TT Yang, Y. Zhang, X. Tong, X. Zhang and R. Yu, "A New Hybrid Synthetic Aperture Imaging Model for Tracking and Seeing People Through Occlusion," IEEE Transactions on Circuits and Systems for video technology, issue: 99, 2013. [Non-Patent Document 9] A. Yilmaz and O. Javed, "Object Tracking: A Survey," ACM Computing Surveys, Vol. 38, No. 4, Article 13, 2006. [Non-Patent Document 10] H. Yang, L. Shao, F. Zheng, L. Wang and Z. Song, "Recent advances and trends in visual tracking: A review," Neurocomputing, vol. 74, pp. 3823-3831, 2011. [Non-Patent Document 11] D. Beymer and K. Konolige, "Real-time tracking of multiple people using continuous detection," in ICCV, 1999. [Non-Patent Document 12] D. Rowe, I. Reid, J. Gonz ㅰ lez and J. Villanueva, "Unconstrained multiple-person tracking," in In: Proc. the 28th DAGM symposium, Berlin, Germany, 2006. [Non-Patent Document 13] X. Li, K. Wang, W. Wang and Y. Li, "A Multiple Object Tracking Method Using Kalman Filter," Proceedings of the IEEE International Conference on Information and Automation, Harbin, 2010. [Non-Patent Document 14] D. Comaniciu, V. Ramesh and P. Meer, "Kernel-based object tracking," IEEE Transactions on PAMI, p. 564-57, 2003. [Non-Patent Document 15] J. Allen, R. Xu and J. Jin, "Mean shift object tracking for a SIMD computer, in Proc. Int. Conf. Information Technology and Applications, Sydney, Australia, July 2005. [Non-Patent Document 16] J. Ning, L. Zhang, D. Zhang and C. Wu, "Robust mean-shift tracking with corrected background-weighted histogram," IET Computer Vision, vol. 6, no. 1, pp. 62-69, 2012. [Non-Patent Document 17] T. Zhao, R. Nevatia and B. Wu, " Segmentation and tracking of multimodal humans in crowded environments, "IEEE Transactions on PAMI, pp. 1198-1211, 2008. [Non-patent Document 18] C. Yoon, M. Cheon and M. Park, "Object tracking from image sequences using adaptive models in fuzzy particle filter," Information Sciences, vol. 253, pp. 74-99, 2013. [Non-Patent Document 19] Z. Khan, T. Balch and F. Dellaert, "A rao-blackwelled particle filter for eigentracking," in In Proc. IEEE Conf. CVPR, Washington DC, 2004. [Non-Patent Document 20] K. Nummiaro, E. Koller-Meier and V. Gool L., "An Adaptive Color-Based Particle Filter," Image and Vision Computing, vol. 21, no. 1, pp. 99-110, 2003. [Non-Patent Document 21] L. Jin, J. Cheng and H. Huang, "Intelligent Signal Processing and Communication Systems (lSPACS 2010) in International Symposium on Human Tracking in the Complicated Background Using Particle Filter Using Color- , 2010. [Non-Patent Document 22] C. Chang and R. Ansari, "Kernel Particle Filter for Visual Tracking," IEEE Signal Processing Letters, vol. 12, no. 3, 2005. [Non-Patent Document 23] C. R., S. M. A. and J. P. J., "High performance memetic algorithm particle filter for multiple object tracking on modern GPUs," Soft Computing, vol. 16, pp. 217-230, 2012. [Non-Patent Document 24] H. Liu and F. Sun, "Efficient visual tracking using particle filter with incremental likelihood calculation," Information Sciences, vol. 2012, pp. 141-153, 2012. [Non-Patent Document 25] C. Yang, R. Duraiswami and L. Davis, "Fast multiple object tracking via a hierarchical particle filter," IEEE International Conference on Computer Vision (ICCV), Vol. 1, 2005. [Non-Patent Document 26] Y. Cai, N. Freitas and J. Little, "Robust visual tracking for multiple targets," in 9th European Conference on Computer Vision (ECCV), Graz, Austria, 2006. [Non-Patent Document 27] K. Okuma, A. Taleghani, N. Freitas, J. Little and D. Lowe, "A Boosted Particle Filter: Multitarget Detection and Tracking," 8th European Conference on Computer Vision , Czech Republic, 2004. [Non-Patent Document 28] J. Li, Q. S. Jia, X. Guan and X. Chen, "Tracking a moving object via a sensor network with a partial information broadcasting scheme," Information Sciences, vol. 181, pp. 4733-4753, 2011. [Non-Patent Document 29] A. Djouadi, O. Snorrason and F. D. Graber, "Quality of training-sample estimates of the Bhattacharyya coefficient," IEEE Transactions on PAMI, vol. 12, 1990. [Non-Patent Document 30] M. S. Khalid, M. U. Ilyas, K. Mahmoo, M. S. Sarfaraz and M. B. Malik, "Kullback-Leiber divergence measure in correlation with gray-scale objects, 2nd Int. Conf. IIT, 2005. [Non-Patent Document 31] A. Doucet, N. De Freitas and N. Gordon, Sequential Monte Carlo methods in practice., Springer-Verlag, 2001. [Non-Patent Document 32] M. R. G. and S. C., "Visual Object Target Tracking Using Particle Filter: A Survey," International Journal of Image, Graphics and Signal Processing, vol. 6, pp. 57-71, 2013. [Non-Patent Document 33] C. Tomasi and T. Kanade, "Detection and Tracking of Point Features," Pattern Recognition, vol. 37, p. 165, 168, 2004. [Non-Patent Document 34] C. Harris and M. Stephens, "A combined corner and edge detector," in Proceedings of the 4th Alvey Vision Conference. p. 147, 151, 1988. [Non-Patent Document 35] X. Chen He and N. H. C. Yung, "Corner detector based on global and local curvature properties," Journal of Optical Engineering, 2008. [Non-Patent Document 36] A. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proc. IEEE Conf. Computer Vision and Pattern (CVPR), 2005. [Non-Patent Document 37] A. David, Forsyth and J. Ponce, Computer Vision, a modern approach, Prentice Hall, ISBN 0-13-085198-1, 2003. [Non-Patent Document 38] S. Suresh, K. Chitra and P. Deepak, "A Survey On Occlusion Detection," International Journal of Engineering Research & Technology (IJERT), vol. 2, no. 3, 2013. [Non-Patent Document 39] P. 2010, 2010. [Online]. Available: http://www.cvg.rdg.ac.uk/PETS2010/a.html#s2l1 .. [Accessed 20 2012]. [Non-Patent Document 40] P. 2000, "PETS 2000, 2000. [Online]. Available: ftp://ftp.pets.rdg.ac.uk/pub/PETS2000/. [Accessed 25 2012]. [Non-Patent Document 41] V. System, 28 9 2007. [Online]. Available: http://imagelab.ing.unimore.it/visor_test/video_details.asp?idvideo=91. [Accessed 12 5 2013].

The object of the present invention is to solve the above-mentioned problems, and focus on tracking multiple objects moving in a continuous image when an object is partially obscured by a background object or when two or more objects obscure each other In order to solve the occlusion problem, the multi-object tracking uses a similar strategy to the object reconstruction mechanism of a human, and provides a main feature point based multi-object tracking method.

In particular, it is an object of the present invention to extract a plurality of salient feature points (SFPs) from each target object, track each SFP individually in successive video frames, and determine which one of the SFPs of the object is blind Object tracking method based on key feature points that, when traced by uncooperative background conditions, utilizes the relative positions of correctly tracked other SFPs of the same object to gauge the exact location of the mis-tracked SFPs.

According to an aspect of the present invention, there is provided a method for tracking a plurality of objects detected in an image composed of a plurality of consecutive frames in time, the method comprising the steps of: (a) Extracting minutiae points and calculating a minimum bounding rectangle of each object including all major minutiae of each object; (c) predicting positions of major feature points of a next frame; (d) determining whether the main feature points of the predicted next frame of each object are erroneously or normally tracked, using the outlier analysis; (e) calculating a minimum bounding rectangle of a next frame of each object using the main feature points of the next frame of the normal track of each object; And (f) using the minimum bounding rectangle of the next frame of each object, modifying the main traits of each object.

According to another aspect of the present invention, there is provided a method for tracking a plurality of objects detected in an image composed of a plurality of consecutive frames in time, the method comprising the steps of: (a) extracting main feature points in a corresponding frame of the image; Calculating a minimum bounding rectangle of each object including all major feature points of each object; (b) calculating a feature descriptor for the main feature points of each object; (c) predicting positions of major feature points of a next frame; (d) determining whether the main feature points of the predicted next frame of each object are erroneously or normally tracked, using the outlier analysis; (e) calculating a minimum bounding rectangle of a next frame of each object using the main feature points of the next frame of the normal track of each object; (f) using the minimum bounding rectangle of the next frame of each object to modify the main traits of each object; And (g) modifying the feature descriptor for the main feature points.

According to another aspect of the present invention, there is provided a method for tracking a multi-object based on a feature point, the method comprising the steps of: generating a plurality of particles by a velocity of a main feature point and a Gaussian distribution at a position of a main feature point of the corresponding frame, The weight of the particle is determined by the Bhattacharyya distance between the main feature points, and the position of the particle selected by the obtained weight is predicted by the position of the main feature point of the next frame.

According to another aspect of the present invention, there is provided a method for tracking a multi-object based on a feature point, the method comprising: predicting a position of a particle having a maximum weight to a position of a main feature point of a next frame.

According to another aspect of the present invention, there is provided a method for tracking a multi-object feature based on a feature point, the method comprising: predicting a position of a minimum bounding rectangle of each object based on a relative position of predicted main feature points of each object; , The distribution of the positions of the predicted minimum bounding rectangles of all the predicted main feature points of each object is obtained and if the predicted main feature points are out of a predetermined range from the center of the distribution, .

According to another aspect of the present invention, there is provided a main feature point-based multi-object tracking method, wherein, in the step (d), if a position of a main feature point predicted deviates from a mean (m) of the distribution by a standard deviation Is judged to be mis-traced.

According to another aspect of the present invention, there is provided a multi-object tracking method based on a feature point, wherein the distribution is a distribution of positions of left-upper corners of a minimum bounding rectangle predicted from all predicted main feature points of each object.

According to another aspect of the present invention, there is provided a method of tracking a multi-object feature based on a feature point, the method comprising: averaging positions of a minimum bounding rectangle calculated based on relative positions of top- The ratio of the size of the minimum bounding rectangle to the relative position of the main tracked feature points of each object is equal to the ratio of the size of the minimum bounding rectangle to the relative position of the major feature points in the frame The size of the minimum bounding rectangle of each object is predicted and the size of the minimum bounding rectangle of the next frame is predicted as an average of the predicted sizes.

According to another aspect of the present invention, there is provided a method for tracking a multi-object based on a feature point, the method comprising the steps of: (a) detecting a relative position of major feature points in a corresponding frame, And the size of the minimum bounding rectangle is adjusted according to the size ratio of the minimum bounding rectangle.

According to another aspect of the present invention, there is provided a multi-object tracking method based on a feature point, wherein when one main feature point is included in a minimum bounding rectangle of at least two objects, the main feature point is judged to be duplicated, Is set to a speed.

In addition, the present invention is characterized in that, in the main feature point-based multi-object tracking method, the feature descriptor is a Histogram of Oriented Gradient descriptor (HOG) descriptor.

As described above, according to the multi-object tracking method based on the main feature points according to the present invention, when a plurality of SFPs are individually tracked within a video frame and are mistakenly tracked by the occlusion phenomenon, the relative positions of the correctly tracked other SFPs are utilized, Multiple objects can be successfully tracked, and significant object tracking accuracy can be achieved.

The major achievements of the method according to the present invention can be summarized as follows: (1) An effective way of dealing with moving multiple object tracking in the event of partial occlusion, one of the difficult tasks of moving multiple object tracking, is presented. The method according to the present invention has achieved significant object tracking accuracy, as shown in the experimental results. (2) We proposed an effective means to represent moving objects using bounding rectangles that closely surround the object's SFP. The tracking algorithm of the present invention enables more accurate tracking based on the SFP rather than the entire object widely used in the existing technology. (3) We define anomalies to detect false tracking points and propose a detection method of outliers. (4) The tracking method of the present invention is robust and configured so that the accurately tracked feature points play an important role in correcting the feature points traced by the blurring.

FIG. 1 is a diagram illustrating a human inference method for reconstructing an object in the occurrence of a general blurring phenomenon, in which (a) the whole body is visible, (b) the body is partially visible, A figure that shows that a person can predict an entire object (seen using a bounding rectangle), even if shown.
2 is a block diagram of a configuration of an overall system for implementing a main feature point based multi-object tracking method according to an embodiment of the present invention.
3 is a flow chart illustrating a method for tracking a main feature point based multi-object according to an embodiment of the present invention.
FIG. 4 is a data structure diagram of an object and an SFP according to the present invention, wherein (a) is an attribute of an object, and (b) is a data structure diagram of attributes of the SFP.
5 is a table showing operator definitions for vector operation according to the present invention;
6 is a view illustrating an object representation according to the present invention, wherein each circle represents the position of the characteristic point, the rectangle surrounding the points is the minimum boundary rectangle, and the dotted line represents the characteristic point Fj (R).
FIG. 7 is a diagram showing an example of the SFP position prediction using the maximum weight among particles and the average weight of particles according to the present invention. FIG. 7 (a) shows the original position of SFP in frame 1, (b) The red dot is the particle with the maximum weight, the sky color dot indicates the predicted position using the average weight of the particles, and the SFP is better than the way that the particle with the maximum weight predicts using the average weight of the particles. FIG. 7 is an illustration showing that the position is accurately predicted; FIG.
FIG. 8 is a diagram for predicting the left-to-upper corner position of a minimum bounding rectangle derived by the SFP according to the present invention, in which (a) is an ideal case, (b) Or that the prediction can be changed by a change in the characteristic point inside the minimum bounding rectangle.
9 is a diagram illustrating an example of the positional correction of an outlier SFP according to the present invention. In FIG. 9, (a) an original position of a feature point is indicated by a red dot, (b) Fig. (C) shows the predicted position of the feature point after the position modification, and the position of the outlier SFP after the outlier analysis is modified based on the previous relative position.
Figure 10 is an exemplary diagram of an overlapped SFP according to the present invention (indicated by the light blue dots), wherein the SFP belongs to the person walking from the left (object 1), and modifying the SFP descriptor at this location, An example showing that false tracking can be caused.
FIG. 11 shows a result of tracking of each method for seven video frames (2nd, 5th, 8th, 11th, 13th, 16th, and 20th) according to the experiment of the present invention. (b) MS [Non-Patent Document 14], (c) CBWH [Non-Patent Document 16], and (d) Resultant image according to the method of the present invention. As can be seen from the figure, KFOH The minimum bounding rectangle predicted by the MS method generates a large minimum bounding rectangle and the CBWH generates a long bounding minimum bounding rectangle. On the other hand, the method of the present invention shows that the two persons are tracked more accurately.
12 is a graph showing the relationship between the minimum bounding rectangle position (left-top position coordinates) of the target object and the minimum bounding rectangle position (left-top position coordinates) predicted by each method in the video frame according to the experiment of the present invention. As a graph of Euclidian distance comparison results, (a) a comparison with a female, (b) a graph with a comparison with a male.
13 is a table for comparison of the tracking accuracy of each method for a female (object 1) and a male (object 2) according to the experiment of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS Hereinafter, the present invention will be described in detail with reference to the drawings.

In the description of the present invention, the same parts are denoted by the same reference numerals, and repetitive description thereof will be omitted.

First, a configuration of an overall system for implementing a main feature point-based multi-object tracking method according to an embodiment of the present invention will be described with reference to FIG.

As shown in FIG. 2, the main feature-based multi-object tracking method according to the present invention can be implemented in a program system 30 on a computer terminal 20 that receives image data 10 and tracks multiple objects in the image. have. That is, the main feature point based multi-object tracking method may be implemented by a program and installed in the computer terminal 20 and executed. A program installed in the computer terminal 20 can operate as a single program system 30. [

The video data 10 is video stream data composed of images (or frames) continuous in time.

Meanwhile, as another embodiment, the main feature point based multi-object tracking method may be implemented by a single electronic circuit such as an ASIC (application-specific semiconductor) in addition to being operated by a general-purpose computer. Or a dedicated terminal 30 dedicated to processing only a plurality of object traces in the video stream data. This will be referred to as a main feature point-based multi-object tracking device 30. Other possible forms may also be practiced.

Next, before describing the present invention, the techniques used in the present invention will be described in more detail.

First, a particle filter (or particle filter) will be described.

The particle filter is known as a recursive Bayesian estimation method that predicts posterior probability using state density propagation rules. Bayesian tracking recursively computes the posterior density p (x ^t | z ¹ ^{: t} ) of the current state based on previously observed information. The probability density function is estimated through two steps (i.e., the prediction step and the update step). In the first Markov process, the probability density function (pdf) is obtained as follows.

[Equation 1]

&Quot; (2) "

Where κ is the state p | normalization constant and independent from the ^{(z t x t), p} (x t | z 1: t-1) and ^{p (x t | x t -1} ) is the likelihood function, respectively, the dynamic model And is temporally ahead of x ^t . Particle filters are implemented using recursive Bayesian filters through Monte Carlo simulation. Is: {n = 1, ..., N s t (n)} to represent each sample, the weight _{(w (j)) | (} z t x t) is set in the sample conditional state density p at time t The likelihood or conditional pdf is estimated by the following formula.

&Quot; (3) "

The post-pdf is updated using the weighted particles as shown in equation (4).

&Quot; (4) "

Particles are propagated according to the dynamic model of the system. Particle filters can handle nonlinear and non-Gaussian distributions by maintaining multiple hypotheses.

Next, the corner detecting method will be described.

If two dominant and unequal borders intersect in the vicinity of a point, the point is considered to be a corner point. In other words, the corner point is the point with the greatest curvature in the curve. Many corner detection methods have been devised to extract corner points [Non-Patent Documents 33, 34]. In the present invention, an algorithm (Non-Patent Document 35) in which the contour line is extracted from the Canny edge map and the absolute value of the curvature as the initial corner candidate is calculated is used for detecting the corner point. Then an adaptive curvature threshold is applied to remove the rounded corners from the initial list and the false corners are eventually removed due to the quantization noise.

Next, the Histogram of Oriented Gradient (HOG) feature will be described.

HOG [Non-Patent Document 36] is a basic feature descriptor used in computer vision. The HOG can capture the border or gradient structure that represents the partial shape properties of the object and adapt to the geometric changes of the partial photometer. The HOG is computed mainly as a block (or window) and is represented as a normalized histogram of the normalized direction of the image gradient within the dense grid. The gradient operator (usually Sobel) is applied to calculate the direction and size of the gradient. The reference and the direction are calculated using equations (5) and (6), respectively.

&Quot; (5) "

&Quot; (6) "

All pixels in the window are adopted within the histogram section along the gradient direction with the gradient size. At the end, the size of the accumulated gradient is normalized.

Next, we will explain the Bhattacharyya street.

The Bhattacharyya distance is used to measure the similarity or proximity between two separate or continuous normal distributions (Non-Patent Document 29). The distance D between the histograms H1 and H2, which have two equal size distributions, is calculated using the following equation.

&Quot; (7) "

Where n is the number of histogram intervals. Short distances indicate that matching is good between the two histograms. Usually, the distance score range is between 0 and 1.

Next, a main feature point based multi-object tracking method according to an embodiment of the present invention will be described in detail with reference to FIG.

In the multi-object tracking method based on the main feature points according to the present invention, salient feature points (SFPs) are extracted from each object when a partial occlusion occurs between the objects in the image, and based on the feature points, At the same time. The method proposed by the present invention is composed of seven steps as shown in FIG.

As shown in FIG. 3, the main feature point based multi-object tracking method according to the present invention includes the steps of (a) extracting main feature points S10, (b) calculating feature descriptors S20, (D) a main feature point detection step (S40) which is misclassified through outlier analysis, (e) a minimum boundary rectangle detection step (S50) for the main feature points tracked, f) the step of correcting the position of the main feature points that are mis-tracked, S, and the step of modifying the feature descriptors of the main feature points.

First, in step (a), objects of interest are detected from the first start frame of the video stream, and key feature points (SFPs) are extracted from each object using a corner detection algorithm. Then, And represented by a minimum bounding rectangle containing points (S10).

Next, in step (b), a feature descriptor is calculated from each SFP (S20). As a feature descriptor, HOG (histogram of oriented gradient descriptor), which is well known in the field of computer vision, is used.

Next, in step (c), a SFP position is predicted in a next frame using a particle filter (S30). Information on the shape of the object can be obtained using the relative position information of each SFP, and this relative position information is an important factor for evaluating whether the SFP is correctly detected.

Next, in step (d), an SFP that has been misjudged through outlier analysis is detected (S40).

Next, in step (e), the minimum bounding rectangle is detected using the correctly tracked SFP (S50).

Next, in step (f), the minimum bounding rectangle formed by the SFPs is corrected by considering the relative positions of the SFPs in the previous frame with respect to the mis-traced SFP (S60).

Next, in step (g), the feature descriptor is modified for the SFPs whose positions are correctly corrected (S70).

Steps (c) and (g) are repeated until the next frame remains, and the method of the present invention can successfully track multiple objects in this manner.

Each step will be described in more detail below.

First, a step S20 of extracting main feature points (S10) and calculating feature descriptors will be described in more detail.

Objects of interest are detected from the first start frame of the video stream and key feature points (SFPs) are extracted from each object using a corner detection algorithm (Non-patent Document 35) (S10).

A minimum bounding rectangle containing feature points extracted from each object can be generated, and each object is represented by this minimum bounding rectangle. As shown in FIG. 4 (a), each extracted object is represented by a set of a minimum bounding rectangle (B), a velocity (velocity, u), and a salient feature point (SFP). 4 (b), each SFP includes a location p, a relative location r, a descriptor h, and a velocity of the feature point v ), And other flags.

The table in Fig. 5 defines operators so that the calculation between these vector quantities is possible. Let the vector r = (r ₁ , r ₂ ), the vector p = (p ₁ , p ₂ ) and the vector l = (l ₁ , l ₂ )

,

,? And? Are defined as shown in the table of FIG.

The minimum bounding rectangle (B) for the target object consists of the left-topological coordinates l = (x, y) of the minimum bounding rectangle and the rectangle size s = (width, height) ) Is calculated from the moving displacement of the left-up coordinate value of the rectangle as shown in the following equation.

&Quot; (8) "

Here, B ^t and B ^t -1 are the t-th and (t-1) th frames of the object O, respectively.

As shown in FIG. 6, each object can be represented by a minimum bounding rectangle formed by a set of minutiae points. Each SFP (F _j ) is composed of a plurality of attributes as shown in FIG. 4 (b).

p is the position of the SFP, and r is the position vector of the SFP (F _j ) with respect to the left-to-right coordinates of the rectangle.

&Quot; (9) "

Therefore, r represents the position of the SFP in the minimum bounding rectangle of the object. Conversely, the location and size of the minimum bounding rectangle can be estimated using the r values of the correctly tracked feature points.

The descriptor h represents the characteristic of the feature point and uses a Histogram of Oriented Gradient descriptor (Non-Patent Document 36) widely known in the field of computer vision. The velocity v of the feature point represents the change in position in two consecutive frames and is calculated by the following equation.

&Quot; (10) "

Where p ^t and p ^t ^-1 are the positions of the SFP (F _j ) in the t-th and (t-1) -th frames, respectively. The SFP has two flags, an Outlier flag indicating whether or not the SFP has been correctly tracked, and an Overlapped flag indicating whether the SFP belongs to a position occupied by a plurality of objects .

Next, the steps (S30 to S70) for tracking the object will be described in more detail.

In general, object tracking using a particle filter (PF) estimates the posterior distribution of the position of a target object in a frame based on information obtained from past observations. The method according to the present invention also adopts this method to predict the position of the target object in each frame in the video stream.

In the normal method, the tracking algorithm tracks the entire object using a descriptor or template representing the object. However, this method has a problem that the error rate increases even if the position of the object or the condition of the image is slightly changed.

In the method according to the present invention, SFP tracking results are integrated to track each SFP of an object and predict the position of an object in a next frame. Next, the SFP attributes are modified based on the current position of the SFPs. This correction process reflects the current conditions of the feature point, and enables precise tracking even if the shape of the feature points varies slightly between frames.

Explain about particles.

The state of the SFP (F _j ) is represented by {F _j .p, F _j .r, F _j .v, F _j .h} and the set of n particles Z = {z _i | i = 1, 2, 3, ..., n} is generated from each SFP based on the current state. The particle z _i of feature point F _j of object O is generated by the following equation.

&Quot; (11) "

Here, O. F _j . v is the velocity of F _j calculated based on the position of F _j in the previous frame and N (0, d ² ) means a Gaussian distribution with mean of 0 and variance of d ² .

The generated particles allow us to estimate the possible position of the SFP in the video frame. The feature descriptor can be used to validate the particle, for which the HOG feature OF _j .z _i .h is calculated at each location OF _j .z _i .q predicted by the particle. Each particle is assigned a weight (w) using the following equation.

&Quot; (12) "

Here, BD (X, Y) is a Bhattacharyya distance (non-patent document 29) representing overlap or similarity between two distributions X and Y.

The weight of each particle is assigned by the similarity between the SFP being tracked and the SFP at the position predicted by the particle. Therefore, a larger weight means that the particle is likely to exhibit a similar SFP in the current frame.

According to the present invention, the average using the weights does not always present the exact position of the SFP, and the position of the SFP is at any position within the predicted position range. Since the algorithm presented in the present invention is based on SFP, it is important to predict the exact position of the SFP.

Therefore, in the method according to the present invention, each tracker selects a particle having the largest weight value among the particles generated by the following equation, and becomes a predicted position of the SFP that the particle is tracking. This approach gives a more accurate result than averaging the predicted positions of all particles.

&Quot; (13) "

FIG. 7 is a diagram comparing the SFP position prediction using the maximum weight among the particles and the average weight of the particles. 7 (a) shows the original position of the SFP in frame 1, and Fig. 7 (b) shows the predicted position in frame 2. Fig. Here, the red dot is the particle with the maximum weight, and the sky color dot indicates the predicted position using the average weight of the particles. As can be observed from the figure, the position of the SFP is more accurately predicted than the way that the particle with the largest weight predicts using the average weight of the particles.

Next, the outlier analysis and the minimum bounding rectangle correction step will be described.

In order to analyze the outliers, a method similar to the method proposed by RANSAC [Non-Patent Document 37] is used. However, in the method proposed in the present invention, detection is performed by only one execution instead of a plurality of iterations to detect an abnormal value. Based on the relative position of the feature point SFP (F _j ) of the object O, the position of the left-upper corner of the object's minimum bounding rectangle can be predicted using the following equation.

&Quot; (14) "

That is, p is obtained in the next frame (frame to be predicted) using the particle, and the relative position r in the next frame regards the same size / direction as the vector amount of the current frame. Then we predict the left-upper corner position of the minimum bounding rectangle (B) with p and r in the next frame.

In order to predict the position of the left-upper corner of the minimum bounding rectangle derived by the SFP, as shown in FIG. 8 (a), the ideal case is that all the predictions are directed to the left-upper corner of the minimum bounding rectangle.

However, in general, it is difficult to accurately track all SFPs because the tracking can fail or the prediction may change due to changes in feature points within the minimum bounding rectangle.

Therefore, only the prediction for the SFPs whose flag is outlier = overlapped = 0 is considered, and the Hough-like method is used to predict the left-upper corner (1) of the minimum bounding rectangle of the object of the current frame. As shown in FIG. 8 (b), the prediction by all feature points except for F ₂ is similar to each other. The prediction by F ₂ may cause tracking error, and thus, the case where most of the predictions included in distribution C and other prediction results are abnormal, is defined as follows.

Definition 1. Outlier: An outlier is defined as an outlier. An outlier is defined as an outlier. The outlier is defined by the following equation: SFP = do.

&Quot; (15) "

The position (O.B.l) of the left-upper corner of the object's minimum bounding rectangle is calculated as the predicted average value of the correctly tracked SFPs, outliers = overlapped = 0. The size of the minimum bounding rectangle is calculated using the relative position (r) of the SFPs.

The ratio of the size of the minimum bounding rectangle to the relative position of the correctly tracked SFPs (F _j ) can be estimated to be approximately the same.

&Quot; (16) "

Here, OBs ^t ^-1 is the size of the minimum bounding rectangle in the (t-1) th frame, and O.Fj.r ^t and O.Fj.r ^t ^-1 are the t-th and (t-1) The relative position of the SFP in the frame.

OBs ^t ^-1 is calculated by the following equation.

&Quot; (17) "

Where Bs ^t is the size of the minimum bounding rectangle in the t th frame and N is the number of correctly tracked SFPs.

Since the relative position of the SFP of the target object varies depending on the frame, the position of the correctly tracked SFPs is fixed after adjusting the size of the minimum bounding rectangle.

However, the position of the outlier SFP needs to be corrected using the following equation based on the previous relative position and the size of the minimum bounding rectangle.

&Quot; (18) "

Shows a modified location of the position of the corner - here, _j OF .p and OBl the left of the minimum bounding rectangle for each outlier it characterized in that the t-th frame. OF _j . ^{R t} ^-1 is the relative position of the outlier feature point in the (t-1) th frame.

FIG. 9 shows a process of modifying the position of an outlier SFP. 9 (a) shows the original position of the characteristic point as a red dot, Fig. 9 (b) shows the predicted position of the characteristic point in the next frame, and Fig. 9 Show the predicted location. After the outlier analysis, the position of the outlier SFP is appropriately modified based on the previous relative position.

Next, a method for solving the partial clipping phenomenon will be described.

Blindness is a common phenomenon when tracking multiple objects and is known to be a difficult problem that is not easy to solve. When a target object is obscured by an obstacle, the obstructed SFP becomes difficult to distinguish from the SFP of another object or the SFP of the background. When this occurs, it can not match with the exact descriptor of the obscured SFP, and erroneous position information is obtained compared to the position of the correctly tracked SFPs. In the present invention, this SFP is regarded as an outlier, and the outlier problem is solved based on the position of the correctly tracked SFPs.

Also, when two or more objects appear in close proximity or overlap in one frame, a case may occur in which a part of the SFP of the target object in the next frame is matched with the SFP of another object. In this case, the SFP is more likely to be anomalies, so modifying the SFP descriptor at that location can lead to error tracking that tracks the SFPs of other objects. In this case, the algorithm proposed in the present invention effectively solves the problem by not modifying the SFP descriptor with the overlapped flag set to 1.

FIG. 10 shows an example of an overlapped SFP. The overlapping SFP displayed by the light blue dot actually belongs to the person walking from the left (object 1), but it can best match the shoulder of the person wearing the black shirt (object 2). Therefore, modifying the SFP descriptor at this location may cause false tracing in the next frame. To solve this problem, we define the nested SFP as follows.

Definition 3. Overlapped SFP: Define this SFP as a superimposed SFP when the position of the SFP is included in one or more minimum bounding rectangles and set the flag of the SFP (F _j ) of the object O as the following expression do. Where R is a set of real numbers and m is the number of objects.

&Quot; (19) "

Where B is the minimum bounding rectangle. p and q are arbitrary real numbers. OF _j . p ∈ B _p ∩ B _q is OF _j . p and p is a description of the situation (overlap) belonging to any two minimum bounding rectangle (B).

Calculating the speed of a nested SFP is not accurate because the SFP is overlaid on many objects. Therefore, for nested SFPs, the velocity is calculated using the following equation:

&Quot; (20) "

As a result, accurately tracked SFPs can accurately and effectively track objects even when there is a change in the speed and direction of the object in the event of partial occlusion.

If the particular SFP p is an overlapping situation belonging to two minimum bounding rectangles (B), the flag overlapped = 1 is set, in which case the speed v of the p is not recomputed and the speed u of the object is used.

Next, modification of the minutiae descriptor will be described.

The shape of various parts of the target object varies depending on the frame due to the body movement and posture change. Therefore, tracking the SFP using the initial SFP descriptor (HOG) can lead to inaccurate tracking by predicting the location of the erroneous feature point in the next frame. Therefore, accurately tracked SFP descriptors are calculated using the HOG at the current frame in the current frame. However, even after correcting the position, the predicted position of the outlier SFP may not be accurate, and if the technician is modified using this, the desired SFP is not obtained. In this case, leave the technician untouched until you find another SFP that matches exactly.

Next, the effects of the present invention through experiments will be described in detail.

Three experimental data sets were used to verify the effectiveness of the algorithm developed in the present invention. Parameters such as the number of particles and the size of the HOG window were determined experimentally, the number of particles was 20, and the size of the window for calculating the HOG descriptor was 13x13. In order to verify the performance of the developed algorithm, three techniques, KF with occlusion handling (KFOH) [Non-Patent Document 13], mean shift [Non-Patent Document 14] and CBWH [Non-Patent Document 16] Performance comparison was performed. As a measure for performance evaluation, Euclidian distance d between the minimum boundary rectangle position of the target object and the minimum boundary rectangle position predicted by the developed algorithm was used. The Euclidian distance d is defined by the following equation.

&Quot; (21) "

Here, (x ₁ , y ₁ ) is the coordinate value of the left-upper corner of the minimum bounding rectangle of the target object, and (x ^GT , y ^GT ) represents the minimum bounding rectangle position of the target object identified with the naked eye. We use precision and recall as measures to verify the accuracy of each method. The precision and recall are defined by the following equations.

&Quot; (22) "

Where B ^GT and B ^Alg represent the minimum bounding rectangle of the target object identified by the naked eye and the minimum bounding rectangle predicted by the developed algorithm, respectively, and area () is calculated as the number of pixels in the minimum bounding rectangle.

PETS 2010 [Non-Patent Document 39] is used as an experimental data set, and FIG. 11 shows a result obtained by applying four algorithms to a video stream of a separated scene after two persons approach each other and overlap each other Respectively. (A) is KFOH (non-patent document 13), (b) is MS (non-patent document 14), and (c) CBWH [Non-Patent Document 16], and (d) shows the results of tracking by the method developed in the present invention. As can be seen from the figure, KFOH generates one large minimum bounding rectangle in case of occlusion, and the minimum bounding rectangle predicted by MS method contains many parts of background, CBWH is one long minimum bounding Create a rectangle. On the other hand, it can be seen that the method developed by the present invention tracks two persons more accurately.

The value obtained by comparing the Euclidian distance d between the minimum bounding rectangle position (left-top position coordinate) of the target object identified by the naked eye in the video frame and the minimum bounding rectangle position (left-top position coordinate) predicted by the developed algorithm is 12.

12 is a graph comparing Euclidian distances between the minimum bounding rectangle positions (left-top position coordinates) of the target object and the minimum bounding rectangle positions predicted by each method (left-top position coordinates) (A) is a comparison of women, and (b) is a comparison of men. In case of occlusion, the distance due to KFOH is large because this method can not track multiple objects correctly. MS and CBWH show a significant error rate for the second person. On the other hand, the method developed by the present invention shows that the tracking accuracy of the two persons is higher than the other methods.

The table in FIG. 13 is a table comparing the accuracy of each method for female (object 1) and male (object 2) using the precision and recall scales. KFOH shows a high recall in all frames, but in frames 11-16 the precision is degraded. This is because this method combines a female object and a male object to generate a single large minimum bounding rectangle in the occlusion phenomenon. The MS and CBWH methods show fairly high precision and recall for women and men in almost every frame. However, in frames 13-16, the MS is far from the actual minimum position of the bounding rectangle due to the mutual overlap of objects. In frame 13, CBWH generates a long minimum boundary rectangle for both female and male, recall. On the other hand, the method developed by the present invention maintains high precision and recall in all frames compared to other methods.

The invention made by the present inventors has been described concretely with reference to the embodiments. However, it is needless to say that the present invention is not limited to the embodiments, and that various changes can be made without departing from the gist of the present invention.

10: learning data 12: EMG signal
20: computer terminal 30: walking step recognition and prediction device

Claims

A multi-object tracking method based on feature points for tracking a plurality of objects detected in an image composed of a plurality of consecutive frames in time,
(a) extracting major feature points in a corresponding frame of the image, and calculating a minimum bounding rectangle of each object including all the major feature points of each object;
(c) predicting positions of major feature points of a next frame;
(d) determining whether the main feature points of the predicted next frame of each object are erroneously or normally tracked, using the outlier analysis;
(e) calculating a minimum bounding rectangle of a next frame of each object using the main feature points of the next frame of the normal track of each object; And
(f) using the minimum bounding rectangle of the next frame of each object to modify the main traits of each object to be tracked.

A multi-object tracking method based on feature points for tracking a plurality of objects detected in an image composed of a plurality of consecutive frames in time,
(a) extracting major feature points in a corresponding frame of the image, and calculating a minimum bounding rectangle of each object including all the major feature points of each object;
(b) calculating a feature descriptor for the main feature points of each object;
(c) predicting positions of major feature points of a next frame;
(d) determining whether the main feature points of the predicted next frame of each object are erroneously or normally tracked, using the outlier analysis;
(e) calculating a minimum bounding rectangle of a next frame of each object using the main feature points of the next frame of the normal track of each object;
(f) using the minimum bounding rectangle of the next frame of each object to modify the main traits of each object; And
(g) modifying the feature descriptor for the major feature points.

3. The method according to claim 1 or 2,
In the step (c), a plurality of particles are generated by the velocity of the main feature point and the Gaussian distribution at the positions of the main feature points of the frame, and the particles are generated by the Bhattacharyya distance between the generated particles and the main feature points. And predicting the position of the particle selected by the obtained weight to the position of the main feature point of the next frame.

The method of claim 3,
And predicting the position of the particle having the maximum weight to the position of the main feature points of the next frame.

3. The method according to claim 1 or 2,
In the step (d), the outlier analysis predicts the position of the minimum bounding rectangle of each object on the basis of the relative positions of the predicted major feature points of each object, Wherein a distribution of the positions of the boundary rectangles is obtained, and when the positions of the predicted main feature points deviate from the center of the distribution to a predetermined range, it is determined that the main feature points are mistakenly tracked.

6. The method of claim 5,
Wherein in the step (d), if the predicted position of the main feature point deviates from the mean (m) of the distribution by more than the standard deviation (?) X 2, it is determined that the main feature point is mis- Object tracking method.

6. The method of claim 5,
Wherein the distribution is a distribution of locations of left-upper corners of a minimum bounding rectangle predicted from all predicted major feature points of each object.

3. The method according to claim 1 or 2, wherein in the step (e)
A position of a minimum bounding rectangle of each object is calculated by averaging the positions of the minimum bounding rectangles calculated based on the relative positions of the main tracking points of each object,
Since the ratio of the size of the minimum bounding rectangle to the relative position of the main tracked feature points of each object is the same as the ratio of the size of the minimum bounding rectangle to the relative position of the major feature points in the frame, And estimating the size of the minimum bounding rectangle of the next frame as an average of the predicted sizes.

3. The method according to claim 1 or 2,
In the step (f), the location of the main feature points of the object that are mis-tracked is modified according to the relative position of the major feature points in the frame and the ratio of the size of the minimum bounding rectangle in the frame and the next frame Multi - object tracking method based on key feature points.

3. The method according to claim 1 or 2,
Characterized in that if one main feature point is included in the minimum bounding rectangle of at least two objects, the main feature point is judged to be duplicated, and the speed of the main feature point is set to the speed of the object. .

3. The method of claim 2,
Wherein the feature descriptor is a Histogram of Oriented Gradient descriptor (HOG) descriptor.