TWI415032B

TWI415032B - Object tracking method

Info

Publication number: TWI415032B
Application number: TW98136879A
Authority: TW
Original assignee: Univ Nat Chiao Tung
Priority date: 2009-10-30
Filing date: 2009-10-30
Publication date: 2013-11-11
Also published as: TW201115506A

Abstract

The present invention provides an object tracking method. First, an object image region is located in an initial image taken from a sequence of consecutive images. The region is divided into three object image blocks, and their adaptive histograms are obtained. Then, according to the entropy of the distribution of the object's feature color blocks in the image, a plurality of image blocks are dynamically selected in the other consecutive images, and their adaptive histograms are also obtained. Tracking errors are then detected. If at least two adjacent image blocks in a consecutive image match at least two of the object image blocks, the location of the object in that image can be inferred and tracked. The invention can therefore obtain the image features of a target object rapidly. This saves overall computation time and also allows the object's movement path to be identified correctly.

Description

Object tracking method

The present invention relates to the application of image recognition technology, and in particular to an object tracking method.

Object tracking has a wide range of applications, and tracking the human body is the most common of them. Conventional human-tracking techniques mostly follow one of two approaches: figure-ground segmentation or temporal correspondence. Figure-ground segmentation methods mostly treat a single image as the unit of processing and try to detect the target object within that image. Temporal-correspondence methods instead take one segmented object as the initial image and try to find how the same tracked object changes across consecutive frames.

Segmentation methods can be further subdivided into five sub-categories: background subtraction, motion-based segmentation, depth-based segmentation, appearance-based segmentation, and shape-based segmentation. In human tracking, most methods based on human figure segmentation rely on well-known detection or classification algorithms such as support vector machines (SVM) and AdaBoost.

For example, in 2001 A. Mohan et al. published an algorithm in IEEE Trans. on Pattern Analysis and Machine Intelligence that uses SVMs to detect four parts of the human body and then combines them with a further SVM to detect the complete body. This algorithm detects frontal human bodies well and reduces the impact of occlusion. However, it cannot track a body seen from the side, and searching an image for body parts with SVMs is slow.

In 2004, A. Shanshua et al. presented a single-frame human detection algorithm at the Intelligent Vehicle Symposium. It divides the human body into nine parts and uses AdaBoost to detect these parts, and from them the people in a single image. Like most single-image human detection algorithms, it easily misclassifies non-human background as a person. It also cannot distinguish different individuals when several people are present, which makes it difficult to build each person's movement trajectory.

On the other hand, algorithms based on temporal correspondence mostly focus on finding the same object across consecutive images. Different tracking targets have different appearances, and these appearances are often difficult to train in advance. Such methods therefore usually extract simple, easy-to-use features on the fly and then detect those features in consecutive images to find the object's spatial variation over time.

The most representative of these is the tracking algorithm published by K. Nummiaro et al. in 2003 in Image and Vision Computing. It uses a color histogram as its main feature and a particle filter to establish the temporal correspondence. This method is widely used in many tracking techniques. However, color histograms discriminate poorly between objects, so other features are often needed to assist it.

The following example shows a color histogram extracted with a particle filter in the conventional technique. FIG. 1 is a grayscale photograph containing two image blocks: a face image block 10 and a background image block 20. Both blocks are converted into color histograms with a particle filter, and each range is divided evenly into eight color bins. The color distributions of the two blocks are then accumulated on the same color histogram, as shown in FIG. 2.

The face image block 10 and the background image block 20 have similar gray levels, so the pixels in the image may all fall into a small number of color bins. As a result, the conventional color histogram can hardly distinguish the two image blocks.
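The bin-collapse problem can be reproduced with a short sketch. The two 32×32 gray patches below are synthetic stand-ins for the face block 10 and the background block 20. Their gray levels (100 to 129 and 115 to 144) are invented for illustration. Each patch is quantized into eight uniform bins over 0 to 255, as in the conventional histogram described above.

```python
import numpy as np

def uniform_histogram(block, n_bins=8):
    """Conventional histogram: n_bins equal-width bins over the 0..255 range."""
    counts, _ = np.histogram(block, bins=n_bins, range=(0, 256))
    return counts

# Hypothetical patches with similar gray levels (bin width is 256 / 8 = 32).
face = (np.arange(1024) % 30 + 100).reshape(32, 32)        # levels 100..129
background = (np.arange(1024) % 30 + 115).reshape(32, 32)  # levels 115..144

face_hist = uniform_histogram(face)
background_hist = uniform_histogram(background)
print(np.flatnonzero(face_hist), np.flatnonzero(background_hist))
```

Both patches occupy only bins 3 and 4. Their histograms therefore carry very little detail with which to tell them apart.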

In addition, in 2003 D. Comaniciu et al. published an approach similar to that of K. Nummiaro et al. in IEEE Trans. on Pattern Analysis and Machine Intelligence. This work focuses on the kernel function used to compute the color histogram and on the distance function used to measure similarity. It relies mainly on mean shift to establish the temporal correspondence. Because the method resembles that of K. Nummiaro et al., it likewise discriminates poorly between objects.

In 2004, C. Chang et al. published a body-part-based algorithm for tracking periodic human motion in IEEE Signal Processing Letters. It integrates mean shift with a particle filter to establish the correlation of body parts over time. This algorithm is mainly intended for tracking periodic limb motion viewed from the side and is not suitable for tracking people during general activity.

In addition, some conventional human-tracking techniques address both human segmentation and temporal correspondence. In 1997, R. Wren et al. proposed a system called PFinder, intended mainly for human tracking in human-computer interaction. It segments the human body with a Gaussian background model. For temporal correspondence, it combines color and position information into a vector and computes the similarity between consecutive frames from it. The work focuses on the framework that combines segmentation with temporal correspondence. The features and the correspondence method it uses are rather simple, so it can only be used in relatively simple scenes.

In 2000, I. Haritaoglu et al. proposed an algorithm for tracking body parts. It segments the human body using a Gaussian background model supplemented with rules and takes the endpoints of the body silhouette contour as limb anchor points. The head anchor point in each image serves as a reference for estimating the positions of the other anchor points, which establishes the temporal correspondence of the limbs. Such a framework, however, has difficulty coping with more complex limb changes.

In 2007, B. Wu and R. Nevatia published a study in Intl. Jour. of Computer Vision that combines body-part detection with mean-shift tracking. The detection stage uses AdaBoost to detect the head, torso, and legs, as well as the whole body, in each image. The spatial relationships of these detections across consecutive images are then used to estimate displacement over time. When the spatial relationships of the detected parts deviate too much between consecutive images, mean shift is used to adjust the tracking result. The algorithm remains mainly detection-based and does not examine the displacement correlation between consecutive images in depth. The link between mean-shift tracking and body-part detection is also weak, so the resulting trajectories do not reflect actual human motion well.

However, most prior art says little about the possibility of tracking errors or about how to find the same person again after an error occurs. Methods based on human segmentation suffer less from tracking errors. The cost is execution speed, and it is still difficult to guarantee that the resulting trajectory belongs to a single person.

In 2006, S. L. Dockstader and N. S. Imennov proposed treating the human body as an object made of 15 connected endpoints and using a hidden Markov model to predict when tracking errors occur. This method introduced the idea of detecting tracking errors from the temporal changes of the tracked endpoints. Unfortunately, the 15-endpoint body model is difficult to locate automatically in images, and the study does not describe how to correct an error once it has occurred.

Therefore, tracking technology still has room for improvement, whether in figure-ground segmentation, temporal correspondence, a combination of the two, or the handling of tracking errors. In view of this, the present invention proposes an object tracking method that addresses these shortcomings of the prior art.

The main object of the present invention is to provide an object tracking method that divides the tracked image region into several sub-blocks, tracks each sub-block separately, and performs an overall error analysis. The object can therefore still be tracked when a small part of it is occluded, and errors during tracking can be corrected automatically, which improves tracking accuracy.

Another object of the present invention is to provide an object tracking method that converts the color features of the extracted image blocks into adaptive histograms so that image blocks can be compared by color. This improves tracking correctness. It also avoids the loss of tracking efficiency caused by extracting too much useless color detail.

A further object of the present invention is to provide an object tracking method that adjusts the number of feature particles sampled from the image data. The adjustment follows the entropy of the tracked object's distribution in the image, which improves tracking performance.

To achieve the above objects, the present invention provides an object tracking method with the following steps:

1. Select an initial image and the consecutive images that follow it.
2. Define at least one object image region in the initial image, divide it into at least three object image blocks, and obtain for each block a first adaptive histogram representing its feature color-block distribution.
3. According to the entropy of the object image blocks in the consecutive images, dynamically select a plurality of image blocks in the consecutive images and obtain their second adaptive histograms. The number of image blocks depends on the entropy.
4. Compare the first adaptive histograms with the second adaptive histograms to detect whether the feature color-block distributions of at least two adjacent image blocks match those of the object image blocks.
5. From the positions of the matching image blocks, obtain the displacement path of the object in the consecutive images.

The objects, technical content, features, and effects of the present invention will be more readily understood from the following detailed description of embodiments together with the accompanying drawings.

A general tracking algorithm usually consists of two procedures: prediction and update. The prediction procedure estimates the current state mainly from the behavior of the model samples at the previous time step. The update procedure uses the current observations to adjust the estimated state of the tracked object. The present invention assumes that objects move smoothly in the video. Under that assumption, it locates the tracked object with several image blocks and then tracks the object's position in subsequent frames through prediction and update.
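The prediction and update cycle can be illustrated with a generic one-dimensional sampling-importance-resampling particle filter. This is a minimal sketch under stated assumptions, not the patented tracker:

* The state is a scalar position.
* The smooth-motion assumption is modeled as a Gaussian random walk with standard deviation `motion_std`.
* The observation model is a caller-supplied log-likelihood. In the invention it would presumably come from adaptive-histogram similarity.

The name `particle_filter_step` is hypothetical.

```python
import numpy as np

def particle_filter_step(particles, log_likelihood, motion_std, rng):
    """One predict/update/resample cycle; returns (particles, weights, estimate)."""
    # Predict: smooth motion modeled as a small Gaussian random walk.
    particles = particles + rng.normal(0.0, motion_std, size=particles.shape)
    # Update: weight each particle by the observation likelihood (log domain for stability).
    log_w = log_likelihood(particles)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    estimate = float(np.sum(w * particles))  # weighted-mean state estimate
    # Resample (multinomial) so particles concentrate on high-weight states.
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx], w, estimate

# Usage: a stationary target at x = 5 observed with Gaussian noise (sigma = 0.5).
rng = np.random.default_rng(0)
particles = rng.normal(0.0, 1.0, size=500)
target = 5.0
loglik = lambda x: -((x - target) ** 2) / (2 * 0.5 ** 2)
for _ in range(20):
    particles, weights, est = particle_filter_step(particles, loglik, 0.5, rng)
print(round(est, 2))
```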

One embodiment of the present invention takes human tracking as an example. Referring to FIG. 1, a plurality of temporally consecutive images containing a human body are first input. One of them is selected as the initial image, and the images that follow it are the consecutive images, as shown in step S10. A human body image region is then defined in the initial image and divided into at least three image blocks, hereinafter called human image blocks. This embodiment uses three human image blocks located on different parts of the body: a head image block, a torso image block, and a limb image block. The first adaptive histogram of each of these three blocks is obtained, as shown in step S12.

The human image blocks are obtained as follows. Two cutting lines that divide the human body are first preset according to a set of common body proportions. They are then adjusted up and down to find where the color difference is largest, either between the head and the upper body or between the upper body and the lower body. Within each range separated by the cutting lines, an image block is taken around the center of gravity of the region's color. The block is converted to grayscale, and its adaptive histogram is obtained. The adaptive histogram describes the proportions of the different colors in an image block, so it serves as the block's representative feature color-block distribution.
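The cutting-line adjustment can be sketched as a small search. This is an illustrative reading, not the patented procedure, and it rests on these assumptions:

* The input is a grayscale array of the segmented person.
* "Color difference" is approximated by the absolute difference of mean gray level on either side of a line.
* The default ratios `(0.2, 0.4, 0.4)`, the search radius, and the `max_shift` fallback threshold are hypothetical values; the source does not state them.

The fallback mirrors the rule, described for FIG. 5(a), that the original proportions are used when the adjustment would be too large.

```python
import numpy as np

def _mean_diff(img, a, b, c):
    """Absolute difference of mean intensity between rows [a, b) and [b, c)."""
    if b <= a or c <= b:
        return -1.0  # invalid split: never preferred
    return abs(float(img[a:b].mean()) - float(img[b:c].mean()))

def find_cut_lines(img, ratios=(0.2, 0.4, 0.4), search=8, max_shift=5):
    """Return rows (y1, y2) splitting head/torso and torso/legs (hypothetical rule)."""
    h = img.shape[0]
    y1_def = int(round(h * ratios[0]))
    y2_def = int(round(h * (ratios[0] + ratios[1])))
    # Move each line to where the two neighbouring parts differ most.
    y1 = max(range(y1_def - search, y1_def + search + 1),
             key=lambda y: _mean_diff(img, 0, y, y2_def))
    y2 = max(range(y2_def - search, y2_def + search + 1),
             key=lambda y: _mean_diff(img, y1, y, h))
    # Fall back to the preset proportions if a line moved too far.
    if abs(y1 - y1_def) > max_shift or abs(y2 - y2_def) > max_shift:
        return y1_def, y2_def
    return y1, y2

# Usage on a synthetic 100-row silhouette with three gray levels.
img = np.empty((100, 20))
img[:18] = 200    # head
img[18:60] = 80   # torso
img[60:] = 150    # legs
lines = find_cut_lines(img)
fallback = find_cut_lines(img, max_shift=1)
print(lines, fallback)
```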

Next, a particle filter is used to find the human body in the consecutive images that follow the initial image, as shown in step S14. The filter dynamically selects a plurality of image blocks from the consecutive images according to entropy, which reduces computation time where possible.

The number of image blocks is adjusted according to the entropy. High entropy arises in two situations: when many regions of the image have colors very similar to the tracked body, or when the tracked body has been lost and is hard to recognize. In both cases the particle weights are widely spread, so the invention selects a large number of image blocks to strengthen tracking. Low entropy arises when only a few regions of the image resemble the tracked body. The particle weights are then concentrated in those few regions, and only a small number of image blocks is needed to track the body.

This step also characterizes the color-block distribution of each selected image block by computing its second adaptive histogram.
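The entropy that drives the number of sampled blocks can be computed from the normalized particle weights with the Shannon formula. The sketch below is a minimal illustration. It assumes natural logarithms and non-negative weights, and the name `weight_entropy` is hypothetical.

```python
import numpy as np

def weight_entropy(weights):
    """Shannon entropy (nats) of particle weights after normalization."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    nz = w[w > 0]  # 0 * log(0) is taken as 0
    return float(-np.sum(nz * np.log(nz)))

# Many look-alike regions: weights spread evenly, so entropy is maximal (log 100).
spread = weight_entropy(np.ones(100))
# One dominant region: weights concentrated, so entropy is low.
peaked = weight_entropy([0.97] + [0.03 / 99] * 99)
print(spread, peaked)
```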

Then, in step S16, the first adaptive histograms of the three human image blocks in the initial image, which represent the image features of three parts of the body, are compared with the second adaptive histograms extracted from the consecutive images. The comparison yields a tracking-error detection result, which takes one of three forms:

First, no erroneous human image block is detected in the consecutive images. That is, three adjacent second adaptive histograms are found whose feature color-block distributions match the head, torso, and limb image regions respectively. The image blocks of these three second adaptive histograms are then taken to represent the position of the human body in the consecutive images, as shown in step S18.

Second, one erroneous human image block is detected in the consecutive images. That is, two adjacent second adaptive histograms are found that match the feature color-block distributions of two of the head, torso, and limb regions. Because one human image region does not appear in the consecutive images, the body may be partially occluded. The position of the missing block is inferred from the positions of the detected blocks that show human features. For example, if image regions matching the head and torso regions are found in a consecutive image, the position of the limb region can be inferred, and with it the position of the human body in that image. The other two combinations are handled analogously, as shown in step S20.

Third, two or more erroneous image blocks are detected in the consecutive images, as shown in step S22. In this case the flow of this embodiment returns to step S12. The current consecutive image is reselected as the initial image, and the human body is searched for near the position where it disappeared in the previous consecutive image. Once found, its position is defined again and tracking restarts.
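The three outcomes can be sketched as follows. The sketch makes two hypothetical assumptions:

* Each block position is an (x, y) center.
* The offset of each part from a common body anchor was recorded in the initial image.

A missing part is placed by averaging the anchor estimates implied by the parts that were found. Returning `None` stands for the re-initialization of step S22. The names `locate_parts` and `offsets` are illustrative only.

```python
def locate_parts(found, offsets):
    """found: part -> (x, y) of matched blocks; offsets: part -> (dx, dy) from a body anchor."""
    if len(found) == len(offsets):
        return dict(found)   # step S18: every part matched
    if len(found) < 2:
        return None          # step S22: too many errors, re-initialize
    # Step S20: each found part implies an anchor position; average them.
    anchors = [(x - offsets[p][0], y - offsets[p][1]) for p, (x, y) in found.items()]
    ax = sum(a[0] for a in anchors) / len(anchors)
    ay = sum(a[1] for a in anchors) / len(anchors)
    result = dict(found)
    for part, (dx, dy) in offsets.items():
        if part not in result:
            result[part] = (ax + dx, ay + dy)  # inferred position of the missing part
    return result

offsets = {"head": (0, -30), "torso": (0, 0), "legs": (0, 40)}
partial = locate_parts({"head": (100, 70), "torso": (100, 100)}, offsets)
print(partial["legs"])
```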

To distinguish different objects better, the present invention uses adaptive histograms to extract the color-block features of image blocks. With a particle filter, sampling many particles may slow tracking down, while sampling too few may reduce accuracy. The invention therefore adjusts the number of sampled particles dynamically according to the entropy of the particle weights.

The conventional way to compute a color histogram divides the color space into small color bins, typically 8×8×8. For a 32×32 image block, the expected number of pixels per bin is then 1024 / 512 = 2, so this quantization resolves the color distribution poorly. The present invention instead treats the Y, Cb, and Cr channels of the YCbCr color space as three independent channels and assigns each color feature value to the corresponding bin of its channel. The expected number of pixels per bin becomes 128 (1024 / 8 with eight bins per channel), which represents the color distribution of the image much more fully.
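The bin-count arithmetic can be checked with a sketch. It assumes two things:

* The full-range BT.601 (JPEG/JFIF) RGB-to-YCbCr conversion.
* Eight bins per channel, which is consistent with the stated 128 pixels per bin for a 32×32 block.

The function names are hypothetical.

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Full-range BT.601 (JFIF) conversion; input and output shape (..., 3)."""
    rgb = np.asarray(rgb, dtype=float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)

def per_channel_histogram(ycbcr, n_bins=8):
    """Concatenate independent n_bins histograms of the Y, Cb and Cr channels."""
    hists = [np.histogram(np.clip(ycbcr[..., c], 0, 255), bins=n_bins, range=(0, 256))[0]
             for c in range(3)]
    return np.concatenate(hists)

rng = np.random.default_rng(0)
block = rng.integers(0, 256, size=(32, 32, 3))  # random 32x32 RGB block
hist = per_channel_histogram(rgb_to_ycbcr(block))
pixels = block.shape[0] * block.shape[1]
print(pixels / 8 ** 3, pixels / 8)  # expected pixels per bin: joint vs per channel
```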

As explained for the prior art, the conventional color histogram cannot separate the face in FIG. 1 from its background. To solve this problem, the present invention applies histogram equalization to a color histogram H to form an adaptive histogram. The color histograms H of the face image block 10 and the background image block 20 in FIG. 1 are first equalized. The equalized histogram is z = M(H), where the function M equalizes the reference color histogram H into the adaptive histogram z. The color histogram H' of another image block is likewise equalized to form another adaptive histogram, z' = M'(H').

This prevents the pixels of two similar image blocks from falling into the same color bins, so the overall color can be quantized quickly and effectively. FIG. 4 shows the adaptive histograms extracted from image blocks 10 and 20 of FIG. 1. Compared with FIG. 2, obtained with the conventional technique, the adaptive histograms of the present invention resolve finer and more useful color-block information for the face image block 10 and the background image block 20, so the two regions can be distinguished easily.
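One possible reading of the equalization step M is sketched below. The bin edges are placed at equal-frequency quantiles of the reference block, which flattens the reference histogram as histogram equalization does. The same edges are then used to build the histogram of any other block. This interpretation, the eight-bin setting, and the synthetic patches (the same invented gray levels as for FIG. 2) are assumptions for illustration. They are not the patent's exact definition of M and M'.

```python
import numpy as np

def adaptive_edges(reference, n_bins=8):
    """Interior bin edges at equal-frequency quantiles of the reference block."""
    qs = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    return np.quantile(np.ravel(reference), qs)

def adaptive_histogram(block, edges):
    """Count pixels per adaptive bin; bin i holds edges[i-1] <= v < edges[i]."""
    idx = np.searchsorted(edges, np.ravel(block), side="right")
    return np.bincount(idx, minlength=len(edges) + 1)

face = (np.arange(1024) % 30 + 100).reshape(32, 32)        # hypothetical face patch
background = (np.arange(1024) % 30 + 115).reshape(32, 32)  # hypothetical background patch
edges = adaptive_edges(face)  # mapping derived from the reference (face) block
z_face = adaptive_histogram(face, edges)
z_background = adaptive_histogram(background, edges)
print(z_face, z_background)
```

With uniform 32-level bins, both patches fall into just two bins. Here the face patch spreads over all eight adaptive bins, while the background patch fills only the upper four, so the two distributions are clearly separable.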

Because the tracked object keeps moving, its appearance may change as well, and the present invention takes these external variations into account. The feature values of the initial adaptive histogram are updated with each new consecutive image, as defined in formula (1):

$$H_t = 0.9\,H_{t-1} + 0.1\,Q_t \qquad (1)$$

Here H_t and H_{t-1} are the reference histograms at times t and t-1, and Q_t is the color histogram extracted directly at time t. The coefficients 0.9 and 0.1 can be changed according to the environmental factors that affect the consecutive images. The present invention thus keeps the initial image information up to date as tracking proceeds.
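Formula (1) is an exponential moving average of the reference histogram. The minimal sketch below assumes histograms are stored as NumPy arrays. It exposes the weight 0.9 as a parameter `alpha` (a hypothetical name), since the text notes that the weight may be tuned.

```python
import numpy as np

def update_reference(h_prev, q_t, alpha=0.9):
    """Formula (1): H_t = alpha * H_{t-1} + (1 - alpha) * Q_t."""
    return alpha * np.asarray(h_prev, dtype=float) + (1.0 - alpha) * np.asarray(q_t, dtype=float)

h = update_reference([1.0, 0.0], [0.0, 1.0])
print(h)  # mostly the old reference, nudged toward the new observation
```

If both inputs are normalized histograms, the result is also normalized, because the two weights sum to 1.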

In a particle filter, particles with locally maximal weights are important both for estimating the tracked object and for resampling. One technical feature of the present invention is to vary the number of sampled particles in order to control how much of the state space is covered. This ensures that the state with the greatest weight, the tracked object, can be located in the image.

When many regions look similar to the tracked object, the particle weights approach a uniform distribution, and the image has high entropy. If the object is lost, the conditional probability becomes low and the weight distribution again approaches uniform. In both situations the tracking method needs to introduce a larger number of particles. Conversely, if only one region of the image looks similar to the tracked object, the weights concentrate on the few particles in that region and the entropy is low. In that case, small image blocks and a small number of particles suffice to track the object.

The relationship between time t, the entropy, and the number of particles is given by formula (2):

Here ω^(n) is the weight of the n-th particle, and C is a constant that controls how fast the number of particles can grow. For example, when C is set to 2, the number of particles at time t is at most 2·N_{t-1}. The present invention sets C to 1.2 and, to prevent the number of particles from rising or falling too sharply, limits N_t to the range (200, 1000).
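Formula (2) itself is not reproduced in this text, so the sketch below uses a hypothetical rule. It matches only the constraints stated here:

* The count grows by at most a factor of C per frame, reached when the normalized weight entropy is maximal.
* The count shrinks by at most a factor of 1/C when the weights are fully concentrated.
* The count is clamped to [200, 1000], with C = 1.2.

It should not be read as the patent's actual formula.

```python
import numpy as np

def normalized_entropy(weights):
    """Shannon entropy of the normalized weights divided by log(N), giving 0..1."""
    w = np.asarray(weights, dtype=float)
    if len(w) < 2:
        return 0.0
    w = w / w.sum()
    nz = w[w > 0]
    return float(-np.sum(nz * np.log(nz)) / np.log(len(w)))

def next_particle_count(weights, n_prev, c=1.2, n_min=200, n_max=1000):
    """Hypothetical stand-in for formula (2): scale N by a factor in [1/c, c]."""
    h = normalized_entropy(weights)   # 0 = one dominant region, 1 = uniform weights
    n = n_prev * c ** (2.0 * h - 1.0)
    return int(np.clip(round(n), n_min, n_max))

print(next_particle_count(np.ones(500), 500))           # spread weights: grow
print(next_particle_count([1.0] + [0.0] * 499, 500))    # concentrated weights: shrink
```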

Furthermore, to track the human body reliably, the present invention divides the body into the head, the torso, and the limbs (the legs) and tracks the three parts simultaneously. The human body in the initial image is first segmented with an adaptive Gaussian background model. The method assumes that these three parts have similar proportions for most people, as shown in FIG. 5(a). Two horizontal lines are placed according to predetermined height ratios H_h, H_t, and H_l to cut the body into its three parts. The two lines are then fine-tuned vertically until they separate the three parts with the greatest color difference. If the vertical adjustment becomes too large, the original height ratios are used to track the three parts instead. After fine-tuning, the height ratios of the three parts become R_h, R_t, and R_l.

The cut regions may include background areas and interference, and the shapes of the three parts vary with different people and poses. The invention therefore takes a reduced range within each cut region as the three human image blocks 30, 32, and 34, as shown in FIG. 5(b). Each block is a rectangle with center (C_x, C_y), defined as the center of mass of all pixels in the region R. Its extent (S_x, S_y) is the standard deviation of the pixel coordinates in x and y. FIG. 5(b) shows the three human image blocks 30, 32, and 34 placed on the human body, and the colors inside each block are consistent.
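The reduced block for each part can be sketched from the statistics named above: the centroid (C_x, C_y) of the part's foreground pixels and the standard deviations (S_x, S_y) of their coordinates. The exact rectangle formula is not reproduced in this text. The choice of a rectangle spanning one standard deviation on each side of the centroid is therefore an assumption, and `part_block` is a hypothetical name.

```python
import numpy as np

def part_block(mask):
    """Centroid, coordinate spread, and an assumed +/-1 std rectangle of a part mask."""
    ys, xs = np.nonzero(mask)            # pixel coordinates of the part region R
    cx, cy = xs.mean(), ys.mean()        # (C_x, C_y): center of mass
    sx, sy = xs.std(), ys.std()          # (S_x, S_y): standard deviation in x and y
    rect = (cx - sx, cy - sy, cx + sx, cy + sy)  # (x0, y0, x1, y1), hypothetical extent
    return (cx, cy), (sx, sy), rect

mask = np.zeros((40, 40), dtype=bool)
mask[10:20, 20:30] = True                # a 10x10 square standing in for one part
(cx, cy), (sx, sy), rect = part_block(mask)
print(cx, cy, sx, sy)
```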

Finally, another technical feature of the present invention is tracking-error detection. Once the human body is divided into three blocks, the relative positions of the three parts and their moving speeds are constrained. The human body is not rigid, however, so the tracked body image deforms easily and changes irregularly across consecutive images. The invention therefore uses support vector machines (SVMs) as classifiers to determine whether the human image blocks are tracking the body properly. An SVM is a common classifier that finds a hyperplane in a high-dimensional space separating two classes with the maximum margin. The optimal hyperplane is computed according to formula (3):

本發明選擇了逕向基函數當作核心函數K來繪製特徵向量進入高維空間中，種類標籤表示特徵向量是否屬於追蹤錯誤，為訓練資料聚集之子集，稱之為支撐向量，係數α_i 與b 係由解決大刻度的二次方程式問題所決定。The invention selects the radial basis function as the core function K to draw the feature vector into the high dimensional space, the category label Indicates whether the feature vector is a tracking error. For the subset of training data aggregation, called the support vector, the coefficients α _i and b are determined by solving the quadratic equation problem of large scale.
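Equation (3) with an RBF kernel can be evaluated as in the sketch below. It is not the patent's trained model: `gamma`, the support vectors, labels and coefficients are placeholder values, and treating a negative output as a tracking error follows the labeling described in this section (blocks that do not cover the body are labeled negative).

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float = 0.5) -> float:
    """Radial basis function kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def svm_decision(x, support_vectors, labels, alphas, b, gamma=0.5) -> float:
    """Value inside sgn(.) in Equation (3): sum_i alpha_i y_i K(x_i, x) + b."""
    return sum(alpha * y * rbf_kernel(sv, x, gamma)
               for sv, y, alpha in zip(support_vectors, labels, alphas)) + b


def is_tracking_error(x, model) -> bool:
    # Label convention: -1 = block not covering the body (tracking error).
    return svm_decision(x, **model) < 0
```

In practice the support vectors, `alphas` and `b` would come from an SVM solver trained on the labeled samples; the hand-written values used to exercise this sketch only show the sign convention.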

The present invention uses three SVM error detectors to detect tracking errors in the three human image blocks. If two or more of the body blocks are lost, however, these detectors become ineffective, because it is no longer easy to tell which part is abnormal. The present invention therefore adds a fourth SVM error detector that determines whether the error involves one human image block or two.
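One way to combine the four detectors is sketched below. The ordering (consult the fourth detector first, then localize a single failure with the per-part detectors) and the return format are assumptions; each detector is a hypothetical callable that returns `True` when it flags an error.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Detector = Callable[[Sequence[float]], bool]  # True means "error detected"
PARTS = ("head", "torso", "limbs")


def diagnose(features: Sequence[float],
             part_detectors: Dict[str, Detector],
             multi_detector: Detector) -> Tuple[str, List[str]]:
    """Classify the tracking state as 'ok', 'single' or 'multiple' failure."""
    # Assumed ordering: the fourth detector first decides whether two or more
    # blocks failed, because the per-part detectors cannot localize that case.
    if multi_detector(features):
        return "multiple", list(PARTS)
    failed = [p for p in PARTS if part_detectors[p](features)]
    if not failed:
        return "ok", []
    if len(failed) == 1:
        return "single", failed
    # Per-part detectors disagree with the fourth detector; be conservative.
    return "multiple", failed
```

A `"single"` result would lead to correcting one part from the other two, and a `"multiple"` result to re-initializing the tracker, as described below.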

When the SVM error detectors are applied, the feature vector, composed of the estimated states of the three parts at time t₀ and the changes in their relative states between time t₀ and the time t₀ - Δt of the previous image, is defined as

$$\left[ RS_H(t_0),\ RS_T(t_0),\ RS_L(t_0),\ RS_H(t_0) - RS_H(t_0 - \Delta t),\ RS_T(t_0) - RS_T(t_0 - \Delta t),\ RS_L(t_0) - RS_L(t_0 - \Delta t) \right]^T,$$

where the vectors RS_H(t₀), RS_T(t₀) and RS_L(t₀) are the relative state vectors of the three human image blocks at time t₀. The relative state vectors are defined by Equation (4):

Here S_H(t), S_T(t) and S_L(t) are the estimated state vectors of the head, torso and limbs, respectively.
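The feature vector above can be assembled as follows. Because Equation (4) is not reproduced here, this sketch takes the relative state vectors RS_H, RS_T and RS_L as given inputs; what each vector contains (for example, position and size offsets) is an assumption, and `error_feature` is a hypothetical name.

```python
from typing import Dict, List, Sequence

PARTS = ("H", "T", "L")  # head, torso (body), limbs


def error_feature(rs_now: Dict[str, Sequence[float]],
                  rs_prev: Dict[str, Sequence[float]]) -> List[float]:
    """Stack [RS_H, RS_T, RS_L, dRS_H, dRS_T, dRS_L] into one flat vector.

    rs_now holds the relative states at t0, rs_prev those at t0 - dt.
    """
    vec: List[float] = []
    for p in PARTS:                      # relative states at time t0
        vec.extend(rs_now[p])
    for p in PARTS:                      # changes since time t0 - dt
        vec.extend(a - b for a, b in zip(rs_now[p], rs_prev[p]))
    return vec
```

With two-dimensional relative states the result has twelve entries: six for the current states followed by six for their changes.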

Next, a particle filter is used to track the three body parts in a series of videos, and training samples for each SVM error detector are labeled manually from the tracking results; each SVM error detector is trained with 150 training samples. The state vectors of image blocks that do not cover the human body are labeled negative, and those of image blocks that fall inside the body are labeled positive. Because a tracking error may make it difficult for the tracker to continue, the present invention preferably selects an appropriate negative rate, which can be reached by adjusting the parameters of the SVM error detectors. When a tracking error occurs, the state of the tracked target must be corrected. For example, if tracking of two or three body parts is lost, the present invention searches the vicinity of the target's previous tracked position and reinitializes the entire tracking process; if the state of only a single part is abnormal, the present invention uses the other two parts to adjust the position and size of the abnormal part.
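The single-part correction can be sketched as follows. The patent only states that the other two parts are used to adjust the abnormal part's position and size; the rule here (height from the part ratios, horizontal center and width averaged from the two healthy parts, top edge attached to the neighboring part) is a hypothetical choice, as are the `Box` type and the `repair_part` name.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Box:
    cx: float   # horizontal center
    top: float  # top edge (y grows downward)
    w: float    # width
    h: float    # height


ORDER = ("head", "torso", "limbs")  # top to bottom


def repair_part(boxes: Dict[str, Box], bad: str,
                ratios: Dict[str, float]) -> Dict[str, Box]:
    """Re-estimate the abnormal part `bad` from the two healthy parts."""
    good = [p for p in ORDER if p != bad]
    # Height: scale implied by the healthy parts, times the bad part's ratio.
    scale = sum(boxes[p].h for p in good) / sum(ratios[p] for p in good)
    h = scale * ratios[bad]
    # Horizontal center and width: average of the two healthy parts.
    cx = sum(boxes[p].cx for p in good) / len(good)
    w = sum(boxes[p].w for p in good) / len(good)
    # Vertical placement: attach the part to its neighbor in the stack.
    if bad == "head":
        top = boxes["torso"].top - h
    elif bad == "torso":
        top = boxes["head"].top + boxes["head"].h
    else:
        top = boxes["torso"].top + boxes["torso"].h
    fixed = dict(boxes)
    fixed[bad] = Box(cx, top, w, h)
    return fixed
```

Averaging the width is a simplification (heads are usually narrower than torsos); a per-part width ratio could be used in the same way as the height ratios.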

In addition, when a tracked object is occluded in the image, its appearance may not be extracted correctly from the frame. The present invention therefore uses marking to reduce the chance that occlusion causes tracking errors.

Occlusion usually takes one of two forms: the images of two tracked objects overlap, or the background occludes a tracked object. In the first case, the present invention uses the human tracking method to predict that two tracked objects are about to overlap; when the overlap is about to occur, tracking-error detection is suspended and only the original particle filter is used to estimate the speed and direction of the tracked objects, and tracking-error detection is resumed once the two objects no longer overlap. In the second case, where the background contains an occluder such as a pillar or a door, the position of the occluder is marked manually in advance; when a tracked object falls within the occluder's region, only the original particle filter is used to estimate its speed and direction and tracking-error detection is suspended, so that human tracking is not interrupted by mistake.
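The gating rule for both occlusion cases might look like the sketch below. It assumes axis-aligned boxes, treats any overlap of a target's predicted box with another target or with a manually marked occluder region as the trigger, and uses hypothetical function names; how the box is predicted (for example, from the particle filter's motion estimate) is left to the caller.

```python
from typing import Iterable, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1), x0 < x1, y0 < y1


def overlaps(a: Rect, b: Rect) -> bool:
    """True if two axis-aligned rectangles share any interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def use_error_detection(predicted: Rect,
                        other_targets: Iterable[Rect],
                        occluders: Iterable[Rect]) -> bool:
    """Return False when only the particle filter should be used."""
    # Case 1: this target is predicted to overlap another tracked target.
    if any(overlaps(predicted, o) for o in other_targets):
        return False
    # Case 2: this target enters a manually marked occluder (pillar, door).
    if any(overlaps(predicted, r) for r in occluders):
        return False
    return True
```

Calling this once per frame and target gives the resume behavior automatically: once the predicted boxes stop overlapping, error detection is switched back on.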

In summary, the present invention provides an object tracking method that divides the tracked object into multiple image blocks, tracks each block, and then performs tracking-error detection. Even when an object in the image is partially occluded or distorted, the result can still be corrected back to the correct track through tracking-error detection. Moreover, by efficiently extracting an adaptive histogram for each image block, the present invention highlights the distribution of feature color patches in the object image blocks, so that the subsequent comparison can be performed quickly and tracking accuracy is improved. Finally, because tracking-based prediction requires a large number of image blocks drawn from the consecutive images, the present invention dynamically adjusts the number of blocks sampled to obtain the required information, further improving performance.

The embodiments described above merely illustrate the technical ideas and features of the present invention, so that those skilled in the art can understand and implement it; they do not limit the scope of the patent. All equivalent changes or modifications made in accordance with the spirit disclosed by the present invention shall still fall within the scope of the present invention.

10: Face image block

20: Background image block

30: Head image block

32: Body image block

34: Limb image block

Fig. 1 is a schematic diagram of a grayscale camera frame.

Fig. 2 shows color histograms extracted from two image blocks in the camera frame.

Fig. 3 is a flow chart of the steps of an embodiment of the present invention.

Fig. 4 shows adaptive histograms extracted from two image blocks in the camera frame.

Fig. 5(a) is a schematic diagram of cutting a human body into three parts according to predetermined proportions in the present invention.

Fig. 5(b) is a schematic diagram of the three human image blocks extracted by the present invention.

Claims

A method for object tracking, comprising: (a) selecting an initial image and a plurality of subsequent consecutive images; (b) defining an object image region of at least one object in the initial image, cutting the object image region into at least three object image blocks, and obtaining a first adaptive histogram of each object image block, wherein each first adaptive histogram represents the feature color-patch distribution of the corresponding object image block; (c) dynamically selecting a plurality of image blocks from the consecutive images by means of a particle filter and obtaining a second adaptive histogram of each image block, wherein the number of image blocks is determined by the disorder of each consecutive image, and each second adaptive histogram represents the feature color-patch distribution of the corresponding image block; (d) comparing the first adaptive histograms with the second adaptive histograms to determine whether the feature color-patch distributions of at least two adjacent image blocks in the consecutive images respectively match the feature color-patch distributions of the object image blocks; and (e) obtaining the position of the object in the consecutive images according to the positions of the image blocks whose feature color-patch distributions match those of the object image blocks, thereby tracking the movement of the object.

The method of object tracking according to claim 1, wherein the consecutive images are a series of temporally consecutive images.

The method of object tracking according to claim 1, wherein the object is a human body.

The method of object tracking according to claim 3, wherein the object image blocks comprise a head image block, a body image block or a limb image block, for tracking the head, body or limbs of the human body.

The method of object tracking according to claim 1, wherein the first adaptive histogram or the second adaptive histogram is a representative feature color-patch map extracted according to the color distribution of the object image block or of the corresponding image block.

The method of object tracking according to claim 1, wherein in step (d), the adaptive histograms of two adjacent image blocks in the consecutive images respectively match two of the first adaptive histograms of the object image blocks.

The method of object tracking according to claim 6, wherein step (e) obtains the position of the object in the consecutive images according to the two image blocks that match the object image blocks.

The method of object tracking according to claim 1, wherein in step (d), the adaptive histograms of three adjacent image blocks in the consecutive images respectively match the first adaptive histograms of the object image blocks.

The method of object tracking according to claim 8, wherein in step (e), the position of the object in the consecutive images is obtained according to the three image blocks that match the object image blocks.

The method of object tracking according to claim 1, further comprising a determining step of determining, from the training result of a support vector machine algorithm, whether the position of the object in the consecutive images is correct.

The method of object tracking according to claim 1, wherein the consecutive images contain at least one occluder, the position of the occluder is marked as an error-detection range, and tracking-error detection is stopped when the object is about to fall into the error-detection range.

The method of object tracking according to claim 1, wherein object image regions of two objects are defined in the initial image.

The method of object tracking according to claim 12, wherein tracking-error detection is stopped when the object image regions are about to overlap.

The method of object tracking according to claim 1, wherein the second adaptive histograms having the object's feature color-patch distribution are added to the first adaptive histograms with weighting to generate new first adaptive histograms, so that the first adaptive histograms are updated over time.

The method of object tracking according to claim 1, wherein the disorder relates to the appearance of the object and the appearance of each region of the consecutive images: a large disorder means that many regions of the consecutive images have an appearance similar to that of the object, and a small disorder means that only a few regions of the consecutive images have an appearance similar to that of the object.

The method of object tracking according to claim 15, wherein in step (c), the number of image blocks is increased when the disorder is large.

The method of object tracking according to claim 15, wherein in step (c), the number of image blocks is reduced when the disorder is small.
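Claims 14 to 17 describe two adaptive mechanisms that can be illustrated together. In this sketch, disorder is measured as the normalized Shannon entropy of how strongly each image region resembles the object (many similar regions give high entropy, few give low entropy) and mapped linearly to a particle count, and the histogram update is a linear blend. The entropy measure, the linear mappings, the `n_min`/`n_max` bounds and the blend weight are all assumptions, not values taken from the patent.

```python
import math
from typing import List, Sequence


def entropy(weights: Sequence[float]) -> float:
    """Shannon entropy (natural log) of non-negative weights, normalized to sum 1."""
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log(p) for p in probs)


def particle_count(similarities: Sequence[float],
                   n_min: int = 50, n_max: int = 500) -> int:
    """More regions resembling the object -> higher disorder -> more particles."""
    if len(similarities) < 2 or sum(similarities) == 0:
        return n_min
    disorder = entropy(similarities) / math.log(len(similarities))  # in [0, 1]
    return round(n_min + disorder * (n_max - n_min))


def update_histogram(old: Sequence[float], matched: Sequence[float],
                     weight: float = 0.1) -> List[float]:
    """Blend a matched second histogram into the reference (first) histogram."""
    return [(1 - weight) * o + weight * m for o, m in zip(old, matched)]
```

If every region looks equally like the object, the sketch uses the maximum particle count; if only one region does, it uses the minimum. A small `weight` keeps the reference histogram stable while still letting it drift with gradual appearance changes.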