TW201501080A - Method and system for object detection and tracking

Method and system for object detection and tracking

Info

Publication number
TW201501080A
Authority
TW
Taiwan
Prior art keywords
frame
target
tracking
feature
display area
Prior art date
Application number
TW102122760A
Other languages
Chinese (zh)
Other versions
TWI514327B (en)
Inventor
Chin-Shyurng Fahn
Hsiu-Ting Chao
Original Assignee
Univ Nat Taiwan Science Tech
Priority date
Filing date
Publication date
Application filed by Univ Nat Taiwan Science Tech filed Critical Univ Nat Taiwan Science Tech
Priority to TW102122760A priority Critical patent/TWI514327B/en
Priority to CN201310349397.0A priority patent/CN104252629A/en
Publication of TW201501080A publication Critical patent/TW201501080A/en
Application granted granted Critical
Publication of TWI514327B publication Critical patent/TWI514327B/en

Landscapes

  • Image Analysis (AREA)

Abstract

A method and system for object detection and tracking are provided. In the method, a frame of a video is taken as a first frame, and a display area of an object in the first frame is detected by scanning the first frame. Next, the object is tracked according to the ratio of the display area to the first frame and at least one reference frame of the video, wherein the at least one reference frame follows the first frame. In addition, a scale- and rotation-invariant feature of the tracked object is recorded. Whether the object appears in a specific frame is then determined by means of the scale- and rotation-invariant feature of the object.

Description

Method and system for target detection and tracking

The present invention relates to a method and system for target detection and tracking, and more particularly to a method and system for real-time detection and tracking of one or more targets in complex environments.

In recent years, visual surveillance technology has advanced steadily and is widely applied in many fields. Intelligent Transportation Systems (ITS), for example, use visual surveillance for range measurement and moving-object detection, providing functions such as early warning of improper driving behavior and adaptive speed control, thereby avoiding collisions, preventing accidents, and ensuring that driving speed stays within a safe range.

Existing ranging techniques rely on hardware-based approaches such as radar or laser. Although radar-based techniques offer accurate depth information and long detection range, their expensive hardware makes widespread adoption difficult. Existing moving-object detection techniques, meanwhile, suffer from high false-positive rates in environments with complex backgrounds. Most current object detection and tracking techniques achieve good results only against relatively simple backgrounds, and attaining robust detection incurs a large computational cost, making it difficult to balance system performance and accuracy.

In view of this, the present invention provides a target detection and tracking method and system capable of fast and robust target detection and tracking in a variety of environments, including scenes with large illumination differences, frequent motion, and highly complex backgrounds.

The target detection and tracking method of the present invention includes the following steps. First, a frame of a video is captured as a first frame, and the first frame is scanned to detect a display area of a target in the first frame. Next, the target is tracked according to the ratio of the area of the display area to the first frame and at least one reference frame that follows the first frame in the video, and a scale- and rotation-invariant feature of the tracked target is recorded. The scale- and rotation-invariant feature is then used to identify whether the target appears in a specific frame.

In an embodiment of the invention, the target detection and tracking method further includes obtaining a plurality of positive training samples and a plurality of negative training samples, extracting a texture feature from each positive and negative training sample, and training a target classifier using those texture features and a classifier fusion algorithm.

In an embodiment of the invention, scanning the first frame to detect the display area of the target in the first frame includes forming a cascade classifier from n target classifiers, where n is a positive integer, and performing n scanning passes over the first frame with the cascade classifier to detect the display area of the target. Each scanning pass moves a detection window across the first frame in a specific direction to scan the first frame completely, and then reduces the size of the first frame.

In an embodiment of the invention, tracking the target according to the ratio of the area of the display area to the first frame and the at least one reference frame following the first frame in the video includes determining whether the area ratio exceeds a preset value. If not, at least one feature point of the target is extracted from the display area, and the target is tracked using the feature points extracted from the display area and the reference frames. If the area ratio exceeds the preset value, at least one feature point of the target is extracted from each of a plurality of sub-regions within the display area, and the target is tracked using the feature points extracted from those sub-regions and the reference frames.

In an embodiment of the invention, the sub-regions are the upper-left and lower-right corner regions of the display area.

In an embodiment of the invention, the sub-regions either do not overlap or partially overlap one another.

In an embodiment of the invention, the reference frames include a second frame adjacent to the first frame and a third frame adjacent to the second frame. Tracking the target according to the area ratio and the reference frames then includes extracting at least one feature point of the target from the first frame, computing a predicted range in the second frame with a Kalman filter, and running an optical-flow algorithm to estimate the motion of the feature points extracted from the first frame within the predicted range of the second frame. Next, at least one feature point of the target is extracted from the second frame, a predicted range in the third frame is computed with the Kalman filter, and the optical-flow algorithm is run to estimate the motion of the feature points extracted from the second frame within the predicted range of the third frame.

In an embodiment of the invention, the target detection and tracking method further includes building the scale- and rotation-invariant feature with the Speeded Up Robust Features (SURF) algorithm.

In an embodiment of the invention, the specific frame is any frame that temporally follows all of the reference frames, and identifying whether the target appears in the specific frame with the scale- and rotation-invariant feature includes extracting a feature to be identified from the specific frame and comparing it against the scale- and rotation-invariant feature. When the feature to be identified matches the scale- and rotation-invariant feature, the target is determined to appear in the specific frame.

The target detection and tracking system of the present invention includes a storage unit and a processing unit coupled to each other. The storage unit records a plurality of modules, and the processing unit accesses and executes those modules, which include a detection module, a tracking module, and an identification module. The detection module captures a frame of a video as a first frame and scans the first frame to detect the display area of a target in it. The tracking module tracks the target according to the ratio of the area of the display area to the first frame and at least one reference frame following the first frame in the video, and records the scale- and rotation-invariant feature of the tracked target. The identification module uses the scale- and rotation-invariant feature to identify whether the target appears in a specific frame.

Based on the above, the target detection and tracking method and system of the present invention use machine-learning-based algorithms and a cascade architecture to detect targets in complex dynamic scenes in real time, and then track them with a mechanism that combines an optical-flow algorithm with a Kalman filter. The method and system can be widely applied in a variety of environments while providing fast and robust detection and tracking.

To make the above features and advantages of the invention more comprehensible, embodiments are described in detail below with reference to the accompanying drawings.

100‧‧‧target detection and tracking system
110‧‧‧storage unit
111‧‧‧training module
113‧‧‧detection module
115‧‧‧tracking module
117‧‧‧identification module
120‧‧‧processing unit
S210~S230‧‧‧steps of the target detection and tracking method according to an embodiment of the invention
310‧‧‧positive training samples
320‧‧‧negative training samples
330‧‧‧feature extraction unit
340‧‧‧training unit
400‧‧‧cascade classifier
C1, C2, C3, C4‧‧‧target classifiers
F1‧‧‧first frame
NDR1, NDR2, NDR3, NDR4‧‧‧non-target regions
DR‧‧‧display area
W‧‧‧detection window
60‧‧‧target
63‧‧‧display area
63'‧‧‧shrunken region
65, 67‧‧‧sub-regions
+‧‧‧feature points
R1, R2, R3‧‧‧predicted ranges
O1, O2, O3‧‧‧optical-flow computation regions
S810~S850‧‧‧steps of identifying whether the target appears in a specific frame according to an embodiment of the invention

FIG. 1 is a block diagram of a target detection and tracking system according to an embodiment of the invention.

FIG. 2 is a flowchart of a target detection and tracking method according to an embodiment of the invention.

FIG. 3 is a schematic diagram of training a target classifier according to an embodiment of the invention.

FIG. 4 is a schematic diagram of a cascade classifier according to an embodiment of the invention.

FIG. 5 is a schematic diagram of a cascade classifier performing several scanning passes according to an embodiment of the invention.

FIGS. 6A to 6C are schematic diagrams of extracting feature points of a target according to several embodiments of the invention.

FIG. 7 is a schematic diagram of tracking a target across adjacent frames according to an embodiment of the invention.

FIG. 8 is a flowchart of identifying whether a target appears in a specific frame according to an embodiment of the invention.

FIG. 1 is a block diagram of a target detection and tracking system according to an embodiment of the invention. Referring to FIG. 1, the target detection and tracking system 100 of this embodiment may be implemented on a computer system, a workstation, a server, or any electronic device with computing and processing capability. The target detection and tracking system 100 includes a storage unit 110 and a processing unit 120 coupled to each other, whose functions are as follows. The storage unit 110 is, for example, a random access memory (RAM), a read-only memory (ROM), a flash memory, a hard disk, another similar device, or a combination thereof, and records a plurality of modules executable by the processing unit 120; these modules can be loaded into the processing unit 120 to perform the detection and tracking of targets.

The processing unit 120 may be a central processing unit (CPU), or another programmable general-purpose or special-purpose microprocessor, digital signal processor (DSP), programmable controller, application-specific integrated circuit (ASIC), programmable logic device (PLD), another similar device, or a combination thereof. The processing unit 120 accesses and executes the modules recorded in the storage unit 110 so that the target detection and tracking system 100 detects and tracks one or more targets in real time.

The modules recorded in the storage unit 110 include a training module 111, a detection module 113, a tracking module 115, and an identification module 117. These modules may be computer programs that, when executed by the processing unit 120, carry out the target detection and tracking functions. The following embodiments illustrate the detailed operation of the target detection and tracking system 100.

FIG. 2 is a flowchart of a target detection and tracking method according to an embodiment of the invention; please refer to FIGS. 1 and 2 together. In this embodiment, the target detection and tracking system 100 receives video shot by a network camera (or other video capture device) in various environments (for example, but not limited to, a highway, a suburb, or an urban street) and detects and tracks targets of a specific class in the video. The video consists of a plurality of static frames, and a target may be a vehicle, a human face, or another kind of object.

First, in step S210, the detection module 113 captures a frame of the video as a first frame and scans the first frame to detect the display area of the target in the first frame.

Specifically, to allow the detection module 113 to detect, in the online detection stage, whether a target appears in a frame and where its display area lies, the training module 111 of this embodiment obtains a large number of positive and negative training samples in an offline training stage and trains, via machine learning, a target classifier that distinguishes the target from the background. FIG. 3 is a schematic diagram of training a target classifier according to an embodiment of the invention. As shown in FIG. 3, the training module 111 obtains a plurality of positive training samples 310 and negative training samples 320, where a positive training sample 310 is an image containing an object of the same class as the target, and a negative training sample 320 is an image containing no object of that class. The training module 111 uses the feature extraction unit 330 to extract texture features from all positive training samples 310 and negative training samples 320. For example, the feature extraction unit 330 may combine the integral-image technique to quickly obtain salient texture features, such as Haar-like features, from each training sample, although the invention is not limited thereto. The training module 111 then uses the training unit 340 to run a classifier fusion algorithm on the texture features of the positive and negative training samples, thereby training a target classifier. For example, the training unit 340 uses the Adaptive Boosting (AdaBoost) algorithm to train a target classifier that distinguishes positive samples (objects of the same class as the target) from negative samples (objects of a different class). This target classifier is composed of multiple weak classifiers, the basic elements capable of distinguishing the target to be detected.
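The offline training stage described above can be prototyped in a few lines. The following is a minimal sketch, assuming scikit-image for Haar-like features on integral images and scikit-learn's AdaBoost as the fusion algorithm; the 24x24 window size, the chosen feature types, and the estimator count are illustrative assumptions, not values from the patent.

import numpy as np
from skimage.transform import integral_image
from skimage.feature import haar_like_feature
from sklearn.ensemble import AdaBoostClassifier

WIN = 24  # assumed training-window size

def texture_features(gray_window):
    """Extract Haar-like texture features from one grayscale window."""
    ii = integral_image(gray_window)
    # 'type-2-x'/'type-2-y' are two-rectangle edge features.
    return haar_like_feature(ii, 0, 0, WIN, WIN,
                             feature_type=['type-2-x', 'type-2-y'])

def train_target_classifier(pos_windows, neg_windows):
    """pos/neg_windows: lists of WIN x WIN grayscale arrays."""
    X = np.array([texture_features(w) for w in pos_windows + neg_windows])
    y = np.array([1] * len(pos_windows) + [0] * len(neg_windows))
    # AdaBoost fuses many weak (decision-stump) classifiers,
    # mirroring the classifier fusion described in the patent.
    clf = AdaBoostClassifier(n_estimators=200)
    clf.fit(X, y)
    return clf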

In the online detection stage, to balance detection speed and accuracy, the detection module 113 forms a cascade classifier from n target classifiers (n being a positive integer) and uses the cascade classifier to perform n scanning passes over the first frame to detect the display area of the target. Because the cascade classifier rejects regions of the first frame that cannot contain the target as early as possible, detection speed is greatly increased. FIG. 4 is a schematic diagram of a cascade classifier according to an embodiment of the invention. Referring to FIG. 4, in this example the detection module 113 composes a cascade classifier 400 from four target classifiers C1 to C4. The first frame F1 passes through the classifiers C1 to C4 stage by stage; each stage rejects from the first frame F1 the non-target regions that do not belong to the target (namely non-target regions NDR1 to NDR4), and the region that passes all classifiers C1 to C4 is the display area DR of the target.
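The early-rejection behavior of the cascade can be expressed directly. A minimal sketch, assuming hypothetical per-stage classifiers with a scikit-learn-style predict interface:

def cascade_accepts(stages, feature_vector):
    """A window survives only if every stage accepts it, so most
    background windows are rejected cheaply by the early stages."""
    for stage in stages:
        if stage.predict([feature_vector])[0] == 0:
            return False  # rejected early; later stages never run
    return True  # passed every stage: part of display area DR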

FIG. 5 is a schematic diagram of the cascade classifier 400 performing four scanning passes. Referring to FIG. 5, to locate the display area of the target in the first frame F1, in each scanning pass the cascade classifier 400 moves a detection window W across the first frame F1 in a specific direction to scan the frame completely, for example from left to right and then from top to bottom. The cascade classifier 400 then shrinks the first frame F1 before the next scanning pass, so that display areas of targets of different sizes in the first frame F1 can be found. The scanning scheme shown in FIG. 5 is also known as image-pyramid scanning.
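Image-pyramid scanning combines a sliding window with repeated shrinking of the frame. A minimal sketch, assuming OpenCV and reusing the hypothetical texture_features and cascade_accepts helpers from the sketches above; the pass count, step, and shrink factor are illustrative choices:

import cv2

def pyramid_scan(frame_gray, stages, n_passes=4, win=24, step=8, shrink=0.8):
    detections, scale = [], 1.0
    img = frame_gray.copy()
    for _ in range(n_passes):
        h, w = img.shape
        for r in range(0, h - win + 1, step):        # top to bottom
            for c in range(0, w - win + 1, step):    # left to right
                fv = texture_features(img[r:r + win, c:c + win])
                if cascade_accepts(stages, fv):
                    # map the window back to original-frame coordinates
                    detections.append((int(c / scale), int(r / scale),
                                       int(win / scale)))
        scale *= shrink  # shrink the frame for the next pass
        img = cv2.resize(img, (int(w * shrink), int(h * shrink)))
    return detections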

As described above, the detection module 113 can detect the display areas of all targets of a specific class in the first frame; for ease of explanation, however, it is assumed below that the first frame contains only one target, so the detection module 113 detects only one display area.

After the detection module 113 detects the display area of the target, the tracking module 115 of this embodiment uses a tracking mechanism that combines a Kalman filter with an optical-flow algorithm to achieve fast and stable target tracking. Specifically, as shown in step S220, the tracking module 115 tracks the target according to the ratio of the area of the display area to the first frame and at least one reference frame following the first frame in the video, and records the scale- and rotation-invariant feature of the tracked target. In this embodiment, the tracking module 115 runs the optical-flow algorithm only on a certain part (or parts) of a frame rather than tracking the entire frame. For example, after the detection module 113 detects the display area, the tracking module 115 computes optical flow only over the display area (or over one or more smaller regions within it): for the feature points extracted from part of the current frame, it finds their moved positions in the next frame in order to track the target. Several examples below illustrate how the tracking module 115 extracts the feature points used by the optical-flow algorithm.

In general, because the video captures the target moving through the environment, the same target has different sizes in different frames. When the target to be tracked is very close to the network camera shooting the video, its display area occupies a considerable proportion of the frame. Since the tracking module 115 would then have to process a large region, computational efficiency would suffer. To increase processing speed, for an oversized target in a frame the tracking module 115 tracks using information from only a few local regions of the target, reducing the computation time spent on oversized targets.

Specifically, in this embodiment the tracking module 115 determines whether the proportion of the first frame occupied by the target's display area exceeds a preset value; the preset value depends on the resolution of the video and the size of the frame, and an appropriate value can be determined experimentally. If the area ratio does not exceed the preset value, the target is not too large relative to the first frame, so the tracking module 115 treats the target as a whole, extracting at least one feature point of the target from the display area and tracking the target using those feature points and the reference frames. If the area ratio exceeds the preset value, the target occupies a large portion of the first frame, and the tracking module 115 splits it into several parts for tracking: it defines a plurality of sub-regions within the display area, extracts at least one feature point of the target from each sub-region, and tracks the target using the feature points extracted from the sub-regions and the reference frames.
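The area-ratio decision and the corner sub-regions might be sketched as follows, assuming OpenCV's goodFeaturesToTrack as the feature-point extractor; the 0.25 preset ratio and the sub-region geometry are illustrative assumptions, since the patent leaves the preset value to experiment:

import cv2

def extract_track_points(frame_gray, box, preset_ratio=0.25):
    x, y, w, h = box  # detected display area
    fh, fw = frame_gray.shape
    regions = []
    if (w * h) / float(fw * fh) <= preset_ratio:
        # Small target: treat it as a whole, shrunken slightly toward
        # the center so points do not fall on background at the edges.
        m = int(0.1 * min(w, h))
        regions.append((x + m, y + m, w - 2 * m, h - 2 * m))
    else:
        # Oversized target: use local sub-regions, e.g. the upper-left
        # and lower-right corner regions of the display area, inset
        # from the edges to avoid background.
        regions.append((x + w // 8, y + h // 8, w // 3, h // 3))
        regions.append((x + w - w // 8 - w // 3,
                        y + h - h // 8 - h // 3, w // 3, h // 3))
    points = []
    for rx, ry, rw, rh in regions:
        roi = frame_gray[ry:ry + rh, rx:rx + rw]
        p = cv2.goodFeaturesToTrack(roi, maxCorners=30,
                                    qualityLevel=0.01, minDistance=5)
        if p is not None:
            points.extend((p.reshape(-1, 2) + [rx, ry]).tolist())
    return points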

FIGS. 6A to 6C are schematic diagrams of extracting feature points of a target according to several embodiments of the invention. In FIG. 6A, the display area 63 of the target 60 (a vehicle) is assumed to occupy a proportion of the first frame F1 that does not exceed the preset value. In one embodiment, the tracking module 115 extracts one or more feature points of the target directly from the display area 63. In another embodiment, because the edges of the display area 63 may cover background that does not belong to the target, the tracking module 115 takes a shrunken region 63' contracted toward the center of the display area 63 and extracts one or more feature points of the target 60 (marked "+" in FIG. 6A) from that region, so that the extracted feature points do not fall in the background. In either case, as long as the proportion of the first frame F1 occupied by the display area 63 does not exceed the preset value, the tracking module 115 tracks the target as a whole.

In the example of FIG. 6B, the display area 63 of the target 60 is assumed to occupy a proportion of the first frame F1 that exceeds the preset value. To split the target 60 into several parts for tracking, the tracking module 115 first defines two sub-regions 65 and 67 in the first frame F1 and extracts one or more feature points of the target 60 (marked "+" in FIG. 6B) from each. In this embodiment, sub-region 65 is the upper-left corner region of the display area 63 and sub-region 67 is its lower-right corner region, and the two sub-regions partially overlap. Note that the sizes of sub-regions 65 and 67 may be a preset fixed size or may be adjusted dynamically with the size of the display area 63. Because the edges of the display area 63 may contain background that does not belong to the target, the sub-regions 65 and 67 are not placed right against the edges of the display area 63 but are contracted a specific distance toward its center, so that the extracted feature points do not fall in the background.

In the embodiment of FIG. 6C, the display area 63 of the target 60 likewise occupies a proportion of the first frame F1 that exceeds the preset value. The tracking module 115 defines two sub-regions 65 and 67 in the first frame F1 and extracts one or more feature points of the target 60 (marked "+" in FIG. 6C) from each, thereby splitting the target 60 into several parts for tracking. As shown in FIG. 6C, in this example the sub-regions 65 and 67 do not overlap each other.

Although the embodiments of FIGS. 6B and 6C define the two sub-regions as the upper-left and lower-right corner regions of the display area 63, the invention is not limited thereto. In other embodiments, the tracking module 115 may define more than two sub-regions, and the positions of the sub-regions are not limited to the upper-left or lower-right corner regions of the display area.

From another perspective, a conventional optical-flow algorithm extracts feature points and then tracks the motion of that same set of feature points over time. Because this embodiment operates in complex environments with large illumination changes, however, each set of feature points extracted by the tracking module 115 is used for only one optical-flow pass, in order to adapt to scene changes. In detail, the tracking module 115 extracts a set of feature points for every frame, performs only one optical-flow pass with that set, and then extracts a new set of feature points in the next frame. Because each set of feature points is used for only a short time, the constant-brightness constraint that a conventional optical-flow algorithm must maintain can be avoided. In this embodiment, the tracking module 115 uses a Kalman filter to predict the range within which the feature points may appear in the next frame, balancing tracking speed and accuracy.

Specifically, suppose the reference frames used to track the target include a second frame adjacent to the first frame and a third frame adjacent to the second frame. First, the tracking module 115 extracts at least one feature point of the target from the first frame (hereinafter the first set of feature points), taken from the target's display area in the first frame. Next, the tracking module 115 uses the Kalman filter to compute a predicted range in the second frame, i.e., the region in which the first set of feature points may lie in the second frame. The tracking module 115 then runs the optical-flow algorithm to estimate the actual motion of the first set of feature points within the predicted range of the second frame. After completing the optical-flow pass between the first and second frames, the tracking module 115 discards the first set of feature points and extracts at least one new feature point of the target from the second frame (for example, from the target's display area in the second frame, or from one or more smaller regions within it; hereinafter the second set of feature points). It then uses the Kalman filter to compute a predicted range in the third frame (the region in which the second set of feature points may lie) and runs the optical-flow algorithm to estimate the actual motion of the second set of feature points within the predicted range of the third frame, completing the optical-flow pass between the second and third frames, and so on.
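One tracking step of this mechanism might look as follows, assuming OpenCV: a constant-velocity Kalman filter predicts where the target will be, and pyramidal Lucas-Kanade optical flow measures where the fresh feature points actually moved. In a full implementation the predicted center would gate the search region, whereas this sketch simply returns it; the state model and noise values are illustrative assumptions:

import numpy as np
import cv2

kf = cv2.KalmanFilter(4, 2)  # state: (x, y, vx, vy); measurement: (x, y)
kf.transitionMatrix = np.array([[1, 0, 1, 0], [0, 1, 0, 1],
                                [0, 0, 1, 0], [0, 0, 0, 1]], np.float32)
kf.measurementMatrix = np.eye(2, 4, dtype=np.float32)
kf.processNoiseCov = 1e-3 * np.eye(4, dtype=np.float32)

def track_step(prev_gray, next_gray, prev_points):
    """One optical-flow pass; prev_points are discarded afterwards."""
    center = kf.predict()[:2].ravel()  # center of the predicted range
    p0 = np.float32(prev_points).reshape(-1, 1, 2)
    p1, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, p0, None)
    moved = p1[status.ravel() == 1].reshape(-1, 2)
    if len(moved):
        # correct the filter with the measured center of the moved points
        kf.correct(moved.mean(axis=0).reshape(2, 1).astype(np.float32))
    return moved, center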

FIG. 7 illustrates the tracking module 115 tracking a target across three adjacent frames of the video. The predicted ranges R1, R2, and R3 are computed by the Kalman filter and are the regions in which the feature points extracted from the previous frame may appear. The smaller optical-flow computation regions O1, O2, and O3 are where the feature points extracted from the previous frame actually landed. Because the target moves through the environment, its size differs from frame to frame; as shown in FIG. 7, the sizes of the predicted ranges R1 to R3 and of the optical-flow computation regions O1 to O3 therefore change. The interplay of the optical-flow algorithm and the Kalman filter, however, prevents this from causing tracking failures. Although FIG. 7 illustrates tracking with three adjacent frames, the number of frames is not limited to three: any tracking that combines the optical-flow algorithm with the Kalman filter, using each set of feature points extracted from a frame for only one optical-flow pass, falls within the scope of the tracking module 115 of the invention.

In this embodiment, once tracking of a target begins, the tracking module 115 uses the Speeded Up Robust Features (SURF) algorithm to build a scale- and rotation-invariant feature of the target and stores it. The scale- and rotation-invariant feature lets the target detection and tracking system 100 track the target over the long term: even if the target disappears from the video for a time, whenever it reappears in the video the system 100 can confirm its identity with the stored feature. As shown in step S230 of FIG. 2, the identification module 117 uses the scale- and rotation-invariant feature to identify whether the target appears in a specific frame, where the specific frame is, for example, any frame that temporally follows the reference frames.
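Building the stored feature might be sketched as follows, assuming the opencv-contrib-python package (SURF is patent-encumbered and lives in cv2.xfeatures2d; ORB would be a royalty-free substitute with similar scale and rotation invariance). The Hessian threshold is an illustrative choice:

import cv2

surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)

def build_invariant_feature(frame_gray, box):
    x, y, w, h = box  # tracked target's display area
    roi = frame_gray[y:y + h, x:x + w]
    keypoints, descriptors = surf.detectAndCompute(roi, None)
    return descriptors  # stored for later re-identification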

FIG. 8 is a flowchart of identifying whether a target appears in a specific frame according to an embodiment of the invention. Referring to FIG. 8, first, in step S810, the identification module 117 extracts a feature to be identified from the specific frame. Next, in step S820, the identification module 117 compares the feature to be identified against the previously stored scale- and rotation-invariant feature, for example using the K-th nearest neighbor (KNN) algorithm. Then, in step S830, the identification module 117 determines from the comparison result whether the feature to be identified matches the scale- and rotation-invariant feature. If it does not, the identification module 117 determines in step S840 that the target does not appear in the specific frame. Conversely, if it does, the identification module 117 determines in step S850 that the target appears in the specific frame.
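The comparison of steps S820 and S830 might be sketched with OpenCV's brute-force k-nearest-neighbor matcher and Lowe's ratio test; the 0.7 ratio and the 10-match threshold are illustrative assumptions, not values from the patent:

import cv2

def target_appears(stored_desc, query_desc, min_matches=10):
    matcher = cv2.BFMatcher(cv2.NORM_L2)  # L2 norm suits SURF descriptors
    knn = matcher.knnMatch(query_desc, stored_desc, k=2)
    good = [p[0] for p in knn
            if len(p) == 2 and p[0].distance < 0.7 * p[1].distance]
    return len(good) >= min_matches  # S850 if True, S840 otherwise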

Although the embodiments above detect and track a single target, it should be noted that the target detection and tracking system 100 can detect and track several targets in a video simultaneously. Whenever the detection module 113 detects a target's display area in a new frame, the tracking module 115 first determines whether that target is an object already being tracked. It then tracks the target through the mechanism combining the optical-flow algorithm and the Kalman filter and stores the tracking result (for example, the target's position and size).

In summary, the target detection and tracking method and system of the invention first train a target classifier with a classifier fusion algorithm and texture features extracted from a large number of positive and negative training samples, so as to distinguish the target from the background region in a frame; the target's location can thus be detected even in dynamic environments. For a detected target, a tracking mechanism combining the Kalman filter and the optical-flow algorithm is further applied to achieve fast and stable tracking. Once a target is tracked, its scale- and rotation-invariant feature is immediately built and stored, so that even if the target temporarily disappears from the video, its identity can be recognized with this feature when it reappears. The target detection and tracking method and system of the invention thus achieve fast and accurate tracking in dynamic environments.

Although the invention has been disclosed by way of the embodiments above, they are not intended to limit the invention. Anyone with ordinary skill in the art may make modifications and refinements without departing from the spirit and scope of the invention; the scope of protection of the invention is therefore defined by the appended claims.


Claims (10)

1. A target detection and tracking method, comprising: capturing a frame of a video as a first frame, and scanning the first frame to detect a display area of a target in the first frame; tracking the target according to an area ratio of the display area to the first frame and at least one reference frame following the first frame in the video, and recording a scale- and rotation-invariant feature of the tracked target; and using the scale- and rotation-invariant feature to identify whether the target appears in a specific frame.

2. The method of claim 1, wherein before capturing the frame of the video as the first frame, the method further comprises: obtaining a plurality of positive training samples and a plurality of negative training samples; extracting a texture feature from each of the positive and negative training samples; and training a target classifier using the texture features of the positive and negative training samples and a classifier fusion algorithm; and wherein scanning the first frame to detect the display area of the target in the first frame comprises: forming a cascade classifier from n of the target classifiers, n being a positive integer; and performing n scanning passes over the first frame with the cascade classifier to detect the display area of the target in the first frame, wherein each of the n scanning passes comprises moving a detection window across the first frame in a specific direction to scan the first frame completely and then reducing a size of the first frame.

3. The method of claim 1, wherein tracking the target according to the area ratio of the display area to the first frame and the at least one reference frame following the first frame in the video comprises: determining whether the area ratio exceeds a preset value; if not, extracting at least one feature point of the target from the display area, and tracking the target using the at least one feature point extracted from the display area and the at least one reference frame; and if so, extracting at least one feature point of the target from each of a plurality of sub-regions within the display area, and tracking the target using the at least one feature point extracted from the sub-regions and the at least one reference frame, wherein the sub-regions are an upper-left corner region and a lower-right corner region of the display area, and the sub-regions do not overlap or partially overlap each other.
4. The method of claim 1, wherein the at least one reference frame comprises a second frame adjacent to the first frame and a third frame adjacent to the second frame, and tracking the target according to the area ratio of the display area to the first frame and the at least one reference frame comprises: extracting at least one feature point of the target from the first frame; computing a predicted range in the second frame according to a Kalman filter; running an optical-flow algorithm to estimate movement information, within the predicted range of the second frame, of the at least one feature point extracted from the first frame; extracting at least one feature point of the target from the second frame; computing a predicted range in the third frame according to the Kalman filter; and running the optical-flow algorithm to estimate movement information, within the predicted range of the third frame, of the at least one feature point extracted from the second frame.

5. The method of claim 1, wherein the specific frame is any frame temporally following the at least one reference frame, the scale- and rotation-invariant feature is built with a Speeded Up Robust Features (SURF) algorithm, and using the scale- and rotation-invariant feature to identify whether the target appears in the specific frame comprises: extracting a feature to be identified from the specific frame; comparing the feature to be identified with the scale- and rotation-invariant feature; and determining that the target appears in the specific frame when the feature to be identified matches the scale- and rotation-invariant feature.

6. A target detection and tracking system, comprising: a storage unit recording a plurality of modules; and a processing unit coupled to the storage unit to access and execute the modules, the modules comprising: a detection module that captures a frame of a video as a first frame and scans the first frame to detect a display area of a target in the first frame; a tracking module that tracks the target according to an area ratio of the display area to the first frame and at least one reference frame following the first frame in the video, and records a scale- and rotation-invariant feature of the tracked target; and an identification module that uses the scale- and rotation-invariant feature to identify whether the target appears in a specific frame.
7. The target detection and tracking system of claim 6, wherein the modules further comprise a training module that obtains a plurality of positive training samples and a plurality of negative training samples, extracts a texture feature from each of the positive and negative training samples, and trains a target classifier using the texture features of the positive and negative training samples and a classifier fusion algorithm; and wherein the detection module forms a cascade classifier from n of the target classifiers and performs n scanning passes over the first frame with the cascade classifier to detect the display area of the target in the first frame, each of the n scanning passes comprising moving a detection window across the first frame in a specific direction to scan the first frame completely and then reducing a size of the first frame, n being a positive integer.

8. The target detection and tracking system of claim 6, wherein the tracking module determines whether the area ratio of the display area to the first frame exceeds a preset value; if not, the tracking module extracts at least one feature point of the target from the display area and tracks the target using the at least one feature point extracted from the display area and the at least one reference frame; and if so, the tracking module extracts at least one feature point of the target from each of a plurality of sub-regions within the display area and tracks the target using the at least one feature point extracted from the sub-regions and the at least one reference frame, wherein the sub-regions are an upper-left corner region and a lower-right corner region of the display area, and the sub-regions do not overlap or partially overlap each other.

9. The target detection and tracking system of claim 6, wherein the at least one reference frame comprises a second frame adjacent to the first frame and a third frame adjacent to the second frame, and the tracking module extracts at least one feature point of the target from the first frame, computes a predicted range in the second frame according to a Kalman filter, and runs an optical-flow algorithm to estimate movement information of the at least one feature point extracted from the first frame within the predicted range of the second frame; and the tracking module further extracts at least one feature point of the target from the second frame, computes a predicted range in the third frame according to the Kalman filter, and runs the optical-flow algorithm to estimate movement information of the at least one feature point extracted from the second frame within the predicted range of the third frame.
10. The target detection and tracking system of claim 6, wherein the specific frame is any frame temporally following the at least one reference frame, the tracking module builds the scale- and rotation-invariant feature with a Speeded Up Robust Features algorithm, and the identification module extracts a feature to be identified from the specific frame, compares the feature to be identified with the scale- and rotation-invariant feature, and determines that the target appears in the specific frame when the feature to be identified matches the scale- and rotation-invariant feature.
TW102122760A 2013-06-26 2013-06-26 Method and system for object detection and tracking TWI514327B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
TW102122760A TWI514327B (en) 2013-06-26 2013-06-26 Method and system for object detection and tracking
CN201310349397.0A CN104252629A (en) 2013-06-26 2013-08-12 Target Detection And Tracking Method And System

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW102122760A TWI514327B (en) 2013-06-26 2013-06-26 Method and system for object detection and tracking

Publications (2)

Publication Number Publication Date
TW201501080A true TW201501080A (en) 2015-01-01
TWI514327B TWI514327B (en) 2015-12-21

Family

ID=52187508

Family Applications (1)

Application Number Title Priority Date Filing Date
TW102122760A TWI514327B (en) 2013-06-26 2013-06-26 Method and system for object detection and tracking

Country Status (2)

Country Link
CN (1) CN104252629A (en)
TW (1) TWI514327B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI636426B (en) * 2017-08-23 2018-09-21 財團法人國家實驗研究院 Method of tracking a person's face in an image
TWI831696B (en) * 2023-05-23 2024-02-01 晶睿通訊股份有限公司 Image analysis method and image analysis apparatus

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105206055B (en) * 2015-09-24 2018-09-21 深圳市哈工大交通电子技术有限公司 A kind of accident detection method of Traffic Surveillance Video identification vehicle collision
CN106961597B (en) * 2017-03-14 2019-07-26 深圳Tcl新技术有限公司 The target tracking display methods and device of panoramic video
CN108062531B (en) * 2017-12-25 2021-10-19 南京信息工程大学 Video target detection method based on cascade regression convolutional neural network
TWI719591B (en) * 2019-08-16 2021-02-21 緯創資通股份有限公司 Method and computer system for object tracking
CN116112782B (en) * 2022-05-25 2024-04-02 荣耀终端有限公司 Video recording method and related device

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100792283B1 (en) * 2001-08-07 2008-01-07 삼성전자주식회사 Device and method for auto tracking moving object
JP2008237771A (en) * 2007-03-28 2008-10-09 Daito Giken:Kk Game stand
TWI452540B (en) * 2010-12-09 2014-09-11 Ind Tech Res Inst Image based detecting system and method for traffic parameters and computer program product thereof
CN102903122B (en) * 2012-09-13 2014-11-26 西北工业大学 Video object tracking method based on feature optical flow and online ensemble learning
CN103116896B (en) * 2013-03-07 2015-07-15 中国科学院光电技术研究所 Visual saliency model based automatic detecting and tracking method


Also Published As

Publication number Publication date
TWI514327B (en) 2015-12-21
CN104252629A (en) 2014-12-31


Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees