TW520603B - Method of generating a moving object shape from a series of video frames - Google Patents

Method of generating a moving object shape from a series of video frames

Info

Publication number
TW520603B
TW520603B
Authority
TW
Taiwan
Prior art keywords
background
mask
picture
pixel
difference
Prior art date
Application number
TW89114744A
Other languages
Chinese (zh)
Inventor
Liang-Ji Chen
Shr-Yi Ma
Shau-Yi Jian
Original Assignee
Liang-Ji Chen
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Liang-Ji Chen
Priority to TW89114744A
Application granted
Publication of TW520603B

Landscapes

  • Studio Circuits (AREA)

Abstract

The present invention is directed to a method of generating a moving object shape from a series of video frames in real time by an automatic video segmentation method. The basic idea is to construct reliable background information from the video sequences and then compare the background with each frame. Pixels in the current frame which are significantly different from the background are labeled as the moving object, so the moving object shape information can be obtained efficiently. A background registration technique is used to maintain up-to-date background information: the current luminance value of a pixel is copied into the background buffer if no significant change is detected in several consecutive frames. An optional morphological gradient filter is introduced in the pre-processing stage to reduce the effect of shadows and illumination variation. The experimental results show that good and consistent object shape information can be obtained from our segmentation method. Since no computationally intensive operations are used in our method, it is very suitable for real-time object shape generation systems.

Description

Field of the Invention

The present invention relates to video data processing, and in particular to a method of generating the shape of a moving object in real time, suitable for multimedia signal processing and communication applications that use moving objects as the coding unit.

Background of the Invention

Content-based interactivity is an important function of multimedia communication systems. To realize this function, video data can no longer be compressed with the traditional frame-based coding; it must be coded on an object basis. In such object-based video coding the basic coding unit is called a video object, which contains both the image and the shape information of the object. Each object is encoded independently during compression, so the user can change the number and position of the objects in the video scene as needed. The MPEG-4 multimedia communication standard adopts this object-based approach to video compression. Current image capture systems, however, can only output the image of the whole frame and provide no object shape information, so a method and apparatus that automatically derives object shape data from video data is highly desirable. At the same time, because the amount of video data is very large, such an automatic video segmentation method must be highly efficient in order to meet the real-time requirements of multimedia communication systems.

The main purpose of automatic video segmentation is to separate the stationary background from the moving objects in the video frames and to produce object shape information. An effective segmentation criterion is therefore needed to identify the object regions.
The most commonly used segmentation criteria are spatial homogeneity and motion coherence, and the two usually have to be combined to obtain accurate and stable segmentation results. Existing techniques can be divided into two categories according to their main segmentation criterion. The first uses spatial homogeneity as the main criterion: each frame is first divided into homogeneous regions according to spatial features (luminance, color, and so on), and the regions belonging to the same object are then merged according to their motion. The second category uses motion as the main criterion: change detection is used to obtain the approximate boundary of the object, and the accurate object shape is then derived from spatial features (edges).

Although algorithms that start from spatial homogeneity can produce more precise object boundaries, no motion information is used in the early stage of segmentation, so a considerable amount of computation is wasted on segmenting the background region. We consider the essential difference between a moving object and the background to be motion itself, so an efficient video segmentation method should rely on motion information as far as possible as its main decision basis.

Current segmentation algorithms based on change detection have several major drawbacks. Because change detection uses the difference between consecutive frames as its decision basis, the first drawback is that the detection result depends on the speed of motion: when the speed of an object varies greatly, it is difficult to obtain stable segmentation results. The second drawback is that when an object moves, not only the object region but also the background uncovered by the movement is detected as changed, so an additional test is needed to distinguish object regions from uncovered background. The traditional approach is to perform motion estimation within the changed regions: if a motion vector with small error can be found, the region belongs to the moving object, while regions for which no suitable motion vector exists belong to the uncovered background. The main weakness of this approach is that an object may change shape while it moves, so that motion estimation cannot find the correct motion direction; moreover, the computational cost of motion estimation is very high, and an efficient algorithm should avoid it as much as possible.

Another major problem is the handling of object shadows. In a typical indoor environment, light sources and reflections often produce blurred shadows in the background around an object. Because these shadows move with the object, change detection also classifies the shadow regions as part of the object and yields an incorrect object shape.

Therefore, an automatic video segmentation method and apparatus is needed that can efficiently distinguish the moving object regions from the background.
At the same time, the segmentation result must be accurate and stable, and must not be affected by the speed of the object or by its shadows, so that the object shape information required by multimedia communication systems can be obtained in real time.

Summary of the Invention

The present invention provides a method of segmenting the shape of a moving object from a series of video frames by automatic video segmentation. The method is suitable for MPEG-4 encoding systems. Video segmentation in the method is based on the concept of change detection and uses background registration to distinguish the moving object regions from the background region. It consists of the following five steps.

The first step computes the frame difference: the luminance values of the current frame and the previous frame provided at the input are compared, the positions with a significant change are recorded as "1" and the others as "0", and a frame difference mask is output.

The second step is background registration: for the points whose frame difference masks have been "0" for several consecutive frames, the video data of the corresponding pixels of the current frame are stored in a background buffer to obtain an updated background frame, which serves as reliable background information for identifying the moving object regions.

The third step computes the background difference: the luminance of the current frame is compared with that of the background frame obtained in the second step, the positions that differ significantly are recorded as "1" and the others as "0", and a background difference mask is output.

The fourth step, object detection, uses the results of the previous steps to obtain the object shape mask of the moving object. If background information exists at a point of the background frame in the background buffer, the corresponding point of the object shape mask is set equal to the corresponding point of the background difference mask; if no background information exists there, the corresponding point of the object shape mask is set equal to the corresponding point of the frame difference mask. In short, the object shape mask generally follows the background difference mask unless background information is absent, in which case the positions where the luminance of the current frame differs significantly from that of the previous frame are recorded as the moving object shape.

The fifth step is noise region elimination, which removes from the object shape mask the noise caused by background noise and object motion and yields the correct object shape mask.
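For orientation, the five steps can be pictured as a small per-frame update. The sketch below is illustrative only and not the patented implementation: it assumes 8-bit grayscale frames held as NumPy arrays, and the thresholds, the stationarity count L, and all function and variable names are hypothetical choices of this sketch. The noise elimination step and the optional gradient pre-filter are sketched separately in the detailed description below.

    import numpy as np

    def make_state(shape):
        # Per-pixel state carried from frame to frame.
        return {"count": np.zeros(shape, np.int32),        # consecutive unchanged frames
                "background": np.zeros(shape, np.uint8),   # background buffer
                "registered": np.zeros(shape, bool)}       # background registration mask

    def segment_frame(curr, prev, state, fd_thr=20, bd_thr=25, L=10):
        # Step 1: frame difference mask (1 = changed, 0 = unchanged).
        fd_mask = np.abs(curr.astype(np.int16) - prev.astype(np.int16)) > fd_thr

        # Step 2: background registration -- a pixel unchanged for more than L
        # consecutive frames is copied into the background buffer.
        state["count"] = np.where(fd_mask, 0, state["count"] + 1)
        stable = state["count"] > L
        state["background"][stable] = curr[stable]
        state["registered"] |= stable

        # Step 3: background difference mask against the registered background.
        bd_mask = np.abs(curr.astype(np.int16)
                         - state["background"].astype(np.int16)) > bd_thr

        # Step 4: use the background difference where background information
        # exists, otherwise fall back to the frame difference.
        obj_mask = np.where(state["registered"], bd_mask, fd_mask)

        # Step 5 (noise region elimination) would be applied to obj_mask here.
        return obj_mask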
In many practical situations the light source makes shadows or reflections of the object appear in the frame. When the object moves, the shadows and reflections change as well, so the regions affected by them also appear in the object shape mask. To reduce this kind of error, a gradient filter can be added at the image input. Since in most cases a shadow region is characterized by slowly varying luminance, the low-gradient shadow regions become much less prominent after the gradient operation, and the influence of shadows on the shape segmentation is reduced.

The method provided by the present invention can be used in a computer system. The input video data can be provided by an input peripheral of the computer (a digital camera or an image capture card) or by a data storage device; the steps of the invention are executed as instructions in the central processing unit of the computer; and the output object shape information can be stored in a data storage device or delivered through an output peripheral of the computer. The method can also be implemented in a dedicated data processing hardware system, in which each step is carried out by a corresponding hardware processing unit.

Brief Description of the Drawings

Figure 1 is a flowchart of the method of segmenting the shape of a moving object from a series of video frames by automatic video segmentation.

Figure 2 shows the effect of background registration: Figure 2(a) is the 50th frame of a weather report sequence; Figure 2(b) is the background frame in the background buffer at the 50th frame; Figure 2(c) is the 100th frame; and Figure 2(d) is the background frame in the background buffer at the 100th frame, where the black regions represent positions for which no background information exists yet.

Figure 3 shows the effect of noise region elimination: Figure 3(a) is the original object shape mask and Figure 3(b) is the object shape mask after the noise regions have been removed.

Figure 4 shows the effect of the luminance gradient filter: Figure 4(a) is the original frame; Figure 4(b) is the segmented object region without the filter, where part of the background is misclassified as object because of shadows; Figure 4(c) is the frame after the luminance gradient filter, in which the shadows are much less prominent than in the original frame; and Figure 4(d) is the segmentation result with the gradient filter added, in which the influence of the shadows is effectively removed.

Detailed Description of the Invention

The present invention provides a method of segmenting the shape of a moving object from a series of video frames by automatic video segmentation, which can identify the moving object regions in video data efficiently. The main idea is to assemble reliable background information from the consecutive frames of the video and store it in a background buffer. The current frame is then compared with the background frame.
Pixels that differ significantly from the background are regarded as belonging to the moving object region. The premise of this method is that the background is stationary; since the camera is fixed in most multimedia communication applications, the present invention provides an efficient video segmentation method for the shape of moving objects under this condition. The flow of the whole method is shown in Figure 1 and consists of the following five steps.

The first step is the frame difference computation 11, which compares the luminance values of the current frame and the previous frame provided at the input. If the difference at a given pixel position of two consecutive frames is larger than a preset threshold, the pixel has changed significantly between the two frames, and the value at that position in the output frame difference mask is set to binary "1". Otherwise, if the luminance difference does not exceed the preset threshold, the value at that position in the frame difference mask is recorded as binary "0", indicating that the pixel has not changed significantly between the two frames.

The second step is background registration 12, whose main function is to record reliable background information from the input video data for use in identifying the moving object regions. Reliable background information is defined as the positions whose luminance has not changed for L consecutive frames. For example, with L equal to 10, if the luminance at some pixel position shows no significant change for 10 consecutive frames, that position belongs to the background region, and the luminance of the current frame is copied to the corresponding pixel of the background frame in the background buffer.

In practice, three memory regions are used during background registration. The first memory region stores, for each pixel, the number of consecutive frames over which its luminance has shown no significant change. The decision is based on the frame difference mask output by the first step 11: for each pixel position, if the value in the frame difference mask is "1" the counter of that pixel is reset to 0, and if the value is "0" the counter is incremented by 1. Once the counter exceeds L, the pixel has not changed significantly for L consecutive frames and is taken to be reliable background; the luminance of that pixel in the current frame is then copied to the corresponding pixel of the background frame, which is stored in the second memory region, the background buffer. The background buffer holds no preset values before the video data are input.
As background registration proceeds, more and more background information accumulates, so another memory region is needed to record whether each pixel of the frame already has background information. This information is called the background registration mask.

Figure 2 shows the effect of background registration. Figure 2(a) is the input image at the 50th frame of a weather report sequence and Figure 2(b) is the image stored in the background buffer at that time, where the black regions represent positions with no background information yet. As Figures 2(a) and 2(b) show, background information is obtained correctly everywhere except in the area occluded by the reporter's body. Figure 2(c) is the input image at the 100th frame and Figure 2(d) is the image stored in the background buffer at that time; as Figures 2(c) and 2(d) show, when the previously occluded background is uncovered by the motion of the object, the background buffer collects more background information.

The third step is the background difference computation 13, which compares the luminance of the current frame with that of the background frame stored in the background buffer by the second step 12. For each pixel position of the frame, if the difference from the background frame is larger than a preset threshold, the pixel differs significantly from the background and the value at that position in the output background difference mask is set to binary "1"; otherwise the value is recorded as binary "0", indicating that the luminance at that position does not differ significantly from the background.

The fourth step, moving object detection 14, uses the results of the previous steps to obtain the object shape mask of the moving object. Its inputs are the frame difference mask, the background difference mask, and the background registration mask. The biggest difference between this step and traditional change-detection-based segmentation is that the background is the main decision basis. For each pixel of the frame, the corresponding value of the background registration mask tells whether background information already exists in the background buffer. For pixel positions where background information exists, the value of the output object shape mask is the corresponding value of the background difference mask; for pixel positions without background information, the value of the output object shape mask is the corresponding value of the frame difference mask. The object shape mask obtained in this step is a binary image: a pixel whose value is "1" is included in the shape of the moving object, and a pixel whose value is "0" belongs to the background region.
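The per-pixel rule of steps 13 and 14 can be written very compactly. The following sketch is illustrative only, under the same assumptions as the earlier sketch (NumPy arrays of 8-bit luminance, a hypothetical threshold, and invented names):

    import numpy as np

    def detect_object(current, background, registered, fd_mask, bd_thr=25):
        # Step 13: background difference mask.
        bd_mask = np.abs(current.astype(np.int16)
                         - background.astype(np.int16)) > bd_thr
        # Step 14: where the background registration mask is 1, the background
        # difference decides; where it is 0, fall back to the frame difference.
        return np.where(registered, bd_mask, fd_mask)

Basing the decision on the registered background rather than only on the previous frame is what makes the result insensitive to the object's speed and lets uncovered background be classified correctly.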
The fifth step is noise region elimination 15. The object shape mask output by the fourth step 14 contains a considerable amount of noise caused by background noise and by the motion of the object: isolated pixels or small regions in the background area are marked as object, and inside the object area some isolated pixels or small regions are marked as background. These noise regions are erroneous and must be removed to obtain the correct object shape mask. The method adopted in the present invention is the connected component algorithm; since it is a well-known technique, only the basic idea is described here. In the object shape mask output by the fourth step 14, the most obvious difference between object regions and noise is that noise usually forms small isolated blocks. The connected component algorithm is used to label all connected regions in the object shape mask, and the area of each region is then computed. When the area of a region is smaller than a preset threshold, the region is a noise region and the value of every pixel in it is inverted, that is, "1" becomes "0" and "0" becomes "1".

Figure 3 shows the effect of noise elimination: Figure 3(a) is the original object shape mask and Figure 3(b) is the object shape mask after noise removal. As the figure shows, the isolated noise regions are removed effectively while the shape of the object is unaffected.

Besides the connected component algorithm, this noise elimination step can also be carried out with the morphological close and open operations, expressed as

$O = (I \circ B) \bullet B,$

where $I$ is the input image, $B$ is a structuring element, $\circ$ is the morphological open operation, $\bullet$ is the morphological close operation, and $O$ is the output image after noise elimination. These morphological operations are well known and are computed as

$X \circ B = (X \ominus B) \oplus B,$
$X \bullet B = (X \oplus B) \ominus B,$

where $\ominus$ is the morphological erosion operation and $\oplus$ is the morphological dilation operation, defined by

$[X \ominus B](i,j) = \bigcap_{(k,l) \in \mathbb{Z}^2,\, B(k,l)=1} X(i-k,\, j-l),$
$[X \oplus B](i,j) = \bigcup_{(k,l) \in \mathbb{Z}^2,\, B(k,l)=1} X(i-k,\, j-l).$

The size of the structuring element $B$ determines the size of the noise regions that can be removed; in the present invention a 3-by-3 structuring element gives good results:

$B = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}.$
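A sketch of the connected-component variant of this step is given below. It assumes SciPy's ndimage labelling with its default 4-connectivity and a hypothetical area threshold; the patent itself names no library, and the open/close variant could equally be built from ndimage's binary_opening and binary_closing:

    import numpy as np
    from scipy import ndimage

    def remove_small_regions(obj_mask, min_area=50):
        # Invert every connected region (of 1s, then of 0s) whose area is
        # below min_area, as described for the noise region elimination step.
        cleaned = obj_mask.astype(np.uint8)
        for value in (1, 0):                       # small object blobs, then small holes
            labels, n = ndimage.label(cleaned == value)
            if n == 0:
                continue
            areas = np.bincount(labels.ravel())    # areas[k] = size of region with label k
            small_ids = np.nonzero(areas[1:] < min_area)[0] + 1
            cleaned[np.isin(labels, small_ids)] = 1 - value
        return cleaned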

In the first step 11 described above, the input data are the image data of the current frame and of the previous frame. Since many image capture devices can only provide the image data of the current frame, a frame buffer can be added to the system to store one frame of data. As each pixel of the current frame is input, the frame buffer reads out the pixel data of the previous frame at the corresponding position and stores the pixel data of the current frame in the same memory location. In this way only the current frame has to be input, and the image data of both the current frame and the previous frame are available as the input of the first step.

In many practical situations the light source makes shadows or reflections of the object appear in the frame. When the object moves, the shadows and reflections change as well, so the regions affected by them also appear in the object shape mask. To reduce this kind of error, a gradient filter can be added at the image input. Since in most cases a shadow region is characterized by slowly varying luminance, the low-gradient shadow regions become much less prominent after the gradient operation, and the influence of shadows on the shape segmentation is reduced. The gradient filter used in the present invention is the morphological gradient operation.
This operation can be expressed as

$G = (I \oplus B) - (I \ominus B),$

where $I$ is the input image, $B$ is a 3-by-3 structuring element, $\oplus$ is the morphological dilation operation, $\ominus$ is the morphological erosion operation, and $G$ is the output gradient image. These morphological operations are well known and are computed as

$[I \ominus B](i,j) = \bigcap_{(k,l) \in \mathbb{Z}^2,\, B(k,l)=1} I(i-k,\, j-l),$
$[I \oplus B](i,j) = \bigcup_{(k,l) \in \mathbb{Z}^2,\, B(k,l)=1} I(i-k,\, j-l),$

with

$B = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}.$

Figure 4 shows the effect of the gradient filter. Figure 4(a) is the original image and Figure 4(b) is the segmentation result without the gradient filter, in which part of the background is misclassified as moving object because the background contains the shadow of the moving object. Figure 4(c) is the gradient image obtained by passing the original image through the gradient filter; as the figure shows, the shadowed part of the background is much less prominent than in the original image. Figure 4(d) is the segmentation result with the gradient filter added: the influence of the shadow is effectively removed and the correct object shape mask is obtained.

The method provided by the present invention can be used in a computer system. The input video data can be provided by an input peripheral of the computer (a digital camera or an image capture card) or by a data storage device; the steps of the invention are executed as instructions in the central processing unit of the computer; and the output object shape information can be stored in a data storage device or delivered through an output peripheral of the computer. Because the method uses no computation-intensive algorithms, it runs quite fast: on a computer system with an Intel Pentium III 450 MHz central processing unit, it processes 22 frames per second for video data with a frame size of 176x144 (QCIF).

The method can also be implemented in a dedicated data processing hardware system, in which each step of the method is carried out by a corresponding hardware processing unit.
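A sketch of this pre-filter is given below, assuming SciPy's grey-scale dilation and erosion (the patent prescribes the operation, not any particular library); it would be applied to each incoming frame before the frame- and background-difference steps:

    import numpy as np
    from scipy import ndimage

    def morphological_gradient(frame):
        # G = (I dilated by B) - (I eroded by B) with a 3x3 all-ones element B.
        b = np.ones((3, 3), bool)
        dilated = ndimage.grey_dilation(frame, footprint=b)
        eroded = ndimage.grey_erosion(frame, footprint=b)
        # Slowly varying (shadowed) regions have a small gradient and fade out.
        return (dilated.astype(np.int16) - eroded.astype(np.int16)).astype(np.uint8)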

Claims (1)

1. A method of segmenting the shape of a moving object from a series of video frames, comprising the following steps:

(a) computing the luminance difference between the current frame provided at the input and the previous frame and comparing it with a preset first threshold to produce a frame difference mask, each element of which is "0" or "1", where "0" means that the luminance difference between the corresponding pixel of the current frame and that of the previous frame does not exceed the first threshold and "1" means that it does;

(b) judging, from the frame difference mask of the current frame produced in step (a) and the frame difference masks of several past frames, whether each pixel of the current frame belongs to the background region, updating the background frame stored in a background buffer according to the result, and at the same time producing a background registration mask, each element of which is "0" or "1", where "0" means that no background information has been recorded at the corresponding pixel position of the background frame in the background buffer and "1" means that background information has been recorded there;

(c) taking the absolute value of the luminance difference between the current frame provided at the input and the background frame stored in the background buffer in step (b) and, after comparing it with a preset second threshold, producing a background difference mask, each element of which is "0" or "1", where "0" means that the absolute difference between the corresponding pixel of the current frame and the corresponding pixel of the background frame does not exceed the second threshold and "1" means that it does;

(d) producing an object shape mask according to the background registration mask, wherein when an element of the background registration mask is "1" (background information exists) the corresponding element of the object shape mask equals the corresponding element of the background difference mask, and when an element of the background registration mask is "0" (background information does not exist) the corresponding element of the object shape mask equals the corresponding element of the frame difference mask; and

(e) filtering out noise, comprising comparing the area of each connected region of identical element values in the object shape mask with a preset third threshold and inverting the elements of the regions smaller than the third threshold ("0" becomes "1" and "1" becomes "0"), thereby producing a final object shape mask.

2. The method of claim 1, wherein step (b) comprises the following steps:

I. updating a counter for each pixel of the frame according to the frame difference mask produced in step (a) of claim 1, wherein when an element of the frame difference mask is "1" the corresponding counter is set to 0, and when an element of the frame difference mask is "0" the corresponding counter is incremented by 1; and

II. updating, according to the result of step (I), each pixel of the background frame stored in the background buffer, wherein if the corresponding counter value is greater than a preset threshold L, the background information of the corresponding pixel of the current frame is written into that pixel of the background frame in the background buffer and the corresponding element of the background registration mask is set to "1" (background information exists).

3. The method of claim 1, wherein the current frame and the previous frame of step (a) are gradient frames obtained beforehand by a morphological gradient operation that removes the video content of regions whose luminance varies slowly.
TW89114744A 2000-07-24 2000-07-24 Method of generating a moving object shape from a series of video frames TW520603B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW89114744A TW520603B (en) 2000-07-24 2000-07-24 Method of generating a moving object shape from a series of video frames

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW89114744A TW520603B (en) 2000-07-24 2000-07-24 Method of generating a moving object shape from a series of video frames

Publications (1)

Publication Number Publication Date
TW520603B true TW520603B (en) 2003-02-11

Family

ID=28036867

Family Applications (1)

Application Number Title Priority Date Filing Date
TW89114744A TW520603B (en) 2000-07-24 2000-07-24 Method of generating a moving object shape from a series of video frames

Country Status (1)

Country Link
TW (1) TW520603B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101201934B (en) * 2006-12-15 2012-07-04 财团法人工业技术研究院 Method for subsection of video object
CN102111530B (en) * 2009-12-24 2013-01-02 财团法人工业技术研究院 Device and method for movable object detection
TWI557691B (en) * 2015-09-23 2016-11-11 睿緻科技股份有限公司 Monitoring Camera Device and Related Region-based Motion Detection Method
US9904853B2 (en) 2015-09-23 2018-02-27 Vatics Inc. Monitoring camera device and related region-based motion detection method

Similar Documents

Publication Publication Date Title
US8126268B2 (en) Edge-guided morphological closing in segmentation of video sequences
US8565525B2 (en) Edge comparison in segmentation of video sequences
US8077969B2 (en) Contour finding in segmentation of video sequences
JP5280503B2 (en) Image display method, shared board system
Jia et al. Video repairing: Inference of foreground and background under severe occlusion
US20090028432A1 (en) Segmentation of Video Sequences
US20090219379A1 (en) Average Calculation in Color Space, Particularly for Segmentation of Video Sequences
Karaman et al. Comparison of static background segmentation methods
US20060204040A1 (en) Occluding contour detection and storage for digital photography
CN111754528A (en) Portrait segmentation method, portrait segmentation device, electronic equipment and computer-readable storage medium
Messikommer et al. Multi-bracket high dynamic range imaging with event cameras
He et al. A novel cloud-based trust model for pervasive computing
WO2023226584A1 (en) Image noise reduction method and apparatus, filtering data processing method and apparatus, and computer device
Battiato et al. Automatic image enhancement by content dependent exposure correction
TW202111662A (en) Motion detection method and motion detection system
Zhang et al. No shadow left behind: Removing objects and their shadows using approximate lighting and geometry
TW520603B (en) Method of generating a moving object shape from a series of video frames
CN113344011B (en) Color constancy method based on cascade fusion feature confidence weighting
Kim et al. Compensated visual hull for defective segmentation and occlusion
Chien et al. Efficient video segmentation algorithm for real-time MPEG-4 camera system
CN112085002A (en) Portrait segmentation method, portrait segmentation device, storage medium and electronic equipment
Yamada et al. Motion segmentation with census transform
Du et al. End-to-end Rain Streak Removal with RAW Images
CN113989323A (en) Moving target detection method and device and terminal equipment
Mittal et al. Deep Fence Estimation using Stereo Guidance and Adversarial Learning

Legal Events

Date Code Title Description
GD4A Issue of patent certificate for granted invention patent
MK4A Expiration of patent term of an invention patent