TW200803527A

TW200803527A - Method for determining a motion vector describing motion relative to a reference block and storage media thereof

Info

Publication number: TW200803527A
Application number: TW096122000A
Authority: TW
Inventors: Zahid Hussain
Original assignee: Via Tech Inc
Priority date: 2006-06-16
Filing date: 2007-06-15
Publication date: 2008-01-01
Also published as: CN101068353A; TW200821986A; TWI482117B; CN101083763A; TW200816082A; TW200816820A; CN101068353B; TW200803525A; CN101068365A; TWI348654B; CN101072351B; CN101068364B; CN101072351A; TWI383683B; TWI350109B; TWI444047B; CN101083764A; CN101083764B; CN101083763B; TWI395488B

Abstract

A method for determining a motion vector describing motion relative to a reference block, the method comprising: determining, according to a match criterion, which of a plurality of prediction blocks is a good match with the reference block; performing a local area exhaustive search to produce a best match with the reference block, the search being performed in an area centered around the good match prediction block, the best match having integer pixel resolution; modeling the degree of match between the best match and the reference block as a quadratic surface; analytically determining a minimum of the quadratic surface, the minimum corresponding to a best matching block with fractional resolution; and computing a fractional motion vector based on the best matching block with fractional resolution.

Description

IX. Description of the Invention

[Technical Field]

The present disclosure relates to a graphics processing unit, and more particularly to a graphics processing unit having image compression and decompression features.

[Prior Art]

Personal computers and consumer electronics are used for a variety of entertainment purposes. These fall roughly into two categories: those that use computer-generated graphics, such as computer games; and those that use compressed video streams, such as programs prerecorded on digital video disc (DVD), or digital programming delivered to a set-top box by cable or satellite operators. The second category also includes the encoding of analog video streams, for example as performed by a digital video recorder (DVR).

Computer graphics are usually generated by a graphics processing unit (GPU). A GPU is a special type of microprocessor found in computer game consoles and some personal computers. A GPU is optimized to rapidly render three-dimensional primitive objects such as triangles, quadrilaterals, and the like. These primitives are described by a number of vertices, where each vertex has attributes (for example, color), and textures can be applied to the primitives. The result of rendering is a two-dimensional array of pixels that is shown on a computer display or monitor.

Encoding and decoding of video streams involves different kinds of computation, for example discrete cosine transform, motion estimation, motion compensation, and deblocking filters. These computations are typically handled by a general-purpose central processing unit (CPU) in combination with special hardware logic such as an application-specific integrated circuit (ASIC). Consumers therefore need multiple computing platforms to meet their entertainment needs. A single computing platform that can handle both computer graphics and video encoding/decoding is therefore needed.

[Summary of the Invention]

One aspect of the present invention is a method for determining a motion vector describing motion relative to a reference block, the method comprising: determining, according to a match criterion, which of a plurality of prediction blocks is a good match with the reference block; performing a local area exhaustive search to produce a best match with the reference block, the search being performed in an area centered around the good match prediction block, the best match having integer pixel resolution; modeling the degree of match between the best match and the reference block as a quadratic surface; analytically determining a minimum of the quadratic surface, the minimum corresponding to a best matching block with fractional resolution; and computing a fractional motion vector based on the best matching block with fractional resolution.

Another aspect of the present invention is a method for determining a motion vector describing motion relative to a reference block, the method comprising: determining, according to a match criterion, which of a plurality of prediction blocks is a good match with the reference block; performing a local area exhaustive search to produce a best match with the reference block, the search being performed in an area centered around the good match prediction block, the best match having integer pixel resolution; and analytically determining a minimum of a quadratic surface that models the degree of match between the best match and the reference block, the minimum corresponding to a best matching block with fractional resolution.

Another aspect of the present invention is a computer-readable medium having a program for determining a motion vector, the program comprising logic configured to perform the steps of: determining, according to a match criterion, which of a plurality of prediction blocks is a good match with a reference block; performing a local area exhaustive search to produce a best match with the reference block, the search being performed in an area centered around the good match prediction block, the best match having integer pixel resolution; modeling the degree of match between the best match and the reference block as a quadratic surface; analytically determining a minimum of the quadratic surface, the minimum corresponding to a best matching block with fractional resolution; and computing a fractional motion vector based on the best matching block with fractional resolution.

[Embodiments]

The embodiments disclosed here provide systems and methods that use a graphics processing unit to improve motion estimation.

1. Computing platform for video encoding

Fig. 1 is a block diagram of an exemplary computing platform for graphics and for video encoding and/or decoding.
System 100 includes a general-purpose CPU 110 (hereinafter referred to as the host processor), a graphics processing unit (GPU) 120, memory 130, and bus 140. Graphics processing unit 120 includes a video acceleration unit (VPU) 150 that accelerates video encoding and/or decoding, as described later. The video acceleration functions of the graphics processing unit are provided as instructions that execute on graphics processing unit 120.

Software encoder 160 and video acceleration driver 170 reside in memory 130, and encoder 160 executes on host processor 110. Through an interface provided by video acceleration driver 170, encoder 160 can also issue video acceleration instructions to graphics processing unit 120. In this way, system 100 performs video encoding through host processor software that issues video acceleration instructions to graphics processing unit 120. Computationally intensive blocks that are executed frequently are thereby offloaded to graphics processing unit 120, while more complex operations are performed by host processor 110.

Fig. 1 omits several conventional components that are well known to those skilled in the art and are not needed to explain the video acceleration features of graphics processing unit 120. Video encoding is summarized next, followed by a discussion of how one video encoding component (the motion estimator) uses the video acceleration functions provided by graphics processing unit 120.

2. Video encoder

Fig. 2 is a functional block diagram of the video encoder of Fig. 1. The picture (205) input to encoder 160 is composed of pixels.
Encoder 160 exploits temporal and spatial similarities within pictures 205, and encodes these similarities by determining differences within a frame (spatial) and/or between frames (temporal). Spatial encoding exploits the fact that neighboring pixels within a picture are usually identical or related, so only the differences are encoded. Temporal encoding exploits the fact that many pixels in a series of pictures usually have the same values, so only the differences between pictures are encoded. Encoder 160 also exploits statistical redundancy through entropy coding: some patterns occur more often than others, so the more frequent ones are represented by shorter codes. Examples of entropy coding include Huffman coding, run-length encoding, arithmetic coding, and context-adaptive binary arithmetic coding.

In this exemplary embodiment, blocks of input picture 205 are provided to a subtractor 210 and to a motion estimator 220. Motion estimator 220 compares blocks in input picture 205 with a previously stored reference picture 230 to find similar blocks. Motion estimator 220 computes a set of motion vectors 245 that represent the displacement between matching blocks. The motion vectors 245, together with the matching blocks of the reference picture, are called the prediction block 255 and represent the temporal encoding. Prediction block 255 is provided to subtractor 210, which subtracts prediction block 255 from input picture 205 to produce a residual picture 260. Residual picture 260 is provided to a discrete cosine transform (DCT) block 270 and a quantizer 280, which perform spatial encoding. The output of quantizer 280 (for example, a set of quantized DCT coefficients) is encoded by entropy encoder 290.
For certain types of pictures (information or I-frames, and predicted or P-frames), the spatially encoded residual from quantizer 280 is provided to an internal decoder. The decoder uses the spatially encoded residual, in combination with the motion vectors 245 produced by motion estimator 220, to decode the spatially encoded picture 205. The reconstructed picture is stored in reference picture buffer 295, which is provided to motion estimator 220 as described above.

As discussed in connection with Fig. 1, encoder 160 executes on host processor 110 but also uses the video acceleration instructions provided by graphics processing unit 120. In particular, the algorithm implemented by motion estimator 220 uses the sum-of-absolute-differences (SAD) instruction provided by graphics processing unit 120 to achieve accurate motion estimation at relatively low computational cost. The motion estimation algorithm is described in detail next.

3. Motion estimation algorithm

a. Search window

As shown in Figs. 3A and 3B, motion estimator 220 divides the current picture 205 into non-overlapping sections called macroblocks. The size of a macroblock varies with the standard used by the encoder (for example, MPEG-2, H.264, VC) and with the size of the picture. In the exemplary embodiment described here, and in various coding standards, a macroblock is 16x16 pixels. A macroblock is further divided into blocks, whose size can be 4x4, 8x8, 4x8, 16x8, or 8x16.

In MPEG-2, each macroblock can have only one motion vector, so motion estimation is performed per macroblock. H.264 allows up to 32 motion vectors (depending on the level).
Therefore, in H.264, motion estimation is computed on a 4x4 or 8x8 block basis. In a variant of H.264 called AVS, the motion block is always 8x8. In VC, it can be 4x4 or 8x8.

Motion estimation algorithm 220 performs motion estimation on each macroblock in current picture 205, with the goal of finding a block in a previously coded picture 230 that is similar to the macroblock of current picture 205. The displacement between the macroblock in reference picture 230 and the macroblock in current picture 205 is computed and stored as a motion vector (245, Fig. 2).

For ease of explanation, the motion estimation process is described for one particular macroblock (320) in current picture 310. The macroblock 320 chosen for this example is in the middle of current picture 310, but the same technique applies to other macroblocks. A search window (330) is centered on the macroblock in reference picture 230 that corresponds to macroblock 320 of current picture 310. That is, if macroblock 320 is located at (X, Y), then search window 330 in reference picture 230 is also located at (X, Y), as shown at 340. Other embodiments place the macroblock in other parts of reference picture 230, such as the upper left.

The search window 330 in the example of Figs. 3A and 3B extends two pixels beyond the corresponding macroblock in the horizontal direction and one pixel in the vertical direction. Search window 330 therefore contains 14 different macroblocks: two macroblocks offset by 1 and 2 pixels directly to the left of position 340; another set of two macroblocks to the right of the position; and the remaining sets above, below, to the upper left, upper right, lower left, and lower right of position 340.
The block matching operation performed by motion estimator 220 uses the sum of absolute differences (SAD) as the criterion for judging similarity (a match) between macroblocks. As understood by those skilled in the art, SAD takes the absolute value of the difference between two pixel values and sums these absolute differences over all pixels in a block. Motion estimator 220 combines the SAD criterion with an inventive method of selecting the candidate macroblocks to be tested for similarity, which is described below.

b. Selecting candidate macroblocks

Motion estimator 220 uses different search methods depending on whether it is generating intra-coded or inter-coded motion vectors for current picture 205. Motion estimator 220 uses prior knowledge about real-world motion to predict where in search window 330 the matching macroblock should be, which reduces the number of candidate blocks in search window 330 that are actually tested for similarity against the macroblock in current picture 205. In the real world, objects typically move with constant acceleration, so the motion of objects across frames (the optical flow) can be expected to be smooth and similar (that is, substantially continuous) both spatially and temporally. In addition, the SAD surface (that is, the plot of SAD values over the search space) is expected to be relatively smooth, with a relatively small number of local minima.

By using this prior knowledge to direct the search to where a match is most likely to be found, the algorithm disclosed here minimizes the number of searches that must be performed to find a good minimum. As a result, the algorithm is computationally efficient and is also effective at locating a good match.
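The SAD criterion described above can be sketched in a few lines of Python. This is a minimal illustration, not the GPU instruction itself: the function names `sad`, `extract_block`, and `best_integer_match` are hypothetical, pictures are assumed to be plain lists of rows of integer pixel values, and the exhaustive loop stands in for whatever candidate selection the encoder actually uses.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized pixel blocks.

    Each block is a list of rows; each row is a list of integer pixel values.
    """
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same number of rows")
    total = 0
    for row_a, row_b in zip(block_a, block_b):
        if len(row_a) != len(row_b):
            raise ValueError("rows must have the same length")
        total += sum(abs(a - b) for a, b in zip(row_a, row_b))
    return total


def extract_block(picture, x, y, width, height):
    """Copy the width x height block whose top-left pixel is (x, y)."""
    return [row[x:x + width] for row in picture[y:y + height]]


def best_integer_match(current_block, reference, x0, y0, range_x, range_y):
    """Test every integer offset within +/-range_x, +/-range_y of (x0, y0).

    Returns (dx, dy, sad_value) for the offset with the lowest SAD.
    Offsets that would fall outside the reference picture are skipped.
    """
    height = len(current_block)
    width = len(current_block[0])
    best = None
    for dy in range(-range_y, range_y + 1):
        for dx in range(-range_x, range_x + 1):
            x, y = x0 + dx, y0 + dy
            if x < 0 or y < 0:
                continue
            if y + height > len(reference) or x + width > len(reference[0]):
                continue
            candidate = extract_block(reference, x, y, width, height)
            score = sad(current_block, candidate)
            if best is None or score < best[2]:
                best = (dx, dy, score)
    return best
```

Calling `best_integer_match(block, reference, X, Y, 2, 1)` mirrors the example search window, which extends two pixels horizontally and one pixel vertically around position (X, Y).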
Fig. 4 is a flowchart of the algorithm used by an exemplary embodiment of motion estimator 220 to compute the motion vector for current macroblock 310 in current picture 205. The motion estimation process begins at step 410, which determines whether the motion vectors generated by motion estimator 220 for current picture 205 will be inter-predicted or intra-predicted. If intra prediction is used, processing continues at step 420, where a conjugate gradient descent search algorithm is performed to find a prediction macroblock within search window 320 that is a good match with the reference macroblock (current macroblock 310 in current picture 205). The conjugate gradient descent search algorithm (step 420) is described in detail in connection with Figs. 5 and 6.

Returning to step 410, if inter prediction is used to generate the motion vectors, step 430 is performed, in which a "neighbor" or "neighborhood" search is carried out. This search includes macroblocks adjacent to current macroblock 310 in current picture 205, together with the corresponding macroblock in the previously coded reference picture 230. The neighbor search algorithm (step 430) is described in detail in connection with Figs. 7 and 8.

The conjugate gradient descent search algorithm (step 420) and the neighbor search algorithm (step 430) each identify a good, or acceptable, match from a larger set of candidate prediction macroblocks. Those skilled in the art will appreciate that the criterion used to decide what constitutes a "good match" can be relative or absolute. For example, the neighbor search algorithm described here uses an absolute criterion: the candidate macroblock with the lowest score is treated as the good match. The conjugate gradient descent search algorithm described here, however, uses a threshold on the SAD value.
The first block whose SAD value is below the threshold is treated as the good match. The choice of threshold, however, is a design or implementation decision.

After step 420 or 430 has been processed, a good candidate match has been identified. Step 440 then performs a local exhaustive search to find the best candidate. The search area is in the vicinity of the good candidate macroblock identified at step 420 or 430. In some embodiments, after the conjugate gradient descent search algorithm of step 420 has been performed (that is, in the intra prediction case), the area covered by the local exhaustive search includes the four diagonals just outside the local minimum (the good candidate) identified by that step. For example, the step value used in the last step of the gradient descent determines how far from the good candidate the searched points may lie. In some embodiments, after step 430 has been performed (that is, in the inter prediction case), the local exhaustive search (step 440) searches candidates contained in a small area around the good candidate macroblock. The local exhaustive search of step 440 narrows from a good candidate macroblock to a best candidate macroblock that is pixel-aligned, that is, has integer pixel resolution.

Steps 450 and 460 find a best candidate macroblock aligned on a fractional-pixel boundary. Conventional fractional motion search algorithms use codec-specific filtering algorithms to interpolate pixel values at fractional positions from the surrounding integer positions. In contrast, step 450 models the degree of match between the best candidate macroblock and the reference macroblock as a quadratic surface, and step 460 analytically determines the minimum of that surface. The minimum corresponds to a best matching macroblock at fractional rather than integer resolution.
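The idea behind steps 450 and 460 can be illustrated with a small Python sketch. It rests on assumptions that are not taken from the patent: the fit is done one axis at a time through three SAD samples (at offsets -1, 0, and +1 from the best integer match), and the names `parabola_min_offset` and `fractional_motion_vector`, as well as the coefficient names `c1`, `c2`, and `c3`, are hypothetical. The patent's own formulation of the quadratic surface may differ.

```python
def parabola_min_offset(s_minus, s_zero, s_plus):
    """Fractional offset of the minimum of the parabola through three samples.

    The samples are SAD values at integer positions -1, 0 and +1 relative to
    the best integer match. Fitting y = c1 + c2*t + c3*t**2 gives
        c2 = (s_plus - s_minus) / 2
        c3 = (s_plus + s_minus - 2*s_zero) / 2
    and setting dy/dt = c2 + 2*c3*t to zero yields t = -c2 / (2*c3).
    """
    c2 = (s_plus - s_minus) / 2.0
    c3 = (s_plus + s_minus - 2.0 * s_zero) / 2.0
    if c3 <= 0:
        # Flat or non-convex fit: no interior minimum, keep the integer position.
        return 0.0
    t = -c2 / (2.0 * c3)
    # Clamp to the sampled interval so a noisy fit cannot jump past a neighbour.
    return max(-1.0, min(1.0, t))


def fractional_motion_vector(int_mv, sad_x, sad_y):
    """Refine an integer motion vector one axis at a time.

    int_mv -- (dx, dy) integer-pixel vector from the local exhaustive search
    sad_x  -- SAD values at horizontal offsets (-1, 0, +1) around the match
    sad_y  -- SAD values at vertical offsets (-1, 0, +1) around the match
    """
    frac_x = parabola_min_offset(*sad_x)
    frac_y = parabola_min_offset(*sad_y)
    return (int_mv[0] + frac_x, int_mv[1] + frac_y)
```

The point of the sketch is that no interpolated pixel values are needed: the sub-pixel position follows directly from SAD values that were already computed at integer positions.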
(This novel way of determining the best match with fractional resolution by modeling is explained in more detail in a later section.) Once the matching macroblock with fractional resolution has been identified (steps 450 and 460), processing continues with a step that computes a fractional motion vector from that macroblock, using techniques known to those skilled in the art. The procedure of Fig. 4 is then complete.

Those skilled in the art will appreciate that the algorithm above is inherently sequential, because it uses information from neighboring areas. Although hardware-accelerated designs usually avoid sequential algorithms, a sequential design is appropriate here for several reasons. First, pixel data is fetched in sequential raster fashion, so it can be fetched in advance and held in a buffer. Second, in an embodiment that contains a single SAD acceleration unit, performance is limited by whether that unit can be kept fully loaded rather than by the sequential processing. The SAD acceleration unit stays loaded as long as fetches of prediction blocks do not incur many cache misses. The miss rate is a function of cache size, and for the image resolutions and motion vector ranges considered here a low cache miss rate is expected.

c. Intra-image prediction using conjugate gradient descent

Fig. 5 is a flowchart of the conjugate gradient descent step 420 of Fig. 4, as performed by an embodiment of motion estimator 220. As described earlier, step 420 is performed when it has been determined that intra-image prediction will be used, and it finds a macroblock in search window 320 that is a good (acceptable) match with the current reference macroblock.
SAD values are computed for an initial group of five candidates: the current macroblock and the macroblocks above, below, to the left of and to the right of it. From these five initial SAD values, two mutually orthogonal error gradients are computed, and from those two gradients the direction of steepest gradient is determined. If the gradient is shallow, or if the five initial candidates have very similar SAD values, the search is extended farther from the current macroblock, because under these conditions a local minimum is unlikely to lie in this area. This is explained in more detail after the following overview of conjugate gradient descent step 420.

The process starts at step 505, which initializes a candidate macroblock $C$ and the step values $\Delta_x$ and $\Delta_y$. In one embodiment, candidate macroblock $C$ is at the upper left corner of search window 320, and the step values are set to a small integer such as 8. At step 510, the coordinates of the four macroblocks surrounding the candidate macroblock are computed. These four macroblocks lie above, below, to the left of and to the right of $C$: $C_T = (C_x,\ C_y - \Delta_y)$, $C_B = (C_x,\ C_y + \Delta_y)$, $C_L = (C_x - \Delta_x,\ C_y)$ and $C_R = (C_x + \Delta_x,\ C_y)$. At step 515, SAD values are computed for the candidate macroblock and the four surrounding macroblocks. At step 520, gradients $g_x$ and $g_y$ are computed: $g_x$ is the difference between the SAD values of the left and right macroblocks, and $g_y$ is the difference between the SAD values of the top and bottom macroblocks. The gradients thus indicate whether the error between matching macroblocks is increasing or decreasing in the x or y direction. At step 525, the gradients are compared with a threshold.
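The gradient computation of steps 515 to 525 can be sketched as follows. This is only an illustration: the sign convention (right minus left, bottom minus top), the use of a single threshold for both components and the function names are assumptions, not details given in the text.

```python
def sad_gradients(sad_left, sad_right, sad_top, sad_bottom):
    """Error gradients from the SAD values of the four surrounding macroblocks.

    Assumed convention: a positive g_x means the error grows to the right,
    a positive g_y means the error grows downward.
    """
    g_x = sad_right - sad_left
    g_y = sad_bottom - sad_top
    return g_x, g_y


def is_shallow(g_x, g_y, threshold):
    """True when both gradient magnitudes are below the threshold (step 525)."""
    return abs(g_x) < threshold and abs(g_y) < threshold
```

For instance, SAD values of 120 (left) and 80 (right) give `g_x = -40`, suggesting that the error decreases toward the right.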
If the gradients are below the threshold (that is, the gradient is relatively shallow), the local minimum is not in the current search area, so the search is extended to new candidate macroblocks. These new candidates are farther from the original candidate macroblock $C$. In some embodiments, the search is also extended when the SAD values computed for the candidate macroblocks in step 515 are similar. The extended search continues at step 530, where the coordinates of four new candidate macroblocks are computed. Whereas the original four candidates lie above, below, left and right of $C$ at distance $(\Delta_x, \Delta_y)$, the four new candidates are chosen to form the corners of a square around the original candidate $C$ at distance $(\Delta_x, \Delta_y)$:

$$C_{TL} = (C_x - \Delta_x,\ C_y - \Delta_y), \quad C_{TR} = (C_x + \Delta_x,\ C_y - \Delta_y), \quad C_{BL} = (C_x - \Delta_x,\ C_y + \Delta_y), \quad C_{BR} = (C_x + \Delta_x,\ C_y + \Delta_y)$$

At step 535, conjugate gradient descent step 420 is performed on these new candidate macroblocks ($C_{TL}$, $C_{TR}$, $C_{BL}$, $C_{BR}$).

Returning to the gradient comparison of step 525: if the gradients are at or above the threshold (that is, the gradient is relatively steep), step 540 compares the SAD values computed in step 515 with a second threshold. If a SAD value is below that threshold, a good match has been found, and step 420 returns to its caller (at step 545), supplying the candidate macroblock with the lowest SAD value. If none of the SAD values tested in step 540 is below the threshold, no good match has been found, and the search continues. At step 550, a new center candidate macroblock $C$ is selected: the macroblock in the current candidate group with the lowest SAD value computed in step 515. Then, in step 555, new step values are computed from the gradients.
For example, when the new step values $\Delta_x$ and $\Delta_y$ are computed, a steep gradient indicates that an acceptable matching macroblock is far from the current center candidate, so $(\Delta_x, \Delta_y)$ should be increased. Conversely, a shallow gradient indicates that an acceptable matching macroblock is nearby, so $(\Delta_x, \Delta_y)$ should be decreased. Those skilled in the art will appreciate that various scaling factors can be used to compute the step values from the gradients.

Next, the iteration count is tested in step 560. If the count exceeds a maximum, step 420 finishes at step 565 without having found an acceptable match. Otherwise, the error gradients have been used to select a new set of candidate macroblocks that is expected to be closer to the final match, and gradient descent step 420 returns to step 510, where the coordinates of a new set of surrounding macroblocks are computed. Conjugate gradient descent step 420 thus ends in one of two cases: when an acceptable match is found (step 545), or when the maximum number of iterations is reached without a match (step 565).

Fig. 6 illustrates an exemplary scenario using conjugate gradient descent step 420. The initial candidate macroblock $C$ is shown as a square, with four surrounding candidates shown as circles (610T, 610L, 610R, 610B). From these initial candidates, gradients $g_x$ and $g_y$ are computed (620X, 620Y). In this scenario the gradients are too shallow, and no SAD value is below the threshold. The search is therefore extended using four new center candidate macroblocks, shown as triangles (630TL, 630TR, 630BL, 630BR). These new candidates lie at the corners of a square around the original candidate macroblock, at distance $\Delta$.
The macroblocks surrounding these center candidates are shown as hexagons (640). In this exemplary scenario, two of the candidates 640 have a SAD value below the threshold and a "steep" gradient (650XY, 660XY). Another candidate is selected according to each steep gradient: candidate 670 according to gradient 650XY, and candidate 680 according to gradient 660XY. The gradient descent search continues with these new candidates 670 and 680, following conjugate gradient descent step 420.

d. Inter-image motion prediction using previous neighbors

Fig. 7 is a flowchart of the neighboring search algorithm (step 430) of Fig. 4, as performed by an embodiment of motion estimator 220. As described above, the candidate macroblocks for this search include macroblocks that are adjacent to the current macroblock 310 and have already been encoded in the current image 205. A corresponding macroblock in the previously encoded reference image 230 is also included as a candidate.

The computation of candidate macroblock coordinates begins at step 710, where a flag variable TOPVALID is computed by integer-dividing the address of the current macroblock 310 by the number of macroblocks per line. If the quotient (the row index) is not 0, TOPVALID is True; otherwise TOPVALID is False. At step 720, a flag variable LEFTVALID is computed by taking the address of the current macroblock 310 modulo the number of macroblocks per line. If the remainder (the column index) is not 0, LEFTVALID is True; otherwise LEFTVALID is False. TOPVALID and LEFTVALID thus indicate whether the current macroblock 310 has a neighboring macroblock above it and to its left, respectively, taking the top and left edges of the image into account.
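Steps 710 and 720 can be sketched as below, assuming macroblock addresses are assigned in raster order starting from 0 at the upper left of the image. The function and parameter names are hypothetical.

```python
def neighbor_flags(mb_addr, mbs_per_line):
    """Return (TOPVALID, LEFTVALID) for a raster-order macroblock address."""
    top_valid = (mb_addr // mbs_per_line) != 0   # not in the first row
    left_valid = (mb_addr % mbs_per_line) != 0   # not in the first column
    return top_valid, left_valid
```

For a 1920-pixel-wide picture with 16x16 macroblocks (120 macroblocks per line), address 0 has neither neighbor, while address 125 has both.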
In step 730, TOPVALID and LEFTVALID are combined to determine the availability, or existence, of four candidate macroblocks adjacent to the current macroblock 310. Specifically: macroblock L, to the left, is available if LEFTVALID; macroblock T, above, is available if TOPVALID; macroblock TL, to the upper left, is available if (TOPVALID && LEFTVALID); and macroblock TR, to the upper right, is available if (TOPVALID && RIGHTVALID). Next, in step 740, availability is determined for a previous candidate macroblock P, the macroblock in the previously encoded reference image 230 that corresponds spatially to the current macroblock 310. The relative positions of the five candidate macroblocks can be seen in Fig. 8, where L is 810, T is 820, TL is 830, TR is 840 and P is 850.

Returning to Fig. 7, steps 730 and 740 determine how many candidate macroblocks are available for comparison (from 1 to 5). Step 750 computes a SAD value for each available candidate macroblock. If all five candidates are available, the set of SAD values covers L, T, TL, TR and P. Those skilled in the art will appreciate that if some candidates are unavailable, the set is correspondingly smaller. Step 430 then completes, returning the candidate macroblock with the lowest SAD value.

As discussed earlier in conjunction with Fig. 4, once a good matching macroblock has been found (whether by the neighboring search of Fig. 7 or the conjugate gradient descent of Fig. 5), the search is narrowed further using a local exhaustive search (step 440 in Fig. 4). After the local exhaustive search, a fractional motion vector is computed from its result. The computation of the fractional motion vector is described in detail below.

e. Fractional motion vectors using a quadratic surface model
Those skilled in the art will be familiar with plotting the degree of match between a macroblock and the search window to produce an "error surface". In a novel approach, motion estimator 220 models the error surface as a quadratic surface and analytically determines the minimum of that surface with sub-pixel accuracy. Motion estimator 220 first determines the minimum in one direction and then determines the minimum in the orthogonal direction. The general equation of a quadratic curve is given by Equation 1:

$$y = C_1 + C_2 t + C_3 t^2 \qquad \text{(Equation 1)}$$

Differentiating the curve and setting the derivative to zero gives Equation 2:

$$\frac{dy}{dt} = C_2 + 2 C_3 t = 0 \;\Rightarrow\; t = -\frac{C_2}{2 C_3} \qquad \text{(Equation 2)}$$

Once the coefficients $C_1$, $C_2$, $C_3$ are known, Equation 2 can be solved to determine $t$, the position of the minimum. Motion estimator 220 solves Equation 3 to determine the coefficients $C_1$, $C_2$, $C_3$:

$$\begin{bmatrix} C_1 \\ C_2 \\ C_3 \end{bmatrix} = \frac{1}{4} \begin{bmatrix} 31 & -27 & 5 \\ -27 & 25.8 & -5 \\ 5 & -5 & 1 \end{bmatrix} \begin{bmatrix} \sum_i d_i \\ \sum_i i\, d_i \\ \sum_i i^2 d_i \end{bmatrix} \qquad \text{(Equation 3)}$$

Motion estimator 220 uses the 8x4 SAD instruction provided by GPU 120 to compute the sums in Equation 3 efficiently. In Equation 3, each $d_i$ is the SAD value of one of a run of adjacent macroblocks in the x direction. As described later in conjunction with Fig. 9, a single 8x4 SAD instruction efficiently computes the four SAD values of the adjacent blocks (x, y), (x+1, y), (x+2, y), (x+3, y), that is, for i = 0 … 3. As noted above, once the coefficients are known, Equation 2 determines the minimum $t$ in the x direction.

Equation 3 can likewise be used to determine the minimum $t$ in the vertical direction. In this case, motion estimator 220 again relies on the 8x4 SAD instruction.
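The sketch below works through Equations 1 to 3 with exact fractions. It rests on an assumption that should be stated plainly: the matrix entries of Equation 3 are consistent with a least-squares fit of Equation 1 to four samples taken at t = 1, 2, 3, 4, so the code uses those sample positions, and the middle entry 25.8/4 is written as 129/20. The names `EQ3`, `fit_quadratic` and `minimum_position` are illustrative only.

```python
from fractions import Fraction

# Matrix of Equation 3, with the 1/4 factor folded in.
# Assumption: it is the inverse of A^T A for samples at t = 1, 2, 3, 4.
EQ3 = [
    [Fraction(31, 4), Fraction(-27, 4), Fraction(5, 4)],
    [Fraction(-27, 4), Fraction(129, 20), Fraction(-5, 4)],
    [Fraction(5, 4), Fraction(-5, 4), Fraction(1, 4)],
]
T = (1, 2, 3, 4)  # assumed sample positions of the four SAD values


def normal_matrix():
    """A^T A for the model y = c1 + c2*t + c3*t^2 sampled at T."""
    return [[sum(t ** (r + c) for t in T) for c in range(3)] for r in range(3)]


def fit_quadratic(d):
    """Coefficients [c1, c2, c3] from four SAD values d, as in Equation 3."""
    sums = [sum(t ** k * di for t, di in zip(T, d)) for k in range(3)]
    return [sum(EQ3[r][k] * sums[k] for k in range(3)) for r in range(3)]


def minimum_position(c2, c3):
    """Position of the minimum of the fitted curve, as in Equation 2."""
    return -c2 / (2 * c3)
```

Feeding in four values sampled from (t - 2.3)^2 + 5 recovers a minimum at t = 2.3, between the second and third integer positions, which is the kind of sub-pixel result the fractional motion vector is built from.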
The instruction efficiently computes the four SAD values of the adjacent blocks (x, y), (x+1, y), (x+2, y), (x+3, y). Equation 3 is solved using these SAD values to compute the coefficients as described above; with the coefficients known, Equation 2 is solved to yield the minimum $t$ in the y direction.

The quadratic error surface method used by motion estimator 220 is an improvement over conventional methods, which, after finding a good match on a pixel boundary, apply an expensive filter to refine the match at sub-pixel boundaries.

SAD computation accelerator

As described above, motion estimator 220 determines which macroblock in an image is a good match for a reference macroblock in the current image. Motion estimator 220 uses the SAD hardware acceleration provided by GPU 120, which is exposed as a GPU instruction. The SAD instruction takes a 4x4 reference block and an 8x4 prediction block as inputs and produces four SAD values. The sizes of the reference block and prediction block can be changed as needed; the 4x4 reference block and 8x4 prediction block are merely examples used to illustrate the invention and do not limit those sizes.

Fig. 9 is a block diagram showing how the SAD instruction operates on the reference and prediction blocks. As shown in Fig. 9, the 8x4 prediction block consists of several horizontally adjacent, mutually overlapping 4x4 blocks, such as blocks 910, 920, 930 and 940. The SAD acceleration unit takes an input 4x4 reference block 950 and computes the SAD value between that reference block and each of blocks 910-940.
That is, the SAD instruction computes four values: the sum of the absolute values of the differences between block 910 and block 950, the same sum for block 920 and block 950, for block 930 and block 950, and for block 940 and block 950.

Referring to Fig. 9B, the SAD acceleration in GPU 120 uses four SAD computation units (960, 970, 980, 990) to implement the SAD instruction. The leftmost 4x4 block 910 is supplied to SAD computation unit 960. The next 4x4 block to the right (920) is supplied to SAD computation unit 970, the next one (930) to SAD computation unit 980, and the rightmost 4x4 block 940 to SAD computation unit 990. GPU 120 uses the independent SAD computation units in parallel, so the SAD instruction produces four SAD values per cycle. Those skilled in the art will be familiar with the algorithm for computing the SAD of two blocks of the same size and with hardware designs that perform it, so these details are not described further.

The 4x4 reference block is aligned on pixel boundaries both horizontally and vertically. The 4x4 prediction blocks 910-940, however, need not be vertically aligned. In one embodiment, the data is aligned by rotating the reference block (logic circuit 995).
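A software model of the four-result SAD instruction might look like the sketch below. It assumes that the overlapping 4x4 windows 910, 920, 930 and 940 start at columns 0, 1, 2 and 3 of the 8x4 prediction block, which matches the adjacent positions (x, y) to (x+3, y) mentioned earlier but is not stated explicitly; the function name and the list-of-rows layout are likewise illustrative.

```python
def sad_instruction(ref4x4, pred8x4):
    """Model one SAD instruction: four SAD values for one 4x4 reference block.

    ref4x4  -- 4 rows of 4 pixel values (reference block 950)
    pred8x4 -- 4 rows of 8 pixel values (prediction block, 8 wide by 4 high)
    """
    def window_sad(x0):
        # SAD between the reference block and the 4x4 window starting at column x0.
        return sum(
            abs(ref4x4[r][c] - pred8x4[r][x0 + c])
            for r in range(4)
            for c in range(4)
        )

    # Assumption: windows 910, 920, 930, 940 start at columns 0, 1, 2, 3.
    return [window_sad(x0) for x0 in range(4)]
```

In hardware the four window sums are produced in parallel by units 960-990; the list comprehension here simply computes them one after another.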
Rotating the reference block, rather than rotating each of the four prediction blocks separately, saves logic gates. The rotated reference block is supplied to each of the independent SAD acceleration units. Each unit produces a 12-bit value, and these values are combined into a single output. In one embodiment, the values are ordered according to the texture coordinates of the prediction blocks (lowest coordinate in the lowest bit positions).

The following code shows that the SAD value of an 8x8 block, that is, of two adjacent 8x4 blocks, can be computed using only four SAD instructions. Temporary registers T1, T2, T3 and T4 hold the four SAD values, and the variable sadS accumulates them. The address of the 8x4 reference block is assumed to be in refReg. U and V are the texture coordinates of the 8x8 prediction block. The code produces the total SAD value for the entire 8x8 block, stored in sadS.

    SAD T1, refReg, U, V        ; left-top of 8x8 prediction block
    SAD T2, refReg, U+4, V      ; right-top of 8x8 prediction block
    ADD sadS, T1, T2
    SAD T3, refReg, U, V+4      ; left-bottom of 8x8 prediction block
    ADD sadS, sadS, T3
    SAD T4, refReg, U+4, V+4    ; right-bottom of 8x8 prediction block
    ADD sadS, sadS, T4

However, computing and summing the values of all four sub-blocks can often be avoided, because the computation can stop as soon as the running sum reaches the current minimum. The following pseudocode shows how the SAD instruction can be used in a loop that stops when the sum reaches the current minimum.

    I := 0; SUM := 0; MIN := currentMIN;
    WHILE (I < 4 && SUM < MIN)
        SUM := SUM + SAD(refReg, U + (I%2)*4, V + (I>>1)*4);
        I := I + 1;
    IF (SUM < MIN) currentMIN := SUM;
    Go to next search point;

The 8x4 SAD instruction in GPU 120 is used directly by the advanced search algorithms of motion estimator 220, for example when performing the local exhaustive search described in connection with Fig. 5. In addition, texture cache 1060 (Fig. 10) is block-aligned, whereas the algorithms used by motion estimator 220 are, as described above, pixel-aligned.
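The early-termination loop given in pseudocode earlier in this section can be sketched in Python as follows. The callback `quadrant_sad`, which stands in for one SAD instruction on a 4-pixel-wide quadrant, and the returned iteration count are assumptions added for illustration.

```python
def quadrant_offset(i):
    """Offset (du, dv) of quadrant i inside the 8x8 block, as in U+(I%2)*4, V+(I>>1)*4."""
    return (i % 2) * 4, (i >> 1) * 4


def sad_8x8_early_exit(quadrant_sad, current_min):
    """Accumulate quadrant SADs, stopping once the sum reaches current_min.

    quadrant_sad(du, dv) is a hypothetical callback returning the SAD of the
    quadrant at offset (du, dv). Returns (running_sum, quadrants_evaluated).
    """
    total = 0
    i = 0
    while i < 4 and total < current_min:
        du, dv = quadrant_offset(i)
        total += quadrant_sad(du, dv)
        i += 1
    return total, i
```

A caller would update its current minimum only when all four quadrants were evaluated and the returned sum is still below it; a candidate that is abandoned early cannot be the new best match.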
Although multiplexer units could be added to GPU 120 to handle this alignment difference, doing so would increase gate count and power consumption. Instead, GPU 120 spends that budget on four SAD units rather than just one. In some embodiments, the 8x4 SAD instruction provides the advantage of efficiently computing the minimum, which involves computing the SAD values of neighboring blocks. In some embodiments, the 8x4 SAD instruction provides a further advantage for the exhaustive search (step 440), which computes the SAD value of each diagonal when the step value is 1.

4. Graphics processor

Having discussed the software algorithm implemented by motion estimator 220 and that algorithm's use of the 8x4 SAD instruction in GPU 120, the SAD instruction and GPU 120 are now described in more detail.

a. Graphics processing unit flow

Fig. 10 is a data flow diagram of GPU 120, in which the instruction flow is shown by the arrows on the left of Fig. 10 and the image or graphics flow by the arrows on the right. Fig. 10 omits several elements, familiar to those skilled in the art, that are not necessary for explaining the in-loop deblocking features of GPU 120. An instruction stream processor 1010 receives an instruction 1020 over a system bus (not shown) and decodes it, producing command data 1030 such as vertex data.
GPU 120 supports conventional graphics processing instructions as well as instructions that accelerate video encoding and/or decoding, such as the 8x4 SAD instruction described above.

Conventional graphics processing instructions involve tasks such as vertex shading, geometry shading and pixel shading. Command data 1030 is therefore supplied to a pool 1040 of shader execution units. The shader execution units use a texture filter unit (TFU) 1050 as needed, for example to apply a texture to a pixel. Texture data is cached in texture cache 1060, which is backed by main memory (not shown).

Some instructions are sent to video processing unit 1100, whose operation is described later. The resulting data is then processed by post-packer 1070, which compresses the data. After post-processing, the data produced by the video acceleration unit is supplied to execution unit pool 1040.

Execution of video encoding/decoding acceleration instructions, such as the SAD instruction described above, differs from that of conventional graphics instructions in several respects. First, video acceleration instructions are executed by video processing unit 1100 rather than by the shader execution units. Second, video acceleration instructions do not use texture data as such. However, the image data used by video acceleration instructions and the texture data used by graphics instructions are both two-dimensional arrays. GPU 120 takes advantage of this similarity by using texture filter unit 1050 to load image data for video processing unit 1100, which allows texture cache 1060 to cache some of the image data on which video processing unit 1100 operates. For this reason, as shown in Fig. 10, video processing unit 1100 is located between texture filter unit 1050 and post-packer 1070.

Texture filter unit 1050 examines the command data 1030 extracted from instruction 1020. Command data 1030 also supplies texture filter unit 1050 with the coordinates of the desired image data in main memory (not shown). In one embodiment these coordinates are specified as U,V pairs, which will be familiar to those skilled in the art. When instruction 1020 is a video acceleration instruction, the extracted command data 1030 further instructs texture filter unit 1050 to bypass any texture filters (not shown) within it.

Texture filter unit 1050 is thus controlled by video acceleration instructions to load image data for video processing unit 1100; in this way, the texture filter unit is leveraged to serve the video acceleration path. Video processing unit 1100 receives image data from texture filter unit 1050 on the data path and command data 1030 on the command path, and performs an operation on the image data according to command data 1030.
The image data output by video processing unit 1100 is fed back to execution unit pool 1040 after being processed by post-packer 1070.

b. Instruction parameters

The operation of video processing unit 1100 when executing the SAD video acceleration instruction is now described. As stated earlier, each GPU instruction is decoded and parsed into command data 1030, which can be viewed as a set of parameters specific to each instruction. The parameters of the SAD instruction are shown in Table 1.

Table 1: Parameters of the GPU SAD instruction

Input/Output | Name | Size | Description
Input | FieldFlag | 1 bit | If FieldFlag == 1 then Field Picture, else Frame Picture
Input | TopFieldFlag | 1 bit | If TopFieldFlag == 1 then Top-Field-Picture, else Bottom-Field-Picture (if FieldFlag is set)
Input | PictureWidth | 16 bits | For example, 1920 for HDTV
Input | PictureHeight | 16 bits | For example, 1080 for 30P HDTV
Input | BaseAddress | 32 bits, unsigned | Base address of the prediction picture
Input | BlockAddress | U: 16 bits, signed; V: 16 bits, signed | Texture coordinates of the prediction picture (relative to the base address), in the SRC1 opcode: SRC1[0:15] = U, SRC1[31:16] = V. U and V are in 13.3 format; the fractional part is ignored
Input | RefBlock | 128 bits | Reference picture data, in the SRC2 opcode
Output | Destination Operand | 4 x 16 bits | Least significant 32 bits of the 128-bit register, in the DST opcode

Several input parameters are used together to determine the address of the 4x4 block fetched by texture filter unit 1050. The BaseAddress parameter indicates the start of the texture data in the texture cache. The coordinates of the upper left block within this region are given by the BlockAddress parameter. The PictureHeight and PictureWidth input parameters are used to determine the extent of the block, that is, its lower left coordinate. Finally, the video image can be progressive or interlaced. If interlaced, it consists of two fields (top and bottom). Texture filter unit 1050 uses FieldFlag and TopFieldFlag to handle interlaced images properly.

c. Image data conversion

To execute the SAD instruction, video processing unit 1100 fetches input pixel blocks from texture filter unit 1050 and converts them into a format suitable for processing by SAD acceleration units 960-990. The pixel blocks are then supplied to SAD acceleration units 960-990, which return SAD values. Each SAD value is then accumulated into the destination register. These functions are described in detail below.

Video processing unit 1100 receives two input parameters that define the 8x4 blocks for which the SAD values are computed. The data of the reference block is specified directly by the SRC2 opcode: the 8x4x8-bit block is treated as 128 bits of data. In contrast, the SRC1 opcode specifies the address of the prediction block rather than its data. Video processing unit 1100 supplies this address to texture filter unit 1050, which fetches the 128 bits of prediction block data from texture cache 1060.
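As an illustration of how the BlockAddress field of Table 1 could be decoded, the sketch below unpacks U and V from a 32-bit SRC1 value and drops the three fractional bits of the 13.3 format. Treating "ignored" as an arithmetic shift (which rounds negative coordinates toward minus infinity) is an assumption, as are the function names.

```python
def _to_signed16(value):
    """Interpret a 16-bit field as a two's-complement signed integer."""
    return value - 0x10000 if value & 0x8000 else value


def unpack_block_address(src1):
    """Return integer (U, V) from SRC1, where SRC1[0:15] = U and SRC1[31:16] = V.

    U and V are signed 13.3 fixed-point values; the fractional part is dropped.
    """
    u_raw = _to_signed16(src1 & 0xFFFF)
    v_raw = _to_signed16((src1 >> 16) & 0xFFFF)
    # Assumption: discard the 3 fractional bits with an arithmetic shift.
    return u_raw >> 3, v_raw >> 3
```

For example, a U field of 43 (binary 101.011, that is 5.375) and a V field of 16 (2.0) decode to the integer coordinates (5, 2).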

Although image data contains luminance (Y) and chrominance (Cb, Cr) planes, motion estimation typically uses only the Y component. Therefore, when the SAD instruction is executed, the pixel blocks on which the video processing unit 1100 operates contain only Y components. In one embodiment, the video processing unit 1100 generates an inhibit signal that directs the texture filtering unit 1050 not to fetch Cr/Cb pixel data from the texture cache 1060.

Figure 11 is a block diagram of the texture filtering unit 1050 and the texture cache 1060. The texture filtering unit 1050 is designed to fetch from the texture cache 1060 on texel boundaries, and it loads 4x4 texel blocks from the texture cache 1060 into the filter input buffers 1110. When data is fetched on behalf of the video processing unit 1100, a texel 1120 is treated as four channels (ARGB) of 32 bits each, for a texel size of 128 bits. When data is fetched for the SAD instruction, the texture filtering unit 1050 loads an 8x4x8-bit block, which is stored in two pixel input buffers (1110A, 1110B). The 8x4 image block used by the SAD instruction is as described earlier in connection with Figure 9.

The image data used by the video processing unit 1100 may be byte-aligned. However, the texture filtering unit 1050 is designed to fetch from the cache on texel boundaries. Therefore, when fetching data for the video processing unit 1100, the texture filtering unit 1050 may need to fetch up to four texel-aligned 4x4 blocks that surround a particular byte-aligned 8x4 block.

This process can be seen in Figure 11, where the block to be fetched (target block 1130) lies across texel boundaries, in the vertical as well as the horizontal direction. The u, v address of the target block 1130 defines the top-left corner of the byte-aligned 8x4x8-bit block. In this example, the texture filtering unit 1050 fetches texels 1140, 1150, 1160 and 1170 to obtain the target block 1130. The texture filtering unit 1050 then combines rows and columns selected bit-wise from blocks 1140-1170, so that the leftmost 4x4 bits of the target block 1130 are written to the filter buffer 1110B. Those skilled in the art will know how to use multiplexers, shifters and mask bits to achieve this result, regardless of the alignment of the 4x4 target fetched from the texture cache 1060.

In the embodiment shown in Figure 11, when the target block 1130 contains a vertical texel boundary, the data is not rearranged vertically. When this happens, the data loaded into the filter buffers 1110A and 1110B is in a vertical order that differs from its original order in the cache. In this embodiment, the video processing unit 1100 must vertically rearrange (rotate) the 128-bit reference block data to match the order of the prediction block. In another embodiment, the texture filtering unit vertically rearranges the cached texel data to match the original cache order before writing it to one of the filter buffers 1110.

Any process descriptions or blocks in the flowcharts should be understood as representing modules, segments, or portions of code that include one or more executable instructions for implementing specific logical functions or steps in the process. Those skilled in software development will appreciate that alternative implementations are also included within the scope of the disclosure. In these alternative implementations, functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved.
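The boundary-aligned fetch described in connection with Figure 11 can be modeled in a few lines. This is a sketch under stated assumptions, not the hardware design: the cache is modeled as a plain list of rows of 8-bit samples, the tile size is a free parameter because the texel geometry of the source is not modeled here, and the names `fetch_unaligned_block` and `rotate_rows` are hypothetical. The row rotation assumes that rows are left at the position given by their row index modulo the tile height, which is only one possible reading of the "not rearranged vertically" embodiment.

```python
def fetch_unaligned_block(cache, u, v, width=8, height=4, tile_w=8, tile_h=4):
    """Return the width x height block whose top-left sample is at (u, v).

    Reads are issued only for whole, aligned tile_w x tile_h tiles, which
    mimics a fetch unit that can only access the cache on tile boundaries.
    Also returns the list of tile coordinates that had to be fetched.
    """
    # Aligned tiles that overlap the requested block.
    first_tx, last_tx = u // tile_w, (u + width - 1) // tile_w
    first_ty, last_ty = v // tile_h, (v + height - 1) // tile_h

    fetched = {}
    for ty in range(first_ty, last_ty + 1):
        for tx in range(first_tx, last_tx + 1):
            fetched[(tx, ty)] = [
                row[tx * tile_w:(tx + 1) * tile_w]
                for row in cache[ty * tile_h:(ty + 1) * tile_h]
            ]

    # Select the wanted rows and columns out of the fetched tiles.
    block = []
    for y in range(v, v + height):
        out_row = []
        for x in range(u, u + width):
            tile = fetched[(x // tile_w, y // tile_h)]
            out_row.append(tile[y % tile_h][x % tile_w])
        block.append(out_row)
    return block, sorted(fetched)


def rotate_rows(block, shift):
    """Rotate the rows of a block downward by `shift`, with wrap-around."""
    shift %= len(block)
    return block[-shift:] + block[:-shift] if shift else list(block)


# Toy cache: 8 rows of 16 samples; each sample encodes its own position.
cache = [[y * 16 + x for x in range(16)] for y in range(8)]
block, tiles = fetch_unaligned_block(cache, u=3, v=2)
```

With the 8x4-sample tile size assumed in this example, a block that crosses both a horizontal and a vertical boundary overlaps four aligned tiles. If the prediction rows arrive rotated, applying `rotate_rows` with the same shift to the reference block keeps corresponding rows paired, so the SAD result is unchanged.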

The disclosed systems and methods can be implemented in software, in hardware, or in a combination of the two. In some embodiments, the system and/or method is implemented in software that is stored in memory and executed by a suitable processor situated in a computing device (including but not limited to a microprocessor, microcontroller, network processor, reconfigurable processor, or extensible processor). In other embodiments, the system and/or method is implemented in logic, including but not limited to a programmable logic device (PLD), a programmable gate array (PGA), a field programmable gate array (FPGA), or an application specific integrated circuit (ASIC). In still other embodiments, the logic described is implemented in a graphics processor or graphics processing unit (GPU).

The systems and methods disclosed herein can be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device. Such an instruction execution system includes any computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system and execute them. In the context of this disclosure, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by, or in connection with, the instruction execution system. The computer-readable medium can be, for example but not limited to, a system or propagation medium based on electronic, magnetic, optical, electromagnetic, infrared, or semiconductor technology.

Specific examples (non-limiting) of a computer-readable medium using electronic technology include: an electrical connection (electronic) having one or more wires; a random access memory (RAM); a read-only memory (ROM); and an erasable programmable read-only memory (EPROM or flash memory). A specific example (non-limiting) using magnetic technology is a portable computer diskette. Specific examples (non-limiting) using optical technology include an optical fiber and a portable compact disc read-only memory (CD-ROM).

Although the invention is illustrated and described herein as embodied in one or more specific examples, it is not intended to be limited to the details shown, since various modifications and structural changes may be made without departing from the spirit of the invention and within the scope and range of equivalents of the claims. Accordingly, it is appropriate that the appended claims be construed broadly and in a manner consistent with the scope of the invention, as set forth in the following claims.

[Brief Description of the Drawings]

Figure 1 is a block diagram of an exemplary computing platform for graphics and for video encoding and/or decoding.
Figure 2 is a functional block diagram of the video encoder 160 of Figure 1.
Figures 3A and 3B illustrate the division of the current picture into non-overlapping sections called macroblocks.
Figure 4 is a flowchart of an exemplary embodiment of the algorithm used by the motion estimator of Figure 2.
Figure 5 is a flowchart of one embodiment of the conjugate gradient step 440 of Figure 4.
Figure 6 illustrates an exemplary scenario using the conjugate gradient descent step 440 of Figure 5.
Figure 7 is a flowchart of one embodiment of the neighboring search algorithm of Figure 4.
Figures 8A and 8B illustrate the relative positions of the five candidate macroblocks used by the neighboring search algorithm of Figure 7.
Figures 9A and 9B are block diagrams illustrating the operation of the SAD instruction on the reference and prediction blocks.
Figure 10 is a data flow diagram of the graphics processing unit of Figure 1.
Figure 11 is a block diagram of the texture filtering unit and the texture cache of Figure 10.

[Description of Main Reference Numerals]

100 ~ system; 110 ~ host processor; 120 ~ graphics processing unit (GPU); 130 ~ memory; 140 ~ bus; 150 ~ video acceleration unit (VPU); 160 ~ video encoder/decoder; 170 ~ video acceleration driver.

… ~ picture; 210 ~ subtractor; 220 ~ motion estimator; … ~ reference picture; 245 ~ motion vector; … ~ prediction block; 260 ~ residual picture; … ~ discrete cosine transformer; … ~ quantizer; 290 ~ entropy coder; … ~ … coder.

310 ~ current macroblock; 320 ~ macroblock; 330 ~ search window; 340 ~ point.

400 ~ process; … ~ determine whether the motion vector will be inter-predicted or intra-predicted; 420 ~ perform the conjugate gradient descent search algorithm; 430 ~ perform the neighboring search; 440 ~ perform a local area exhaustive search; 450 ~ model the degree of match between the best candidate macroblock and the reference macroblock as a quadratic surface; … ~ find a best candidate macroblock alignment on a fractional pixel boundary; 470 ~ compute a fractional motion vector based on the matching macroblock.

505 ~ initialize a candidate block; 510 ~ compute the coordinates of the candidate macroblocks surrounding the candidate macroblock; 515 ~ compute the SAD value of each of the 5 candidate macroblocks; 520 ~ compute the horizontal and vertical gradients; 525 ~ are the gradients below a threshold; 530 ~ compute the coordinates of four new candidate macroblocks; 535 ~ perform the conjugate gradient descent step 440 on each candidate macroblock; 540 ~ compare whether the SAD value is below a threshold; 545 ~ return the candidate macroblock with the lowest SAD value; 550 ~ select a new center candidate macroblock; 555 ~ compute new step values from the gradients; 560 ~ test whether the iteration count exceeds a maximum; 565 ~ return no match.

610C ~ candidate macroblock; 610L, 610R, 610T, 610B ~ four surrounding candidates; 620X, 620Y ~ gradients computed for the initial candidate; 630TL, 630TR, 630BL, 630BR ~ four new center candidate macroblocks; 640L, 640R, 640T, 640B ~ candidates; 670-680 ~ candidates.

710 ~ compute a flag variable TOPVALID from the modulus of the current macroblock address and the number of macroblocks per row; if this modulus is non-zero, TOPVALID is true, otherwise TOPVALID is false.
720 ~ compute a flag variable LEFTVALID from the integer division of the current macroblock address by the number of macroblocks per row; if the result is non-zero, LEFTVALID is true, otherwise LEFTVALID is false.
730 ~ use the TOPVALID and LEFTVALID variables together to determine the availability of the 4 candidate macroblocks neighboring the current macroblock.
740 ~ determine the availability of a previous candidate macroblock P.
750 ~ compute the SAD value of each available candidate macroblock.

810-850 ~ candidate macroblocks.

910-940 ~ 4x4 blocks; 950 ~ 4x4 reference block; 234 ~ rotation logic; 950 ~ prediction block; 960-990 ~ SAD computation units.

1010 ~ command stream processor; 1020 ~ commands; 1030 ~ command data; 1040 ~ execution unit pool; 1050 ~ texture filtering unit; 1060 ~ texture cache; 1070 ~ post-packer; 1100 ~ video processing unit.

1120 ~ texel; 1130 ~ target block; 1140-1170 ~ texels;

1110A-B ~ buffers.
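Steps 450 to 470 listed above (modeling the degree of match as a quadratic surface, finding its minimum, and computing a fractional motion vector) can be illustrated with a separable parabola fit. The source does not give the formula, so the following is a minimal sketch of one common approach: it assumes the surface is treated independently in the horizontal and vertical directions, using the SAD value of the best integer match and of its four immediate neighbors. The function names and the sample SAD values are hypothetical.

```python
def parabola_min_offset(s_minus, s_center, s_plus):
    """Offset, in pixels, of the minimum of the parabola through three SADs.

    The three samples are taken at positions -1, 0 and +1 along one axis.
    Returns 0.0 when the samples do not describe an upward-opening parabola.
    """
    curvature = s_minus - 2 * s_center + s_plus
    if curvature <= 0:
        return 0.0
    return 0.5 * (s_minus - s_plus) / curvature


def fractional_motion_vector(int_mv, sad_left, sad_center, sad_right,
                             sad_top, sad_bottom):
    """Refine an integer motion vector (x, y) to fractional resolution.

    The minimum is located first along the horizontal direction and then
    along the perpendicular (vertical) direction.
    """
    dx = parabola_min_offset(sad_left, sad_center, sad_right)
    dy = parabola_min_offset(sad_top, sad_center, sad_bottom)
    return (int_mv[0] + dx, int_mv[1] + dy)


# Hypothetical SAD values sampled from a surface whose true minimum lies
# a quarter pixel right of and a quarter pixel above the integer match.
mv = fractional_motion_vector((3, -2),
                              sad_left=35.5, sad_center=11.5, sad_right=19.5,
                              sad_top=15.5, sad_bottom=23.5)
```

Because each direction needs only three SAD values, the refinement costs four additional SAD computations beyond the best integer match.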


Claims

X. Claims:

1. A method of determining a motion vector describing motion relative to a reference block, the method comprising: determining, according to a match criterion, which of a plurality of prediction blocks is a good match with the reference block; performing a local area exhaustive search to produce a best match with the reference block, the local area exhaustive search being performed in an area centered around the good match prediction block, the best match having integer pixel resolution; modeling the degree of match between the best match and the reference block as a quadratic surface; analytically determining a minimum of the quadratic surface, the minimum corresponding to a best matching block with fractional resolution; and computing a fractional motion vector based on the best matching block with fractional resolution.

2. The method of claim 1, wherein the step of determining which of the plurality of prediction blocks is a good match with the reference block further comprises: determining whether the current frame is intra-predicted; and if the current frame is intra-predicted, using a conjugate gradient descent search to determine which of the plurality of prediction blocks is the good match.

3. The method of claim 1, wherein the step of determining which of the plurality of prediction blocks is a good match with the reference block further comprises: determining whether the current frame is inter-predicted; and if the current frame is inter-predicted, searching neighboring blocks of the plurality of prediction blocks to determine the good match.

4. The method of claim 1, further comprising: performing the local area exhaustive search on a group of 4 blocks located about the good match prediction block.

5. The method of claim 1, wherein the step of analytically determining the minimum of the quadratic surface comprises: determining a minimum in a first direction; and determining a minimum in a second direction perpendicular to the first direction.

6. The method of claim 1, wherein the step of analytically determining the minimum of the quadratic surface comprises: computing a sum of absolute differences (SAD) value of one of the blocks adjacent to the best match prediction block.

7. The method of claim 1, wherein the step of analytically determining the minimum of the quadratic surface further comprises: computing a SAD value of each of a plurality of blocks, a first one of the plurality of blocks being adjacent to the best match prediction block in a first direction, and each remaining block of the plurality of blocks being adjacent to another one of the plurality of blocks.

8. The method of claim 1, wherein the step of analytically determining the minimum of the quadratic surface comprises: computing a SAD value of a plurality of blocks, wherein the computing of the SAD value of the plurality of blocks is performed using a SAD instruction executed by a graphics processing unit.

9. A method of determining a motion vector describing motion relative to a reference block, the method comprising: determining, according to a match criterion, which of a plurality of prediction blocks is a good match with the reference block; performing a local area exhaustive search to produce a best match with the reference block, the local area exhaustive search being performed in an area centered around the good match prediction block, the best match having integer pixel resolution; and analytically determining a minimum of a quadratic surface that models the degree of match between the best match and the reference block, the minimum corresponding to a best matching block with fractional resolution.

10. The method of claim 9, wherein the step of determining which of the plurality of prediction blocks is a good match with the reference block comprises: if the current frame is intra-predicted, using a conjugate gradient descent search to search the plurality of prediction blocks to determine the good match.

11. The method of claim 10, wherein the step of searching the plurality of prediction blocks to determine the good match further comprises: selecting a candidate block from the prediction blocks; computing a horizontal gradient between a first SAD value of a first block located a fixed distance to the left of the candidate block and a second SAD value of a second block located the fixed distance to the right of the candidate block; computing a vertical gradient between a third SAD value of a third block located the fixed distance above the candidate block and a fourth SAD value of a fourth block located the fixed distance below the candidate block; if the horizontal and vertical gradients are below a gradient threshold, adjusting the fixed distance according to the horizontal and vertical gradients; selecting a plurality of new candidate blocks located the fixed distance from whichever of the first, second, third and fourth blocks has the lowest SAD value; and repeating, for each of the plurality of new candidate blocks, the steps that follow the step of selecting a candidate block.

12. The method of claim 11, further comprising: if the horizontal and vertical gradients are greater than or equal to the gradient threshold, comparing the first, second, third and fourth SAD values with a SAD threshold; and if any of the first, second, third and fourth SAD values is below the SAD threshold, determining the block having the lowest SAD value among the first, second, third and fourth blocks to be the good match.

13. A computer-readable medium having a program for determining a motion vector, the program comprising logic configured to perform the steps of: determining, according to a match criterion, which of a plurality of prediction blocks is a good match with a reference block; performing a local area exhaustive search to produce a best match with the reference block, the local area exhaustive search being performed in an area centered around the good match prediction block, the best match having integer pixel resolution; modeling the degree of match between the best match and the reference block as a quadratic surface; analytically determining a minimum of the quadratic surface, the minimum corresponding to a best matching block with fractional resolution; and computing a motion vector based on the best matching block with fractional resolution.

14. The computer-readable medium having a program for determining a motion vector of claim 13, wherein the step of determining which of the plurality of prediction blocks is a good match with the reference block further comprises: determining whether the current frame is inter-predicted or intra-predicted; if the current frame is intra-predicted, using a conjugate gradient descent search to determine which of the plurality of prediction blocks is the good match; and if the current frame is inter-predicted, searching neighboring blocks of the plurality of prediction blocks to determine the good match.

15. The computer-readable medium having a program for determining a motion vector of claim 13, further comprising: … searching neighboring blocks of the plurality of prediction blocks …; and performing the local area exhaustive search on a region around the good match prediction block.

16. The computer-readable medium having a program for determining a motion vector of claim 13, further comprising: determining whether the current frame is intra-predicted; if the current frame is intra-predicted, using a conjugate gradient descent search to determine which of the plurality of prediction blocks is the good match; and performing the local area exhaustive search on a group of 4 blocks located about the good match prediction block.

17. The computer-readable medium having a program for determining a motion vector of claim 13, wherein the step of analytically determining the minimum of the quadratic surface further comprises: determining a minimum in a first direction; and determining a minimum in a second direction perpendicular to the first direction.

18. The computer-readable medium having a program for determining a motion vector of claim 13, wherein the step of analytically determining the minimum of the quadratic surface comprises: computing a SAD value of one of the blocks adjacent to the best match prediction block.

19. The computer-readable medium having a program for determining a motion vector of claim 13, wherein the step of analytically determining the minimum of the quadratic surface further comprises: computing a SAD value of each of a plurality of blocks, a first one of the plurality of blocks being adjacent to the best match prediction block in a first direction, and each remaining block of the plurality of blocks being adjacent to another one of the plurality of blocks
