TW200803527A - Method for determining a motion vector describing motion realtive to a reference block and storage media thereof - Google Patents

Method for determining a motion vector describing motion realtive to a reference block and storage media thereof Download PDF

Info

Publication number
TW200803527A
TW200803527A TW096122000A TW96122000A TW200803527A TW 200803527 A TW200803527 A TW 200803527A TW 096122000 A TW096122000 A TW 096122000A TW 96122000 A TW96122000 A TW 96122000A TW 200803527 A TW200803527 A TW 200803527A
Authority
TW
Taiwan
Prior art keywords
block
determining
blocks
prediction
docket
Prior art date
Application number
TW096122000A
Other languages
Chinese (zh)
Other versions
TWI350109B (en
Inventor
Zahid Hussain
Original Assignee
Via Tech Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Via Tech Inc filed Critical Via Tech Inc
Publication of TW200803527A publication Critical patent/TW200803527A/en
Application granted granted Critical
Publication of TWI350109B publication Critical patent/TWI350109B/en

Links

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Image Processing (AREA)
  • Image Generation (AREA)

Abstract

A method for determining a motion vector describing motion relative to a reference block, the method comprising: determining which of a plurality of prediction blocks is a good match with the reference block, according to a match criteria; performing a local area exhaustive search to produce a best match with the reference block, the search performed in an area centered around the good match prediction block, the best match having integral pixel resolution; modeling the degree of match between the best match and the reference block as a quadratic surface; analytically determining a minima of the quadratic surface, the minima corresponding to a best matching block with fractional resolution; and computing a fractional motion vector based on the best matching block with fractional resolution.

Description

200803527 九、發明說明: 【發明所屬之技術領域】 目前所揭露的内容關於一圖形處理單元,且尤其係關 於具有影像壓縮與解壓縮特徵之圖形處理單元。200803527 IX. DESCRIPTION OF THE INVENTION: TECHNICAL FIELD The present disclosure relates to a graphics processing unit, and more particularly to a graphics processing unit having image compression and decompression features.

I 【先前技術】 個人電腦與消費性電子產品係用於各種娛樂用品。這 • 些娛樂用品可.以大致區分為2類:使用電腦製圖 (computer-generated graphics)的那些,例如電腦遊戲; 與使用壓細視資料流(compressed video streaifi)的那 些,例如預錄節目到數位式影音光碟(DVD)上,或由有線 電視或備星業者提供數位節目(digital programming)至 一機上盒(set-top box)。第2種亦包含編碼類比視訊資 料流’例如由一數位錄影機(DVR,digital video recorder ) ^ 所執行。 電腦製圖通常由一圖形處理單元(GPU,graphic processing unit)產生。一圖形處理單元是一種建立在電 '腦遊戲平台(computer game consoles)與一些個人電腦 上一種特別的微處理器。一圖形處理單元係被最佳化為快 速執行描繪三度空間基本物件(three-dimensional primitiveobjects),例如三角形、四邊形等。這些基本 物件係以多個頂點描述,其中每個頂點具有屬性(例如顏 色),且可施加紋理(texture)至該基本物件上。描繪的 6Clienfs Docket N〇.:S3U06-0025 TT’s Docket N(K〇608-A41237-TW/final/林環輝/2007/06/15 6 200803527 結果係一二度空間像素陣列(two-dimensional array of pixels),顯示在一電腦之顯示器或監視器上。 視訊資料流的編碼與解碼牽涉到不同種類的運算,例 如,離散餘弦變換(discrete cosine transform)、移動 估測(motion estimation ),、移動補償:齡衍⑽ compensation )、去方塊效應濾波器(deblocking f i 1 ter )。這些計算通常由一般用途中央處理器(cpu )結 合特別的硬體邏輯電路,例如特殊應用積體電路(ASIC, application specific integrated circuit),來處理。 消費者因而需要多個運算平台以滿足他們的娛樂需求。因 而需要可以處理電腦製圖與視訊編碼/解碼的單一 十管平 台。 【發明内容】 本發明之一態樣係一種判定插述相對於一表考方塊之 移動的一移動向量的方法,該方法包含·柄缺 ^ 0 ·很艨一相符標準, 判定複數個預測方塊中哪一個與該參考方塊有—_ ρ才〒 執行一局部區域徹底搜尋以產生與該泉老古 、 咏 〃亏方塊的一最佳相 付’ 5亥局部區域徹底搜哥在以该車又佳相符預、、則方境^ ;、、、 之周圍之一區域,該最佳相符具有整數料解央 將該最佳相符與該參考方塊間相符的程度建模為 X ’ 面;分析地判定該二次表面的一最小值,兮异 f 表 系取小值對應有分 數解析度的一最佳相符方塊;以及根據該有分,數解^产的I [Prior Art] Personal computers and consumer electronics are used in a variety of entertainment products. These entertainment items can be roughly classified into two categories: those using computer-generated graphics, such as computer games; and those using compressed video streaifi, such as pre-recorded programs. Digital audio and video (DVD), or digital programming from cable or satellite operators to a set-top box. The second type also includes a coded analog video stream', for example, implemented by a digital video recorder (DVR). Computer graphics are usually generated by a graphics processing unit (GPU). A graphics processing unit is a special type of microprocessor built on the computer 'computer game consoles' and some personal computers. A graphics processing unit is optimized to quickly perform three-dimensional primitive objects, such as triangles, quads, and the like. These basic objects are described by a plurality of vertices, where each vertex has an attribute (e.g., a color) and texture can be applied to the base object. 6Clienfs Docket N〇.:S3U06-0025 TT's Docket N(K〇608-A41237-TW/final/林环辉/2007/06/15 6 200803527 The result is a two-dimensional array of pixels Pixels), displayed on a computer monitor or monitor. The encoding and decoding of video streams involves different kinds of operations, such as discrete cosine transform, motion estimation, and motion compensation. : age (10) compensation), deblocking fi 1 ter. These calculations are typically handled by a general purpose central processing unit (CPU) in conjunction with a particular hardware logic circuit, such as an application specific integrated circuit (ASIC). Consumers therefore need multiple computing platforms to meet their entertainment needs. There is therefore a need for a single ten-pipe platform that can handle computer graphics and video encoding/decoding. SUMMARY OF THE INVENTION One aspect of the present invention is a method for determining a motion vector for interpolating a movement relative to a test block, the method comprising: a handle missing ^ 0 · a very consistent standard, determining a plurality of prediction blocks Which one has the -_ ρ 〒 〒 〒 〒 〒 〒 〒 〒 〒 〒 〒 〒 〒 〒 〒 〒 〒 〒 〒 〒 〒 〒 〒 〒 〒 〒 〒 〒 〒 〒 〒 〒 〒 〒 〒 〒 〒 〒 〒 〒 〒 〒 〒 〒 〒 〒 The best match corresponds to a region around the environment ^ ; , , , , and the best match has an integer material solution to model the degree of matching between the best match and the reference block as X ' face; Determining a minimum value of the secondary surface, the difference f table is a best matching block having a small value corresponding to the fractional resolution; and according to the score, the number of solutions

Docket N〇.:S3U06-0025 1 s Docket N〇:0608-A41237-TW/fmal/林璟輝/2007/06/15 7 200803527 最佳相符方塊計算一分數移動向量。 本發明之另一態樣係一種判定插述相對於一參 之移動的一移動向量的方法,該方法包含:根據〜相 準,判定複數個預測方塊中哪一個與該袁考方塊付才不 1 ^ ^(去 , 相符;執行一局部區域徹底彳叟尋以產生與节|考方 最佳相符’該局部區域徹底:搜尋在以該較佳相符預測的 為中央之周圍之-區域,該最佳相符具有整數像素解:方塊 鲁α及分析地判定建模該最佳相符與該參考方塊間斤度; 度的一二次表面的一最小值,該最小值對應有分數 的一最佳相符方塊。 %析度 本發明之另-態樣係-種具有用來判斷一移 一程式之電腦可讀媒體,該㈣含•置成絲執行下列 步驟的邏輯:根據一相符標準,矣丨〜&gt; ^ 弋设數個預測方塊中輝 一個與一麥考方塊有一較佳相符· /一 ’ W仃—局部區域 尋以產生能參考方塊的—最佳彳1 # 土4斤π ' δ亥局部區域徹底搜 哥在以該較佳相符預測方塊為中央夕a 、之周圍之一區域,該最 佳相符具有整數像素解析度;將該 匕、^取 pa 乜相符與該參考方塊 間相付的私度建核為一二次表面· .县I伯 %曰|处 ’刀析地判定該二次表面 的一取小值,该取小值對應有分數 塊;以及根據該有分數的最/度的—最佳相符方 移動向量。 * 相符方塊計算一分數 【實施方式】 8Clienrs Docket N〇.:S3U06-0025 TT,s Docket N〇:〇6〇8-A4m7-TW/iinal/林環輝/2〇〇7/〇6/i5 8 200803527 在此揭露的實關提供期—圖形處理單元以增進移動 估測系統與方法。 1·用於視訊鱗碼的運算平色 第1 ®係用於目形與視訊編碼及/或解碼之—示^性運算 平台之方塊圖。系統1GG包含_ —般用途⑽11Q (此後稱 為主處理器)、-圖形處理器⑽)12G、記憶體13()與匯流 鲁排140。圖形處理單元120包含-視訊加速單元(卿)15〇, 其可加速視訊編碼及/或解碼,將於後敘述。圖形處理單元 的視訊加速功能係可在圖形處理單元12〇上執行的指令。 軟體解碼器160與視訊加速驅動器17〇位於記憶體13〇 中,解碼器160在主處理器110上執行。透過一個由視訊加速 驅動器170提供的-介面,解碼器16〇亦可發出給圖形處理單 儿120的視訊加速指令。如此一來,系統1〇〇透過發出視訊 ❿力口速指令給圖形處理單元12〇的主處理器軟體(⑽ processor software)執行視訊編碼。依此法,經常被執行的 密集運算方塊(⑽Putati〇nally intensive心咖)被卸至 _處理單元120,而更複雜的運算係由域理|| 11()所執 行。 第1圖中省略數個對於解釋圖形處理單元m之視訊加速 特徵並非必要且熟悉此項記憶者熟知的習知元件。接下來將對 視訊編碼概要說明,再接下來討論一個視訊編碼元件(移動估 9Client's Docket N〇.:S3U06~0025 TT’s Docket No:0608-A41237-TW/fmal/林環輝/2007/06/15 9 200803527 ,)如何彻圖形處理單元12Q所提供的視訊加速單元功 2. 第2圖係第1 _之視I編碼器的功能方塊圖。轉人至 編碼器⑽白勺圖像:( 205)係由像素所組成。編碼器⑽利用 圖像205内的時間(*temp〇ral)與空間相似性(邓的如 • Similar^leS)運作,並且利用判定一圖框内(空間)及/或圖 柩間(日守間)白勺差異相似性編碼。空間編碼利用一圖像内鄰近 像素通常相同或相關的特性編碼,故僅對差異編碼。時間編碼 利用連串圖像中的终多像素通常相同的值,故僅對圖像間的 差異編碼。編碼器⑽㈣用熵編碼的統計冗餘性:一些圖像 車乂另a二圖I更系發生’故較常發生的以較短的碼代表。烟編 碼的粑例包含*夫更編碼(Huff_⑺咖)、運行長度編碼 (run-length encoding) (Arithmetic c〇ding) 與前後自我適應的二位元算術編碼(⑽ binary arithmetic coding)。 在此示範性實施例中,輪入圖像205的方塊係提供至一減 法器2H)與一移動估測器22〇。移動估測器-比較輸入圖像 205内的方塊與-預先儲存的參考圖像23〇以找出相似的方 塊。移動估測器2 2 0計算代表相符方塊間配置的一組移動向量 245。移動向量245與參考圖像的相符方塊2如合稱為預測方 lOClienfs Docket N〇.:S3U06-0025 TT’s Docket No:0608-A41237-TW/fmal/林璟輝/2〇〇7/〇6yq5 10 200803527 塊255,代表時間編碼。 預測方塊255係提供至減法器21〇,其將輸入圖像2〇5減 去預測方塊255以產生-剩餘圖像26〇。剩餘圖像·係提供 至離散餘旋轉換器⑽,dlscrete⑽此transf〇rm)方塊 270與量化器_,其行空間編碼。量化器28〇 &amp;輸出(例 如組里化後的DCT係數&gt; 係由痛編碼器29〇編碼。 • 對於某種犬員型的圖像(資訊或1圖框,與預測或P圖框), 該空間來自量化器280的空間編碼餘數(spatiaUy⑽制 residual)係提供給内部解碼器。解碼器利用空間編碼餘數結 合由移動估測器220所產生的移動向量245以對空間編碼圖像 205解碼。重新建構的圖像係儲存在參考圖像緩衝器295,其 係提供至移動估測器220,如前所述。 如結合第-圖所討論的,編碼器⑽在主處理器ιι〇上執 •行,然而億利用由圖形處理單元12〇所提供的視訊加速指令。 尤其是’由移動估測器220戶斤實現的演算法利用由圖形處理單 7L 120所提供的絕對.絕對差值加總(通, 漏-〇f-abs〇lute-difference)指令以達成正確的移動估測, 在相對低的運算量下。接著將詳述移動估測淹管法。 ^_数盤移動估測演篡法 1 IClienfs Docket N〇.:S3U06-0025 TT’s Docket No:0_-A41237-TW/fmal/林璟輝/2〇〇7/〇6/15 11 200803527 L搜尋窗(Search Window) 如示於第3A、B圖,移動估測器220將目前圖像205切 割成不重疊的各區段,稱為巨圖塊。巨圖塊的大小會依編碼器 所使用的規乾(例如’ MPEG-2、H. 264、VC)與圖像的大小而 改變。’ 1 在此敘述之示範性實施例,與在各種不同編碼標準中, 一巨圖塊係16x16像素。一巨圖塊更切割成方塊,該方塊的大 小可為 4x4、8x8、4x8、16x8、或 8x16。 在MPEG-2中’各巨圖塊可僅有一移動向量,故移動估測 係根據巨圖塊。H. 264允許達32個移動向量(依程度而定), 故在H· 264中,移動估測係根據4x4或8x8方塊的基礎而計 算。H· 264之變化,稱為AVS,該移動方塊永遠為8x8。在U 中,其可為4x4或8x8。 移動估測演算法220對目前圖像205中的每一巨圖塊執 行移動估測,依照在一預先編碼的圖像230 (其類似於目前圖 像205的巨圖塊)中尋找一方塊的目標。參考圖像230中的巨 圖塊與目前圖像205中的巨圖塊間的置換係計算並儲存為移 動向量(245,第2圖)。 為方便說明,移動估測程序將以目前圖像31〇中—特定 巨圖塊說明( 320)。此範例所選擇之巨圖塊32〇係在目前圖像 12Clienfs Docket N〇.:S3U06-0025 TT’s Docket No:0608-A41237-TW/flnal/林璟輝/2007/06/15 12 200803527 ’ 310的中間,然而相同技術亦應用在其他巨圖塊。 一搜尋窗( 330)係在參考圖像230 (對應目前圖像31〇 的巨圖塊320)中巨圖塊的中間。即,若巨圖塊32〇係位於(X, Y),則在參考圖像230中的搜尋窗330亦位於(X,γ),如示 於_ 340。其他實施例將巨圖塊放在參专圖像23〇的其他部 分,例如左上。範例第3A、B圖中的搜尋窗33〇在水平方向延 伸通過相應巨圖塊的兩像素,在垂直方向一個像素。因此,搜 • 尋窗330包含14個不同巨圖塊:兩個巨圖塊分別發個與 2個像素,就在位置340的左邊;另一組兩個巨圖塊在位置 的左邊;剩下組在位置340的上面、下面、左上、右上、 與右下。 1 由移動估測裔220所執行的相符方塊移動運算使用絕對 絕對差值加總作為判斷巨圖塊間相似性(相符)的準則。絕對 絕對差值加總,計算兩像素值間的差值絕對值,並將一方塊中 ^ 所有像素的這些差值絕對值加總,如熟悉該項技藝之人士所理 解的。移動估測器220結合使用絕對絕對差值加總準則與選擇 待測相似性的目標巨圖塊之開創性方法,其將於下說明。 b.選擇目標巨圖德 移動估測器220使用不同的搜尋方法,依據移動估測器 220疋產生目鈾圖像205的内部編碼(intra—c〇ded)移動向量 或外部編碼(inter-coded)移動向量。移動估測器220利用真 BClient^s Docket N〇.:S3U06-0025 TT’s Docket NcK〇608-A41237-TW/fmal/林璟輝/2007/06/15 200803527 實世界關於移動的習知知辦 咨320的打户,心/ 相符巨圖塊應該在搜尋 _二中t °中目標方塊數目,其係實際與 目刚圖像205中的巨圖塊則進行相似測試 =通常以固定:速度移 :予tGPtieal flQW)中物體的移動,魏和且相似(即實質上 連貝)的在工間上與時間上都是。:此外,在絕對絕對差值加 =表面(即在—搜尋空間姆絕對差值加總值)係被期待為相 對地緩和(即相對少數量的局部最小點)。 利用此自知知識需要指揮搜尋最可能發現最相符的地 方’纽揭露的演算法使職少要被執行搜尋的數目以找到較 佳的最小點。如此—來,該演算法在計算上有效率也可有效的 標出較佳的相符。 第4圖係一示範性實施例移動估測器22〇用來計算目前 • 圖像205内目满巨圖塊310之移動向量之演算法流程圖。移 動估測程序從步驟開始,其判定由移動估測器22()為目前 圖像205所產生的移動向量將被圖像間預測 (inter predicted)或圖像内預測Qntra—pre(jicted)。若使用 圖像内預測則接著進行步驟420,在此施行共軛梯度下降搜尋 / 秀介法(conjugated gradient descent search algorithm) 以哥找搜尋窗320内一預測巨圖塊,這與參考巨圖塊(目前圖 像205内之目前巨圖塊31〇)是較佳的相符。共輛梯度下降搜 哥濟异法(步驟420)將結合第5、6圖詳細說明。 MClienfs Docket N〇.:S3U06-0025 TT,s Docket No:0608-A41237-TW/fmal/林璟輝/2007/06/15 200803527 回到410 ’若使用圖像間預測以產生移動向量,則 ,著執行步驟,在此執行,,鄰近的,,或,,鄰近區域,,搜 尋。該搜尋包含鄰近於目前圖像2〇5内目前巨圖塊31〇的巨圖 =以及對應的先前編碼參考圖像23。内的巨圖塊。鄰近搜尋 ;演算法(步驟430)將結合第I、8圖詳細說明。 卞 共輛梯度下降搜尋演算法(步驟細)與鄰近搜尋演算 Ά驟430)各從-大群目標預測巨圖塊中認出了較佳或 7受的相符。熟悉此項技藝之人士應當暸解到用來判定如何才 是-個”較佳的相符,,之準則可以是相對的或是絕對的。例 如’在此敘狀鄰近搜尋演算法使用—絕對準則:有最低值 (score)的目標巨圖塊被視為較佳的相符。然而,在此敛述 之共輛梯度下降搜尋演算法利用一臨界值,絕對差值加總值低 於該臨界值的第-方塊被視為較佳的相符。然而,該臨界值的 準則係一設計或實現決定。 在處理步驟420 &lt; 430之後,以認出一較佳候選相符。 步驟440更執行-局部區域徹底搜尋(1〇Cal _— search)以找到最佳的候選。該搜尋區域係位於步驟42〇或 430所認出的較佳候選巨圖塊附近。在一些實施例中,在執行 步驟420 ’共辆梯度下降搜尋演算法之後(即在圖像内預测的 狀況下),局部徹底搜尋職尋祕域包含步驟所認出的 局部隶小值(較佳候選)的外面附近的4個對角。例如,若在 梯度下降上個步驟所使用的值是j,則該搜尋限制在離該較佳 15Clienf s Docket N〇.:S3U06-0025 TT’s Docket N〇..0608-A41237-TW/fmal/林璟輝/2007/06/15 15 200803527 候選()的點。在一些實施例中,當執行步驟43〇之後(即 在圖像間預測的狀況下),局部徹底搜尋(步驟權)所搜尋 的包含在較佳候選巨圖塊附近一小區域的候選,通常是 步驟440的局部徹雇搜尋從一較佳候選巨圖塊限縮至一 最佳候選巨圖塊,這是像素調準(pixel_aligned),即具有整 數像素解析度。步驟450與460在一分數像素邊界 _ ( fractional-Pixel b_dary )找到一最佳候選巨圖塊調 準。習知分數移動搜算法使用特定編解碼賴波演算法 (codec-specific filteringalg〇rithm)以内插在分數位置 的像素值,根據周圍的整數位置。相對的,步驟45〇建立最俨 候選巨圖塊與參考巨__符程度為二次表面,而步驟= 分析地判定該表面的最小值。最小值對應一最佳相符巨圖塊, 為分數而非整數解析度。(開創性的以分數解析度判定最, • 符巨圖塊的建模將在後面的段落加以說明。)在有著分數”相 度的相符巨圖塊於步,驟450被認出之後,接著處理步刀驟仰析 根據該相符巨圖塊計算一分數移動向量,使用熟悉此項^ 2 所知悉的技術。接著就完成了程序4〇〇。 $ 熟悉此項技藝者應當瞭解到上面的演算法在本質上3 續的,因其使用了鄰近區域的資訊。儘管使用了硬體加逮 知設計通常避免連續演算法,因為許多原因,連續的設計在 裡是適當的。首先,像素資料係以連續水平掃瞄線X的开/ 16Clienfs Docket No.:S3U06-0025 乂 : TT’s Docket N〇:0608_A41237•丁 w/fmal/林璟輝/2〇_6m 200803527 (equential raster fashion)讀取,因而可被預先接收, ,持在=路缓衝器中。其次,在含有單-絕對差值加總加速 f元的貝苑例中’效能是限制在該單元是否能維持滿載而非連 '、貝處理。絶對差值加總加速單元在預測方塊沒有許多快取遺漏 、、准持尚負載。因為遺漏率是快取大小的函式,而.τν 解=度影像在快取中僅f要删/8 =侧義向量,低的快 取遺漏率是可以預期的。 c, 的圖像内褶測銘叙而i 第5圖係第4圖共輛梯度步驟440的流程圖,由移動估 測气220之-實施例所執行。如前所述,步驟權係、在判定使 用圖像内預測將被用來尋找搜尋t 32G内巨圖塊係與目前方 $ _為-較佳(即可接受的)相符時執行。絕縣值加總值 為P組5個初始候選而計算:目前巨圖塊、與目前巨圖塊 上、下、左、右的巨圖塊。從這初始組5個絕對差值加總值, 計鼻兩組互㈣絲梯度。從這兩組梯度,_最㈣肖的方向 的梯度。錢梯度姉地淺,或5個初始候駐圖塊有非常接 近的絕對錄加總值,職搜尋延伸遠離目敍圖塊,因為在 此區域内不存在練佳局部最小機率之條件的簡。在對共辆 梯度下降步驟44Q概述之後,該步驟將更詳細的說明於下。 17Clienf s Docket No.:S3U06-0025 mD〇cketN〇麵捕237挪 _ 林環輝/2_〇6/i5 取該步驟從步驟505開始,在此初始化—候選方塊^與步 驟值△,與λ y。在—實施例巾,候敍圖塊^為搜尋窗 17 200803527 320的左上角,而步盥几 接著在均叹為一小整數值,例如8。 在力驟51G,計算候選巨圖塊^ 標。這四個硅、pec W周的候廷巨圖塊的座 個。即, 口尾c-的上、下、左、右四 〜ς)心(―〜+Q,C》 差值力 個候選巨圖塊的絕對 声^本_與周遭四個)。在步驟52Q,計算梯度一 1面:疋左邊與右邊巨圖塊絕對差值加總值的差。梯度a是 Μ下面巨圖塊絕對差值加總值 Λ » 」左如此一來,不論可銥 相符巨圖塊間的誤差值是 ^ &quot; J&quot;b 左值疋、加或減少,該梯度表示x或v方 向。在步驟525,該梯度係盥_gt凡 做界值作比較。若該梯度低於 界值(__目對喊),這表示在目前搜尋區域中益 局部最小值’故該搜尋延伸至新的候選巨圖塊。這些新的候選 =圖塊遠離了原本的候選處理巨圖塊k。在—些實施例中, 當在步驟515為候選巨圖塊所計算的絕對差值加總值相似時 亦延伸該搜尋。該延伸搜尋繼續在步驟530進行,在此計算四 個新候選巨®塊的座標。原本四個候選巨圖塊係在上下左 右距離C,A)的地方,選擇四個新候選巨圖塊以形成原本候選 巨圖塊Q周圍正方形角落’距離(△,,△,): ΓΖ=(-△,+Cv,-△,C抑=(△〆” △}+=(_△〆,ς)湖=队 + 匕〜+Q 18Client’s Docket No.:S3U06-0025 TT,s Docket No:0608-A41237-TW7flnal/林環輝/2007/06/15 18 200803527 在步驟535,對這些新的候選巨圖塊(c,tl,t 匀別執行共輛梯度下降步驟440。 回到步驟525的梯度比較,若在巨圖塊咖所計管的梯 度係等於歧於該臨界值(_做仙對地㈣) 步驟540在:步驟515所計算的絕對差值加總值與二臨界值作比 較。若該絕縣值加總值低於該臨界值,職示朗較佳相 符,則步驟440回到呼叫器、(在步驟545) ’提供該 最低絕對差值加總值的候選巨圖塊。 ^ 若在步驟540所職的魏對差值加總料於或低於$ 臨界值’表示沒有_錄相符,故驢搜尋_。在步^ 550 ’選擇—新的中央候選巨圖塊Q。新的中央巨圖塊.是 以:〇1^候選組中在步驟515巾算出有最低絕對差值加 總值的方塊。接著,在步驟555,從梯計算新的步驟 值△,與△,,例如陡_梯度代表可接受的相符巨圖 塊係目前中央候選很遠,故增加⑽。相反地,淺的梯度代 表可接受的相符巨圖塊係目前中央候選很近,故應減少 (△,Α)。熟悉此項技藝之人士應當瞭解到各種不同的係數可以 從各梯度用來計算達成該結果。 接著’在步驟560測試疊代迴圈數。若該數目大於_最大 值,則步驟440於步驟565完成,找不到可以接受的相符。此 外,採用錯誤梯度以選擇-組新的候選巨圖塊,其被期待為較 19Client,s Docket No·:S3U06-0025 TT s Docket No:0608-A41237-TW/fiiial/林環輝/2007/06/15 19 200803527 接近於最終相符,該梯度下降步驟44〇回到步驟5ι〇,在此產 生一組新的。共輛梯度下降步驟44〇在以下兩種情況下完成, 當找到可接受的值(步驟545),或最大疊代數目以達到仍無 相符(步驟565)。 _ 6圖說明使用共.輛梯度下降步驟44(Γ的示範狀態。初始 候遥巨圖塊q係方形(),而四個周圍候選係圓圈 (61〇T,610L,610R,610B)。從這些初始候選計算梯度仏與 •&amp;⑽X,62GY)。在此示範狀態中,梯度太淺了,而沒有絕對 差值加總值低於該臨界值。因此延伸搜尋,使用四個新的中央 候選巨圖塊,示為三角形(63〇TL,63〇TR,63〇BL,63〇BR)。這些 新的候選巨圖塊距離原本候選巨圖塊。周圍角落△的距離。 在這些中央候選周圍的巨圖塊,示為六角形 (64叫,_7;,6響2,6樣2,6概3,6獅3,64叭,_ 鲁 範狀態中,兩個候選640具有低於臨界值的絕對差值加總值 與’’陡峭”梯度( 650XY,66QXY)。另一候選係根據各,,陡峭,, 梯度選擇:候選670係根據梯度650XY,而後選680係根據梯 度660XY。梯度下降搜尋繼續使用這些新的候選670、68〇,根 據共軛梯度下降步驟440。 d.使用先前鄰近圖像間預測移 第7圖係第4圖鄰近搜尋演算法(步驟430)的流程圖, 由移動估測器220之一實施例所執行。如前所述,該搜尋之候 20Clienfs Docket N〇.:S3U06-0025 TT’s Docket No:0608-A41237-TW/fmal/林璟輝/2007/06/15 20 200803527 中的目前巨圖塊310 (已被 的係在預先編碼的參考圖像 選巨圖塊包含鄰近於目前圖像205 編碼)的巨圖塊。亦包含為一候選 230中的一對應巨圖塊。 計算候選巨圖塊座標的步驟從步驟710開始,在此藉由利 用目前巨圖塊310位址的絕對值(餘婁£)與每行巨圖塊數古十管 -旗標變數ramiD。若聽龍非Q,則TQmuD為直了 此外,TOPVALID為假。在步驟720,_旗標變數leftvaud係 利用目别巨圖塊310位址的除以整數與每行巨圖塊數計算。若 此除數非0 ’則LEFTVALID為真’此外,leftvalid為假。這 些TOPVALID與LEHVALID變數表示目前巨圖塊31〇分別在上 面與左邊有一鄰近巨圖塊,考慮巨圖塊的上緣與左邊緣。 在步驟730,結合使用TOPVALID與LEFTVALID變數以判 定目前巨圖塊310鄰近的4個候選巨圖塊的可得性,或存在 性。特別是:左邊有一巨圖塊L若(LEFTVALID);上面有一 巨圖塊T若(TOPVALID);左上有一巨圖塊TF若(T0PVALID&amp; LEFTVALID);又上有一巨圖塊 TR 若(TOPVALIM RIGHTVALID)。 接著,在步驟740,為一先前候選巨圖塊ρ判定可得性,這是 在空間上對應目前巨圖塊310之先前編碼參考圖像230中的一 巨圖塊。這5個候選巨圖塊的相對位置可在第8Α、Β圖中看到, 其中 L 係 810、Τ 係 820、TL 係 830、TR 係 840、Ρ 係 850。 回到第7圖,步驟730與步驟740有多少候選巨圖塊可用 21Client’s Docket N〇.:S3U06-0025 TT’s Docket No:0608-A41237-TW/fmal/林璟輝/2007/06/15 200803527 來比較(從1到5)。步驟750鱗—可得候選巨圖塊計算絕 對差值加總。若5個候選均可得,該組絕對差值加總值為: Γ ( L Λ- Τ\ 若某些候選不可得,熟悉此項技藝之人士應當瞭解到該組 候選相對較小。接著完成㈣43Q,喊有最低絕縣值加總 的候選巨圖塊。 ~ 如先前結合第4圖所討論的,一旦找到相符巨圖塊(不論 使用第_鄰近搜尋法或是第5圖的聽梯度下降),接著搜 哥區域更加限縮,採用局部徹底搜尋(第4圖4仙)。在局部 搜尋之後’ 局部徹底搜尋的結果計算—分數移動向量。分 數移動向量的計算將於下詳述。 6·利用二次表面模型的分數移動向量運1 熟悉此項技藝之人士應當對圖示巨圖塊對搜尋窗間相符 程度以產生”錯誤表面”感到熟悉。採用一開創性方法,浐動 估測器220以一二次表面建模錯誤表面並分析地以次像素準 確性判定該表面的最小值。移動估測器220,首先與】定_方項 之最小值,給定一最小行。移動估測器220接著沿著這條線央 定正交方向的最小值。 二次曲線的一般方程式如方程式1。 22Client,s Docket N〇.:S3U06-0025 TT’s Docket No:0608-A41237-TW/fmal/林璟輝/2007/06/15 22 200803527 方程式1 y = Cx+C2t^Cf 對該曲線取微分,如第2方程式 I: C2 + 2C3t =&gt; i 方程式2 ϊ -旦係數c&quot;。2, ς已知,:則可求解以判定、,最小的位置 移動估測器220解出方程式3以判定係數Ci,心q。 C2 C3 4 31 - 27 5、 -27 25 一: 5 -5 1 Σ^2 方程式3 移動估測器220使用由圖开j 丨义用田圚小慝理早兀120所提供的84絕 對差值加總指令已有效率的計算方程式3。各式代表一絕對差 值加總值’對i累加代表在X方向鄰近巨圖塊的絕對差值加總 • 值。如結合帛[圖之詳細說明’該如4絕對差值加總指令有效 率的計算鄰近的巨圖塊(x,y)、(x+1,y)、(x+2,yMX+3,y), 的4個絕對差值加總值,即i=〇·..3且。如前所述, 一旦係數已知,解方程式2得到ΐ,X方向的最小值。 方程式3可以用來判定垂直方向的最小值t。在此例中, 移動估測器220使用8x4絕對差值加總指令已有效率的計算垂 直地鄰近的巨圖塊(X,y)、(x+l,y)、(x+2, y)、(x+3, y)的 4 個絕對差值加總值。方程式3解出計算自這些絕對差值加總值 23Clienfs Docket N〇.:S3U06-0025 TT’s Docket No:0608-A41237-TW/fmal/林環輝/2007/06/15 9 . 200803527 的係數如前所述’—旦係數已知,解方程式以导 到t,y方向的最小值。移動估測器22〇所使用的二次錯亨表 面方法較在先騎-像素邊界上—較佳树後再使用運算产: 貴濾波器去顧子像錢界上較佳树的⑽綠來的進步。 '絕對差總加速器以 的計算 ⑩ ㈣所述,移動估測器22G以目前圖像中—參考巨圖塊判 定酬圖像中那個巨圖塊有較佳的相符。移動估測器22〇使用 由圖幵y處理單元120所提供的絕對差值加總硬體加速,其為圖 形加速單元指令。絕對差值加總指令要輸入一 4x4參考方塊與 一 8x4預測方塊,並產生4個絕對差值加總值。參考方塊與預 測方塊的大小可根據需要而改變。4χ4參考方塊與8χ4預測方 塊僅為範例以說明本發明,而不應限制參考方塊與預測方塊的 φ 大小。第9Α、Β圖係說明對參考與預測方塊進行絕對差值加總 指令運作的方塊圖。如示於第9Α圖,8x4預測方塊係由多個 彼此重疊的水平鄰近4x4方塊所組成,如方塊910、920、930、 940。絕對差值加總單元取一個輸入4Χ4參考方塊950並計算 該麥考方塊與910-940個方塊的絕對差值加總值。即,該絕對 差值加總指令計算4個值··一個值是方塊910與方塊950的差 值的絕對值之總和;另一個值是方塊920與方塊950的差值的 絕對值之總和;另一個值是方塊930與方塊950的差值的絕對 值之總和;另一個值是方塊940與方塊950的差值的絕對值之 24Client,s Docket N〇.:S3U06-0025 TT’s Docket No..0608-A41237-TW/final/林璟輝/2007/06Λ5 24 200803527 總和。 茶見第9B圖,圖形處理單元12〇内的絕對差值加總加速 早兀使用4個絕對差值加總計算單元(96〇,97〇,_,99〇)以 實現絕對差值加總指令。最左邊的4x4方塊910係提供給絕對 差值加總计异單凡96〇。接著輪入右邊的4χ4方塊(的〇)給 絕對差值加料算單元⑽。接著輸人右邊的ω方塊(93〇) 、给、%對差值加總計算單元98〇。最後,提供最右邊的4χ4方塊 940々、、、邑對差值加總計算單元9別。圖形處理單元平行地 使用獨立的絕對差值加總計算單元,所以絕對差值加總指令每 個週期產生4個絕對差值加總值。熟悉此項技藝之人士應當瞭 解到用來計异兩個相同大小像素方塊的絕對差值加總運算之 演算法,以及用來執行此運算之硬體設計,故這些細節將不再 詳述。 • 4x4參考方塊係水平地且垂直地列在像素邊緣。然而,不 需要垂直地校正4x4預測方塊91〇-94〇。在一實施例中,資料 係藉由旋轉(邏輯電路995)該參考方塊所校正。旋轉參考方 塊而非分別旋轉4個預測方塊可節省邏輯閘數。旋轉後的參考 方塊係k供給各獨立絕對差值加總硬體加速單元。各單元產生 12位元的值,而這些值結合成一個位元的輪出。在一實施 例中,這些值的數量級係根據預測方塊的11紋理座標(最低位 元位置中的最低座標)。 25Client,s Docket N〇.:S3U06-0025 TT,s Docket N〇:0608-A41237-TW/fmal/林璟輝/2007/06八5 25 200803527 下面的程式碼說明8x8方塊,即兩個鄰近的8x4方塊,的 絕對差值加總值可以僅使用4個絕對差值加總指令計算。暫存 器T、T、T、T4係用來暫存這4個絕對差值加總值。變數sadS 係用來累加這些絕對差值加總值。8x4參考方塊的位址假設在 refReg。U與V係8x8預測方塊的紋理座標。下面的程式碼產 生整個8x8方塊的全部的絕對差值加總值,儲存在sadS。 SAD T1A refReg, U, V ; left-top of 8x8 prediction blockDocket N〇.:S3U06-0025 1 s Docket N〇:0608-A41237-TW/fmal/林璟辉/2007/06/15 7 200803527 The best matching square calculates a fractional motion vector. Another aspect of the present invention is a method for determining a motion vector for interpolating relative to a movement of a parameter, the method comprising: determining, according to the ~phase, which of the plurality of prediction blocks is to be paid by the Yuan test block 1 ^ ^(go, match; perform a partial region thorough search to produce the best match with the knot|the tester's thoroughness: search for the region around the center predicted by the better match, The best match has an integer pixel solution: the square ru and the analytically determined modelling the best match with the reference block; a minimum of the quadratic surface of the degree, the minimum corresponding to the best of the score </ RTI> </ RTI> <RTIgt; </ RTI> <RTIgt; </ RTI> <RTIgt; </ RTI> <RTIgt; </ RTI> <RTIgt; </ RTI> <RTIgt; </ RTI> <RTIgt; </ RTI> <RTIgt; ~> ^ 弋 数 预测 预测 预测 一个 一个 一个 一个 一个 一个 一个 一个 一个 一个 一个 一个 一个 一个 一个 一个 一个 一个 一个 一个 一个 一个 一个 一个 一个 一个 一个 一个 一个 一个 一个 一个 一个 一个 一个 一个 一个 一个 一个 一个 一个 一个 一个 一个 一个 一个It’s better to thoroughly search the local area of Hai The coincident prediction block is a region around the central eve a, and the best match has an integer pixel resolution; the 私, ^, pa 乜 coincidence and the privateness paid between the reference blocks are constructed as a quadratic surface · The county I 曰%曰| at the location of the knife to determine a small value of the secondary surface, the small value corresponds to a fractional block; and according to the score of the most / degree - the best matching square motion vector * Matching box to calculate a score [Embodiment] 8Clienrs Docket N〇.:S3U06-0025 TT,s Docket N〇:〇6〇8-A4m7-TW/iinal/林环辉/2〇〇7/〇6/ I5 8 200803527 The real-time delivery period disclosed here—the graphics processing unit to improve the motion estimation system and method. 1. The operation flat color for the video scales is used for meshing and video encoding and/or decoding. A block diagram of the computing platform. The system 1GG includes a general purpose (10) 11Q (hereinafter referred to as a main processor), a graphics processor (10) 12G, a memory 13 (), and a bus bar 140. The graphics processing unit 120 includes a video acceleration unit (clear) 15 that speeds up video encoding and/or decoding, as will be described later. The video acceleration function of the graphics processing unit is an instruction that can be executed on the graphics processing unit 12A. The software decoder 160 and the video acceleration driver 17 are located in the memory 13A, and the decoder 160 is executed on the main processor 110. The decoder 16 can also issue a video acceleration command to the graphics processing unit 120 via an interface provided by the video acceleration driver 170. In this way, the system 1 performs video coding by sending a video power command to the main processor software (10) of the graphics processing unit 12 (10). In this way, the intensive computation block ((10) Putati〇nally intensive) that is often executed is offloaded to the _processing unit 120, while the more complex operation is performed by the domain || 11(). Several conventional elements that are not necessary for interpreting the video acceleration characteristics of the graphics processing unit m and are familiar to those skilled in the memory are omitted in FIG. Next, we will explain the video coding outline, and then discuss a video coding component (Mobile Estimation 9Client's Docket N〇.:S3U06~0025 TT's Docket No:0608-A41237-TW/fmal/林环辉/2007/06/15 9 200803527 ,) How to complete the video acceleration unit work provided by the graphics processing unit 12Q 2. Fig. 2 is a functional block diagram of the first _ view I encoder. Transfer to the image of the encoder (10): (205) is composed of pixels. The encoder (10) operates using the time (*temp〇ral) in the image 205 and the spatial similarity (Deng's like: Other^leS), and utilizes the decision within a frame (space) and/or between the maps (day guard) (d) differential similarity coding. Spatial coding utilizes the same or related characteristic coding of adjacent pixels within an image, so only the difference is encoded. Time Coding The use of the final multi-pixels in a series of images is usually the same value, so only the differences between the images are encoded. Encoder (10) (4) Statistical redundancy with entropy coding: some images are more likely to occur in the second picture I, so it is more often represented by shorter codes. Examples of smoke coding include *Huffy (Huff_(7) coffee), run-length encoding (Arithmetic c〇ding) and pre- and self-adaptive binary arithmetic coding ((10) binary arithmetic coding). In this exemplary embodiment, the block of wheeled image 205 is provided to a subtractor 2H) and a motion estimator 22A. The motion estimator compares the blocks in the input image 205 with the pre-stored reference images 23 to find similar blocks. The motion estimator 2 2 0 calculates a set of motion vectors 245 that represent the configuration between the matching blocks. The matching of the motion vector 245 with the reference image 2 is collectively referred to as the prediction side lOClienfs Docket N〇.: S3U06-0025 TT's Docket No: 0608-A41237-TW/fmal/林璟辉/2〇〇7/〇6yq5 10 200803527 255, representing time coding. Prediction block 255 is provided to subtractor 21A which subtracts the input block 2〇5 from prediction block 255 to produce a residual image 26〇. The remaining image is supplied to a discrete cosine converter (10), dlscrete (10), this transf〇rm) block 270 and quantizer_, which is line-space encoded. The quantizer 28〇&amp; output (for example, the grouped DCT coefficient> is encoded by the pain encoder 29〇. • For a certain dog-type image (information or 1 frame, with prediction or P frame) The spatially encoded remainder of the space from quantizer 280 (residual by spatiaUy (10) is provided to the internal decoder. The decoder uses the spatially encoded remainder in conjunction with the motion vector 245 generated by the motion estimator 220 to spatially encode the image 205. The reconstructed image is stored in a reference image buffer 295, which is provided to the motion estimator 220, as previously described. As discussed in connection with the first figure, the encoder (10) is in the main processor ιι〇 The above-mentioned implementation, however, utilizes the video acceleration instructions provided by the graphics processing unit 12. In particular, the algorithm implemented by the mobile estimator 220 uses the absolute absolute difference provided by the graphics processing unit 7L 120. The value sums up (pass, leak-〇f-abs〇lute-difference) instructions to achieve the correct motion estimation, at a relatively low computational amount. The mobile estimation flooding method will be detailed next. Estimation of deduction 1 IClienfs Dock Et N〇.:S3U06-0025 TT's Docket No:0_-A41237-TW/fmal/林璟辉/2〇〇7/〇6/15 11 200803527 L Search Window As shown in Figure 3A, B, move The estimator 220 cuts the current image 205 into non-overlapping segments, called giant tiles. The size of the giant tile is determined by the encoder (eg, 'MPEG-2, H.264, VC'). And the size of the image changes. ' 1 In the exemplary embodiment described herein, and in various coding standards, a giant tile is 16x16 pixels. A giant tile is further cut into squares, and the size of the square can be It is 4x4, 8x8, 4x8, 16x8, or 8x16. In MPEG-2, each giant tile can have only one motion vector, so the motion estimation system is based on the giant tile. H.264 allows up to 32 motion vectors (depending on the degree). Therefore, in H.264, the motion estimation is calculated based on the basis of 4x4 or 8x8 squares. The change of H·264, called AVS, is always 8x8. In U, it can be 4x4. Or 8x8. The motion estimation algorithm 220 performs a motion estimation on each macroblock in the current image 205, in accordance with a pre-encoded image 230 ( It is similar to the target of finding a square in the giant tile of the current image 205. The replacement between the giant tile in the reference image 230 and the giant tile in the current image 205 is calculated and stored as a motion vector (245). , Fig. 2). For convenience of explanation, the motion estimation program will be described in the current image 31〇-specific giant tile (320). The giant tile selected in this example is currently in the image 12Clienfs Docket N〇.:S3U06-0025 TT's Docket No:0608-A41237-TW/flnal/林璟辉/2007/06/15 12 200803527 'In the middle of 310, However, the same technology is also applied to other giant tiles. A search window (330) is in the middle of the macroblock in the reference image 230 (corresponding to the giant tile 320 of the current image 31〇). That is, if the giant tile 32 is located at (X, Y), the search window 330 in the reference image 230 is also located at (X, γ) as shown in _340. Other embodiments place the giant tile in other portions of the reference image 23, such as the upper left. The search window 33 of the example 3A, B extends horizontally through two pixels of the corresponding giant tile, one pixel in the vertical direction. Therefore, the search window 330 contains 14 different giant tiles: two giant tiles are respectively sent with 2 pixels, just to the left of the position 340; the other two giant tiles are to the left of the position; The group is above, below, top left, top right, and bottom right of position 340. 1 The coincidence block movement operation performed by the mobile estimation source 220 uses the absolute absolute difference sum as the criterion for judging the similarity (conformity) between the macroblocks. The absolute absolute difference is summed to calculate the absolute value of the difference between the two pixel values, and the absolute values of the differences of all pixels in a square are summed, as understood by those skilled in the art. The mobile estimator 220 combines the absolute absolute difference summation criterion with the groundbreaking method of selecting the target giant tile of the similarity to be tested, which will be explained below. b. Selecting the target giant map mobile estimator 220 uses the different search methods to generate an intra-c〇ded motion vector or an outer code (inter-coded) of the target uranium image 205 in accordance with the motion estimator 220. ) Move the vector. The mobile estimator 220 utilizes the true BClient^s Docket N〇.: S3U06-0025 TT's Docket NcK〇608-A41237-TW/fmal/林璟辉/2007/06/15 200803527 The real world of mobile know-how 320 The household, the heart/consistent giant block should be in the search for the number of target blocks in the _second t°, which is similar to the giant block in the image 205. Normally, it is fixed: speed shift: to tGPtieal The movement of objects in flQW), Wei and similar (ie, substantially continuous) are both on the work and in time. In addition, the absolute absolute difference plus the surface (i.e., the -search space m absolute difference plus the total value) is expected to be relatively moderate (i.e., a relatively small number of local minimum points). The use of this self-aware knowledge requires a command to search for the most likely to find the most consistent location of the 'News' algorithm to minimize the number of searches to be performed to find a better minimum. In this way, the algorithm is computationally efficient and can effectively mark the best match. 4 is an algorithmic flow diagram of an exemplary embodiment motion estimator 22 for computing the current motion vector of the macroblock 310 in the image 205. The motion estimation procedure begins with a step of determining that the motion vector produced by the motion estimator 22() for the current image 205 will be inter-predicted or intra-image predicted Qntra-pre (jicted). If intra-image prediction is used, then step 420 is performed, where a conjugated gradient descent search algorithm is performed to find a predicted macroblock in the search window 320, which is related to the reference giant tile ( The current giant tile 31〇 in the image 205 is a better match. The co-vehicle gradient descent (step 420) will be described in detail in conjunction with Figures 5 and 6. MClienfs Docket N〇.:S3U06-0025 TT,s Docket No:0608-A41237-TW/fmal/林璟辉/2007/06/15 200803527 Back to 410 'If you use inter-image prediction to generate motion vectors, then execute Steps, performed here, adjacent, or,, adjacent areas, search. The search contains a giant image adjacent to the current giant tile 31〇 in the current image 2〇5 and a corresponding previously encoded reference image 23. The giant tile inside. The proximity search; algorithm (step 430) will be described in detail in conjunction with Figures I and 8.卞 A common gradient descent search algorithm (step-by-step) and a neighboring search calculus. Step 430) Each of the slave-large group target prediction giant tiles recognizes a better or a 7-acceptance match. Those familiar with the art should understand the criteria for determining what is better, and the criteria can be relative or absolute. For example, 'Use in this neighborhood search algorithm—absolute criteria: The target giant tile with the lowest score is considered to be a better match. However, the common gradient gradient search algorithm used here uses a critical value, and the absolute difference plus the total value is lower than the critical value. The first-square is considered to be a better match. However, the criterion of the threshold is a design or implementation decision. After processing step 420 &lt; 430, a better candidate is recognized. Step 440 is more performed - local area Thoroughly search (1〇Cal__search) to find the best candidate. The search area is located near the better candidate giant block identified in step 42〇 or 430. In some embodiments, step 420 ' After the common gradient descent search algorithm (ie, under the condition of intra-image prediction), the local thorough search for the four pairs of the local collocations (preferred candidates) recognized by the step is recognized by the step. Angle. For example, if In the gradient step, the value used in the previous step is j, then the search is limited to the preferred 15Clienf s Docket N〇.:S3U06-0025 TT's Docket N〇..0608-A41237-TW/fmal/林璟辉/2007/ 06/15 15 200803527 Point of Candidate(). In some embodiments, after performing step 43 (ie, in the case of inter-picture prediction), the partial thorough search (step weight) is searched for among the better candidates. A candidate for a small area near the giant tile, usually the partial perpetual search of step 440 is limited from a preferred candidate macroblock to a best candidate macroblock, which is pixel aligned (pixel_aligned), ie has an integer Pixel resolution. Steps 450 and 460 find an optimal candidate giant block alignment at a fractional-Pixel b_dary. The conventional fractional motion search algorithm uses a specific codec-specific filteringalg algorithm. 〇rithm) interpolates the pixel value at the fractional position according to the surrounding integer position. In contrast, step 45 〇 establishes that the final candidate giant tile and the reference giant __ degree are secondary surfaces, and step = analytically determines the Surface The minimum value corresponds to a best-matching giant tile, which is a fraction rather than an integer resolution. (The groundbreaking is best determined by fractional resolution. • The modeling of the macroblock will be explained in the following paragraphs.) After the coincident giant block with the score "phase" is stepped, the step 450 is recognized, and then the processing step is analyzed to calculate a fractional motion vector according to the matching giant block, and the technique known to the knowledge is used. . Then the program 4 is completed. $ Those familiar with the art should understand that the above algorithm is essentially intrinsic because it uses information from nearby areas. Although hardware-accumulated designs are often used to avoid continuous algorithms, continuous design is appropriate for many reasons. First, the pixel data is in the horizontal horizontal scanning line X / 16Clienfs Docket No.: S3U06-0025 乂: TT's Docket N〇: 0608_A41237 • Ding w/fmal/林璟辉/2〇_6m 200803527 (equential raster fashion) Take, and thus can be received in advance, held in the = channel buffer. Secondly, in the case of Beiyuan containing a single-absolute difference plus a total acceleration f-element, the effectiveness is limited to whether the unit can maintain full load instead of ', and the shell is treated. The absolute difference plus total acceleration unit does not have many cache misses in the prediction block, and the load is still loaded. Since the missing rate is a function of the size of the cache, the .τν solution=degree image is only f deleted in the cache. /8 = the side vector, and the low cache miss rate is expected. The in-image pleats of c, and the flowchart of the common gradient step 440 of Fig. 4 are performed by the embodiment of the mobile estimation gas 220. As previously mentioned, the step weighting is performed when it is determined that the intra-image prediction will be used to find that the giant block system in the search t 32G matches the current side $ _ is - preferably (acceptable). The total value of the absolute county value is calculated for the five initial candidates of the P group: the current giant tile, and the giant tile of the upper, lower, left and right of the current giant tile. From this initial group of 5 absolute differences plus the total value, count the two sets of mutual (four) silk gradient. From these two sets of gradients, the gradient of the _most (four) XI direction. The money gradient is shallow, or the five initial waiting blocks have very close absolute total values, and the job search extends far from the visual block, because there is no simple condition for practicing the local minimum probability in this area. This step will be explained in more detail after the overview of the common gradient step 44Q. 17Clienf s Docket No.:S3U06-0025 mD〇cketN〇面捕237向_林环辉/2_〇6/i5 Take this step from step 505, initialize here - candidate block ^ and step value △, and λ y. In the embodiment, the candidate block ^ is the upper left corner of the search window 17 200803527 320, and the steps are then sighed to a small integer value, such as 8. At the force 51G, the candidate giant block is calculated. These four silicon, pec W week's seat of the giant block. That is, the mouth, c-, upper, lower, left, and right four ~ ς) heart (―~+Q, C) difference force, the absolute sound of the candidate giant block, and the four surrounding. At step 52Q, the difference between the absolute value of the left and right giant blocks and the total value is calculated. The gradient a is the absolute difference between the giant block and the total value Λ » ” left, regardless of whether the error value between the matching macroblocks is ^ &quot;J&quot;b left value 疋, plus or minus, the gradient Indicates the x or v direction. At step 525, the gradient system 盥_gt is the boundary value for comparison. If the gradient is below the threshold (__目对对喊), this means that the local minimum is in the current search area' so the search extends to the new candidate giant block. These new candidates = tiles are far from the original candidate processing giant block k. In some embodiments, the search is also extended when the absolute difference plus total values calculated for the candidate giant tiles are similar in step 515. The extended search continues at step 530 where the coordinates of the four new candidate giant blocks are calculated. The original four candidate giant block blocks are located at the upper, lower, left and right distances C, A), and four new candidate giant image blocks are selected to form the square corner 'distance (△, △,) around the original candidate giant block Q: ΓΖ = (-△, +Cv, -△, C ==(△〆) △}+=(_△〆,ς) Lake=Team+匕~+Q 18Client's Docket No.:S3U06-0025 TT,s Docket No: 0608-A41237-TW7flnal/林环辉/2007/06/15 18 200803527 In step 535, a common vehicle gradient descent step 440 is performed on these new candidate giant block blocks (c, t1, t. Gradient comparison, if the gradient of the giant graph is equal to the critical value (_ do the ground to the ground (4)), the absolute difference plus the total value calculated in step 515 is compared with the second critical value. If the total value of the absolute value is below the threshold, and the job title is better, then step 440 returns to the pager, (at step 545) 'providing the candidate block of the lowest absolute difference plus the total value. ^ If the value of Wei in the step 540 is added to or below the threshold value, indicating that there is no _ record, then 驴 search _. In step ^ 550 'Select - New The candidate giant tile Q. The new central giant tile. In the candidate group, the block with the lowest absolute difference plus the total value is calculated in step 515. Then, in step 555, a new step is calculated from the ladder. The value △, and Δ, for example, the steep_gradient represents an acceptable coincidence of the giant block system, which is far away from the current central candidate, so it is increased by (10). Conversely, the shallow gradient represents an acceptable coincident giant block system. Therefore, it should be reduced (△, Α). Those skilled in the art should understand that various coefficients can be used to calculate the result from the gradients. Then 'test the number of iterations in step 560. If the number is greater than _maximum, then step 440 is completed in step 565, and an acceptable match is not found. Furthermore, an error gradient is used to select a new set of candidate giant tiles, which is expected to be more than 19Client, s Docket No:: S3U06- 0025 TT s Docket No: 0608-A41237-TW/fiiial/Lin Huanhui/2007/06/15 19 200803527 Close to the final match, the gradient descent step 44 returns to step 5ι, where a new set is generated. The common gradient descent step 44 is in the following two In this case, when an acceptable value is found (step 545), or the maximum number of iterations is reached, there is still no match (step 565). The _6 figure illustrates the use of a common gradient descent step 44 (an exemplary state of Γ. The distant giant block q is square (), and four surrounding candidate circles (61〇T, 610L, 610R, 610B). From these initial candidates, the gradients are calculated as &&;(10)X, 62GY). In this exemplary state, the gradient is too shallow, and there is no absolute difference plus the total value below the threshold. Therefore, the search is extended using four new central candidate giant blocks, shown as triangles (63〇TL, 63〇TR, 63〇BL, 63〇BR). These new candidate giant block blocks are from the original candidate giant block. The distance from the surrounding corner △. The giant tiles around these central candidates are shown as hexagons (64, _7;, 6 rings 2, 6 2, 6 3, 6 lions 3, 64 pts, _ Lu Fan state, two candidates 640 There is an absolute difference plus a total value below the critical value and a ''steep' gradient) (650XY, 66QXY). Another candidate is selected according to each, steep, gradient: candidate 670 is based on gradient 650XY, and then selected 680 is based on Gradient 660XY. Gradient descent search continues to use these new candidates 670, 68〇, according to the conjugate gradient descent step 440. d. Use previous neighbor inter-image predictive shift 7th image 4th neighborhood proximity search algorithm (step 430) The flow chart is executed by an embodiment of the mobile estimator 220. As described above, the search is 20Clienfs Docket N〇.: S3U06-0025 TT's Docket No: 0608-A41237-TW/fmal/林璟辉/2007 /06/15 20 200803527 The current giant tile 310 (which has been categorized in the pre-encoded reference image selection block contains the adjacent image 205 encoding) is also included as a candidate 230. a corresponding macroblock. The step of calculating the candidate giant block coordinates begins at step 710, at By using the absolute value of the current huge block 310 address (the remaining value) and the number of giant blocks per line of the ancient ten tube-flag variable ramiD. If listening to the dragon is not Q, then TQmuD is straight. In addition, TOPVALID is False. In step 720, the _flag variable leftvaud is calculated by dividing the address of the macroblock block 310 by the integer and the number of macroblocks per line. If the divisor is not 0' then the LEFTVALID is true 'in addition, the leftvalid is False. These TOPVALID and LEHVALID variables indicate that the current giant tile 31〇 has a neighboring giant tile on the top and the left, respectively, considering the upper edge and the left edge of the giant tile. In step 730, the TOPVALID and LEFTVALID variables are combined to determine the current giant. The availability or existence of four candidate macroblocks adjacent to block 310. In particular, there is a giant tile L (LEFTVALID) on the left side; there is a giant tile T (TOPVALID) on the top; and a giant tile on the upper left. TF if (T0PVALID &amp;LEFTVALID); there is a giant tile TR (TOPVALIM RIGHTVALID). Next, in step 740, the availability is determined for a previous candidate giant block ρ, which corresponds spatially to the current giant tile. One of the previously encoded reference images 230 of 310 Giant block. The relative positions of the five candidate giant blocks can be seen in Figure 8, where L system 810, system 820, TL system 830, TR system 840, and system 850. Returning to Figure 7, how many candidate giant tiles in steps 730 and 740 can be compared with 21Client's Docket N〇.:S3U06-0025 TT's Docket No:0608-A41237-TW/fmal/林璟辉/2007/06/15 200803527 From 1 to 5). Step 750 Scale - The candidate giant block is calculated to add the absolute difference. If 5 candidates are available, the total absolute value of the group is: Γ ( L Λ- Τ\ If some candidates are not available, those familiar with the art should know that the group of candidates is relatively small. Then complete (4) 43Q, shouting the candidate giant block with the lowest absolute value. ~ As previously discussed in conjunction with Figure 4, once the matching giant block is found (regardless of the use of the first-neighbor search method or the fifth gradient of the listening gradient ), then the search area is more limited, using a partial thorough search (Fig. 4, 4). After the local search, the result of the partial thorough search is calculated—the fractional motion vector. The calculation of the fractional motion vector will be detailed below. • Using the fractional motion vector of the quadratic surface model. 1 Those who are familiar with this technique should be familiar with the degree of conformity between the illustrated giant tiles and the search window to produce a “wrong surface.” Using a groundbreaking approach, swaying estimates The device 220 models the erroneous surface with a quadratic surface and analytically determines the minimum value of the surface with sub-pixel accuracy. The motion estimator 220 first gives a minimum value with the minimum value of the _square term. Estimate The device 220 then determines the minimum value of the orthogonal direction along this line. The general equation of the quadratic curve is as shown in Equation 1. 22Client,s Docket N〇.:S3U06-0025 TT's Docket No:0608-A41237-TW/fmal/ Lin Yuhui/2007/06/15 22 200803527 Equation 1 y = Cx+C2t^Cf The curve is differentiated, as in Equation 2: C2 + 2C3t => i Equation 2 ϊ -dan coefficient c&quot;.2, ς Knowing that: the solution can be solved to determine, the minimum position shift estimator 220 solves Equation 3 to determine the coefficient Ci, the heart q. C2 C3 4 31 - 27 5, -27 25 A: 5 -5 1 Σ^2 Equation 3 The motion estimator 220 uses the equation 84 for the effective efficiency of the 84 absolute difference plus total instruction provided by the field j j 用 圚 。 。 。 。 。 。 。 。 。 。 。 。 。 。 。 。 。 。 。 。 。 。 。 。 。 。 。 。 。 。 The value 'accumulates i represents the absolute difference plus the value of the adjacent macroblock in the X direction. If combined, the detailed description of the figure is as follows: 4 absolute difference plus total instruction efficient calculation of adjacent giant tiles ( x, y), (x+1, y), (x+2, yMX+3, y), the four absolute differences plus the total value, ie i = 〇 ·.. 3 and. As mentioned above, Once the coefficient has Equation 2 is used to determine the minimum value in the X direction. Equation 3 can be used to determine the minimum value t in the vertical direction. In this example, the motion estimator 220 uses the 8x4 absolute difference plus the total efficiency of the instruction. The four absolute differences of the adjacent giant blocks (X, y), (x+l, y), (x+2, y), (x+3, y) plus the total value. Equation 3 solves the calculation Since these absolute differences add up to 23Clienfs Docket N〇.:S3U06-0025 TT's Docket No:0608-A41237-TW/fmal/Lin Huanhui/2007/06/15 9 . 200803527 The coefficient is as described above. The coefficients are known and the equation is solved to lead to the minimum in the t, y direction. The second erroneous surface method used by the motion estimator 22〇 is used after the preferred ride-pixel boundary—the preferred tree is used: the expensive filter goes to the better tree (10) green of the money world. improvement. In the calculation of the absolute difference total accelerator 10 (4), the motion estimator 22G judges that the giant tile in the image is better in the current image-reference giant block. The motion estimator 22 uses the absolute difference provided by the processing unit 120 to add hardware acceleration, which is a graphics acceleration unit command. The absolute difference plus total instruction is to input a 4x4 reference block and an 8x4 prediction block, and generate 4 absolute difference plus total values. The size of the reference block and the prediction block can be changed as needed. The 4 χ 4 reference block and the 8 χ 4 prediction block are merely examples to illustrate the present invention, and should not limit the φ size of the reference block and the prediction block. Figure 9 is a block diagram showing the operation of the absolute difference plus total instruction for the reference and prediction blocks. As shown in Figure 9, the 8x4 prediction block is composed of a plurality of horizontally adjacent 4x4 blocks that overlap each other, such as blocks 910, 920, 930, 940. The absolute difference summation unit takes an input 4Χ4 reference block 950 and calculates the absolute difference plus the total value of the McCaw block and 910-940 blocks. That is, the absolute difference summation command calculates 4 values. One value is the sum of the absolute values of the difference between block 910 and block 950; the other value is the sum of the absolute values of the difference between block 920 and block 950; The other value is the sum of the absolute values of the difference between block 930 and block 950; the other value is the absolute value of the difference between block 940 and block 950, 24 Client, s Docket N〇.: S3U06-0025 TT's Docket No.. 0608-A41237-TW/final/林璟辉/2007/06Λ5 24 200803527 Total. See Figure 9B for tea. The absolute difference in the graphics processing unit 12 is accelerated. The four absolute difference plus total calculation units (96〇, 97〇, _, 99〇) are used to achieve the absolute difference sum. instruction. The leftmost 4x4 box 910 is provided for the absolute difference plus the total amount of the individual 96. Then round the 4χ4 square on the right to the absolute difference feed unit (10). Then input the right square ω square (93 〇), give, % to the difference total calculation unit 98 〇. Finally, the rightmost 4χ4 squares 940々, 邑, 邑 are added to the difference total calculation unit 9 . The graphics processing unit uses independent absolute difference summation calculation units in parallel, so the absolute difference plus total instruction produces 4 absolute difference plus total values per cycle. Those skilled in the art should be aware of the algorithms used to calculate the absolute difference summation operations of two blocks of the same size and the hardware design used to perform this operation, so these details will not be described in detail. • The 4x4 reference block is listed horizontally and vertically at the edge of the pixel. However, it is not necessary to correct the 4x4 prediction block 91〇-94〇 vertically. In one embodiment, the data is corrected by rotating (logic circuit 995) the reference block. Rotating the reference block instead of rotating the 4 prediction blocks separately saves the number of logic gates. The rotated reference block k is supplied to each of the independent absolute differences plus the total hardware acceleration unit. Each unit produces a value of 12 bits, and these values are combined into one bit round. In one embodiment, the magnitude of these values is based on the 11 texture coordinates of the prediction block (the lowest coordinate in the lowest bit position). 25Client,s Docket N〇.:S3U06-0025 TT,s Docket N〇:0608-A41237-TW/fmal/林璟辉/2007/06八5 25 200803527 The following code shows 8x8 squares, ie two adjacent 8x4 squares The absolute difference plus total value of , can be calculated using only 4 absolute difference plus total instructions. The temporary registers T, T, T, and T4 are used to temporarily store the four absolute difference plus total values. The variable sadS is used to accumulate these absolute differences plus the total value. The address of the 8x4 reference block is assumed to be in refReg. U and V are 8x8 prediction block texture coordinates. The following code produces the total absolute difference plus the total value of the entire 8x8 block, stored in sadS. SAD T1A refReg, U, V ; left-top of 8x8 prediction block

SAD T2, refReg, U+4, V ; right-top of 8x8 prediction block ADD sadS, ΤΙ, T2 SAD T3, refReg, U, V+4 ; left-bottom of 8x8 prediction block ADD sadS, sadS, T3 SAD T4, refReg, U+4, V+4 ; right-bottom of 8x8 prediction block ADD sadS, sadS, T4 然而,通常可以避免計算與加總所有4個子方塊的值,因 為只要該總和達到目前最小值就可以停止該計算。下列的虛擬 碼說明如何在一迴圈内使用絕對差值加總指令,其在總和達到 一最小值時停止。 工:=〇; SUM := 0; MIN = currentMIN; WHILE (工 &lt; 4 I I SUM &lt; MIN) SUM := SUM + SAD(refReg, U+(I%2)*4, V+ 26Clienf s Docket N〇.:S3U06-0025 TT’s Docket N(K〇608-A41237-TW/final/林璟輝/2007/06/15 26 200803527 (工&gt;&gt;1)*4); IP (SUM &lt; currMIN) currMIN = MIN;SAD T2, refReg, U+4, V ; right-top of 8x8 prediction block ADD sadS, ΤΙ, T2 SAD T3, refReg, U, V+4 ; left-bottom of 8x8 prediction block ADD sadS, sadS, T3 SAD T4 , refReg, U+4, V+4 ; right-bottom of 8x8 prediction block ADD sadS, sadS, T4 However, it is usually possible to avoid calculating and summing the values of all four sub-blocks, as long as the sum reaches the current minimum value Stop the calculation. The following virtual code shows how to use the absolute difference plus total instruction in a loop, which stops when the sum reaches a minimum. Work:=〇; SUM := 0; MIN = currentMIN; WHILE (work &lt; 4 II SUM &lt; MIN) SUM := SUM + SAD(refReg, U+(I%2)*4, V+ 26Clienf s Docket N〇 .:S3U06-0025 TT's Docket N(K〇608-A41237-TW/final/林璟辉/2007/06/15 26 200803527 (工&gt;&gt;1)*4); IP (SUM &lt; currMIN) currMIN = MIN ;

Go to Next Search point; 圖形處理單元120中的84絕對差值加總指令係直接由移 動估測气220的先進搜尋演算法所使用,例如第5圖中所說明 的執行局部徹底搜尋。此外,紋理快取1〇6〇 (第1〇圖)係方 塊校正,而移動估測器220所使用的演算法,如上所述,係像 φ 素权正。儘管可以將多工器單元加到圖形處理單元120中以處 理這些校正a吳差,然而這麼做會增加邏輯閘數與電力消耗。取 而代之,圖形處理單元120使用這些多餘的預算到4個絕對差 值加總單元,而不是只用1個。在一些實施例中,8χ4絕對差 值加總指令提供了有效率地運算最小值之優點,這牽涉到計算 鄰近方塊的絕對差值加總值。在一些實施例中,8Χ4絕對差值 加總指令提供了徹底搜尋(方塊440)之另一優點,當步驟值 為1時’其計算各對角的絕對差值加總值。 4.圖形處理器 已經討論過移動估測器220之軟體演算法實現以及該演 算法在圖形處理單元120中之8x4絕對差值加總指令的使用’ 接下來詳細說明絕對差值加總指令與圖形處理單元120。 a·圖形處理單元流 第10圖係圖形處理單元120的資料流程圖,其中指令流 27Client’s Docket No.:S3U06-0025 TT’s Docket No:0608-A41237-TW/fmal/林璟輝/2007/06/15 27 200803527 係由第ίο圖左邊之箭頭,而影像或圖形流係由右邊的箭頭表 示。第10圖省略了數個熟悉此項技藝者習知的元件,這些對 解釋圖形處理單元120之回路内去方塊效應特徵非必要。 一指令流處理器1010從一系統匯流排(未示)接收一指 令ΐ020 ’並解碼該指令,產生指令資料Γ030,例如頂點資料。 圖形處理單元120支援一習知圖形處理指令,以及加速視訊 編碼及/或解碼的指令,例如前述之8χ4絕對差值加總指令。 瞻‘知圖形處理指令牽涉到如頂點著色(vertex shading)、幾何著色(geometry shading)、像素著色(pixei shading)等難題。因此,指令資料1〇3〇係施用於著色器 執行單元(shader execution units)之池(pool) 740。 著色執行單元必要使用一紋理濾波單元(TFlJ,texture filter unit) 750以施加一紋理至一像素。紋理資料係快 取自紋理快取1〇6〇,其係在主記憶體(未示)後面。 • 一些指令送給視訊處理單元1100,其運作將於後說 明。產生的資料接著由後包裝器(p〇st—packer 1070 )處 理’其壓、細5亥資料。在後處理(p〇st_pr〇cessing)之後, 由視訊加速單元所產生的資料係提供給執行單元池 (execution unii: ρ0〇ι) 1〇40。 視訊編碼/解碼加速指令的執行,例如前述之絕對差值 加總指令’在許多方面與前述之習知圖形指令不同。首先, 視訊加速指令係由視訊處理單元11〇〇執行,而非著色器執 28Client’s Docket No.:S3U06-0025 TPs Docket No:0608-A41237-TW/fmal/林環輝/2〇〇7/〇6/15 28 200803527 行單元。其次,視訊加速指令不使用其紋理資料。 然而,視訊加速指令所使用的影像資料與圖形指令所 使用的紋理資料均為2維陣列。圖形處理單元120同樣利 用此優點,使用紋理濾波單元1050下載給視訊處理單元 1100的影像資料,因而使紋理快取1060快取一些由視訊Go to Next Search point; The 84 absolute difference summation command in graphics processing unit 120 is used directly by the advanced search algorithm of motion estimation gas 220, such as performing a partial thorough search as illustrated in FIG. In addition, the texture cache 1 〇 6 〇 (Fig. 1) is a block correction, and the algorithm used by the motion estimator 220, as described above, is like φ prime weight. Although the multiplexer unit can be added to the graphics processing unit 120 to handle these corrections, doing so increases the number of logic gates and power consumption. Instead, the graphics processing unit 120 uses these extra budgets to four absolute difference summing units instead of just one. In some embodiments, the 8χ4 absolute difference summing instruction provides the advantage of efficiently computing the minimum value, which involves calculating the absolute difference sum value of the neighboring blocks. In some embodiments, the 8Χ4 absolute difference summation instruction provides another advantage of a thorough search (block 440), which calculates the absolute difference plus the total value for each diagonal when the step value is one. 4. The graphics processor has discussed the implementation of the soft algorithm of the motion estimator 220 and the use of the 8x4 absolute difference summation instruction in the graphics processing unit 120. Next, the absolute difference summing instruction is described in detail. Graphics processing unit 120. a. Graphics Processing Unit Flow Figure 10 is a data flow diagram of the graphics processing unit 120, wherein the instruction stream 27Client's Docket No.: S3U06-0025 TT's Docket No: 0608-A41237-TW/fmal/林璟辉/2007/06/15 27 200803527 is the arrow on the left side of the picture, and the image or graphics flow is indicated by the arrow on the right. Figure 10 omits several elements familiar to those skilled in the art which are not necessary to explain the in-loop deblocking characteristics of the graphics processing unit 120. An instruction stream processor 1010 receives an instruction ΐ 020 ' from a system bus (not shown) and decodes the instruction to generate instruction data Γ 030, such as vertex data. Graphics processing unit 120 supports a conventional graphics processing instruction, as well as instructions for speeding up video encoding and/or decoding, such as the aforementioned 8 χ 4 absolute difference summing instructions. Looking at the ‘graphic processing instructions involves issues such as vertex shading, geometry shading, and pixel pixei shading. Therefore, the instruction data 1〇3 is applied to the pool 740 of shader execution units. The shading execution unit necessarily uses a texture filter unit (TF1J) to apply a texture to a pixel. The texture data is taken from texture cache 1〇6〇, which is behind the main memory (not shown). • Some instructions are sent to the video processing unit 1100, the operation of which will be explained later. The resulting data is then processed by a post-packer (p〇st-packer 1070) to its pressure and fine data. After post-processing (p〇st_pr〇cessing), the data generated by the video acceleration unit is supplied to the execution unit pool (execution unii: ρ0〇ι) 1〇40. The execution of the video encoding/decoding acceleration instructions, such as the aforementioned absolute difference summing instructions, is different in many respects from the conventional graphics instructions described above. First, the video acceleration command is executed by the video processing unit 11 instead of the shader 28Client's Docket No.: S3U06-0025 TPs Docket No: 0608-A41237-TW/fmal/林环辉/2〇〇7/〇 6/15 28 200803527 Line unit. Second, the video acceleration instructions do not use their texture data. However, the image data used by the video acceleration command and the texture data used by the graphics commands are both 2-dimensional arrays. The graphics processing unit 120 also uses this advantage to download the image data to the video processing unit 1100 using the texture filtering unit 1050, thereby causing the texture cache 1060 to cache some of the video.

I 處理單元Π00運作之影像資料。:因此,示於第10圖,視 訊處理單元1100係位於紋理濾波單元1050與後包裝器 1070之間。 紋理滤波早元10 5 0檢驗從指令10 2 0媚取的指令貢料 1030。指令資料1030更提供紋理濾波單元1050主記憶體 (未示)内想要的影像資料的座標。在一實施例中,這些 座標標明為U、V對,熟悉此項技藝者應對此熟悉。當指令 1020係一視訊加速指令時,所擷取的指令資料1030更命 令紋理濾波單元1050略過紋理濾波單元1050内的任何 紋理濾波器(未示)。因此,紋理濾波單元1050受到視訊 加速指令的控制下載影像資料給視訊處理單元Π00。 依此法,紋理濾波單元105Q係受操縱為視訊加速指 令去下載影像資料給視訊加速單元1100。視訊處理單元 Π00從資料路徑上的紋理濾波單元1050接收影像資料, 與命令路徑上的命令資料1030,並根據命令資料1030對 該影像資料執行一運作。由視訊處理單元1100所輸出影像 資料係回饋給執行單元池1040,在由後包裝器1070處理 之後。 29Clienfs Docket N〇.:S3U06-0025 TT’s Docket No:0608-A41237-TW/fmal/林璟輝/2007/06/15 29 200803527 b.指令參數 現在說明視訊處理單元11GG在執行絕對差值加總視訊加 速指令的運作。如先前說_ ’各_處理單^指令係解碼且 分析(parsed)為指令資料1030,其可視為各指令之特定參 數集。絕對差值加總指令的參數示於第1表。 第1表:圖形處理單元的絕對差值加總指令I Processing unit Π00 operation of the image data. Thus, as shown in FIG. 10, video processing unit 1100 is located between texture filtering unit 1050 and back wrapper 1070. The texture filtering early 10 5 0 checks the instruction tribute 1030 from the instruction 10 2 0. The command material 1030 further provides coordinates of desired image data in the main memory (not shown) of the texture filtering unit 1050. In one embodiment, these coordinates are designated U and V pairs and should be familiar to those skilled in the art. When the instruction 1020 is a video acceleration command, the retrieved instruction data 1030 further commands the texture filtering unit 1050 to skip any texture filters (not shown) within the texture filtering unit 1050. Therefore, the texture filtering unit 1050 is controlled by the video acceleration command to download the image data to the video processing unit Π00. According to this method, the texture filtering unit 105Q is manipulated as a video acceleration command to download image data to the video acceleration unit 1100. The video processing unit Π00 receives the image data from the texture filtering unit 1050 on the data path, and the command material 1030 on the command path, and performs an operation on the image data according to the command data 1030. The image data output by the video processing unit 1100 is fed back to the execution unit pool 1040 after being processed by the post wrapper 1070. 29Clienfs Docket N〇.:S3U06-0025 TT's Docket No:0608-A41237-TW/fmal/林璟辉/2007/06/15 29 200803527 b. Instruction parameters Now the video processing unit 11GG is performing the absolute difference summation video acceleration command Operation. As previously stated, the _ processing _ processing instructions are decoded and parsed into the instruction material 1030, which can be considered as a specific parameter set for each instruction. The parameters of the absolute difference plus total instruction are shown in the first table. Table 1: Absolute difference plus total instruction of graphics processing unit

輸入/ 輸出 名稱 大小 敘述 輸入 FieldFlag 1-位元 若 FieldFlag == 1 貝ij Field Picture,其餘則 Frame Picture 輸入 TopFieldFlag 1-位元 若 TopFieldFlag 1 貝ij Top-Field-Picture, 其他 Bottom-Field-Picture 若設 定了 FieldFlag· 輸入 PictureWidth 16·位元 例如:1920用於HDTV 輸入 PictureHeigh t 16-位元 例如:1080用於30P HDTV 輸入 BaseAddress 32-位元 無符號 的 預測圖片基本位址 輸入 BlockAddres s U: 16-位 元有符 號的 預测圖片紋理座標(關係於基本位址) 在 SRC1 Opcode SRC1[0:151 = U,SRC1『31:161=V 30Ciient’s Docket No· :S3U06-0025 TT,s Docket No:0608-A41237-TW/fmal/林璟輝/2007/06/15 30 200803527 V: 16-位 元有符 號的 U,V為13.3格式,忽略分數部分 輸入 RefBlock 128-位元 參考圖片資料 ~ 在 SRC2 Opcode 輸出·. Destination 4x16-位 I28位元暫存器中最不重要的32位f Operand 元 在 DST OpcodeInput/output name size Description Input FieldFlag 1-bit if FieldFlag == 1 Bay ij Field Picture, and the rest Frame Picture Input TopFieldFlag 1-bit if TopFieldFlag 1 Bay ij Top-Field-Picture, Other Bottom-Field-Picture FieldFlag is input. InputWidth is 16 bits. For example: 1920 is used for HDTV input PictureHeight 16-bit. For example: 1080 for 30P HDTV input BaseAddress 32-bit unsigned predicted picture Basic address input BlockAddres s U: 16 - Bit-signed signed picture texture coordinates (relative to the base address) at SRC1 Opcode SRC1[0:151 = U, SRC1"31:161=V 30Ciient's Docket No: :S3U06-0025 TT,s Docket No: 0608-A41237-TW/fmal/林璟辉/2007/06/15 30 200803527 V: 16-bit signed U, V is 13.3 format, ignore fractional input RefBlock 128-bit reference picture data ~ SRC2 Opcode output · Destination 4x16-bit I28-bit scratchpad among the least significant 32-bit f Operand elements in DST Opcode

結合使用數個輸入參數以判定由紋理濾波單元1〇5〇所擷 取的4x4方塊位址。BaseAddress參數指出在紋理快取中該紋 理貢料的起點。將此區域内左上方塊座標給BaseAddass來 數。PictureHeight與PictureWidth輸入參數係用來判斷該 方塊的範® ’即左下方座標。最後,視訊目形可為漸進式掃晦 (progessive)或隔行掃瞄(interiace)。若為隔行掃瞄,其係 由兩個方向組成(上方與下方)。紋理濾波單元75〇使用 FieldFlag與TopFieldFlag以適當處理隔行掃瞄影像。 c.影像資料轉換 為執行絕對差值加總指令,視訊處理單元從紋理 濾波單元1050擷取輸入像素方塊並對這些方塊^行轉 換’轉換為-適當格式以利絕對差值加總加速:元 謂處理。像素方塊接著被提供至絕對差值加總加速 单兀960-謂’其回覆絕對差值加總值。各絕對差值加總 值接者被累積至目標暫存器。這些功能將於後詳述。 視訊處理單元H00接收定義計算該絕 = 3IClienfs DocketNo.:S3U06-0025 宁差值加、、、心值 TT s Docket N〇:0608_A41237-丁W/flnal/林璟輝/2〇〇7/〇6/15 31 200803527 之8x4方塊的兩個輸入參數。參考方塊的資料係直接由 SRC2運作碼直接定義· 8x4x8位元方塊視為128位元的資 料。相對地,SRC1運作碼定義預測方塊的位址而非資料。 視訊處理單元1100提供這些位址給紋理濾波單元lQ5〇, 其從紋理快取1060 _取128位元的預测方塊資料。A plurality of input parameters are used in combination to determine the 4x4 block address captured by the texture filtering unit 1〇5〇. The BaseAddress parameter indicates the starting point of the texture metric in the texture cache. The coordinates of the top left square in this area are given to BaseAddass. The PictureHeight and PictureWidth input parameters are used to determine the square of the box, ie the lower left coordinate. Finally, the video shape can be a progressive progessive or an intentce. For interlaced scanning, it consists of two directions (upper and lower). The texture filtering unit 75 uses FieldFlag and TopFieldFlag to properly process the interlaced scanned image. c. The image data is converted into an absolute difference summing instruction, and the video processing unit extracts the input pixel blocks from the texture filtering unit 1050 and converts the squares into a suitable format to facilitate the absolute difference plus acceleration: Said processing. The pixel block is then provided to the absolute difference summation acceleration 兀 960- ‘ ’ replies its absolute difference plus the total value. Each absolute difference plus total value is accumulated to the target register. These features will be detailed later. Video processing unit H00 receives the definition calculation. The absolute = 3 IClienfs DocketNo.: S3U06-0025 宁 difference plus,,, heart value TT s Docket N〇: 0608_A41237-丁 W/flnal/林璟辉/2〇〇7/〇6/15 31 200803527 Two input parameters for the 8x4 block. The data in the reference block is directly defined by the SRC2 operating code. The 8x4x8 bit box is treated as 128-bit data. In contrast, the SRC1 operational code defines the address of the prediction block rather than the data. The video processing unit 1100 provides these addresses to the texture filtering unit lQ5, which takes the prediction block data of the 128-bit from the texture cache 1060_.

I 儘管影像資料包含亮度(Y)與彩度(Cb,Cr)平面, 移動估測通常僅使用Y成分。因此,當執行絕對差值加總 指令時,視訊處理單元1100所運作的像素方塊僅含有γ 成分。在一實施例中,視訊處理單元11〇〇產生一禁止信 號’其指揮紋理濾波單元1050不要從紋理快取⑼擷取 Cr/Cb像素資料。 第11圖係紋理濾波單元1050與紋理快取1〇6〇的方塊 圖。紋理濾波單元1050係設計為從紋理快取⑼擷取紋 理影像邊界(texel boundry),並從紋理快取iQgQ下載 4x4紋理影像方塊至濾波輸入緩衝器111〇。當擷取資料代 表視訊處理單元1100時,紋理影像1120被視為各有32 位元的4個通道(ARGB)’對於128位元的紋理影像大小。 當為絕對差值加總指令擷取資料時,紋理濾波單元1〇5〇 下載8x4x8位元方塊,其儲存在2個像素輸入緩衝器 (1110A、1110B)。絕對差值加總指令所使用的8X4影像 方塊係如先前結合第9圖所述。 視訊處理單元1100所示用的影像資料可能被位元組 校正。然而,紋理濾波單元1050係被設計為從外取擷取紋 32Qient’s Docket N〇.:S3U06-0025 TT’s Docket No:0608-A41237-TW/fmal/林璟輝/2007/06/15 32 200803527 理影像邊界。因此,當為減處理單元n⑽擷取的資料 時,紋理濾波單元1 〇 5 〇可能需要擷取達4個環繞在一特定 位元組校正8x4方塊周圍的紋理影像校正知4方塊。 *亥私序可在第11目巾看到,其巾所擷取的方塊(目標 方塊1130)對手在紋理影像邊界上,不論在垂直歹向或在 水平方向。該目標方塊丨13G的u、ν位址定義8χ4χ8位元 的最左上角,位元組校正方塊。在此例中,紋理快取單元 # 1050操取紋理影像1140、115〇、116〇、117〇以得到目標 方塊1130。紋理濾波單元1〇5()接著結合從方塊li4〇_ii7〇 所按位元選擇的行與列,顧目標方塊⑽的最左邊4χ4 位元係寫入濾波緩衝器1110Β。熟悉此項技藝之人士應當 知道如何使用多工器、移位1(shifter)、遮罩位元(船吐 bits)達成該結果,不管從紋理快取1Q6()所擷取的4χ4 目標校正。 在第11圖所示之實施例,當目標方塊113G包含一垂 直紋理像素邊界,該資料不會垂直地重新排列。當此情形 發生日寸,下載至濾波緩衝器1 ΠΟΑ與Π10Β的資料在垂直 方向的順序與在快取中原本的順序不同。在此實施例中, 視訊處理班員11〇〇必須垂直地重新排列(旋轉)128位元 參考方塊資料以符合預測方塊的順序。在另一實施例中, 在寫入其中一濾波緩衝益1110之前,紋理濾波單元 垂直地重新排列快取紋理影像資料以符合原本的快取順 序0 33Client’s Docket N〇.:S3U06-0025 TT’s Docket No..0608-A41237-TW/fmal/林珠輝/2007/06/15 33 200803527 說明或流程圖中的方塊應被理解為表示模 。二ΓΓ程式碼,其包含用於實現特^邏輯電路功 ::=Γ驟之一個或多個可執行的指令。熟悉軟體 β者應當瞭解到,其他的實 露,。在其他的實現方法中,各鄉依= 揭i备之順序執行,包含f 涉之功能U上时進行或逆向進行,依所I Although the image data contains the luminance (Y) and chroma (Cb, Cr) planes, the motion estimation usually uses only the Y component. Therefore, when the absolute difference summing instruction is executed, the pixel block operated by the video processing unit 1100 contains only the gamma component. In one embodiment, video processing unit 11 generates a disable signal 'which directs texture filtering unit 1050 not to extract Cr/Cb pixel data from texture cache (9). Figure 11 is a block diagram of texture filtering unit 1050 and texture cache 1 〇 6 。. Texture filtering unit 1050 is designed to extract texture boundary (texel boundry) from texture cache (9) and download 4x4 texture image block from texture cache iQgQ to filter input buffer 111. When the captured data represents the video processing unit 1100, the texture image 1120 is treated as a texture image size of four channels (ARGB) each having 32 bits for 128 bits. When the data is fetched for the absolute difference summing instruction, the texture filtering unit 1〇5〇 downloads the 8x4x8 bit block, which is stored in the 2 pixel input buffers (1110A, 1110B). The 8X4 image block used for the absolute difference plus total command is as previously described in connection with Figure 9. The image data used by the video processing unit 1100 may be corrected by a byte. However, the texture filtering unit 1050 is designed to extract the pattern from the outside 32Qient's Docket N〇.: S3U06-0025 TT’s Docket No: 0608-A41237-TW/fmal/林璟辉/2007/06/15 32 200803527 Image boundary. Therefore, when subtracting the data retrieved by the processing unit n(10), the texture filtering unit 1 〇 5 〇 may need to capture up to 4 texture image correction 4 blocks around a particular byte correction 8x4 block. *Hai private order can be seen in the 11th towel, the squares captured by the towel (target block 1130) are on the boundary of the texture image, whether in the vertical direction or in the horizontal direction. The u and ν addresses of the target block 丨13G define the top left corner of the 8 χ 4 χ 8 bits, and the byte correction block. In this example, texture cache unit #1050 fetches texture images 1140, 115〇, 116〇, 117〇 to obtain target block 1130. The texture filtering unit 1〇5() then combines the rows and columns selected from the bits of the block li4〇_ii7〇, and the leftmost 4χ4 bits of the target block (10) are written to the filter buffer 1110Β. Those familiar with the art should know how to use the multiplexer, shifter, and mask bits to achieve this result, regardless of the 4χ4 target correction taken from texture cache 1Q6(). In the embodiment illustrated in Figure 11, when the target block 113G contains a vertical texel boundary, the material is not rearranged vertically. When this happens, the order of the data downloaded to the filter buffers 1 ΠΟΑ and Π 10 在 is different from the original order in the cache. In this embodiment, the video processing crew 11 must vertically rearrange (rotate) the 128-bit reference block data to match the order of the predicted blocks. In another embodiment, before writing to one of the filter buffers 1110, the texture filtering unit vertically rearranges the cached texture image data to conform to the original cache order. 0 33Client's Docket N〇.: S3U06-0025 TT's Docket No ..0608-A41237-TW/fmal/林珠辉/2007/06/15 33 200803527 The blocks in the description or the flowchart are to be understood as representations. A second program code that contains one or more executable instructions for implementing a special logic circuit work ::= step. Those who are familiar with software β should understand other implementations. In other implementation methods, the townships are executed in the order of the stipulations, and the functions involved in the f are performed or reversed.

規之系統與方法可以軟體、硬體或其結合實 财,㈣驗/或Μ係㈣在記憶體中 (包Α而不Ρ且由位於一計算裝置中之適當處理器所執行 = 於一微處理器、微控制器、網路處理器、可 重新策配處理界、、可,右_ σ 了擴域n在其他實施例中,該 。方法係以邏輯電路實現,包含而不限於-可程式 邏輯裝置(PLD ’ progr_able 1〇gic —)、可程式 邏輯閉陣列(PGA,㈣ra_ble gate array)、現場可 程式化邏輯閘陣列(騎,Held pr〇grammable gate array)或4寸定應用電路(ASIC)。在其他實施例中,這些 邏輯敛述係在—圖形處理器或圖形處理單元(GPU)完成。 在此揭路之系統與方法可被嵌入任何電腦可讀媒體而 使用,或連結—指令執行系統、設備、裝置。該指令執行 系統包含任何以電腦為基礎的系統、含有處理器的系統或 其他可以從該指令執行系統擷取與執行這些指令的系統。 所揭格之文子電細可讀媒體(c⑽puter-readable 34Client’s Docket N〇.:S3U06-0025 TT’s Docket No:0608-A41237-TW/fmal/林璟輝/2007/06/15 34 200803527 medium)”可為任何可以容納、儲存、溝通、傳遞或傳送該 程式作為使用或與該指令執行系統連結之工具。該電腦可 讀媒體可為,例如(非限制)為基於電子的、有磁性的、 光的、電磁的、紅外線的或半導體技術的一系統或傳遞媒 1 r 使用電子技術之電腦可讀媒體之特定範例(非限制) 可包含:具有一條或多條電性(電子)連接的線;一隨機存 取記憶體(RAM,random access memory);—唯讀記憶體 (ROM,read-only memory);—可拭去可程式化唯讀記憶 體(EPROM或快閃記憶體)。使用磁技術之電腦可讀媒體 之特定範例(非限制)可包含:可攜帶電腦磁碟。使用光 技術之電腦可讀媒體之特定範例(非限制)可包含:一光 纖與一可攜帶唯讀光碟(CD-ROM)。 雖然本發明在此以一個或更多個特定的範例作為實施 例闡明及描述,不過不應將本發明侷限於所示之細節,然 而仍可在不背離本發明的精神下且在申請專利範圍均等之 領域與範圍内實現許多不同的修改與結構上的改變。因 此,最好將所附上的申請專利範圍廣泛地且以符合本發明 領域之方法解釋,在隨後的申請專利範圍前提出此聲明。 【圖式簡單說明】 示範性運算 第1圖係用於圖形與視訊編碼及/或解碼之一 平台之方塊圖。 35Client,s Docket No.:S3U06-0025 TT’s Docket No:0608-A41237-TW/fmal/林環輝/2007/06/15 35 200803527 第2圖係第1圖之視訊編碼器16〇的功能方塊圖。 第3A、B圖說明將目前圖像分割成不重疊的區段的巨圖塊。 第4圖係第2圖之移動估測器所使用之演算法之一示範性 實施例之流程圖。 r 第5圖係第4圖共軛梯度步驟440之一實施例的流程圖。 第6圖說明使用第5圖之共軛梯度下降步驟44〇的示範狀 態。 第7圖係第4圖鄰近搜尋演算法之一實施例的流程圖。 第8A、B圖說明第7圖之鄰近搜尋演算法所使用的5個候 選巨圖塊的相對位置。 第9A、B圖係說明對參考與預测方塊進行絕對差值加總指 令運作的方塊圖。 第10圖係第1圖之圖形處理單元的資料流程圖。 第11圖係第10圖紋理濾波單元與紋理快取的方塊圖。 【主要元件符號說明】 100〜系統、110〜主處理器、120〜圖形處理器(GPU)、130 〜記憶體、140〜匯流排、150〜視訊加速單元(VPU)、160〜 36Clienfs Docket N〇.:S3U06-0025 TT’s Docket No:0608-A41237-TW/final/林璟輝/2007/06/15 36 200803527 辦 tr 权解碼态、17〇〜視訊加速驅動器。 3像、210〜減法器、220〜移動估測器、mo〜來考 圖像245〜移動向量、咖〜預測方塊、26〇〜剩餘圖像、挪 ,〜離散餘旋轉換器、〜量化器、29G〜熵解喝器、测 . 碼器。 , 1〇目萷巨圖塊、320〜巨圖塊、330〜搜尋窗、34〇〜點。 、400〜程序、〜判定移動向量將被圖像間預測或圖像内 =彳、42G〜施行共㈣度下降搜尋演算法、43G〜執行鄰近搜 才440〜執行一局部區域徹底搜尋、45〇〜建立最佳候選巨圖 塊與參考£圖塊間相符程度為二次表面、〜在—分數像素 邊界找到—最佳候選巨圖塊調準、470〜根據該相符巨圖塊計 算一分數移動向量。 • 505〜初始化一候選方塊、510〜計算候選巨圖塊^〃四周的 候選巨圖塊的座標、515〜分別計算5個候選巨圖塊的絕對差 值加總、52〇〜計算梯度&amp;與&amp;、525〜梯度是否低於一臨界 值、530〜計算四個新候選巨圖塊的座標、5邪〜對各候選巨圖 塊分別執行共軛梯度下降步驟440、540〜比較絕對差值加魄 值疋否低於一臨界值、545〜回傳有最低絕對差值加總值的候 選巨圖塊、550〜選擇一新的中央候選巨圖塊、555〜從梯度g 與A計算新的步驟值~與丨、560〜測試疊代迴圈數是否大於 一最大值、565〜回傳不相符。 37Client5s Docket N〇.:S3U06-0025 TT’s Docket No:0608-A41237-TW/fmal/林璟輝/2007/06/15 37 200803527 610C〜候選巨圖塊、61OL-61 OR-61OT-610B〜四個周圍候 選、620X-62GY〜初始候選計算梯度 、 630TL-630TR-630BL-630BR〜四個新的中央候選巨圖塊、 640L-640R-640T-640B〜候選、670-680〜候選 710〜利用目前巨圖塊位址的絕對值與每行巨圖塊數 计异一旗標變數T0PVALID。若此絕對值非〇,則了opvalid為 真,此外,T0PVALID為假 720〜旗標變數LEFTVALID係利用目前巨圖塊位址的除以 整數與母行巨圖塊數計异。若此除數非〇,則LEFTVALID為真, 此外,LEFTVALID為假。 730〜結合使用T0PVALID與LEFTVALID變數以判定目前巨 圖塊鄰近的4個候選巨圖塊的可得性。 _ 740〜為一先前候選巨圖塊p判定可得性。 750〜為每一可得候選巨圖塊計算絕對差值加總。 810-850〜候選巨圖塊。 910-940〜4x4方塊、950〜4x4參考方塊。 234〜旋轉邏輯、950〜預測方塊、960-990〜絕對差值加總 計算單元、 38Clienfs Docket N〇.:S3U06-0025 TT’s Docket No:0608-A41237-TW/iinal/林璟輝/2007/06/15 38 200803527 1010〜指令流處理器、1020〜指令、1030〜指令資料、1040 〜執行單元池、1050〜紋理濾波單元、1060〜紋理快取、1070 〜後包裝器、1100〜視訊處理單元。 1120〜紋理影像、113〇〜目標方塊、1140-1170〜紋理影像The system and method can be software, hardware or a combination of real money, (4) inspection / or system (4) in the memory (packaged and not executed by a suitable processor located in a computing device = a micro-processing , microcontroller, network processor, re-provisioning processing boundary, ok, right _ σ extended domain n in other embodiments, the method is implemented by a logic circuit, including but not limited to - programmable Logic device (PLD 'progr_able 1〇gic —), programmable logic closed array (PGA, (ra) ra_ble gate array), field programmable logic gate array (Held pr〇grammable gate array) or 4-inch application circuit (ASIC) In other embodiments, the logic is implemented by a graphics processor or a graphics processing unit (GPU). The system and method disclosed herein can be embedded in any computer readable medium, or a link-instruction. Execution system, device, device. The instruction execution system includes any computer-based system, a system containing a processor, or other system that can retrieve and execute these instructions from the instruction execution system. The text is fine-readable media (c(10)puter-readable 34Client's Docket N〇.:S3U06-0025 TT's Docket No:0608-A41237-TW/fmal/林璟辉/2007/06/15 34 200803527 medium)" can be any, Storing, communicating, transmitting or transmitting the program as a means of using or interfacing with the instruction execution system. The computer readable medium can be, for example, (non-limiting) electronically based, magnetic, optical, electromagnetic, infrared. Or a system or medium of semiconductor technology. A specific example of a computer readable medium using electronic technology (non-limiting) may include: a line having one or more electrical (electronic) connections; a random access memory (RAM, random access memory); - read-only memory (ROM, read-only memory); - wipeable programmable read-only memory (EPROM or flash memory). Computer-readable media using magnetic technology Specific examples (non-limiting) may include: portable computer disks. Specific examples of computer readable media using optical technology (non-limiting) may include: an optical fiber and a portable CD-ROM (CD-ROM) While the invention has been illustrated and described herein by way of the embodiments of the invention Many different modifications and structural changes are possible within the scope and scope of the scope of the invention. Therefore, it is preferred that the scope of the appended claims be broadly and construed in a manner consistent with the field of the invention, This statement. [Simple diagram of the diagram] Exemplary operation Figure 1 is a block diagram of one of the platforms for graphics and video coding and/or decoding. 35Client,s Docket No.:S3U06-0025 TT's Docket No:0608-A41237-TW/fmal/林环辉/2007/06/15 35 200803527 Figure 2 is a functional block diagram of the video encoder 16〇 of Figure 1. . Figures 3A and B illustrate giant tiles that divide the current image into segments that do not overlap. Figure 4 is a flow diagram of an exemplary embodiment of an algorithm used by the motion estimator of Figure 2. r Figure 5 is a flow diagram of an embodiment of a conjugate gradient step 440 of Figure 4. Figure 6 illustrates an exemplary state using the conjugate gradient descent step 44A of Figure 5. Figure 7 is a flow diagram of an embodiment of a proximity search algorithm of Figure 4. Figures 8A and B illustrate the relative positions of the five candidate giant tiles used in the proximity search algorithm of Figure 7. Figures 9A and B are block diagrams showing the operation of the absolute difference summing instructions for the reference and prediction blocks. Figure 10 is a data flow diagram of the graphics processing unit of Figure 1. Figure 11 is a block diagram of texture filtering unit and texture cache in Figure 10. [Main component symbol description] 100~ system, 110~ main processor, 120~ graphics processing unit (GPU), 130~memory, 140~bus, 150~video acceleration unit (VPU), 160~36Clienfs Docket N〇 .:S3U06-0025 TT's Docket No:0608-A41237-TW/final/林璟辉/2007/06/15 36 200803527 Do the tr decoding mode, 17〇~video acceleration driver. 3 image, 210~subtractor, 220~moving estimator, mo~to test image 245~moving vector, coffee~predicting block, 26〇~remaining image,moving,~discrete cosine converter,~quantizer , 29G ~ entropy solution, measuring. Code. , 1 〇 萷 giant block, 320 ~ giant block, 330 ~ search window, 34 〇 ~ point. , 400 ~ program, ~ determine the motion vector will be inter-image prediction or image within = 彳, 42G ~ implementation of the total (four) degree drop search algorithm, 43G ~ execution of the proximity search 440 ~ perform a partial area thorough search, 45 〇 ~ Establish the best candidate giant block and the reference £ block to match the secondary surface, ~ at the - fractional pixel boundary - the best candidate giant block alignment, 470 ~ calculate a fractional movement based on the matching giant block vector. • 505~ initialize a candidate block, 510~ calculate the coordinates of the candidate giant block around the candidate giant block, 515~ calculate the absolute difference sum of the five candidate giant blocks, respectively, 52〇~ calculate the gradient &amp; And &amp;, 525~ gradient is lower than a critical value, 530~ calculate coordinates of four new candidate giant block, 5 evil~ perform conjugate gradient descending steps 440, 540 to absolute difference of each candidate giant block respectively Value plus value 疋 No lower than a critical value, 545~Return the candidate giant block with the lowest absolute difference plus the total value, 550~Select a new central candidate giant block, 555~ Calculate from the gradient g and A The new step value ~ and 丨, 560 ~ test the number of iterations of the loop is greater than a maximum value, 565 ~ back pass does not match. 37Client5s Docket N〇.:S3U06-0025 TT's Docket No:0608-A41237-TW/fmal/林璟辉/2007/06/15 37 200803527 610C~Candidate giant block, 61OL-61 OR-61OT-610B~ four surrounding candidates 620X-62GY~ initial candidate calculation gradient, 630TL-630TR-630BL-630BR~ four new central candidate giant block blocks, 640L-640R-640T-640B~candidate, 670-680~candidate 710~ utilize current giant tile The absolute value of the address is different from the number of macroblocks per line by the flag variable T0PVALID. If the absolute value is not 〇, the opvalid is true. In addition, the T0PVALID is false. The 720~flag variable LEFTVALID is calculated by dividing the current giant tile address by the integer and the number of the parent macroblock. If the divisor is not 〇, LEFTVALID is true, and LEFTVALID is false. 730~ combines the T0PVALID and LEFTVALID variables to determine the availability of the four candidate giant tiles adjacent to the current giant tile. _ 740~ determines the availability of a previous candidate giant block p. 750~ Calculate the absolute difference sum for each available candidate block. 810-850~ Candidate giant block. 910-940~4x4 blocks, 950~4x4 reference blocks. 234~Rotating logic, 950~predictive block, 960-990~absolute difference plus total calculation unit, 38Clienfs Docket N〇.:S3U06-0025 TT's Docket No:0608-A41237-TW/iinal/林璟辉/2007/06/15 38 200803527 1010~ instruction stream processor, 1020~ instruction, 1030~ instruction data, 1040~ execution unit pool, 1050~ texture filtering unit, 1060~ texture cache, 1070~ post wrapper, 1100~ video processing unit. 1120 ~ texture image, 113 〇 ~ target square, 1140-1170 ~ texture image

1 I 、1Π0Α-Β〜緩衝器。: ;1 I , 1Π0Α-Β~ buffer. : ;

39Clienfs Docket No.:S3U06-0025 TT’s Docket No:0608-A41237-TW/fmal/林璟輝/2007/06/15 3939Clienfs Docket No.:S3U06-0025 TT’s Docket No:0608-A41237-TW/fmal/林璟辉/2007/06/15 39

Claims (1)

200803527 &lt; 十、申請專利範園: 量對&quot;考方塊之移動&quot;動向 執订局邛區域徹底搜尋以產生 :::::部區域徹底搜尋在以該較佳相=: 將^ 區域’該最佳相符具有整數像素解析度; 相符與該參考方塊間相符的程度建模為一 —久表面; 分析地判定該二縣_—最小值, 分數解析度的-最佳相符方塊;以及 …〜有 動向=該有分數解析度的最佳相符方塊計算一分數移 2.如申請專利範圍第〗項之判定描述相對於該來 的該移動向量的方法,其中該判定複數個預測方 中哪:個與該參考方塊有一較佳相符之步驟更包含: 判定-目前圖框是否為圖像内預測,·以及 ^目錢框為圖像_測,則湘—共 技哥判定該複數個預測方塊中哪一個為該較佳相符。下~ 3=申請專觀㈣丨項之判定描述相對於該參寺方 換私動的該移動向量的方法,其中該判定複數個預測方 個與該參考方塊有—較佳相符之步驟更包含.、 Client s Docket N〇.:S3U06-0025 又匕 3 · °c et No-〇608-A41237-TW/fmal/#if^/2〇〇7/〇6/15 40 200803527 目前圖框是否為圖像間預測;以及 若x目勒圖框為圖像間預測,則搜尋_夫&amp; + :鄰近方塊•該複數個預測二 =: 4. 如 方塊之移二:=:=,於該參考 /足該較麵符預财塊對祕位置的-組4個方塊, 執行該局部區域徹底搜尋。 ▲ 5.如申請專利範圍第1項之判定描述相對於該參考 =塊之移動_移動向量的方法,其中該分析地判定該二 -人表面的該最小值之步驟更包含: 判定一第一方向之該最小值; 判疋垂直於該第一方向之一第二方向之該最小值。 6·如申請專利範圍第〗項之判定描述相對於該參考 方塊之移自的該移動向量的方法,&amp;巾分析地判定該二次 表面的一最小值之步驟更包含: 計算該最佳相符預測方塊之鄰近方塊之一絕對差值加 總值。 .如申請專利範圍第1項之判定描述相對於該參考 方塊之移動的該移動向量的方法,其中分析地判定該二次 表面的一最小值之步驟更包含: 计异複數個方塊之一絕對差值加總值,該複數個方塊 之中第一個在一第一方向鄰近於該最佳相符預測方塊, 41Clienfs Docket N〇.:S3U06-0025 TT’SD〇cketN0:_8-A41237-TW/fmal_f4/2〇〇7/〇6/15 4 200803527 1 該複數個方塊剩下的方塊分別鄰近於該複數個方塊中之另 一個。 8. 如申請專利範圍第1項之判定描述相對於該參考方 塊之移動的該移動向量的方法,其中分析地判定該二次表 面的一录小值之步驟更包含: ^ 計算複數個方塊之一絕對差值加總值:,且其中該計算 複數個方塊之該絕對差值加總值係利用由一圖形處理單元 ϋ 所執行的一絕對差值加總指令所執行。 9. 一種判定描述相對於一參考方塊之移動的一移動向 量的方法,該方法包含: 根據一相符標準,判定複數個預測方塊中哪一個與該 參考方塊有一較佳相符; 執行一局部區域徹底搜尋以產生與該參考方塊的一最 佳相符,該局部區域徹底搜尋在以該較佳相符預測方塊為 中央之周圍之一區域,該最佳相符具有整數像素解析度; 以及 分析地判定建模該最佳相符與該參考方塊間相符的 程度的一二次表面的一最小值,該最小值對應有分數解析 度的一最佳相符方塊。 10.如申請專利範圍第9項之判定描述相對於該參考 方塊之移動的該移動向量的方法,其中該判定複數個預測 方塊中哪一個與該參考方塊有一較佳相符之步驟更包含: 若該目前圖框係圖像内預测,利用一共軛梯度下降搜 42Client’s Docket N〇.:S3U06-0025 TT,s Docket No:0608-A41237-TW/fmal/林璟輝/2007/06/15 42 200803527 尋法,搜尋該複數個預測方塊以判定該較佳相符。 11. 如申請專利範圍第1,0項之判定描述相對於讓參 考方塊之移動的該移動向量的方法,其中該搜尋該複數個 預測方塊以判定該較佳相符之步驟更包含: 從該複數個預測方塊選擇,一候選方塊; 計算在該候選方塊一固定距離之左的一第一方塊的 一第一絕對差值加總值與在該候選方塊該固定距離之右的 一第二方塊的一第二絕對差值加總值之間的一水平梯度; 計算在該候選方塊該固定距離之上的一第三方塊的 一第三絕對差值加總值與在該候選方塊該固定距離之下的 一第四方塊的一第四絕對差值加總值之間的一垂直梯度; 若該水平與垂直梯度係低於一梯度臨界值,根據該水 平與垂直梯度,調整該固定距離; 判斷位於離該第一、第二、第三、第四方塊中具有最 低絕對差值加總值的方塊該調整後之固定距離之複數個新 候選方塊;以及 對各該複數個新候選方塊重複該選擇一候選方塊步 驟之後的步驟。 12. 如申請專利範圍第11項之判定描述相對於該參 考方塊之移動的該移動向量的方法,更包含: 若該水平與垂直梯度係大於或等於該梯度臨界值,則 將該第一、第二、第三、第四絕對差值加總值與一絕對差 值加總臨界作比較; 43Client’s Docket N〇.:S3U06-0025 TT’s Docket No:0608-A41237-TW/fmal/林璟輝/2007/06/15 43 200803527 右忒第一、第二、第三、第四絕對差值加總值中任一 個值低於该絕對差值加總臨界,則將該第一、第二、第二、 第四方塊巾具有該最低差值加總值的方塊判斷該較佳相 符0 ^ 13.-種具有用來判斷一移動向量之―程式之電腦可 項媒體’該程式含有設置成絲執行下列步驟的邏輯: &amp;根據-相符標準,判定複數個預測方塊中哪一個與一200803527 &lt;10. Applying for the patent garden: The quantity is &quot;The movement of the test box&quot; The direction of the website is thoroughly searched to generate the ::::: area to thoroughly search for the better phase =: will ^ area 'The best match has an integer pixel resolution; the degree of conformity with the reference block is modeled as a long-term surface; the two counties _-minimum, the fractional resolution-best match block are determined analytically; ...~有向向=The best matching block with fractional resolution calculates a fractional shift. 2. The method of determining the relative motion of the motion vector according to the claim of the patent scope, wherein the decision is made in a plurality of predictors Which: a step that corresponds to the reference block preferably includes: determining whether the current frame is an intra-image prediction, and if the frame of the money is an image_test, then the co-technical brother determines the plurality of Which of the prediction blocks is the best match. Down~3=Apply the subjective (four) item's decision to describe the method of changing the mobile vector relative to the temple, wherein the step of determining the plurality of predictors and the reference block preferably includes . Client s Docket N〇.:S3U06-0025 匕3 · °c et No-〇608-A41237-TW/fmal/#if^/2〇〇7/〇6/15 40 200803527 Is the current frame Inter-image prediction; and if the x-ray frame is inter-image prediction, then search for _fu &amp; +: adjacent blocks • the plural predictions two =: 4. If the square shifts two: =:=, Refer to / for the face-to-face pre-blocks for the secret position - group 4 blocks, perform a partial search for this partial area. ▲ 5. The method of claim 1, wherein the step of determining the minimum value of the two-person surface comprises: determining a first The minimum value of the direction; the minimum value of the second direction perpendicular to the first direction. 6. If the determination of the scope of the patent application section describes the method of moving the movement vector relative to the reference block, the step of analyzing the minimum value of the secondary surface by the &amp; towel comprises: calculating the optimum Matches one of the adjacent blocks of the prediction block to the absolute difference plus the total value. The method of claim 1, wherein the step of determining the minimum value of the secondary surface relative to the movement of the reference block further comprises: counting one of the plurality of squares The sum of the difference values, the first of the plurality of blocks being adjacent to the best matching prediction block in a first direction, 41Clienfs Docket N〇.: S3U06-0025 TT'SD〇cketN0: _8-A41237-TW/ Fmal_f4/2〇〇7/〇6/15 4 200803527 1 The remaining squares of the plurality of squares are respectively adjacent to the other of the plurality of squares. 8. The method of claim 1, wherein the step of determining the recorded small value of the secondary surface comprises: ^ calculating a plurality of squares. An absolute difference plus a total value: and wherein the absolute difference plus total value of the plurality of calculated blocks is performed using an absolute difference summing instruction executed by a graphics processing unit 。. 9. A method of determining a motion vector describing movement relative to a reference block, the method comprising: determining, according to a matching criterion, which of the plurality of prediction blocks has a better match with the reference block; performing a partial region thoroughly Searching to produce a best match with the reference block, the local area is thoroughly searched for a region around the center of the preferred coincident prediction block, the best match having integer pixel resolution; and analytically determining the modeling The best match corresponds to a minimum of a quadratic surface of the degree of coincidence between the reference blocks, the minimum corresponding to a best matching block of fractional resolution. 10. The method of claim 9, wherein the step of determining the movement vector relative to the movement of the reference block, wherein the step of determining which of the plurality of prediction blocks has a better match with the reference block comprises: The current frame is an intra-image prediction, using a conjugate gradient to search 42Client's Docket N〇.:S3U06-0025 TT,s Docket No:0608-A41237-TW/fmal/林璟辉/2007/06/15 42 200803527 Method, searching for the plurality of prediction blocks to determine the preferred match. 11. The method of claim 1, wherein the determination of the movement vector relative to the movement of the reference block, wherein the step of searching for the plurality of prediction blocks to determine the preferred match further comprises: a prediction block selection, a candidate block; calculating a first absolute difference sum value of a first block to the left of a fixed distance of the candidate block and a second block to the right of the fixed distance of the candidate block a second absolute difference plus a horizontal gradient between the total values; calculating a third absolute difference sum value of a third-party block above the fixed distance of the candidate block and the fixed distance at the candidate block a fourth vertical difference between a fourth square and a vertical gradient between the total values; if the horizontal and vertical gradients are lower than a gradient threshold, the fixed distance is adjusted according to the horizontal and vertical gradients; a plurality of new candidate blocks located at a fixed distance from the block having the lowest absolute difference plus the total value in the first, second, third, and fourth blocks; and a plurality of new candidate blocks The new candidate block repeats the steps following the selection of a candidate block step. 12. The method of claim 11, wherein the method for describing the movement vector relative to the movement of the reference block further comprises: if the horizontal and vertical gradient system is greater than or equal to the gradient threshold, then the first The second, third, and fourth absolute difference plus total values are compared with an absolute difference plus the total threshold; 43Client's Docket N〇.: S3U06-0025 TT's Docket No: 0608-A41237-TW/fmal/林璟辉/2007/ 06/15 43 200803527 Any one of the first, second, third, and fourth absolute difference plus total values of the right 低于 is lower than the absolute difference plus the total threshold, then the first, second, second, The fourth block has the lowest difference plus the total value of the block to determine that the preferred match is 0 ^ 13. - a computer-readable medium having a program for determining a motion vector. The program contains the set wire to perform the following steps. Logic: &amp; Determine which of a plurality of prediction blocks and one according to the - match criterion 參考方塊有一較佳相符; 田土執行一局部區域徹底搜尋以產生與該參考方塊的一 取佳相符’該局耗域絲搜尋在㈣較佳相符預測方塊 為中央之朋之-㈣,域灿符具㈣數像素解析度1 將該最佳相符與該參考方塊間相符的程度建模 —次表面; 、、 刀析地判定該二次表面的一最小值 八教鈉t 取值该取小值對應有 刀數角午析度的一最佳相符方塊;以及 根據該有分數解析度的最佳相符方塊計算 動向量。 〇 旦 申请專利範圍第13項之具有用來判斷該移動向 ^之該程紅電腦可讀,其中該判定複數個預測方塊 中哪-個與該參考方塊有—較佳相符之步驟更包含: 判定-目前圖框為圖像間預測或圖像内紙 若刻前圖框為圖像内預測’則利用一共軛梯度下降 叟哥判定該複數個預财塊中哪—個為該較佳相符; 44Client5s Docket No.:S3U06-0025 ’ TT s Docket No.0608-A41237-TW/fmal/林環輝/2007/06/15 44 200803527 輕 參 若°亥目觔圖框為圖像間預測,則搜尋今夫# 的鄰近方塊_ 扪技了 °亥苓考方塊周圍 符。 乂心该猶預測方塊中哪—個為該較佳相 量^ 电編可f買媒體,更包含: ^疋目珂圖框··是否為圖像間預測; : 右4目别^框相姻制,雜尋轉考方境月圍 該複數個_方塊…=: 心:=:預二?局圍,該較佳相符預測方 周圍之-區域執行該局部區域徹底搜尋。 向旦1 么如申凊專利範圍第13項之具有用來判斷該移動 σ里之该程式之電腦可讀媒體,更包含: 判定一目前圖框是否為圖像内預測; 她前圖框為圖像内預測,則利用—共輛梯 技哥判定該複數個預測方塊中哪一個為該較佳相符以及 /該較佳相符_方塊㈣地位置的-組4個方塊, 執行5亥局部區域徹底搜尋。 π· ”請專利範圍第13項之具有用來判斷該移動 向!之該程式之電腦可讀媒體,其中該分析地判定該二次 表面的該最小值之步驟更包含·· 人 判定一第一方向之該最小值; 45Client5s Docket N〇.:S3U06-0025 TT’s Docket No:0608-A41237-TW/fmal/林璟輝/2007/06/15 判定垂直於該第一方向之—第二方向之該最小值。 45 200803527 18. 如申請專利範圍第13項之具有用來判斷一移動 向量之一程式之電腦可讀媒體,其中分析地判定該二次表 面的一最小值之步驟更包含: 計算該最佳相符預測方塊之鄰近方塊之一絕對差值加 r I 總值。 : : 19. 如申請專利範圍第13項之具有用來判斷該移動 向量之該程式之電腦可讀媒體,其中分析地判定該二次表 面的一最小值之步驟更包含: 計算複數個方塊之一絕對差值加總值,該複數個方塊 之中一第一個在該第一方向鄰近於該最佳相符預測方塊, 該複數個方塊剩下的方塊分別鄰近於該複數個方塊中之另 一個〇The reference block has a better match; the field performs a partial search of the local area to produce a better match with the reference block. The local consumption domain search is in the (four) preferred matching prediction block as the central friend - (4), the domain can be used (4) Digital pixel resolution 1 Modeling the degree of matching between the best match and the reference block - the subsurface; ,, determining the minimum value of the secondary surface by the knife to determine the value of the singular t a best matching block having a knife angle resolution; and calculating a motion vector based on the best matching block having the fractional resolution. The method of claim 13 of the patent application scope is used to determine that the movement is readable by the computer, wherein the step of determining which of the plurality of prediction blocks has a preferred alignment with the reference block further comprises: Judging - the current frame is inter-image prediction or if the image in the image is in-image prediction, then a conjugate gradient is used to determine which of the plurality of pre-funds is the best match. 44Client5s Docket No.:S3U06-0025 ' TT s Docket No.0608-A41237-TW/fmal/林环辉/2007/06/15 44 200803527 轻参若°海目筋框框为 inter-image prediction, then Search for the neighboring box of 今夫# _ 扪 了 ° ° ° 苓 苓 。 。 。 。 。 。乂 该 该 犹 犹 犹 预测 预测 预测 预测 预测 预测 预测 预测 预测 预测 预测 预测 预测 预测 预测 预测 预测 预测 预测 预测 预测 预测 预测 预测 预测 预测 预测 预测 预测 预测 预测 预测 预测 预测 预测 预测 预测 预测 预测 预测 预测 预测 预测 预测Marriage system, miscellaneous search for the test of the surrounding area of the plural _ square ... =: heart: =: pre-two? Bureau, the better match around the prediction side - the region performs the local area thorough search. To the computer readable medium having the program for determining the movement σ in the 13th item of the patent scope, the method further comprises: determining whether the current frame is an intra-image prediction; In-image prediction, using a plurality of prediction blocks to determine which one of the plurality of prediction blocks is the preferred match and/or the preferred match_block (four) position - the group of 4 blocks, performing a partial area of 5 Search thoroughly. π· ”Please refer to the computer readable medium of claim 13 for determining the movement direction of the program, wherein the step of determining the minimum value of the secondary surface by the analysis further includes: The minimum value in one direction; 45Client5s Docket N〇.:S3U06-0025 TT's Docket No:0608-A41237-TW/fmal/林璟辉/2007/06/15 The determination is perpendicular to the first direction—the minimum of the second direction 45. The method of claim 13 is the computer readable medium having a program for determining a motion vector, wherein the step of determining a minimum value of the secondary surface analytically comprises: calculating the most The absolute difference between one of the adjacent blocks of the prediction block plus the total value of r I. : : 19. The computer readable medium having the program for determining the motion vector according to claim 13 of the patent application, wherein the determination is analytically determined The step of the minimum value of the secondary surface further comprises: calculating an absolute difference plus a total value of the plurality of blocks, wherein a first one of the plurality of blocks is adjacent to the best matching predictor in the first direction Block, the remaining squares of the plurality of squares are respectively adjacent to the other one of the plurality of squares 46Clienfs Docket N〇,:S3U06-0025 TT’s Docket No:0608-A41237-TW/fmal/林璟輝/2007/06/1546Clienfs Docket N〇,:S3U06-0025 TT’s Docket No:0608-A41237-TW/fmal/林璟辉/2007/06/15
TW096122000A 2006-06-16 2007-06-15 Method for determining a motion vector describing motion realtive to a reference block and storage media thereof TWI350109B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US81462306P 2006-06-16 2006-06-16

Publications (2)

Publication Number Publication Date
TW200803527A true TW200803527A (en) 2008-01-01
TWI350109B TWI350109B (en) 2011-10-01

Family

ID=38880763

Family Applications (6)

Application Number Title Priority Date Filing Date
TW096120098A TWI444047B (en) 2006-06-16 2007-06-05 Deblockings filter for video decoding , video decoders and graphic processing units
TW096122002A TWI383683B (en) 2006-06-16 2007-06-15 Video encoders and graphic processing units
TW096122009A TWI348654B (en) 2006-06-16 2007-06-15 Graphic processing unit and method for computing sum of absolute difference of marcoblocks
TW096121865A TWI395488B (en) 2006-06-16 2007-06-15 Vpu with programmable core
TW096121890A TWI482117B (en) 2006-06-16 2007-06-15 Filtering for vpu
TW096122000A TWI350109B (en) 2006-06-16 2007-06-15 Method for determining a motion vector describing motion realtive to a reference block and storage media thereof

Family Applications Before (5)

Application Number Title Priority Date Filing Date
TW096120098A TWI444047B (en) 2006-06-16 2007-06-05 Deblockings filter for video decoding , video decoders and graphic processing units
TW096122002A TWI383683B (en) 2006-06-16 2007-06-15 Video encoders and graphic processing units
TW096122009A TWI348654B (en) 2006-06-16 2007-06-15 Graphic processing unit and method for computing sum of absolute difference of marcoblocks
TW096121865A TWI395488B (en) 2006-06-16 2007-06-15 Vpu with programmable core
TW096121890A TWI482117B (en) 2006-06-16 2007-06-15 Filtering for vpu

Country Status (2)

Country Link
CN (6) CN101072351B (en)
TW (6) TWI444047B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8705622B2 (en) 2008-04-10 2014-04-22 Qualcomm Incorporated Interpolation filter support for sub-pixel resolution in video coding
US9077971B2 (en) 2008-04-10 2015-07-07 Qualcomm Incorporated Interpolation-like filtering of integer-pixel positions in video coding
US9451260B2 (en) 2011-06-28 2016-09-20 Samsung Electronics Co., Ltd. Method and apparatus for coding video and method and apparatus for decoding video, using intra prediction
US10440388B2 (en) 2008-04-10 2019-10-08 Qualcomm Incorporated Rate-distortion defined interpolation for video coding based on fixed filter or adaptive filter

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2359590A4 (en) * 2008-12-15 2014-09-17 Ericsson Telefon Ab L M Method and apparatus for avoiding quality deterioration of transmitted media content
CN101901588B (en) * 2009-05-31 2012-07-04 比亚迪股份有限公司 Method for smoothly displaying image of embedded system
CN102164284A (en) * 2010-02-24 2011-08-24 富士通株式会社 Video decoding method and system
US8295619B2 (en) * 2010-04-05 2012-10-23 Mediatek Inc. Image processing apparatus employed in overdrive application for compressing image data of second frame according to first frame preceding second frame and related image processing method thereof
TWI395490B (en) * 2010-05-10 2013-05-01 Univ Nat Central Electrical-device-implemented video coding method
US8681162B2 (en) * 2010-10-15 2014-03-25 Via Technologies, Inc. Systems and methods for video processing
EP2661879B1 (en) 2011-01-03 2019-07-10 HFI Innovation Inc. Method of filter-unit based in-loop filtering
CN106162186B (en) * 2011-01-03 2020-06-23 寰发股份有限公司 Loop filtering method based on filtering unit
KR101567467B1 (en) * 2011-05-10 2015-11-09 미디어텍 인크. Method and apparatus for reduction of in-loop filter buffer
TWI612802B (en) * 2012-03-30 2018-01-21 Jvc Kenwood Corp Image decoding device, image decoding method
US9953455B2 (en) 2013-03-13 2018-04-24 Nvidia Corporation Handling post-Z coverage data in raster operations
US10154265B2 (en) 2013-06-21 2018-12-11 Nvidia Corporation Graphics server and method for streaming rendered content via a remote graphics processing service
CN105872553B (en) * 2016-04-28 2018-08-28 中山大学 A kind of adaptive loop filter method based on parallel computation
US20180174359A1 (en) * 2016-12-15 2018-06-21 Mediatek Inc. Frame difference generation hardware in a graphics system
CN111028133B (en) * 2019-11-21 2023-06-13 中国航空工业集团公司西安航空计算技术研究所 Graphic command pre-decoding device based on SystemVerilog

Family Cites Families (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3578498B2 (en) * 1994-12-02 2004-10-20 株式会社ソニー・コンピュータエンタテインメント Image information processing device
US5627657A (en) * 1995-02-28 1997-05-06 Daewoo Electronics Co., Ltd. Method for sequentially displaying information recorded on interactive information recording medium
US6064450A (en) * 1995-12-06 2000-05-16 Thomson Licensing S.A. Digital video preprocessor horizontal and vertical filters
JP3876392B2 (en) * 1996-04-26 2007-01-31 富士通株式会社 Motion vector search method
JPH10145753A (en) * 1996-11-15 1998-05-29 Sony Corp Receiver and its method
US6496537B1 (en) * 1996-12-18 2002-12-17 Thomson Licensing S.A. Video decoder with interleaved data processing
US6177922B1 (en) * 1997-04-15 2001-01-23 Genesis Microship, Inc. Multi-scan video timing generator for format conversion
JP3870491B2 (en) * 1997-07-02 2007-01-17 松下電器産業株式会社 Inter-image correspondence detection method and apparatus
US6487249B2 (en) * 1998-10-09 2002-11-26 Matsushita Electric Industrial Co., Ltd. Efficient down conversion system for 2:1 decimation
US6573905B1 (en) * 1999-11-09 2003-06-03 Broadcom Corporation Video and graphics system with parallel processing of graphics windows
JP3757116B2 (en) * 1998-12-11 2006-03-22 松下電器産業株式会社 Deblocking filter calculation device and deblocking filter calculation method
CN1112714C (en) * 1998-12-31 2003-06-25 上海永新彩色显象管有限公司 Kinescope screen washing equipment and method
CN1132432C (en) * 1999-03-23 2003-12-24 三洋电机株式会社 video decoder
KR100677082B1 (en) * 2000-01-27 2007-02-01 삼성전자주식회사 Motion estimator
JP4461562B2 (en) * 2000-04-04 2010-05-12 ソニー株式会社 Playback apparatus and method, and signal processing apparatus and method
US6717988B2 (en) * 2001-01-11 2004-04-06 Koninklijke Philips Electronics N.V. Scalable MPEG-2 decoder
US7940844B2 (en) * 2002-06-18 2011-05-10 Qualcomm Incorporated Video encoding and decoding techniques
CN1332560C (en) * 2002-07-22 2007-08-15 上海芯华微电子有限公司 Method based on difference between block bundaries and quantizing factor for removing block effect without additional frame memory
US6944224B2 (en) * 2002-08-14 2005-09-13 Intervideo, Inc. Systems and methods for selecting a macroblock mode in a video encoder
US7336720B2 (en) * 2002-09-27 2008-02-26 Vanguard Software Solutions, Inc. Real-time video coding/decoding
US7027515B2 (en) * 2002-10-15 2006-04-11 Red Rock Semiconductor Ltd. Sum-of-absolute-difference checking of macroblock borders for error detection in a corrupted MPEG-4 bitstream
FR2849331A1 (en) * 2002-12-20 2004-06-25 St Microelectronics Sa METHOD AND DEVICE FOR DECODING AND DISPLAYING ACCELERATED ON THE ACCELERATED FRONT OF MPEG IMAGES, VIDEO PILOT CIRCUIT AND DECODER BOX INCORPORATING SUCH A DEVICE
US6922492B2 (en) * 2002-12-27 2005-07-26 Motorola, Inc. Video deblocking method and apparatus
CN100424717C (en) * 2003-03-17 2008-10-08 高通股份有限公司 Method and apparatus for improving video quality of low bit-rate video
US7660352B2 (en) * 2003-04-04 2010-02-09 Sony Corporation Apparatus and method of parallel processing an MPEG-4 data stream
US7274824B2 (en) * 2003-04-10 2007-09-25 Faraday Technology Corp. Method and apparatus to reduce the system load of motion estimation for DSP
NO319007B1 (en) * 2003-05-22 2005-06-06 Tandberg Telecom As Video compression method and apparatus
US20050013494A1 (en) * 2003-07-18 2005-01-20 Microsoft Corporation In-loop deblocking filter
US7650032B2 (en) * 2003-08-19 2010-01-19 Panasonic Corporation Method for encoding moving image and method for decoding moving image
US20050105621A1 (en) * 2003-11-04 2005-05-19 Ju Chi-Cheng Apparatus capable of performing both block-matching motion compensation and global motion compensation and method thereof
US7292283B2 (en) * 2003-12-23 2007-11-06 Genesis Microchip Inc. Apparatus and method for performing sub-pixel vector estimations using quadratic approximations
CN1233171C (en) * 2004-01-16 2005-12-21 北京工业大学 A simplified loop filtering method for video coding
US20050262276A1 (en) * 2004-05-13 2005-11-24 Ittiam Systamc (P) Ltd. Design method for implementing high memory algorithm on low internal memory processor using a direct memory access (DMA) engine
NO20042477A (en) * 2004-06-14 2005-10-17 Tandberg Telecom As Chroma de-blocking procedure
US20060002479A1 (en) * 2004-06-22 2006-01-05 Fernandes Felix C A Decoder for H.264/AVC video
US8116379B2 (en) * 2004-10-08 2012-02-14 Stmicroelectronics, Inc. Method and apparatus for parallel processing of in-loop deblocking filter for H.264 video compression standard
NO322722B1 (en) * 2004-10-13 2006-12-04 Tandberg Telecom As Video encoding method by reducing block artifacts
CN1750660A (en) * 2005-09-29 2006-03-22 威盛电子股份有限公司 Method for calculating moving vector

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8705622B2 (en) 2008-04-10 2014-04-22 Qualcomm Incorporated Interpolation filter support for sub-pixel resolution in video coding
US9077971B2 (en) 2008-04-10 2015-07-07 Qualcomm Incorporated Interpolation-like filtering of integer-pixel positions in video coding
US10440388B2 (en) 2008-04-10 2019-10-08 Qualcomm Incorporated Rate-distortion defined interpolation for video coding based on fixed filter or adaptive filter
US11683519B2 (en) 2008-04-10 2023-06-20 Qualcomm Incorporated Rate-distortion defined interpolation for video coding based on fixed filter or adaptive filter
US9451260B2 (en) 2011-06-28 2016-09-20 Samsung Electronics Co., Ltd. Method and apparatus for coding video and method and apparatus for decoding video, using intra prediction
US9473776B2 (en) 2011-06-28 2016-10-18 Samsung Electronics Co., Ltd. Method and apparatus for coding video and method and apparatus for decoding video, using intra prediction
US9479783B2 (en) 2011-06-28 2016-10-25 Samsung Electronics Co., Ltd. Method and apparatus for coding video and method and apparatus for decoding video, using intra prediction
US9485510B2 (en) 2011-06-28 2016-11-01 Samsung Electronics Co., Ltd. Method and apparatus for coding video and method and apparatus for decoding video, using intra prediction
TWI558169B (en) * 2011-06-28 2016-11-11 三星電子股份有限公司 Method and apparatus for encoding and decoding video by using intra prediction
US9503727B2 (en) 2011-06-28 2016-11-22 Samsung Electronics Co., Ltd. Method and apparatus for coding video and method apparatus for decoding video, accompanied with intra prediction

Also Published As

Publication number Publication date
CN101068353A (en) 2007-11-07
TW200821986A (en) 2008-05-16
TWI482117B (en) 2015-04-21
CN101083763A (en) 2007-12-05
TW200816082A (en) 2008-04-01
TW200816820A (en) 2008-04-01
CN101068353B (en) 2010-08-25
TW200803525A (en) 2008-01-01
CN101068365A (en) 2007-11-07
TWI348654B (en) 2011-09-11
CN101072351B (en) 2012-11-21
CN101068364B (en) 2010-12-01
CN101072351A (en) 2007-11-14
TWI383683B (en) 2013-01-21
TWI350109B (en) 2011-10-01
TWI444047B (en) 2014-07-01
CN101083764A (en) 2007-12-05
CN101083764B (en) 2014-04-02
CN101083763B (en) 2012-02-08
TWI395488B (en) 2013-05-01
CN101068364A (en) 2007-11-07
TW200803528A (en) 2008-01-01
CN101068365B (en) 2010-08-25

Similar Documents

Publication Publication Date Title
TW200803527A (en) Method for determining a motion vector describing motion realtive to a reference block and storage media thereof
KR100938964B1 (en) System and method for compression of 3D computer graphics
TWI650996B (en) Video encoding or decoding method and device
CN107113414B (en) The coding of 360 degree of videos of using area adaptive smooth
US8275049B2 (en) Systems and methods of improved motion estimation using a graphics processing unit
US8913664B2 (en) Three-dimensional motion mapping for cloud gaming
US9319708B2 (en) Systems and methods of improved motion estimation using a graphics processing unit
JP5642859B2 (en) Method and apparatus for encoding motion information
US20220360780A1 (en) Video coding method and apparatus
JP5491517B2 (en) Method and apparatus for providing a video representation of a three-dimensional computer generated virtual environment
JP6267287B2 (en) Moving picture coding apparatus, moving picture decoding apparatus, moving picture coding method, moving picture coding method, and program
TW201939951A (en) Method and apparatus of loop filtering for VR360 videos
CN109905702A (en) The method, apparatus and storage medium that reference information determines in a kind of Video coding
TWI652934B (en) Method and apparatus for adaptive video decoding
CN109672896A (en) Utilize the method for video coding and device of depth information
CN111246212B (en) Geometric partitioning mode prediction method and device based on encoding and decoding end, storage medium and terminal
CN106165425B (en) Utilize the method for video coding and device of depth information
RU2732989C2 (en) Method, device and system for generating a video signal
JP2024513815A (en) Method, apparatus and computer program for estimating Manhattan layout associated with a scene
US9996949B2 (en) System and method of presenting views of a virtual space
CN114666606A (en) Affine motion estimation method, device, storage medium and terminal
JP6080726B2 (en) Moving picture encoding apparatus, intra prediction mode determination method, and program
WO2013035452A1 (en) Image encoding method, image decoding method, and apparatuses and programs thereof
JP2013153336A (en) Image encoding method, image decoding method, image encoding device, image encoding program, image decoding device, and image decoding program
JP2013110643A (en) Image coding method, image coding device, image decoding method, image decoding device, and program therefor