TW200803525A

TW200803525A - Deblocking filter for video decoding, video decoders and graphic processing units

Info

Publication number: TW200803525A
Application number: TW096120098A
Authority: TW
Inventors: Zahid Hussain
Original assignee: Via Tech Inc
Priority date: 2006-06-16
Filing date: 2007-06-05
Publication date: 2008-01-01
Also published as: CN101068353A; TW200821986A; TWI482117B; CN101083763A; TW200816082A; TW200816820A; CN101068353B; CN101068365A; TWI348654B; TW200803527A; CN101072351B; CN101068364B; CN101072351A; TWI383683B; TWI350109B; TWI444047B; CN101083764A; CN101083764B; CN101083763B; TWI395488B

Abstract

An exemplary video decoder comprises: an entropy decoder; a spatial decoder; combining logic; and an inloop deblocking filter. The entropy decoder receives an incoming coded bit stream. The spatial decoder receives the output of the entropy decoder and produces a decoded picture comprising a plurality of pixels. The combining logic combines a current picture with a prediction picture to produce a combined picture. The inloop deblocking filter receives the combined picture. The inloop deblocking filter comprises: logic configured to filter a predefined pixel group; and logic configured to filter each of the remaining pixel groups in the plurality after the predefined pixel group, according to a corresponding set of taps in a plurality of sets of taps, if the predefined pixel group meets a criterion.

Description

IX. Description of the Invention

Client's Docket No.: S3U06-0023; TT's Docket No.: 0608-A41202-TW/final/林璟輝/2007/05/31

[Technical Field]

The present invention relates to image compression and decompression, and more particularly to a graphics processing unit with such features.

[Prior Art]

Personal computers and consumer electronics are used for a variety of entertainment purposes. These can be roughly divided into two categories: those that use computer-generated graphics, such as computer games; and those that use compressed video streams, such as programs pre-recorded onto digital video discs (DVDs), or digital programming delivered to a set-top box by cable or satellite operators. The second category also includes the encoding of analog video streams, as performed by a digital video recorder (DVR).

Computer-generated graphics are usually produced by a graphics processing unit (GPU). A GPU is a specialized microprocessor found in computer game consoles and some personal computers. A GPU is optimized to rapidly render three-dimensional primitive objects such as triangles and quadrilaterals. These primitives are described by vertices, where each vertex has attributes (e.g., color), and textures can be applied to the primitives. The result of rendering is a two-dimensional array of pixels, which is shown on a computer display or monitor.

Encoding and decoding of video streams involves computations of a different kind, e.g., discrete cosine transform, motion estimation, motion compensation, and deblocking filters. These computations are usually handled by a general-purpose central processing unit (CPU) combined with specialized hardware logic, such as an application-specific integrated circuit (ASIC). Consumers therefore need multiple computing platforms to meet their entertainment needs. What is needed is a single computing platform that handles both computer graphics and video encoding/decoding.

[Summary]

Embodiments disclosed herein provide systems and methods for video compression deblocking.
An exemplary deblocking filter for a video decoder comprises: logic configured to determine whether the pixels of a predetermined pixel group among a plurality of pixel groups meet a criterion; logic configured to filter the pixels of the predetermined pixel group first when the criterion is met; and logic configured to, when the criterion is met, sequentially filter the remaining pixel groups in the plurality according to a corresponding set of taps in a plurality of sets of taps.

An exemplary video decoder comprises: an entropy decoder, a spatial decoder, combining logic, and an inloop deblocking filter. The entropy decoder receives an incoming coded bit stream. The spatial decoder receives the output of the entropy decoder and produces a picture comprising a plurality of pixels. The combining logic combines a current picture with a prediction picture to produce a combined picture. The inloop deblocking filter receives the combined picture. The inloop deblocking filter comprises: logic configured to filter a predetermined pixel group; and logic configured to filter the remaining pixel groups in the plurality, according to a corresponding set of taps in a plurality of sets of taps, if the predetermined pixel group meets a criterion.

An exemplary graphics processing unit comprises a host processor interface and a video acceleration unit. The host processor interface receives at least one video acceleration instruction. The video acceleration unit responds to the at least one video acceleration instruction. The video acceleration unit comprises an inloop deblocking filter. The inloop deblocking filter comprises: logic configured to determine whether the pixels of a predetermined pixel group among a plurality of pixel groups meet a first criterion; logic configured to filter the pixels of the predetermined pixel group first when the first criterion is met; and logic configured to, when the first criterion is met, sequentially filter the remaining pixel groups in the plurality according to a corresponding set of taps in a plurality of sets of taps.

[Embodiments]

Computing platform for video encoding/decoding

Fig. 1 is a block diagram of an exemplary computing platform for graphics and for video encoding and/or decoding. System 100 includes a general-purpose CPU 110 (hereinafter the host processor), a graphics processing unit (GPU) 120, memory 130, and a bus 140. GPU 120 includes a video acceleration unit (VPU) 150, which accelerates video encoding and/or decoding, as described later. The video acceleration functions of GPU 120 are instructions that execute on GPU 120.

A video acceleration driver resides in memory 130 [...] decoder 160 [...] through a [...] 180 provided by the video acceleration driver [...]. In system 100, host processor software [...] performs video encoding and/or decoding, and GPU 120 responds to these instructions by accelerating portions of decoder 160.

In some embodiments, only a small portion of decoder 160 executes on the host processor, while most of decoder 160 is executed by GPU 120 with very little driver overhead. In this way, frequently executed, computationally intensive blocks are offloaded to GPU 120, while more complex operations are performed by host processor 110. In some embodiments, one computationally intensive function implemented by GPU 120 is inloop deblocking filter hardware acceleration logic 400, also referred to as inloop deblocking filter 400 or deblocking filter 400, which is described later in connection with Fig. 4. Another example of a computationally intensive function is determining the boundary strength (BS) of each filter.

The architecture described above therefore allows flexibility in either of the following: executing decoder 160 on host processor 110, with certain special functions (e.g., deblocking or boundary strength) performed by a shader program executed on macroblocks; or executing most of decoder 160 on GPU 120, taking advantage of pipelining and parallelism. In some embodiments in which decoder 160 executes on GPU 120, the deblocking process is a thread [...] of decoder 160 [...].

Conventional components that are well known to those skilled in the art and are not necessary for explaining the video acceleration features of GPU 120 are omitted.

Video decoder

Fig. 2 is a block diagram of video decoder 160 of Fig. 1. In the particular embodiment illustrated in Fig. 2, decoder 160 implements the H.264 video compression specification. However, those skilled in the art should understand that decoder 160 of Fig. 2 is a basic representation of a video decoder, one that also illustrates the operation of other decoder types similar to H.264, such as the SMPTE VC-1 and MPEG-2 specifications. In addition, although decoder 160 is shown as part of GPU 120, those skilled in the art should also appreciate that the portions of decoder 160 disclosed herein can be implemented outside a GPU, e.g., as standalone logic or as part of an application-specific integrated circuit (ASIC).

Incoming bit stream 205 is first processed by entropy decoder 210. Entropy coding takes advantage of statistical redundancy: some patterns occur more often than others, so the more common ones are represented with shorter codes. Entropy coding includes Huffman coding and run-length encoding.
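As a generic illustration of the run-length idea mentioned above (and not of the entropy coding that H.264 or VC-1 actually specify), the following minimal Python sketch expands (run, value) pairs back into the sequence they stand for. The function name and the pair layout are assumptions made for this example only.

```python
from typing import Iterable, List, Tuple


def run_length_decode(pairs: Iterable[Tuple[int, int]]) -> List[int]:
    """Expand (run_length, value) pairs into the sequence they encode."""
    out: List[int] = []
    for run, value in pairs:
        if run < 0:
            raise ValueError("run length must be non-negative")
        out.extend([value] * run)
    return out


# Five zeros, one 7, three zeros: nine symbols carried by three pairs.
decoded = run_length_decode([(5, 0), (1, 7), (3, 0)])
print(decoded)
```

Long runs of identical values, which are common after transform and quantization, are where this kind of representation saves the most space.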
After entropy decoding, the data is processed by spatial decoder 215, which takes advantage of the fact that neighboring pixels in a picture are usually the same or related [...]

The picture is processed as smaller sub-regions called macroblocks. The H.264 video compression specification uses a macroblock size of 16x16 pixels, while other compression specifications use other sizes. A macroblock within picture 235 is combined with information from previously decoded pictures, a process called inter prediction, or with information from other macroblocks of picture 235, a process called intra prediction.
Incoming bit stream 205 is decoded by entropy decoder 210, and inter or intra prediction is applied according to the type of each picture. When inter prediction is applied, entropy decoder 210 produces motion vector 245 as an output. Motion vector 245 is used for temporal decoding, which takes advantage of the fact that many pixels often keep the same value across a series of pictures. The changes from one picture to the next are encoded as motion vectors 245. Motion compensation block 250 combines one or more previously decoded pictures 255 with motion vector 245 to produce a prediction picture (265). When intra prediction is applied, spatial compensation block 270 combines information from neighboring macroblocks with the macroblocks in picture 235 to produce a prediction picture (275).

Combiner 280 combines picture 235 with the output of mode selector 285. Mode selector 285 uses the entropy-decoded bit stream to determine whether combiner 280 uses the prediction picture (265) produced by motion compensation block 250 or the prediction picture (275) produced by spatial compensation block 270.

The encoding process introduces artifacts such as discontinuities along macroblock edges and along sub-block edges within macroblocks. As a result, "edges" appear in the decoded frame where none existed originally. Deblocking filter 290 is applied to the combined picture output by combiner 280 to remove these edge artifacts. The decoded picture produced by deblocking filter 290 is used to decode subsequent pictures.
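The mode selection and combining steps described above can be sketched in Python as follows. This is only an illustration under stated assumptions: pictures are modelled as flat lists of samples, the result is clamped to 8 bits, and the names `select_prediction`, `combine` and the `is_inter` flag are hypothetical. None of them come from the patent or from any codec API.

```python
from typing import List


def select_prediction(is_inter: bool, motion_pred: List[int],
                      spatial_pred: List[int]) -> List[int]:
    """Mode selector: pick the inter (motion) or intra (spatial) prediction."""
    return motion_pred if is_inter else spatial_pred


def combine(picture: List[int], prediction: List[int]) -> List[int]:
    """Combiner: add decoded picture data to the prediction (8-bit clamp assumed)."""
    if len(picture) != len(prediction):
        raise ValueError("picture and prediction must be the same size")
    return [max(0, min(255, r + p)) for r, p in zip(picture, prediction)]


# Hypothetical sample values, chosen so that the last one saturates.
picture = [3, -2, 0, 10]
motion_pred = [100, 100, 250, 250]
spatial_pred = [0, 0, 0, 0]
combined = combine(picture, select_prediction(True, motion_pred, spatial_pred))
print(combined)  # [103, 98, 250, 255]
```

The combined picture is what the deblocking filter then operates on, which is why the filter sits inside the decoding loop.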
As discussed in connection with Fig. 1, portions of decoder 160 execute on host processor 110, while decoder 160 also takes advantage of the video acceleration instructions provided by GPU 120. In particular, in some embodiments deblocking filter 290 uses one or more instructions provided by GPU 120 to implement filtering at relatively low computational cost.

Deblocking filter

Deblocking filter 290 is a multi-tap filter that adjusts the pixel values at sub-block edges based on neighboring pixel values. Different embodiments of deblocking filter 290 can be used depending on the compression specification implemented by decoder 160. Each specification uses different filter parameters, e.g., the size of the sub-block, the number of pixels updated by the filter operation, and how often the filter is applied (e.g., every N rows or every M columns). In addition, each specification uses a different configuration of filter taps. Multi-tap filters should be understood by those skilled in the art, so specific tap configurations are not discussed here. A deblocking filter embodiment as specified by VC-1 is described in connection with Fig. 4. First, the arrangement of the sub-block pixels used by the filter is described in connection with Fig. 3.

Fig. 3 shows two adjacent 4x4 sub-blocks (310, 320), defined as rows R1-R4 [...]. For the leftmost sub-block, the VC-1 filter determines whether a predetermined group of pixels (P1, P2, P3) in a predetermined row (R3) meets a criterion and, if so, updates edge pixel P4. The criterion is determined by a particular set of calculations and comparisons on the pixels in the predetermined group. Those skilled in the art will recognize these calculations and comparisons as a set of taps; the detailed calculations and comparisons are discussed later in connection with Fig. 5. The updated value is also based on calculations performed on the pixels of the predetermined group.

The VC-1 filter processes the rightmost sub-block in an analogous manner, determining whether pixels P6, P7, P8 meet a criterion and updating P5 if the criterion is met. In other words, for a predetermined row (R3), the VC-1 filter computes values for edge pixels P4 and P5 from the values of other predetermined groups of pixels in the same row: the value of P4 is based on P1, P2, P3, and the value of P5 is based on P6, P7, P8.

The VC-1 filter conditionally updates the same predetermined pixels in the remaining rows, depending on the values computed for the predetermined pixels (edge pixels P4, P5) of the predetermined row (R3). Thus P4 in R1 is updated based on P1, P2, P3 in R1, but only if P4 and P5 in R3 were updated. Likewise, P5 in R1 is updated based on P6, P7, P8 in R1, but only if P4 and P5 in R3 were updated. Rows 2 and 4 are handled in a similar manner.

Viewed another way, some pixels in a predetermined third row are filtered, or updated, when other pixels in the third row meet a criterion. The filter involves performing comparisons and calculations on these other pixels. [...] Some embodiments use an inventive technique that filters the third row first and then filters the other rows. These inventive techniques are described in more detail in connection with Figs. 4, 5, and 6A-6D.

Although Fig. 3 illustrates row-by-row processing of a vertical edge, those skilled in the art will appreciate that the same figure, rotated by 90 degrees, also illustrates column-by-column processing of a horizontal edge. Those skilled in the art will also appreciate that although VC-1 uses the third of four rows as the predetermined row that determines the conditional update of the other rows, the principles disclosed herein also apply to embodiments that use another predetermined row (e.g., the first row or the second row) and to embodiments in which the sub-block is formed from a different number of rows. Similarly, although VC-1 examines the values of a neighboring group of pixels to set the value of the pixel to be updated, the principles disclosed herein also apply to embodiments in which other pixels are examined and other pixels are set. As one example, P2 and P3 can be examined to determine the updated value of P4. As another example, P3 can be set according to the values of P2 and P4.

Video acceleration unit 150 in GPU 120 implements hardware acceleration logic for an inloop deblocking filter (IDF), such as the inloop deblocking filter specified by VC-1. A GPU instruction exposes this hardware acceleration logic, as described later.

The conventional way to implement a VC-1 inloop deblocking filter is to process all rows/columns in parallel, because the same pixel calculations are performed on each row/column of a sub-block. This conventional approach filters two adjacent 4x4 sub-blocks per cycle, but at the cost of an increased gate count. In contrast, the inventive hardware acceleration logic 400 processes the third row/column of pixels first and, if those pixels meet the required criterion, then processes the remaining three rows/columns one after another. This inventive approach reduces the gate count by not replicating the per-row/column functionality. With sequential row processing, VC-1 inloop deblocking filter acceleration logic 400 takes more than one cycle to filter two adjacent 4x4 sub-blocks. This longer filter time is consistent with the instruction cycle of GPU 120, whereas the faster filtering of the conventional approach is actually faster than needed and so wastes logic gates.
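The third-row-first ordering described above can be sketched as follows. This is a minimal Python illustration and not the VC-1 filter itself: `meets_criterion` and `update_edge` are hypothetical callbacks standing in for the tap calculations, and the toy criterion and update in the usage lines are invented for the example.

```python
from typing import Callable, List

Row = List[int]  # eight pixels P1..P8 spanning two adjacent 4x4 sub-blocks


def filter_rows_third_first(rows: List[Row],
                            meets_criterion: Callable[[Row], bool],
                            update_edge: Callable[[Row], Row]) -> List[Row]:
    """Filter row R3 first; touch R1, R2 and R4 only if R3 met the criterion."""
    if len(rows) != 4:
        raise ValueError("expected four rows R1..R4")
    out = [list(r) for r in rows]
    if not meets_criterion(out[2]):   # R3 is index 2
        return out                    # no row is filtered
    out[2] = update_edge(out[2])
    for i in (0, 1, 3):               # remaining rows, one after another
        out[i] = update_edge(out[i])
    return out


def toy_criterion(row: Row) -> bool:
    """Invented test: only a small step across the P4/P5 edge is smoothed."""
    return abs(row[3] - row[4]) < 8


def toy_update(row: Row) -> Row:
    """Invented update: replace P4 and P5 with their average."""
    avg = (row[3] + row[4]) // 2
    return row[:3] + [avg, avg] + row[5:]


soft_edge = [[10, 10, 10, 10, 14, 14, 14, 14] for _ in range(4)]
print(filter_rows_third_first(soft_edge, toy_criterion, toy_update)[0])
```

Because the rows are handled one after another, a single copy of the per-row logic can be reused, which is the gate-count saving the text describes. In a real filter the per-row update would apply its own calculations to each row and might leave it unchanged.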
Fig. 4 is a listing of hardware description pseudocode for VC-1 inloop deblocking filter hardware acceleration logic 400. Although pseudocode is used rather than an actual hardware description language (HDL) such as Verilog or VHDL, those skilled in the art should be familiar with such pseudocode. They will understand that, when expressed in an actual HDL, this code can be compiled and then synthesized into an arrangement of logic gates that makes up part of video acceleration unit 150. They will also appreciate that these logic gates can be implemented in various technologies, e.g., an application-specific integrated circuit (ASIC), a programmable gate array (PGA), or a field-programmable gate array (FPGA).

Section 410 of the code is a module definition. VC-1 inloop deblocking filter hardware acceleration logic 400 has several input parameters. The sub-block to be filtered is specified by the Block parameter.

If the Vertical parameter is true, acceleration logic 400 treats the Block parameter as a 4x8 block (see Fig. 3) and performs vertical edge filtering. If the Vertical parameter is false, acceleration logic 400 treats the Block parameter as an 8x4 block (see Fig. 3) and performs horizontal edge filtering.

Section 420 of the code begins an iteration loop and sets the value of the loop parameter. On the first pass through the loop, the loop parameter is set to 3, so the third line is processed first. Subsequent loop iterations set the loop parameter to 1, 2, and 4. Using this parameter, VC-1 inloop deblocking filter hardware acceleration logic 400 iterates four times, processing 8 pixels each time, where a line can be a horizontal row or a vertical column. Each line is processed by line acceleration logic 500 (see Fig. 5). In some embodiments, line acceleration logic 500 is implemented as an HDL sub-module, which is described in connection with Fig. 5.

Section 430 tests the Vertical parameter to determine whether vertical or horizontal edge filtering is performed. Depending on the result, the 8 elements of the line array variable are initialized from a row of the 4x8 input block or from a column of the 8x4 input block.

Section 440 determines whether the third line is being processed by comparing the loop parameter with 3.

If the loop parameter is 3, two control variables, ProcessingPixel3 and FILTER_OTHER_3, are set to true. If the loop parameter is not 3, ProcessingPixel3 is set to false.

Section 450 instantiates another HDL module, VC1_IDC_Filter_Line, which applies the filter to the current line. (As described in connection with Fig. 3, the line filter updates edge pixel values based on neighboring pixel values.) The parameters supplied to the sub-module include the control variable ProcessingPixel3.
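Fig. 4 itself is not reproduced in this text. The control flow it is described as having can be approximated by the Python sketch below. The names `ProcessingPixel3` and `FILTER_OTHER_3` come from the description (rendered here in Python naming); everything else, including the `filter_line` callback that stands in for the line sub-module and its return convention, is an assumption made for illustration.

```python
from typing import Callable, List, Tuple

Block = List[List[int]]
# filter_line(line, processing_pixel3, filter_other_3) -> (new_line, filter_other_3)
LineFilter = Callable[[List[int], bool, bool], Tuple[List[int], bool]]


def inloop_filter(block: Block, vertical: bool, filter_line: LineFilter) -> Block:
    """Process line 3 first, then lines 1, 2 and 4, passing the flag along."""
    rows, cols = (4, 8) if vertical else (8, 4)
    if len(block) != rows or any(len(r) != cols for r in block):
        raise ValueError("block must be 4x8 (vertical) or 8x4 (horizontal)")
    out = [list(r) for r in block]
    filter_other_3 = True
    for n in (3, 1, 2, 4):
        i = n - 1
        # Vertical edge: a line is a row. Horizontal edge: a line is a column.
        line = list(out[i]) if vertical else [out[r][i] for r in range(8)]
        processing_pixel3 = (n == 3)
        if processing_pixel3:
            filter_other_3 = True
        line, filter_other_3 = filter_line(line, processing_pixel3, filter_other_3)
        if vertical:
            out[i] = list(line)
        else:
            for r in range(8):
                out[r][i] = line[r]
    return out


def passthrough(line: List[int], processing_pixel3: bool,
                filter_other_3: bool) -> Tuple[List[int], bool]:
    """Placeholder line filter: changes nothing and keeps the flag as given."""
    return line, filter_other_3


block_4x8 = [[r * 10 + c for c in range(8)] for r in range(4)]
print(inloop_filter(block_4x8, True, passthrough) == block_4x8)
```

Returning the flag from the line filter is one way to model a value that one instantiation of the sub-module updates and a later instantiation reads. A hardware design would carry it on a signal or register instead.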

In some embodiments, the VC-1 in-loop deblocking filter hardware acceleration logic 400 also supplies additional input parameters. After the submodule has processed the line, the VC-1 in-loop deblocking filter hardware acceleration logic 400 continues the iteration loop at section 420 with a new value of the loop parameter. In this way, the filter is applied to the lines of the input block in turn, beginning with the third line.

FIG. 5 is a listing of hardware description language code for line acceleration logic 500, which implements the submodule described above. Section 510 of the code is the module definition. Line acceleration logic 500 has several input parameters. The line to be filtered is the Line input parameter. ProcessingPixel3 is an input parameter that higher-level logic sets to true if the line is the third row or third column. The parameter FILTER_OTHER_3 is initially set to true by higher-level logic and is then adjusted by line acceleration logic 500 according to the pixel values.

Section 520 performs the various pixel value computations specified by VC-1. (Because these computations can be understood by reference to the VC-1 specification, they are not described in detail.) Section 530 tests the ProcessingPixel3 parameter supplied by the higher-level VC-1 in-loop deblocking filter hardware acceleration logic 400. If ProcessingPixel3 is true, section 530 initializes a control variable DO_FILTER to a default value of true. The intermediate results of the computations in section 520 are used to decide whether the other three lines should also be processed. If the pixel computations indicate that the other three lines should not be processed, DO_FILTER is set to false.

If ProcessingPixel3 is false, section 540 uses the input parameter FILTER_OTHER_3, set by the VC-1 in-loop deblocking filter hardware acceleration logic 400, to set the value of DO_FILTER. If DO_FILTER is true, section 550 updates the edge pixels P4 and P5 of the Line variable (see FIG. 3). Section 560 tests the ProcessingPixel3 parameter and updates FILTER_OTHER_3 accordingly. The FILTER_OTHER_3 variable conveys state information between different instances of this module. If ProcessingPixel3 is true, section 560 updates the FILTER_OTHER_3 parameter with the value of DO_FILTER. This technique allows the higher-level module that instantiates this module (namely VC1_InloopFilter) to pass the FILTER_OTHER_3 value updated by one instance of the lower-level VC_1_INLOOPFILTER_LINE module to another instance of VC_1_INLOOPFILTER_LINE.

Those skilled in the art will appreciate that the pseudocode of FIG. 5 can be synthesized in various ways to produce an arrangement of logic gates that implements line acceleration logic 500. One such arrangement is illustrated in FIGS. 6A-6D, which together form a block diagram of line acceleration logic 500. Those skilled in the art should be familiar with the VC-1 in-loop deblocking filter algorithm and with logic circuit structures, so the elements of FIGS. 6A-6D are not described in detail. Instead, selected features of line acceleration logic 500 are described.

Those skilled in the art will appreciate that the operations involved in the VC-1 in-loop deblocking filter include the following, where P1-P8 denote pixel positions in the row/column being processed:

    A0 = (2*(P3 - P6) - 5*(P4 - P5) + 4) >> 3
    A1 = (2*(P1 - P4) - 5*(P2 - P3) + 4) >> 3
    A2 = (2*(P5 - P8) - 5*(P6 - P7) + 4) >> 3

These three computations are similar. The portion of line acceleration logic 500 shown in FIG. 6A computes A0, A1, and A2 sequentially using shared logic rather than dedicated logic for each. Processing the inputs sequentially through multiplexers reduces gate count and/or power consumption.

Multiplexers 605, 610, and 620 select different inputs from pixel registers P1-P8 in different clock cycles, and these inputs are supplied to the shared logic blocks. Logic blocks 625 and 630 each perform a subtraction. Logic block 635 multiplies by 2 by performing a left shift of 1 bit. Multiplication by 5 is implemented by a left shift of 2 bits followed by adder 645. Adder 650 sums the output of left shifter 635, the constant 4, and the negated output of adder 645. Finally, logic block 655 performs a right shift of 3 bits.

In the first clock cycle, an input T=1 is supplied to multiplexers 605, 610, and 615, and the value of A1 is computed and stored in register 660. In the second clock cycle, an input T=2 is supplied to multiplexers 605, 610, and 615, and the value of A2 is computed and stored in register 665. In the third clock cycle, an input T=3 is supplied to multiplexers 605, 610, and 615, and the value of A0 is computed and stored in register 670.
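The time-multiplexed datapath of FIG. 6A can be sketched in software as follows. This is only an illustrative model, not the HDL of FIG. 5: the function names and the `MUX_SELECT` table are hypothetical, and the pixel selection per cycle is taken from the A0, A1, and A2 expressions as they appear in the VC-1 formulation.

```python
def shared_datapath(x, y, z, w):
    """Compute (2*(x - y) - 5*(z - w) + 4) >> 3 using only shifts and adds."""
    outer = x - y                        # first subtractor
    inner = z - w                        # second subtractor
    doubled = outer << 1                 # multiply by 2: left shift by 1 bit
    times_five = (inner << 2) + inner    # multiply by 5: left shift by 2, then add
    return (doubled + 4 - times_five) >> 3   # add constant 4, subtract, shift right by 3


# Hypothetical multiplexer table: which pixels feed the shared logic in cycle T.
MUX_SELECT = {
    1: ("P1", "P4", "P2", "P3"),   # T=1 produces A1
    2: ("P5", "P8", "P6", "P7"),   # T=2 produces A2
    3: ("P3", "P6", "P4", "P5"),   # T=3 produces A0
}


def compute_a_values(pixels):
    """pixels maps 'P1'..'P8' to integers; returns (A0, A1, A2)."""
    results = {}
    for cycle in (1, 2, 3):          # one pass through the shared logic per cycle
        operands = [pixels[name] for name in MUX_SELECT[cycle]]
        results[cycle] = shared_datapath(*operands)
    return results[3], results[1], results[2]
```

The same `shared_datapath` function is reused three times with different operands, which mirrors the trade of extra cycles for fewer gates.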
The values A1, A2, and A0 stored in registers 660, 665, and 670 are used by the portion of line acceleration logic 500 shown in FIG. 6B, described below. The output of the P4 register (671) and the output of the P5 register (673) are used by the portion of line acceleration logic 500 shown in FIG. 6C, also described below.

Those skilled in the art will also appreciate that the VC-1 in-loop deblocking filter involves the following additional operations:

    if (CLIP > 0) {
        if (D < 0) D = 0
        if (D > CLIP) D = CLIP
    } else {
        if (D > 0) D = 0
        if (D < CLIP) D = CLIP
    }

The portion of line acceleration logic 500 shown in FIG. 6B receives inputs from the portion shown in FIG. 6A and computes D (675). Referring again to FIG. 6A, CLIP (677) is produced as follows: pixels P4 and P5 are subtracted by logic block 679, and the result is shifted right by one bit (an integer division by 2) by logic block 680 to produce CLIP 677. Returning to FIG. 6B, A1 is available from register 660 in the first cycle, A2 from register 665 in the second cycle, and A0 from register 670 in the third cycle. Accordingly, in the fourth cycle, the portion of line acceleration logic 500 shown in FIG. 6B computes D (675) according to the equations above.

Line acceleration logic 500 uses D (675) to update pixel positions P4 and P5. Specifically, P4 = P4 - D and P5 = P5 + D. Although FIGS. 6A and 6B were described above in terms of a single row/column (that is, a single set of pixel positions P1-P8), the computation on the third row/column of a sub-block affects the behavior for the other three rows/columns of that sub-block. Line acceleration logic 500 implements this behavior with an inventive approach. While the independent filter operations described in connection with FIGS. 6A and 6B are carried out from the outset, in parallel, the portions of line acceleration logic 500 shown in FIGS. 6C and 6D conditionally select the positions to be updated. In other words, the VC-1 in-loop deblocking filter hardware acceleration logic 400 decides whether the original value or the new value is written back. By contrast, a conventional VC-1 in-loop deblocking filter uses a loop, so the individual filter operations are executed conditionally.

As explained earlier, FIG. 4 shows the pseudocode in which line acceleration logic 500 operates inside a loop: the instantiation section 450 appears within the iteration section 420. In addition, the instantiation of line acceleration logic 500 uses two parameters, ProcessingPixel3 and FILTER_OTHER_3.
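The clamping of D against CLIP and the update of P4 and P5 described above can be sketched as follows. This is a minimal illustration with hypothetical function names; it takes an already computed D as input rather than deriving it, and it models the right shift in block 680 with Python's arithmetic shift, which is an assumption about rounding for negative differences.

```python
def compute_clip(p4, p5):
    """CLIP as described: subtract P5 from P4, then shift right by one bit."""
    return (p4 - p5) >> 1   # assumption: arithmetic (floor) shift for negatives


def clamp_d(d, clip):
    """Limit D to the range between 0 and CLIP, following the if/else above."""
    if clip > 0:
        if d < 0:
            d = 0
        if d > clip:
            d = clip
    else:
        if d > 0:
            d = 0
        if d < clip:
            d = clip
    return d


def update_edge_pixels(p4, p5, d):
    """Return the new (P4, P5) pair: P4 = P4 - D and P5 = P5 + D."""
    d = clamp_d(d, compute_clip(p4, p5))
    return p4 - d, p5 + d
```

With P4 = 100 and P5 = 60, CLIP is 20, so a raw D of 50 is limited to 20 and both edge pixels move to 80.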
Line acceleration logic 500 uses these parameters to perform the conditional update of pixels P4 and P5 as follows. Referring to FIG. 6C, the P4 register is written with the result of subtractor 681, where one input of subtractor 681 is P4 (671) and the other is either 0 or D (675), depending on the value of DO_FILTER (683). Likewise, the P5 register is written with the result of an adder, where one input of the adder is P5 (673) and the other is either 0 or D (675), depending on the value of DO_FILTER. Consequently, the updated value of P4 is either the original P4 value (if DO_FILTER is false) or P4 - D. Likewise, the updated value of P5 is either the original P5 value (if DO_FILTER is false) or P5 + D (if DO_FILTER is true).

Those skilled in the art will appreciate that, when the third row of a sub-block is processed, the criterion for updating P4 with P4 - D is:

    ((ABS(A0) < PQUANT) OR (A3 < ABS(A0)) OR (CLIP != 0))

DO_FILTER 683 is computed by the portion of line acceleration logic 500 shown in FIG. 6D, which tests these conditions. Multiplexer 687 supplies one input to OR gate 697, selecting a true output if ABS(A0) < PQUANT and false otherwise. Multiplexer 689 supplies another input to OR gate 697, selecting a true output if A3 < ABS(A0) and false otherwise. Multiplexer 691 supplies a further input to OR gate 697, selecting a true output if CLIP != 0 and false otherwise. DO_FILTER 683 is supplied by multiplexer 693, which uses the control input Processing_Pixel_3 (695) to select either the output of OR gate 697 or the input signal FILTER_OTHER_3 (699).
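The selection logic of FIGS. 6C and 6D can be modeled as below. This is a sketch under stated assumptions: the function and argument names are hypothetical, and the three conditions are combined with OR exactly as the text describes, without checking that combination against the VC-1 specification.

```python
def select_do_filter(processing_pixel_3, filter_other_3, a0, a3, clip, pquant):
    """Model of multiplexers 687, 689, 691, OR gate 697, and multiplexer 693."""
    cond_quant = abs(a0) < pquant    # multiplexer 687
    cond_a3 = a3 < abs(a0)           # multiplexer 689
    cond_clip = clip != 0            # multiplexer 691
    or_gate = cond_quant or cond_a3 or cond_clip   # OR gate 697
    # Multiplexer 693: the third line decides for itself, the others reuse the flag.
    return or_gate if processing_pixel_3 else filter_other_3


def write_back(p4, p5, d, do_filter):
    """Model of FIG. 6C: the second operand is D when enabled, otherwise 0."""
    operand = d if do_filter else 0
    return p4 - operand, p5 + operand
```

Because the filtered values are always computed and only the write-back operand is gated, all four lines can be evaluated without a data-dependent branch.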

Inputs Processing_Pixel_3 (695) and FILTER_OTHER_3 (699) were described earlier in connection with FIG. 4 and the pseudocode for the higher-level VC-1 in-loop deblocking filter hardware acceleration logic 400, which instantiates line acceleration logic 500. Returning to FIG. 6D, Processing_Pixel_3 (695) is set to true when the third row/column is processed (the first iteration of the loop) and false otherwise. Based on the conditions involving PQUANT, ABS(A0), and so on, an intermediate variable DO_FILTER is recorded. Finally, the value of FILTER_OTHER_3 (699) is set from this intermediate variable.

The net result of this portion of line acceleration logic 500 is that, every four cycles, the pixel positions in four adjacent rows/columns are either set to filtered values (according to A0-A3, PQUANT, CLIP, and similar variables) or rewritten with their original values.

The VC-1 deblocking acceleration unit 400 inventively combines parallel and sequential processing, as described above. Parallel processing provides faster execution and reduces latency. Although parallelization increases the gate count, the increase is offset by the sequential processing described above. A conventional approach that does not use such sequential processing simply increases the gate count.
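How the higher-level loop carries FILTER_OTHER_3 from the third line to the other three can be sketched as follows. The names are hypothetical, `filter_line` stands in for the line submodule, and the order of the three remaining lines is an assumption; the text only establishes that the third line is handled in the first iteration.

```python
def filter_sub_block(lines, pquant, filter_line):
    """Apply a line filter to four rows/columns, third line first.

    filter_line(line, pquant, processing_pixel_3, filter_other_3) must return
    a (new_line, filter_other_3) pair, mirroring the submodule interface.
    """
    order = [2, 0, 1, 3]        # assumption: third line first, then the rest in index order
    filter_other_3 = True       # initially set to true by the higher-level logic
    result = list(lines)
    for index in order:
        processing_pixel_3 = (index == 2)   # true only for the third line
        result[index], filter_other_3 = filter_line(
            result[index], pquant, processing_pixel_3, filter_other_3
        )
    return result
```

The flag returned by the call on the third line is simply passed through to the later calls, which is the state passing between module instances described above.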
Some embodiments of graphics processing unit 120 include a hardware acceleration unit for H.264 deblocking, and this deblocking function is made available through graphics processing unit instructions. Graphics processing unit 120 is described in detail in connection with FIG. 8, with emphasis on the particular selection of graphics processing unit instructions that provide H.264 deblocking acceleration.

Rationale for multiple deblocking instructions in the graphics processor

The instruction set of graphics processing unit 120 includes instructions that the portion of decoder 160 executing in software can use to accelerate a deblocking filter. An inventive technique described here provides not one but multiple graphics processing unit instructions for accelerating a particular deblocking filter. In-loop deblocking filter 290 is inherently sequential, in that a particular filter must filter pixels in a fixed order (for example, left to right and then top to bottom). Consequently, previously filtered and updated pixels are used as inputs when later pixels are filtered. A host processor handles pixel values stored in conventional memory, which allows pixels to be read and written back to back. This sequential nature is a poor fit, however, when in-loop deblocking filter 290 uses a graphics processing unit to accelerate part of the filtering process. A conventional graphics processing unit stores pixels in a texture cache, and the graphics processing unit pipeline design does not permit back-to-back reads and writes of the texture cache.

Some embodiments of graphics processing unit 120 disclosed here provide multiple graphics processing unit instructions that can be used together to accelerate a particular deblocking filter. Some of these instructions use the texture cache as the source of pixel data, while others use the graphics processing unit execution units as the source.
In-loop deblocking filter 290 uses these different graphics processing unit instructions in an appropriate combination to achieve back-to-back pixel reads and writes. An overview of the flow through graphics processing unit 120 is given next, followed by a description of the deblocking acceleration instructions provided by graphics processing unit 120 and of how in-loop deblocking filter 290 uses them.

Graphics processing unit flow

FIG. 7 is a diagram of the data flow through graphics processing unit 120, in which the instruction flow is shown by the arrows on the left of FIG. 7 and the image or graphics flow by the arrows on the right. An instruction 720 is received and decoded, producing instruction data 730. The instructions include ones that accelerate video encoding and/or decoding.

Conventional graphics processing instructions involve tasks such as vertex shading, geometry shading, and pixel shading. Instruction data 730 is therefore applied to a pool 740 of shader execution units. The shader execution units use a texture filter unit (TFU) 750 as needed to apply a texture to a pixel. Texture data is cached in texture cache 760, which is backed by main memory (not shown). Some instructions are sent to video accelerator 150, whose operation is described below. The resulting data is then processed by post-packer 770, which compresses the data.
After post-processing, the data produced by the video acceleration unit is supplied to execution unit pool 740.

The execution of video encoding/decoding acceleration instructions, such as the deblocking filter instructions mentioned above, differs in several respects from that of the conventional graphics instructions described above. First, video acceleration instructions are executed by the video acceleration unit rather than by the shader execution units. Second, video acceleration instructions do not use texture data as such. However, the image data used by video acceleration instructions and the texture data used by graphics instructions are both two-dimensional arrays. Graphics processing unit 120 takes advantage of this similarity by using texture filter unit 750 to load image data for video acceleration unit 150, which also allows texture cache 760 to cache some of the image data on which video acceleration unit 150 operates. For this reason, as shown in FIG. 7, video acceleration unit 150 is located between texture filter unit 750 and post-packer 770.

Texture filter unit 750 examines the instruction data 730 extracted from instruction 720, which includes the coordinates of the image data. In one embodiment, these are specified as a coordinate pair, which should be familiar to those skilled in the art. When instruction 720 is a video acceleration instruction, the extracted instruction data further directs texture filter unit 750 to bypass the texture filter (not shown) within texture filter unit 750. In this way, texture filter unit 750 is used by video acceleration instructions to load image data for video acceleration unit 150. Video acceleration unit 150 receives image data from texture filter unit 750 on the data path and command data 730 on the command path, and performs an operation on the image data according to command data 730.
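The routing described above, in which the texture filter unit loads data for both kinds of instruction but bypasses its filter for video acceleration instructions, can be sketched as a small software model. All names here (`InstructionData`, `fetch_through_tfu`, `route`) are hypothetical and the texture cache is modeled as a plain dictionary; this is an illustration of the data path, not of any real driver or hardware interface.

```python
from dataclasses import dataclass


@dataclass
class InstructionData:
    """Hypothetical stand-in for instruction data 730."""
    is_video_acceleration: bool
    u: int   # coordinates of the block to fetch
    v: int


def fetch_through_tfu(texture_cache, data, texture_filter):
    """Load a block via the texture filter unit, bypassing the filter for video."""
    block = texture_cache[(data.u, data.v)]
    if data.is_video_acceleration:
        return block                 # filter bypassed: raw image data
    return texture_filter(block)     # conventional graphics path


def route(data, texture_cache, texture_filter, video_accelerator, shader):
    """Send the fetched block to the video accelerator or a shader execution unit."""
    block = fetch_through_tfu(texture_cache, data, texture_filter)
    if data.is_video_acceleration:
        return video_accelerator(block, data)
    return shader(block, data)
```

Passing the processing stages in as callables keeps the sketch independent of what the accelerator or shader actually computes.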
The image data output by video acceleration unit 150 is fed back to execution unit pool 740 after being processed by post-packer 770.

Deblocking instructions

The embodiments of graphics processing unit 120 described here provide hardware acceleration for the VC-1 deblocking filter and for the H.264 deblocking filter. The VC-1 deblocking filter is accelerated by one graphics processing unit instruction ("IDF_VC-1"), while the H.264 deblocking filter is accelerated by three graphics processing unit instructions ("IDF_H264_0", "IDF_H264_1", and "IDF_H264_2").

As explained earlier, each graphics processing unit instruction is decoded and parsed into instruction data 730, which can be regarded as a set of parameters specific to each instruction, shown in Table 1. The IDF_H264_x instructions share some common parameters, while others are unique to each instruction. Those skilled in the art will appreciate that these parameters can be encoded using a variety of opcodes and instruction formats, so those topics are not discussed here.

Table 1: Parameters of the IDF_H264_x instructions

Parameter | Size | Operand | Description
FieldFlag (Input) | 1 bit | | If FieldFlag = 1 then Field Picture, otherwise Frame Picture
TopFieldFlag (Input) | 1 bit | | If TopFieldFlag = 1 then Top-Field-Picture, otherwise Bottom-Field-Picture (applies if FieldFlag is set)
PictureWidth (Input) | 16 bits | | For example, 1920 for HDTV
PictureHeight (Input) | 16 bits | | For example, 1080 for 30P HDTV
YC Flag | 1 bit | Control-2 | Y plane or chroma plane

Table 1 (continued)

Parameter | Size | Operand | Description
Filter Direction | 1 bit | Control-1 |
CBCR Flag | 1 bit | Control-1 | Cb or Cr
BaseAddress (Input) | 32 bits, unsigned | | Base address of the sub-block data in texture memory
BlockAddress (Input) | 13.3 format | SRC1[0:15] = U, SRC1[31:16] = V | For IDF_H264_0: texture coordinates of the entire sub-block (relative to the base address). For IDF_H264_1: texture coordinates of the remaining part of the sub-block (relative to the base address). Not used in IDF_H264_2.
DataBlock1 | 4x4x8 bits | SRC2[127:0] | Not used in IDF_H264_0. For IDF_H264_1: the upper or left half of the sub-block, depending on the FilterDirection encoded in the Control 2 parameter. For IDF_H264_2: the first (even) register pair.
DataBlock2 | 4x4x8 bits | SRC2[255:128] | Not used in IDF_H264_0 or IDF_H264_1. For IDF_H264_2: the second (odd) register pair.
Sub-block (Output) | 128 bits | | Deblocked 8x4x8-bit sub-block (128 bits)

Several of these input parameters are used in combination to determine the 4x4 block fetched by texture filter unit 750. The BaseAddress parameter indicates the start of the texture data within the texture cache. The coordinates of the top-left block within this region are supplied in the BaseAddress parameter.

BaseAddress 參數。picturefjeig社與 picturewidth 输入參數係用來判斷該方塊的範圍，即左下方座標。最後、視訊圖形可為漸進式掃瞄（progessive)或隔行掃目苗 (interlace)。若為隔行掃瞄，其係由兩個方向組成（上方與下方）。紋理濾波單元750使用FieldFlag與 TopFieldFlag以適當處理隔行掃瞄影像。去方塊效應8x4x8位元輸出係提供於一目標暫存哭，鲁且亦寫回執行單元池740。將去方塊效應輸出寫回執行單元池740係一”位置修改（m〇dify in place)”運作，在某些解碼器的實現中是必要的，例如Η· 264其中方塊中之像素值’右邊與下方，係依先前的結果所計算。然而1 解碼器不像H.264有此限制關係。在？(：-1中，對每個85(8 邊界（先垂直再水平）濾波。所有的垂直邊緣可以因而卞質上平行地執行，4x4邊緣稍後濾波。可以利用平行化因為僅有兩個像素（一個邊緣一個）被更新，而這些像素不用來計算其他邊緣。既然去方塊效應資料是寫回執行單元 29Client’s Docket N〇.:S3U06-0023 TT’s Docket N(x0608_A41202-TW/fmal/林璟輝/2007/05/31 29 200803525 池740而非紋理快取760，提供了不同的IDFJI264一χ指，令·、這子方塊從不同位置被擷取。這可在第j .参中看到，在 llockAddress 的敘述中，Data Block !與邱· :參輕腳_β_ :狼 :元i子廉塊HDF—Η264一 1指令從紋理快取76〇擷•棄俩綠衣：，儀並:^^祿#’元撕:挪擷取半―個。A、… ::::隨獬碼，器1則而變之JDFJ1264—X指令的功用释 8圖詳述。接下來敘述在供應像素資料給視訊加速章元，前，紋理濾波單元750與執趣單元池740轉換所擷取的餘素資料的處理。影像資料的韓換上述之指令參數，提供欲從紋理快取76〇或從執行單元池740解取的子方塊位址之座標給紋理濾波單元75〇。影像資料包含亮度（Y)與彩度（Cb，Cr)平面。一 YC旗標輸八參.數定義要處理Y平面或是CbCr平面。當處理焭度（Y)資料時，如YC旗標參數所標示的，紋理濾波單元750擷取該子方塊並提供該128位元作為 VC-1回路内去方塊效應濾波器硬體加速邏輯電路4⑽ 的輸入（例如第4圖之VC-1加速器範例之方塊输入參數）。所產生的資料係寫入目標暫存器作為一 4組—暫存器 (register quad ，即，DST 、 DST+1 、 DST+2 、 DST+3)。當處理彩度資料時，如YC旗標參數所標示的，cb與 Cr方塊將由VC-1回路内去方塊效應濾波器硬體加速邏 30Clienfs Docket N〇.：S3U06-0023 TPs Docket No:0608-A41202-TW/fiml/林璟輝/2007/05/31 30 200803525 輯電路400連續地處理。所產生的資料係寫入紋理快取 760。在—些實施例中，此霖八_作_各週期中發生，每個週期窝入256位元。j;與、心 % —些視訊加速單元丨實’衡條，隔行掃瞄平面，各 :存為一芈寬度與一半長礎。:觸遽_實施例中:，，敢理濾:波單 .元75α_視訊加速單元450、騰，子方塊資料解隔_掃一 /:晦急辩来溝通紋理濾波單元;與親訊加速單元拉鲁緩衝器。尤其是，致理濾波單元霄50將個4x4 Cb方塊^ 寫入該緩衝器^接著將2個方塊寫入該緩衝器每&x# Cb方塊首先由vc-i回路内去方媿效應濾波器硬體加速邏輯電路4〇〇處理，所產生的資料寫入紋理快取760。接著， 8x4 Cr方塊由vc-l回路内去方塊效應濾波器硬體加速邏輯電路400處理，所產生的資料寫入紋理快取760。視訊加速單元150使用CbCr旗標參數以管理此循序處理。 % 使用去方後放應指土結合先前第1圖之說明·，解碼器160在主處理器110 上執行但亦利用圖形處理單元120所提供的視訊加速指令。尤其是H. 264回路内去方塊效應滤、波器290之實施例使用特定IDF_H264_x結合以處理邊緣，依H. 264所規定之次序，從紋理快取760擷取一些子方塊並從執行單元池740 擷取另一些。在適當結合之下，這些IDF_H264_x指令達成一個接一個像素讀取與寫入。第8圖係用於H. 
264之16x16大方塊之方塊圖。這大 3 lClienfs Docket N〇.:S3U06-0023 TT’s Docket No:0608-A41202-TW/fmal/林璟輝/2007/05/31 31 200803525 方塊切割成16個4x4子方塊，每個均將進行去方塊效應。第8圖::中:之"番麟子方塊可依列與行定義（例如ri， Η· 264定儀:先處箱、垂直邊緣在處理水平邊緣，如第8 示之惑緣__靡_胸聲;.1 。Hu. :因_#___塊:效:應德波器:係應用於一對子方塊間的：.?> 邊緣子‘方.塊餐腺典為序濾施„:.心 edge a=[block to left 〇fRl,ClH [R1?C1} ； [block to left of R2,C1] | [R2,Cdl; v ；.：： -> ,-；-Γ [block to 1¾ of R3,C1] I [R3；C1]； [block to left of R4?C1] | [R45C1]"：； edge b=[Rl,Cl] j ; tR2，C2l;[;Il2，Cl!;|[R2，C^;.:.; [R3，C1]| [R3，C2] ; [ R4，C1] | [R4，C2]; edge c=[Rl，C2] | [R2，C3] ; [ R2，C2]丨[R2，C3]; [R3，C2] | [R3，C3] ; [ R4，C2]丨[R4，C3]; edge d=[Rl,C3] | [R2，C4] ; [ R2，C3] | [R2，C4]; [R3，C3] | [R3，C4] ; [ R4，C3] | [R4，C4]; edge e-[block to top of R1，C1] | [R1，C1] ; [block to top of Rl，C2] | [R1，C2]; [block to top of R1，C3] | [R1,C3]; [block to top of R15C4] | [R15C4] edge f=[Rl，Cl] | [R2，C1] ; [R1，C2] 1 [R2，C2]; [R1，C3] | [R2，C3]; [R1，C4] 1 [R2，C4] edge g=[R2?Cl] | [R3，C1] ; [R2，C2] | [R3，C2]; [R2，C3] | [R3?C3]; [R2?C4] | [R35C4] edge h=[R3,Cl] | [R4，C1] ; [R3，C2] | [R4，C2]; [R3，C3] | [R4,C3]; [R3?C4] | [R4>C4] 對於第 1對子方塊’均下载自紋理快取，因為還沒 32Clienfs Docket N〇.:S3U06-0023 TTs Docket No:0608-A41202-TW/fmal/林璟輝/2007/05/3} 32 200803525 有像素因施用濾波器而被改變。儘管第〗垂直邊緣（a)之濾 ' "波裔可以改變（R1，C1)之像素值，第;2列·垂直邊緣實際上瓶$赛第::1列垂直邊緣共用所有像素。因此贫 ^ 亦卞截&紋理快取760。既然兩相:鄰列;間的i垂直邊緣不共「》々用像参’.齡沒對（邊緣c)與第4對_邊緣 .、：，IDF—m64^x指令判.定要從那個位置下载像素凊_:Λ由.回路拽: • 去:方塊效應濾波器290所使用的IDF-H264_x指令處理第1鈕μ 垂直邊緣（a-d )V之次序為：；.. 
IDF—H264_0 SRCl=address of (R1，C1); IDF—H264—0 SRCl=address of (R2，C1); IDF—H264—0 SRCl=address of (R3，C1); IDF—H264—0 SRCl=address of (R4，C1); 接下來，回路内去方塊效應濾波器290處理第2垂直邊緣（b)，從（Rl，C2)開始。在定義為（Rl，C2) 8x4子方鲁塊内最左邊4個像素與（R1，C1)子方塊最右邊的像素重疊。這些由（R1，C1)之垂直邊緣濾波器所處理，亦可能更新，之重疊像素係因而被讀自執行單元池740而非紋理快取760。然而，在（Rl，C2)子方塊最右邊的4個像素還沒被濾波，因而讀自紋理快取760。子方塊（R2，C2)到（R4，C2)亦同。回路内去方塊效應濾波器290藉由命令下面idF_H264_x 的順序以處理第2組垂直邊緣，以完成此結果： IDF—H264一 1 SRCl=address of (R1?C2); IDF_H264_1 SRCl=address of (R25C2); 33Clienfs Docket No.：S3U06-0023 TT’s Docket No:0608-A41202-TW/fmal/林璟輝/2007/05/31 33 200803525 IDF一H264—1 SRCl=address of (R3，C2); IDF_H264_1 SRCB= address of (R45C2)；當處理第3:組垂、直缝:_時_從.iR1，C3 )開始。在（幻， C3) 8x4子方塊:内曝g 邊的像素重疊，因:雨義然而，在（R4，C2:), 因而讀自紋理快取7.6 运邊_咖’ G2)子方塊最右讀_孰徐單元他ί74〇而非紋理快取76〇。承#>樣最_邊勝4::値像素還沒被渡波， 0。:子亦塊❹k，C2:)到（_，C2)亦同。最後一組垂直邊緣會發生類似的情形^因此，回路呼去方塊效應濾波器290藉由命令下面n)F-H264一X的順序议處課剰下2組垂直邊緣： IDF H264 1 ...... — IDF H264 1 —- — IDF 一 H264—1 H>F 一 H264」 IDF—H264—1 IDF 一 H264—1 IDF—H264—1 SRCl=address of (R1，C3); SRCl=address of (R2，C3); SRCl=address of (R3，C3); SRCl=address of (R4,C3)； SRCl=address of (R1，C4); SRCl^address of (R29C4)； SRCl=address of (R3?C4)；】DF—H264—1 SRCl=address of (R4，C4); 接著處理水平邊緣( e-h)。此時，去方塊效應濾波器已應用於大方塊中的每個子方塊，因而每個像素可能已更新。因此，送去進行水平邊緣濾波的各子方塊係讀自執行單元池 740而非紋理快取760。因此，回路内去方塊效應濾波器290 藉由命令下面H)F_H264_x的順序以處理水平邊緣： IDF一H264一2 SRCl^address of(Rl5Cl); 34Client’s Docket No.:S3U06-0023 TT’s Docket No:0608-A41202-TW/fmal/林璟輝/2007/05/31 34 200803525 IDF—H264—2 SRCl=address of (R2，C1);BaseAddress parameter. Picturefjeig and the picturewidth input parameter are used to determine the range of the square, the lower left coordinate. Finally, the video graphics can be progressive (progessive) or interlaced. For interlaced scanning, it consists of two directions (upper and lower). The texture filtering unit 750 uses FieldFlag and TopFieldFlag to properly process the interlaced scanned image. 
The deblocking 8x4x8 bit output system is provided for a target temporary cry, and is also written back to the execution unit pool 740. Writing the deblocking output back to the execution unit pool 740 is a "m〇dify in place" operation, which is necessary in some decoder implementations, such as Η·264 where the pixel value in the box is 'right Below, it is calculated based on the previous results. However, the 1 decoder does not have this limitation relationship like H.264. in? (:-1, for each 85 (8 vertical (first vertical and then horizontal) filtering. All vertical edges can thus be performed in parallel on the enamel, 4x4 edges are filtered later. Parallelization can be utilized because there are only two pixels (one edge one) is updated, and these pixels are not used to calculate other edges. Since the block effect data is written back to the execution unit 29Client's Docket N〇.:S3U06-0023 TT's Docket N(x0608_A41202-TW/fmal/林璟辉/2007/ 05/31 29 200803525 Pool 740 instead of texture cache 760, provides a different IDFJI264 one finger, so that this sub-block is taken from different positions. This can be seen in the j. See, in llockAddress In the narrative, Data Block! and Qiu·: 轻轻脚_β_: Wolf: Yuan i sub-Leng block HDF-Η264-1 instruction from the texture cache 76〇撷 • Abandon two green clothes: , instrument and: ^^禄# 'Yuan Tear: Move the half to one. A,... :::: With the code, the device 1 is changed to the function of the JDFJ1264-X command. The following is a detailed description of the supply of pixel data to the video acceleration. Zhang Yuan, before, the texture filtering unit 750 and the competing unit pool 740 convert the remaining resources Processing of the material. The image data is replaced by the above-mentioned command parameters, and the coordinates of the sub-block address to be extracted from the texture cache 76 or from the execution unit pool 740 are provided to the texture filtering unit 75. 
The image data includes luma (Y) and chroma (Cb, Cr) planes. A YC flag input parameter defines whether the Y plane or the CbCr plane is to be processed. When processing luma (Y) data, as indicated by the YC flag parameter, the texture filtering unit 750 fetches the sub-block and provides the 128 bits as input to the VC-1 in-loop deblocking filter hardware acceleration logic circuit 400 (for example, as the block input parameter of the VC-1 accelerator example of Fig. 4). The resulting data is written to the destination registers as a register quad (that is, DST, DST+1, DST+2, DST+3).

When processing chroma data, as indicated by the YC flag parameter, the Cb and Cr blocks are processed consecutively by the VC-1 in-loop deblocking filter hardware acceleration logic circuit 400. The resulting data is written to texture cache 760. In some embodiments this occurs every cycle, with 256 bits per cycle.

Some video acceleration unit embodiments use interleaved CbCr planes, each stored at half width and half height. For these planes, the sub-block data is passed to the video acceleration unit through a buffer. In particular, the texture filtering unit 750 writes two 4x4 Cb blocks to the buffer and then writes two 4x4 Cr blocks to the buffer. The 8x4 Cb block is processed first by the VC-1 in-loop deblocking filter hardware acceleration logic circuit 400, and the resulting data is written to texture cache 760. The 8x4 Cr block is processed next by the VC-1 in-loop deblocking filter hardware acceleration logic circuit 400, and the resulting data is written to texture cache 760. The video acceleration unit 150 uses the CbCr flag parameter to manage this sequential processing.

Use of the deblocking acceleration instructions by the decoder

As described earlier in connection with Fig. 1, the decoder 160 executes on the host processor 110 but also uses the video acceleration instructions provided by the graphics processing unit 120. In particular, an H.264 embodiment of the in-loop deblocking filter 290 uses a specific combination of IDF_H264_x instructions to process the edges in the order specified by H.264, fetching some sub-blocks from texture cache 760 and others from the execution unit pool 740. When issued in the proper combination, these IDF_H264_x instructions accomplish a pixel read and write-back.

Fig. 8 is a block diagram of the 16x16 macroblock used in H.264. The macroblock is divided into 16 4x4 sub-blocks, each of which is deblocked. Each sub-block in Fig. 8 can be identified by row and column, for example (R1,C1). H.264 specifies that vertical edges are processed first and then horizontal edges, in the order shown in Fig. 8 (a-h). The deblocking filter is thus applied to the edge between a pair of sub-blocks, and the sub-block pairs are filtered in the following order:
edge a = [block to left of R1,C1] | [R1,C1]; [block to left of R2,C1] | [R2,C1]; [block to left of R3,C1] | [R3,C1]; [block to left of R4,C1] | [R4,C1]
edge b = [R1,C1] | [R1,C2]; [R2,C1] | [R2,C2]; [R3,C1] | [R3,C2]; [R4,C1] | [R4,C2]
edge c = [R1,C2] | [R1,C3]; [R2,C2] | [R2,C3]; [R3,C2] | [R3,C3]; [R4,C2] | [R4,C3]
edge d = [R1,C3] | [R1,C4]; [R2,C3] | [R2,C4]; [R3,C3] | [R3,C4]; [R4,C3] | [R4,C4]
edge e = [block above R1,C1] | [R1,C1]; [block above R1,C2] | [R1,C2]; [block above R1,C3] | [R1,C3]; [block above R1,C4] | [R1,C4]
edge f = [R1,C1] | [R2,C1]; [R1,C2] | [R2,C2]; [R1,C3] | [R2,C3]; [R1,C4] | [R2,C4]
edge g = [R2,C1] | [R3,C1]; [R2,C2] | [R3,C2]; [R2,C3] | [R3,C3]; [R2,C4] | [R3,C4]
edge h = [R3,C1] | [R4,C1]; [R3,C2] | [R4,C2]; [R3,C3] | [R4,C3]; [R3,C4] | [R4,C4]

For the first pair of sub-blocks, everything is loaded from the texture cache, because no pixels have yet been changed by applying the filter. Although filtering the first row of vertical edge (a) can change pixel values in (R1,C1), the vertical edge of the second row does not overlap the vertical edge of the first row. The (R2,C1) sub-block is therefore also fetched from texture cache 760. Since the vertical edges of two adjacent rows share no pixels, the same holds for the third and fourth pairs. Because the particular IDF_H264_x instruction issued determines where the pixel data is fetched from, the in-loop deblocking filter 290 issues the following sequence of IDF_H264_x instructions to process the first vertical edge:

IDF_H264_0 SRC1=address of (R1,C1);
IDF_H264_0 SRC1=address of (R2,C1);
IDF_H264_0 SRC1=address of (R3,C1);
IDF_H264_0 SRC1=address of (R4,C1);

Next, the in-loop deblocking filter 290 processes the second vertical edge (b), starting from (R1,C2).
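The edge ordering a-h can be written down compactly. The following Python sketch is an illustration only: the function name `edge_pairs` and the use of `None` for a neighboring macroblock are hypothetical, and it assumes that each vertical edge pairs horizontally adjacent sub-blocks in the same row and each horizontal edge pairs vertically adjacent sub-blocks in the same column.

```python
def edge_pairs():
    """Return {edge label: [(first sub-block, second sub-block), ...]}.

    Sub-blocks are (row, column) tuples, 1-based, matching (R, C) in Fig. 8.
    None stands for a sub-block that belongs to the neighboring macroblock
    (to the left for edge a, above for edge e).
    """
    edges = {}

    # Vertical edges a-d: boundary between column col-1 and column col.
    for label, col in zip("abcd", range(1, 5)):
        edges[label] = [
            ((row, col - 1) if col > 1 else None, (row, col))
            for row in range(1, 5)
        ]

    # Horizontal edges e-h: boundary between row row-1 and row row.
    for label, row in zip("efgh", range(1, 5)):
        edges[label] = [
            ((row - 1, col) if row > 1 else None, (row, col))
            for col in range(1, 5)
        ]

    return edges
```

Iterating over the returned dictionary visits the vertical edges first and then the horizontal edges, which matches the processing order stated for H.264 in this description.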
The leftmost 4 pixels of the 8x4 sub-block defined at (R1,C2) overlap the rightmost pixels of the (R1,C1) sub-block. These overlapping pixels, which were processed and possibly updated by the vertical edge filter for (R1,C1), are therefore read from the execution unit pool 740 rather than texture cache 760. However, the rightmost 4 pixels of the (R1,C2) sub-block have not yet been filtered and are therefore read from texture cache 760. The same holds for sub-blocks (R2,C2) through (R4,C2). The in-loop deblocking filter 290 accomplishes this by issuing the following sequence of IDF_H264_x instructions to process the second set of vertical edges:

IDF_H264_1 SRC1=address of (R1,C2);
IDF_H264_1 SRC1=address of (R2,C2);
IDF_H264_1 SRC1=address of (R3,C2);
IDF_H264_1 SRC1=address of (R4,C2);

Processing of the third set of vertical edges starts from (R1,C3). The leftmost pixels of the 8x4 sub-block at (R1,C3) overlap the rightmost pixels of the (R1,C2) sub-block and are therefore read from the execution unit pool 740 rather than texture cache 760. However, the rightmost 4 pixels of the (R1,C3) sub-block have not yet been filtered and are therefore read from texture cache 760. The same holds for sub-blocks (R2,C3) through (R4,C3). A similar situation occurs for the last set of vertical edges. The in-loop deblocking filter 290 therefore processes the remaining two sets of vertical edges by issuing the following sequence of IDF_H264_x instructions:

IDF_H264_1 SRC1=address of (R1,C3);
IDF_H264_1 SRC1=address of (R2,C3);
IDF_H264_1 SRC1=address of (R3,C3);
IDF_H264_1 SRC1=address of (R4,C3);
IDF_H264_1 SRC1=address of (R1,C4);
IDF_H264_1 SRC1=address of (R2,C4);
IDF_H264_1 SRC1=address of (R3,C4);
IDF_H264_1 SRC1=address of (R4,C4);

Next, the horizontal edges (e-h) are processed. At this point, the deblocking filter has been applied to every sub-block in the macroblock, so every pixel may have been updated.
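The vertical-edge instruction sequences follow a regular pattern: the first column uses IDF_H264_0 and every later column uses IDF_H264_1. A minimal Python sketch of that pattern is shown here. The function name `vertical_edge_instructions` is hypothetical, and the output strings simply mimic the notation used in this description; they are not an actual instruction encoding.

```python
def vertical_edge_instructions():
    """Build the IDF_H264_x sequence for vertical edges a-d of one macroblock.

    Column 1 (edge a) has no previously filtered pixels, so it uses
    IDF_H264_0. Columns 2-4 (edges b-d) overlap pixels filtered by the
    previous column, so they use IDF_H264_1.
    """
    sequence = []
    for col in range(1, 5):
        opcode = "IDF_H264_0" if col == 1 else "IDF_H264_1"
        for row in range(1, 5):
            sequence.append(f"{opcode} SRC1=address of (R{row},C{col});")
    return sequence


if __name__ == "__main__":
    # Print the 16 vertical-edge instructions in issue order.
    for instruction in vertical_edge_instructions():
        print(instruction)
```

Generating the sequence from the column index keeps the choice of fetch location in one place, which is the property the text relies on when it says that the issued instruction determines where the pixel data is read from.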
Therefore, each sub-block submitted for horizontal edge filtering is read from the execution unit pool 740 rather than texture cache 760. The in-loop deblocking filter 290 thus processes the horizontal edges by issuing the following sequence of IDF_H264_x instructions:

IDF_H264_2 SRC1=address of (R1,C1);
IDF_H264_2 SRC1=address of (R2,C1);

IDF_H264_2 SRC1=address of (R3,C1);
IDF_H264_2 SRC1=address of (R4,C1);
IDF_H264_2 SRC1=address of (R1,C2);
IDF_H264_2 SRC1=address of (R2,C2);
IDF_H264_2 SRC1=address of (R3,C2);
IDF_H264_2 SRC1=address of (R4,C2);
IDF_H264_2 SRC1=address of (R1,C3);

Any process descriptions or blocks in flowcharts should be understood as representing modules, segments, or portions of code that include one or more executable instructions for implementing specific logical functions or steps in the process. Those skilled in the software arts will understand that other implementations are also included within the scope of the disclosure.
In these other implementations, functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved.

The systems and methods disclosed herein can be implemented in software, hardware, or a combination thereof. In some embodiments, the system and/or method is implemented in software that is stored in memory and executed by a suitable processor located in a computing device (including but not limited to a microprocessor, microcontroller, network processor, reconfigurable processor, or extensible processor). In other embodiments, the system and/or method is implemented in logic, including but not limited to a programmable logic device (PLD), a programmable gate array (PGA), a field programmable gate array (FPGA), or an application-specific integrated circuit (ASIC). In still other embodiments, the logic is implemented in a graphics processor or graphics processing unit (GPU).

The systems and methods disclosed herein can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device. Such instruction execution systems include any computer-based system, processor-containing system, or other system that can fetch and execute instructions from the instruction execution system. In the context of this disclosure, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system. The computer-readable medium can be, for example and without limitation, a system or propagation medium based on electronic, magnetic, optical, electromagnetic, infrared, or semiconductor technology.
Specific examples of a computer-readable medium using electronic technology include, without limitation: an electrical (electronic) connection having one or more wires; a random access memory (RAM); a read-only memory (ROM); and an erasable programmable read-only memory (EPROM or flash memory). A specific example of a computer-readable medium using magnetic technology includes, without limitation, a portable computer diskette. Specific examples of a computer-readable medium using optical technology include, without limitation, an optical fiber and a portable compact disc read-only memory (CD-ROM).

Although the invention is illustrated and described herein as embodied in one or more specific examples, it is not intended to be limited to the details shown, since many modifications and structural changes may be made without departing from the spirit of the invention and within the scope and range of equivalents of the claims. Accordingly, the appended claims are preferably construed broadly and in a manner consistent with the scope of the invention, as set forth in the claims that follow.

[Brief description of the drawings]
Fig. 1 is a block diagram of a computing platform used for graphics and video decoding.
Fig. 2: [description illegible in the source].
Fig. 3: [description partly illegible in the source; it relates to the VC-1 filter].
Fig. 4 is a listing of hardware description pseudocode for the VC-1 in-loop acceleration logic circuit 400 of Fig. 1.
Fig. 5 is a listing of hardware description language code for the line acceleration logic circuit 500 of Fig. 4.

Figs. 6A-6D together form a block diagram of the line acceleration logic circuit of Figs. 4 and 5.
Fig. 7 is a data flow diagram of the graphics processing unit 120 of Fig. 1.
Fig. 8 is a block diagram of the 16x16 macroblock used in H.264.

[Description of main reference numerals]
100: system; 110: general-purpose CPU; 120: graphics processing unit (GPU); 130: memory; 140: bus; 150: video acceleration unit (VPU); 160: software decoder; 170: video acceleration driver.
205: incoming bit stream; 210: entropy decoder; 215: spatial decoder; 220: inverse quantizer; 230: inverse discrete cosine transform; 235: picture; 245: motion vectors; 250: motion compensation; 255: previously decoded picture; 265: predicted picture; 270: spatial compensation; 280: adder; 290: deblocking filter; 295: decoded picture.
310, 320: two adjacent 4x4 sub-blocks; 330: vertical boundary.
400: in-loop deblocking filter hardware acceleration logic circuit; 410: module definition section; 420: iteration loop; 430: test vertical parameter section; 440: compare loop parameter with 3 section; 450: instantiation section.
500: line acceleration logic circuit; 510: module definition section; 520: pixel value computation section; 530: compare loop parameter with 3 section; 540: test DO_FILTER section; 550: update state section.
605, 610, 615, 620: multiplexers; 625, 630, 679: subtractors; 635, 640, 655, 680: logic blocks; 645, 650: adders; 660, 665, 670: registers; 671: P4 register output; 673: P5 register output; 681: subtractor; 685: adder; 687, 689, 691, 693: multiplexers; 697: OR gate.
710: instruction stream processor; 720: instructions; 730: instruction data; 740: execution unit pool; 750: texture filtering unit; 760: texture cache; 770: packer.

Claims

1. A deblocking filter for video decoding, comprising:
a first logic circuit configured to determine whether the pixels of a predetermined pixel group among a plurality of pixel groups meet a criterion;
a second logic circuit configured to first filter the pixels of the predetermined pixel group when the criterion is met; and
a third logic circuit configured to, when the criterion is met, sequentially filter the remaining pixel groups of the plurality of pixel groups according to a corresponding set of taps in a plurality of sets of taps.

2. The deblocking filter of claim 1, wherein the plurality of pixel groups form a square pixel block, and each pixel group of the plurality of pixel groups comprises a row of the pixel block.

3. The deblocking filter of claim 1, wherein the plurality of pixel groups form a square pixel block, and each pixel group of the plurality of pixel groups comprises a column of the pixel block.

4. The deblocking filter of claim 1, wherein the third logic circuit further comprises:
a fourth logic circuit configured to update a first predetermined group of pixels in each of the remaining pixel groups according to a second predetermined group of pixels in the remaining pixel groups.

5. The deblocking filter of claim 1, wherein the second logic circuit further comprises:
a fifth logic circuit configured to filter the pixels of the predetermined pixel group in parallel when the criterion is met.

6. The deblocking filter of claim 1, wherein the deblocking filter [remainder illegible in the source].

7. The deblocking filter of claim [illegible], wherein the deblocking filter [remainder illegible in the source; the legible fragment mentions graphics processing instructions].

8. [Illegible in the source.]

9. A video decoder, comprising:
an entropy decoder that receives an incoming coded bit stream;
a spatial decoder that receives the output of the entropy decoder and produces a coded picture comprising a plurality of pixels;
a combining logic circuit configured to combine a current picture with a prediction picture to produce a combined picture; and
an in-loop deblocking filter that receives the combined picture, the in-loop deblocking filter comprising:
a logic circuit configured to filter a predetermined pixel group; and
a third logic circuit configured to, if the predetermined pixel group meets a criterion, filter each of the remaining pixel groups of the plurality of pixel groups according to a corresponding set of taps in a plurality of sets of taps.

10. The video decoder of claim 9, wherein the plurality of pixel groups form a square pixel block, and each pixel group of the plurality of pixel groups comprises a row of the pixel block.

11. The video decoder of claim 9, wherein the plurality of pixel groups form a square pixel block, and each pixel group of the plurality of pixel groups comprises a column of the pixel block.

12. The video decoder of claim 9, [remainder illegible in the source].

13. The video decoder of claim [illegible], further comprising:
a fifth logic circuit configured to update a first predetermined group of pixels in each of the remaining pixel groups according to a second predetermined group of pixels in the remaining pixel groups.

14. The video decoder of claim 9, wherein the in-loop deblocking filter is a filter defined in accordance with the VC-1 standard.

15. A graphics processing unit, comprising:
a host processor interface that receives at least one video acceleration instruction; and
a video acceleration unit responsive to the at least one video acceleration instruction, the video acceleration unit comprising an in-loop deblocking filter, the in-loop deblocking filter comprising:
a first logic circuit configured to determine whether the pixels of a predetermined pixel group among a plurality of pixel groups meet a first criterion;
a second logic circuit configured to first filter the pixels of the predetermined pixel group when the first criterion is met; and
a third logic circuit configured to, when the first criterion is met, sequentially filter the remaining pixel groups of the plurality of pixel groups according to a corresponding set of taps in a plurality of sets of taps.

16. The graphics processing unit of claim 15, wherein the plurality of pixel groups form a square pixel block, and each pixel group of the plurality of pixel groups comprises a row of the pixel block.

17. The graphics processing unit of claim 15, wherein the plurality of pixel groups form a square pixel block, and each pixel group of the plurality of pixel groups comprises a column of the pixel block.

18. [Largely illegible in the source; the legible fragments recite a logic circuit and predetermined pixel groups.]

19. [Illegible in the source.]