TW200821982A

TW200821982A - Decoding method

Info

Publication number: TW200821982A
Application number: TW96120726A
Authority: TW
Inventors: Zahid Hussain; John Brothers; Duc Huy Bui
Original assignee: Via Tech Inc
Priority date: 2006-06-08
Filing date: 2007-06-08
Publication date: 2008-05-16
Also published as: CN101072349A; CN101072353B; TWI344795B; CN101072353A; TWI348653B; TW200813884A; TW200803526A; TWI428850B; TW200809689A; CN101072349B; CN101072350A; CN101087411A; CN101072350B; TWI354239B

Abstract

Various embodiments of decoding systems and methods are disclosed. One method embodiment, among others, comprises providing a shader configurable with a plurality of instruction sets to decode a video stream coded according to a plurality of different coding methods, loading the shader having one of the plurality of instruction sets into a variable length decoding (VLD) unit of a software programmable core processing unit for execution, and decoding the video stream by executing the shader on the VLD unit.
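The flow in the abstract amounts to a lookup followed by a load. The shader has several instruction sets, one is chosen according to the stream's coding method, and that one is loaded into the VLD unit. The Python sketch below is minimal, and the enum, the instruction-set identifiers and the `VLDUnit` class are all hypothetical stand-ins that do not come from the patent.

```python
from enum import Enum


class CodingMethod(Enum):
    # Coding methods named in the description; the enum itself is illustrative.
    H264_CABAC = "H.264 CABAC"
    H264_CAVLC = "H.264 CAVLC"
    MPEG2 = "MPEG-2"
    VC1 = "VC-1"


# Hypothetical instruction-set identifiers, one per coding method.
INSTRUCTION_SETS = {
    CodingMethod.H264_CABAC: "vld_isa_cabac",
    CodingMethod.H264_CAVLC: "vld_isa_cavlc",
    CodingMethod.MPEG2: "vld_isa_mpeg2",
    CodingMethod.VC1: "vld_isa_vc1",
}


class VLDUnit:
    """Toy stand-in for the VLD unit: it only remembers which shader is loaded."""

    def __init__(self) -> None:
        self.loaded_instruction_set = None

    def load_shader(self, instruction_set: str) -> None:
        self.loaded_instruction_set = instruction_set

    def decode(self, stream: bytes) -> str:
        if self.loaded_instruction_set is None:
            raise RuntimeError("no shader loaded")
        # A real VLD unit would execute the shader here; the sketch just reports it.
        return f"{len(stream)} bytes via {self.loaded_instruction_set}"


def decode_stream(vld: VLDUnit, method: CodingMethod, stream: bytes) -> str:
    # Same hardware for every method; only the loaded instruction set changes.
    vld.load_shader(INSTRUCTION_SETS[method])
    return vld.decode(stream)
```

The design point this illustrates is that adding another coding method means adding an instruction set, with no change to the unit itself.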

Description

IX. DESCRIPTION OF THE INVENTION

TECHNICAL FIELD
The present invention relates to data processing systems, and more particularly to programmable graphics processing systems and methods.

PRIOR ART
Computer graphics is the technique of using a computer to generate pictures, images, or other graphical or pictorial information. Many graphics systems today are built on interfaces such as Microsoft's Direct3D and OpenGL. These interfaces let a computer running a particular operating system (for example, Microsoft Windows) control multimedia hardware such as a graphics accelerator or a graphics processing unit (GPU). Generating a picture or image is generally called rendering, and the details of this operation are carried out mainly by the graphics accelerator. In three-dimensional (3D) computer graphics, the geometry representing the surfaces (or volumes) of objects in a scene is converted into pixels (picture elements) and stored in a frame buffer, and is then shown on a display device. Each object or group of objects has particular visual properties related to its surface appearance, such as material, reflectance, shape and texture, and these can be defined as the rendering context of that object or group. Consumer demand for more control and more features in games and other multimedia products pushes computer graphics toward more realistic images and toward better processing speed and power consumption. Many standards have been developed that can use fewer bits [...]

Client's Docket No.: S3U06-0014-TW; TT's Docket No: 0608-A41247twf.doc/NikeyChen

[...] an H.264-compatible codec uses only about one third of the bits [...] to encode video while maintaining similar video quality. The H.264 specification provides two types of entropy coding: context-adaptive binary arithmetic coding (CABAC) and context-adaptive variable length coding (CAVLC). Many purely software or purely hardware solutions have been proposed to meet these continually changing requirements. However, these known approaches lead to higher inventory, technology that quickly becomes obsolete, and a lack of design flexibility.

SUMMARY OF THE INVENTION
The present invention discloses a decoding method for a multithreaded parallel computational core of a graphics processing unit. The method comprises decoding, by executing a shader, in a variable length decoding unit embedded in a programmable core processing unit, where the decoding is performed according to a plurality of different coding methods. The method further comprises providing a decoded data output.

DETAILED DESCRIPTION
To make the above and other objects, features, and advantages of the present invention clearer, preferred embodiments are described in detail below with reference to the accompanying drawings.

Various embodiments of decoding systems and methods are disclosed, and they are referred to collectively below as the decoding system. In one embodiment, the decoding system is embedded in one or more execution units of a programmable, multithreaded, parallel computational core of a graphics processing unit (GPU). The decoding functions are implemented with a combination of software and hardware. In other words, video decoding is accomplished within the context of GPU programming, together with hardware implemented in the GPU data path. For example, in one embodiment the decoding operations or methods are carried out by three parts: a shader with an extended instruction set (for example, a vertex shader), the execution unit data path of the GPU, and additional hardware that automatically manages a bitstream buffer. In contrast, existing systems rely on purely hardware or purely software solutions and therefore run into the problems noted in the prior art. The decoding system described herein can decode information that has been encoded with a plurality of entropy coding techniques. It can decode the CABAC and CAVLC coding of the International Telecommunication Union Telecommunication Standardization Sector (ITU-T) H.264 standard, and it can also decode according to the MPEG-2 and VC-1 standards. Each embodiment of the decoding system operates in one of a plurality of modes, and each mode corresponds to one of these standards. It operates by executing one or more instruction sets received from the GPU frame buffer memory or from memory associated with the host processor (for example, a host central processing unit (CPU)). These instruction sets arrive through known mechanisms such as preloading, or on a cache miss. The hardware can be reused to support several decoding standards, according to the selected mode. The selected mode also affects how context memory is initialized, used, and/or updated. Depending on the active decoding mode, the decoding system may use Exp-Golomb coding, Huffman-like coding (for example, CAVLC, MPEG-2 and VC-1), and/or arithmetic coding (for example, CABAC).
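Exp-Golomb coding, one of the methods listed above, is simple enough to show in full. The Python sketch below decodes the H.264-style unsigned `ue(v)` and signed `se(v)` codes. For `ue(v)`, the decoder counts leading zero bits `n`, reads `n` more bits, and returns `2^n - 1` plus that suffix. For `se(v)`, it maps the codes 0, 1, 2, 3, 4 onto 0, +1, -1, +2, -2. The `BitReader` class is a hypothetical stand-in for the hardware bitstream buffer. It counts the bits it consumes (`pos`), which is the kind of bit tracking the description relies on for spotting malformed streams.

```python
class BitReader:
    """MSB-first reader over a byte string that tracks how many bits were consumed."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # number of bits consumed so far

    def read_bit(self) -> int:
        if self.pos >= 8 * len(self.data):
            raise EOFError("bitstream exhausted")  # e.g. a truncated or corrupt stream
        byte = self.data[self.pos >> 3]
        bit = (byte >> (7 - (self.pos & 7))) & 1
        self.pos += 1
        return bit

    def read_bits(self, n: int) -> int:
        value = 0
        for _ in range(n):
            value = (value << 1) | self.read_bit()
        return value


def read_ue(reader: BitReader) -> int:
    """Unsigned Exp-Golomb ue(v): n leading zeros, a 1, then an n-bit suffix."""
    leading_zeros = 0
    while reader.read_bit() == 0:
        leading_zeros += 1
    return (1 << leading_zeros) - 1 + reader.read_bits(leading_zeros)


def read_se(reader: BitReader) -> int:
    """Signed Exp-Golomb se(v): code k maps to 0, +1, -1, +2, -2, ..."""
    k = read_ue(reader)
    return (k + 1) // 2 if k % 2 else -(k // 2)
```

For example, the bits `1 010 011 00100` (bytes `0xA6 0x40`) decode to `ue` values 0, 1, 2 and 3, after 12 bits have been consumed.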
The entropy decoding methods are performed by extending the instruction set of one or more execution units and by providing additional hardware that manages the bitstream automatically, so that the context models used in CAVLC and CABAC decoding can be executed. In one embodiment, the entropy coding tables are held in separate memory tables or other data structures (for example, read-only memory (ROM) tables). The automatic bitstream buffer also has advantages. For example, once the direct memory access (DMA) engine of the bitstream buffer knows the location (address) of the bitstream, it manages the bitstream automatically without further instructions. In conventional microprocessor/digital signal processor (DSP) systems, by contrast, bitstream management represents a large overhead [...]. Furthermore, by tracking the number of bits consumed, the decoding system can detect and handle erroneous bitstreams.

CABAC decoding is a highly sequential operation that is difficult to speed up with multithreading. Various embodiments therefore use a forwarding mechanism (for example, register forwarding) to reduce the effective dependency latency [...] of deep-pipeline, multithreaded processors that execute instructions every cycle. Conventional forwarding compares the destination address of a previous result with the source address of an operand of the next instruction, and forwards the result when the two addresses are equal. This requires complex comparators. In some embodiments of the decoding system, the forwarding type is instead encoded in the instruction bits, whether the operand comes from a register or from a previous result (for example, one held internally). One example uses 2 bits in total, 1 bit for each [...]. In this way, overall latency is reduced and the efficiency of the processor pipeline is improved.
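The contrast between the two forwarding schemes can be sketched in Python. Conventional forwarding compares register numbers at run time, with one comparator per source operand. The tagged scheme reads bits that were set when the instruction was encoded. The 2-bit layout used here is an assumption, since the patent text at this point is garbled: bit `i` means "source operand `i` takes the previous instruction's result". `Instr`, `encode_forwarding` and both operand-fetch helpers are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Instr:
    dst: int
    srcs: Tuple[int, int]
    # Hypothetical 2-bit forwarding field: bit i set means source i comes
    # from the previous instruction's result (assumed: one bit per source).
    fwd: int = 0


def encode_forwarding(instr: Instr, prev_dst: Optional[int]) -> Instr:
    """Encode-time pass: the comparison happens once, in software, not per cycle."""
    instr.fwd = sum(1 << i for i, s in enumerate(instr.srcs) if s == prev_dst)
    return instr


def operands_by_compare(instr: Instr, regs: Dict[int, int],
                        prev_dst: Optional[int], prev_result: int) -> List[int]:
    """Conventional forwarding: compare register numbers at run time."""
    return [prev_result if s == prev_dst else regs[s] for s in instr.srcs]


def operands_by_tag(instr: Instr, regs: Dict[int, int], prev_result: int) -> List[int]:
    """Tagged forwarding: test the instruction's forward bits; no address compare."""
    return [prev_result if (instr.fwd >> i) & 1 else regs[s]
            for i, s in enumerate(instr.srcs)]
```

In this sketch both helpers pick the same operands. The tagged version only tests a bit, which is the saving the description attributes to encoding the forwarding type in the instruction.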
FIG. 1 is a block diagram of one embodiment of a graphics processing system 100 in which embodiments of the decoding systems and methods are implemented. In some embodiments, the graphics processing system 100 may be a computer system. The graphics processing system 100 includes a display device driven by a display interface unit (DIU) 104, and local memory 106, which may include, for example, a display buffer, frame buffer, texture buffer and command buffer. Local memory 106 may also be referred to as a frame buffer or storage unit. Local memory 106 is coupled to a graphics processing unit (GPU) 114 through one or more memory interface units (MIU) 110. In one embodiment, the MIU 110, the GPU 114 and the DIU 104 are all coupled to a bus interface unit (BIU) 118 that is compatible with Peripheral Component Interconnect Express (PCI-E). In one embodiment, the BIU 118 may use a graphics address remapping table (GART), although other memory mapping mechanisms may be used. The GPU 114 includes the decoding system 200, described below. The decoding system 200 is shown as a component within the GPU 114. In some embodiments, however, it may also include one or more additional or different components of the illustrated graphics processing system 100. The BIU 118 is coupled to a chipset 122 (for example, a north bridge chipset) or a switch.
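A graphics address remapping table (GART) lets physical pages scattered through system memory appear to the GPU as one contiguous aperture. The Python sketch below is minimal and assumes 4 KiB pages and a flat one-level table. The patent does not specify the page size or table format, and the `Gart` class is hypothetical.

```python
from typing import List

PAGE_SIZE = 4096  # assumed page size; not specified in the patent


class Gart:
    """Toy GART: maps aperture page i to physical page frame entries[i]."""

    def __init__(self, aperture_base: int, entries: List[int]):
        self.aperture_base = aperture_base
        self.entries = entries  # physical frame number for each aperture page

    def translate(self, gpu_addr: int) -> int:
        offset_in_aperture = gpu_addr - self.aperture_base
        if not 0 <= offset_in_aperture < len(self.entries) * PAGE_SIZE:
            raise ValueError("address outside GART aperture")
        page, offset = divmod(offset_in_aperture, PAGE_SIZE)
        # Keep the in-page offset and swap the page number for the frame number.
        return self.entries[page] * PAGE_SIZE + offset
```

With `Gart(0x1000_0000, [0x50, 0x23])`, the address `0x1000_0010` translates to `0x50010` and `0x1000_1004` translates to `0x23004`, even though the two frames are far apart in physical memory.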
The chipset 122 includes interface electronics that strengthen signals from a central processing unit (CPU) 126, also called the host processor. The same electronics also separate signals to and from system memory 124 from signals to and from input/output (I/O) devices (not shown). Although a PCI-E bus protocol is described, some embodiments may use other connections and/or communication schemes between the host processor and the GPU 114, for example PCI or a proprietary high-speed bus. System memory 124 also contains driver software 128, which uses the CPU 126 to send instruction sets or commands to registers in the GPU 114. In some embodiments, additional graphics processing units may be coupled to the components of FIG. 1 through the chipset 122 over the PCI-E bus protocol. In one embodiment, the graphics processing system 100 may include all of the components shown in FIG. 1, or it may include fewer components and/or different ones. Furthermore, some embodiments may use additional components, such as a south bridge chipset coupled to the chipset 122.

FIG. 2 is a block diagram of a processing environment in which one embodiment of the decoding system 200 is implemented. In particular, the GPU 114 includes a graphics processor 202. The graphics processor 202 includes a multiple execution unit (EU) computational core 204, also called a software programmable core processing unit. In one embodiment, the computational core 204 includes the decoding system 200, also called the VLD unit. The decoding system 200 is embedded in the execution unit data path (EUDP), which is allocated to one or more of the execution units. The graphics processor 202 also includes an execution unit pool (EUP) control and vertex/stream cache unit 206, referred to herein as the EU pool control unit 206. It further includes a graphics pipeline 208 with fixed-function logic units, for example a triangle set-up unit (TSU) and a span-tile generator (STG), which are described below. The computational core 204 comprises a pool of execution units that meets the computing requirements of the shading tasks of different shader programs. These programs include vertex shaders, geometry shaders and/or pixel shaders that process data for the graphics pipeline 208. In one embodiment, the functions of the decoding system 200 are performed by a shader executing on the computational core 204. For that reason, an embodiment of the graphics processor is described first, followed by specific embodiments of the decoding system 200.

The decoding system 200 may be implemented in hardware, software, firmware, or a combination of these. In preferred embodiments, the decoding system 200 is implemented in hardware and software, using any one or a combination of the following well-known technologies:
- discrete logic circuits with logic gates that implement logic functions on data signals;
- an application specific integrated circuit (ASIC) with appropriate combinational logic gates;
- a programmable gate array (PGA);
- a field programmable gate array (FPGA);
- a state machine; and so on.

FIG. 3 and FIG. 4 are block diagrams of selected components of an embodiment of the graphics processor 202. As noted above, one embodiment of the decoding system 200 may be a shader within the graphics processor 202 that has an extended instruction set and additional hardware components. An embodiment of the graphics processor 202 and its corresponding processing are described below.
FIG. 3 and FIG. 4 do not show every component involved in graphics processing, but the components shown are sufficient for those skilled in the art to understand the functions and architecture of the graphics processor. Referring to FIG. 3, the computational core 204 sits at the center of the programmable processing environment. It includes the decoding system 200 and can process a variety of instructions. Different types of shader programs, such as vertex, geometry and pixel shader programs, can be executed on or mapped to the computational core 204. The computational core 204 is a multi-issue processor and can process multiple instructions within a single clock cycle. As shown in FIG. 3, the relevant components of the graphics processor 202 are:
- the computational core 204;
- a texture filtering unit 302;
- a pixel packer 304;
- a command stream processor 306;
- a write-back unit 308;
- a texture address generator 310.

FIG. 3 also includes the EU pool control unit 206, which contains a vertex cache and/or a stream cache. For example, as shown in FIG. 3, the texture filtering unit 302 provides texel data to the computational core 204 (inputs A and B). In some embodiments, the texel data is 512-bit data. The pixel packer 304 provides pixel shader inputs to the computational core 204 (inputs C and D), also in a 512-bit data format. In addition, the pixel packer 304 requests pixel shader tasks from the EU pool control unit 206, which returns an assigned EU number and thread number to the pixel packer 304. The pixel packer 304 and the texture filtering unit 302 are well known and are not described further here. FIG. 3 shows 512-bit pixel and texel packets, but in some embodiments the packet size may differ, depending on the desired performance characteristics of the graphics processor 202.
The command stream processor 306 provides triangle vertex indices to the EU pool control unit 206. In the embodiment of FIG. 3, the indices are 256-bit data. The EU pool control unit 206 assembles vertex shader inputs from the stream cache and sends them to the computational core 204 (input E). It also assembles geometry shader inputs and sends them to the computational core 204 (input F). In addition, the EU pool control unit 206 controls the EU input 402 and the EU output 404 (FIG. 4). In other words, the EU pool control unit 206 controls the inflow to and the outflow from the computational core 204.

After processing, the computational core 204 provides pixel shader outputs (outputs J1 and J2) to the write-back unit 308. The pixel shader outputs include color information such as red/green/blue/alpha (RGBA), as is well known in the art. The pixel shader output may be carried as two 512-bit data streams, and other embodiments may use other bit widths. In the same way, the computational core 204 outputs texture coordinates containing UVRQ information (outputs K1 and K2) to the texture address generator 310. The texture address generator 310 issues texture descriptor requests to the L2 cache 408 of the computational core 204 (input X). The L2 cache 408 returns texture descriptor data to the texture address generator 310 (output W). The texture address generator 310 and the write-back unit 308 are well known and are not described further here. The UVRQ and RGBA data are shown as 512 bits wide, but this width may vary between embodiments.
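The 512-bit width works out neatly for RGBA pixels. If each pixel's RGBA value takes 128 bits, one 512-bit word holds exactly four pixels, because 512 / 128 = 4. The Python sketch below assumes each pixel is stored as four little-endian 32-bit floats (R, G, B, A). The patent does not state the channel encoding, and the helper names are hypothetical.

```python
import struct

CHANNEL_BITS = 512
BITS_PER_PIXEL = 128  # assumed: four 32-bit float channels (R, G, B, A)
PIXELS_PER_WORD = CHANNEL_BITS // BITS_PER_PIXEL  # 512 / 128 = 4


def pack_rgba_quad(pixels):
    """Pack four (r, g, b, a) float tuples into one 64-byte (512-bit) word."""
    if len(pixels) != PIXELS_PER_WORD:
        raise ValueError("expected exactly four pixels")
    return b"".join(struct.pack("<4f", *p) for p in pixels)


def unpack_rgba_quad(word: bytes):
    """Inverse of pack_rgba_quad: 16 bytes (128 bits) per pixel."""
    return [struct.unpack_from("<4f", word, 16 * i) for i in range(PIXELS_PER_WORD)]
```

UVRQ texture coordinates have the same shape (four 32-bit components), so under the same assumption they would pack the same way.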
In the embodiment of FIG. 3, the bus is divided into two 512-bit channels. Each channel carries the 128-bit RGBA color values and 128-bit UVRQ texture coordinates of four pixels. The graphics pipeline 208 provides fixed-function graphics processing. The driver software 128 may issue a command such as drawing a triangle. In response, vertex information passes through vertex shader logic in the computational core 204, which performs the vertex transformations. In particular, objects are transformed from object space into triangles in work space and/or screen space. The triangles pass from the computational core 204 to the triangle set-up unit of the graphics pipeline 208. This unit assembles primitives and performs well-known tasks such as bounding box generation, culling, edge function generation and triangle-level rejection. The triangle set-up unit then passes data to the span and tile generation unit of the graphics pipeline 208, which provides tile generation. The data objects are thus divided into tiles (for example, 8x8 or 16x16). These tiles go to other fixed-function units that perform depth (z-value) processing, such as high-level z-value culling, in which the high level uses fewer bits than the low level for a similar process. The z-values are then passed back to the pixel shader logic in the computational core 204, which performs pixel shading based on the received texture and pipelined data. The computational core 204 outputs the processed values to a destination unit in the graphics pipeline 208. The destination unit performs alpha and stencil tests before the values in the various caches are updated. Note also that 512-bit vertex cache spill data is transferred between the L2 cache 408 of the computational core 204 and the EU pool control unit 206.
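The triangle set-up steps named above can be sketched in Python using the standard edge-function formulation. These steps are bounding box generation, edge function generation, triangle-level rejection and the hand-off to tile generation. The function names are hypothetical. A real TSU/STG would work in fixed point and in hardware, while this sketch uses floats for clarity.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def edge(a: Point, b: Point, p: Point) -> float:
    """Edge function for directed edge a->b: >0 on one side, <0 on the other, 0 on the line."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def setup_triangle(v0: Point, v1: Point, v2: Point, tile: int = 8) -> List[Tuple[int, int]]:
    """Return the tiles overlapped by the triangle's bounding box, or [] if rejected."""
    area2 = edge(v0, v1, v2)  # twice the signed area; its sign gives the winding
    if area2 == 0:
        return []  # degenerate (zero-area) triangle: rejected at triangle level
    xs = (v0[0], v1[0], v2[0])
    ys = (v0[1], v1[1], v2[1])
    # Bounding box, converted to tile indices (tile = 8 or 16, for example).
    tx0, tx1 = math.floor(min(xs)) // tile, math.floor(max(xs)) // tile
    ty0, ty1 = math.floor(min(ys)) // tile, math.floor(max(ys)) // tile
    return [(tx, ty) for ty in range(ty0, ty1 + 1) for tx in range(tx0, tx1 + 1)]


def covers(v0: Point, v1: Point, v2: Point, p: Point) -> bool:
    """Point-in-triangle test: all three edge functions share a sign (either winding)."""
    e = (edge(v0, v1, p), edge(v1, v2, p), edge(v2, v0, p))
    return all(x >= 0 for x in e) or all(x <= 0 for x in e)
```

The sketch conservatively returns every tile in the bounding box. A span-tile generator could also use the edge functions to skip tiles the triangle never touches. The sign of `area2` is the quantity a back-face cull would test.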
In addition, the computational core 204 outputs two 512-bit vertex cache write data streams (outputs M1 and M2) to the EU pool control unit 206 for further processing. FIG. 4 shows additional components of the computational core 204 and related components. The computational core 204 includes an execution unit pool 412. In one embodiment, the execution unit pool 412 includes one or more execution units 420a-420h, referred to collectively as execution units 420. Each execution unit 420 can process multiple instructions within one clock cycle, so at peak load the execution unit pool 412 can process multiple threads simultaneously or substantially simultaneously. FIG. 4 shows eight execution units 420 (labeled EU0-EU7), but the number is not limited to eight and may be increased or decreased in some embodiments. At least one execution unit (for example, execution unit 420a, EU0) contains an embodiment of the decoding system 200, described further below. The computational core 204 also includes a memory access unit 406, which is coupled to the L2 cache 408 through a memory interface arbiter 410. The L2 cache 408 receives vertex cache spill data from the EU pool control unit 206 (input G) and provides vertex cache spill data to the EU pool control unit 206 (output H). The L2 cache 408 also receives texture descriptor requests from the texture address generator 310 (input X). In response, it provides texture descriptor data to the texture address generator 310 (output W). The memory interface arbiter 410 provides a control interface to local video memory, for example a frame buffer or local memory 106. The bus interface unit 118 provides an interface to the system, for example through a PCI-E bus. Together, the memory interface arbiter 410 and the bus interface unit 118 provide the interface between memory and the L2 cache 408.
In some embodiments, the L2 cache 408 connects to system memory through the memory access unit 406, the memory interface arbiter 410 and the bus interface unit 118. The memory access unit 406 translates virtual memory addresses received from the L2 cache 408 and other blocks into physical memory addresses. The memory interface arbiter 410 provides the L2 cache 408 with the following kinds of access:
- memory access (for example, read/write access);
- fetching of instructions, constants, data and textures;
- direct memory access (for example, load/store);
- indexing of register accesses;
- register spills;
- vertex cache content spills;
and so on.

The computational core 204 also includes an EU input 402, which provides inputs to the execution unit pool 412, and an EU output 404, which receives the pool's outputs. The EU input 402 and the EU output 404 may be crossbars, buses, or other known input and output architectures. The EU input 402 receives vertex shader inputs (input E) and geometry shader inputs (input F) from the EU pool control unit 206. It provides these inputs to the execution unit pool 412 for processing by the execution units 420. The EU input 402 also receives pixel shader inputs (inputs C and D) and texel packets (inputs A and B), and it passes these packets to the execution unit pool 412 for processing by the execution units 420. Furthermore, the EU input 402 receives information read from the L2 cache 408 and provides it to the execution unit pool 412 when needed. In the embodiment of FIG. 4, the EU output 404 is divided into an even output 404a and an odd output 404b. Like the EU input 402, the EU output 404 may be a crossbar, a bus, or another known architecture. The even output 404a handles the outputs of the even execution units 420a, 420c, 420e and 420g. The odd output 404b handles the outputs of the odd execution units 420b, 420d, 420f and 420h. Together, the even output 404a and the odd output 404b receive the outputs of the execution unit pool 412, such as UVRQ and RGBA data. These outputs may take one of three paths:
- they may be returned to the L2 cache 408;
- they may be sent from the computational core 204 to the write-back unit 308 through outputs J1 and J2;
- they may be sent to the texture address generator 310 through outputs K1 and K2.

Execution in the execution unit pool 412 generally runs at several levels: a rendering context level, a thread or task level, and an instruction or execution level. At any point in time, each execution unit 420 may accept two rendering contexts, which are identified by a one-bit flag or another mechanism. The EU pool control unit 206 passes context information to the execution units before tasks belonging to that context begin. Context-level information may include the following:
- the shader type;
- the number of input/output registers;
- the instruction start address;
- the output mapping table;
- vertex identifiers;
- constants in the respective constant buffers.

Each execution unit 420 of the execution unit pool 412 can hold multiple tasks or threads at the same time (for example, 32 threads in some embodiments). In one embodiment, each thread fetches instructions according to its program counter. The EU pool control unit 206 acts as the global task scheduler. It uses a data-driven approach, based for example on the vertex, pixel and geometry packets in its inputs, to assign threads to the appropriate execution units 420. For example, the EU pool control unit 206 assigns a thread to an empty thread slot in one of the execution units 420 of the execution unit pool 412. Once the thread starts executing, data supplied by the vertex cache or by other components or modules (depending on the shader type) is placed in a general-purpose register buffer. In general, the graphics processor 202 uses programmable vertex, geometry and pixel shaders. Rather than implementing the functions or operations of these components as separate fixed-function units with different designs and instruction sets,
The execution unit pool control unit 206 acts as the global task scheduler. It assigns appropriate threads in the execution units 420 using a data-driven approach, based for example on the vertex, pixel, and geometry packets at its input. For example, the execution unit pool control unit 206 assigns a thread to an empty thread slot in an execution unit 420 of the execution unit pool 412. Once the thread starts executing, the data supplied by the vertex cache, other components, or modules (depending on the shader type) is placed in a common register buffer.

In general, the graphics processor 202 uses programmable vertex, geometry, and pixel shaders. These components are not implemented as separate fixed-function units with different designs and instruction sets. Instead, their operations are performed by a pool of execution units 420a, 420b, ..., 420n that share a unified instruction set. Execution unit 420a includes the decoding system 200 and therefore has additional functionality. Apart from that, the execution units 420 are of identical design and are programmable. In one embodiment, each execution unit 420 can run multiple threads concurrently. When the vertex shader, geometry shader, and pixel shader generate shading tasks, those tasks are dispatched to the respective execution units 420 for execution.

In one embodiment that uses the vertex shader, the decoding system 200 may be implemented with some modifications and/or differences relative to the other execution units 420. For example, one difference between an execution unit that includes the decoding system 200 (e.g., execution unit 420a) and the other execution units (e.g., execution unit 420b) is that execution unit 420a uses the decoding system 200. Another difference lies in how the decoding system 200 is arranged with respect to one or more corresponding internal buffers.
Data for the decoding system 200 is received from the memory access unit 406 through connection 413 and the execution unit input 402. As individual tasks are generated, the execution unit pool control unit 206 assigns them to available threads in the various execution units 420. When a task completes, the control unit also manages the release of the associated thread. In this regard, the execution unit pool control unit 206 assigns vertex shader, geometry shader, and pixel shader tasks to threads of the various execution units 420 and keeps a record of the associated tasks and threads. Specifically, it maintains a resource table (not shown) covering the threads and memory of all execution units 420. From this table it knows explicitly:

- which threads have been assigned to tasks and are in use;
- which threads have been released after their tasks terminated;
- how many common register file memory registers are in use;
- how much free space each execution unit has.

Accordingly, when a task is assigned to an execution unit (e.g., execution unit 420a), the execution unit pool control unit 206 marks the thread as busy. It then subtracts the thread's register file footprint from the total available common register file memory. The footprint is set or determined by the state of the vertex, geometry, or pixel shader, and each shader type may have a different footprint size. For example, a vertex shader thread may require more common register file registers, while a pixel shader thread may require only five.

When a thread completes its assigned task, the execution unit 420 running it sends an appropriate signal to the execution unit pool control unit 206. The control unit then updates the resource table to mark the thread as unused and adds the thread's common register file footprint back to the available space. An execution unit 420 is considered full when all of its threads are busy or all of its common register file memory has been allocated (or too few registers remain to accommodate an additional thread). Once it is full, the execution unit pool control unit 206 does not assign any additional or new threads to it.
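The resource-table bookkeeping described above can be sketched as a small software model. This is a minimal illustration under stated assumptions, not the patented hardware. The class and method names, the per-unit register budget, and the vertex shader footprint of 8 are hypothetical. Only the 32 thread slots and the five-register pixel shader footprint come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Pixel shader footprint (5) is from the text; the vertex value is hypothetical.
PIXEL_SHADER_FOOTPRINT = 5
VERTEX_SHADER_FOOTPRINT = 8


@dataclass
class EUEntry:
    """One row of a hypothetical execution-unit-pool resource table."""
    crf_free: int                       # free common register file registers
    thread_slots: int = 32              # e.g., 32 threads per execution unit
    busy: Dict[int, int] = field(default_factory=dict)  # slot -> CRF footprint


class EUPoolScheduler:
    def __init__(self, num_eus: int, crf_per_eu: int) -> None:
        self.table = [EUEntry(crf_free=crf_per_eu) for _ in range(num_eus)]

    def assign(self, footprint: int) -> Optional[Tuple[int, int]]:
        """Place a task in the first EU with a free slot and enough CRF space."""
        for eu_id, eu in enumerate(self.table):
            if len(eu.busy) >= eu.thread_slots or eu.crf_free < footprint:
                continue  # this EU is "full" for this task
            slot = next(s for s in range(eu.thread_slots) if s not in eu.busy)
            eu.busy[slot] = footprint   # mark the thread busy
            eu.crf_free -= footprint    # subtract its register footprint
            return eu_id, slot
        return None                     # no EU can accept the task right now

    def release(self, eu_id: int, slot: int) -> None:
        """Called when an EU signals that a thread's task has finished."""
        eu = self.table[eu_id]
        eu.crf_free += eu.busy.pop(slot)  # add the footprint back


sched = EUPoolScheduler(num_eus=2, crf_per_eu=12)
print(sched.assign(VERTEX_SHADER_FOOTPRINT))  # (0, 0)
print(sched.assign(PIXEL_SHADER_FOOTPRINT))   # (1, 0): EU 0 has only 4 left
```

In this model, a task is placed only where both a free thread slot and enough register space exist, which mirrors the "execution unit is full" condition. Releasing a thread returns its footprint so that a later task can reuse the space.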
Inside each execution unit, a thread controller also manages each thread, or marks it, as either in use (e.g., executing) or available. In this regard, in at least one embodiment, the execution unit pool control unit 206 can prevent the geometry shader and the pixel shader from executing while the vertex shader is executing the functions of the decoding system 200.

FIG. 5A shows an execution unit 420a having the features of the graphics processor 202 and computing core 204 described above. It includes an execution unit data path 512 in which the decoding system 200 is embedded. Specifically, FIG. 5A is a block diagram of the execution unit 420a. In one embodiment, the execution unit 420a includes the following components:

- an instruction cache controller 504;
- a thread controller 506 coupled to the instruction cache controller 504;
- a buffer 508 (e.g., a constant buffer);
- a common register file (CRF) 510;
- an execution unit data path (EUDP) 512 coupled to the thread controller 506, the buffer 508, and the common register file 510;
- an EUDP first-in first-out (FIFO) buffer 514;
- a predicate register file (PRF) 516;
- a scalar register file (SRF) 518;
- a data output controller 520;
- a thread task interface 524.

As described above, the execution unit 420a receives input from the execution unit input 402 and provides output to the execution unit output 404.

The thread controller 506 provides the control functions of the execution unit 420a. These include managing each thread and making decisions such as how threads are executed. The EUDP 512 includes the decoding system 200, described further below. It generally contains the logic that performs the various computations, such as floating-point and integer arithmetic logic units (ALUs) and shift logic.

The data output controller 520 moves completed data to certain components coupled to the execution unit output 404, such as the vertex cache of the execution unit pool control unit 206 and the write-back unit 308. The EUDP 512 sends "task end" information to the data output controller 520 to indicate that a task has completed. The data output controller 520 includes storage for completed tasks (e.g., 32 entries) and multiple write ports. It selects a task from the storage and reads all output data items from the common register file 510 at the register locations specified by the shader context. It then sends the data to the execution unit output 404. The thread task interface 524 sends the identifiers of completed tasks to the execution unit pool control unit 206. These task identifiers inform the control unit that thread resources in a particular execution unit (e.g., execution unit 420a) are available for a new task.

In one embodiment, the buffer 508 (e.g., the constant buffer) is divided into 16 blocks of 16 slots each, and each slot holds a 128-bit horizontal vector constant. A shader accesses a constant buffer slot using an operand and an index. For example, the index may be a temporary register containing a 32-bit unsigned integer, or an immediate 32-bit unsigned constant.
The instruction cache controller 504 is the interface block to the thread controller 506. When the thread controller issues a read request (e.g., to fetch executable shader code from instruction memory), the instruction cache controller 504 preferably performs a hit/miss test by looking up a tag table (not shown). A hit occurs when the requested instruction is already in the cache of the instruction cache controller 504. A miss occurs when the requested instruction must be fetched from the L2 cache memory 408 or the memory 106.

On a hit, the instruction cache controller 504 grants the request if there is no request from the execution unit input 402. This is because the instruction cache has only one read/write port, and the execution unit input 402 has the highest priority. On a miss, the instruction cache controller 504 grants the request if a replaceable block is available and there is space in the EUDP FIFO buffer 514 to queue the request to the L2 cache memory 408.

In one embodiment, the cache of the instruction cache controller 504 has 32 sets of 4 blocks each. Each block carries a 2-bit status signal that indicates one of three states: invalid, loading, or valid. A block is invalid before L2 data is loaded, loading while it waits for the L2 data, and valid once the L2 data has been loaded.

The predicate register file 516 is read and written through the EUDP 512. The execution unit input 402 serves as the interface for data entering the execution unit 420a.
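The hit/miss and grant rules above can be modeled as a 32-set, 4-way cache whose blocks move through the invalid, loading, and valid states. This is a hedged sketch. The 64-byte line size, the victim choice (an invalid way first, otherwise a valid one), and the string return codes are assumptions introduced for illustration.

```python
from enum import Enum
from typing import List, Optional, Tuple


class BlockState(Enum):
    INVALID = 0   # nothing loaded yet
    LOADING = 1   # waiting for L2 data
    VALID = 2     # L2 data has been loaded


NUM_SETS = 32
WAYS = 4
LINE_BYTES = 64   # hypothetical line size; not given in the text


class InstructionCache:
    def __init__(self) -> None:
        self.tags: List[List[Optional[int]]] = [[None] * WAYS for _ in range(NUM_SETS)]
        self.state = [[BlockState.INVALID] * WAYS for _ in range(NUM_SETS)]

    @staticmethod
    def _split(addr: int) -> Tuple[int, int]:
        line = addr // LINE_BYTES
        return line % NUM_SETS, line // NUM_SETS  # (set index, tag)

    def request(self, addr: int, eu_input_request: bool, fifo_has_space: bool) -> str:
        """Return 'hit', 'miss' (L2 fetch queued) or 'stall' (not granted)."""
        s, tag = self._split(addr)
        for w in range(WAYS):
            if self.tags[s][w] == tag and self.state[s][w] is not BlockState.INVALID:
                if self.state[s][w] is BlockState.LOADING:
                    return "stall"  # line is already being fetched
                # Single read/write port: a pending EU-input request wins.
                return "stall" if eu_input_request else "hit"
        # Miss: prefer an invalid way, otherwise replace a valid (not loading) one.
        candidates = [w for w in range(WAYS) if self.state[s][w] is BlockState.INVALID]
        candidates += [w for w in range(WAYS) if self.state[s][w] is BlockState.VALID]
        if not candidates or not fifo_has_space:
            return "stall"
        victim = candidates[0]
        self.tags[s][victim] = tag
        self.state[s][victim] = BlockState.LOADING
        return "miss"

    def fill(self, addr: int) -> None:
        """Called when the L2 data for `addr` arrives."""
        s, tag = self._split(addr)
        for w in range(WAYS):
            if self.tags[s][w] == tag and self.state[s][w] is BlockState.LOADING:
                self.state[s][w] = BlockState.VALID
```

A typical sequence is miss, then stall while the block is loading, then `fill`, then hit. Any request that arrives while the execution unit input holds the single port stalls.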
In one embodiment, the execution unit input 402 includes an 8-entry FIFO buffer for buffering incoming data. The execution unit input 402 can also send data to the instruction cache of the instruction cache controller 504 and to the constant buffer 508, and it maintains the shader context.

The execution unit output 404 serves as the interface for sending data from the execution unit 420a to the execution unit pool control unit 206, the L2 cache memory 408, and the write-back unit 308. In one embodiment, the execution unit output 404 includes a 4-entry FIFO buffer for receiving arbitrated requests and buffering data for the execution unit pool control unit 206. The execution unit output 404 also arbitrates among instruction cache read requests, data output write requests, and EUDP read/write requests.

The common register file 510 stores input, output, and temporary data. In one embodiment, it comprises eight banks. Each bank is a 128×128-bit register file with one read port and one write port (1R1W), plus one read/write port. The EUDP 512 uses the 1R1W port for the read and write accesses initiated by instruction execution. Banks 0, 2, 4, and 6 are shared by even-numbered threads, and banks 1, 3, 5, and 7 by odd-numbered threads. The thread controller 506 pairs instructions from different threads and ensures that they cause no read or write bank conflicts in the common register file memory.

The execution unit input 402 and the data output controller 520 use the read/write port. It serves to load the initial thread input data and to write the final thread output to the execution unit pool control unit data buffers, the L2 cache memory 408, or other modules. The execution unit input 402 and the execution unit output 404 share this single read/write I/O port, and in one embodiment writes have priority over reads.

Incoming 512-bit data is spread across four different banks to avoid conflicts when it is loaded into the common register file 510. A 2-bit channel index, the data, and a 512-bit-aligned base address are sent to specify the starting bank of the input data. For example, suppose the starting channel index is 1 and the thread's base bank offset is 0. The first 128 bits, counting from the least significant bit (LSB), are loaded into bank 1, the next 128 bits into bank 2, the next into bank 3, and the last 128 bits into bank 0. The bank offset is generated from the two LSBs of the thread ID, which randomizes the starting bank of each thread.

The common register file register index and the thread ID together form a unique logical address, so tags can be matched for data written to and read from the common register file 510. For example, addresses may be aligned to 128 bits, the width of a common register file bank. Combining the 8-bit register index with the 5-bit thread ID yields a unique 13-bit address. Each 1024-bit line has one tag and holds two 512-bit entries (words). Each word is stored across four banks. The bank selection is formed by adding the two LSBs of the common register file index to the current thread's bank offset.
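The bank arithmetic above can be checked with a few helper functions. This is a sketch under explicit assumptions. Placing the thread ID in the upper bits of the 13-bit address is assumed. So is mapping logical bank k to physical bank 2k (even threads) or 2k + 1 (odd threads). The starting bank is taken as channel index plus bank offset, modulo 4. All function names are hypothetical.

```python
from typing import Dict

NUM_LOGICAL_BANKS = 4   # a 512-bit word spans four 128-bit banks
CHUNK_BITS = 128


def bank_offset(thread_id: int) -> int:
    """Two LSBs of the thread ID randomize each thread's starting bank."""
    return thread_id & 0b11


def crf_address(crf_index: int, thread_id: int) -> int:
    """13-bit unique address from an 8-bit CRF index and a 5-bit thread ID.
    Assumption: the thread ID occupies the upper five bits."""
    if not (0 <= crf_index < 256 and 0 <= thread_id < 32):
        raise ValueError("CRF index or thread ID out of range")
    return (thread_id << 8) | crf_index


def logical_bank(crf_index: int, thread_id: int) -> int:
    """Bank select: two LSBs of the CRF index plus the thread's bank offset."""
    return ((crf_index & 0b11) + bank_offset(thread_id)) % NUM_LOGICAL_BANKS


def physical_bank(logical: int, thread_id: int) -> int:
    """Assumed mapping: even threads use banks 0/2/4/6, odd threads 1/3/5/7."""
    return 2 * logical + (thread_id & 1)


def scatter_input(value: int, start_channel: int, thread_id: int) -> Dict[int, int]:
    """Split a 512-bit input (LSB first) into 128-bit chunks, keyed by logical bank."""
    mask = (1 << CHUNK_BITS) - 1
    return {
        (start_channel + bank_offset(thread_id) + i) % NUM_LOGICAL_BANKS:
            (value >> (CHUNK_BITS * i)) & mask
        for i in range(NUM_LOGICAL_BANKS)
    }
```

With a starting channel of 1 and a thread whose bank offset is 0, `scatter_input` sends the four chunks to banks 1, 2, 3, and 0, matching the example in the text.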
This tag-matching scheme lets the registers of different threads share the common register file 510, which makes efficient use of memory. This works because the execution unit pool control unit 206 tracks the memory usage of the common register file 510 and ensures there is enough space before scheduling a new task on the execution unit 420a. The destination common register file index is checked against the size of the current thread's total common register file allocation. The input data is expected to be present in the common register file 510 before the thread controller 506 starts the thread and shader execution begins. When thread execution ends, the data output controller 520 reads the output data from the common register file 510.

An embodiment of the execution unit 420 includes an EUDP 512 that contains the decoding system 200, and FIG. 5B shows one embodiment of the EUDP 512. The EUDP 512 includes a register file 526, a multiplexer 528, a vector floating-point unit 532, a vector integer ALU 534, a special-purpose unit 536, a multiplexer 538, a register file 540, and the decoding system 200.

The decoding system 200 includes one or more variable length decoding (VLD) units 530, which can decode one or more streams. For example, a single VLD unit 530 can decode a single stream, and two VLD units 530 can decode two streams simultaneously, and so on. The second unit is shown in dashed lines, with its connections omitted for simplicity. For purposes of illustration, the following description covers only the operation of a decoding system with a single VLD unit 530; the same principles extend to more than one VLD unit.

As shown, the EUDP 512 includes several parallel data paths, corresponding to the VLD unit 530, the vector floating-point unit 532, the vector integer ALU 534, and the special-purpose unit 536. Each path executes the operation called for by the received instruction. The register file 526 receives two operands (labeled SRC1 and SRC2). In one embodiment, the register file 526 may correspond to the common register file 510, the predicate register file 516, and/or the scalar register file 518 shown in FIG. 5A. Note that additional operands may be used in some embodiments. An operation (function) signal line 542 is the medium through which each of the units 530-536 receives the operation signal.

An immediate signal line 544 is coupled to the multiplexer 528 and carries an immediate value encoded in the instruction, so that the units 530-536 can perform integer operations on small integer values. An instruction decoder (not shown) provides the operands, the operation (function) signal, and the immediate signal. The multiplexer 538 at the end of the data path (which may include a write-back stage) selects the output of the correct data path and provides it to the register file 540. The output register file 540 contains the destination, which may be the same component as the register file 526 or a different register. In embodiments where the source and destination registers are the same component, the instruction provides bits that the multiplexers use to select the source and destination. Data is thus multiplexed to and from the appropriate register file. The execution unit 420a can therefore be viewed as a multi-stage pipeline (e.g., a four-stage pipeline with four ALUs), and decoding operations take place within the four execution stages. Stalls are implemented to accommodate execution of the decoding thread.
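As an illustration of how a function signal selects one of several parallel data paths, the following behavioral sketch writes only the selected result to the destination register. The second operand is taken from either a register or an immediate. The opcode names and the three stand-in operations are hypothetical, and the VLD path is omitted; this is not the EUDP's actual instruction set.

```python
import math
from typing import Callable, Dict, Optional

# Hypothetical function codes standing in for the parallel data paths.
DATA_PATHS: Dict[str, Callable[[float, float], float]] = {
    "fadd": lambda a, b: float(a) + float(b),              # FP unit 532 (one lane)
    "iadd": lambda a, b: (int(a) + int(b)) & 0xFFFFFFFF,   # integer ALU 534
    "rsq": lambda a, _b: 1.0 / math.sqrt(a),               # special-purpose unit 536
}


def execute(regs: Dict[str, float], op: str, dst: str, src1: str,
            src2: Optional[str] = None, imm: Optional[int] = None) -> float:
    a = regs[src1]
    # Input mux (cf. 528): the second operand is an immediate or a register.
    b = imm if imm is not None else (regs[src2] if src2 is not None else 0)
    # The function signal picks one path; the output mux (cf. 538)
    # forwards only that path's result to the destination register.
    regs[dst] = DATA_PATHS[op](a, b)
    return regs[dst]
```

For example, `execute(regs, "iadd", "r2", "r0", imm=5)` adds a small immediate to `r0`, and the 32-bit mask makes integer overflow wrap around.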
Stalls may be introduced in the execution stages in the following cases:

- the bitstream buffer underflows;
- the decoder is waiting for the context memory to be initialized;
- the decoder is waiting for the bitstream to be loaded into the FIFO buffer and the SREG register (explained below);
- the processing time has exceeded a predetermined threshold.

As described above, in some embodiments the decoding system 200 can decode two bitstreams simultaneously using a single execution unit 420a. For example, according to an extended instruction set, the decoding system 200 can use two data paths (e.g., by adding another VLD unit 530) to decode two streams at the same time. More or fewer streams, and hence more or fewer data paths, may also be decoded at once. Where multiple streams are involved, some embodiments of the decoding system 200 are not limited to simultaneous decoding. Furthermore, in some embodiments, a single VLD unit 530 may decode multiple streams simultaneously.

In one embodiment, when the decoding system 200 uses two data paths, two threads can run at the same time. For example, in an embodiment that decodes two streams, the number of threads is limited to two. A first thread (e.g., thread 0) is assigned to the first bank of the decoding system 200 (i.e., the VLD unit 530). A second thread (e.g., thread 1) is assigned to the second bank of the decoding system 200 (e.g., the second VLD unit shown in dashed lines in FIG. 5B). In some embodiments, two or more threads may run on a single bank. In some embodiments, although the decoding system 200 is shown embedded in the EUDP 512, it may also include logic located in other components, such as the execution unit pool control unit 206. The terms VLD unit 530 and decoding system 200 are used interchangeably below, but the decoding system 200 may include one or more VLD units 530.

The structure underlying the decoding system is described next, followed by each individual decoding mode. In particular, in one embodiment, the driver software 128 issues the following instructions to set the different modes:

- INIT_CTX sets the decoding system 200 to CABAC processing mode.
- INIT_CAVLC sets it to CAVLC processing mode.
- INIT_MPEG2 sets it to MPEG-2 processing mode.
- INIT_VC1 sets it to VC-1/WMV9 processing mode.

In some embodiments, an INIT_AVS instruction provides additional initialization for decoding Audio Video Standard (AVS) bitstreams.

Exp-Golomb-coded symbols are used under both CABAC and CAVLC coding. The INIT_CTX and INIT_CAVLC instructions therefore also load the bitstream for Exp-Golomb decoding, and no separate Exp-Golomb initialization is needed. For a given symbol, an entropy coding flag received in the bitstream (e.g., a bit set at the slice header level) indicates whether the symbol is Exp-Golomb, CABAC, or CAVLC coded. When Exp-Golomb coding is used, the appropriate Exp-Golomb decoding instruction described below is executed. These modes affect not only the implementation of the decoding engine but also how memory is initialized, used, and updated, as described further below.

FIG. 5C is a functional block diagram of the VLD unit 530, which performs any one of several decoding operations according to the selected mode.
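The mode-setting instructions listed above can be summarized as a small state model. It records which INIT_* instruction was executed last and routes each symbol's coding flag to a decoding module. This is a sketch, not the driver interface. The flag strings are hypothetical. Rejecting Exp-Golomb outside CABAC and CAVLC modes is a simplification drawn from the statement that those two initializations load the Exp-Golomb bitstream.

```python
from enum import Enum
from typing import Optional


class DecodeMode(Enum):
    """Processing modes, keyed by the initialization instruction that selects them."""
    CABAC = "INIT_CTX"
    CAVLC = "INIT_CAVLC"
    MPEG2 = "INIT_MPEG2"
    VC1 = "INIT_VC1"    # VC-1 / WMV9
    AVS = "INIT_AVS"    # optional in some embodiments


class VLDModeState:
    def __init__(self) -> None:
        self.mode: Optional[DecodeMode] = None

    def init(self, instruction: str) -> None:
        self.mode = DecodeMode(instruction)  # raises ValueError if unknown

    def engine_for(self, coding_flag: str) -> str:
        """Route one symbol by a hypothetical per-symbol flag:
        'exp_golomb', 'cabac' or 'cavlc'."""
        if coding_flag == "exp_golomb":
            # No separate init: INIT_CTX / INIT_CAVLC already loaded the bitstream.
            if self.mode not in (DecodeMode.CABAC, DecodeMode.CAVLC):
                raise RuntimeError("Exp-Golomb used outside CABAC/CAVLC mode")
            return "Exp-Golomb module 584"
        if coding_flag == "cabac" and self.mode is DecodeMode.CABAC:
            return "CABAC module 580"
        if coding_flag == "cavlc" and self.mode is DecodeMode.CAVLC:
            return "CAVLC module 582"
        raise RuntimeError(f"{coding_flag!r} is not valid in mode {self.mode}")
```

Keying the enum on the instruction name means `DecodeMode("INIT_CTX")` returns the CABAC mode directly, so the model needs no separate lookup table.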
The VLD unit 530 includes VLD logic 550. The VLD logic 550 is coupled to the bitstream buffer management, formed by a stream buffer/DMA engine 562 (also called the DMA engine module). It is also coupled to a neighborhood context memory (NCM) 564 (also called the context memory). The VLD unit 530 also includes one or more registers 566, which hold:

- data from the execution unit 420a relating to the selected mode ("CONTROL", e.g., control signals from the execution unit's decoder that select a module of the VLD logic 550);
- operands (e.g., "SRC1" and "SRC2");
- forwarding registers (e.g., "F1" and "F2").

The SREG stream buffer/DMA engine 562 includes an SREG register 562a and a bitstream buffer 562b, explained further below.

In one embodiment, the VLD logic 550 includes the modules (also called logic) shown in FIG. 5C. The VLD logic 550 is implemented in hardware, including registers and/or Boolean or arithmetic logic, that executes instructions and performs decoding according to the selected mode. More specifically, the VLD logic 550 includes:

- a read neighborhood context memory (READ_NCM) module 568;
- an inspect string (INPSTR) module 570;
- a read module 572;
- a count leading ones (CLO) module 574;
- a count leading zeros (CLZ) module 576;
- an MPEG module 578;
- a CABAC module 580;
- a CAVLC module 582;
- an Exp-Golomb module 584 coupled to the CLZ module 576.

The CLZ module 576 and the CLO module 574 support instructions for decoding MPEG-2 and VC-1 bitstreams.
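Because the Exp-Golomb module 584 is coupled to the CLZ module 576, Exp-Golomb decoding can be modeled as a count-leading-zeros step followed by a fixed-length read. The sketch assumes the standard Exp-Golomb code structure: k leading zeros, a 1, then k information bits, giving the value 2^k - 1 + info. It also uses the H.264 signed mapping. `BitReader`, `peek`, `read`, `decode_ue`, and `decode_se` are hypothetical names. `peek` and `read` only loosely mirror the inspect-string and read operations on the SREG register.

```python
class BitReader:
    """Software stand-in for the SREG register: MSB-first access to a bitstream."""

    def __init__(self, data: bytes) -> None:
        self.value = int.from_bytes(data, "big")
        self.nbits = len(data) * 8
        self.pos = 0  # number of bits already consumed

    def peek(self, n: int) -> int:
        """Like an inspect-string operation: look at n bits without consuming them."""
        if n == 0:
            return 0
        if self.pos + n > self.nbits:
            raise EOFError("bitstream buffer underflow")
        shift = self.nbits - self.pos - n
        return (self.value >> shift) & ((1 << n) - 1)

    def read(self, n: int) -> int:
        """Like a read operation: return n bits zero-extended, then consume them."""
        bits = self.peek(n)
        self.pos += n
        return bits

    def count_leading_zeros(self) -> int:
        """Like the CLZ step: zeros before the next 1, without consuming them."""
        n = 0
        while self.peek(n + 1) == 0:
            n += 1
        return n


def decode_ue(reader: BitReader) -> int:
    """Unsigned Exp-Golomb: k zeros, a 1, then k information bits."""
    k = reader.count_leading_zeros()
    reader.read(k + 1)                  # drop the zeros and the terminating 1
    return (1 << k) - 1 + reader.read(k)


def decode_se(reader: BitReader) -> int:
    """H.264 signed mapping: codeNum 0, 1, 2, 3, 4 -> 0, 1, -1, 2, -2."""
    code = decode_ue(reader)
    return (code + 1) // 2 if code % 2 else -(code // 2)
```

For example, the bit strings 1, 010, 011, 00100, and 00111 decode as unsigned values 0, 1, 2, 3, and 6. A run of zeros that reaches the end of the data raises `EOFError`, which corresponds to the bitstream-underflow stall condition mentioned earlier.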
Regarding the Exp-Golomb module 584, an Exp-Golomb symbol is encoded as a number of leading zeros followed by a 1, followed by a number of bits equal to the number of leading zeros. The CLZ module 576 detects the number of leading zeros, then shifts past those bits plus the 1, recording the leading-zero count. The Exp-Golomb module 584 reads that number of trailing bits and performs the Exp-Golomb computation to determine the value. The read_NCM module 568 includes logic for generating an address and requesting a memory read operation. In the memory read operation, a fixed number of bits is read from the neighbor context memory 564 and the data is output to the destination register. The NCM read instruction reads 32 bits of data from the context memory 564 and returns the value read, via multiplexer 586, to the destination register of the execution unit 420a. CABAC and CAVLC decoding do not use the NCM read instruction; for other variable length operations (for example, MPEG-4 ASP (DivX)), however, the context memory 564 can hold variable length decoding tables, and the read_NCM module can be used to read values from those tables. The READ module 572 contains logic that reads the SREG register 562a: it extracts a specified number of bits from the most significant bit (MSB) end of the SREG register 562a, zero-extends them, and places the value into a register. The READ module 572 thus performs a read operation that consumes the specified number of bits, removing them from the SREG register 562a, and returns the unsigned value to the destination register.
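The CLZ/Exp-Golomb sequence described above can be sketched in software. This is a minimal illustration of unsigned Exp-Golomb (ue(v)) decoding, assuming the bitstream is given as a string of '0'/'1' characters; the function names `decode_ue` and `decode_all_ue` are hypothetical and do not model the hardware modules 576 and 584.

```python
from typing import List, Tuple


def decode_ue(bits: str, pos: int = 0) -> Tuple[int, int]:
    """Decode one unsigned Exp-Golomb code starting at bit index `pos`.

    Returns (value, next_pos). `bits` is a string of '0'/'1' characters.
    """
    # Count leading zeros (the role the CLZ module plays in hardware).
    leading_zeros = 0
    while bits[pos + leading_zeros] == "0":
        leading_zeros += 1
    # Skip the zeros and the terminating 1.
    pos += leading_zeros + 1
    # Read as many trailing bits as there were leading zeros.
    suffix = bits[pos:pos + leading_zeros]
    if len(suffix) != leading_zeros:
        raise ValueError("truncated Exp-Golomb code")
    info = int(suffix, 2) if suffix else 0
    return (1 << leading_zeros) - 1 + info, pos + leading_zeros


def decode_all_ue(bits: str) -> List[int]:
    """Decode back-to-back ue(v) codes until the string is exhausted."""
    values, pos = [], 0
    while pos < len(bits):
        value, pos = decode_ue(bits, pos)
        values.append(value)
    return values


print(decode_all_ue("1" + "010" + "011" + "00100" + "00111"))  # [0, 1, 2, 3, 6]
```

The count-then-read split mirrors the division of labor in the text: one step only counts zeros, and a second step reads exactly that many suffix bits.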
The INPSTR module 570 reads a fixed number of bits from the SREG register 562a but does not remove any bits from it (that is, the pointer position is unchanged), and returns the unsigned value to the destination register. Each of the modules 568-584 is coupled to multiplexer 586, which selects a mode according to the respective command. In one embodiment, the output of multiplexer 586 is provided to the destination register for further processing. The outputs of modules 569-582 are also provided to multiplexer 586, which, in response to a command, selects among them and provides the selected output as input to the SREG register 562a. During their respective operations, data from the forwarding, control, and operand registers 566 is provided to the CABAC module 580 and the CAVLC module 582. The Exp-Golomb module 584 is enabled by a control signal (labeled EXP_GOLOMB_OP in FIG. 5C); it receives its input from the CLZ module 576 and provides its output to multiplexer 586. The CABAC module 580 and the CAVLC module 582 can use the context memory 564.
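The essential difference between READ and INPSTR is whether the bit pointer advances. The sketch below models that behavior over a byte string; the class `BitRegister` and its methods are hypothetical and ignore the 32-bit width and refill mechanics of the actual SREG register 562a.

```python
class BitRegister:
    """Toy model of MSB-first bit extraction from a bitstream."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # index of the next unread bit

    def inpstr(self, n: int) -> int:
        """Return the next n bits as an unsigned value without consuming them."""
        value = 0
        for i in range(self.pos, self.pos + n):
            byte = self.data[i >> 3]
            value = (value << 1) | ((byte >> (7 - (i & 7))) & 1)
        return value

    def read(self, n: int) -> int:
        """Return the next n bits (zero-extended) and advance the pointer."""
        value = self.inpstr(n)
        self.pos += n
        return value


reg = BitRegister(bytes([0b10110010]))
print(reg.inpstr(3), reg.inpstr(3))  # 5 5  (pointer unchanged)
print(reg.read(3), reg.read(5))      # 5 18 (bits consumed)
```

Implementing `read` on top of `inpstr` reflects the idea that a consuming read is a peek followed by a pointer update.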

For all modes other than CABAC and CAVLC, the READ instruction reads n bits from the SREG register 562a and returns the value read, via multiplexer 586, to the destination register of the execution unit 420a. For modes other than CABAC and CAVLC, the context memory 564 is used to hold the top and left context values, which are fetched automatically as part of the decoding process. These elements, and other elements of the variable length decoding unit 530, are described further below in conjunction with the different modes. Note that in some embodiments the variable length decoding logic may include fewer (or more) than all of the modules and/or multiplexers shown. Having described the general functionality of the variable length decoding unit 530, its operation when configured in the different modes is described further below.

CABAC Decoding

CABAC decoding is briefly explained below, followed by a description of some embodiments of the decoding system. In general, the CABAC decoding process of the H.264 standard can be described as parsing the encoded bitstream for the first syntax element, initializing the context variables of the slice and the decoding engine for the first syntax element, and binarization.
Each bin is then decoded; this process includes obtaining the context model and decoding the bins of each syntax element until the bin string matches a meaningful codeword. In more detail, the decoding system 200 decodes syntax elements, each of which may represent quantized coefficients, motion vectors, and/or prediction modes, or other macroblock parameters used to represent a particular field or frame of an image or video. Each syntax element may comprise a series of one or more binary symbols (bins), and each bin is decoded to a value of 0 or 1. The decoding system 200 controls the output bit length according to the probability of occurrence of the input bins. When certain symbols (called dominant symbols) are more likely to occur than others, CABAC provides an efficient coding method, because these symbols can be coded with fewer bits per symbol. The encoder continuously updates frequency statistics of the incoming data and adaptively adjusts the arithmetic coding computation and the context models. The bin value with the higher probability is called the most probable symbol (MPS), and the other is the least probable symbol (LPS). Each bin is associated with a context model, and each context model holds a probability corresponding to the LPS and an MPS value. To decode each bin, the decoding system 200 determines or receives a corresponding range, offset, and context model. The context model is selected from a plurality of possible context models according to the symbol type and a context determined by the spatial neighborhood (for example, the current macroblock or previously decoded neighboring macroblocks). A context identifier determined from the context model is used to obtain the MPS value and the current state of the decoding engine for the decoding process. The range denotes an interval that narrows each time a bin is decoded.
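As a concrete example of how neighboring macroblocks select a context model: in H.264, the context index increment for mb_skip_flag is derived from the left (A) and top (B) neighbors, with each neighbor that is available and not skipped contributing 1. The sketch below shows only that one rule; the function name is hypothetical, and the per-syntax-element context index offset that is added to the increment is omitted.

```python
from typing import Optional


def ctx_idx_inc_mb_skip(left_skipped: Optional[bool],
                        top_skipped: Optional[bool]) -> int:
    """ctxIdxInc for mb_skip_flag; None means the neighbor is unavailable."""

    def cond_term(skipped: Optional[bool]) -> int:
        # condTermFlagN is 0 if neighbor N is unavailable or was skipped, else 1.
        return 0 if skipped is None or skipped else 1

    return cond_term(left_skipped) + cond_term(top_skipped)


print(ctx_idx_inc_mb_skip(None, None))    # 0: no neighbors available
print(ctx_idx_inc_mb_skip(False, True))   # 1
print(ctx_idx_inc_mb_skip(False, False))  # 2
```

This is why the neighbor context (left and top values) must be kept available while decoding a row of macroblocks.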

The interval is divided into two sub-ranges corresponding to the MPS value and the LPS value, respectively. The LPS sub-range is computed by multiplying the range by the LPS probability specified by the context model, and the MPS sub-range is obtained by subtracting the LPS sub-range from the range. The offset is the criterion that determines the decoded bin value; it is initialized with the first 9 bits taken from the encoded bitstream. For a given bin and context model, if the offset is less than the MPS sub-range, the bin takes the MPS value and the range used for the next decoding is set to the MPS sub-range. Otherwise, the bin is an LPS: its value is the inverse of the MPS value held in the context model, the offset is reduced by the MPS sub-range, and the range for the next decoding is set to the LPS sub-range. The result of the decoding process is a sequence of decoded bins, which is evaluated to determine whether it matches a meaningful codeword.

Having outlined the operation of the decoding system in relation to CABAC decoding, the following describes various elements of the decoding system 200. Those skilled in the art will appreciate that variations suited to practical applications are possible. Many of the terms used below are taken from the H.264 specification and, for brevity, are not explained further unless doing so helps in understanding the procedures and/or elements described. FIGS. 6A-6F are block diagrams of embodiments of the decoding system 200 and related elements. As shown, the decoding system 200 includes a CABAC unit 530 (in FIGS. 6A-6F, the CABAC unit 530 and the decoding system 200 are used interchangeably), so in this embodiment the decoding system 200 decodes a single bitstream. The same principles apply to a decoding system 200 with additional variable length decoding units, which can decode multiple (for example, two) streams simultaneously. Briefly, FIG. 6A is a block diagram of selected elements of the decoding system 200; FIG. 6B is a functional block diagram of the elements shown in FIG. 6A plus other elements; FIG. 6C is a block diagram of the context memory functionality of the decoding system 200; and FIG. 6D is a block diagram of a mechanism used to decode macroblocks. Although the following description concerns macroblock decoding, the principles disclosed herein can be applied to the decoding of other kinds of blocks.

Referring to FIG. 6A, the variable length decoding unit 530a includes a CABAC logic module 580 and a memory module 650. In one embodiment, the CABAC logic module 580 contains three modules: a binarization (BIND) module 620, a get context (GCTX) module, and a binary arithmetic decoding (BARD) engine 624. The BARD engine 624 further includes a state index (pStateIdx) register 602, an MPS value (valMPS) register 604, a code interval range (codIRange) register 606, and a code interval offset (codIOffset) register 608. The memory module 650 of the variable length decoding unit 530a includes the context memory 564 (also referred to as the macroblock neighbor context (mbNeighCtx) memory or context memory array), local registers 612, global registers 614, and the SREG stream buffer/DMA engine 562 (also referred to as the DMA engine module, described further in connection with FIG. 6C), as well as other registers not shown. In one embodiment, the context memory 564 comprises an array structure as shown in FIG. 6C, described further below. The memory module 650 also includes a binstring register 616.
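The range/offset procedure described above corresponds to the DecodeDecision and renormalization steps of H.264 CABAC, operating on the pStateIdx, valMPS, codIRange, and codIOffset values held in registers 602-608. The following is a minimal sketch, assuming a `read_bit` callable that returns the next bitstream bit; the class name is hypothetical, and the standard's rangeTabLPS and transIdxLPS tables are passed in as parameters rather than reproduced. Note that H.264 obtains the LPS sub-range by a table lookup indexed by the state and the quantized range, which approximates the multiplication described in the text.

```python
from typing import Callable, List, Sequence


class BinaryArithmeticDecoder:
    """Sketch of an H.264-style CABAC decoding engine (DecodeDecision + RenormD)."""

    def __init__(self, read_bit: Callable[[], int],
                 range_tab_lps: Sequence[Sequence[int]],
                 trans_idx_lps: Sequence[int]) -> None:
        self.read_bit = read_bit
        self.range_tab_lps = range_tab_lps  # 64 x 4 table from the standard
        self.trans_idx_lps = trans_idx_lps  # 64-entry table from the standard
        # Engine initialization: full range, offset from the first 9 bits.
        self.cod_i_range = 510
        self.cod_i_offset = 0
        for _ in range(9):
            self.cod_i_offset = (self.cod_i_offset << 1) | read_bit()

    def decode_decision(self, ctx: List[int]) -> int:
        """Decode one bin; ctx is a mutable [pStateIdx, valMPS] pair."""
        p_state, val_mps = ctx
        q = (self.cod_i_range >> 6) & 3
        r_lps = self.range_tab_lps[p_state][q]
        self.cod_i_range -= r_lps                   # MPS sub-range
        if self.cod_i_offset >= self.cod_i_range:   # LPS path
            bin_val = 1 - val_mps
            self.cod_i_offset -= self.cod_i_range
            self.cod_i_range = r_lps
            if p_state == 0:
                ctx[1] = 1 - val_mps                # swap MPS at the lowest state
            ctx[0] = self.trans_idx_lps[p_state]
        else:                                       # MPS path
            bin_val = val_mps
            ctx[0] = min(p_state + 1, 62)
        # Renormalize so that codIRange stays >= 256.
        while self.cod_i_range < 256:
            self.cod_i_range <<= 1
            self.cod_i_offset = (self.cod_i_offset << 1) | self.read_bit()
        return bin_val


# Placeholder tables for illustration only; real decoding needs the standard's values.
toy_lps = [[128] * 4 for _ in range(64)]
toy_trans = [0] * 64
bits = iter([1, 1, 1, 1, 1, 1, 1, 0, 0])
dec = BinaryArithmeticDecoder(lambda: next(bits, 0), toy_lps, toy_trans)
ctx = [0, 0]
print(dec.decode_decision(ctx), ctx, dec.cod_i_range, dec.cod_i_offset)  # 1 [0, 1] 256 252
```

Passing the context as a mutable pair reflects that the state and MPS value are updated in place after each bin, as the registers 602 and 604 would be.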

The interfaces of the variable length decoding unit 530a include a destination (DST) bus 628, two source buses (SRC1 632 and SRC2 630), a common and thread information bus 634, and a stall/reset bus 636. Data on the DST bus 628 may be sent, directly or indirectly (for example, through an intermediate cache, register, buffer, or memory), to a video processing unit external to the graphics processing unit 114. The data on the DST bus 628 may be in one of several formats and may include macroblock parameters, motion vectors, and samples. The variable length decoding unit 530a also includes a memory interface with an address bus 638 and a data bus 640. Using an address on the address bus 638, the memory interface accesses bitstream data, which is received on the data bus 640. In one embodiment, the data on the data bus 640 may comprise a video stream that has not yet been decoded, including various parameters and other data and formats. In some embodiments, load/store operations can be used to access the bitstream data.

Before the various elements of the variable length decoding unit 530a are described, the overall operation of the execution unit 420a with respect to CABAC decoding is briefly explained. Generally, depending on the slice type, the driver software 128 (FIG. 1) prepares a CABAC shader and loads it into the execution unit 420a. The CABAC shader uses the standard instruction set plus binarization, get-context, and binary arithmetic decoding instructions to decode the bitstream. Because the context tables used by the variable length decoding unit 530a may change with the slice type, loading is performed for every slice. In one embodiment, the first instructions executed by the CABAC shader, before any other instructions are issued, are the INIT_CTX and INIT_ADE instructions. These two instructions, described later, cause the CABAC unit 530 to begin decoding the CABAC bitstream and to load the bitstream into a first-in, first-out (FIFO) buffer from a pointer that automatically sets up stream decoding.

Regarding parsing of the bitstream, the bitstream is received from the data bus 640 of the memory interface and buffered by the SREG stream buffer/DMA engine 562. Bitstream decoding is provided from the slice data parsing stage onward. That is, the bitstream (for example, a NAL bitstream) comprises one or more pictures, each divided into a picture header and a number of slices. A slice generally corresponds to a run of consecutive macroblocks. In one embodiment, an external process (that is, external to the variable length decoding unit 530a) parses the NAL bitstream, decodes the slice header, and passes a pointer to the location of the slice data (for example, where the slice begins). Hardware (plus software) can parse the H.264 bitstream starting at the picture level. In one embodiment, however, CABAC coding occurs only at the slice data and macroblock levels. Typically, the driver software 128 processes the bitstream slice by slice, because this is the functionality provided by the application and the API. The pointer to the slice data location includes the address of the first byte of the slice data (for example, RBSPbyteAddress) and a bit offset indicator (for example, one or more bits) indicating the start of the bitstream or header position (for example, sREGptr). Initialization of the bitstream is explained later. In some embodiments, the external process may be handled by a host processor (for example, the central processing unit 126 of FIG. 1) to provide picture-level and slice-header decoding. In some embodiments, because of the programmable nature of the decoding system 200, decoding can be performed at any level.

Referring to FIGS. 5C and 6A, the SREG stream buffer/DMA engine 562 receives the SRC1 value on bus 632 and the SRC2 value on bus 630, as well as data corresponding to the forwarding and control registers. The SREG stream buffer/DMA engine 562 includes an internal bitstream buffer 562b, which in one embodiment may comprise a 32-bit register in big-endian format and eight 128-bit (8 x 128) registers. The SREG stream buffer/DMA engine 562 is initialized by an initialization instruction issued by the driver software. Once initialized, the internal buffer 562b of the SREG stream buffer/DMA engine 562 is managed automatically. The SREG stream buffer/DMA engine 562 keeps track of the position of the bits being parsed. In one embodiment, the SREG stream buffer/DMA engine 562 uses two storage elements: a fast 32-bit flip-flop register and a slower 512- or 1024-bit memory. The bitstream is consumed bit by bit; the SREG register 562a operates on bits while the bitstream buffer 562b operates on bytes, which saves power. Typically, instructions operate on the SREG register 562a and consume only a few bits (for example, 1-3 bits). When more than one byte of the data in the SREG register 562a has been consumed, data (in byte units) is transferred from the bitstream buffer 562b to the SREG register 562a, and the buffer pointer is decremented by the number of bytes transferred. When the DMA of the SREG stream buffer/DMA engine 562 detects that 256 or more bits have been consumed, it fetches 256 bits from memory to refill the bitstream buffer 562b. The variable length decoding unit 530a thus implements a simple circular buffer (4 x 256-bit segments) to track the bitstream buffer 562b and provide refills. In some embodiments a single buffer can be used, although a single circular buffer would require more complex pointer arithmetic to keep up with the memory speed.

The internal buffer 562b is set up by an initialization instruction referred to as the INIT_BSTR instruction. In one embodiment, the INIT_BSTR instruction, like the other instructions described below, is issued by the driver software 128. Given the byte address and bit offset of the bitstream location, the INIT_BSTR instruction loads data into the internal bitstream buffer 562b and starts the management process. For each call that processes slice data, an instruction of the following form is issued:

INIT_BSTR offset, RBSPbyteAddress

The INIT_BSTR instruction loads data into the internal buffer 562b of the SREG stream buffer/DMA engine 562. The SRC2 register supplies the byte address (RBSPbyteAddress) and the SRC1 register supplies the bit offset, giving the following general instruction format:

INIT_BSTR SRC2, SRC1

SRC1 and SRC2 in this instruction, and other values corresponding to the internal registers 566, are not limited to these particular registers. In one embodiment, 256-bit-aligned memory fetches are used to access the bitstream data, which is written to the buffer registers and passed on to the 32-bit SREG register 562a of the SREG stream buffer/DMA engine 562. In one embodiment, the data in the bitstream buffer 562b is byte-aligned before any other operations on these registers or buffers begin. The alignment is performed with an alignment instruction referred to as the ABST instruction. The ABST instruction aligns the data in the bitstream buffer 562b; the alignment bits (for example, stuffing bits) are eventually discarded during decoding.

As the SREG register 562a consumes data, the internal buffer 562b is refilled. In other words, the internal buffer 562b of the SREG stream buffer/DMA engine 562 acts as a modulo-3 circular buffer feeding the 32-bit SREG register 562a. The CABAC module 580, together with the READ module 572, can use the READ instruction to read data from the SREG register 562a. For example, in the H.264 specification some symbols are fixed-length coded; their values are obtained by executing a READ instruction for the specific number of bits and zero-extending the result to the register size. The READ instruction has the following format:

READ DST, SRC1

where DST corresponds to the output (destination) register. In one embodiment, the SRC1 register contains an unsigned integer value n, and the READ instruction reads n bits from the SREG register 562a. When 256 bits of data have been consumed from the SREG register 562a (for example, by decoding one or more syntax elements), a fetch is started automatically to obtain another 256 bits of data, which are written into the registers of the internal buffer 562b and then fed into the SREG register 562a for consumption. In some embodiments, if a predetermined number of bits or bytes of the data in the SREG register 562a has been consumed for symbol decoding and the internal buffer 562b has not received any further data, the CABAC module 580 can issue a stall via the stall/reset bus 636 so that other threads can execute (for example, threads unrelated to CABAC decoding, such as vertex shader operations). Using the DMA engine of the SREG stream buffer/DMA engine 562 reduces the total buffering needed to cover memory latency (which, in some graphics processing units, can exceed three hundred cycles). As the bitstream is consumed, additional bitstream data can be requested. If the bitstream data runs too low and the bitstream buffer 562b is at risk of underflow (for example, given the known number of cycles a signal takes to travel from the variable length decoding unit 530a to the processor pipeline), a stall signal can be passed to the processor pipeline to pause operation until the awaited data arrives in the bitstream buffer 562b.

In addition, the SREG stream buffer/DMA engine 562 has an inherent ability to handle erroneous bitstreams. For example, because of a bitstream error, an end-of-slice marker may go undetected. Such a detection failure can lead to completely incorrect decoding and to the consumption of bits belonging to subsequent pictures or slices. The SREG stream buffer/DMA engine 562 keeps count of the number of bits consumed. When that number exceeds a preset threshold (which can be changed for each slice), processing is terminated and an exception is signaled to a processor (for example, the host processor). The processor then executes code that attempts to recover from the error.

Referring to FIGS. 6A and 6B together, the functions of the variable length decoding unit 530a are described further, in particular the decoding engine (for example, the BARD engine or module 624) and the initialization of the context variables. At the start of a slice, and before the syntax elements corresponding to the first macroblock are decoded, the context states and the binary arithmetic decoding module 624 are initialized. In one embodiment, the driver software 128 issues the INIT_CTX and INIT_ADE instructions to perform the initialization.
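The refill and overrun-guard behavior described above can be modeled in software. The sketch below is a simplified, hypothetical model (the class `SliceBitstream` and its parameters are illustrative names): it fetches 256-bit (32-byte) aligned chunks from a `memory` byte string starting at an (RBSPbyteAddress, bit offset) pair, serves zero-extended n-bit reads, discards fully consumed chunks, and raises an exception when a per-slice bit budget is exceeded. It does not model the SREG/buffer split, the stall signal, or the fixed four-segment ring.

```python
class SliceBitstream:
    """Hypothetical software model of 256-bit refills with a per-slice bit budget."""

    CHUNK_BYTES = 32  # one 256-bit, 256-bit-aligned fetch

    def __init__(self, memory: bytes, byte_addr: int, bit_offset: int,
                 max_bits: int) -> None:
        self.memory = memory
        aligned = byte_addr - byte_addr % self.CHUNK_BYTES
        self.next_fetch = aligned                  # address of next chunk to fetch
        self.window = bytearray()                  # fetched, not yet discarded bytes
        self.bit_pos = (byte_addr - aligned) * 8 + bit_offset
        self.consumed = 0                          # bits consumed in this slice
        self.max_bits = max_bits                   # per-slice threshold
        self.fetches = 0

    def _fetch(self) -> None:
        chunk = self.memory[self.next_fetch:self.next_fetch + self.CHUNK_BYTES]
        if not chunk:
            raise EOFError("no more bitstream data in memory")
        self.window += chunk
        self.next_fetch += self.CHUNK_BYTES
        self.fetches += 1

    def read(self, n: int) -> int:
        """Consume n bits (MSB first) and return them zero-extended."""
        if self.consumed + n > self.max_bits:
            # Stands in for terminating the slice and signaling the host.
            raise RuntimeError("bit budget exceeded for this slice")
        while self.bit_pos + n > len(self.window) * 8:
            self._fetch()
        value = 0
        for i in range(self.bit_pos, self.bit_pos + n):
            value = (value << 1) | ((self.window[i >> 3] >> (7 - (i & 7))) & 1)
        self.bit_pos += n
        self.consumed += n
        # Drop whole 256-bit chunks that have been fully consumed.
        while self.bit_pos >= self.CHUNK_BYTES * 8:
            del self.window[:self.CHUNK_BYTES]
            self.bit_pos -= self.CHUNK_BYTES * 8
        return value


mem = bytes(range(64))
s = SliceBitstream(mem, byte_addr=31, bit_offset=0, max_bits=100)
print(s.read(16), s.fetches)  # 7968 2  (bytes 31 and 32 span two fetches)
```

Checking the budget before fetching means a runaway slice stops without pulling in data from the next picture or slice, which is the failure mode the text describes.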

The INIT_CTX instruction starts the CABAC decoding mode and initializes one or more context tables (stored, for example, remotely or in on-chip memory such as ROM). The INIT_CTX instruction may be executed according to the following instruction format:

INIT_CTX SRC2, SRC1

For the INIT_CTX instruction, operand SRC1 may hold, depending on bit position, one or more of the following values, which correspond to known H.264 macroblock parameters: cabac_init_idc, mbPerLine, constrained_intra_pred_flag, NAL_unit_type (NUT), and MbaffFlag. Likewise, depending on bit position, operand SRC2 holds SliceQPY and mbAddrCurr. In one embodiment, executing the INIT_CTX instruction (that is, initializing the CABAC context table) strictly requires only the cabac_init_idc and SliceQPY (quantization) parameters. However, initializing the entire CABAC engine takes three instructions, namely INIT_BTSR, INIT_CTX, and INIT_ADE, so the spare bits in SRC1 and SRC2 (for example, 64 bits in total, or 32 bits each) can carry other parameters for the CABAC neighboring context. The two source registers SRC1 and SRC2 664 may therefore contain the following values:

SRC1[15:0] = cabac_init_idc
SRC1[23:16] = mbPerLine
SRC1[24] = constrained_intra_pred_flag
SRC1[27:25] = NAL_unit_type (NUT)
SRC1[28] = MbaffFlag
SRC1[31:29] = undefined
SRC2[15:0] = SliceQPY
SRC2[31:16] = mbAddrCurr
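The SRC1/SRC2 layout above can be sketched in Python as a pair of packing helpers. This is a minimal illustration, not the patent's encoding logic: the function names are hypothetical, and rejecting out-of-range values (for example an H.264 nal_unit_type above 7, which the 3-bit NUT field cannot hold) is an assumption about how a driver might handle overflow.

```python
def _field(value: int, width: int, lsb: int) -> int:
    """Place `value` in a `width`-bit field starting at bit `lsb`; reject overflow."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{value} does not fit in {width} bits")
    return value << lsb


def pack_init_ctx_src1(cabac_init_idc: int, mb_per_line: int,
                       constrained_intra_pred_flag: int,
                       nal_unit_type: int, mbaff_flag: int) -> int:
    """Build SRC1 for a hypothetical INIT_CTX issue (layout from the listing above)."""
    return (_field(cabac_init_idc, 16, 0)
            | _field(mb_per_line, 8, 16)
            | _field(constrained_intra_pred_flag, 1, 24)
            | _field(nal_unit_type, 3, 25)    # only 3 bits are provided for NUT
            | _field(mbaff_flag, 1, 28))      # SRC1[31:29] stay 0 (undefined)


def pack_init_ctx_src2(slice_qpy: int, mb_addr_curr: int) -> int:
    """Build SRC2: SliceQPY in bits 15:0, mbAddrCurr in bits 31:16."""
    return _field(slice_qpy, 16, 0) | _field(mb_addr_curr, 16, 16)


def unpack_init_ctx_src1(src1: int) -> dict:
    """Recover the SRC1 fields from a packed 32-bit word."""
    return {
        "cabac_init_idc": src1 & 0xFFFF,
        "mbPerLine": (src1 >> 16) & 0xFF,
        "constrained_intra_pred_flag": (src1 >> 24) & 0x1,
        "NAL_unit_type": (src1 >> 25) & 0x7,
        "MbaffFlag": (src1 >> 28) & 0x1,
    }
```

For a 1920-pixel-wide picture mbPerLine would be 120, so pack_init_ctx_src1(3, 120, 0, 5, 0) yields 0x0A780003.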

The value of SliceQPY is used to initialize a state machine (not shown) within the bitstream buffer 562b. Although various known picture and slice parameters have been discussed above, some further parameters relating to the variable length decoding unit 530a are described here. In one embodiment, cabac_init_idc is defined for slices that are not coded as I or switching-I (SI) slices. In other words, cabac_init_idc is defined only for P, SP, and B slices, and takes a default value when I and SI slices are received. For example, when approximately 460 contexts are initialized (e.g., for I and SI slices), cabac_init_idc may be set to 3 (because under the H.264 specification cabac_init_idc can only take the values 0 to 2), so that its 2 bits indicate that the slice is I or SI.

The variable length decoding unit 530a may also use the INIT_CTX instruction to initialize the local register 612 and the array structure or elements of the macroblock neighbor context memory 564, including the registers associated with storing neighboring macroblocks. Referring to FIG. 6C, in one embodiment the macroblock neighbor context memory 564 is shown at the top of the figure. In one embodiment, the macroblock-based neighbor context memory 564 is arranged as a memory array that stores data for a row of macroblocks. As shown, the macroblock neighbor context memory 564 includes the array elements mbNeighCtx[0, 1, ..., i-1, i, i+1, ..., 119] (labeled 601), each element storing one of the 120 macroblocks of a row (for example, for HDTV at 1920 × 1080 pixels, 1920 / 16 = 120 macroblocks per row). The mbNeighCtxCurrent register 603 stores the macroblock currently being decoded, and the mbNeighCtxLeft register 605 stores the previously decoded neighboring (left) macroblock. Pointers 607a, 607b, and 607c (shown as arrows in FIG. 6C) point to registers 603 and 605 and to the array elements 601. To decode the current macroblock, the decoded data is stored in the mbNeighCtxCurrent register 603. Given the context-based nature of CABAC decoding, the current macroblock is decoded using information gathered while previous macroblocks were decoded: the left macroblock, stored in the mbNeighCtxLeft register 605 and pointed to by pointer 607b, and the top macroblock, stored in array element [i] and pointed to by pointer 607c.

Continuing with the initialization instructions, the INIT_CTX instruction initializes the top and left pointers 607c and 607b associated with the macroblocks adjacent to the current macroblock (e.g., elements of the macroblock neighbor context memory 564 array). For example, the left pointer 607b may be set to 0 and the top pointer 607c may be set to 1. The INIT_CTX instruction also updates the global register 614. Regarding context table initialization, in response to an INIT_CTX call the variable length decoding unit 530a builds one or more context tables, also called CTX_TABLE. In one embodiment, CTX_TABLE may be a 4 × 460 × 16-bit table (8 bits for m and 8 bits for n, both signed values; the four tables presumably correspond to cabac_init_idc values 0 to 2 plus the value 3 used for I and SI slices) or another data structure. Each entry of the context table holds the pStateIdx and valMPS values accessed from the state index register 602 and the most probable symbol (MPS) value register 604.

The INIT_ADE instruction initializes the binary arithmetic decoding module 624, also called the decoding engine. In one embodiment, INIT_ADE is called after the INIT_BTSR instruction completes. The INIT_ADE instruction causes the variable length decoding unit 530a to set up two registers, the codIRange register 606 and the codIOffset register 608, with the following values:

codIRange = 0x01FE
codIOffset = ZeroExtend(READ(#9), #16)

Thus, in one embodiment, these variables may be 9-bit values. For codIOffset, 9 bits are read from the bitstream buffer 562b and zero-extended (ZeroExtend) into the 16-bit codIOffset register 608. Some embodiments may use other values. The binary arithmetic decoding module 624 uses the values stored in registers 606 and 608 to decide whether to output 0 or 1, and these values are updated after each bin is decoded. In addition to initializing the codIRange register 606 and the codIOffset register 608, the INIT_ADE operation also initializes the bin string register 616. In one embodiment, the bin string register 616 may be a bit register that receives the output bits of the binary arithmetic decoding module 624; registers of other sizes may be used in some embodiments. The binary arithmetic decoding module 624 is also initialized when a macroblock is coded as I_PCM data. As is known, I_PCM data contains pixel data to which, under the H.264 specification, no transform or prediction model is applied to the original video data. For example, I_PCM may be used for lossless coding applications.

Having described the architecture and instructions for parsing the bitstream and initializing the various decoding system elements, the following describes one or more procedures for binarization, obtaining model information and context, and decoding according to the model and context. Generally, the variable length decoding unit 530a uses the binarization module 620 and the BIND instruction to obtain all possible binarizations of the parsed syntax element (SE), or at least enough to obtain the model information. The variable length decoding unit 530a further obtains the context of a given syntax element through the get-context module 622 and the GCTX instruction, and performs arithmetic decoding according to the context and model information through the binary arithmetic decoding module 624 and the BARD instruction. In practice, calling the GCTX/BARD instructions, outputting one bit to the bin string register 616, and searching for a meaningful codeword matching the given syntax element form a loop. In one embodiment, after each bin value is decoded, the corresponding decoded bit is provided to the bin string register 616, and the bin string register is read back into the get-context module 622 until a match is found.

To explain in more detail the decoding system architecture that uses a single variable length decoding unit 530a, and referring to FIGS. 6A and 6B together, the binarization module 620 is enabled by a BIND instruction issued by the driver software 128. In one embodiment, the BIND instruction has the following format:

BIND DST, #Imm16, SRC1

where DST corresponds to the destination register 652, #Imm16 corresponds to a 16-bit immediate value, and SRC1 corresponds to the input register SRC1. The inputs to the BIND operation comprise the syntax element (given as the 16-bit immediate Imm) and the context block category (ctxBlockCat). The syntax element may be any syntax element type defined by the H.264 specification (e.g., MBTypeInI, MBSkipFlagB, IntraChromaPredMode, and so on). Calling the BIND instruction causes the driver software 128 to read the syntax element from a table (or other data structure) stored in memory (e.g., on-chip memory or remote memory) and to obtain the syntax element index (SEIdx). The syntax element index is used to access other tables or data structures to obtain the macroblock parameters described below. In one embodiment, the destination register 652 is a 32-bit register with the following format: bits 0-8 (ctxIdxOffset), bits 16-18 (maxBinIdxCtx), bits 21-23 (ctxBlockCat), bits 24-29 (ctxIdxBlockCatOffset), and bit 31 (bypass flag). These values (ctxIdxOffset, maxBinIdxCtx, and so on) are passed to the get-context module 622 for use as the context model. In this embodiment, any undefined reserved bits may be 0. Based on the pairing of the syntax element index and the context block category, ctxIdxBlockOffset may be obtained from a table or other data structure stored in remote or on-chip memory. Table 1 shows the table contents of one non-limiting embodiment:

             coded_block_pattern
codeNum (k)  Intra_4x4  Inter
0            47         0
1            31         16
2            15         1
3            0          2
4            23         4
5            27         8
6            29         32
7            30         3
8            7          5
9            11         10
10           13         12
11           14         15
12           39         47
13           43         7
14           45         11
15           46         13
16           16         14
17           3          6
18           5          9
19           10         31
20           12         35
21           19         37
22           21         42
23           26         44
24           28         33
25           35         34
26           37         36
27           42         40
28           44         39
29           1          43
30           2          45
31           4          46
32           8          17
33           17         18
34           18         20
35           20         24
36           24         19
37           6          21
38           9          26
39           22         28
40           25         23
41           32         27
42           33         29
43           34         30
44           36         22
45           40         25
46           38         38
47           41         41
Table 1
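As printed, the entries of Table 1 appear to coincide with the H.264 mapping from an Exp-Golomb codeNum to coded_block_pattern, with one column for Intra_4x4 macroblocks and one for Inter macroblocks. The Python sketch below simply encodes the table as printed; the function names are hypothetical, and the luma/chroma split (cbp % 16 and cbp // 16) follows the H.264 convention, which also fits the 4-bit codedBlockPatternLuma and 2-bit codedBlockPatternChroma neighbor-context fields.

```python
# Rows of Table 1 in codeNum order: (Intra_4x4 column, Inter column).
CBP_TABLE = (
    (47, 0), (31, 16), (15, 1), (0, 2), (23, 4), (27, 8), (29, 32), (30, 3),
    (7, 5), (11, 10), (13, 12), (14, 15), (39, 47), (43, 7), (45, 11), (46, 13),
    (16, 14), (3, 6), (5, 9), (10, 31), (12, 35), (19, 37), (21, 42), (26, 44),
    (28, 33), (35, 34), (37, 36), (42, 40), (44, 39), (1, 43), (2, 45), (4, 46),
    (8, 17), (17, 18), (18, 20), (20, 24), (24, 19), (6, 21), (9, 26), (22, 28),
    (25, 23), (32, 27), (33, 29), (34, 30), (36, 22), (40, 25), (38, 38), (41, 41),
)


def cbp_from_code_num(k: int, intra_4x4: bool) -> int:
    """Look up coded_block_pattern for codeNum k (0..47) in Table 1."""
    if not 0 <= k < len(CBP_TABLE):
        raise ValueError(f"codeNum {k} is outside Table 1")
    intra, inter = CBP_TABLE[k]
    return intra if intra_4x4 else inter


def split_cbp(cbp: int) -> tuple[int, int]:
    """Return (luma, chroma): low 4 bits, one per 8x8 luma block, then the chroma part."""
    return cbp % 16, cbp // 16
```

Each column is a permutation of 0..47, so a plain indexed lookup is sufficient; for example, codeNum 0 in an Intra_4x4 macroblock gives 47, which splits into luma 15 and chroma 2.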

If an undefined context block category is received, the variable length decoding unit 530a may treat the undefined parameter as 0, so that ctxIdxBlockOffset is taken to be 0. Calling the BIND instruction also causes a reset signal (Rst_Signal) to be output from the binarization module 620 to the binary arithmetic decoding module 624, as described below.

To illustrate the various inputs and outputs of the binarization module 620, the operation of the binarization module 620 according to at least one embodiment is presented here. When the binarization module 620 is called, it fetches the syntax element and the context block category, and the known syntax element index (SEIdx) is provided by software. Using the syntax element index, the binarization module 620 looks up a table to obtain the corresponding values of maxBinIdxCtx, ctxIdxOffset, and bypassFlag. These looked-up values are temporarily stored at predefined bit positions of the destination register 652. In addition, using the syntax element index and the context block category, the binarization module 620 performs a second table lookup (e.g., in remote or on-chip memory) to obtain the ctxIdxBlockOffset value. The second looked-up value is likewise temporarily stored in the destination register 652. The values so determined are then used to build the destination register 652, which is output as a 32-bit value.

For some syntax elements, additional information (beyond the syntax element and the context block category) may be used to begin the H.264 decoding operation. For example, for macroblock parameters such as SigCoeffFlag and lastSigCoeffFlag, the value in maxBinIdxCtx[1] of the array element stored in the macroblock neighbor context memory 564, together with the input context block category value, is used to determine whether the macroblock is field coded or frame coded. In some embodiments, the same syntax element number is used for these flags even though they are different syntax elements, and they are then distinguished using mb_field_decoding_flag (the mbNeighCtx[1] field).

In addition to the functions of the binarization module 620 described above, note that in FIG. 6B the binarization module 620 may be combined with the bin index register 654, the multiplexer unit 656, and/or the forwarding registers F1 and F2. As for the bin index register 654 and the multiplexer unit 656, the multiplexer provides the output SRC1 (e.g., the value held in the register) to the get-context module 622 according to its different inputs. Regarding the forwarding registers labeled F1 and F2, when a BIND (or GCTX) instruction produces a result, the result may be stored in the destination register 652 and/or in the forwarding registers F1 and F2 for use by the next module (e.g., the get-context module 622 or the binary arithmetic decoding module 624); for example, a destination flag in the instruction may indicate this. The forwarding-register symbols indicate whether the value of forwarding register F1 or F2 is used; in one embodiment they are flags in the instruction word, F1 (use forwarding source 1) and F2 (use forwarding source 2), with F2 represented by bit 26. For the get-context module 622 and the binary arithmetic decoding module 624, data may be forwarded to their respective inputs in the same way as explained above for the binarization module.

Having explained the binarization module 620 and the related procedures, the following describes how the GCTX instruction obtains the context and bin index of a given model. Briefly, the inputs to the get-context module 622 include maxBinIdxCtx, binIdx, and ctxIdxOffset, as described below. The get-context module 622 uses the ctxIdxOffset and binIdx values to compute ctxIdx, an output representing the context index. An exemplary format of the GCTX instruction is as follows:

GCTX DST, SRC2, SRC1

where SRC1 corresponds to the value output by the multiplexer unit 656 and stored in register SRC1, SRC2 corresponds to the value output by the destination register 652 and stored in register SRC2, and DST corresponds to the destination register. In one embodiment, the registers hold the following values:

SRC1[7:0] = binIdx. When the current syntax element is codedBlockPattern, the value of SRC1 (output from the multiplexer unit 656 as the input to the get-context module 622) may be the value of the bin index register 654.
SRC1[15:8] may be levelListIdx (when computing sigCoeffFlag or lastSigCoeffFlag) or mbPartIdx (when computing Ref_Idx, or the binIdx of the coded block pattern). When the syntax element is sigCoeffFlag or lastSigCoeffFlag, the multiplexer unit 656 may be used to pass levelListIdx.
SRC1[16] may contain the iCbCr flag; when its value is 0, the block is a Cb chroma block. SRC1[16] may also contain the L0/L1 value, which is 0 for L0. As those skilled in the art will appreciate, L0 and L1 are the reference picture lists used for motion-compensated prediction (L0 = list0, L1 = list1).
SRC1[21:20] = mbPartitionMode
SRC2[8:0] = ctxIdxOffset
SRC2[18:16] = maxBinIdxCtx
SRC2[23:21] = ctxBlockCat
SRC2[29:24] = ctxIdxBlockOffset
SRC2[31] = bypassFlag

Further, DST holds the output of the get-context module 622, with the following values:

DST[15:0] = ctxIdx
DST[23:16] = binIdx
DST[27:24] = mbPartIdx
DST[29:28] = mbPartitionMode
DST[30] = L0
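The GCTX operand and result layouts above can be modeled as field tables with generic pack/unpack helpers. This is a hedged Python sketch: the SRC2 layout reuses the BIND destination-register layout described earlier (reading ctxBlockCat as bits 23:21), the helper names are hypothetical, and the example values are arbitrary rather than taken from any real stream.

```python
# name -> (lsb, width), taken from the GCTX register listing.
SRC2_FIELDS = {            # same layout as the BIND destination register 652
    "ctxIdxOffset": (0, 9),
    "maxBinIdxCtx": (16, 3),
    "ctxBlockCat": (21, 3),
    "ctxIdxBlockOffset": (24, 6),
    "bypassFlag": (31, 1),
}
DST_FIELDS = {
    "ctxIdx": (0, 16),
    "binIdx": (16, 8),
    "mbPartIdx": (24, 4),
    "mbPartitionMode": (28, 2),
    "L0": (30, 1),
}


def pack(fields: dict, layout: dict) -> int:
    """OR each named value into its bit field; unspecified fields stay 0."""
    word = 0
    for name, value in fields.items():
        lsb, width = layout[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << lsb
    return word


def unpack(word: int, layout: dict) -> dict:
    """Extract every field of `layout` from a 32-bit word."""
    return {name: (word >> lsb) & ((1 << width) - 1)
            for name, (lsb, width) in layout.items()}
```

Keeping the layouts as data rather than hard-coded shifts makes it easy to check that no two fields overlap and that every field fits in 32 bits.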

The get-context module 622 may also interact with the forwarding registers. When the forwarding registers are used, the instruction may take the form GCTX.F1.F2, where F1 and F2 indicate that the forwarding registers are used; that is, two bits of the instruction encoding (F1 and F2) are devoted to this. If one or both forwarding flags are absent, the corresponding forwarding register is not used. When these bits are set (e.g., to 1), the value of the forwarding register (an internally generated value) is used; otherwise, the value of the source register is used. The forwarding registers therefore also give the compiler a hint about the earliest time an instruction can be issued, allowing efficient instruction scheduling. Without forwarding, an instruction may incur a read-after-write delay on a given source register.
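A minimal Python model of the forwarding choice just described: each flag selects either the internally forwarded value or the architectural source register, and forwarding removes the read-after-write wait. The dataclass, the function names, and the writeback-delay parameter are hypothetical illustrations, not part of the described hardware.

```python
from dataclasses import dataclass


@dataclass
class GctxIssue:
    """Hypothetical model of one GCTX.F1.F2 issue."""
    f1: bool  # take SRC1 from forwarding register F1
    f2: bool  # take SRC2 from forwarding register F2


def select_operands(issue: GctxIssue, src1_reg: int, src2_reg: int,
                    fwd1: int, fwd2: int) -> tuple[int, int]:
    """A set flag picks the forwarded value; a clear flag picks the source register."""
    src1 = fwd1 if issue.f1 else src1_reg
    src2 = fwd2 if issue.f2 else src2_reg
    return src1, src2


def earliest_issue_cycle(result_ready: int, regfile_writeback_delay: int,
                         forwarded: bool) -> int:
    """With forwarding, a consumer can issue as soon as the producer's result exists;
    without it, the consumer waits for the register-file write (read-after-write)."""
    return result_ready if forwarded else result_ready + regfile_writeback_delay
```

The scheduling helper shows why the flags are useful to a compiler: the issue time of a dependent instruction depends only on whether its operand arrives by forwarding.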

For the GCTX instruction, when the reset signal (Rst_Signal) is set, the value of SRC1 is 0. When F1 is set and the reset signal is not (F1 & ~Rst_Signal), SRC1 is the binIdx value from inside the get-context module 622 plus 1; otherwise, SRC1 is the binIdx value from the execution unit register. The output of the binarization module 620 may be used as the forwarded SRC2 value for the GCTX and BARD instructions. In subsequent instructions, no further BIND instruction is issued until the BARD instruction has used the forwarding register. To explain further, the reset signal and the F1 forwarding signal are combined into a single signal (e.g., a 2-bit signal) {F1, reset}, which indicates whether the SRC1 value input to the get-context module 622 is the binIdx value or the forwarded value. Another function of the reset signal is to clear and reset the bin string register 616 and to reset the bin index register 654 to 0.

Continuing with the get-context module 622 and obtaining context information, in one embodiment the information shown in Tables 2 and 3 below corresponds, respectively, to the values of the neighbor context memory 564 structure and of the mbNeighCtxCurrent register 603. The mbNeighCtxCurrent register 603 holds the decoded output of the current macroblock. At the end of processing of the current macroblock, a CWRITE instruction is issued, which copies the information from the mbNeighCtxCurrent register 603 to the corresponding position in the neighbor context memory 564 array. The copied information is then used as the top neighbor values.

Parameter                  Size (bits)  Bit position
transform_size_8x8_flag    1            0
mb_field_decode_flag       1            1
mb_skip_flag               1            2
intra_chroma_pred_mode     2            4:3
mb_type                    3            7:5
codedBlockPatternLuma      4            11:8
codedBlockPatternChroma    2            13:12
codedFlagY                 1            14
codedFlagCb                1            15
codedFlagCr                1            16
codedFlagTrans             8            24:17
refIdx                     8            32:25
predMode                   4            36:33
Table 2

Parameter                  Size (bits)  Bit position
transform_size_8x8_flag    1            0
mb_field_decode_flag       1            1
mb_skip_flag               1            2
intra_chroma_pred_mode     2            4:3
mbQpDeltaGT0               1            88
codedBlockPatternLuma      4            11:8
codedBlockPatternChroma    2            13:12
codedFlagY                 1            14
codedFlagCb                1            15
codedFlagCr                1            16
codedFlagTrans             24           87:64
refIdx                     16           52:37
predMode                   8            60:53
mb_type                    3            63:61
Table 3

In one embodiment, the parameter codedFlagTrans is divided into three parts. For example, the first 4 bits relate to context block category 0 or 1, and the upper 4 bits relate to context block category 3 or 4. The upper 4 bits can be further divided into two parts: the lower 2 bits for iCbCr = 0 and the other 2 bits for iCbCr = 1. The parameter predMode (prediction mode) takes one of three values: predL0 = 0, predL1 = 1, and BiPred = 2. FIG. 6D shows an embodiment of the structure of the refIdx parameter listed in Tables 2 and 3. Note that refIdx relates to the index into the reference picture list used in picture reconstruction. This structure allows the memory and logic circuitry to be optimized. As shown, the structure used to compute the syntax element includes the top row 609 of the macroblock, the macroblock partitions 611 (the four regions shown), the L0/L1 values 613, and, for each L0/L1 value, the stored bits Gt0 (greater than 0) 615 and Gt1 (greater than 1) 617. Typically, the top neighboring macroblock 609 must be accessed, and the bottom row of the macroblock must also be accessed; in an embodiment in which it is divided into 4 × 4 blocks, this yields four mbPartIdx 611. For each mbPartIdx 611, information about the L0/L1 values 613 is determined, but not the actual values: only whether each L0 and L1 value is greater than 0 and whether it is greater than 1 is decided. In one embodiment, this is obtained by storing the two bits Gt0 615 and Gt1 617, which are used to compute the syntax element.

Briefly, two optimizations are performed in computing the syntax element structure. In the first optimization, only 2 bits are kept (even though reference values are conventionally larger), so no further bits are needed for the variable length decoding unit 530a to decode the syntax element; the full values are decoded and kept in execution unit registers or in memory (e.g., an L2 cache). In the second optimization, only four elements are maintained (e.g., two at the top and two at the left). The four elements are recycled, and the final values are written to the neighbor memory by the CWRITE instruction, which stores them in memory. As a result, only 16 bits are maintained in the mbNeighCtxCurrent register 603, and only 8 bits in the mbNeighCtxLeft register 605 and in the top mbNeighCtx element 601 of array 564 (consistent with 2 bits, Gt0 and Gt1, for each of L0 and L1 in four partitions and in two partitions, respectively). The computation logic is also reduced, because the full computation on decoded reference values is replaced by Boolean operations on fewer bits. The values of mb_type are listed in Table 4.

Client’s Docket No.: S3U06-0014-TW TT5s Docket No:0608-A41247twf.doc/NikeyChen 200821982 mbjype 名稱 4，b000 SI 4，b001 I_4x4 or l__NxN 4’b010 1—16x16 4’b011 LPCM 4，b100 P_8x8 4，b101 B一8x8 4’b110 B—Direct__16x16 4’b111 Others 表四未顯示在第6B圖的額外暫存器可以被使用，例如 mbPerLine (例如8位元，不具正負號）、mb—qp—delta ( 8 位元，具正負號），以及mbAddrCurr ( 16-bit，目前巨集區塊位址）。對mbAddrCurr而言，1920x1080陣列被實施，雖然其只需要13位元。部分實施例會使用16位元以幫助 16位元計算的執行。來自先前所描述之暫存器的值亦被儲存在總體暫存器 614。複製儲存在總體暫存器614内的值並儲存在暫存器以幫助硬體設計。在一實施例中，總體暫存器614包括格式化之32位元暫存器以包含對應於mbPerlhie、mbAddrCurr 以及 mb—qp—delta 的值，除了對應於 NUT、MBAFF_FLAG 以及chroma_format—idc的其他值之外。可使用INSERT指令來更新總體暫存器614内的不同欄位。INSERT指令的示範格式描述如下： INSERT DST，#Imm，SRC1 在上面INSERT指令中，#Imm的一實施例包括1〇位元數字，其中前面5位元寬度的資料以及上面5位元指定Client's Docket No.: S3U06-0014-TW TT5s Docket No:0608-A41247twf.doc/NikeyChen 200821982 mbjype name 4,b000 SI 4,b001 I_4x4 or l__NxN 4'b010 1-16x16 4'b011 LPCM 4,b100 P_8x8 4, B101 B-8x8 4'b110 B-Direct__16x16 4'b111 Others Table 4 does not show that the extra scratchpad in Figure 6B can be used, such as mbPerLine (eg 8-bit, no sign), mb-qp-delta ( 8-bit, with sign), and mbAddrCurr (16-bit, current macro block address). For mbAddrCurr, the 1920x1080 array is implemented, although it only requires 13 bits. Some embodiments will use 16 bits to aid in the execution of 16-bit calculations. Values from the previously described scratchpad are also stored in the overall register 614. The values stored in the overall register 614 are copied and stored in the scratchpad to aid in hardware design. In one embodiment, the overall scratchpad 614 includes a formatted 32-bit scratchpad to contain values corresponding to mbPerlhie, mbAddrCurr, and mb_qp-delta, except for other values corresponding to NUT, MBAFF_FLAG, and chroma_format_idc. Outside. The different fields within the overall scratchpad 614 can be updated using the INSERT instruction. 
The exemplary format of the INSERT instruction is:

    INSERT DST, #Imm, SRC1

In one embodiment, #Imm is a 10-bit number. The lower 5 bits (#Imm[4:0]) specify the width of the data, and the upper 5 bits (#Imm[9:5]) specify the position at which the data are inserted. The intermediate values are computed as follows:

    Mask  = NOT(0xFFFFFFFF << #Imm[4:0])
    Data  = SRC1 & Mask
    SData = Data << #Imm[9:5]
    SMask = Mask << #Imm[9:5]

The output DST is then:

    DST = (DST & NOT(SMask)) | SData

Some fields, such as NUT (NAL_UNIT_TYPE), C (constrained_intra_pred_flag), MBAFF_FLAG, mbPerLine, and mbAddrCurr, may also be written or initialized into the global register 614 using the INIT_CTX instruction. In one embodiment, the local register 612 comprises a 32-bit register with fields corresponding to b, mb_qp_delta, numDecodAbsLevelEq1, and numDecodAbsLevelGt1. These fields can be updated using the INSERT instruction. The local register 612 is also initialized so that b = 0, mb_qp_delta = 0, numDecodAbsLevelEq1 = -1, and numDecodAbsLevelGt1 = 0. The instruction that provides this initialization may use the following format:

    CWRITE SRC1

where SRC1[15:0] = mbAddrCurr. CWRITE SRC1 updates the mbAddrCurr field of the global register 614. The additional functionality provided by the CWRITE instruction is described after a brief description of the neighbor element structure and its decoding.
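The INSERT semantics given above amount to a masked bit-field insert. The following Python sketch is a software model written for illustration only. It assumes 32-bit registers and the #Imm layout described above (width in #Imm[4:0], position in #Imm[9:5]). The helper names `insert` and `make_imm` are hypothetical and are not part of the described instruction set. If the formulas are taken literally, a width of 0 produces an empty mask, so the insert becomes a no-op.

```python
MASK32 = 0xFFFFFFFF


def insert(dst: int, imm: int, src1: int) -> int:
    """Model of INSERT DST, #Imm, SRC1 on 32-bit registers (illustrative sketch)."""
    width = imm & 0x1F                      # #Imm[4:0]: number of bits to insert
    pos = (imm >> 5) & 0x1F                 # #Imm[9:5]: bit position of the field
    mask = ~(MASK32 << width) & MASK32      # Mask  = NOT(0xFFFFFFFF << width)
    data = src1 & mask                      # Data  = SRC1 & Mask
    sdata = (data << pos) & MASK32          # SData = Data << pos
    smask = (mask << pos) & MASK32          # SMask = Mask << pos
    return (dst & ~smask & MASK32) | sdata  # DST = (DST & NOT(SMask)) | SData


def make_imm(width: int, pos: int) -> int:
    """Pack a width/position pair into the 10-bit immediate."""
    return ((pos & 0x1F) << 5) | (width & 0x1F)


if __name__ == "__main__":
    # Write the 8-bit value 0xAB into bits [15:8] of 0x12345678.
    print(hex(insert(0x12345678, make_imm(8, 8), 0xAB)))  # 0x1234ab78
```

Because only the bits under SMask change, a shader could update one field of the local or global register (for example mb_qp_delta) without disturbing the other packed fields.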

In CABAC decoding, syntax values are predicted and/or modeled from neighboring macroblocks. The following describes how embodiments of the VLD unit 530a determine the left and top neighboring macroblocks and whether those macroblocks are actually available. As described above, the decoding process uses neighboring values (e.g., from the macroblock or block above and to the left). In one embodiment, the binary arithmetic decoding engine 624 uses the current macroblock number and the number of macroblocks per line (mbPerLine) to compute the address of the top macroblock and to determine whether the left and top macroblocks are available. For example, to determine whether a neighboring macroblock (e.g., the left neighbor) exists (i.e., is valid), an operation such as mbCurrAddr % mbPerLine can be performed and the result checked against 0. A result of 0 means that the current macroblock is the first in its row and therefore has no left neighbor. In one embodiment, the following calculation is performed:

    a = mbCurrAddr - floor(mbCurrAddr / mbPerLine) * mbPerLine

Here, mbCurrAddr is the position of the current macroblock corresponding to the bin to be decoded, and mbPerLine is the number of macroblocks in each row. The calculation uses one division, one multiplication, and one subtraction.

The decoding mechanism implemented by the binary arithmetic decoding engine 624 is described further with reference to Figure 6E, which shows a picture to be decoded (16x8 macroblocks, mbPerLine = 16). When macroblock 35 is being decoded (mbCurrent is labeled 35, and macroblock 36 has not yet been fully decoded), data are needed from the previously decoded top macroblock (labeled 19) and left macroblock (labeled 34). The top macroblock information can be obtained from mbNeighCtx[i], where i = mbCurrent % mbPerLine. In this example, i = 35 % 16 = 3. After the current macroblock has been decoded, the CWRITE instruction can be used to update mbNeighCtxLeft 605 and mbNeighCtx[i] 601 in the array.

As another example, consider mbCurrAddr in [0, maxMB - 1], where maxMB is 8192 and mbPerLine = 120. In one embodiment, the division is implemented by multiplying by (1/mbPerLine), which is looked up in a table stored in on-chip memory (e.g., a 120 x 11-bit table). Since mbCurrAddr is 13 bits wide, a 13 x 11-bit multiplier can be used. In one embodiment, the multiplication is performed and the upper 13 bits of the result are kept. A 13 x 7-bit multiplication is then performed and its lower 13 bits are kept. Finally, a 13-bit subtraction is performed to determine "a". The whole sequence of operations takes 2 cycles. The result is stored for use in other operations and is recomputed only when the mbCurrAddr value changes.
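The availability test and the divider-free computation of "a" described above can be sketched in Python. This is a software model under stated assumptions and does not reproduce the described hardware. It assumes a plain raster-scan picture, with no slice boundaries, FMO, or MBAFF. It also replaces the 120 x 11-bit reciprocal table with a hypothetical 20-bit fixed-point reciprocal, which is wide enough to give exact quotients for every 13-bit address and every mbPerLine below 128. The names `recip`, `column`, and `neighbors` are illustrative only.

```python
RECIP_SHIFT = 20  # hypothetical precision: 13-bit addresses x (mbPerLine < 128) need 13 + 7 bits


def recip(mb_per_line: int) -> int:
    """Fixed-point reciprocal ceil(2**RECIP_SHIFT / mbPerLine), the value a lookup table would hold."""
    return -(-(1 << RECIP_SHIFT) // mb_per_line)


def column(mb_curr_addr: int, mb_per_line: int) -> int:
    """a = mbCurrAddr - floor(mbCurrAddr / mbPerLine) * mbPerLine, without a divider."""
    q = (mb_curr_addr * recip(mb_per_line)) >> RECIP_SHIFT  # quotient via multiply + shift
    return mb_curr_addr - q * mb_per_line                    # one multiply, one subtract


def neighbors(mb_curr_addr: int, mb_per_line: int):
    """Return (i, left address or None, top address or None) in a raster-scan picture."""
    a = column(mb_curr_addr, mb_per_line)
    left = mb_curr_addr - 1 if a != 0 else None  # first macroblock of a row has no left neighbor
    top = mb_curr_addr - mb_per_line if mb_curr_addr >= mb_per_line else None  # first row has no top
    return a, left, top


if __name__ == "__main__":
    print(neighbors(35, 16))  # Figure 6E example: (3, 34, 19)
```

For the Figure 6E example, the sketch returns column i = 3, left macroblock 34, and top macroblock 19, which matches the text. The returned column is also the index into the top-neighbor array mbNeighCtx[i].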
In some embodiments, the modulo operation is not performed. Instead, shader logic in the execution unit provides a first mbAddrCurr value that is aligned to the first line of the slice. For example, the shader logic can perform the following calculation:

    mbAddrCurr = absoluteMbAddrCurr - n * mbPerLine

Some H.264 Flexible Macroblock Ordering (FMO) modes have very complex neighbor structures. To support these modes, the left/top availability can be computed in an additional shader of the decoding system 200 and loaded into one or more registers of the VLD unit 530a. Keeping this computation outside the VLD unit 530a reduces hardware complexity while still supporting all H.264 modes for symbol decoding.

The CWRITE instruction copies the appropriate fields from mbNeighCtxCurrent 603 to mbNeighCtxTop[] 601 and mbNeighCtxLeft[] (e.g., the left macroblock of the array 564). Which mbNeighCtxTop[] 601 and mbNeighCtxLeft[] data are written depends on whether mBaffFrameFlag (MBAFF) is set and whether the current and previous macroblocks are field or frame coded. When (mbAddrCurr % mbPerLine == 0) holds, mbNeighCtxLeft 605 is marked as unavailable (e.g., it is initialized to 0). The CWRITE instruction can be used to flush the contents of the mbNeighCtx memory 564, the local register 612, and the global register 614. For example, the CWRITE instruction moves the relevant contents of the neighbor context memory 564 into the left and top blocks of the i-th macroblock (e.g., mbNeighCtx[i], the current macroblock), and it also clears the mbNeighCtxCurrent register 603. As described above, the top pointer 607c and the left pointer 607b are associated with the neighbor context memory 564. After the CWRITE instruction, the top index is incremented by 1, and the contents of the current macroblock are moved to the top position and the left position in the array. This mechanism reduces the number of read/write ports needed on the memory array.

As described above, the INSERT instruction can be used to update the contents of the neighbor context memory 564, the local register 612, and the global register 614. For example, the INSERT instruction (e.g., INSERT $mbNeighCtxCurrent_1, #Imm10, SRC1) can be used to write the current macroblock. Such operations do not affect the top pointer 607c or the left pointer 607b (i.e., they write only to the current position).
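The CWRITE/INSERT division of labor described above can be illustrated with a small hypothetical software model. The class and method names are illustrative, the contexts are treated as opaque integers, and MBAFF field/frame handling and slice boundaries are ignored. The model also assumes that decoding starts at the top-left of the picture. It shows why one top entry per macroblock column plus a single left register is sufficient. Each CWRITE overwrites the slot for the current column, and that slot is exactly the entry the macroblock one row below will read.

```python
class NeighborContext:
    """Hypothetical model of mbNeighCtx[] (top row), mbNeighCtxLeft and mbNeighCtxCurrent."""

    def __init__(self, mb_per_line: int, first_mb_addr: int = 0):
        self.mb_per_line = mb_per_line
        self.top = [0] * mb_per_line   # one slot per macroblock column
        self.left = 0                  # mbNeighCtxLeft
        self.current = 0               # mbNeighCtxCurrent
        self.mb_addr_curr = first_mb_addr

    def column(self) -> int:
        return self.mb_addr_curr % self.mb_per_line

    def neighbors(self):
        """(left, top) contexts for the current macroblock; None when unavailable."""
        left = self.left if self.column() != 0 else None
        top = self.top[self.column()] if self.mb_addr_curr >= self.mb_per_line else None
        return left, top

    def insert_current(self, value: int) -> None:
        """INSERT-style write: touches mbNeighCtxCurrent only, not the top/left entries."""
        self.current = value

    def cwrite(self) -> None:
        """CWRITE-style commit: current context becomes top[i] and left, then current is cleared."""
        i = self.column()
        self.top[i] = self.current     # read back by the macroblock one row below
        self.left = self.current       # read back by the next macroblock in this row
        self.current = 0
        self.mb_addr_curr += 1
        if self.column() == 0:
            self.left = 0              # new row: left neighbor marked unavailable
```

In use, a shader-like loop would call `neighbors()`, decode the macroblock while writing its results with `insert_current()`, and then commit them with `cwrite()`.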

The INSERT instruction and the updates from the binary arithmetic decoding module 624 are written to the mbNeighCtxCurrent array 601 of the neighbor context memory 564. The left pointer 607b points to the element of the memory 564 that is the array element adjacent to mbNeighCtx 601 (i.e., mbNeighCtx[i-1]).

Having described how the context and model information are obtained, the following discusses the binary arithmetic decoding module 624 and the arithmetic decoding it performs based on that context and model information. The binary arithmetic decoding module 624 operates under the BARD instruction, which has the following exemplary format:

    BARD DST, SRC2, SRC1

BARD performs a binary arithmetic decoding operation in which each iteration of bin decoding produces a single-bit output. The input parameters are:

    SRC1 = binIdx/ctxIdx, the output of the get-context module 622; and
    SRC2 = bypassFlag, the output of the binarization module 620.

When forwarding registers are used, an exemplary format is BARD.F1.F2, which indicates the forwarding registers. If one or both of the corresponding forwarding flags are not set, the corresponding forwarding registers are not used. The binary arithmetic decoding module 624 also receives the reset signal described above. After receiving the reset signal, the module 624 holds the reset state until the first BARD instruction is received, at which point the reset signal is cleared.

In operation, the binary arithmetic decoding module 624 receives from the get-context module 622 the context index (the ctxIdx value) and a pointer (binIdx) to the current bit-parsing position in the bitstream being decoded. The module 624 uses the offset and range values from the code offset register and the code range register 606 to track the current interval state of the decoding engine (offset, offset + range). It uses the context index to access the context table (CTX_TABLE), which in turn provides the current probability state pStateIdx and the most probable symbol (MPS) value. The pStateIdx is then used (e.g., with a table stored in remote or on-chip memory) to read the least probable symbol (LPS) sub-range value, the next MPS value, and the next LPS probability value. Based on the MPS value, the next range, and the probability information, the module 624 determines the value of the current bin. The module 624 outputs the bins (bits or binary values, e.g., b0, b1, ... bn) to the bin string register 616. The process is then repeated for the next bin with the same or a different context, as indicated by the feedback connection 658 from the bin string register 616 to the get-context module 622.
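The per-bin work performed under BARD corresponds to the arithmetic decoding engine defined by H.264 (decision decoding, renormalization, and bypass decoding in clause 9.3.3.2 of the specification). The sketch below is a Python model of that standard procedure and is not the implementation of module 624. To avoid reproducing the 64 x 4 rangeTabLPS table and the transIdxLPS table here, the caller supplies them. The test uses only the first rangeTabLPS row, which the specification gives as 128, 176, 208, 240. Terminate bins are not modeled. The `bard` method is a hypothetical analogue of BARD in which SRC2 = bypassFlag selects the decoding path.

```python
class CabacEngine:
    """Model of the H.264 CABAC arithmetic decoding engine (one bin per call)."""

    def __init__(self, data: bytes, range_tab_lps, trans_idx_lps):
        self.data = data
        self.bitpos = 0
        self.range_tab_lps = range_tab_lps  # rangeTabLPS[pStateIdx][qCodIRangeIdx], from the spec
        self.trans_idx_lps = trans_idx_lps  # transIdxLPS[pStateIdx], from the spec
        self.cod_i_range = 510              # engine initialization
        self.cod_i_offset = self.read_bits(9)

    def read_bits(self, n: int) -> int:
        value = 0
        for _ in range(n):
            idx = self.bitpos >> 3
            byte = self.data[idx] if idx < len(self.data) else 0  # pad with zeros past the end
            value = (value << 1) | ((byte >> (7 - (self.bitpos & 7))) & 1)
            self.bitpos += 1
        return value

    def decode_decision(self, ctx: dict) -> int:
        q = (self.cod_i_range >> 6) & 3
        r_lps = self.range_tab_lps[ctx["pStateIdx"]][q]
        self.cod_i_range -= r_lps
        if self.cod_i_offset >= self.cod_i_range:      # LPS decoded
            bin_val = 1 - ctx["valMPS"]
            self.cod_i_offset -= self.cod_i_range
            self.cod_i_range = r_lps
            if ctx["pStateIdx"] == 0:
                ctx["valMPS"] = 1 - ctx["valMPS"]     # swap MPS/LPS at the equiprobable state
            ctx["pStateIdx"] = self.trans_idx_lps[ctx["pStateIdx"]]
        else:                                           # MPS decoded
            bin_val = ctx["valMPS"]
            if ctx["pStateIdx"] < 62:
                ctx["pStateIdx"] += 1
        while self.cod_i_range < 256:                   # renormalization
            self.cod_i_range <<= 1
            self.cod_i_offset = (self.cod_i_offset << 1) | self.read_bits(1)
        return bin_val

    def decode_bypass(self) -> int:
        self.cod_i_offset = (self.cod_i_offset << 1) | self.read_bits(1)
        if self.cod_i_offset >= self.cod_i_range:
            self.cod_i_offset -= self.cod_i_range
            return 1
        return 0

    def bard(self, bypass_flag: bool, ctx: dict) -> int:
        # Hypothetical analogue of BARD: SRC2 (bypassFlag) selects the path, SRC1 names the context.
        return self.decode_bypass() if bypass_flag else self.decode_decision(ctx)
```

A complete decoder would load all 64 rows of both tables and would also implement the terminate path, which is used, for example, for end_of_slice_flag.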

The binary arithmetic decoding module 624 updates the offset and range values and the probability state according to whether the MPS was selected. It then writes the current MPS value and probability state back into the context table for later context accesses.

Regarding the forwarding registers F1 and F2: when forwarding is signaled, an instruction may or may not use F1 or F2. When forwarding from the binarization module 620 to the get-context module 622, there is no delay, and GCTX can be issued in the next cycle. When forwarding from the get-context module 622 to the binary arithmetic decoding module 624, up to 4 cycles are used. That is, when GCTX is issued in cycle j, BARD can be issued in cycle (j+5), and the 4 intervening cycles can be filled with other useful instructions. When forwarding from the binarization module 620 to the binary arithmetic decoding module 624, there is no delay. When forwarding from the binary arithmetic decoding module 624 to the get-context module 622, GCTX can be issued in cycle (j+5). If the bin string is retained and there is a switch between the binary arithmetic decoding module 624 and the binarization module 620, there is no delay. In the bypass case, consecutive BARD instructions can be issued without any delay.

CAVLC Decoding

Having described the VLD unit 530a used for CABAC decoding, the CAVLC embodiment of the decoding system 200, also referred to as the VLD unit 530b, is now described in more detail, as shown in the corresponding figure. Before the CAVLC architecture is described, the H.264 CAVLC process is briefly described in the context of the VLD unit 530b. As is known, the CAVLC process encodes the levels (e.g., magnitudes) of the signal for a macroblock or its positions, and the points at which levels repeat (e.g., run lengths), so that not every bit needs to be coded. The bitstream buffer 562b receives and parses this information, and the buffer is refilled as the information is consumed by the decoding engine of the VLD unit 530b. The VLD unit 530b reverses the encoding process and reconstructs the signal from the macroblock information, with its level and run coefficients, extracted from the received bitstream. The VLD unit 530b therefore receives macroblock information from the bitstream buffer 562b and parses the stream to obtain level and run values, which are held temporarily in the level and run arrays. For example, the level and run arrays are read out for the 4x4 pixel blocks of the macroblock and then cleared for use by the next block. Under the H.264 standard, software can reconstruct the entire macroblock from 4x4 building blocks.

With this general description of decoding macroblock information in place, the various elements of the VLD unit 530b are now presented in the context of the CAVLC decoding process, together with various considerations that may apply to embodiments of the VLD unit. Those skilled in the art will recognize that many of the terms used below (e.g., the labels of various parameters) come from the H.264 specification. For brevity, these terms are explained further only where doing so helps in understanding the processes and/or elements described.

Figure 7A is a block diagram of an embodiment of the VLD unit 530b. Although Figure 7A shows a single VLD unit 530b, the same principles can be applied to decoding systems having additional VLD units. Figure 7A shows selected elements of the VLD unit 530b, and Figure 7B shows the table structure used for CAVLC decoding. Although the following description is given in the context of macroblock decoding, the principles can be applied to the decoding of various blocks. The VLD unit 530b parses the bitstream, initializes the decoding hardware and the register/memory structures, and performs level-run decoding. These functions of the H.264 CAVLC decoding process are described further below.

Regarding the bitstream buffer operations, the SREG stream buffer/DMA engine 562 is shared between CABAC and CAVLC operation. Apart from the interface and the operational differences between the CABAC and CAVLC modes, the shared parts are therefore not described again for brevity. Both the CABAC and CAVLC decoding embodiments use the same context memory 564, but its fields (e.g., its structure) differ, as described below. Because the context memory 564 operates for CAVLC in a manner similar to the CABAC operation described above, the common parts are not described again for brevity. The global register 614 and the local register 612 are also used, and their common parts are likewise not described again.

Referring to Figure 7A, the VLD unit 530b includes the following hardware modules:

- a coefficient token module (coeff_token) 710;
- a level code module (CAVLC_LevelCode) 712;
- a level module (CAVLC_Level) 714;
- a level-0 module (CAVLC_L0) 716;
- a zero-level module (CAVLC_ZL) 718;
- a run module (CAVLC_Run) 720;
- a level array (LevelArray) 722; and
- a run array (RunArray) 724.

The decoding system also includes the SREG stream buffer/DMA engine 562, the global register 614, the local register 612, and the neighbor context memory 564 described above. As in the CABAC embodiment described above, the interface between the VLD unit 530b and the execution unit 420a includes one or more destination buses with corresponding registers (e.g., destination registers) and two source buses with corresponding registers (SRC1, SRC2, etc.).

Generally, depending on the slice type, the driver software 128 (Figure 1) prepares a CAVLC shader and loads it into the execution unit 420a. To decode the bitstream, the CAVLC shader uses the standard instruction set together with additional instructions: coeff_token, CAVLC_LevelCode, CAVLC_Level, CAVLC_L0, CAVLC_ZL, and CAVLC_Run. The additional instructions also include READ_LRUN and CLR_LRUN, which read and clear the level array 722 and the run array 724. In one embodiment, before any other instruction is issued, the first instructions executed by the CAVLC shader are the INIT_CTX and INIT_ADE instructions. These two instructions initialize the VLD unit 530b to decode a CAVLC bitstream and load the bitstream from a pointer into the FIFO buffer so that stream decoding proceeds automatically. Both instructions are described later. The VLD unit 530b can thus be used to parse the bitstream, initialize the decoding hardware and the register/memory structures, and perform level-run decoding. These functions of the H.264 CAVLC decoding process are described further below.
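The level-run step described above can be illustrated with the reconstruction loop that the H.264 CAVLC residual syntax uses. This loop turns decoded level and run values into scan-ordered coefficients, which are then mapped back through the 4x4 frame zig-zag scan. This Python sketch models the standard's procedure rather than the LevelArray 722/RunArray 724 hardware. The function names are hypothetical, and the field (interlaced) scan is not covered.

```python
# 4x4 frame zig-zag scan: scan position -> raster index (row * 4 + column)
ZIGZAG_4X4 = [0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15]


def place_coefficients(levels, runs, max_coeff=16):
    """Rebuild scan-order coefficients from level/run arrays.

    levels[i] and runs[i] are in decoding order (highest frequency first), as CAVLC
    emits them; runs[i] is the number of zeros preceding coefficient i in scan order
    (for the last coefficient, the remaining zeros).
    """
    coeff = [0] * max_coeff
    coeff_num = -1
    for i in range(len(levels) - 1, -1, -1):  # walk from the lowest frequency upwards
        coeff_num += runs[i] + 1
        coeff[coeff_num] = levels[i]
    return coeff


def to_raster(scan):
    """Inverse zig-zag: return the 4x4 block as a list of rows."""
    block = [0] * 16
    for pos, value in enumerate(scan):
        block[ZIGZAG_4X4[pos]] = value
    return [block[r * 4:(r + 1) * 4] for r in range(4)]


if __name__ == "__main__":
    scan = place_coefficients([1, -1, -1, 1, 3], [1, 0, 0, 1, 1])
    print(to_raster(scan))  # [[0, 3, -1, 0], [0, -1, 1, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
```

Reading the arrays in reverse order explains why the hardware can simply fill the level and run arrays as symbols arrive and then drain them once per 4x4 block.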

Two bitstream-parsing instructions are shared with the CABAC process described earlier: READ and INIT_BSTR. Two other bitstream-parsing instructions are more specific to the CAVLC process: the INPSTR instruction (corresponding to the check-string module 570) and the INPTRB instruction (associated with the VLD logic 550 of Figure 5C). The INPSTR and INPTRB instructions need not be limited to CAVLC operation; for example, they may be used in other processes such as CABAC, VC-1, and MPEG. They are used to detect whether a particular pattern (e.g., a data start or end pattern) occurs in a slice, macroblock, etc., which allows the bitstream to be inspected without consuming it. In one embodiment, the instruction sequence comprises INPSTR and INPTRB, followed by a READ instruction. The exemplary format of the INPSTR instruction is:

    INPSTR DST

In one embodiment, INPSTR inspects the bitstream and returns the most significant 16 bits of the SREG register 562a in the lower 16 bits of the destination register. The upper 16 bits of the destination register contain the sREGbitptr value. This operation does not remove any data from the SREG register 562a. The INPSTR instruction can be implemented according to the following exemplary pseudocode, where ZE() denotes zero extension:

    MODULE INPSTR (DST)
        OUTPUT [31:0] DST;
        DST = {ZE(sREGbitptr), sREG[msb:msb-15]};
    ENDMODULE
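The non-destructive peek performed by INPSTR can be illustrated with a small software model. The class below is hypothetical. It treats sREGbitptr as the count of bits already consumed from a byte buffer and reads zeros past the end of the buffer, which may differ from the hardware's actual pointer semantics. The example shows how a shader could look for a 0x000001 start-code prefix with INPSTR before committing with READ.

```python
class SRegModel:
    """Hypothetical software stand-in for the SREG bitstream register."""

    def __init__(self, data: bytes):
        self.data = data
        self.bitptr = 0  # assumed meaning of sREGbitptr: bits already consumed

    def _peek(self, n: int) -> int:
        value = 0
        for k in range(n):
            pos = self.bitptr + k
            byte = self.data[pos >> 3] if (pos >> 3) < len(self.data) else 0
            value = (value << 1) | ((byte >> (7 - (pos & 7))) & 1)
        return value

    def inpstr(self) -> int:
        """DST = {ZE(sREGbitptr), next 16 bits}; nothing is consumed."""
        return ((self.bitptr & 0xFFFF) << 16) | self._peek(16)

    def read(self, n: int) -> int:
        value = self._peek(n)
        self.bitptr += n  # READ consumes the bits
        return value


if __name__ == "__main__":
    s = SRegModel(bytes([0x00, 0x00, 0x01, 0x65]))
    s.read(8)
    print(hex(s.inpstr()))  # 0x80001: pointer 8 in the upper half, next 16 bits 0x0001
```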

Another bitstream-parsing instruction is INPTRB, which checks for the raw byte sequence payload (RBSP) trailing bits (e.g., the bits that align the bitstream to a byte boundary). The INPTRB instruction provides a read of the bitstream buffer 562b. Its exemplary format is:

    INPTRB DST

The INPTRB operation removes no bits from the SREG register 562a. If the most significant bits of the SREG register 562a contain, for example, 100..., then the SREG register 562a contains the RBSP stop bit, and the remaining bits in the byte are alignment zero bits. The INPTRB instruction can be implemented according to the following exemplary pseudocode:

    MODULE INPTRB (DST)
        OUTPUT DST;
        REG [7:0] P;
        P = sREG[msb:msb-7];
        sp = sREGbitptr;
        T[7:0] = (P >> sp) << sp;
        DST[1] = (T == 0x80) ? 1 : 0;
        DST[0] = !(CVLC_BufferBytesRemaining > 0);
    ENDMODULE

The READ instruction is provided for data alignment in the bitstream buffer 562b.
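The INPTRB pseudocode above can be exercised with a small Python model, which rests on two hypothetical assumptions. First, the low three bits of sREGbitptr give the number of bits already consumed in the current byte. Under that assumption, the lowest sp bits of P belong to the following byte and are masked off by the shift pair. Second, CVLC_BufferBytesRemaining is replaced by the number of whole bytes after the current one. With these assumptions, DST[1] flags an RBSP stop bit followed by alignment zero bits, and DST[0] flags that no further bytes remain.

```python
def peek_bits(data: bytes, bitptr: int, n: int) -> int:
    """Return the next n bits starting at bitptr (zeros past the end of data)."""
    value = 0
    for k in range(n):
        pos = bitptr + k
        byte = data[pos >> 3] if (pos >> 3) < len(data) else 0
        value = (value << 1) | ((byte >> (7 - (pos & 7))) & 1)
    return value


def inptrb(data: bytes, bitptr: int) -> int:
    """Model of INPTRB: bit 1 = stop bit + alignment zeros found, bit 0 = no bytes left."""
    p = peek_bits(data, bitptr, 8)                 # P = sREG[msb:msb-7]
    sp = bitptr & 7                                # assumed: bits of P lying in the next byte
    t = ((p >> sp) << sp) & 0xFF                   # T = (P >> sp) << sp
    trailing = 1 if t == 0x80 else 0               # DST[1]
    bytes_remaining = len(data) - (bitptr >> 3) - 1  # assumed stand-in for the byte counter
    last = 0 if bytes_remaining > 0 else 1         # DST[0] = !(remaining > 0)
    return (trailing << 1) | last
```

For a final byte 0xC8 (1100 1000) with four bits already consumed, the remaining bits are 1000, so the model returns 3: trailing bits were found and no bytes remain.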

Having described the additional bitstream buffer operations of the VLD unit 530b, the initialization for CAVLC operation is now described. This covers the initialization of the memory, the register structures, and the decoding engine (e.g., the CAVLC module 582). At the start of a slice, before the syntax elements corresponding to the first macroblock are decoded, the global register 614, the local register 612, and the CAVLC module 582 are initialized. In one embodiment, the driver software 128 issues an INIT_CAVLC instruction to perform this initialization. The exemplary format of the INIT_CAVLC instruction is:

    INIT_CAVLC SRC2, SRC1

SRC2 contains the number of bytes of slice data to be decoded, and this value is written into the internal CVLC_bufferBytesRemaining register. SRC1 is formatted as follows:

    SRC1[15:0]  = mbAddrCurr
    SRC1[23:16] = mbPerLine
    SRC1[24]    = constrained_intra_pred_flag
    SRC1[27:25] = NAL_unit_type (NUT)
    SRC1[29:28] = chroma_format_idc (one embodiment uses a chroma_format_idc value of 1, corresponding to the 4:2:0 format; other embodiments may use other sampling schemes)
    SRC1[31:30] = undefined

When the INIT_CAVLC instruction executes, the values in SRC1 are written to the corresponding fields of the global register 614. The value in SRC2 is written to the internal register set up by the INIT instruction (e.g., the CVLC_bufferBytesRemaining register).
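As an illustration of the SRC1 layout listed above, the following Python sketch packs and unpacks the INIT_CAVLC SRC1 word in the way a driver might prepare it. The dataclass, the field names, and the helper functions are hypothetical; only the bit positions come from the description. The 3-bit NUT field can hold, for example, nal_unit_type 1 (non-IDR slice) and 5 (IDR slice), although nal_unit_type is a 5-bit syntax element in H.264.

```python
from dataclasses import dataclass


@dataclass
class SliceInit:
    mb_addr_curr: int                 # SRC1[15:0]
    mb_per_line: int                  # SRC1[23:16]
    constrained_intra_pred_flag: int  # SRC1[24]
    nal_unit_type: int                # SRC1[27:25]
    chroma_format_idc: int            # SRC1[29:28]


# (field name, least significant bit, width); bits [31:30] stay zero (undefined)
SRC1_FIELDS = (
    ("mb_addr_curr", 0, 16),
    ("mb_per_line", 16, 8),
    ("constrained_intra_pred_flag", 24, 1),
    ("nal_unit_type", 25, 3),
    ("chroma_format_idc", 28, 2),
)


def pack_src1(cfg: SliceInit) -> int:
    word = 0
    for name, lsb, width in SRC1_FIELDS:
        value = getattr(cfg, name)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << lsb
    return word


def unpack_src1(word: int) -> SliceInit:
    values = {name: (word >> lsb) & ((1 << width) - 1) for name, lsb, width in SRC1_FIELDS}
    return SliceInit(**values)


if __name__ == "__main__":
    # 1920x1080 IDR slice in 4:2:0 starting at macroblock 0
    print(hex(pack_src1(SliceInit(0, 120, 0, 5, 1))))  # 0x1a780000
```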

The CVLC_bufferBytesRemaining register is used to recover from an erroneous bitstream, as described above. For example, the VLD unit 530b (e.g., the SREG stream buffer/DMA engine 562) keeps track of the bytes buffered while parsing the bitstream of a given slice. As the bitstream is consumed, the VLD unit 530b counts down and updates the CVLC_bufferBytesRemaining value. A value below 0 indicates a buffer or bitstream error. In that case, processing is terminated and control returns to the application, or to the driver software 128, to handle recovery.

The INIT_CAVLC instruction also initializes the storage structures of the VLD unit 530b: the neighbor context memory 564, the mbNeighCtxLeft register 605, and the mbNeighCtxCurrent register 603. These structures are in some respects similar to those of the CABAC process described earlier. Because CAVLC decoding is context dependent, the current macroblock is decoded according to information collected by the CAVLC_TOTC instruction when the previous macroblocks were decoded. The left macroblock's information is stored in the mbNeighCtxLeft register 605 and pointed to by the pointer 607b, and the top macroblock's information is stored in array element [i] 601 and pointed to by the pointer 607c. The INIT_CAVLC instruction initializes the top pointer 607c and the left pointer 607b and updates the global register 614. To determine whether a neighboring macroblock (e.g., the left neighbor) exists (i.e., is valid), the CAVLC_TOTC instruction can perform an operation such as mbCurrAddr % mbPerLine. This is the same procedure performed in the CABAC embodiment, so it is not described again. As in the CABAC process described above, the CWRITE instruction can be used to flush the contents of the neighbor context memory 564.

Clients Docket No.: S3U06-0014-TW TT's Docket No:0608-A41247twf.doc/NikeyChen 70 200821982 更新鄰近内容記憶體564的内容、局部暫存器612以及總體暫存器614，其中可使用INSERT指令以供寫入至 mbNeighCtxCurrent暫存器603。維持在鄰近内容記憶體564 之資料的結構可描述如下： ,mbNeighCtxCurrent[01:00] ： 25b : mbType i mbNeighCtxCurrent[65:02] ： 4?b : TC[16] mbNeighCtxCurrent[81:66] : 45b ; TCC[cb][4] mbNeighCtxCurrent[97:82] ： 45b : TCC[cr][4] 當執行CWRITE指令時，更新mbNeighCtx□鄰近值，然後初始 mbNeighCtxCurrent 暫存器 603 〇已“述由可k長度解碼單元530b初始的内容記憶體結構以及初始化，下面將描述可變長度解碼單元53〇b (特別是CAVLC 一 TOTC指令）如何使用鄰近内容資訊以計算總係數（TotalCoeff，TC)，其之後將被使用來判斷是否應该使用CAVLC表格以解碼符號。通常，CAVLC的解碼是利用描述於H.264規格的可變長度解碼表格（於此稱為 CAVLC表格），《中根據先前已解碼符號之内容選擇 CAVLC表格以解碼各符號。即對每一格符號而言，其為不相同的CAVLC表格。第7B圖係顯示基本表格結構，其為可變大小的二維陣列。提供表格的陣列（每一個表格^為一特定符號），而每一個符號為霍夫曼（Huffman)編碼。雈夫曼碼被儲存成下列結構的表格： struct Table { unsigned head;Clients Docket No.: S3U06-0014-TW TT's Docket No: 0608-A41247twf.doc/NikeyChen 70 200821982 Update the content of the adjacent content memory 564, the local register 612, and the overall register 614, where the INSERT instruction can be used to For writing to the mbNeighCtxCurrent register 603. The structure of the data maintained in the adjacent content memory 564 can be described as follows: , mbNeighCtxCurrent[01:00] : 25b : mbType i mbNeighCtxCurrent[65:02] : 4?b : TC[16] mbNeighCtxCurrent[81:66] : 45b ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; Length decoding unit 530b initial content memory structure and initialization, how variable length decoding unit 53A (especially CAVLC-TOTC instruction) uses neighboring content information to calculate total coefficient (TotalCoeff, TC), which will be described later Used to determine whether a CAVLC table should be used to decode symbols. Typically, CAVLC is decoded using a variable length decoding table (herein referred to as a CAVLC table) described in the H.264 specification, "based on previously decoded symbols" The content selects the CAVLC table to decode each symbol. That is, for each cell symbol, it is a different CAVLC table. 
Figure 7B shows the basic table structure, which is a variable size two-dimensional array. Providing an array table (each table is a particular symbol ^), and each symbol Huffman table (Huffman) encoding Huan Huffman code is stored as the following structure: struct Table {unsigned head;

Client^ Docket No.: S3U06-0014-TW TT’s Docket No:0608-A41247twf.doc/NikeyChe] 71 200821982 struct table { unsigned val; unsigned shv; }table[]; }Table[];Client^ Docket No.: S3U06-0014-TW TT’s Docket No:0608-A41247twf.doc/NikeyChe] 71 200821982 struct table { unsigned val; unsigned shv; }table[]; }Table[];
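The C sketch below illustrates the INIT_CAVLC operand layout. It packs the SRC1 fields listed above into a 32-bit word and applies the mbCurrAddr % mbPerLine test for left-neighbor availability. The function names are hypothetical, and bits 31:30 are simply left at zero. Slice boundaries also affect neighbor availability in H.264, but this sketch ignores them. It illustrates the bit layout only and is not the disclosed hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: packs the INIT_CAVLC SRC1 operand using the bit
 * layout given in the text. */
static uint32_t pack_init_cavlc_src1(uint32_t mbAddrCurr, uint32_t mbPerLine,
                                     uint32_t constrained_intra_pred_flag,
                                     uint32_t nal_unit_type,
                                     uint32_t chroma_format_idc)
{
    /* Each value must fit in its field width. */
    assert(mbAddrCurr <= 0xFFFFu);
    assert(mbPerLine <= 0xFFu);
    assert(constrained_intra_pred_flag <= 1u);
    assert(nal_unit_type <= 0x7u);
    assert(chroma_format_idc <= 0x3u);

    return mbAddrCurr                           /* SRC1[15:0]  */
         | (mbPerLine << 16)                    /* SRC1[23:16] */
         | (constrained_intra_pred_flag << 24)  /* SRC1[24]    */
         | (nal_unit_type << 25)                /* SRC1[27:25] */
         | (chroma_format_idc << 28);           /* SRC1[29:28]; [31:30] left 0 */
}

/* Extracts a field of `width` bits (width < 32) starting at bit `lsb`. */
static uint32_t src1_field(uint32_t src1, unsigned lsb, unsigned width)
{
    return (src1 >> lsb) & ((1u << width) - 1u);
}

/* The left neighbor is unavailable when the current macroblock is the
 * first one in its row (mbAddrCurr % mbPerLine == 0). Slice boundaries
 * are ignored in this sketch. */
static int left_mb_available(uint32_t mbAddrCurr, uint32_t mbPerLine)
{
    return (mbAddrCurr % mbPerLine) != 0;
}
```

As an example, a 1920-pixel-wide picture has 120 macroblocks per row. Packing mbAddrCurr = 121, mbPerLine = 120, constrained_intra_pred_flag = 0, nal_unit_type = 5 (an IDR slice) and chroma_format_idc = 1 (4:2:0) gives 0x1A780079. Macroblock 120 starts the second row, so it has no left neighbor.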

The following describes a matching method based on unique prefix codes (the MatchVLC function). In general, a CAVLC table includes a variable-length part and a fixed-length part, so matching can be simplified by performing a few fixed-size index lookups. In the MatchVLC function, a READ operation can be performed without removing bits from the SREG register 562a. For the bitstream buffer 562b that processes the bitstream, this READ operation therefore differs from the READ instruction described earlier. In the MatchVLC function below, a number of bits (fixL) is copied from the bitstream buffer 562b and then looked up in a specified table. Each entry of the specified table has a particular format (e.g., a value and a size in bits), and the size of the entry is used to advance the bitstream.

FUNCTION MatchVLC(Table, maxIdx)
    INPUT Table;
    INPUT maxIdx;
    Idx1 = CLZ(sREG);                          // count number of leading zeros
    Idx1 = (Idx1 > maxIdx) ? maxIdx : Idx1;
    fixL = Table[Idx1].head;
    SHL(sREG, Idx1 + #1);                      // shift buffer left by Idx1 + 1 bits
    Idx2 = (!fixL) ? 0 : READ(fixL);
    (val, shv) = Table[Idx1][Idx2];
    SHL(sREG, shv);
    return val;
ENDFUNCTION
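The MatchVLC steps can be modeled in C as follows. This is a minimal sketch, not the disclosed hardware. An MSB-first bit reader (the hypothetical BitReader type) stands in for the SREG register, and READ becomes a non-consuming peek. The table holds the nC == -1 column of H.264 Table 9-5 in the row/sub-array layout of FIG. 7B. One assumption departs from the literal pseudocode: when the leading-zero count is clamped to maxIdx, the sketch consumes only maxIdx bits. In this table the clamped row is the code 0000000, which has no terminating 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { unsigned val; unsigned shv; } VlcEntry;          /* value, suffix bits used */
typedef struct { unsigned head; const VlcEntry *table; } VlcRow; /* head = suffix bits to peek */

/* Hypothetical MSB-first bit reader standing in for the SREG register. */
typedef struct { const uint8_t *buf; size_t nbits; size_t pos; } BitReader;

static unsigned peek_bit(const BitReader *br, size_t pos)
{
    if (pos >= br->nbits) return 0;               /* bits past the end read as 0 */
    return (br->buf[pos >> 3] >> (7 - (pos & 7))) & 1u;
}

/* READ: returns the next n bits without consuming them. */
static unsigned peek_bits(const BitReader *br, unsigned n)
{
    unsigned v = 0;
    for (unsigned i = 0; i < n; i++) v = (v << 1) | peek_bit(br, br->pos + i);
    return v;
}

/* CLZ that stops at `limit`, which also performs the clamp to maxIdx. */
static unsigned clz_limited(const BitReader *br, unsigned limit)
{
    unsigned n = 0;
    while (n < limit && peek_bit(br, br->pos + n) == 0) n++;
    return n;
}

static unsigned match_vlc(BitReader *br, const VlcRow *rows, unsigned maxIdx)
{
    unsigned idx1 = clz_limited(br, maxIdx);
    assert(idx1 <= maxIdx);
    unsigned fixL = rows[idx1].head;
    br->pos += idx1 + (idx1 < maxIdx ? 1u : 0u);  /* zeros plus terminating 1 */
    unsigned idx2 = fixL ? peek_bits(br, fixL) : 0u;
    const VlcEntry *e = &rows[idx1].table[idx2];
    br->pos += e->shv;                            /* consume only the bits used */
    return e->val;
}

/* nC == -1 column of H.264 Table 9-5; val = 32 * TrailingOnes + TotalCoeff. */
static const VlcEntry r0[] = {{33, 0}}, r1[] = {{0, 0}}, r2[] = {{66, 0}};
static const VlcEntry r3[] = {{2, 2}, {99, 2}, {34, 2}, {1, 2}};
static const VlcEntry r4[] = {{4, 1}, {3, 1}}, r5[] = {{67, 1}, {35, 1}};
static const VlcEntry r6[] = {{68, 1}, {36, 1}}, r7[] = {{100, 0}};
static const VlcRow kTable9_5[8] = {
    {0, r0}, {0, r1}, {0, r2}, {2, r3}, {1, r4}, {1, r5}, {1, r6}, {0, r7}};

static unsigned trailing_ones(unsigned val) { return val >> 5; }
static unsigned total_coeff(unsigned val)   { return val & 31u; }
```

Decoding the bit string 000101 1 0000000 01 (bytes 0x16, 0x01) with maxIdx = 7 returns 99, 33, 100 and 0 in turn, i.e. (TrailingOnes, TotalCoeff) = (3, 3), (1, 1), (3, 4) and (0, 0). The reader finishes at bit 16. Each lookup takes at most two index operations because every leading-zero count has its own short sub-array.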

FIG. 7B is a block diagram showing an exemplary two-dimensional array with the above table structure, used to describe the MatchVLC function in the context of CAVLC decoding. The example for nC == -1 is taken from Table 9-5 of the H.264 standard and is described as follows (in this table, Value = 32 * TrailingOnes + TotalCoeff):

coeff_token   TrailingOnes   TotalCoeff   Head   Value   Shift
1             1              1            0      33      0
01            0              0            0      0       0
001           2              2            0      66      0
000100        0              2            2      2       2
000101        3              3            2      99      2
000110        1              2            2      34      2
000111        0              1            2      1       2
000010        0              4            1      4       1
000011        0              3            1      3       1
0000010       2              3            1      67      1
0000011       1              3            1      35      1
00000010      2              4            1      68      1
00000011      1              4            1      36      1
0000000       3              4            0      100     0

In pseudocode, the above table can be represented as follows:

Table9_5[8] = {
    0, {{33, 0}},
    0, {{0, 0}},
    0, {{66, 0}},
    2, {{2, 2}, {99, 2}, {34, 2}, {1, 2}},
    1, {{4, 1}, {3, 1}},
    1, {{67, 1}, {35, 1}},
    1, {{68, 1}, {36, 1}},
    0, {{100, 0}}
};

With this table structure, the MatchVLC function described above can be used to implement CAVLC decoding. The MatchVLC function counts the leading zeros in the bitstream to access the table for a given syntax element. The count-leading-zeros operation can be performed, for example, by the count-leading-zeros module 576 and the read module 572 in some embodiments. MatchVLC then checks whether the count exceeds the maximum value of Idx and, if so, returns maxIdx. This handles the 0000000 case shown in the table of FIG. 7B. Another advantage of the MatchVLC function and the table structure is that these cases need no extra instructions. The following MatchVLC statements handle them: Idx1 = CLZ(sREG), which counts the number of leading zeros, and Idx1 = (Idx1 > maxIdx) ? maxIdx : Idx1. Next, the following statement removes the bits that have been used: SHL(sREG, Idx1 + #1). The following MatchVLC statements then read the header of the sub-array: fixL = Table[Idx1].head and Idx2 = (!fixL) ? 0 : READ(fixL). Here fixL gives the maximum number of bits that may be read, and not all of them are necessarily consumed. Several codes can share the same number of leading zeros while their trailing bits differ in length. Therefore, in one embodiment, a CASEX-type case statement can be implemented instead (using more memory, but with a simpler code structure).
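The CASEX-style alternative can be sketched as a flat pattern match. Every code word is stored with its length and compared against an 8-bit look-ahead window, much as a Verilog casex statement matches bit patterns. The C below covers the same nC == -1 column of H.264 Table 9-5. Its names, the linear scan and the 8-bit window are illustrative assumptions, not details from the text. The code is prefix-free and complete, so exactly one entry matches any 8-bit window.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One coeff_token code word: right-aligned pattern, its length, and result. */
typedef struct { uint8_t code, len, trailingOnes, totalCoeff; } CoeffTokenCode;

/* nC == -1 (chroma DC) column of H.264 Table 9-5. */
static const CoeffTokenCode kChromaDcTokens[] = {
    {0x1, 1, 1, 1}, /* 1        */
    {0x1, 2, 0, 0}, /* 01       */
    {0x1, 3, 2, 2}, /* 001      */
    {0x4, 6, 0, 2}, /* 000100   */
    {0x5, 6, 3, 3}, /* 000101   */
    {0x6, 6, 1, 2}, /* 000110   */
    {0x7, 6, 0, 1}, /* 000111   */
    {0x2, 6, 0, 4}, /* 000010   */
    {0x3, 6, 0, 3}, /* 000011   */
    {0x2, 7, 2, 3}, /* 0000010  */
    {0x3, 7, 1, 3}, /* 0000011  */
    {0x0, 7, 3, 4}, /* 0000000  */
    {0x2, 8, 2, 4}, /* 00000010 */
    {0x3, 8, 1, 4}, /* 00000011 */
};

/* Compares the next 8 bits (MSB first) against every pattern. Returns the
 * number of bits consumed, or 0 if nothing matches. */
static unsigned match_coeff_token(uint8_t window, unsigned *t1, unsigned *tc)
{
    assert(t1 != NULL && tc != NULL);
    size_t n = sizeof kChromaDcTokens / sizeof kChromaDcTokens[0];
    for (size_t i = 0; i < n; i++) {
        const CoeffTokenCode *c = &kChromaDcTokens[i];
        if ((unsigned)(window >> (8 - c->len)) == (unsigned)c->code) {
            *t1 = c->trailingOnes;
            *tc = c->totalCoeff;
            return c->len;
        }
    }
    return 0;
}
```

For instance, the window 0x16 (bits 00010110) matches 000101 and reports TrailingOnes = 3 and TotalCoeff = 3 with 6 bits consumed. That is the same result the two-level MatchVLC lookup gives for this code word. Hardware would compare all patterns in parallel; the loop only mirrors that behavior sequentially.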

The actual value is read from the table using (val, shv) = Table[Idx1][Idx2] and SHL(sREG, shv). Here shv also indicates how many bits the syntax element actually uses. These bits are removed from the bitstream, and the value of the syntax element is returned to the destination register.

With the VLC matching method and the table structure described, the description returns to FIG. 7A and the CAVLC decoding engine or process (e.g., the CAVLC module 582). Once the bitstream is loaded and the decoding engine, memory structures, and registers are set up, the driver software 128 can issue a CAVLC_TOTC instruction to activate the coefficient token module 710. In one embodiment, the CAVLC_TOTC instruction has the following exemplary format:

CAVLC_TOTC DST, S1

where S1 and DST are an input register and an internal output register, respectively, with the following exemplary formats:

SRC1[3:0] = blkIdx
SRC1[18:16] = blkCat
SRC1[24] = iCbCr

The remaining bits are undefined. The output format is as follows:

DST[31:16] = TrailingOnes
DST[15:0] = TotalCoeff

Thus, as shown in the figure, the coefficient token module 710 receives four inputs: mbCurrAddr, mbType, an indication of whether a chroma channel is being processed (e.g., iCbCr), and blkIdx. blkIdx is a block index, since a picture can be divided into many blocks. For a given macroblock accessed from the bitstream buffer 562b, blkIdx identifies the 8x8 or 4x4 pixel block being processed at a given position. The driver software 128 provides this information. The coefficient token module 710 includes a lookup table. Indexing this lookup table with the inputs described above yields the number of trailing ones (TrailingOnes) and the number of non-zero coefficients (TotalCoeff). TrailingOnes conveys how many consecutive coefficients of magnitude 1 occur at the end of the scan (at most three in H.264). TotalCoeff conveys how many run/level pairs are present in the block of data extracted from the bitstream. TrailingOnes is provided to the CAVLC level module 714, and TotalCoeff is provided to the zero level module 718. TrailingOnes is also provided to the level 0 module 716, which corresponds to the first level retrieved from the bitstream buffer 562b (e.g., the DC value). The level module 714 tracks the suffix length of the symbol (e.g., the number of trailing ones). It combines this with the level code (levelCode) to compute the level value (level[Idx]), and the results are then stored in the level array 722 and the run array 724. The level module 714 operates under the CAVLC_LVL instruction, which has the following format:

CAVLC_LVL DST, S2, S1

where:

S1 = Idx (16-bit);
S2 = suffixLength (16-bit); and
DST = suffixLength (16-bit).

The suffix length (suffixLength) conveys the size of the codeword suffix, and input from the driver software 128 specifies it. In one embodiment, because the suffix length value is updated in place, DST and S2 can be the same register. Forwarding registers such as F1 and F2 (e.g., registers holding data generated internally by a given module) can also be used. A forwarding flag within a given instruction indicates whether the instruction and the corresponding module use a forwarding register. The suffix F1 means "use the value of forwarding source 1", indicated in one embodiment by bit 26 of the instruction. The suffix F2 means "use the value of forwarding source 2", indicated in one embodiment by bit 27 of the instruction. When forwarding registers are used, the CAVLC_LVL instruction can have the following exemplary format:

CAVLC_LVL.F1.F2 DST, SRC2, SRC1

where, when either F1 or F2 is set (e.g., asserted), the designated forwarding source is used as the input. For the level module 714, the forwarding register F1 corresponds to the level index (level[Idx]) that the level module 714 generates. This index is incremented in an increment module and input to the multiplexer 730. Similarly, the forwarding register F2 corresponds to the suffix length (suffixLength), which the level module 714 generates and inputs to the multiplexer 728. The other inputs of the multiplexers 730 and 728 are execution unit register inputs (labeled EU in FIG. 7A), as described below. The level module 714 also takes the level code provided by the level code module 712. Together, the level code module 712 and the level module 714 decode the level values (a level is a transform coefficient value before scaling). The level code module 712 is enabled by an instruction with the following exemplary format:

CAVLC_LC SRC1

where SRC1 = suffixLength (16 bits). When the forwarding register F1 is used, the instruction can be expressed as follows:

CAVLC_LC.F1 SRC1

where, if F1 is set, the forwarded SRC1 is used as the input. As shown in FIG. 7A, when F1 is set (e.g., F1 = 1), the level code module 712 takes the forwarded SRC1 value (e.g., the suffix length from the level module 714) as its input. Otherwise (e.g., F1 = 0), the input comes from the execution unit register. Returning to the level module 714, its suffix length input arrives through the multiplexer 728, either forwarded by the level module 714 itself or supplied from the execution unit register. Likewise, its Idx input arrives through the multiplexer 730, either forwarded by the level module 714 or supplied from the execution unit register. A forwarded Idx is incremented by the increment module, or in some embodiments auto-incremented without one. The level module 714 also receives the level code input directly from the level code module 712. Besides its outputs to the forwarding registers, the level module 714 outputs a level index (level[Idx]) to the level array 722.

As mentioned above, TrailingOnes is output to the level 0 module 716. The level 0 module 716 is enabled by the following instruction:

CAVLC_LVL0 SRC

where SRC = TrailingOnes(coeff_token). The output of the level 0 module 716 includes a level index (Level[Idx]), which is provided to the level array 722. Coefficient values are coded as a sign and a magnitude, and the level 0 module 716 provides the signs. The magnitudes from the CAVLC level module 714 and the signs from the level 0 module 716 are combined and written to the level array 722. The level index (level[Idx]) specifies the write location. In one embodiment, the coefficients belong to a 4x4 matrix of a sub-block (a block being 8x8) and are not in raster order; the array is later converted to a 4x4 matrix. In other words, the decoded coefficient levels and runs are not in raster format. The 4x4 matrix can be rebuilt from the level-run data in zigzag scan order and then rearranged into raster order.

The TotalCoeff output of the coefficient token module 710 is provided to the zero level module 718. The zero level module 718 can be enabled by the following instruction:

CAVLC_ZL DST, SRC1

where SRC1 = maxNumCoeff (16 bits) and DST = ZerosLeft (16 bits). The H.264 standard gives maxNumCoeff, and software passes it to the instruction as a raw value; in some embodiments, maxNumCoeff can instead be stored in hardware. The transform coefficients are coded in (level, run) format, where the run relates to the number of coefficients (levels) coded as 0. The zero level module 718 has two outputs: ZerosLeft goes to the multiplexer 740, and Reset (reset = 0) goes to the multiplexer 742. The multiplexer 740 also receives the forwarding register F2 of the run module 720. The multiplexer 742 receives the forwarding register F1 of the run module 720 after it has been incremented (in some embodiments via an increment module or otherwise). The run module 720 takes ZerosLeft from the multiplexer 740 and Idx from the multiplexer 742, and outputs a run index (Run[Idx]) to the run array 724. As described above, run-length coding provides further compression, so the coefficients are coded in (level, run) format. For example, the values 10 12 12 15 19 1 1 1 0 0 0 0 0 0 1 0 can be coded as (10,0) (12,1) (15,0) (19,0) (1,2) (0,5) (1,0) (0,0). The second number of each pair counts the additional repeats of the value, so such a code is usually shorter. The index corresponds to the level index. The run module 720 can be enabled by the following instruction:

CAVLC_RUN DST, S2, S1

where, because the ZerosLeft value is updated in place, DST and S2 can be the same register. The CAVLC_RUN operands are unsigned values, as follows:

S1 = Idx (16-bit)
S2 = ZerosLeft (16-bit)
DST = ZerosLeft (16-bit)

Referring to FIG. 7A, when forwarding registers are used, the CAVLC_RUN instruction takes the following format:

CAVLC_RUN.F1.F2 DST, SRC2, SRC1

where, when either F1 or F2 is set, the appropriate forwarding source is used as the input. Of the two register arrays, the level array 722 holds levels and the run array 724 holds runs. In one embodiment, each array contains 16 elements, and the elements of the level array 722 are signed values. The following instruction reads the level values from the level array 722 and the run values from the run array 724:

READ_LRUN DST

In one embodiment, this operation reads the level and run registers of the variable length decoding unit and stores them in the destination registers. During the read, the run values are converted to 16 bits. The first two registers hold sixteen 16-bit levels (the array stores the first 16 coefficients), and the third and fourth registers hold the runs. In one embodiment, the values are written in the following order. In the first register, the least significant 16 bits contain LEVEL[0], bits 16-31 contain LEVEL[1], and so on, up to bits 112-127, which contain LEVEL[7]. In the second register, the least significant 16 bits contain LEVEL[8], and so on. The run values are packed the same way. The CLR_LRUN instruction can be used to clear the registers of the level array 722 and the run array 724. The following pseudocode describes the software (shader) and hardware operations (e.g., modules) of the variable length decoding unit 530b described above, in particular those of the CAVLC module 582:

LL

Residual_block__cavlc( coeffLevel, maxNumCoeff) { """"CLR一 LEVEL一 RUN ' ' coeff一token if( TotalCoeff( coeff_token) > 0) { _if(TotalCoeff(coeffJoken)>l〇 && TrailingOnes(coeffjoken) <3) suffixLength = 1 Else " suffixLength = 0 CAVLCJevelOQ; _f〇r( I = TrailingOnes(coefF_taken); I < TotalCoeff( coeffjoken); i++){ _CAVLCJevelCode(levelCodeysufFixLength); _ _CAVLCJevel(suffixLength, i,levelCode) CAVLC^ZerosLeft(ZerosLeft, maxNumCoeff) for( i = 0; i < TotalCoeff( coeffjoken) -1; i++) { CAVLC_run(i, ZerosLeft) READ_LEVEL,RUN(level, nm) mn[ TotalCoeff( coeff—token) -1 ] = zerosLeft coeffNum = -1 fo「( i = TotalCoeff( coeff」oken) -1; i >= 0; i-) { _coeffNum += run[ i ] +1 _coeffLevel[ coeffNum ] = \eye\[ i] MPEG解碼以上已描述用作CABAC解碼（經由CABAC模組580 的可變長度解碼單元530a)以及CAVLC解碼（經由CAVLC 模組582的可變長度解碼單元530b)的解碼系統200 ,接Residual_block__cavlc( coeffLevel, maxNumCoeff) { """"CLR-LEVEL-RUN ' ' coeff-token if( TotalCoeff( coeff_token) > 0) { _if(TotalCoeff(coeffJoken)>l〇&& TrailingOnes(coeffjoken) <3) suffixLength = 1 Else " suffixLength = 0 CAVLCJevelOQ; _f〇r( I = TrailingOnes(coefF_taken); I < TotalCoeff( coeffjoken); i++){ _CAVLCJevelCode(levelCodeysufFixLength); _ _CAVLCJevel(suffixLength , i,levelCode) CAVLC^ZerosLeft(ZerosLeft, maxNumCoeff) for( i = 0; i < TotalCoeff( coeffjoken) -1; i++) { CAVLC_run(i, ZerosLeft) READ_LEVEL, RUN(level, nm) mn[ TotalCoeff( Coeff—token) -1 ] = zerosLeft coeffNum = -1 fo"( i = TotalCoeff( coeff"oken) -1; i >= 0; i-) { _coeffNum += run[ i ] +1 _coeffLevel[ coeffNum ] = \eye\[i] MPEG decoding The decoding system described above for CABAC decoding (variable length decoding unit 530a via CABAC module 580) and CAVLC decoding (variable length decoding unit 530b via CAVLC module 582) has been described. 200, pick up

Client’s Docket No.: S3U06-0014-TW TT’s Docket No:0608_A41247twf.doc/NikeyChen 82 200821982 下來將描述解碼系統200的MPEG實施例，於此稱為可變長度解碼單元53〇c。可變長度解碼單元53〇c是根據由 MPEG模組578 (第5C圖所顯示）所執行的運算而操作。為了簡化，與CABAC以及CAVLC實施例共有的特徵（包 . 括位元流緩衝器以及對應的指，令）被省略，除了下列其他 : 需要注意的部分。INIT指令設置可變長度解碼單元530進入MPEG模式，以及使用READ、NPSTR、INPTRB (解釋於前文）以及VLC__MPEG2指令的混合以解碼MPEG-2位元 Γ 流。由著色器程式判斷使用何種方法。MPEG-2位元流具有全決定文法（fully deterministic grammar )，且著色碼執行用以解密文法的方法。在一實施例中，對MPEG-2處理而言，實施表格以霍夫曼解碼於MatchVLC一X函數，描述於後。因此，兩指令被載入至MPEG模組578，包括INIT—MPEG2指令以及 VLC—MPEG2指令。INIT—MPEG2指令載入位元流並設定可變長度解碼單元530進入MPEG2模式。在此模式中，當 I，第一係數為直流（DC)時，總體暫存器614保持住值。在 MPEG-2中有一或多個串流，其為相同的，但是根據是否為直流或是父流而有不同的解譯。位元載入至 VLD一globalRegister.InitDC暫存器被使用，而不是創造另一個指令。注意到對應於總體暫存器614 (例如映射到總體暫存器614(例如globalregister[0]))的暫存器使用在 CABAC以及CAVLC模式中，但是因為MPEG2模式下而有不同的解譯（以及因此標示不同）。因此，在巨集區塊Client's Docket No.: S3U06-0014-TW TT's Docket No: 0608_A41247twf.doc/NikeyChen 82 200821982 An MPEG embodiment of the decoding system 200 will be described below, referred to herein as a variable length decoding unit 53A. The variable length decoding unit 53A is operated in accordance with an operation performed by the MPEG module 578 (shown in Fig. 5C). For the sake of simplicity, features common to CABAC and CAVLC embodiments (including bitstream buffers and corresponding fingers, commands) are omitted, except for the following: Points to note. The INIT instruction sets the variable length decoding unit 530 to enter the MPEG mode, and uses a mixture of READ, NPSTR, INPTRB (explained above) and VLC__MPEG2 instructions to decode the MPEG-2 bit stream. The color program program determines which method to use. The MPEG-2 bit stream has a fully deterministic grammar, and the shading code performs a method for decrypting the grammar. In one embodiment, for MPEG-2 processing, the implementation table is Huffman decoded in the MatchVLC-X function, as described below. 
Therefore, the two instructions are loaded into the MPEG module 578, including the INIT-MPEG2 instruction and the VLC-MPEG2 instruction. The INIT-MPEG2 instruction loads the bit stream and sets the variable length decoding unit 530 to enter the MPEG2 mode. In this mode, when I, the first coefficient is direct current (DC), the overall register 614 holds the value. There are one or more streams in MPEG-2 that are identical, but have different interpretations depending on whether they are DC or parent. The bit is loaded into the VLD-globalRegister. The InitDC register is used instead of creating another instruction. Note that the scratchpad corresponding to the overall scratchpad 614 (eg, mapped to the overall scratchpad 614 (eg, globalregister[0])) is used in CABAC and CAVLC modes, but has different interpretations due to the MPEG2 mode ( And therefore the label is different). Therefore, in the macro block

Client’s Docket No.: S3U06-0014-TW TT^ Docket No:0608-A41247twf.doc/NikeyChen 83 200821982 的開始，值（VLD一globalRegister.InitDC暫存器内的位元）被初始化成1。當使用 MatchVLC一3函數時，刦斷 VLD一globalRegister.InitDC暫存器内的位元是否為1或是〇。如果為1的話，位元被改變成〇，以供已知巨集區塊後來的離散餘弦變換（discrete cosine transform，DCT)符号皮進行解碼。由著色器以及内部重置設定上述值。在實體部分，VLD_globalRegister.InitDC位元為旗標值，其傳送被解碼的DCT符號是否為已知巨集區塊之DCT符號的開始。 f MPEG模組578使用一具有符號之非常特定文法進行解碼，其中上述符號是使用限定數量之霍夫曼表格所解碼。在具有特定符號值的著色器内執行文法的分析，其中特定符號值是使用具有#Imm 16值使用於特定霍夫曼表才久的VLC一MPEG2指令所得到，其應該被使用以解碼特定符號。 ' 在描述可變長度解碼單元530c的不同元件之前，用以實施MPEG-2標準之不同表格的硬體以及軟體結構的簡單 L 描述如下。在 MPEG-2 標準（ISO-IEC 13818-2 ( 1995)) 中，所使用的編碼被定義在表B-1至表B-15,其為MpEG_2 標準所提供之已知表格。在可變長度解碼單元53〇c的不同實施例中，一或多個表B-1至表B-15以專業硬體型式而^ 施，例如合成為邏輯閘。根據實施方式（例如：Hdtv、 HDDVD等）或是所需之硬體安排，部分表格可以不用硬體方式來實施，而是可以使用其他指令（例如：將描述於後的EXP-GOL—UD指令，或是透過rEad指令）來實施。Client's Docket No.: S3U06-0014-TW TT^ Docket No: 0608-A41247twf.doc/NikeyChen 83 At the beginning of 200821982, the value (the bit in the VLD-globalRegister.InitDC register) is initialized to 1. When the MatchVLC-3 function is used, whether the bit in the VLD-globalRegister.InitDC register is truncated is 1 or 〇. If it is 1, the bit is changed to 〇 for decoding by the discrete cosine transform (DCT) symbol skin after the known macro block. The above values are set by the color picker and internal reset. In the entity part, the VLD_globalRegister.InitDC bit is a flag value that conveys whether the decoded DCT symbol is the beginning of the DCT symbol of the known macroblock. f MPEG Module 578 decodes using a very specific grammar with symbols that are decoded using a defined number of Huffman tables. The analysis of the grammar is performed in a colorimeter having a particular symbol value, which is obtained using a VLC-MPEG2 instruction having a #Imm 16 value for a particular Huffman table, which should be used to decode a particular symbol . 
Before the different elements of the variable length decoding unit 530c are described, the hardware and software structures used to implement the different tables of the MPEG-2 standard are briefly described. In the MPEG-2 standard (ISO/IEC 13818-2 (1995)), the codes are defined in Tables B-1 through B-15, which are the known tables provided by the MPEG-2 standard. In different embodiments of the variable length decoding unit 530c, one or more of Tables B-1 through B-15 are implemented in dedicated hardware, for example synthesized into logic gates. Depending on the implementation (e.g., HDTV, HD DVD) or the desired hardware arrangement, some tables need not be implemented in hardware and can instead be implemented with other instructions (e.g., the EXP_GOLOMB_UD instruction described later, or the READ instruction).

For example, although the logic gate count for Tables B-2, B-3 and B-11 is not large, the additions they use may require an additional multiplexer stage, which affects speed and latency. In some embodiments, Tables B-5 through B-8 are not supported in hardware, because the profiles that use them do not need to be supported. However, some embodiments can provide such support through other instructions with minimal impact on performance (for example, the INPSTR, EXP_GOLOMB_UD and READ instructions).

Continuing with the known MPEG tables, Table B-1 (macroblock_address_increment), Table B-10 (motion_code) and Table B-9 (coded_block_pattern) have similar structures. Because of this similarity, these three tables can be implemented using the MatchVLC function executed by the MPEG module 578, described below. For Tables B-9 and B-10, an exemplary table structure is:

struct Table {
    unsigned head;            // number of bits in the table address
    struct table {
        unsigned val:6;       // 5 bits in Table B-10
        unsigned shv:2;       // actual number of bits
    } table[];
} Table[];

For Table B-1, an exemplary table structure is:

struct Table {
    unsigned head;            // number of bits in the table address
    struct table {

        unsigned val:5;
        unsigned shv:3;       // actual number of bits
    } table[];
} Table[];

In the following functions, only the SHL operation removes data from the SREG register 562a. Unlike the shader READ instruction, the READ function used in the MatchVLC functions only reads bits from the SREG register 562a, without removing any bits from the SREG register 562b. The MatchVLC functions that implement the MPEG-2 tables for Huffman decoding are described below.

FUNCTION MatchVLC_1{
    T = READ(2);                // read 2 bits
    SHL(2);
    CASE (T){
        00: OUTPUT(1);
        01: OUTPUT(2);
        10: {
            Q = READ(1);
            SHL(1);
            CASE (Q){
                0: OUTPUT(0);
                1: OUTPUT(3);
            }
        }
        11: {
            Idx = CLO(sREG);    // count leading ones

            Idx = min(Idx, 7);
            shv = (Idx != 7) ? Idx+1 : Idx;
            SHL(shv);
            OUTPUT(4+Idx);
        }
    }
}

FUNCTION MatchVLC_2{

    T = READ(2);                // read 2 bits
    SHL(2);
    CASE (T){
        00: OUTPUT(0);
        01: OUTPUT(1);
        10: OUTPUT(2);
        11: {

            Idx = CLO(sREG);    // count leading ones
            Idx = min(Idx, 8);
            shv = (Idx != 8) ? Idx+1 : Idx;
            SHL(shv);
            OUTPUT(3+Idx);
        }
    }
}

FUNCTION MatchVLC_3{
    INIT_MB DC = TRUE;
    T = CLZ(sREG);
    SHL(T+1);
    CASE (T){
        0: IF (DC){
               DC = FALSE;
               Q = READ(1);
               SHL(1);
               OUTPUT({0, SGN(Q)*1});
           }
           ELSE{
               Q = READ(1);

               IF (!Q) {OUTPUT({63,0}); shv=1}          // EOB
               ELSE {R = READ(1); OUTPUT({0, SGN(R)*1}); shv=2}
               SHL(shv);
           }
        1: {
            Q = READ(3);
            CASE (Q){
                1XX: OUTPUT({1, SGN(Q[1])*1}); shv = 2;
                01X: OUTPUT({2, SGN(Q[0])*1}); shv = 3;
                00X: OUTPUT({0, SGN(Q[0])*2}); shv = 3;
            }
            SHL(shv);
        }
        2: {
            Q = READ(2);
            SHL(2);

            CASE (Q){
                00: {
                    R = READ(4);
                    CASE (R){
                        000X: OUTPUT({16, SGN(R[0])*1});
                        001X: OUTPUT({5, SGN(R[0])*2});
                        010X: OUTPUT({0, SGN(R[0])*7});
                        011X: OUTPUT({2, SGN(R[0])*3});
                        100X: OUTPUT({1, SGN(R[0])*4});
                        101X: OUTPUT({15, SGN(R[0])*1});
                        110X: OUTPUT({14, SGN(R[0])*1});
                        111X: OUTPUT({4, SGN(R[0])*2});
                    }

                    shv = 4;
                }
                01X: SGN = READ(1); OUTPUT({0, SGN*3}); shv = 1;
                10X: SGN = READ(1); OUTPUT({4, SGN*1}); shv = 1;
                11X: SGN = READ(1); OUTPUT({3, SGN*1}); shv = 1;
            }
            SHL(shv);
        }
        3: {
            Q = READ(3);
            CASE (Q){
                00X: OUTPUT({7, SGN(Q[0])*1});
                01X: OUTPUT({6, SGN(Q[0])*1});
                10X: OUTPUT({1, SGN(Q[0])*2});
                11X: OUTPUT({5, SGN(Q[0])*1});
            }
            SHL(3);
        }
        4: {
            Q = READ(3);
            CASE (Q){
                00X: OUTPUT({2, SGN(Q[0])*2});
                01X: OUTPUT({9, SGN(Q[0])*1});
                10X: OUTPUT({0, SGN(Q[0])*4});
                11X: OUTPUT({8, SGN(Q[0])*1});
            }
            SHL(3);
        }
        5: Q = READ(19); OUTPUT({Q[18:13], Q[12:0]});   // escape
        6: {

            Q = READ(4);
            CASE (Q){
                000X: OUTPUT({16, SGN(Q[0])*1});
                001X: OUTPUT({5, SGN(Q[0])*2});
                010X: OUTPUT({0, SGN(Q[0])*7});
                011X: OUTPUT({2, SGN(Q[0])*3});
                100X: OUTPUT({1, SGN(Q[0])*4});
                101X: OUTPUT({15, SGN(Q[0])*1});
                110X: OUTPUT({14, SGN(Q[0])*1});
                111X: OUTPUT({4, SGN(Q[0])*2});
            }
            SHL(4);
        }
        7,8,9,10,11: JVLC(TableC[T]);
    }
}

FUNCTION MatchVLC_4{
    T = CLZ(sREG);
    SHL(T+1);
    CASE (T){
        0: {
            Q = CLO(sREG);
            R = min(Q, 7);
            shv = (R != 7) ? R+1 : R;
            SHL(shv);
            CASE (R){
                0: S = READ(1); OUTPUT({0, SGN(S)*1}); shv=1;
                1: S = READ(1); OUTPUT({0, SGN(S)*2}); shv=1;
                2: {
                    R = READ(2); SHL(2);
                    CASE (R){
                        0X: OUTPUT({0, SGN(R[0])*4});
                        1X: OUTPUT({0, SGN(R[0])*5});
                    }
                }
                3: {
                    R = READ(3); SHL(3);
                    CASE (R){
                        00X: OUTPUT({9, SGN(R[0])*1});
                        01X: OUTPUT({1, SGN(R[0])*3});
                        10X: OUTPUT({10, SGN(R[0])*1});
                        11X: OUTPUT({0, SGN(R[0])*8});
                    }
                }

                4: {
                    R = READ(3);
                    CASE (R){
                        0XX: OUTPUT({0, SGN(R[0])*9}); shv=2;
                        10X: OUTPUT({0, SGN(R[0])*12}); shv=3;
                        11X: OUTPUT({0, SGN(R[0])*13}); shv=3;
                    }
                    SHL(shv);
                }
                5: {
                    R = READ(2); SHL(2);
                    CASE (R){

                        0X: OUTPUT({2, SGN(R[0])*3});
                        1X: OUTPUT({4, SGN(R[0])*2});
                    }
                }
                6: S = READ(1); OUTPUT({0, SGN(S)*14}); shv=1;
                7: S = READ(1); OUTPUT({0, SGN(S)*15}); shv=1;
            }
            SHL(shv);
        }
        1: {
            Q = READ(2); SHL(2);
            CASE (Q){
                0X: OUTPUT({1, SGN(Q[0])*1});
                10: OUTPUT({63,0});                       // <EOB>
                11: R = READ(1); SHL(1); OUTPUT({0, SGN(R)*3});
            }
        }
        2: {
            Q = READ(2); SHL(2);
            CASE (Q){
                00: {
                    R = READ(4); shv = 4;
                    CASE (R){
                        000X: OUTPUT({1, SGN(R[0])*5});
                        001X: OUTPUT({11, SGN(R[0])*1});
                        010X: OUTPUT({0, SGN(R[0])*11});
                        011X: OUTPUT({0, SGN(R[0])*10});
                        100X: OUTPUT({13, SGN(R[0])*1});
                        101X: OUTPUT({12, SGN(R[0])*1});

                        110X: OUTPUT({3, SGN(R[0])*2});
                        111X: OUTPUT({1, SGN(R[0])*4});
                    }
                }
                01: R = READ(1); OUTPUT({2, SGN(R)*1}); shv=1;
                10: R = READ(1); OUTPUT({1, SGN(R)*2}); shv=1;
                11: R = READ(1); OUTPUT({3, SGN(R)*1}); shv=1;
            }
            SHL(shv);
        }
        3: {
            Q = READ(3); SHL(3);
            CASE (Q){
                00X: OUTPUT({0, SGN(Q[0])*7});
                01X: OUTPUT({0, SGN(Q[0])*6});
                10X: OUTPUT({4, SGN(Q[0])*1});
                11X: OUTPUT({5, SGN(Q[0])*1});
            }
        }
        4: {
            Q = READ(3); SHL(3);
            CASE (Q){
                00X: OUTPUT({7, SGN(Q[0])*1});
                01X: OUTPUT({8, SGN(Q[0])*1});
                10X: OUTPUT({6, SGN(Q[0])*1});
                11X: OUTPUT({2, SGN(Q[0])*2});
            }
        }
        5: Q = READ(19); OUTPUT({Q[18:13], Q[12:0]});   // escape
        6: {
            Q = READ(2); SHL(2);
            CASE (Q){
                00: R = READ(1); OUTPUT({5, SGN(R)*2}); shv=1;
                01: R = READ(1); OUTPUT({14, SGN(R)*1}); shv=1;
                10: {
                    R = READ(2); shv = 2;
                    CASE (R){
                        0X: OUTPUT({2, SGN(R[0])*4});
                        1X: OUTPUT({16, SGN(R[0])*1});
                    }
                }
                11: R = READ(1); OUTPUT({15, SGN(R)*1}); shv=1;

            }
            SHL(shv);
        }
        7,8,9,10,11: JVLC(TableC[T]);
    }
}

From the MatchVLC functions above, note that the least significant decoded bit usually determines the sign of the value, which can be checked with the SGN function:

FUNCTION SGN(R){ RETURN (R == 1) ? -1 : 1; }

Note further that for MatchVLC_3 and MatchVLC_4 the tables are common (or one is at least a superset of the other), so the tables can be accessed with the following function:

FUNCTION JVLC(Table){
    Q = READ(5);
    SHL(5);
    {R, L} = Table[Q];
    RETURN {R, L};
}

The interface to the MatchVLC functions, or more precisely to the MatchVLC_X functions (where X = 1, 2, etc.), is the following instruction:

VLC_MPEG2 DST, #Imm16

where the #Imm16 value selects the appropriate table and thus the particular syntax element to decode. #Imm16 is used as an index (e.g., 0, 1, 2, 3) to access the table from the instruction. The relationship between the #Imm16 values, the corresponding methods, the syntax elements and the MPEG-2 tables is given in Table 5 below.
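MatchVLC_1 and MatchVLC_2 are short enough to restate in C. The sketch below keeps their structure (peek with READ, consume with SHL, count leading ones with a capped CLO) and can be checked against the dct_dc_size codes of MPEG-2 Tables B-12 and B-13. For readability the bit source is a string of '0'/'1' characters rather than the SREG registers, and the helper names (BitReader, peek, shl, clo, decode_one, bits_used) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bit source: a string of '0'/'1' characters, most significant bit first. */
typedef struct {
    const char *bits;
    size_t pos;                               /* index of the next unconsumed bit */
} BitReader;

/* Bit at offset k past the current position; bits past the end read as 0. */
static unsigned bit_at(const BitReader *br, size_t k) {
    size_t p = br->pos + k;
    return p < strlen(br->bits) && br->bits[p] == '1';
}

/* Like READ: peek n bits without consuming them. */
static unsigned peek(const BitReader *br, unsigned n) {
    assert(n <= 32);
    unsigned v = 0;
    for (unsigned i = 0; i < n; i++)
        v = (v << 1) | bit_at(br, i);
    return v;
}

/* Like SHL: consume n bits. */
static void shl(BitReader *br, unsigned n) { br->pos += n; }

/* Like CLO, but capped: count leading ones, at most `cap`. */
static unsigned clo(const BitReader *br, unsigned cap) {
    unsigned n = 0;
    while (n < cap && bit_at(br, n))
        n++;
    return n;
}

/* MatchVLC_1: dct_dc_size_luminance (Table B-12). */
static int match_vlc_1(BitReader *br) {
    unsigned t = peek(br, 2);
    shl(br, 2);
    switch (t) {
    case 0: return 1;                         /* 00 */
    case 1: return 2;                         /* 01 */
    case 2: {                                 /* 100 -> 0, 101 -> 3 */
        unsigned q = peek(br, 1);
        shl(br, 1);
        return q ? 3 : 0;
    }
    default: {                                /* 11 followed by idx ones */
        unsigned idx = clo(br, 7);
        shl(br, idx != 7 ? idx + 1 : idx);    /* longest code has no trailing 0 */
        return 4 + (int)idx;
    }
    }
}

/* MatchVLC_2: dct_dc_size_chrominance (Table B-13). */
static int match_vlc_2(BitReader *br) {
    unsigned t = peek(br, 2);
    shl(br, 2);
    if (t < 3)
        return (int)t;                        /* 00 -> 0, 01 -> 1, 10 -> 2 */
    unsigned idx = clo(br, 8);
    shl(br, idx != 8 ? idx + 1 : idx);
    return 3 + (int)idx;
}

/* Helpers for trying the decoders on a literal bit string. */
static int decode_one(int (*fn)(BitReader *), const char *bits) {
    BitReader br = { bits, 0 };
    return fn(&br);
}

static size_t bits_used(int (*fn)(BitReader *), const char *bits) {
    BitReader br = { bits, 0 };
    fn(&br);
    return br.pos;
}
```

For example, "1110" decodes to a luminance DC size of 5 and "1111111111" to a chrominance DC size of 11; capping the count of ones is what lets the longest codes end without a terminating 0.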

#Imm16 | Method | Syntax element | MPEG-2 VLC table
0 | MatchVLC(B-1, 7) | macroblock_address_increment | B-1
1 | MatchVLC(B-9, 8) | coded_block_pattern | B-9
2 | MatchVLC(B-10, 6) | motion_code | B-10
3 | MatchVLC_1 | dct_dc_size_luminance | B-12
4 | MatchVLC_2 | dct_dc_size_chrominance | B-13
5 | MatchVLC_3 | DCT coefficients (Table 0) | B-14
6 | MatchVLC_4 | DCT coefficients (Table 1) | B-15

Table 5

EXP_GOLOMB decoding

Having described the decoding system 200 for CABAC decoding (the variable length decoding unit 530a via the CABAC module 580), CAVLC decoding (the variable length decoding unit 530b via the CAVLC module 582) and MPEG decoding (the variable length decoding unit 530c via the MPEG module 578), an Exp-Golomb embodiment of the decoding system 200, referred to herein as the variable length decoding unit 530d, is described next. The variable length decoding unit 530d operates according to the operations of the EXP-Golomb module 584 (shown in FIG. 5C). The variable length decoding unit 530d uses the same hardware and the same bit stream buffer arrangement as the CABAC and CAVLC embodiments. Therefore, features shared with the CABAC and CAVLC embodiments are omitted, except for the points noted below. Before the variable length decoding unit 530d is described, a brief description of Exp-Golomb coding is given. In Exp-Golomb coding, each codeword consists of a prefix and a suffix, as shown below:

Bit string | codeNum range
1 | 0
0 1 x0 | 1-2
0 0 1 x1 x0 | 3-6
0 0 0 1 x2 x1 x0 | 7-14
0 0 0 0 1 x3 x2 x1 x0 | 15-30
0 0 0 0 0 1 x4 x3 x2 x1 x0 | 31-62
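The prefix/suffix layout in this table maps directly onto a decoding loop: count the leading zeros lz, skip the marker 1, read lz suffix bits, and add them to 2^lz - 1, the first codeNum of that row. The following C sketch does this on a string of '0'/'1' characters; the function names are hypothetical and the string stands in for the bit stream buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode one unsigned Exp-Golomb codeword starting at bits[*pos];
   advances *pos past the codeword and returns codeNum. */
static uint32_t exp_golomb_ue(const char *bits, size_t *pos) {
    size_t len = strlen(bits);
    unsigned lz = 0;
    while (*pos + lz < len && bits[*pos + lz] == '0')   /* prefix zeros */
        lz++;
    assert(*pos + lz < len && lz < 32);                 /* marker 1 must be present */
    *pos += lz + 1;                                     /* skip zeros and marker */
    uint32_t suffix = 0;
    for (unsigned i = 0; i < lz; i++, (*pos)++)         /* x(lz-1) ... x0 */
        suffix = (suffix << 1) | (uint32_t)(*pos < len && bits[*pos] == '1');
    return ((uint32_t)1 << lz) - 1u + suffix;           /* row base 2^lz - 1 plus suffix */
}

/* Convenience wrapper that decodes the first codeword of a string. */
static uint32_t ue_of(const char *bits) {
    size_t pos = 0;
    return exp_golomb_ue(bits, &pos);
}
```

Decoding "0001010", for instance, finds lz = 3 and the suffix 010, so codeNum = 7 + 2 = 9.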

Compression is obtained because most codewords are short. Moreover, the codewords are uniquely decodable and easy to decode. In H.264, four Exp-Golomb coding methods are used: unsigned (unary), signed, mapped (the codeword is mapped through a table, as for the coded macroblock pattern) and truncated. In the variable length decoding unit 530d, a single instruction is provided to decode the different types of Exp-Golomb codes shown in Table 6 below. Truncated Exp-Golomb decoding is described after the table.

codeNum = EXP_GOLOMB_UD
    t = CLZ
    SHL(t+1)
    val = READ(t)               // val is unsigned
    codeNum = 2^t - 1 + val

codeNum = EXP_GOLOMB_CD(kOrder)
    lz := CountLeadingZero(sREG);
    sREG := {(sREG << (lz+1)), bitStreamBuffer[0:lz]};
    J := lz + kOrder - 1;
    val := (J >= 0) ? ZeroExtend(sREG[0:J]) : 0;
    sREG := {(sREG << (lz+1)), bitStreamBuffer[0:lz]};
    codeNum := (1 << (lz + kOrder)) + (0xFFFFFFFF << kOrder) + val;

SEval = EXP_GOLOMB_SD
    k = EXP_GOLOMB_UD
    SEval = (-1)^(k+1) * Ceil(k/2)

cbp = EXP_GOLOMB_MD(Type)
    k = EXP_GOLOMB_UD
    cbp = TableCBP[Type][k]

Table 6
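Two rows of Table 6 reduce to simple arithmetic once the prefix length lz is known. For EXP_GOLOMB_CD the suffix has lz + kOrder bits and codeNum = 2^(lz+kOrder) - 2^kOrder + val; in 32-bit unsigned arithmetic, adding 0xFFFFFFFF << kOrder is the same as subtracting 2^kOrder. For EXP_GOLOMB_SD, codeNum k maps to (-1)^(k+1) * Ceil(k/2). The C sketch below illustrates these formulas only, not the hardware data path; it reads a string of '0'/'1' characters, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* k-th order Exp-Golomb (the EXP_GOLOMB_CD row): lz leading zeros,
   a marker 1, then lz + k suffix bits. k = 0 gives the UD code. */
static uint32_t exp_golomb_k(const char *bits, size_t *pos, unsigned k) {
    size_t len = strlen(bits);
    unsigned lz = 0;
    while (*pos + lz < len && bits[*pos + lz] == '0')
        lz++;
    assert(*pos + lz < len && lz + k < 32);   /* marker present, no overflow */
    *pos += lz + 1;                           /* skip zeros and marker */
    uint32_t val = 0;
    for (unsigned i = 0; i < lz + k; i++, (*pos)++)
        val = (val << 1) | (uint32_t)(*pos < len && bits[*pos] == '1');
    /* Same value as (1 << (lz + k)) + (0xFFFFFFFF << k) + val modulo 2^32. */
    return ((uint32_t)1 << (lz + k)) - ((uint32_t)1 << k) + val;
}

/* The EXP_GOLOMB_SD mapping: codeNum k -> (-1)^(k+1) * Ceil(k / 2). */
static int32_t se_from_code_num(uint32_t k) {
    int32_t magnitude = (int32_t)((k + 1) / 2);   /* Ceil(k / 2) */
    return (k & 1u) ? magnitude : -magnitude;     /* odd k positive, even k negative, 0 stays 0 */
}

/* Convenience wrapper that decodes the first codeword of a string. */
static uint32_t egk_of(const char *bits, unsigned k) {
    size_t pos = 0;
    return exp_golomb_k(bits, &pos, k);
}
```

With k = 0 this reproduces the UD codes; with k = 1 the shortest codewords are 10 and 11 (codeNum 0 and 1), followed by 0100 through 0111 (codeNum 2 to 5).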


To explain these instructions further: the EXP_GOLOMB_UD instruction decodes an unsigned (unary) coded symbol, and the EXP_GOLOMB_SD instruction decodes a signed coded symbol. As shown in Table 6, for the EXP_GOLOMB_SD instruction, when k = 0 there is no difference between +0 and -0, so the returned value is 0. The EXP_GOLOMB_MD (SRC1) instruction decodes a mapped coded symbol, where SRC1 = Type, which relates to the macroblock parameters and coded_block_pattern. The value of Type selects the following coded_block_pattern mapping:
Type = 0: Intra 4x4

Type = 1: Inter
A table (for example, in on-chip memory or in remote memory) can be used to assign the coded_block_pattern value according to the macroblock prediction mode (e.g., the code number k). The EXP-Golomb instruction that decodes truncated Exp-Golomb symbols is:

EXP_GOLOMB_TD DST, SRC1

where SRC1 is the range. In at least one embodiment, the range must be known before truncated Exp-Golomb decoding is performed. Truncated Exp-Golomb decoding can then be derived as follows:

codeNum = EXP_GOLOMB_TD(range){
    if (range == 1) return READ(1) ^ 1;     // single inverted bit
    else return EXP_GOLOMB_UD;              // ordinary unsigned Exp-Golomb
}

Therefore, the EXP_GOLOMB_D instruction is provided.

It is useful to explain the difference between opcodes and the software instructions issued by the driver. In general, at least two considerations are at work when designing an ISA: (1) keeping instruction decoding simple enough to complete in a single pipeline stage (i.e., quickly), and (2) keeping the programmer mnemonics simple. The five Exp-Golomb-based operations are distinct from the user's point of view. Furthermore, there are two different formats: all of the Exp-Golomb-based operations output the same kind of value, but only some of them take an input (other than the bit stream implicit in the operation), which provides at least one basic distinction. Traditionally, CPU instructions have no implicit inputs, whereas these operations do include an implicit input. The bit stream, however, is not exposed through the operations; it is managed automatically inside the unit and initialized with the INIT instruction. From a hardware point of view, all of the other Exp-Golomb operations can be executed on the same hardware core as EXP_GOLOMB_UD (or at least the same core) plus small additions to that core (similar to a CASE/SWITCH construct in software). A compiler/translator can therefore map all of the operations to a single instruction. Furthermore, these operations are fixed (i.e., an operation does not change dynamically). Referring to the Pseudonym column of Table 7 below, note that for the EXP_GOLOMB_UD and EXP_GOLOMB_SD operations a SRC1 operand can be added (and ignored by the core), which provides a mechanism to distinguish these operations. Likewise, note that no single-source instruction group exists, but these instructions can be mapped onto the register-immediate group. By using a distinct immediate value for each instruction, as shown in Table 7, the instructions can be distinguished, so that only one major/minor opcode is needed instead of five, which is a meaningful saving. That is, only one minor opcode is used, because an immediate-format instruction can be used, and the different Exp-Golomb instructions are distinguished by encoding the immediate data field with the appropriate data and specifying the pseudonym.

EXP_GOLOMB_D Dst, #Type, Src1.lane

where #Type is determined from Table 7 below:

#Type | Instruction | Pseudonym
0x0 | EXP_GOLOMB_UD Dst | EGOLD Dst, 0x0, Src1
0x1 | EXP_GOLOMB_SD Dst | EGOLD Dst, 0x1, Src1
0x2 | EXP_GOLOMB_TD Dst, Src1 | EGOLD Dst, 0x2, Src1
0x3 | EXP_GOLOMB_MD Dst, Src1 | EGOLD Dst, 0x3, Src1
0x4 | EXP_GOLOMB_CD Dst, Src1 | EGOLD Dst, 0x4, Src1

Table 7

To explain Table 7 further: for #Type = 0x0 or #Type = 0x1 no Src1 field is needed, and these instructions do not need to be assigned to another major or minor opcode group, because a dummy Src can be specified, or Src and Dst can be marked as the same.

Exp-Golomb coded symbols are encoded as shown below (zero or more leading 0s, followed by a 1, followed by as many bits as there are leading 0s):

Bit string | codeNum range
1 | 0
0 1 x0 | 1-2
0 0 1 x1 x0 | 3-6
0 0 0 1 x2 x1 x0 | 7-14
0 0 0 0 1 x3 x2 x1 x0 | 15-30
0 0 0 0 0 1 x4 x3 x2 x1 x0 | 31-62
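The single-opcode scheme of Table 7 can be modeled as one entry point that switches on the #Type immediate, with Src1 meaning the range (TD), the table type (MD) or the order (CD), and being ignored for UD and SD. The C sketch below is hypothetical: exp_golomb_d and egold are illustrative names, the bit source is a string of '0'/'1' characters, and the two 4-entry cbp tables are placeholders, not the coded_block_pattern tables of H.264.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    const char *bits;                             /* '0'/'1' characters, MSB first */
    size_t pos;                                   /* next unconsumed bit */
} BitReader;

/* Consume one bit; bits past the end read as 0. */
static uint32_t next_bit(BitReader *br) {
    uint32_t b = br->pos < strlen(br->bits) && br->bits[br->pos] == '1';
    br->pos++;
    return b;
}

/* k-th order Exp-Golomb; k = 0 is the plain unsigned (UD) code. */
static uint32_t eg_k(BitReader *br, unsigned k) {
    assert(k < 32);
    unsigned lz = 0;
    while (next_bit(br) == 0) {                   /* prefix zeros; the marker 1 is consumed */
        lz++;
        assert(lz + k < 32);                      /* malformed or exhausted input */
    }
    uint32_t val = 0;
    for (unsigned i = 0; i < lz + k; i++)
        val = (val << 1) | next_bit(br);
    return ((uint32_t)1 << (lz + k)) - ((uint32_t)1 << k) + val;
}

/* #Type immediates from Table 7. */
enum { EG_UD = 0x0, EG_SD = 0x1, EG_TD = 0x2, EG_MD = 0x3, EG_CD = 0x4 };

/* Placeholder cbp tables indexed [Type][codeNum]; Type 0 = Intra 4x4, 1 = Inter. */
static const uint8_t kPlaceholderCbp[2][4] = { { 3, 2, 1, 0 }, { 0, 1, 2, 3 } };

/* One opcode, five behaviours selected by the #Type immediate. */
static int32_t exp_golomb_d(BitReader *br, unsigned type, uint32_t src1) {
    switch (type) {
    case EG_UD:                                   /* src1 ignored */
        return (int32_t)eg_k(br, 0);
    case EG_SD: {                                 /* src1 ignored */
        uint32_t k = eg_k(br, 0);
        int32_t m = (int32_t)((k + 1) / 2);       /* Ceil(k / 2) */
        return (k & 1u) ? m : -m;
    }
    case EG_TD:                                   /* src1 = range */
        assert(src1 >= 1);
        return src1 == 1 ? (int32_t)(next_bit(br) ^ 1u) : (int32_t)eg_k(br, 0);
    case EG_MD: {                                 /* src1 = Type */
        uint32_t k = eg_k(br, 0);
        assert(src1 < 2 && k < 4);                /* placeholder tables are tiny */
        return kPlaceholderCbp[src1][k];
    }
    case EG_CD:                                   /* src1 = kOrder */
        return (int32_t)eg_k(br, src1);
    default:
        assert(!"unknown #Type");
        return 0;
    }
}

/* Decode one symbol from a bit string, e.g. egold("0001010", EG_SD, 0). */
static int32_t egold(const char *bits, unsigned type, uint32_t src1) {
    BitReader br = { bits, 0 };
    return exp_golomb_d(&br, type, src1);
}
```

Because every variant shares eg_k and differs only in a few lines after it, the sketch mirrors the argument above that one core plus small additions can serve all five operations.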

How these bits are interpreted depends on the particular Golomb type (here, the three H.264 types plus a fourth type from AVS). The UD and SD (unsigned and signed) computation logic units are used to compute the values. For example, when the bit stream is 0001010, the UD value is (1<<3) - 1 + 2 = 9, and the SD value is (-1)^10 * ceil(9/2) = +5. A similar procedure applies to CD. For MD, however, a table lookup is performed: the value is decoded as for UD and is then used as an index into a table, which returns a 6-bit value (the value is stored in the table as 6 bits, but the returned value is zero-extended to the width of the register). In one embodiment there are two tables, one for Intra coding and one for Inter coding. An example of how the above instruction conversion is used in Exp-Golomb decoding is shown below with exemplary pseudocode for partially decoding an H.264 slice header.

sliceHeaderDecode:
    EXP_GOLOMB_UD firstMBSlice
    EXP_GOLOMB_UD sliceType
    EXP_GOLOMB_UD picParameterSetID
    READ frameNum, Nval
    IB_GT frameMbsOnlyFlag, ZERO, $Label1
    READ fieldPicFlag, ONE
    IB_EQ fieldPicFlag, ZERO, $Label1
    READ bottomFieldFlag, ONE
Label1:
    ISUBI t1, #5, nalUnitType
    IB_NEQ ZERO, t1, $Label2
    EXP_GOLOMB_UD idrPicID
Label2:
    IB_NEQ ZERO, picOrderCntType, $Label3
    READ picOrderCntLSB, Nvalt
Label3:

    ICMPI_EQ p1, ONE, fieldPicFlag

    [p1] MOV nfieldPicFlag, ZERO

    [!p1] MOV nfieldPicFlag, ONE
    AND t1, picOrderPresentFlag, nfieldPicFlag
    B_NEQ ONE, t1, $Label4
    EXP_GOLOMB_SD deltaPicOrderCntBottom

Label4:

Converted to:

sliceHeaderDecode:
    EGOLD firstMBSlice, #0, ZERO
    EGOLD sliceType, #0, ZERO
    EGOLD picParameterSetID, #0, ZERO
    READ frameNum, Nval
    IB_GT frameMbsOnlyFlag, ZERO, $Label1
    READ fieldPicFlag, ONE
    IB_EQ fieldPicFlag, ZERO, $Label1
    READ bottomFieldFlag, ONE
Label1:
    ISUBI t1, #5, nalUnitType
    IB_NEQ ZERO, t1, $Label2
    EGOLD idrPicID, #0, ZERO
Label2:
    IB_NEQ ZERO, picOrderCntType, $Label3
    READ picOrderCntLSB, Nvalt
Label3:
    ICMPI_EQ p1, ONE, fieldPicFlag
    [p1] MOV nfieldPicFlag, ZERO
    [!p1] MOV nfieldPicFlag, ONE
    AND t1, picOrderPresentFlag, nfieldPicFlag
    B_NEQ ONE, t1, $Label4
    EGOLD deltaPicOrderCntBottom, #1, ZERO

VC-1 decoding

Having described the decoding system 200 for CABAC decoding (the variable length decoding unit 530a via the CABAC module 580), CAVLC decoding (the variable length decoding unit 530b via the CAVLC module 582), MPEG decoding (the variable length decoding unit 530c via the MPEG module 578) and Exp-Golomb decoding (the variable length decoding unit 530d via the EXP-Golomb module 584), a VC-1 embodiment of the decoding system 200, referred to herein as the variable length decoding unit 530e, is described next. The variable length decoding unit 530e operates according to the operations of the count-leading-ones module 574 and the count-leading-zeros module 576. VC-1 uses Huffman coding and has many more tables. Rather than building and testing hardware for these tables, the necessary tables are loaded into the neighboring context memory 564, since the required bit rate is lower but the design cost of hardware would be higher. The table format is the same as that used for MPEG-2, and the READ, VLC_CLZ, VLC_CLO and INPSTR instructions are used to decode the bit stream. For example, a particular table can be implemented with the following pseudocode:

//TABLE - I Picture CBPCY VLC TABLE
VLC_CLZ DST0, #8
CASE DST0
  0: VALUE = 0; BREAK;  //USE MOVL
  1: VLC_CLZ DST1, #5
     CASE DST1
       1: T = READ(2);
          CASE T
            0: VALUE = 48; BREAK;
            1: VALUE = 56; BREAK;
            2: GO20; BREAK;
            3: VALUE = 1; BREAK;
          CASE_END
       2: VALUE = 2; BREAK;
       3: VLC_CLO DST2, #5
          CASE DST2
            0: VALUE = 28; BREAK;
            1: VALUE = 22; BREAK;
            2: VALUE = 43; BREAK;
            3: VALUE = 30; BREAK;
            4: VALUE = 41; BREAK;
            5: VALUE = 49; BREAK;
          CASE_END
       4: T = READ(1); VALUE = (T) ? (READ(1) ? 31 : 54) : 27; BREAK;
       5: VALUE = 6; BREAK;
     CASE_END
  2: VLC_CLZ DST1, #4
     CASE DST1
       1: VALUE = 3; BREAK;
       2: T = READ(1); VALUE = (T) ? 19 : 36; BREAK;
       3: T = READ(2);
          CASE T
            0: VALUE = 38; BREAK;
            1: VALUE = 47; BREAK;
            2: VALUE = 59; BREAK;
            3: VALUE = 5; BREAK;
          CASE_END
       4: VALUE = 7; BREAK;
     CASE_END
  3: T = READ(1); VALUE = (T) ? 16 : 8; BREAK;
  4: T = READ(1); VALUE = (T) ? GO10 : 12; BREAK;
  5: VALUE = 20; BREAK;
  6: VALUE = 44; BREAK;
  7: T = READ(1); VALUE = (T) ? 33 : 58; BREAK;  //USE SEL??
  8: VALUE = 15; BREAK;
CASE_END

GO10:
  INPSTR S1, #3
  READ_NCM S2, #0, off+S1>>2
  VALUE = S2 & 0x3F;        // low 6 bits: decoded CBPCY value
  Q = (S2 >> 6) & 0x3;      // bits 6-7: number of bits to consume
  READ S0, Q
  RETURN;

GO20:
  INPSTR S1, #4
  READ_NCM S2, #0, off+S1>>2
  VALUE = S2 & 0x3F;        // low 6 bits: decoded CBPCY value
  Q = (S2 >> 6) & 0x3;      // bits 6-7: number of bits to consume
  READ S0, Q
  RETURN;

In some embodiments, branch instructions may be used in place of the CASE statements. Thus, like MPEG-2, VC-1 has an easily defined grammar. Each symbol in the grammar has a specific method (table), which can be implemented as a shader,

as shown in the code above.

Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit its scope. Those skilled in the art may make changes and modifications without departing from the spirit and scope of the invention. The scope of protection of the invention is therefore defined by the appended claims.
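The I-picture CBPCY pseudocode in the VC-1 decoding section above resolves most codewords in two steps: one count-leading-zeros or count-leading-ones operation, then a short fixed-length READ. The Python sketch below transcribes that case tree to make the control flow concrete. It illustrates the approach under stated assumptions and is not the patented implementation.

Assumptions:

- VLC_CLZ and VLC_CLO count at most the given limit of leading zeros or ones.
- When the run ends before that limit, they also consume the terminating opposite bit.
- READ(n) consumes n bits MSB-first.

BitReader, clz, clo, and decode_cbpcy are hypothetical names. The GO10/GO20 paths depend on table contents in neighboring context memory 564 that are not given here, so they raise NotImplementedError. The bit patterns these assumptions produce have not been checked against the VC-1 codeword tables.

```python
class BitReader:
    """MSB-first reader over a string of '0'/'1' characters (illustrative only)."""

    def __init__(self, bits: str) -> None:
        self.bits = bits
        self.pos = 0

    def read(self, n: int) -> int:
        """Consume n bits and return them as an unsigned integer (models READ)."""
        if self.pos + n > len(self.bits):
            raise EOFError("bitstream exhausted")
        value = int(self.bits[self.pos:self.pos + n], 2) if n else 0
        self.pos += n
        return value

    def _count_run(self, bit: str, limit: int) -> int:
        # Count and consume up to `limit` leading copies of `bit`.
        count = 0
        while (count < limit and self.pos + count < len(self.bits)
               and self.bits[self.pos + count] == bit):
            count += 1
        self.pos += count
        # Assumption: a run that stops before `limit` also consumes the
        # terminating opposite bit as part of the prefix.
        if count < limit:
            self.read(1)
        return count

    def clz(self, limit: int) -> int:
        """Hypothetical model of VLC_CLZ DST, #limit."""
        return self._count_run("0", limit)

    def clo(self, limit: int) -> int:
        """Hypothetical model of VLC_CLO DST, #limit."""
        return self._count_run("1", limit)


def decode_cbpcy(r: BitReader) -> int:
    """Transcription of the CBPCY case tree; GO10/GO20 table paths omitted."""
    d0 = r.clz(8)
    if d0 == 0:
        return 0
    if d0 == 1:
        d1 = r.clz(5)
        if d1 == 1:
            t = r.read(2)
            if t == 2:
                raise NotImplementedError("GO20: lookup in neighboring context memory")
            return {0: 48, 1: 56, 3: 1}[t]
        if d1 == 2:
            return 2
        if d1 == 3:
            return (28, 22, 43, 30, 41, 49)[r.clo(5)]
        if d1 == 4:
            if not r.read(1):
                return 27
            return 31 if r.read(1) else 54
        if d1 == 5:
            return 6
        raise ValueError("pseudocode has no case for DST1 = 0")
    if d0 == 2:
        d1 = r.clz(4)
        if d1 == 1:
            return 3
        if d1 == 2:
            return 19 if r.read(1) else 36
        if d1 == 3:
            return (38, 47, 59, 5)[r.read(2)]
        if d1 == 4:
            return 7
        raise ValueError("pseudocode has no case for DST1 = 0")
    if d0 == 3:
        return 16 if r.read(1) else 8
    if d0 == 4:
        if r.read(1):
            raise NotImplementedError("GO10: lookup in neighboring context memory")
        return 12
    if d0 == 7:
        return 33 if r.read(1) else 58
    # Remaining cases: d0 is 5, 6, or 8 (clz(8) never returns more than 8).
    return {5: 20, 6: 44, 8: 15}[d0]
```

Under these assumptions, decoding "1" yields 0 and decoding "000001" yields 20. Each call advances pos, so consecutive codewords can be decoded from one reader. The design point is that a single count-leading instruction selects a branch in one step, instead of walking the Huffman tree bit by bit.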

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of an embodiment of a graphics processor system in which various decoding systems (and methods) may be implemented.
FIG. 2 is a block diagram of an exemplary processing environment in which various embodiments of the decoding system may be implemented.
FIG. 3 is a block diagram of selected components of the exemplary processing environment shown in FIG. 2.
FIG. 4 is a block diagram of the computational core of the exemplary processing environment shown in FIGS. 2 and 3, in which various embodiments of the decoding system may be implemented.
FIG. 5A is a block diagram of selected components of an execution unit of the computational core shown in FIG. 4, in which various embodiments of the decoding system may be implemented.
FIG. 5B is a block diagram of an execution unit data path in which various embodiments of the decoding system may be implemented.
FIG. 5C is a block diagram of an embodiment of the decoding system shown in FIG. 5B, which is applicable to a plurality of coding standards. It also shows an embodiment of a corresponding bitstream buffer.
FIG. 6A is a block diagram of an embodiment of the decoding system shown in FIG. 5C, configured for CABAC decoding.
FIG. 6B is a block diagram of an embodiment of the decoding system shown in FIG. 6A.
FIG. 6C is a block diagram of an embodiment of the context memory structure and associated registers of the decoding system shown in FIG. 6A.
FIG. 6D illustrates a macroblock partitioning scheme used by the decoding system shown in FIG. 6A.
FIG. 6E is a block diagram of an exemplary macroblock decoding mechanism performed by the decoding system shown in FIG. 6A.
FIG. 7A is a block diagram of an embodiment of the decoding system shown in FIG. 5C, configured for CAVLC decoding.
FIG. 7B is a block diagram of an embodiment of a table structure used by the decoding system shown in FIG. 7A.

DESCRIPTION OF MAIN REFERENCE NUMERALS

100 ~ graphics processor system
102 ~ display device
104 ~ display interface unit
106 ~ local memory
110 ~ memory interface unit
114 ~ graphics processing unit
118 ~ PCI-E bus interface unit
122 ~ chipset
124 ~ system memory
126 ~ central processing unit
128 ~ driver software
200 ~ decoding system
202 ~ graphics processor
204 ~ computational core
206 ~ execution unit pool control and vertex/stream cache unit
208 ~ graphics pipeline
302 ~ texture filtering unit
304 ~ pixel packer
306 ~ command stream processor
308 ~ write-back unit
310 ~ texture address generator
402 ~ execution unit input
404a ~ execution unit even output
404b ~ execution unit odd output
406 ~ memory access unit
408 ~ L2 cache
410 ~ memory interface arbiter
412 ~ execution unit pool
504 ~ instruction cache controller
506 ~ thread controller
508 ~ buffer
510 ~ common register file
512 ~ execution unit data path
514 ~ execution unit data path
516 ~ predicate register file
518 ~ scalar register file
520 ~ data output controller
524 ~ thread task interface
526 ~ register file
530 ~ variable length decoding unit
532 ~ vector floating-point unit
534 ~ vector integer arithmetic logic unit
536 ~ special purpose unit
540 ~ register file
562 ~ SREG stream buffer/DMA engine
562a ~ SREG register
562b ~ bitstream buffer
564 ~ neighboring context memory
568 ~ read neighboring context memory module
570 ~ inspect string module
572 ~ read module
574 ~ count leading ones module
576 ~ count leading zeros module
578 ~ MPEG module
580 ~ CABAC module
582 ~ CAVLC module
584 ~ Exp-Golomb module
602 ~ state index
604 ~ most probable symbol value
606 ~ code length range
608 ~ code length offset
612 ~ local register
614 ~ global register
616 ~ bin string register
620 ~ binarization module
622 ~ get context module
624 ~ binary arithmetic decoding engine
628 ~ destination
630 ~ SRC2
632 ~ SRC1
634 ~ shared and thread information
636 ~ stall/reset
638 ~ address
640 ~ data
650 ~ memory module
654 ~ bin index
710 ~ coefficient token module
712 ~ level code module
714 ~ level module
716 ~ level0 module
718 ~ zero level module
720 ~ run module
722 ~ level array
724 ~ run array


Claims

X. Claims:

1. A decoding method, comprising: providing a shader configurable with a plurality of instruction sets to decode a video stream coded according to a plurality of different coding methods; loading the shader having one of the plurality of instruction sets into a variable length decoding unit of a software programmable core processing unit for execution by the variable length decoding unit; and decoding the video stream by executing the shader on the variable length decoding unit.

2. The decoding method of claim 1, wherein the loading further comprises initializing the variable length decoding unit, wherein the decoding is performed within the programming context of a graphics processing unit, by hardware on a data path of the graphics processing unit together with additional hardware that automatically manages a bitstream buffer, and wherein the plurality of coding methods comprise two or more of context-adaptive binary arithmetic coding (CABAC), context-adaptive variable length coding (CAVLC), Exp-Golomb, Moving Picture Experts Group (MPEG-2), and VC-1.

3. The decoding method of claim 2, wherein the instruction sets for initializing comprise at least one of INIT_CTX and INIT_ADE for CABAC decoding, INIT_CAVLC for CAVLC decoding, INIT_MPEG2 for MPEG-2 decoding, INIT_VC-1 for VC-1 decoding, and INIT_CTX or INIT_CAVLC for Exp-Golomb decoding, and further comprise INIT_AVS for decoding according to an audio video standard (AVS).

4. The decoding method of claim 2, wherein the initializing further comprises: initializing at least a context memory array, a plurality of registers, a plurality of context tables, and a decoding engine; updating fields of the registers corresponding to a decoding operation, or initializing the fields of the registers, wherein the updating comprises moving values between the registers and the context memory array; and reading the context memory array; wherein the instruction set for moving or initializing comprises a CWRITE instruction, the instruction set for updating comprises an INSERT instruction, and the instruction set for reading comprises a READ_NCM instruction.

5. The decoding method of claim 2, wherein the initializing further comprises initializing a bitstream buffer and associated registers for receiving segments of the video stream, wherein the instruction set for initializing the bitstream buffer and the associated registers comprises an INIT_BSTR instruction, which loads data corresponding to the video stream into the bitstream buffer and starts a process that automatically manages the bitstream buffer and the associated registers.

6. The decoding method of claim 5, further comprising byte-aligning the data in the bitstream buffer, wherein the instruction set for byte alignment comprises an ABST instruction.

7. The decoding method of claim 5, further comprising reading data from the associated registers as the data is used during decoding, wherein the instruction set for reading data comprises a READ instruction.

8. The decoding method of claim 5, further comprising checking the bitstream in the bitstream buffer or the associated registers for a specific pattern without consuming the bitstream, wherein the instruction set for checking comprises at least one of: an INPSTR instruction, corresponding to checking the associated registers and returning a predetermined number of most significant bits to a destination register; and an INPTRB instruction, corresponding to checking the associated registers for raw byte sequence payload (RBSP) trailing bits.

9. A decoding method, comprising: decoding a video stream by executing a shader, the shader being embedded in a variable length decoding unit of a programmable core processing unit, wherein the decoding is according to one of a plurality of different coding methods; and providing a decoded data output.

10. The decoding method of claim 9, wherein the decoding is performed within the programming context of a graphics processing unit, by hardware on a data path of the graphics processing unit together with additional hardware that automatically manages a bitstream buffer, and wherein the plurality of coding methods comprise two or more of context-adaptive binary arithmetic coding (CABAC), context-adaptive variable length coding (CAVLC), Exp-Golomb, Moving Picture Experts Group (MPEG-2), and VC-1.

11. The decoding method of claim 9, wherein decoding according to CABAC comprises: receiving, in a binarization module, first information comprising [...] and a context block type; in response to an instruction of the shader executed by the binarization module, providing binarization information corresponding to one or more macroblock parameters to a get context module; receiving second information in the get context module; in response to a first instruction of the shader executed by the get context module, providing bin information for binary decoding and context identification information, wherein the context corresponds to a most probable or least probable symbol probability; receiving the bin information, the context identification information, an offset, and a range in a binary arithmetic decoding module; and in response to a second instruction of the shader executed by the binary arithmetic decoding module, decoding one or more binary symbols.

12. The decoding method of claim 11, wherein decoding according to CABAC further comprises the following combination: receiving one or more decoded binary symbols in a bin string register and decoding the one or more decoded binary symbols; providing updated context information; and writing to a context memory array, wherein the writing is based on a Boolean logic operation that includes a value transferred from a register supplying the context memory to the context memory array.

13. The decoding method of claim 9, further comprising at least one of the following: determining, according to an instruction, whether a result stored in an internal register or data in a source operand is to be used for a current operation in one or more modules; repeatedly and automatically refilling the bitstream buffer with a predetermined number of bits in response to a number of bits being used for decoding, the bits corresponding to the video stream; stalling in response to an expected underflow of the bitstream buffer; and, in response to the number of bits used in the bitstream buffer exceeding a predetermined number, stopping the bitstream buffer transfer and switching control to a host processor.

14. The decoding method of claim 9, wherein decoding according to CAVLC comprises: receiving macroblock information in a coefficient token module of a CAVLC unit; in response to a fourth instruction (CAVLC_TOTC) of the shader, providing trailing ones (TrailingOnes) information and total nonzero coefficient (TotalCoeff) information; receiving the TrailingOnes information and a level code in a level module of the CAVLC unit; in response to a fifth instruction (CAVLC_LVL) of the shader, providing suffix length information and level index (Level[Idx]) information; receiving the suffix length information in a level code module of the CAVLC unit; and in response to a sixth instruction (CAVLC_LC) of the shader, providing the level code information to the level module.

15. The decoding method of claim 14, wherein the suffix length information and the level index information are received in the level module via a forwarding register and an execution unit register, and wherein the level index information is incremented.

16. The decoding method of claim 14, wherein decoding according to CAVLC further comprises the following combination: receiving the TrailingOnes information in a level0 module of the CAVLC unit; in response to a seventh instruction (CAVLC_LVL0) of the shader, providing second level index information to a level array; receiving the TotalCoeff information and a maximum number of coefficients in a zero level module of the CAVLC unit; in response to an eighth instruction (CAVLC_ZL) of the shader, providing zeros-left information and a reset value to a first multiplexer and a second multiplexer; receiving, in a run module of the CAVLC unit, the zeros-left information and the second level index information from the first and second multiplexers, respectively; in response to a ninth instruction (CAVLC_RUN) of the shader, providing a run index to a run array; in response to a tenth instruction (READ_LRUN) of the shader, providing a decoded level value and a decoded run value from the level array and the run array, respectively; and in response to an eleventh instruction (CLR_LRUN) of the shader, clearing the level array and the run array.

17. The decoding method of claim 16, wherein the first multiplexer receives the zeros-left information from a first forwarding register, and the second multiplexer receives the second level index information from a second forwarding register.

18. The decoding method of claim 9, wherein decoding according to Exp-Golomb comprises detecting and tracking the number of leading zeros and leading ones in a bitstream buffer, wherein decoding according to Exp-Golomb uses a single opcode to perform a plurality of Exp-Golomb operations, each of which is distinguished by a respective immediate field value in a shader instruction, wherein the detecting and tracking of leading zeros is based on a count leading zeros (CLZ) instruction and the detecting and tracking of leading ones is based on a count leading ones (CLO) instruction, and wherein the shader instruction corresponding to Exp-Golomb decoding comprises an EXP_GOLOMB_D instruction.

19. The decoding method of claim 9, wherein decoding according to MPEG-2 comprises implementing an MPEG standard table using one or more MatchVLC functions, each of the one or more MatchVLC functions corresponding to one of a plurality of different syntax elements, wherein the table selection is based on a shader instruction, and wherein the shader instruction corresponding to the MatchVLC function comprises a VLC_MPEG2 instruction.

20. The decoding method of claim 9, wherein decoding according to VC-1 comprises selectively loading a VC-1 table into a context memory array, wherein the decoding is based on the selectively loaded table.
