TWI354239B - Decoding system unit - Google Patents
- Publication number
- TWI354239B, TW96120728A
- Authority
- TW
- Taiwan
- Prior art keywords
- decoding
- module
- instruction
- docket
- content
Landscapes
- Image Generation (AREA)
- Image Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Description
IX. Description of the Invention:
[Technical Field of the Invention]
The present invention relates to data processing systems, and more particularly to programmable graphics processing systems and methods.
[Prior Art]
Computer graphics is the art of using a computer to generate pictures, images, or other graphical or pictorial information. Many current graphics systems are implemented through interfaces such as Microsoft's Direct3D and OpenGL, which give control over multimedia hardware (for example, a graphics accelerator or graphics processing unit (GPU)) on a computer running a particular operating system (for example, Microsoft Windows). Generating pictures or images is commonly called rendering, and the details of these operations are carried out mainly by the graphics accelerator. In three-dimensional (3D) computer graphics, the geometry that represents the surfaces (or volumes) of objects in a scene is generally converted into pixels (picture elements), stored in a frame buffer, and then displayed on a display device. Each object or group of objects has specific visual properties related to the appearance of its surfaces (for example, material, reflectance, shape, and texture), which can be defined as the rendering context of that object or group of objects.

Consumers continue to demand more control and more features from games and other multimedia products, along with more realistic images and improved processing speed and power consumption. Many standards have been developed that produce better-quality images with fewer bits. One of them, the H.264 standard (also known as Part 10 of MPEG-4 from the ISO Moving Picture Experts Group (MPEG)), is a high-compression digital video codec standard. Compared with an MPEG-2 encoder, an H.264-compliant codec encodes video with roughly one third of the bits while maintaining similar video quality. The H.264 specification provides two types of entropy coding: context-adaptive binary arithmetic coding (CABAC) and context-adaptive variable length coding (CAVLC).

Many different pure-software or pure-hardware solutions have been proposed to meet these continually changing needs. However, the known techniques lead to higher inventory, rapidly obsolete technology, and inflexible designs.

Client's Docket No.: S3U06-0013-TW; TT's Docket No.: 0608-A41246twf.doc/NikeyChen

[Summary of the Invention]
The present invention discloses decoding systems and methods for a multithreaded parallel computational core of a graphics processing unit.
The present invention provides a system comprising a software-programmable core processing unit having a variable length decoding unit configured to execute a shader. The shader selectively performs decoding of a video stream to output decoded data, where the video stream is encoded according to the context-adaptive binary arithmetic coding (CABAC), context-adaptive variable length coding (CAVLC), Exp-Golomb, Moving Picture Experts Group (MPEG-2), and VC-1 standards, and the decoding is performed using a combination of software and hardware.

The present invention provides another system comprising a graphics processing unit coupled to a host processor and memory. The graphics processing unit includes a graphics processor having a software-programmable core processing unit with one or more execution units. The one or more execution units include execution unit data path hardware that includes a variable length decoding unit configured to execute a shader. The shader selectively decodes an encoded video stream according to the CABAC, CAVLC, Exp-Golomb, MPEG-2, and VC-1 standards to provide decoded data output.

[Embodiments]
To make the above and other objects, features, and advantages of the present invention easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.

Embodiments:

Many embodiments of decoding systems and methods are disclosed herein (the systems and methods are referred to collectively as the decoding system). In one embodiment, the decoding system is embedded in one or more execution units of a programmable, multithreaded, parallel computational core of a graphics processing unit (GPU). The decoding functionality is implemented using a combination of software and hardware. That is, video decoding is accomplished in the context of GPU programming together with a hardware implementation in the GPU data path.
For example, in one embodiment, the decoding operations or methods are implemented by a shader (for example, a vertex shader) with an extended instruction set, by the execution unit data path of the GPU, and by additional hardware that automatically manages the bitstream buffer. Existing systems, by contrast, are purely hardware-based or purely software-based solutions and therefore suffer from some of the problems described in the prior art section.

The decoding systems described herein can decode information that was coded using a plurality of entropy coding techniques. The decoding system can decode according to the CABAC and CAVLC methods of the well-known International Telecommunication Union Telecommunication Standardization Sector (ITU-T) H.264 standard, and also according to the MPEG-2 and VC-1 standards. Various embodiments of the decoding system operate in one of a plurality of modes. Each mode corresponds to one of the standards mentioned above and is based on the execution of one or more instruction sets received from GPU frame buffer memory or from memory associated with the host processor (for example, a host central processing unit (CPU)), for example through known mechanisms such as preloading or a cache miss.
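Of the codes this description names, Exp-Golomb is the simplest to illustrate. The Python sketch below is not the patented hardware or its shader instructions; it only shows how unsigned and signed Exp-Golomb code words (ue(v) and se(v) in H.264 terminology) are read from an MSB-first bitstream. The names `BitReader`, `read_ue`, and `read_se` are hypothetical and chosen for illustration. The reader also counts consumed bits, so running past the end of the buffer is reported as an error rather than silently returning data.

```python
class BitReader:
    """MSB-first bit reader that counts how many bits have been consumed."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # number of bits consumed so far

    def read_bit(self) -> int:
        if self.pos >= 8 * len(self.data):
            raise EOFError("bitstream exhausted")
        byte = self.data[self.pos >> 3]
        bit = (byte >> (7 - (self.pos & 7))) & 1
        self.pos += 1
        return bit

    def read_bits(self, n: int) -> int:
        value = 0
        for _ in range(n):
            value = (value << 1) | self.read_bit()
        return value


def read_ue(reader: BitReader) -> int:
    """Decode one unsigned Exp-Golomb code word (ue(v))."""
    leading_zeros = 0
    while reader.read_bit() == 0:  # count zeros up to the first 1 bit
        leading_zeros += 1
    # Value = 2^n - 1 + (next n bits), where n is the number of leading zeros.
    return (1 << leading_zeros) - 1 + reader.read_bits(leading_zeros)


def read_se(reader: BitReader) -> int:
    """Decode one signed Exp-Golomb code word (se(v)).

    Code numbers 0, 1, 2, 3, 4 map to 0, 1, -1, 2, -2.
    """
    k = read_ue(reader)
    return (k + 1) // 2 if k & 1 else -(k // 2)
```

For instance, the two bytes 0xA6 0x40 hold the code words 1, 010, 011, and 00100, which `read_ue` decodes to 0, 1, 2, and 3 after consuming 12 bits.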
The hardware can be reused to support multiple decoding standards (that is, according to the selected mode). The selected mode also affects how the context memory is initialized, used, and/or updated.

Depending on the active decoding mode, the decoding system may use Exp-Golomb coding, Huffman-like coding (for example, CAVLC, MPEG-2, and VC-1), and/or arithmetic coding (for example, CABAC). Entropy decoding is carried out by extending the instruction set of one or more execution units and by providing additional hardware that automatically manages the bitstream and implements the context model in CAVLC decoding and CABAC decoding. In one embodiment, the entropy coding tables use various memory tables or other data structures (for example, read-only memory (ROM) tables).

In addition, the automatic bitstream buffer offers several advantages. For example, once the direct memory access (DMA) engine of the bitstream buffer knows the location (address) of the bitstream, the bitstream is managed automatically and no further instructions are required. By contrast, bitstream management represents a large overhead in conventional microprocessor/digital signal processor (DSP) systems. Furthermore, by tracking the number of bits consumed, the bitstream buffer mechanism can detect and handle corrupted bitstreams.

Another advantage of embodiments of the decoding system is that instruction latency is minimized. For example, because CABAC decoding is highly sequential and does not lend itself to multithreading, various embodiments use a forwarding mechanism (for example, register forwarding) to reduce the effective dependency latency.
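The forwarding mechanism mentioned above can be pictured with a toy model in which flag bits in the instruction select, for each source operand, between the previous result and the register file. The sketch below is an illustration under stated assumptions, not the actual data path: it assumes a two-bit `fwd` field with one bit per source operand, a single internal register holding the previous result, and a register file whose write-back lags by one instruction. `Instr`, `ToyCore`, and the `add`/`sub` opcodes are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    op: str       # "add" or "sub" (toy opcodes, not a real instruction set)
    dst: int      # destination register index
    src0: int     # first source register index
    src1: int     # second source register index
    fwd: int = 0  # bit 0: forward into operand 0, bit 1: forward into operand 1


class ToyCore:
    def __init__(self, num_regs: int = 8):
        self.regs = [0] * num_regs
        self.last_result = 0   # internal register holding the previous result
        self._pending = None   # (dst, value) still waiting for write-back

    def _operand(self, instr: Instr, index: int) -> int:
        # The flag bit decides the source; no register-address comparison is needed.
        if (instr.fwd >> index) & 1:
            return self.last_result
        return self.regs[instr.src0 if index == 0 else instr.src1]

    def execute(self, instr: Instr) -> int:
        a = self._operand(instr, 0)
        b = self._operand(instr, 1)
        if instr.op == "add":
            result = a + b
        elif instr.op == "sub":
            result = a - b
        else:
            raise ValueError(f"unknown op: {instr.op}")
        # The previous instruction's write-back lands only now (one-instruction lag).
        if self._pending is not None:
            dst, value = self._pending
            self.regs[dst] = value
        self._pending = (instr.dst, result)
        self.last_result = result
        return result
```

With `fwd=0b01`, a dependent `sub` issued right after an `add` reads the fresh result for its first operand. With `fwd=0` it would read the register file, which in this model still holds the stale value.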
To explain the forwarding mechanism further, a limitation of many deep-pipeline and multithreaded processors is that instructions from the same thread cannot be executed every cycle. Some systems use generic forwarding, which compares the operand address of the previous result with the operand addresses of the instruction and, when they match, uses the previous result as the operand. Generic forwarding conventionally requires complex comparison and multiplexing logic. Some embodiments of the decoding system use a different type of forwarding, in which bits in the instruction encode whether to use the result of the previous computation (for example, held in an internal register) or the data in the source operand, for example two bits in total, one per operand. In this way, overall latency is reduced and the efficiency of the processor pipeline is improved.

FIG. 1 is a block diagram showing an embodiment of a graphics processing system 100 in which embodiments of the decoding systems and methods are implemented. In some embodiments, graphics processing system 100 may be a computer system. Graphics processing system 100 may include a display device 102 driven by a display interface unit (DIU) 104, and a local memory 106 (which may include, for example, a display buffer, frame buffer, texture buffer, and command buffer). Local memory 106 may also be referred to interchangeably as a frame buffer or storage unit. Local memory 106 is coupled to graphics processing unit (GPU) 114 through one or more memory interface units (MIU) 110. In one embodiment, memory interface unit 110, graphics processing unit 114, and display interface unit 104 are all coupled to a bus interface unit (BIU) 118 that is compatible with peripheral component interconnect express (PCI-E). In one embodiment, bus interface unit 118 may use a graphics address remapping table (GART), although other memory mapping mechanisms may also be used.
Graphics processing unit 114 includes a decoding system 200, described below. Although decoding system 200 is shown as a component within graphics processing unit 114, in some embodiments decoding system 200 may include one or more additional or different components of the graphics processing system 100 shown.

Bus interface unit 118 is coupled to a chipset 122 (for example, a north bridge chipset) or a switch. Chipset 122 includes interface electronics that strengthen signals from central processing unit (CPU) 126 (also called the host processor) and separate signals going to and from system memory 124 from signals going to and from input/output (I/O) devices (not shown). Although the PCI-E bus protocol is mentioned, some embodiments may use other connections and/or communication methods between the host processor and graphics processing unit 114, such as PCI or a proprietary high-speed bus. System memory 124 also includes driver software 128, which uses central processing unit 126 to send instruction sets or commands to registers in graphics processing unit 114.

In some embodiments, additional graphics processing units may be coupled to the components of FIG. 1 through chipset 122 using the PCI-E bus protocol. In one embodiment, graphics processing system 100 may include all of the components shown in FIG. 1, or fewer components and/or components different from those shown in FIG. 1. Some embodiments may also use additional components, such as a south bridge chipset coupled to chipset 122.

FIG. 2 is a block diagram showing an embodiment of a processing environment in which decoding system 200 is implemented. In particular, graphics processing unit 114 includes a graphics processor 202.
Graphics processor 202 includes a multiple execution unit (EU) computational core 204 (also called a software-programmable core processing unit). In one embodiment, computational core 204 includes decoding system 200 (also called the VLD unit) embedded in an execution unit data path (EUDP) that is distributed among one or more execution units. Graphics processor 202 also includes an execution unit pool (EUP) control and vertex/stream cache unit 206 (referred to herein as execution unit pool control unit 206) and a graphics pipeline 208 with fixed-function logic (for example, a triangle set-up unit (TSU) and a span-tile generator (STG)), described below. Computational core 204 includes a pool of multiple execution units to meet the computational demands of the shading tasks of various shader programs, including vertex shaders, geometry shaders, and/or pixel shaders, that process data for graphics pipeline 208. Because in one embodiment the functionality of decoding system 200 is carried out by a shader executing on computational core 204, an embodiment of the graphics processor is described first, followed by specific embodiments of decoding system 200.

Decoding system 200 can be implemented in hardware, software, firmware, or a combination of these. In the preferred embodiment, decoding system 200 is implemented in hardware and software, including any one or a combination of the following known technologies: discrete logic circuits with logic gates that perform logic functions on data signals, an application-specific integrated circuit (ASIC) with appropriate combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), a state machine, and so on.
FIGS. 3 and 4 are block diagrams of selected components of an embodiment of graphics processor 202. As noted above, one embodiment of decoding system 200 can be a shader within graphics processor 202 that has an extended instruction set and additional hardware components; an embodiment of graphics processor 202 and the corresponding processing are described below. Although FIGS. 3 and 4 do not show every component used in graphics processing, the components shown are sufficient for those skilled in the art to understand the functions and architecture of such a graphics processor. Referring to FIG. 3, at the center of the programmable processing environment is computational core 204, which includes decoding system 200 and processes various instructions. Different types of shader programs, such as vertex, geometry, and pixel shader programs, can be executed on or mapped to computational core 204. As a multi-issue processor, computational core 204 can process multiple instructions in a single clock cycle.

As shown in FIG. 3, the relevant components of graphics processor 202 include computational core 204, texture filtering unit 302, pixel packer 304, command stream processor 306, write-back unit 308, and texture address generator 310. FIG. 3 also includes execution unit pool control unit 206, which also includes a vertex cache and/or a stream cache. For example, as shown in FIG. 3, texture filtering unit 302 provides texel data to computational core 204 (inputs A and B). In some embodiments, the texel data is 512-bit data.

Pixel packer 304 provides pixel shader inputs to computational core 204 (inputs C and D), also in a 512-bit data format.
In addition, pixel packer 304 requests pixel shader tasks from execution unit pool control unit 206, which provides an assigned execution unit number and thread number to pixel packer 304. Pixel packers and texture filtering units are known in the art and are not described further here. Although FIG. 3 shows the pixel and texel packets as 512-bit data packets, the packet size can vary in some embodiments depending on the desired performance characteristics of graphics processor 202.

Command stream processor 306 provides triangle vertex indices to execution unit pool control unit 206. In the embodiment of FIG. 3, the indices are 256 bits. Execution unit pool control unit 206 assembles vertex shader inputs from the stream cache and sends the data to computational core 204 (input E). It also assembles geometry shader inputs and sends them to computational core 204 (input F). Execution unit pool control unit 206 also controls execution unit input 402 and execution unit output 404 (FIG. 4). In other words, execution unit pool control unit 206 controls the respective inflow to and outflow from computational core 204.

After processing, computational core 204 provides pixel shader outputs (outputs J1 and J2) to write-back unit 308. The pixel shader outputs include color information such as red/green/blue/alpha (RGBA) information, which is well known in the art. The pixel shader output can be two 512-bit data streams. Other embodiments may use other bit widths.
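The 512-bit figure is easy to check with a little arithmetic. The sketch below assumes that one 512-bit pixel shader output packet carries four pixels of 128-bit RGBA, as stated for the FIG. 3 embodiment, and additionally assumes that each component is a little-endian 32-bit float; the source does not give the component format or byte order. `pack_rgba_packet` and `unpack_rgba_packet` are hypothetical helper names.

```python
import struct

PIXELS_PER_PACKET = 4     # four pixels per 512-bit channel (FIG. 3 embodiment)
COMPONENTS_PER_PIXEL = 4  # R, G, B, A
BITS_PER_COMPONENT = 32   # assumption: 32-bit float components
PACKET_BITS = PIXELS_PER_PACKET * COMPONENTS_PER_PIXEL * BITS_PER_COMPONENT

_FORMAT = "<16f"  # 16 little-endian 32-bit floats = 64 bytes (assumed layout)


def pack_rgba_packet(pixels):
    """Pack four (r, g, b, a) tuples into one 64-byte (512-bit) packet."""
    if len(pixels) != PIXELS_PER_PACKET:
        raise ValueError("a packet holds exactly four pixels")
    flat = [component for pixel in pixels for component in pixel]
    if len(flat) != PIXELS_PER_PACKET * COMPONENTS_PER_PIXEL:
        raise ValueError("each pixel needs four components")
    return struct.pack(_FORMAT, *flat)


def unpack_rgba_packet(packet):
    """Split a 64-byte packet back into four (r, g, b, a) tuples."""
    flat = struct.unpack(_FORMAT, packet)
    return [flat[i:i + COMPONENTS_PER_PIXEL]
            for i in range(0, len(flat), COMPONENTS_PER_PIXEL)]
```

Under these assumptions, 4 pixels x 4 components x 32 bits gives exactly 512 bits, so two such packets match the two 512-bit output streams.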
Similar to the pixel shader outputs, computational core 204 also outputs texture coordinates, which include UVRQ information, to texture address generator 310 (outputs K1 and K2). Texture address generator 310 issues a texture descriptor request to L2 cache 408 of computational core 204 (input X), and L2 cache 408 outputs texture descriptor data to texture address generator 310 (output W). Texture address generators and write-back units are known in the art and are not described further here. Although the UVRQ and RGBA data are shown as 512 bits, this parameter can also vary between embodiments. In the embodiment of FIG. 3, the bus is split into two 512-bit channels, each holding the 128-bit RGBA color values and 128-bit UVRQ texture coordinates of four pixels.

Graphics pipeline 208 provides fixed-function graphics processing. In response to a command from driver software 128, such as a command to draw a triangle, vertex information passes through vertex shader logic in computational core 204 to carry out vertex transformations. In particular, objects are transformed from object space into work space and/or screen space as triangles. The triangles pass from computational core 204 to the triangle set-up unit of graphics pipeline 208, which assembles primitives and performs known tasks such as bounding box generation, culling, edge function generation, and triangle-level rejection. The triangle set-up unit passes data to the span and tile generation unit of graphics pipeline 208, which provides tile generation. The data objects are thereby divided into tiles (for example, 8x8 or 16x16) and passed to another fixed-function unit that performs depth (for example, Z-value) processing, such as high-level rejection of Z-values (high-level meaning that fewer bits are consumed than in similar processing at a lower level). The Z-values are then passed back to pixel shader logic in computational core 204, which carries out pixel shader functions based on the received texture and pipeline data. Computational core 204 outputs the processed values to destination units in graphics pipeline 208. The destination units perform alpha testing and stencil testing before values in the various caches need to be updated.

Note that 512-bit vertex cache spill data is also transferred between L2 cache 408 of computational core 204 and execution unit pool control unit 206. In addition, two 512-bit vertex cache writes (outputs M1 and M2) are output from computational core 204 to execution unit pool control unit 206 for further processing.

FIG. 4 shows additional components of computational core 204 and related components.
Computational core 204 includes an execution unit pool 412. In one embodiment, execution unit pool 412 includes one or more execution units 420a-420h (collectively, execution units 420). Each execution unit 420 can process multiple instructions in a single clock cycle, so at its peak execution unit pool 412 can process multiple threads simultaneously. Although FIG. 4 shows eight execution units (labeled EU0 through EU7), the number of execution units is not limited to eight and may be larger or smaller in some embodiments. At least one execution unit 420 contains an embodiment of decoding system 200, as described further below.

Computational core 204 also includes a memory access unit (MXU) 406, which is coupled to L2 cache 408 through memory interface arbiter 410. L2 cache 408 receives vertex cache spill data from execution unit pool control unit 206 (input G) and provides vertex cache spill data to execution unit pool control unit 206 (output H). L2 cache 408 also receives texture descriptor requests from texture address generator 310 (input X) and, in response, provides texture descriptor data to texture address generator 310 (output W).

Memory interface arbiter 410 provides a control interface to local memory (for example, the frame buffer or local memory 106). Bus interface unit 118 provides an interface to the system, for example through a PCI-E bus. Memory interface arbiter 410 and bus interface unit 118 provide the interface between memory and L2 cache 408. In some embodiments, L2 cache 408 is coupled to memory interface arbiter 410 and bus interface unit 118 through memory access unit 406. Memory access unit 406 translates virtual memory addresses from L2 cache 408 and other blocks into physical memory addresses.

Memory interface arbiter 410 provides L2 cache 408 with memory access (for example, read/write access), fetching of instructions, constants, data, and textures, direct memory access (for example, load/store), indexing of temporary storage accesses, register spill, vertex cache content spill, and so on.

Computational core 204 further includes execution unit input 402 and execution unit output 404, which provide inputs to execution unit pool 412 and receive outputs from it, respectively. Execution unit input 402 and execution unit output 404 can be crossbars, buses, or other known input and output architectures.

Execution unit input 402 receives vertex shader inputs (input E) and geometry shader inputs (input F) from execution unit pool control unit 206 and provides this information to execution unit pool 412 for processing by the execution units 420.
In addition, execution unit input 402 receives pixel shading inputs (input C and input D) and texel packets (input A and input B) and passes the packets to execution unit set 412 for processing by each execution unit 420. . Again, execution unit input 402 receives information from L2 cache 408 (L2 read) and provides this information to execution unit set 412 when needed. In the embodiment of Figure 4, the execution unit output 404 is assigned an even output 404a and an odd output 404b. Similar to execution unit input 402, execution unit output 404 can be a crossbar, busbar, or other known
The even output 404a handles the output of the even execution units 420a, 420c, 420e, and 420g, while the odd output 404b handles the output of the odd execution units 420b, 420d, 420f, and 420h. Together, the even output 404a and the odd output 404b receive the output of the execution unit set 412, for example UVRQ and RGBA. This output may be returned to the L2 cache memory 408, output from the computing core 204 to the write-back unit 308 via outputs J1 and J2, or output to the texture address generator 310 via outputs K1 and K2.
The execution unit flow of the execution unit set 412 generally comprises several levels: a rendering context level, a thread or task level, and an instruction or execution level. At any given time, each execution unit 420 may admit two rendering contexts, which are distinguished by a one-bit flag or another mechanism. Context information is passed from the execution unit set control unit 206 before any task belonging to that context starts. Context-level information may include the shader type, the number of input/output registers, the instruction start address, the output mapping table, the vertex identifier, and the constants in the respective constant buffer. Each execution unit 420 of the execution unit set 412 can hold multiple tasks or threads at the same time (for example, 32 threads in some embodiments). In one embodiment, each thread fetches instructions according to its program counter.
The execution unit set control unit 206 acts as the global scheduler for tasks and uses a data-driven approach (for example, driven by the vertex, pixel, and geometry packets in its input) to assign appropriate threads in the execution units 420. For example, the execution unit set control unit 206 assigns a thread to an empty thread slot in an execution unit 420 of the execution unit set 412.
Once the thread starts executing, data supplied by the vertex cache or by other components or modules (depending on the shader type) is placed in the common register buffer.
In general, the graphics processor 202 uses programmable vertex, geometry, and pixel shaders. Rather than implementing the functions or operations of these components as separate fixed-function units with different designs and instruction sets, the operations are executed by a set of execution units 420a, 420b ... 420n that share a unified instruction set. Except for the execution unit 420a (which includes the decoding system 200 and therefore has additional functionality), the execution units 420 are identical in design and configured for programmed operation. In one embodiment, each execution unit 420 is capable of multithreaded operation. As the vertex shader, geometry shader, and pixel shader generate their various shading tasks, these tasks are dispatched to the respective execution units 420 for execution. In an embodiment that uses the vertex shader, the decoding system 200 may be implemented with some modifications and/or differences relative to the other execution units 420. For example, one difference between an execution unit that contains the decoding system 200 (for example, execution unit 420a) and the other execution units (for example, execution unit 420b) is that the execution unit 420a uses a decoding system 200; another difference lies in the arrangement of the decoding system 200 in one or more corresponding internal buffers. Data for the decoding system 200 is received from the memory access unit 406 via connection 413 and the execution unit input 402.
As individual tasks are generated, the execution unit set control unit 206 assigns them to available threads in the various execution units 420. When a task completes, the execution unit set control unit 206 also manages the release of the associated thread. In this regard, the execution unit set control unit 206 assigns vertex shader, geometry shader, and pixel shader tasks to threads of the various execution units 420 and keeps track of the associated tasks and threads. Specifically, the execution unit set control unit 206 maintains a resource table (not shown) of the threads and memory of all execution units 420. It knows exactly which threads have been assigned tasks and are occupied, which threads have been released after thread termination, how many common register file memory registers are in use, and how much free space is available in each execution unit.
Accordingly, when a task is assigned to an execution unit (for example, execution unit 420a), the execution unit set control unit 206 marks the thread as busy and subtracts the register file footprint of that thread from the total available common register file memory. The footprint is set or determined by the state of the vertex shader, geometry shader, and pixel shader, and each shader may have a different footprint size. For example, a vertex shader thread may require 10 common register file registers, while a pixel shader thread may require only 5.
When a thread finishes its assigned work, the execution unit 420 running that thread signals the execution unit set control unit 206. The execution unit set control unit 206 then updates its resource table to mark the thread as free and adds the thread's common register file space back to the available space. When all threads are busy or all common register file memory has been allocated (or the remaining register space is too small to accommodate an additional thread), the execution unit 420 is considered full, and the execution unit set control unit 206 will not assign any additional or new threads to that execution unit.
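The resource-table bookkeeping described above can be illustrated with a minimal Python sketch. It models a single execution unit with a fixed number of thread slots and a pool of common register file registers. The class name `ExecutionUnitResources`, its fields, and the default capacity of 128 registers are hypothetical choices for illustration and are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class ExecutionUnitResources:
    """Bookkeeping for one execution unit: thread slots and free CRF registers."""
    num_threads: int = 32       # thread slots (32 is mentioned for some embodiments)
    free_registers: int = 128   # assumed register capacity, for illustration only
    busy: dict = field(default_factory=dict)  # thread slot -> register footprint

    def is_full(self, footprint: int) -> bool:
        """Full when no slot is free or the remaining registers cannot hold the footprint."""
        return len(self.busy) >= self.num_threads or footprint > self.free_registers

    def assign(self, footprint: int):
        """Mark a free slot busy and subtract its footprint; return the slot, or None if full."""
        if self.is_full(footprint):
            return None
        slot = next(s for s in range(self.num_threads) if s not in self.busy)
        self.busy[slot] = footprint
        self.free_registers -= footprint
        return slot

    def release(self, slot: int) -> None:
        """Mark the slot free again and add its footprint back to the available space."""
        self.free_registers += self.busy.pop(slot)
```

With `ExecutionUnitResources(num_threads=2, free_registers=12)`, assigning a vertex shader thread with a footprint of 10 succeeds, while a following pixel shader thread with a footprint of 5 is refused until the first thread is released.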
Each execution unit 420 also contains a thread controller that is responsible for managing or marking each thread as in use (for example, executing) or available. In this regard, in at least one embodiment, while the vertex shader is performing the functions of the decoding system 200, the execution unit set control unit 206 can prevent the geometry shader and the pixel shader from running at the same time.
Figure 5A shows the execution unit 420a, which has the features of the graphics processor 202 and computing core 204 described above and includes an execution unit data path 512 in which the decoding system 200 is embedded. Specifically, Figure 5A is a block diagram of the execution unit 420a. In one embodiment, the execution unit 420a includes an instruction cache controller 504, a thread controller 506 coupled to the instruction cache controller 504, a buffer 508 (for example, a constant buffer), a common register file (CRF) 510, an execution unit data path (EU data path, EUDP) 512 coupled to the thread controller 506, the buffer 508, and the common register file 510, an execution unit data path first-in first-out (FIFO) buffer 514, a predicate register file (PRF) 516, a scalar register file (SRF) 518, a data output controller 520, and a thread task interface 524. As described above, the execution unit 420 receives input from the execution unit input 402 and provides output to the execution unit output 404.
The thread controller 506 provides the control functions of the execution unit 420a, including managing each thread and making decisions such as determining how threads are executed. The execution unit data path 512 includes the decoding system 200, explained further below, and generally comprises functionality for performing various calculations, including logic such as floating-point and integer arithmetic logic units (ALUs) and shift logic.
The data output controller 520 moves finished data to certain components coupled to the execution unit output 404, such as the vertex cache of the execution unit set control unit 206 and the write-back unit 308. The execution unit data path 512 sends "end of task" information to the data output controller 520 to notify it that a task is complete. The data output controller 520 includes storage for completed tasks and a plurality of write ports. It selects tasks from this storage, reads all output data items from the common register file 510 at the register locations specified by the shader rendering context, and sends the data to the execution unit output 404.
The thread task interface 524 sends the identifiers of tasks completed in the execution unit 420a to the execution unit set control unit 206. The task identifiers notify the execution unit set control unit 206 so that it can assign a new task to an execution unit (for example, execution unit 420a).
In one embodiment, the buffer 508 is divided into 16 blocks, each with 16 slots, and each slot holds a 128-bit horizontal vector constant. A shader accesses a constant buffer slot using an operand and an index. The index may be, for example, a temporary register holding a 32-bit unsigned integer or an immediate 32-bit unsigned integer constant.
The instruction cache controller 504 is the interface block to the thread controller 506. When a thread controller read request is present (for example, to fetch executable shader code from instruction memory), the instruction cache controller 504 performs a hit/miss test by looking up a tag table (not shown). A hit occurs when the requested instruction is in the cache of the instruction cache controller 504; a miss occurs when the requested instruction must be fetched from the L2 cache memory 408 or from memory. On a hit, the instruction cache controller 504 grants the request if there is no request from the execution unit input 402, because the instruction cache has only a single read/write port and the execution unit input 402 has higher priority. On a miss, the instruction cache controller 504 grants the request when there is a replaceable block in the cache memory and free space in the execution unit data path FIFO 514. In one embodiment, the cache of the instruction cache controller 504 is organized into sets of four blocks each. Each block carries a 2-bit status signal that indicates one of three states: invalid, loading, or valid. A block is "invalid" before L2 data is loaded, becomes "loading" while waiting for L2 data, and becomes "valid" once the L2 data has been loaded.
The predicate register file 516 is read from and written to by the execution unit data path 512. The execution unit input 402 serves as the interface for data entering the execution unit 420a. In one embodiment, the execution unit input 402 includes a first-in first-out buffer to buffer incoming data. The execution unit input 402 can also pass data to the instruction cache of the instruction cache controller 504 and to the constant buffer 508, and it maintains the shader contexts.
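The instruction cache behavior described above (the grant conditions on a hit and on a miss, and the three block states) can be sketched as follows. This is only an illustration: the function names, the event names, and the numeric values given to the 2-bit states are assumptions and do not come from the patent.

```python
from enum import Enum


class BlockState(Enum):
    INVALID = 0   # before L2 data is loaded
    LOADING = 1   # waiting for L2 data
    VALID = 2     # L2 data has been loaded


def grant_request(hit: bool, eu_input_request: bool,
                  replaceable_block: bool, fifo_has_space: bool) -> bool:
    """Decide whether a thread controller read request is granted."""
    if hit:
        # Single read/write port: the execution unit input has priority.
        return not eu_input_request
    # Miss: a block must be replaceable and the FIFO must have room.
    return replaceable_block and fifo_has_space


def next_state(state: BlockState, event: str) -> BlockState:
    """Advance a block; `event` is 'miss_granted' or 'l2_data_arrived' (assumed names)."""
    if state is BlockState.INVALID and event == "miss_granted":
        return BlockState.LOADING
    if state is BlockState.LOADING and event == "l2_data_arrived":
        return BlockState.VALID
    return state
```

Keeping the grant decision separate from the per-block state makes it easy to see that a hit can still be refused when the execution unit input is using the single port.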
The execution unit output 404 serves as the interface for sending data from the execution unit 420a to the execution unit set control unit 206, the L2 cache memory 408, and the write-back unit 308. In one embodiment, the execution unit output 404 includes a 4-entry FIFO buffer that receives arbitrated requests and buffers data for the execution unit set control unit 206. The execution unit output 404 provides several functions, including arbitration among instruction cache read requests, data output write requests, and execution unit data path read/write requests.
The common register file 510 stores input, output, and temporary data. In one embodiment, the common register file 510 comprises eight banks of 128x128-bit register files, each with one-read/one-write (1R1W) ports and one read/write (1RW) port. The 1R1W ports are used by the execution unit data path 512 for read and write accesses initiated by instruction execution. Banks 0, 2, 4, and 6 are shared among even-numbered threads, and banks 1, 3, 5, and 7 are shared among odd-numbered threads. The thread controller 506 pairs instructions from different threads and ensures that there are no read or write bank conflicts in the common register file memory.
The 1RW port is used by the execution unit input 402 and the data output controller 520 to load initial thread input data and to write final thread output to the data buffers of the execution unit set control unit, the L2 cache memory 408, or other modules. The execution unit input 402 and the execution unit output 404 share this read/write I/O port, and in one embodiment writes have higher priority than reads. The 512 bits of input data go to four different banks to avoid conflicts when data is loaded into the common register file 510. A 2-bit channel index is passed along with the data and a 512-bit-aligned base address to specify the starting bank of the input data. For example, if the starting channel index is 1 and the thread-based bank offset is assumed to be 0, the first 128 bits counting from the least significant bit (LSB) are loaded into bank 1, the next 128 bits into bank 2, and so on, with the last 128 bits loaded into bank 0. Note that the two least significant bits of the thread ID are used to generate the bank offset, which randomizes the starting bank location of each thread.
The common register file register index together with the thread ID can be used to construct a unique logical address for tag matching when data is written to and read from the common register file 510. For example, addresses may be aligned to 128 bits, the width of a common register file bank. Combining the 8-bit common register file register index with the 5-bit thread ID yields a unique 13-bit address. Each 1024-bit line has a tag, and each line holds two 512-bit entries (words). Each word is stored across four banks, and the two least significant bits of the common register file index are added to the bank offset of the current thread to form the bank select.
The tag matching scheme lets the registers of different threads share the common register file 510 and so make efficient use of the memory, because the execution unit set control unit 206 tracks the memory usage of the common register file 510 and ensures that enough space is available before a new task is scheduled on the execution unit 420a.
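The bank striping and address construction described above can be worked through in a short Python sketch. Several details are assumptions made for illustration: that the bank offset is simply the two least significant bits of the thread ID, that bank numbers wrap modulo four, and that the thread ID occupies the upper five bits of the 13-bit address. All function names are hypothetical.

```python
NUM_BANKS = 4  # a 512-bit entry spans four 128-bit banks


def bank_offset(thread_id: int) -> int:
    """Bank offset taken from the two LSBs of the thread ID (assumed to be used directly)."""
    return thread_id & 0b11


def input_banks(channel_index: int, thread_id: int = 0) -> list:
    """Banks receiving the four 128-bit pieces of a 512-bit input, LSB piece first."""
    start = channel_index + bank_offset(thread_id)
    return [(start + i) % NUM_BANKS for i in range(NUM_BANKS)]


def logical_address(crf_index: int, thread_id: int) -> int:
    """13-bit address from a 5-bit thread ID and an 8-bit register index (bit order assumed)."""
    assert 0 <= crf_index < 256 and 0 <= thread_id < 32
    return (thread_id << 8) | crf_index


def bank_select(crf_index: int, thread_id: int) -> int:
    """Two LSBs of the register index plus the thread's bank offset (wrap-around assumed)."""
    return ((crf_index & 0b11) + bank_offset(thread_id)) % NUM_BANKS


print(input_banks(1))  # [1, 2, 3, 0], matching the example in the text
```

The largest address this produces is `logical_address(255, 31)`, which equals `0x1FFF` and so fits in 13 bits.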
The destination common register file index is checked against the total size of the common register file registers for the current thread. Input data is expected to be present in the common register file 510 before the thread controller 506 starts the thread and shader execution begins. After thread execution ends, the data output controller 520 reads the output data from the common register file 510.
Having described an embodiment of the execution unit 420a that includes an execution unit data path 512 with the embedded decoding system 200, Figure 5B shows an embodiment of the execution unit data path 512. The execution unit data path 512 includes a register file 526, a multiplexer 528, a vector floating-point unit 532, a vector integer arithmetic logic unit 534, a special-purpose unit 536, a multiplexer 538, a register file 540, and the decoding system 200. The decoding system 200 includes one or more variable length decoding (VLD) units 530 and can therefore decode one or more streams. For example, a single variable length decoding unit 530 can decode a single stream, and two variable length decoding units 530 (one shown in dashed lines, with its connections omitted for brevity) can decode two streams simultaneously. For purposes of illustration, the following description addresses only the operation of a decoding system 200 with a single variable length decoding unit 530, with the understanding that the same principles extend to more than one variable length decoding unit.
As shown, the execution unit data path 512 contains several parallel data paths corresponding to the variable length decoding unit 530, the vector floating-point unit 532, the vector integer arithmetic logic unit 534, and the special-purpose unit 536, each of which performs the corresponding operation according to the received instruction. The register file 526 receives the operands (denoted SRC1 and SRC2). In one embodiment, the register file 526 may correspond to the common register file 510, the predicate register file 516, and/or the scalar register file 518 shown in Figure 5A. Note that additional operands may be used in some embodiments. The operation (function) signal line 542 provides the medium over which each of the units 530-536 receives operation signals. The immediate signal line 544, coupled to the multiplexer 528, carries an immediate value encoded in the instruction for use by the units 530-536 in integer operations on small integer values. An instruction decoder (not shown) supplies the operands, the operation (function) signals, and the immediate signals. The multiplexer 538 at the end of the data paths (which may include a write-back stage) selects the result of the data path that was selected and provides it to the register file 540. The output register file 540 includes the destination, which may be the same component as the register file 526 or a different register. Note that in embodiments in which the source and destination registers are the same component, the instruction provides bits with source and destination selects that the multiplexers use to route data to or from the appropriate register file.
The execution unit 420a can therefore be viewed as a multi-stage pipeline (for example, a four-stage pipeline with four arithmetic logic units), and decoding operations take place within the four execution stages. Stalls are implemented as needed to allow decoding threads to execute. For example, stalls may be inserted in the execution stages when the bitstream buffer underflows, while waiting for the context memory to be initialized, while waiting for the bitstream to be loaded into the FIFO buffer and the SREG register (explained below), and/or when the processing time has exceeded a predetermined threshold.
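The parallel data paths and the result multiplexer described above can be modelled loosely in software. The sketch below is an illustration under stated assumptions, not the patent's implementation: the operation names are hypothetical, each unit is reduced to a function of two 32-bit operands, and the input multiplexer is assumed to substitute the immediate value for the second operand.

```python
MASK32 = 0xFFFFFFFF

# Hypothetical data paths, each modelled as a function of two 32-bit operands.
DATA_PATHS = {
    "int_add": lambda a, b: (a + b) & MASK32,
    "int_shift_left": lambda a, b: (a << (b & 31)) & MASK32,
    "int_and": lambda a, b: a & b,
}


def execute(op, src1, src2, immediate=None):
    """Run one instruction through the modelled data path and return the selected result."""
    # Input multiplexer: use the immediate in place of SRC2 when one is supplied
    # (an assumed role for the input multiplexer).
    operand2 = src2 if immediate is None else immediate
    # All data paths see the operands in parallel.
    results = {name: unit(src1, operand2) for name, unit in DATA_PATHS.items()}
    # Output multiplexer: the operation signal picks which result is written back.
    return results[op]
```

Computing every result and then selecting one mirrors the hardware arrangement, in which the output multiplexer forwards only the result of the selected data path.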
As noted above, in some embodiments the decoding system 200 also decodes two bitstreams simultaneously using a single execution unit 420a. For example, according to an extended instruction set, the decoding system 200 may use two data paths (for example, by adding another variable length decoding unit 530) to decode two streams at the same time, although more or fewer streams may be decoded at once (and more or fewer data paths used accordingly). When multiple streams are involved, some embodiments of the decoding system 200 are not limited to simultaneous decoding. Furthermore, in some embodiments a single variable length decoding unit 530 may decode multiple streams simultaneously.
In embodiments in which the decoding system 200 uses two data paths, two threads can run at the same time. For example, in a two-stream decoding embodiment, the number of threads is limited to two: a first thread (for example, thread 0) is assigned to the first bank of the decoding system 200 (that is, the variable length decoding unit 530), and a second thread (for example, thread 1) is assigned to the second bank of the decoding system 200 (for example, the variable length decoding unit shown in dashed lines in Figure 5B). In some embodiments, two or more threads may run on a single bank. In some embodiments, although the decoding system 200 is shown embedded in the execution unit data path 512, it may also include other components, such as logic in the execution unit set control unit 206. In the description that follows, the terms variable length decoding unit 530 and decoding system 200 are used interchangeably, with the understanding that the decoding system 200 may include one or more variable length decoding units 530.
Having described the architecture underlying the decoding system 200, the individual decoding system modes are described next. In particular, in one embodiment the different modes are set by the following instructions issued by the driver software 128, each explained further below: INIT_CTX (places the decoding system 200 in CABAC processing mode), INIT_CAVLC (places the decoding system 200 in CAVLC processing mode), INIT_MPEG2 (places the decoding system 200 in MPEG-2 processing mode), and INIT_VC1 (places the decoding system 200 in VC-1/WMV9 processing mode). In some embodiments, additional initialization is provided by the instruction INIT_AVS, which initializes audio video standard (AVS) bitstream decoding. Exp-Golomb coded symbols are used within CABAC and CAVLC coding, so the instructions INIT_CTX and INIT_CAVLC also load the bitstream for Exp-Golomb decoding, and no separate initialization is required for Exp-Golomb. For example, for a symbol to be decoded, an arithmetic coding flag received in the bitstream (for example, a bit set at the slice header level) indicates whether the symbol is Exp-Golomb coded, CABAC coded, or CAVLC coded. When Exp-Golomb coding is used, the appropriate Exp-Golomb instruction, set forth below, is executed. Although these modes affect how the decoding engine is implemented, they also affect how the memories are initialized, used, and updated, as described further below.
Referring to Figure 5C, a functional block diagram of the variable length decoding unit 530 is shown; the unit performs one of a plurality of decoding operations according to the selected mode. The variable length decoding unit 530 includes variable length decoding logic 550 coupled to bitstream buffer management formed by an SREG stream buffer/DMA engine 562 (also referred to herein as the DMA engine module) and a neighborhood context memory (NCM) 564 (also referred to as the context memory).
Clienfs Docket No.: S3U06-0013-TW TT's Docket No:0608-A41246twf.doc/NikeyChen 29 丄 j:)4z:)y 記憶體)。可變長度解码罩 甘—k m、 疋530亦包括一或多個暫存器 566 ’其包括用以儲存來自勃叩 于木目執仃早元420 (「CONTR〇l 例如使用來自執行單元之醢 σ 之解碼态的控制信號以選擇可變長 度解碼邏輯電路550的模组彳t + ^ 00 犋.'且)有關給定模式之選擇的解碼 負料之暫存器、運算亓「也,上「。 ^ (例如,SRC1」以及厂SRC2」), 以及轉發暫存器(例如「F1 丄1U及1 F2」)。SREG串流 緩衝器/DMA弓丨擎562包括卯站女。。 , 匕枯bREG暫存态562a以及位元流Clienfs Docket No.: S3U06-0013-TW TT's Docket No: 0608-A41246twf.doc/NikeyChen 29 丄 j:) 4z:) y memory). The variable length decoding cover-km, 疋 530 also includes one or more registers 566' which are included for storage from the burgeoning 木 木 仃 ( ( ( ("CONTR〇l for example using 来自 σ from the execution unit The control signal of the decoded state is selected to select the module 彳t + ^ 00 犋.' of the variable length decoding logic circuit 550 and the register of the decoding material for the selection of the given mode, and the operation "also, upper" ^ (for example, SRC1" and factory SRC2"), and forwarding registers (such as "F1 丄1U and 1 F2"). The SREG Stream Buffer/DMA Bow Engine 562 includes a station female. . , dry bREG temporary state 562a and bit stream
buffer 562b, which are explained further below.

In one embodiment, VLD logic 550 comprises the modules (also referred to as logic) shown in FIG. 5C. VLD logic 550 comprises hardware, including registers and/or Boolean or arithmetic logic, for executing instructions and decoding according to the selected mode. More specifically, VLD logic 550 includes a read neighborhood context memory (READ_NCM) module 568, an inspect string (INPSTR) module 570, a read (READ) module 572, a count leading ones (CLO) module 574, a count
leading zeros (CLZ) module 576, an MPEG module 578, a CABAC module 580, a CAVLC module 582, and an Exp-Golomb module 584 coupled to CLZ module 576. CLZ module 576 and CLO module 574 comprise instructions that support decoding of MPEG-2 and VC-1 bitstreams. As for Exp-Golomb module 584, an Exp-Golomb symbol is coded as a number of leading zeros followed by a 1, followed by a number of bits equal to the number of zeros. CLZ module 576 detects the number of leading zeros, then shifts out those bits plus one more (the 1) while recording the number of leading zeros.
Exp-Golomb module 584 reads that number of trailing bits and performs a calculation according to the Exp-Golomb mode to determine the value.

READ_NCM module 568 comprises logic that generates an address and requests a memory read operation. In a memory read operation, a fixed number of bits is read from neighborhood context memory 564 and output to a destination register. The READ_NCM instruction reads 32 bits of data from context memory 564 and returns the value read, through multiplexer 586, to a destination register of execution unit 420a. CABAC and CAVLC decoding do not use the READ_NCM instruction. For other variable length decoding operations (e.g., H, MPEG-4 ASP (DivX)), however, context memory 564 can hold the variable length decoding tables, and READ_NCM module 568 can read values from those tables.

READ module 572 comprises logic that reads SREG register 562a, extracts a specified number of bits from the most significant bit (MSB) portion of SREG register 562a, zero-extends the value, and places it in a register.
READ module 572 thus performs a read operation that reads a specified number of bits, removes them from SREG register 562a, and returns the value as an unsigned number to a destination register. INPSTR module 570 reads a fixed number of bits from SREG register 562a without removing any bits from it (i.e., the pointer position does not change) and returns the value as an unsigned number to a destination register.

Each of modules 568-584 is coupled to multiplexer 586, which selects a mode according to the respective command. In one embodiment, the output of multiplexer 586 is provided to a destination register for further processing. The module outputs are also provided to multiplexer 586, which, in response to a command, selects one of the outputs of modules
568-582 and provides it to SREG register 562a as input. During the respective operations, data from the forwarding, control, and operand registers 566 is made available to CABAC module 580 and CAVLC module 582. Exp-Golomb module 584 is enabled by receipt of a control signal (labeled EXP_GOLOMB_OP in FIG. 5C).
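The READ, INPSTR, CLZ, and Exp-Golomb behavior described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the class `BitReader`, its method names, and the function `exp_golomb_ue` are hypothetical and do not come from the patent, the hardware SREG register is approximated by a bit cursor over a byte string, and only the unsigned Exp-Golomb mapping is shown.

```python
class BitReader:
    """Toy stand-in for the SREG register: an MSB-first bit cursor (hypothetical)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # number of bits consumed so far

    def inpstr(self, n: int) -> int:
        """Peek at the next n bits without consuming them (INPSTR-like)."""
        value = 0
        for i in range(self.pos, self.pos + n):
            byte = self.data[i // 8] if i // 8 < len(self.data) else 0
            value = (value << 1) | ((byte >> (7 - i % 8)) & 1)
        return value

    def read(self, n: int) -> int:
        """Return the next n bits as an unsigned value and consume them (READ-like)."""
        value = self.inpstr(n)
        self.pos += n
        return value

    def clz(self) -> int:
        """Count leading zeros, consuming the zeros plus the terminating 1 (CLZ-like)."""
        zeros = 0
        while self.read(1) == 0:
            if self.pos >= 8 * len(self.data):
                raise ValueError("ran out of bits while counting leading zeros")
            zeros += 1
        return zeros


def exp_golomb_ue(reader: BitReader) -> int:
    """Decode one unsigned Exp-Golomb code: k zeros, a 1, then k trailing bits."""
    k = reader.clz()
    return (1 << k) - 1 + reader.read(k)
```

For instance, the bit string 1 010 011 00100 (bytes 0xA6 0x40) holds four consecutive codes. Splitting the work between `clz` and `exp_golomb_ue` mirrors the coupling of CLZ module 576 to Exp-Golomb module 584 in the text.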
Exp-Golomb module 584 receives input from CLZ module 576 and provides output to multiplexer 586. CABAC module 580 and CAVLC module 582 can use context memory 564.

For all modes other than the CABAC and CAVLC modes, the READ instruction reads n bits from SREG register 562a and returns the value read, through multiplexer 586, to a destination register of execution unit 420a. For the CABAC and CAVLC modes, context memory 564 holds the top and left context values, which are read automatically as part of the decoding process. These and other components of VLD unit 530 are described further below in connection with the different modes. Note that in some embodiments VLD logic 550 may include fewer (or more) than all of the modules and/or multiplexers shown.

Having described the general functionality of VLD unit 530, the operation of VLD unit 530 as configured in the different modes is described next.

CABAC Decoding

CABAC decoding is briefly explained below, followed by a description of certain embodiments of the decoding system. In general, the CABAC decoding process of the H.264 standard can
be described as comprising parsing of the coded bitstream for a first syntax element, initialization of the context variables and the decoding engine for the first syntax element of a slice, and binarization. Then, for each bin, the process comprises obtaining a context model and decoding the bins of the respective syntax element until a match to a meaningful codeword is obtained. More specifically, decoding system 200 decodes syntax elements, where each syntax element may represent quantized coefficients, motion vectors, and/or prediction modes, among other parameters pertaining to a macroblock used to represent a particular field or frame of an image or video.
Each syntax element may comprise a series of one or more binary symbols (bins), and each binary symbol is decoded as a 0 or a 1. Decoding system 200 controls the output bit length according to the probability of occurrence of the input binary symbols.

A CABAC encoder provides a highly efficient coding scheme when certain symbols (referred to as dominant symbols) are more likely to occur than others. These dominant symbols can be coded with a smaller bits-per-symbol ratio. The encoder continually updates the frequency statistics of the incoming data and adaptively adjusts the arithmetic and context models of the coding algorithm. The binary symbol with the higher probability is called the most probable symbol (MPS), and the other is the least probable symbol (LPS). A binary symbol is associated with a context model, and each context model corresponds to an LPS probability and an MPS value.

To decode each binary symbol, decoding system 200 determines or receives a corresponding range, offset, and context model. The context model is selected
from a plurality of possible context models according to the type of symbol and the context determined from the neighboring space (e.g., the current macroblock or previously decoded neighboring macroblocks). A context identifier can be determined from the context model and used to obtain the MPS value and the current state of the decoding engine for use in the decoding process. The range represents an interval that narrows each time
a bin is decoded.

The interval is divided into two sub-ranges, corresponding to the MPS value and the LPS value respectively. The LPS sub-range is calculated by multiplying the range by the LPS probability specified by the given context model. The MPS sub-range is calculated by subtracting the LPS sub-range from the range. The offset is the criterion for deciding the decoded bin and is typically initialized with the first 9 bits of the coded bitstream. For a given bin decode and context model, if the offset is less than the MPS sub-range, the bin is the MPS value and the range used for the next decode is set to the MPS sub-range. Otherwise, the bin is the LPS value (the inverse of the MPS value held in the associated context model), and the next range is set to the LPS sub-range. The result of the decoding process is a sequence of decoded bins, which is evaluated to determine whether the sequence matches a meaningful codeword.

The above outlines how the operation of decoding system 200 relates to CABAC decoding.
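The sub-range arithmetic just described can be sketched in a few lines. The sketch is a simplification under stated assumptions rather than the patented implementation: `ContextModel` and `decode_bin` are hypothetical names, the LPS probability is held as a plain fraction, and the table lookup, renormalization, and context-state update used by the H.264 standard are omitted. Subtracting the MPS sub-range from the offset on the LPS path follows common arithmetic decoding practice and is not stated in the text above.

```python
from dataclasses import dataclass


@dataclass
class ContextModel:
    p_lps: float  # probability of the least probable symbol (assumed 0 < p_lps <= 0.5)
    val_mps: int  # value (0 or 1) of the most probable symbol


def decode_bin(rng: int, offset: int, ctx: ContextModel):
    """Decode one bin and return (bin, next_range, next_offset)."""
    lps_range = max(1, int(rng * ctx.p_lps))  # LPS sub-range = range x LPS probability
    mps_range = rng - lps_range               # MPS sub-range = range - LPS sub-range
    if offset < mps_range:
        # Offset lies in the MPS sub-range: the bin is the MPS value.
        return ctx.val_mps, mps_range, offset
    # Otherwise the bin is the inverse of the MPS value; rebasing the offset
    # into the LPS sub-range is an assumption taken from standard practice.
    return 1 - ctx.val_mps, lps_range, offset - mps_range
```

With a range of 512 and an LPS probability of 0.25, the LPS sub-range is 128 and the MPS sub-range is 384, so an offset of 100 yields the MPS value and an offset of 400 yields the LPS value.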
The following describes the various components of decoding system 200 in the context of the CABAC decoding process, with the understanding that variations suited to the practical application are contemplated. Those skilled in the art will recognize that many of the terms used below come from the H.264 specification and, for brevity, are not explained further unless doing so helps in understanding the
various processes and/or components described.

FIGS. 6A through 6F are block diagrams showing embodiments of the decoding system and related components. As shown, decoding system 200 has a single CABAC unit 530a (in FIGS. 6A through 6F, the CABAC unit is used interchangeably with decoding system 200), so this embodiment decodes a single bitstream. The same principles apply to a decoding system 200 with more than one CABAC unit, which can decode multiple (e.g., two) bitstreams simultaneously. Briefly, FIG. 6A is a block diagram showing selected components of the decoding system, and FIG. 6B is a functional block diagram showing the components of FIG. 6A together with other components. FIGS. 6C and 6E are block diagrams showing the context memory functionality of the decoding system, and FIG. 6D is a block diagram showing an exemplary mechanism used in decoding a macroblock. Although the description below is given in the context of macroblock decoding, the principles set forth herein can be applied to the decoding of blocks of various kinds.

Referring to FIG. 6A, VLD unit 530a includes CABAC logic module 580 and memory module 650.
In one embodiment, CABAC logic module 580 comprises three modules: a binarization (BIND) module 620, a get context (GCTX) module 622, and a binary arithmetic decoding (BARD) engine 624. BARD engine 624 further comprises a state index (pStateIdx) register 602, an MPS value (valMPS) register 604, a code length range (codIRange) register 606, and a code length offset (codIOffset) register 608. VLD unit 530a further includes memory module 650, which comprises context memory 564 (also referred to as macroblock neighborhood context (mbNeighCtx) memory or a context memory
array), local registers 612, global registers 614, and SREG stream buffer/DMA engine 562 (also referred to as the DMA engine module, described further in connection with FIG. 6C), along with other registers not shown. In one embodiment, context memory 564 comprises an array structure as shown in FIG. 6C, described further below. Memory module 650 also includes a bin string
(binstring) register 616.

The interface between VLD unit 530a and execution unit 420a includes a destination (DST) bus 628, two source buses, SRC1 632 and SRC2 630, a shared and thread information bus 634, and a stall/reset bus 636. Data on destination bus 628 may be sent directly or indirectly (e.g., through an intermediate cache, register, buffer, or memory) to a video processing unit internal or external to graphics processing unit 114. Data on destination bus 628 may be in one of a plurality of formats, including Microsoft's DX API format, among others. Such data may include coefficients, macroblock parameters, motion information, and/or IPCM samples, among other data. VLD unit 530a also includes a memory interface having an address bus 638 and a data bus 640. Using an address supplied on address bus 638, the memory interface accesses bitstream data, which is received on data bus 640. In one embodiment, the data on data bus 640 may comprise an unencoded video stream that includes various signaled parameters, among other data and formats. In some embodiments, load-store operations can be used to access the bitstream data.

Before the various components of VLD unit 530a are described, the overall operation of execution unit 420a with respect to CABAC decoding is briefly explained.
In general, depending on the slice type, driver software 128 (FIG. 1) prepares a CABAC shader and loads it into execution unit 420a. The CABAC shader uses the standard instruction set plus the binarization (BIND), get context (GCTX), and binary arithmetic decoding (BARD) instructions to decode the bitstream. Because the context tables used by VLD unit 530a can change with the slice type, they are loaded for each slice. In one embodiment, the first instructions executed by the CABAC shader, before any other instructions are issued, are the INIT_CTX and INIT_ADE instructions. These two instructions cause CABAC unit 530a to begin decoding a CABAC bitstream and load the bitstream into a first-in, first-out (FIFO) buffer, from which point stream decoding is managed automatically. Both instructions are described later.

Regarding bitstream parsing, the bitstream is received on the data bus of the memory interface and then buffered by SREG stream buffer/DMA engine 562. Bitstream decoding is provided from the slice data parsing stage.
That is, a bitstream (e.g., a NAL bitstream) comprises one or more pictures, each divided into a picture header and a number of slices. A slice generally corresponds to a series of consecutive macroblocks. In one embodiment, an external process (i.e., external to VLD unit 530a) parses the NAL bitstream, decodes the slice header, and passes a pointer to the location of the slice data (e.g., where the slice begins). The hardware (plus software) is capable of parsing an H.264 bitstream from the picture level. In one embodiment, however, CABAC coding is present only at the slice data and macroblock levels. Generally, driver software 128 processes the bitstream from the slice data level, because that is the functionality that applications and the API provide. The pointer to the slice data location comprises the address of the first byte of the slice data (e.g., RBSPbyteAddress) and a bit offset indicator (e.g., one or more bits) that indicates the start or header position of the bitstream (e.g.,
sREGptr), as explained further below. In some embodiments, a host processor (e.g., central processing unit 126 of FIG. 1) can carry out the external process to provide picture-level decoding and slice header decoding. In some embodiments, owing to the programmable nature of decoding system 200, decoding can be performed at any level.
Referring to FIGS. 5C and 6A, SREG stream buffer/DMA engine 562 receives the SRC1 and SRC2 values on buses 632 and 630, respectively, as well as data corresponding to the forwarding registers and control registers. SREG stream buffer/DMA engine 562 comprises internal bitstream buffer 562b, which in one embodiment may comprise a 32-bit register in big-endian format and eight 128-bit (8 x 128) registers. SREG stream buffer/DMA engine 562 is initially set up when the driver software issues an initialization instruction as described above. Once initialized, internal buffer 562b of SREG stream buffer/DMA engine 562 is managed automatically. SREG stream buffer/DMA engine 562 keeps track of the bit position being parsed. In one embodiment, SREG stream buffer/DMA engine 562 uses two storage elements: a fast 32-bit flip-flop register and a slower 512-bit or 1024-bit memory. The bitstream is consumed in bits. SREG register 562a operates on bits, whereas bitstream buffer 562b operates on bytes, which saves power. In general, instructions operate on SREG register 562a and consume a few bits (e.g., 1-3 bits). When more than one byte of data has been consumed from SREG register 562a, data (in byte chunks) is transferred from bitstream buffer 562b to SREG register 562a, and the buffer pointer is decremented by
the number of bytes transferred. When the DMA of SREG stream buffer/DMA engine 562 detects that 256 bits or more have been consumed, it fetches 256 bits from memory to refill bitstream buffer 562b. VLD unit 530a thus implements a simple circular buffer (256-bit chunks x 4) to track bitstream buffer 562b and provide refills. In some embodiments a single buffer can be used, although a circular buffer then requires more complex pointer arithmetic to keep up with the speed of the memory.

Interaction with internal buffer 562b is accomplished through an initialization instruction, referred to as the INIT_BSTR instruction. In one embodiment, the INIT_BSTR instruction, like the other instructions described below, is issued by driver software 128. Given the byte address and bit offset of the bitstream location, the INIT_BSTR instruction loads data into internal bitstream buffer 562b and starts the management process.
For each call to process slice data, an instruction of the following format is issued:

    INIT_BSTR offset, RBSPbyteAddress

The INIT_BSTR instruction is issued to load data into internal buffer 562b of SREG stream buffer/DMA engine 562. The SRC2 register supplies the byte address (RBSPbyteAddress), and the SRC1 register supplies the bit offset. The general instruction format is therefore:

    INIT_BSTR SRC2, SRC1

where SRC1 and SRC2, in this and other instructions, correspond to values in internal registers 566, although they are not limited to those registers. In one embodiment, 256-bit aligned memory fetches are used to access the bitstream data,
which is written into the buffer registers and transferred to 32-bit SREG register 562a of SREG stream buffer/DMA engine 562. In one embodiment, the data in bitstream buffer 562b is byte-aligned before any other operation on these registers or buffers begins. The data can be aligned with an alignment instruction, referred to as the ABST instruction. The ABST instruction aligns the data in bitstream buffer 562b, and the alignment bits (e.g., stuffing bits) are ultimately discarded during the decoding process.
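As a rough illustration of the addressing implied by INIT_BSTR and ABST, the sketch below computes the 256-bit aligned fetch address for a given byte address and bit offset, and the effect of byte alignment on a bit position. The function names `init_bstr` and `abst` and their return conventions are hypothetical; the text does not specify how the hardware performs these steps, so this is only one plausible reading.

```python
FETCH_BITS = 256               # fetch granularity stated in the text
FETCH_BYTES = FETCH_BITS // 8  # 32 bytes per aligned fetch


def init_bstr(rbsp_byte_address: int, bit_offset: int):
    """Return (aligned fetch address, bits to skip inside the first fetch)."""
    fetch_address = rbsp_byte_address - (rbsp_byte_address % FETCH_BYTES)
    skip_bits = (rbsp_byte_address - fetch_address) * 8 + bit_offset
    return fetch_address, skip_bits


def abst(bit_position: int) -> int:
    """Advance a bit position to the next byte boundary (assumed ABST effect)."""
    return (bit_position + 7) // 8 * 8
```

For example, a slice that starts at byte address 0x1005 with a bit offset of 3 would be fetched from the aligned address 0x1000, with the first 43 bits of that fetch skipped. Aligning bit position 43 moves it to 48, which models the stuffing bits being discarded.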
As SREG register 562a consumes data, internal buffer 562b is refilled. In other words, internal buffer 562b of SREG stream buffer/DMA engine 562 acts as a modulo-3 circular buffer feeding 32-bit register 562a of SREG stream buffer/DMA engine 562. CABAC module 580, together with READ module 572, can use the READ instruction to read data from SREG register 562a. For example, in the H.264 specification some symbols are fixed-length coded, and their values are obtained by executing a READ instruction for the specified number of bits, with the result zero-extended to the size of the register. The READ instruction has the following format:

    READ DST, SRC1

where DST corresponds to the output or destination register. In one embodiment, the SRC1 register contains an unsigned integer value n. The READ instruction reads n bits from SREG register 562a. When 256 bits of data have been consumed from 32-bit register 562a (e.g., in decoding one or more syntax elements), a fetch is started automatically to obtain another 256 bits of data, which is written into the registers of internal buffer 562b and then passed into SREG register
562a for use.

In some embodiments, if the data in SREG register 562a corresponding to a symbol decode has been consumed to a predetermined number of bits or bytes and internal buffer 562b has not received any further data, CABAC module 580 can assert a stall via stall/reset bus 636 so that other threads (e.g., threads unrelated to the CABAC decoding process), such as vertex shader operations, can execute.

Using the DMA engine of SREG stream buffer/DMA engine 562 reduces the total buffering needed to compensate for memory latency (e.g., more than three hundred cycles in some graphics processing units). As the bitstream is consumed, a request can be made to stream in additional bitstream data. If the bitstream data runs too low and bitstream buffer 562b is at risk of underflow (e.g., given the number of cycles a signal needs to travel from VLD unit 530a to the processor pipeline), a stall signal can be passed to the processor pipeline to pause operation until the awaited data arrives at bitstream buffer 562b.

In addition, SREG stream buffer/DMA engine 562 inherently has the ability to handle corrupted bitstreams.
For example, because of a bitstream error, an end-of-slice marker may go undetected. Such a failure can lead to a completely wrong decode and to the consumption of bits belonging to subsequent pictures or slices. SREG stream buffer/DMA engine 562 keeps a count of the number of bits consumed. When the number of bits consumed exceeds a predefined threshold (which can be changed for each slice), processing is terminated and an exception signal is sent to a processor (e.g., the host processor). The processor then executes code to attempt to recover from the error.

Referring to FIGS. 6A and 6B together, the functionality of variable length
decoding unit 530a is described further, in particular the initialization of the decoding engine (e.g., BARD engine or module 624) and of the context variables. At the start of a slice, and before the syntax elements corresponding to the first macroblock are decoded, the context states and BARD module 624 are initialized. In one embodiment, driver software 128 issues the INIT_CTX and INIT_ADE instructions to perform this initialization.

The INIT_CTX instruction starts the CABAC decoding mode and initializes one or more context tables (e.g., stored remotely or in on-chip memory such as ROM). The INIT_CTX instruction can be executed according to the following instruction format:

    INIT_CTX SRC2, SRC1

For the INIT_CTX instruction, operand SRC1 can hold, by bit position, one or more of the following values: cabac_init_idc, mbPerLine, constrained_intra_pred_flag, NAL_unit_type (NUT), and MbaffFlag. Note that constrained_intra_pred_flag, NAL_unit_type (NUT), and MbaffFlag correspond to known H.264 macroblock parameters.
Further, according to the bit position, the operand SRC2 has the following values: SliceQPY and mbAddrCurr. In an embodiment, it is further explained that the execution of the INIT-CTX instruction (i.e., the initialization of the CABAC table of contents) requires a cabac_init_idc and a sliceQPY (e.g., quantization) parameter. However, to initialize the entire CABAC engine requires three instructions, the INIT_BTSR instruction, the INIT_CTX instruction, and the INIT_ADE instruction, so the available bits in 'SRC1 and SRC2 (for example: all 64-bit or 32-bit each) can pass the other for
The two source registers SRC1 and SRC2 664 can therefore contain the following values:

    SRC1[15:0]  = cabac_init_idc
    SRC1[23:16] = mbPerLine
    SRC1[24]    = constrained_intra_pred_flag
    SRC1[27:25] = NAL_unit_type (NUT)
    SRC1[28]    = MbaffFlag
    SRC1[31:29] = undefined
    SRC2[15:0]  = SliceQPY
    SRC2[31:16] = mbAddrCurr
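The bit layout above can be illustrated with a short Python sketch. It assumes only the field positions listed in this section; the function names `pack_init_ctx_operands` and `field` are hypothetical and are not part of the described instruction set, and the undefined bits SRC1[31:29] are simply left at 0.

```python
def pack_init_ctx_operands(cabac_init_idc, mb_per_line,
                           constrained_intra_pred_flag, nal_unit_type,
                           mbaff_flag, slice_qpy, mb_addr_curr):
    """Pack the INIT_CTX parameters into two 32-bit words (SRC1, SRC2).

    Field positions follow the layout listed in the text; values wider
    than their field are truncated by the masks (an assumption).
    """
    src1 = cabac_init_idc & 0xFFFF                      # SRC1[15:0]
    src1 |= (mb_per_line & 0xFF) << 16                  # SRC1[23:16]
    src1 |= (constrained_intra_pred_flag & 0x1) << 24   # SRC1[24]
    src1 |= (nal_unit_type & 0x7) << 25                 # SRC1[27:25]
    src1 |= (mbaff_flag & 0x1) << 28                    # SRC1[28]
    # SRC1[31:29] is undefined in the text and is left at 0 here.
    src2 = slice_qpy & 0xFFFF                           # SRC2[15:0]
    src2 |= (mb_addr_curr & 0xFFFF) << 16               # SRC2[31:16]
    return src1, src2


def field(word, low_bit, width):
    """Extract `width` bits of `word` starting at bit `low_bit`."""
    return (word >> low_bit) & ((1 << width) - 1)


src1, src2 = pack_init_ctx_operands(
    cabac_init_idc=2, mb_per_line=120, constrained_intra_pred_flag=1,
    nal_unit_type=5, mbaff_flag=0, slice_qpy=26, mb_addr_curr=300)
print(hex(src1), hex(src2))
```

Keeping the layout in one place like this makes it easy to check that no two fields overlap.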
The value of SliceQPY is used to initialize a state machine (not shown) in the bitstream buffer 562b.

Although various known picture and slice parameters have been discussed above, some further parameters relating to the variable length decoding unit 530a are described here. In one embodiment, cabac_init_idc is defined for slices that are not coded as I or switching I (SI) slices. In other words, cabac_init_idc is defined only for P, SP, and B slices, and takes a default value when an I or SI slice is received. For example, when the roughly 460 contexts are initialized for an I or SI slice, cabac_init_idc can be set to 3 (because, under the H.264 specification, cabac_init_idc can otherwise only take the values 0 to 2), which allows the 2-bit field to indicate that the slice is an I or SI slice.

The variable length decoding unit 530a may also use the INIT_CTX instruction to initialize the local register 612 and the array structure or elements of the macroblock neighbor context memory 564, including the registers that hold temporary data for neighboring macroblocks.
Referring to Figure 6C, in one embodiment the macroblock neighbor context memory 564 is shown at the top of the figure. In one embodiment, the macroblock neighbor context memory 564 is arranged as a memory array that stores data for one row of macroblocks. As shown, the macroblock neighbor context memory 564 includes array elements mbNeighCtx[0, 1, ..., i-1, i, i+1, ..., 119] (labeled 601), each of which stores the data of one of the 120 macroblocks in a row (corresponding, for example, to HDTV at 1920x1080 pixels). The mbNeighCtxCurrent register 603 stores the macroblock currently being decoded, and the mbNeighCtxLeft register 605 stores the previously decoded neighboring (left) macroblock. In addition, pointers 607a, 607b, and 607c (shown as arrows in Figure 6C) point to registers 603 and 605 and to an array element 601. While the current macroblock is decoded, the decoded data is stored in the mbNeighCtxCurrent register 603. Given the context-adaptive nature of CABAC decoding, the current macroblock is decoded using information gathered when previous macroblocks were decoded: the left macroblock is held in the mbNeighCtxLeft register 605 and pointed to by pointer 607b, and the top macroblock is held in array element [i] and pointed to by pointer 607c.

Returning to the initialization instructions, the INIT_CTX instruction initializes the top and left pointers 607c and 607b, which refer to the macroblocks adjacent to the current macroblock (for example, elements of the macroblock neighbor context memory 564 array). For example, the left pointer 607b may be set to 0 and the top pointer 607c may be set to 1. In addition, the INIT_CTX instruction updates the global register 614.
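The neighbor storage described above can be pictured with a small Python model. This is only a sketch under stated assumptions: the class name `NeighborContext`, its methods, and the use of dictionaries for per-macroblock data are hypothetical, and the step that moves the current data into the row array and the left register is an assumed simplification of how the hardware advances from one macroblock to the next.

```python
class NeighborContext:
    """Toy model of the macroblock neighbor context storage.

    row     models the array elements mbNeighCtx[0..119] (601)
    left    models the mbNeighCtxLeft register (605)
    current models the mbNeighCtxCurrent register (603)
    """

    def __init__(self, mbs_per_row=120):
        self.row = [None] * mbs_per_row
        self.left = None
        self.current = {}

    def neighbors(self, i):
        """Return (left, top) context for the macroblock in column i."""
        return self.left, self.row[i]

    def finish_macroblock(self, i):
        """Assumed hand-over step once column i has been decoded."""
        self.row[i] = self.current   # becomes the top neighbor in the next row
        self.left = self.current     # becomes the left neighbor of column i+1
        self.current = {}


ctx = NeighborContext()
ctx.current["mb_skip_flag"] = 1
ctx.finish_macroblock(0)
left, top = ctx.neighbors(1)
print(left, top)
```

The model ignores row boundaries, where a real decoder would have no left neighbor for the first macroblock of a row.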
Regarding initialization of the context tables, in response to the INIT_CTX instruction the variable length decoding unit 530a builds one or more context tables, also referred to as CTX_TABLE. In one embodiment, CTX_TABLE may be a 4x460x16-bit table (8 bits for m and 8 bits for n, both signed values) or another data structure. Each entry of the context table holds the pStateIdx and valMPS values accessed through the state index register 602 and the most probable symbol value register 604.

The INIT_ADE instruction initializes the binary arithmetic decoding module 624, also called the decoding engine. In one embodiment, the INIT_ADE instruction is invoked after the INIT_BTSR instruction completes. On executing the INIT_ADE instruction, the variable length decoding unit 530a sets up two registers, the code range (codIRange) register 606 and the code offset (codIOffset) register 608, with the following values:

    codIRange  = 0x01FE
    codIOffset = ZeroExtend(READ(#9), #16)

In one embodiment, these variables can therefore be 9-bit values. For codIOffset, 9 bits are read from the bitstream buffer 562b and zero-extended for storage in the 16-bit code offset register 608. Some embodiments may use other values. The binary arithmetic decoding module 624 uses the values stored in registers 606 and 608 to decide whether to output a 0 or a 1, and these values are updated after each bin is decoded.

In addition to initializing the code range register 606 and the code offset register 608, the INIT_ADE operation also initializes the bin string register 616.
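As a rough illustration of the INIT_ADE step, the following Python sketch sets the range value to 0x01FE and loads the offset from the first 9 bits of a buffer. The `BitReader` class, the `init_ade` function, and the most-significant-bit-first read order are assumptions made for the example; the text itself only specifies READ(#9) and a zero extension to 16 bits.

```python
class BitReader:
    """Minimal MSB-first bit reader over a bytes object (assumed order)."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current position, in bits

    def read(self, n):
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def init_ade(reader):
    """Return (codIRange, codIOffset, bin_string) after initialization."""
    cod_i_range = 0x01FE                     # fixed start value from the text
    cod_i_offset = reader.read(9) & 0xFFFF   # 9 bits, zero-extended to 16
    bin_string = 0                           # bin string register cleared
    return cod_i_range, cod_i_offset, bin_string


reader = BitReader(bytes([0b10110011, 0b10000000]))
state = init_ade(reader)
print(state)
```

Because the offset is read as an unsigned 9-bit quantity, the zero extension amounts to storing it unchanged in a wider register.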
In one embodiment, the bin string register 616 may be a 32-bit register that receives each output bit from the binary arithmetic decoding module 624. Registers of other sizes may be used in some embodiments.

The binary arithmetic decoding module 624 is also initialized when a macroblock is coded as I_PCM data. As is known, I_PCM data consists of pixel data to which, under the H.264 specification, no transform or prediction model is applied to the raw video data. For example, I_PCM can be used for lossless applications.
The architecture and instructions for parsing the bitstream and initializing the various decoding system components have been described above. The following describes binarization, obtaining model information and context, and one or more processes for decoding based on the model and context. In general, the variable length decoding unit 530a parses the bitstream to obtain all possible binarizations of a syntax element (SE), or at least enough of them to obtain the model, through the binarization module 620 and the BIND instruction. The variable length decoding unit 530a further obtains the context for the given syntax element through the get-context module 622 and the GCTX instruction, and performs arithmetic decoding based on the context and model information through the binary arithmetic decoding module 624 and the BARD instruction. In effect, a loop is formed in which the GCTX and BARD instructions are invoked and one bit at a time is output to the bin string register 616, until a meaningful codeword matching the given syntax element is found. In one embodiment, each time a bin is decoded the corresponding decoded bit is supplied to the bin string register 616, and the bin string register is read back to the get-context module 622, until a match is found.

To explain the decoding system architecture in more detail using a single variable length decoding unit 530a, and referring to Figures 6A and 6B together, the binarization module 620 is enabled by the BIND instruction issued by the driver software 128.
In one embodiment, the BIND instruction has the following format:

    BIND DST, #Imm16, SRC1

where DST corresponds to the destination register 652, #Imm16 corresponds to a 16-bit immediate value, and SRC1 corresponds to the input register SRC1. The inputs to the BIND operation are the syntax element (given by the 16-bit immediate value Imm) and the context block category (ctxBlockCat). The syntax element may be of any syntax element type that conforms to the H.264 specification (for example, MBTypeInI, MBSkipFlagB, IntraChromaPredMode, and so on). Invoking the BIND instruction causes the driver software 128 to read the syntax element from a table (or other data structure) stored in memory (for example, on-chip memory or remote memory) and to obtain a syntax element index (SEIdx). The syntax element index is used to access other tables or data structures to obtain the macroblock parameters described below.

In one embodiment, the destination register 652 is a 32-bit register with the following format: bits 0-8 (ctxIdxOffset), bits 16-18 (maxBinIdxCtx), bits 21-23 (ctxBlockCat), bits 24-29 (ctxIdxBlockCatOffset), and bit 31 (bypass flag). These values (for example, ctxIdxOffset, maxBinIdxCtx, and so on) are passed to the get-context module 622 for use in context modeling. In this embodiment, any undefined reserved bits may be 0. Based on the match between the syntax element index and the context block category, ctxIdxBlockOffset can be obtained from a table or other data structure stored in remote or on-chip memory.
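The destination register format just described can be sketched as a field table in Python. The helper names `pack_fields` and `unpack_fields` and the dictionary `BIND_DST_FIELDS` are hypothetical; only the bit positions come from the text, and reserved bits are kept at 0 as the text allows.

```python
# (low bit, width) for each field of the 32-bit BIND destination register
BIND_DST_FIELDS = {
    "ctxIdxOffset": (0, 9),           # bits 0-8
    "maxBinIdxCtx": (16, 3),          # bits 16-18
    "ctxBlockCat": (21, 3),           # bits 21-23
    "ctxIdxBlockCatOffset": (24, 6),  # bits 24-29
    "bypassFlag": (31, 1),            # bit 31
}


def pack_fields(layout, values):
    """Build a register word from named field values; other bits stay 0."""
    word = 0
    for name, (low, width) in layout.items():
        word |= (values.get(name, 0) & ((1 << width) - 1)) << low
    return word


def unpack_fields(layout, word):
    """Split a register word back into its named fields."""
    return {name: (word >> low) & ((1 << width) - 1)
            for name, (low, width) in layout.items()}


values = {"ctxIdxOffset": 277, "maxBinIdxCtx": 2, "ctxBlockCat": 1,
          "ctxIdxBlockCatOffset": 4, "bypassFlag": 0}
dst = pack_fields(BIND_DST_FIELDS, values)
print(hex(dst), unpack_fields(BIND_DST_FIELDS, dst))
```

A table-driven layout like this could be reused for other register formats in this section by supplying a different field dictionary.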
Table 1 illustrates the table contents of one non-limiting embodiment:

    codeNum (k)   coded_block_pattern
                  Intra_4x4   Inter
    0             47          0
    1             31          16
    2             15          1
    3             0           2
    4             23          4
    5             27          8
    6             29          32
    7             30          3
    8             7           5
    9             11          10
    10            13          12
    11            14          15
    12            39          47
    13            43          7
    14            45          11
    15            46          13
    16            16          14
    17            3           6
    18            5           9
    19            10          31
    20            12          35
    21            19          37
    22            21          42
    23            26          44
    24            28          33
    25            35          34
    26            37          36
    27            42          40
    28            44          39
    29            1           43
    30            2           45
    31            4           46
    32            8           17
    33            17          18
    34            18          20
    35            20          24
    36            24          19
    37            6           21
    38            9           26
    39            22          28
    40            25          23
    41            32          27
    42            33          29
    43            34          30
    44            36          22
    45            40          25
    46            38          38
    47            41          41

If an undefined context block category is received, the variable length decoding unit 530a may treat the undefined parameter as 0, so that ctxIdxBlockOffset is taken to have the value 0.

Invoking the BIND instruction also causes a reset signal (Rst_Signal) to be output from the binarization module 620 to the binary arithmetic decoding module 624, as described below.

To illustrate the various inputs and outputs of the binarization module 620, its operation according to at least one embodiment is described here. When the binarization module 620 is invoked, it reads the syntax element, and software supplies the known syntax element index (SEIdx). Using the syntax element index, the binarization module 620 performs a table lookup to obtain the corresponding values of maxBinIdxCtx, ctxIdxOffset, and bypassFlag. These looked-up values are stored temporarily in the predefined bit positions of the destination register 652. In addition, using the syntax element index and the context block category, the binarization module 620 performs a second table lookup (for example, in remote memory or on-chip memory) to obtain the ctxIdxBlockOffset value.
The value from the second lookup is likewise stored temporarily in the destination register 652. The values so determined are thus used to build the destination register 652 as a 32-bit output value.

For some syntax elements, additional information (beyond the syntax element and the context block category) may be used to start the H.264 decoding operation. For example, for macroblock parameters such as sigCoeffFlag and lastSigCoeffFlag, the value stored in array element maxBinIdxCtx[1] of the macroblock neighbor context memory 564 is used together with the input context block category value to determine whether the macroblock is field coded or frame coded. In some embodiments, the same syntax element number is used for these flags even though they are different syntax elements, and they are then distinguished using mb_field_decoding_flag (the mbNeighCtx[1] field).

In addition to the functions of the binarization module 620 described above, note that in Figure 6B the binarization module 620 may operate in conjunction with the binIdx register 654, the multiplexer unit 656, and/or the forwarding registers F1 and F2. As for the binIdx register 654 and the multiplexer unit 656, the multiplexer unit supplies the output SRC1 (for example, the value in register SRC1) to the get-context module 622 according to its various inputs.

Regarding the forwarding register shown as F1, when a BIND (or GCTX) instruction produces a result, the result can be written to a destination (for example, the destination register 652 and/or the forwarding register F1). Forwarding flags in a given instruction indicate whether the instruction and its corresponding module (for example, the get-context module 622 or the binary arithmetic decoding module 624) use the forwarding registers F1 and F2. The notation for the forwarding registers comprises F1 (that is, the forwarded source 1 value is used, which in one embodiment may be indicated by bit 26 of the instruction) and F2 (that is, the forwarded source 2 value is used, which in one embodiment may be indicated by bit 27 of the instruction).
For the get-context module 622 and the binary arithmetic decoding module 624, data can be forwarded to the respective inputs, as described below.

Having described the binarization module 620 and its associated processes, the following describes how the get-context module 622, through the GCTX instruction, obtains the context and the bin index for a given model. Briefly, the inputs to the get-context module 622 include maxBinIdxCtx, binIdx, and ctxIdxOffset, as detailed below. The get-context module 622 uses the ctxIdxOffset and binIdx values to compute the value of ctxIdx (an output that represents the context index). An example format of the GCTX instruction is as follows:

    GCTX DST, SRC2, SRC1

where SRC1 corresponds to the value output by the multiplexer unit 656 and held in register SRC1, SRC2 corresponds to the value output by the destination register 652 and held in register SRC2, and DST corresponds to the destination register. In one embodiment, the registers hold the following values:

SRC1[7:0] = binIdx. When the current syntax element is codedBlockPattern, the value of SRC1 (output by the multiplexer unit 656 and input to the get-context module 622) may be the value of the binIdx register 654.

SRC1[15:8] may be levelListIdx (when computing sigCoeffFlag or lastSigCoeffFlag) or mbPartIdx (when computing Ref_Idx or the binIdx of the coded block pattern).
When the syntax element is sigCoeffFlag or lastSigCoeffFlag, the multiplexer unit 656 can be used to pass levelListIdx.

SRC1[16] may contain the iCbCr flag; when its value is 0, the block is a Cb chroma block. SRC1[16] may alternatively contain the L0/L1 value, which is 0 for L0. Those skilled in the art will understand from this disclosure that L0/L1 refers to the reference picture lists used for motion-compensated prediction (L0 = list0, L1 = list1).

    SRC1[21:20] = mbPartitionMode
    SRC2[8:0]   = ctxIdxOffset
    SRC2[18:16] = maxBinIdxCtx
    SRC2[23:21] = ctxBlockCat
    SRC2[29:24] = ctxIdxBlockOffset
    SRC2[31]    = bypassFlag

Furthermore, DST holds the output of the get-context module 622 and has the following values:

    DST[15:0]  = ctxIdx
    DST[23:16] = binIdx
    DST[27:24] = mbPartIdx
    DST[29:28] = mbPartitionMode
    DST[30]    = L0

The get-context module 622 can also interact with the forwarding registers. When forwarding registers are used, the instruction can take the form GCTX.F1.F2, where F1 and F2 indicate that forwarding registers are used; that is, the instruction contains two bits (F1 and F2) for this purpose.
If one or both forwarding flags are absent, the corresponding forwarding register is not used. When these bits are set (for example, to 1), the forwarded (internally generated) value is used. Otherwise, the value of the source register is used. The forwarding registers thus also give the compiler a hint about the earliest time at which an instruction can be issued. When forwarding is not used, an instruction may incur the read-after-write latency of the given source register.

For the GCTX instruction, when the reset signal (Rst_Signal) is set, the value of SRC1 is 0. When the forwarding condition holds, SRC1 is the binIdx value from inside the get-context module 622 plus 1; otherwise SRC1 is the binIdx value from the execution unit register. The output of the binarization module 620 can be used as the forwarded SRC2 value for the GCTX and BARD instructions. In the latter case, no BIND instruction is issued until the BARD instruction has used the forwarding register.
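The source selection for GCTX can be sketched as follows. The function names are hypothetical, and two points are assumptions rather than statements from the text: the reset signal is given priority over the F1 flag, and the forwarding condition is taken to be simply that F1 is set. The flag positions (bits 26 and 27 of the instruction word) are the ones given for one embodiment.

```python
F1_BIT = 26  # forwarding flag for source 1 (one embodiment)
F2_BIT = 27  # forwarding flag for source 2 (one embodiment)


def forwarding_flags(instruction_word):
    """Return (F1, F2) as booleans from a 32-bit instruction word."""
    f1 = bool((instruction_word >> F1_BIT) & 1)
    f2 = bool((instruction_word >> F2_BIT) & 1)
    return f1, f2


def select_gctx_src1(rst_signal, f1, internal_bin_idx, register_bin_idx):
    """Choose the SRC1 (binIdx) value seen by the get-context module."""
    if rst_signal:                   # assumed to take priority over F1
        return 0
    if f1:                           # forwarded: internal binIdx plus 1
        return internal_bin_idx + 1
    return register_bin_idx          # value from the execution unit register


print(forwarding_flags(1 << 26))
print(select_gctx_src1(False, True, 3, 7))
```

Starting from 0 on reset and then counting up with the forwarded value is one way the module could walk through successive bins of a syntax element.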
To explain further, the reset signal and the F1 forwarding signal are combined into one signal (for example, a 2-bit signal) {F1, reset} that indicates whether the SRC1 value input to the get-context module 622 is the binIdx value or the forwarded value. Another effect of the reset signal is to clear and reset the bin string register 616 and to reset the binIdx register 654 to 0.

Continuing with the get-context module 622 and obtaining context information, in one embodiment the information shown in Table 2 and Table 3 below corresponds, respectively, to the values of the macroblock neighbor context memory 564 structure and of the mbNeighCtxCurrent register 603. The mbNeighCtxCurrent register 603 holds the decoded output for the current macroblock. At the end of processing of the current macroblock, a CWRITE instruction is issued, which copies the information from the mbNeighCtxCurrent register 603 to the corresponding location in the neighbor context memory 564 array. The copied information is subsequently used as the top neighbor values.

    Parameter                 Size (bits)   Bits
    transform_size_8x8_flag   1             0
    mb_field_decode_flag      1             1
    mb_skip_flag              1             2
    intra_chroma_pred_mode    2             4:3
    mb_type                   3             7:5
    codedBlockPatternLuma     4             11:8
    codedBlockPatternChroma   2             13:12
    codedFlagY                1             14
    codedFlagCb               1             15
    codedFlagCr               1             16
    codedFlagTrans            8             24:17
    refIdx                    8             32:25
    predMode                  4             36:33

Table 2

    Parameter                 Size (bits)   Bits
    transform_size_8x8_flag   1             0
    mb_field_decode_flag      1             1
    mb_skip_flag              1             2
    intra_chroma_pred_mode    2             4:3
    mbQpDeltaGT0              1             88
    codedBlockPatternLuma     4             11:8
    codedBlockPatternChroma   2             13:12
    codedFlagY                1             14
    codedFlagCb               1             15
    codedFlagCr               1             16
    codedFlagTrans            24            87:64
    refIdx                    16            52:37
    predMode                  8             60:53
    mb_type                   3             63:61

Table 3
In one embodiment, the parameter codedFlagTrans is divided into three parts. For example, the first 4 bits relate to context block category 0 or 1, and the upper 4 bits relate to context block category 3 or 4. The upper 4 bits are further split into two parts, the lower 2 bits for iCbCr = 0 and the other 2 bits for iCbCr = 1. The parameter predMode (prediction mode) takes one of three values: predL0 = 0, predL1 = 1, and BiPred = 2.

Figure 6D shows one embodiment of the structure of the refIdx parameter referred to in Tables 2 and 3. Note that refIdx relates to the index into the reference picture list used in picture reconstruction. This structure allows the memory and the logic circuitry to be optimized. As shown, the structure includes the top row 609 of macroblocks, macroblock partitions 611 (four as shown), L0/L1 values 613, and, for each L0/L1 value, a stored bit Gt0 (greater than 0) 615 and a stored bit Gt1 (greater than 1) 617. Typically, the top neighboring macroblock 609 must be accessed, and in particular the bottom row of that macroblock, which in one embodiment is divided into 4x4 squares, yielding four mbPartitions 611. For each mbPartition 611, the L0/L1 value 613 needs to be known, but not its actual value: a determination is made as to whether the L0 and L1 values are 1 or greater than 1. In one embodiment, this determination is captured by storing the two bits Gt0 615 and Gt1 617, which are used in computing the syntax element.

To describe the structure a little further, two optimizations are made. In the first optimization, only 2 bits are kept (even though the reference value is typically larger), because no more bits are needed for the variable length decoding unit 530a to decode the syntax element. The full value is decoded and kept in an execution unit register or in memory (for example, the L2 cache).
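The 2-bit summary of a reference index can be sketched in Python as follows. The function names and the packing order (Gt0 in the lower bit of each 2-bit pair, entries packed from bit 0 upward) are assumptions made for illustration; the text only states that a Gt0 bit and a Gt1 bit are stored for each L0/L1 value.

```python
def ref_idx_flags(ref_idx):
    """Return (Gt0, Gt1): whether refIdx is greater than 0 and greater than 1."""
    return int(ref_idx > 0), int(ref_idx > 1)


def pack_ref_idx_flags(ref_idxs):
    """Pack the 2-bit summaries of several refIdx values into one integer."""
    bits = 0
    for n, ref_idx in enumerate(ref_idxs):
        gt0, gt1 = ref_idx_flags(ref_idx)
        bits |= (gt0 | (gt1 << 1)) << (2 * n)
    return bits


# Four partitions times two lists (L0, L1) give eight entries, i.e. 16 bits.
packed = pack_ref_idx_flags([0, 1, 2, 5, 0, 0, 1, 3])
print(format(packed, "016b"))
```

With this summary, a comparison against the neighbor's reference index reduces to a test on one or two stored bits rather than on the full decoded value.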
In the second optimization, only four elements are kept (for example, two at the top and two at the left). The four elements are recycled, and the final values are written by the CWRITE instruction to the neighbor storage held in memory. Consequently, only 16 bits are kept in the mbNeighCtxCurrent register 603, and only 8 bits are kept in the mbNeighCtxLeft register 605 and in the top mbNeighCtx element 601 of the array 564. Savings are also obtained in the computation logic, because the full arithmetic on the decoded reference values is replaced by Boolean operations on a few bits.

The mb_type values are as shown in Table 4 below.
    mb_type   Remark
    4'b000    SI
    4'b001    I_4x4 or I_NxN
    4'b010    I_16x16
    4'b011    I_PCM
    4'b100    P_8x8
    4'b101    B_8x8
    4'b110    B_Direct_16x16
    4'b111    Others

Table 4

Additional registers not shown in Figure 6B may be used, such as mbPerLine (for example, 8 bits, unsigned), mb_qp_delta (8 bits, signed), and mbAddrCurr (16 bits, the current macroblock address). For mbAddrCurr, only 13 bits are needed for a 1920x1080 array, but some embodiments use 16 bits to simplify 16-bit arithmetic.

Values from the registers described above are also stored in the global register 614. Keeping copies of these values in the global register 614 simplifies the hardware design. In one embodiment, the global register 614 is a formatted 32-bit register containing values corresponding to mbPerLine, mbAddrCurr, and mb_qp_delta, in addition to other values corresponding to NUT, MBAFF_FLAG, and chroma_format_idc.

The INSERT instruction can be used to update the various fields of the global register 614.
    INSERT DST, #Imm, SRC1

In the INSERT instruction above, one embodiment of #Imm comprises a 10-bit number, in which the lower 5 bits specify the width of the data and the upper 5 bits specify the position at which the data is inserted. The intermediate values are computed as follows:

    Mask  = NOT(0xFFFFFFFF << #Imm[4:0])
    Data  = SRC1 & Mask
    SDATA = Data << #Imm[9:5]
    SMask = Mask << #Imm[9:5]

The output DST may be expressed as follows:

    DST = (DST & NOT(SMask)) | SDATA

Note that some fields (e.g., NUT (NAL_UNIT_TYPE), C (constrained_intra_pred_flag), MBAFF_FLAG, mbPerLine, and mbAddrCurr) may also be written to, or initialized in, the global register 614 using the INIT_CTX instruction.

In one embodiment, the local register 612 comprises a 32-bit register with fields corresponding to b, mb_qp_delta, numDecodAbsLevelEq1, and numDecodAbsLevelGt1. These fields may be updated using the INSERT instruction. The local register 612 is also initialized such that b = 0, mb_qp_delta = 0, numDecodAbsLevelEq1 = -1, and numDecodAbsLevelGt1 = 0. The instruction that provides this initialization may use the following format:

    CWRITE SRC1

where SRC1[15:0] = mbAddrCurr. CWRITE SRC1 updates the mbAddrCurr field of the global register 614. Additional functionality provided by the CWRITE instruction is described below, after a brief description of the neighboring element structure and its use in decoding.
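The INSERT field update can be illustrated with a short software model. The following is only a sketch under stated assumptions: it models a 32-bit register in Python, and the names `insert_field` and `make_imm` are hypothetical helpers that do not appear in the description. It follows the Mask, Data, SDATA, and SMask relations given for the INSERT instruction.

```python
MASK32 = 0xFFFFFFFF  # assumption: DST and SRC1 are modeled as 32-bit values


def insert_field(dst: int, imm10: int, src1: int) -> int:
    """Model of INSERT DST, #Imm, SRC1 (hypothetical helper name)."""
    width = imm10 & 0x1F          # #Imm[4:0]: width of the inserted data
    pos = (imm10 >> 5) & 0x1F     # #Imm[9:5]: bit position of the insertion
    mask = ~(MASK32 << width) & MASK32      # Mask  = NOT(0xFFFFFFFF << width)
    data = src1 & mask                      # Data  = SRC1 & Mask
    sdata = (data << pos) & MASK32          # SDATA = Data << pos
    smask = (mask << pos) & MASK32          # SMask = Mask << pos
    return (dst & ~smask & MASK32) | sdata  # DST = (DST & NOT(SMask)) | SDATA


def make_imm(width: int, pos: int) -> int:
    """Hypothetical helper: pack a width and a position into the 10-bit #Imm."""
    return ((pos & 0x1F) << 5) | (width & 0x1F)


if __name__ == "__main__":
    # Overwrite a 4-bit field at bit 8 of an all-ones register with 0xA.
    print(hex(insert_field(0xFFFFFFFF, make_imm(width=4, pos=8), 0xA)))
```

Because the width field is only 5 bits wide, this model can describe fields of 0 to 31 bits; how the hardware treats a full 32-bit write is not stated in the description.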
In CABAC decoding, syntax values are predicted and modeled from the neighboring macroblocks. The following describes how an embodiment of the variable length decoding unit 530a determines the left and top neighboring macroblocks and whether those macroblocks are actually available. As described above, the decoding process uses neighboring values (e.g., from the macroblock or block above and to the left). In one embodiment, the binary arithmetic decoding module 624 evaluates the following equations, which use the current macroblock number and the number of macroblocks per line (mbPerLine) to calculate the address of the top macroblock and whether the left and top macroblocks are available.

For example, to determine whether a neighboring macroblock (e.g., the left neighbor) exists (i.e., is valid), an operation such as mbCurrAddr % mbPerLine may be performed and the result checked for zero. In one embodiment, the following calculation is performed:

    a = mbCurrAddr % mbPerLine
    a = mbCurrAddr - floor(mbCurrAddr / mbPerLine) x mbPerLine

Note that mbCurrAddr refers to the position of the current macroblock corresponding to the binary symbols to be decoded, and mbPerLine refers to the number of macroblocks in a given row. The calculation above is implemented with one division, one multiplication, and one subtraction.

To further illustrate the decoding mechanism implemented by the binary arithmetic decoding module 624, refer to FIG. 6E, which shows a picture to be decoded (16x8 macroblocks, mbPerLine = 16). When the 35th macroblock is decoded (mbCurrent is labeled 35, and the 36th macroblock has not yet been fully decoded), data is needed from the previously decoded top macroblock (labeled 19) and left macroblock (labeled 34). The information for the top macroblock can be obtained from mbNeighCtx[i], where i = mbCurrent % mbPerLine. In this example, i = 35 % 16, so i = 3. After the current macroblock has been decoded, the CWRITE instruction may be used to update mbNeighCtxLeft 605 and mbNeighCtx[i] 601 in the array.

As another example, consider the following:

    mbCurrAddr ∈ [0 : maxMB - 1]

where maxMB is 8192 and mbPerLine = 120. In one embodiment, the division may be implemented as a multiplication by (1/mbPerLine), which is looked up in a table stored in on-chip memory (e.g., a 120x11-bit table). When mbCurrAddr is 13 bits wide, a 13x11-bit multiplier may be used. In one embodiment, the result of this multiplication is computed and its upper 13 bits are kept, and a 13x7-bit multiplication is then performed, of which the lower 13 bits are kept. Finally, a 13-bit subtraction is performed to determine "a". The entire sequence of operations takes two cycles. The result is stored for use in other operations and is recomputed when the mbCurrAddr value changes.

In some embodiments, the modulo operation is not performed. Instead, shader logic in the execution unit is used to supply a first mbAddrCurr value that is aligned to the first line of the slice. For example, such shader logic may perform the following calculation: mbAddrCurr = absoluteMbAddrCurr - n * mbPerLine. Because some H.264 flexible macroblock ordering (FMO) modes have very complex neighborhood structures, the left/top availability for these modes may be computed in an additional shader of the decoding system 200 and loaded into one or more registers of the variable length decoding unit 530a. Offloading this work from the variable length decoding unit 530a reduces hardware complexity while still enabling all H.264 modes for symbol decoding.
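The neighbor lookup from the FIG. 6E example, and the idea of replacing the division with a multiplication by a reciprocal, can be sketched in software as follows. This is a minimal sketch and not the hardware datapath: the function names are hypothetical, the rule used for top availability (the current address is at least mbPerLine) is an assumption about raster-ordered macroblocks, and the reciprocal uses a 24-bit fixed-point scale chosen for exactness in software rather than the 11-bit table mentioned in the description.

```python
from typing import Optional, Dict


def neighbor_info(mb_curr_addr: int, mb_per_line: int) -> Dict[str, Optional[int]]:
    """Left/top neighbor lookup for raster-ordered macroblocks (sketch)."""
    a = mb_curr_addr % mb_per_line          # column of the current macroblock
    left_ok = a != 0                        # a == 0: first column, no left neighbor
    top_ok = mb_curr_addr >= mb_per_line    # assumption: first row has no top neighbor
    return {
        "top_index": a,                     # index i into mbNeighCtx[i]
        "left_addr": mb_curr_addr - 1 if left_ok else None,
        "top_addr": mb_curr_addr - mb_per_line if top_ok else None,
    }


def mod_by_reciprocal(mb_curr_addr: int, mb_per_line: int, shift: int = 24) -> int:
    """Compute mbCurrAddr % mbPerLine with a multiply instead of a divide.

    Assumption: a 24-bit fixed-point reciprocal, which is exact for the
    13-bit addresses and small divisors discussed here.
    """
    recip = -(-(1 << shift) // mb_per_line)       # ceil(2**shift / mbPerLine)
    quotient = (mb_curr_addr * recip) >> shift    # floor(mbCurrAddr / mbPerLine)
    return mb_curr_addr - quotient * mb_per_line  # one multiply, one subtract


if __name__ == "__main__":
    print(neighbor_info(35, 16))
    print(mod_by_reciprocal(35, 16))
```

For macroblock 35 with 16 macroblocks per line, the sketch returns top index 3, left address 34, and top address 19, matching the FIG. 6E walk-through.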
The CWRITE instruction copies the appropriate fields from mbNeighCtxCurrent 603 to mbNeighCtxTop[] 601 and mbNeighCtxLeft[] (e.g., the left macroblock in the array 564). Which mbNeighCtxTop[] 601 and mbNeighCtxLeft[] data are written depends on whether mbaffFrameFlag (MBAFF) is set and on whether the current and previous macroblocks are field or frame coded. When (mbAddrCurr % mbPerLine == 0), mbNeighCtxLeft 605 is marked as unavailable (e.g., it is initialized to 0). The contents of the mbNeighCtx memory 564, the local register 612, and the global register 614 may be moved using the CWRITE instruction. For example, the CWRITE instruction moves the relevant contents of the neighbor context memory 564 to the left and top blocks of the i-th macroblock (e.g., mbNeighCtx[i], or the current macroblock) and also clears the mbNeighCtxCurrent register 603. As described above, the top pointer 607c and the left pointer 607b are associated with the neighbor context memory 564. After a CWRITE instruction, the top index is incremented by one, and the contents of the current macroblock are moved to the top position and the left position in the array. This mechanism reduces the number of read/write ports required on the memory array.

The contents of the neighbor context memory 564, the local register 612, and the global register 614 may be updated using the INSERT instruction, as described above. For example, the current macroblock may be written using an INSERT instruction (e.g., INSERT $mbNeighCtxCurrent_1, #Imm10, SRC1). This operation does not affect the top pointer 607c or the left pointer 607b (i.e., it writes only to the current position).

The INSERT instruction and the updates from the binary arithmetic decoding module 624 are written to the mbNeighCtxCurrent array 601 of the neighbor context memory 564.
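The interplay of INSERT and CWRITE on the neighbor context storage can be sketched as a small software model. This is only an illustration under several assumptions: the class and method names are hypothetical, context values are modeled as plain integers with 0 meaning unavailable, the top index is assumed to wrap around at mbPerLine, and MBAFF and field/frame handling are omitted entirely.

```python
class NeighborContext:
    """Toy model of mbNeighCtx[] (top row), mbNeighCtxLeft, and mbNeighCtxCurrent."""

    def __init__(self, mb_per_line: int) -> None:
        self.mb_per_line = mb_per_line
        self.top = [0] * mb_per_line  # one element per macroblock column
        self.left = 0                 # 0 models "unavailable"
        self.current = 0              # context of the macroblock being decoded
        self.top_index = 0            # column of the current macroblock

    def insert(self, value: int) -> None:
        """INSERT: write the current position only; pointers are unchanged."""
        self.current = value

    def neighbors(self):
        """Return (top, left) context for the current macroblock."""
        return self.top[self.top_index], self.left

    def cwrite(self) -> None:
        """CWRITE: move current to the top and left slots, clear it, advance."""
        self.top[self.top_index] = self.current
        self.left = self.current
        self.current = 0
        # Assumption: the top index wraps at the end of a macroblock row.
        self.top_index = (self.top_index + 1) % self.mb_per_line
        if self.top_index == 0:
            self.left = 0  # first macroblock of a row has no left neighbor


if __name__ == "__main__":
    ctx = NeighborContext(mb_per_line=4)
    for addr in range(5):          # decode macroblocks 0..4
        ctx.insert(100 + addr)     # pretend the context value is 100 + address
        ctx.cwrite()
    print(ctx.neighbors())         # neighbors seen by macroblock 5
```

Because each CWRITE touches only the slot at the top index and the single left register, a model like this needs one write to the array per macroblock, which is consistent with the stated goal of limiting read/write ports.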
The left pointer 607b points to the element of the memory 564 that is identical to the adjacent array element of mbNeighCtx 601 (i.e., mbNeighCtx[i-1]).

In view of the above description of how context and model information is obtained, the following discusses the binary arithmetic decoding module 624 and arithmetic decoding based on that context and model information. The binary arithmetic decoding module 624 operates under the BARD instruction. An exemplary format of the BARD instruction is as follows:

    BARD DST, SRC2, SRC1

which provides the binary arithmetic decoding operation, in which each bin decoding iteration produces a single-bit output. The input parameters are as follows:

    SRC1 = binIdx/ctxIdx, the output of the get context module 622; and
    SRC2 = bypassFlag, the output of the binarization module 620.

When forwarding registers are used, the format may include BARD.F1.F2, which indicates the forwarding registers. The binary arithmetic decoding module 624 also receives the reset signal described above. Specifically, after the reset signal is received, it is held until the first call of the BARD instruction is received, at which point the reset signal is cleared.

In operation, the binary arithmetic decoding module 624 receives, from the get context module 622, the context index (ctxIdx) value and a pointer to the current bit parsing position in the decoded bitstream (binIdx). The binary arithmetic decoding module 624 uses the offset and range values from the code offset register 608 and the code range register 606 to track the current interval state of the decoding engine (offset, offset + range). The binary arithmetic decoding module 624 uses the context index value to access the context table (CTX_TABLE), which in turn is used to access the current probability state pStateIdx and the most probable symbol value. pStateIdx is used (e.g., via tables stored in remote or on-chip memory) to read the least probable symbol sub-range value, the next most probable symbol value, and the next least probable symbol probability value.
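The way the offset and range values track the decoding interval can be illustrated with a short sketch. It follows the general decision and bypass decoding steps of H.264 arithmetic decoding rather than the hardware of the binary arithmetic decoding module 624. The table lookups that turn pStateIdx into a least probable symbol sub-range and into the next state are deliberately left out (the sub-range is passed in as `range_lps`), and all function and parameter names are hypothetical.

```python
from typing import Callable, Tuple

RANGE_MIN = 256  # H.264 renormalizes while the 9-bit range is below 256


def decode_decision(offset: int, rng: int, val_mps: int, range_lps: int,
                    read_bit: Callable[[], int]) -> Tuple[int, int, int, bool]:
    """Decode one context-coded bin; returns (bin, offset, range, took_lps)."""
    rng -= range_lps                    # sub-range assigned to the MPS
    if offset >= rng:                   # offset falls in the LPS sub-range
        bin_val = 1 - val_mps
        offset -= rng
        rng = range_lps
        took_lps = True
    else:                               # offset falls in the MPS sub-range
        bin_val = val_mps
        took_lps = False
    while rng < RANGE_MIN:              # renormalize, shifting in new bits
        rng <<= 1
        offset = (offset << 1) | read_bit()
    return bin_val, offset, rng, took_lps


def decode_bypass(offset: int, rng: int,
                  read_bit: Callable[[], int]) -> Tuple[int, int]:
    """Decode one bypass bin (bypassFlag set); returns (bin, offset)."""
    offset = (offset << 1) | read_bit()
    if offset >= rng:
        return 1, offset - rng
    return 0, offset


if __name__ == "__main__":
    ones = lambda: 1  # stand-in bit source that always returns 1
    print(decode_decision(380, 400, 0, 50, ones))
    print(decode_bypass(100, 300, ones))
```

A caller would use `took_lps` together with pStateIdx to select the next probability state and, where required, to flip the most probable symbol value before writing the context back.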
Based on the state of the most probable symbol value, the range, and the probability information, the binary arithmetic decoding module 624 computes the most probable symbol value of the current binary symbol. The binary arithmetic decoding module 624 outputs the binary symbol (a bit or bin value) to the binary string register 616. The process is then repeated for the next bin, for the same or a different context, as indicated by the feedback connection 658 from the binary string register 616 to the get context module 622. The binary arithmetic decoding module 624 also writes back the updated most probable symbol value and probability state for use with subsequent contexts.

Regarding the use of the forwarding registers F1 and F2, note that when forwarding is signaled, an instruction may or may not incur latency. For example, when forwarding from the binarization module 620 to the get context module 622, there is no latency, and the GCTX instruction may be issued in the next cycle. Forwarding from the get context module 622 to the binary arithmetic decoding module 624 takes four cycles: when the GCTX instruction is issued in cycle j, the BARD instruction may be issued in cycle (j+5). In the absence of useful instructions, the latency slots are filled with up to four NOPs. When forwarding from the binary arithmetic decoding module 624 to the binarization module 620, there is no latency. When forwarding from the binary arithmetic decoding module 624 to the get context module 622, if the BARD instruction is issued in cycle j, the GCTX instruction may be issued in cycle [...]. When forwarding from the binary arithmetic decoding module 624 to the binarization module 620, if a second binary string is kept so that switching can occur between the binary arithmetic decoding module 624 and the binarization module 620, there is no latency. Keeping a second binary string makes it possible to issue a BARD instruction directly after another BARD instruction for the bypass case without incurring latency.

CAVLC Decoding

Having described the variable length decoding unit 530a for CABAC decoding, the CAVLC embodiment of the decoding system 200, also referred to as the variable length decoding unit 530b, is now described, as shown in FIG. 7A.
Before the CAVLC architecture is described, the H.264 CAVLC process is briefly reviewed in the context of the variable length decoding unit 530b.

As is known, the CAVLC process encodes the level (e.g., magnitude) of the signal for a macroblock or a portion of it, together with how often that level repeats (e.g., for how many cycles), which avoids the need to code every bit individually. This information is received and parsed from the bitstream at the bitstream buffer 562b, and the buffer is refilled as the information is consumed by the decoding engine of the variable length decoding unit 530b. The variable length decoding unit 530b reverses the encoding process and reconstructs the signal by extracting the macroblock information, comprising level and run coefficients, from the received bitstream. The variable length decoding unit 530b therefore receives the macroblock information from the bitstream buffer 562b and parses the stream to obtain the level and run coefficient values, which are temporarily stored in the level and run arrays, respectively. For example, the level and run arrays are read out as the pixels corresponding to a 4x4 block of a macroblock, and the level and run arrays are then cleared for use by the next block. In accordance with the H.264 standard, software can build the complete macroblock from these 4x4 building blocks.

Having presented the general operation of decoding macroblock information, the following description sets out the various components of the variable length decoding unit 530b in the context of the CAVLC decoding process, with the understanding that variations consistent with accepted practice are contemplated. Those skilled in the art will recognize that many of the terms used below (e.g., the labels of various parameters) are taken from the H.264 specification and, for brevity, are not explained further unless doing so helps in understanding the processes and components described.

FIG. 7A is a block diagram of an embodiment of the variable length decoding unit 530b. FIG. 7A shows a single variable length decoding unit 530b, which is used when a single bitstream is decoded. The same principles can be applied to variable length decoding units with additional resources for decoding multiple (e.g., two) streams simultaneously. Briefly, FIG. 7A illustrates selected components of the variable length decoding unit 530b, and FIG. 7B illustrates the table structure used for CAVLC decoding. Although the description below is given in the context of macroblock decoding, the principles presented can be applied to decoding of various block types, and the common aspects are not described again.
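The readout of the level and run arrays into a 4x4 block, described above in general terms, can be pictured with a generic run-level expansion. This is a sketch under stated assumptions, not the behavior of the READ_LRUN instruction: the indexing convention (the last array entry is the first coefficient in scan order, and each run counts the zeros that precede its coefficient) is an assumption, and the function name is hypothetical.

```python
from typing import List


def expand_run_level(level_val: List[int], run_val: List[int],
                     block_size: int = 16) -> List[int]:
    """Expand (level, run) pairs into a coefficient list in scan order."""
    if len(level_val) != len(run_val):
        raise ValueError("level and run arrays must have the same length")
    coeffs = [0] * block_size
    pos = -1
    # Walk from the last array entry to the first, skipping run_val[i] zeros
    # before placing each nonzero level.
    for i in range(len(level_val) - 1, -1, -1):
        pos += run_val[i] + 1
        if pos >= block_size:
            raise ValueError("runs exceed the block size")
        coeffs[pos] = level_val[i]
    return coeffs


if __name__ == "__main__":
    # Three nonzero coefficients in a 4x4 block (16 scan positions).
    print(expand_run_level([3, -1, 2], [1, 0, 2]))
```

After a block has been expanded in this way, both arrays can be reset to empty lists, which mirrors the clearing of the level and run arrays before the next block.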
The variable length decoding unit 530b is configured to parse the bitstream, initialize the decoding hardware and the register/memory structures, and perform level-run decoding. Each of these functions of the H.264 CAVLC decoding process is described further below. With regard to bitstream buffer operation, the SREG stream buffer/DMA engine 562 is shared between CABAC and CAVLC operation, so apart from the operational differences between the CABAC and CAVLC modes noted below, the common aspects are not described again for brevity. The CABAC and CAVLC decoding embodiments both use the same context memory 564, but the fields (e.g., the structures) differ, as described below. Accordingly, where the context memory 564 operates for CAVLC in the same way as in the CABAC operation described above, the common aspects are not described again. The global register 614 and the local register 612 are likewise reused, and their common aspects are not described again.

Referring to FIG. 7A, the variable length decoding unit 530b comprises various hardware modules, including a coefficient token module (coeff_token) 710, a level code module (CAVLC_LevelCode) 712, a level module (CAVLC_Level) 714, a level-0 module (CAVLC_L0) 716, a zero level module (CAVLC_ZL) 718, a run module (CAVLC_Run) 720, a level array (LevelArray) 722, and a run array (RunArray) 724. The decoding system also includes the SREG stream buffer/DMA engine 562, the global register 614, the local register 612, and the neighbor context memory 564, as described above.

As in the CABAC embodiment described above, the interface between the variable length decoding unit 530b and the execution unit 420a comprises one or more destination buses and corresponding registers (e.g., destination registers), and two source buses and corresponding registers (SRC1, SRC2, etc.).

In general, depending on the slice type, the driver software 128 (FIG. 1) prepares a CAVLC shader and loads it into the execution unit 420a. To decode the bitstream, the CAVLC shader uses the standard instruction set plus additional instructions, including the coeff_token, CAVLC_LevelCode, CAVLC_Level, CAVLC_L0, CAVLC_ZL, and CAVLC_Run instructions. The additional instructions also include the READ_LRUN and CLR_LRUN instructions, which read and clear the level array 722 and the run array 724. In one embodiment, the first instructions executed by the CAVLC shader, before any other instructions are issued, include the INIT_CTX instruction and the INIT_ADE instruction. These two instructions, which are explained later, initialize the variable length decoding unit 530b to decode a CAVLC bitstream and load the bitstream into the FIFO buffer, from which point stream decoding is managed automatically. The variable length decoding unit 530b can thus parse the bitstream, initialize the decoding hardware and the register/memory structures, and perform level-run decoding. Each of these functions of the H.264 CAVLC decoding process is described further below.

With regard to instructions for parsing the bitstream, the READ and INIT_BSTR instructions described earlier for the CABAC process are shared with the CAVLC process. In addition, two other bitstream access instructions are more closely related to the CAVLC process: the INPSTR instruction (corresponding to the inspect string module 570) and the INPTRB instruction (loaded into the variable length decoding logic 550 as previously shown in FIG. 5C). The INPSTR and INPTRB instructions need not be limited to CAVLC operation (e.g., they may be used in other processes, such as CABAC, VC-1, and MPEG). The INPSTR and INPTRB instructions are used to detect whether a particular pattern (e.g., a data start or end pattern) appears in a slice, macroblock, and so on, which makes it possible to read ahead in the bitstream without advancing it. In one embodiment, the instruction sequence comprises INPSTR and INPTRB followed by a READ instruction. An exemplary format of the INPSTR instruction is as follows:
    INPSTR DST

In one embodiment, this instruction inspects the bitstream and returns the most significant 16 bits of the SREG register 562a in the lower 16 bits of the destination register. The upper 16 bits of the destination register contain the sREGbitptr value. No data is removed from the SREG register 562a as a result of this operation. The INPSTR instruction may be implemented according to the following exemplary pseudocode:

    MODULE INPSTR (DST)
    OUTPUT [31:0] DST;
    DST = {ZE (sREGbitptr), sREG [msb: msb-15]};
    ENDMODULE

Another instruction for parsing the bitstream is the INPTRB instruction, which inspects the raw byte sequence payload (RBSP) trailing bits (e.g., of a byte-aligned bitstream). The INPTRB instruction provides a read of the bitstream buffer 562b. An exemplary format of the INPTRB instruction is as follows:

    INPTRB DST

In the INPTRB operation, no bits are removed from the SREG register 562a. When the most significant bits of the SREG register 562a contain, for example, 100, the SREG register 562a contains the RBSP stop bit, and the remaining bits in the byte are alignment zero bits. The INPTRB instruction may be implemented according to the following exemplary pseudocode:

    MODULE INPTRB (DST)
    OUTPUT DST;
    REG [7:0] P;
    P = sREG [msb: msb-7];
    sp = sREGbitptr;
    T [7:0] = (P >> sp) << sp;
    DST [1] = (T == 0x80) ? 1 : 0;
    DST [0] = !(CVLC_BufferBytesRemaining > 0);
    ENDMODULE

A READ instruction is provided to align the data in the bitstream buffer 562b.
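The two inspection instructions can be modeled in software roughly as follows. This is a sketch, not the hardware definition: the register width of 32 bits for sREG is an assumption (the pseudocode only refers to `msb`), the function names are hypothetical, and the bit pointer is simply passed in as an integer.

```python
SREG_WIDTH = 32  # assumption: the pseudocode only names the top bit "msb"


def inpstr(sreg: int, sreg_bitptr: int, width: int = SREG_WIDTH) -> int:
    """Model of INPSTR: DST = {ZE(sREGbitptr), sREG[msb:msb-15]}."""
    top16 = (sreg >> (width - 16)) & 0xFFFF     # peek at the top 16 bits
    return ((sreg_bitptr & 0xFFFF) << 16) | top16


def inptrb(sreg: int, sreg_bitptr: int, bytes_remaining: int,
           width: int = SREG_WIDTH) -> int:
    """Model of INPTRB: DST[1] flags a stop-bit byte, DST[0] flags end of data."""
    p = (sreg >> (width - 8)) & 0xFF            # P = sREG[msb:msb-7]
    t = ((p >> sreg_bitptr) << sreg_bitptr) & 0xFF  # clear the low sp bits
    stop_byte = 1 if t == 0x80 else 0           # DST[1] = (T == 0x80)
    no_bytes_left = 0 if bytes_remaining > 0 else 1  # DST[0] = !(remaining > 0)
    return (stop_byte << 1) | no_bytes_left


if __name__ == "__main__":
    print(hex(inpstr(0xABCD1234, 5)))
    print(inptrb(0x80000000, 0, 0))
```

Neither function changes its `sreg` argument, which reflects the statement that these instructions inspect the stream without removing bits from it.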
Having described the bitstream buffer operation of the variable length decoding unit 530b, the initialization of CAVLC operation is now described, in particular the initialization of the memory, the register structures, and the decoding engine (e.g., the CAVLC module 582). At the start of a slice, and before the syntax elements corresponding to the first macroblock are decoded, the register structures, the global register 614, the local register 612, and the CAVLC module 582 are initialized. In one embodiment, the driver software 128 issues the INIT_CAVLC instruction to perform this initialization. An exemplary format of the INIT_CAVLC instruction is as follows:
    INIT_CAVLC SRC2, SRC1

where SRC2 contains the number of bytes to be decoded in the slice data. This value is written to the internal CVLC_bufferBytesRemaining register. The SRC1 fields are as follows:

    SRC1[15:0]  = mbAddrCurr;
    SRC1[23:16] = mbPerLine;
    SRC1[24]    = constrained_intra_pred_flag;
    SRC1[27:25] = NAL_unit_type (NUT);
    SRC1[29:28] = chroma_format_idc (one embodiment uses a chroma_format_idc value of 1, corresponding to the 4:2:0 format, although some embodiments may use other sampling schemes); and
    SRC1[31:30] = undefined.

With the INIT_CAVLC instruction, the values in SRC1 are written to the corresponding fields of the global register 614. In addition, the value in SRC2 is written to an internal register set by the INIT instruction (e.g., the CVLC_bufferBytesRemaining register). The CVLC_bufferBytesRemaining register is used to recover from any corrupted bitstream, as described above. For example, the variable length decoding unit 530b (e.g., the SREG stream buffer/DMA engine 562) keeps track of the number of buffered bytes in the bitstream for a given slice. As the bitstream is consumed, the variable length decoding unit 530b counts and updates the CVLC_bufferBytesRemaining value. If the value falls below 0, which indicates a buffer or bitstream error, processing is terminated and control returns to the application or to the driver software 128 to handle recovery.

The INIT_CAVLC instruction also initializes various storage structures of the variable length decoding unit 530b, including the neighbor context memory 564, in a manner similar in some respects to the CABAC process described above.
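The SRC1 bit fields listed above for INIT_CAVLC can be sketched as simple pack and unpack helpers, together with the bytes-remaining check. This is a sketch only: the helper names are hypothetical, the two undefined bits SRC1[31:30] are left at zero, and the error check models nothing more than the "value below 0" condition.

```python
from typing import Dict, Tuple


def pack_init_cavlc_src1(mb_addr_curr: int, mb_per_line: int,
                         constrained_intra_pred_flag: int,
                         nal_unit_type: int, chroma_format_idc: int) -> int:
    """Pack the INIT_CAVLC SRC1 operand (bits 31:30 are left as zero)."""
    return ((mb_addr_curr & 0xFFFF)                      # SRC1[15:0]
            | (mb_per_line & 0xFF) << 16                 # SRC1[23:16]
            | (constrained_intra_pred_flag & 0x1) << 24  # SRC1[24]
            | (nal_unit_type & 0x7) << 25                # SRC1[27:25]
            | (chroma_format_idc & 0x3) << 28)           # SRC1[29:28]


def unpack_init_cavlc_src1(src1: int) -> Dict[str, int]:
    """Split SRC1 back into the fields written to the global register."""
    return {
        "mbAddrCurr": src1 & 0xFFFF,
        "mbPerLine": (src1 >> 16) & 0xFF,
        "constrained_intra_pred_flag": (src1 >> 24) & 0x1,
        "NUT": (src1 >> 25) & 0x7,
        "chroma_format_idc": (src1 >> 28) & 0x3,
    }


def consume_bytes(bytes_remaining: int, count: int) -> Tuple[int, bool]:
    """Update the remaining-bytes counter; the flag reports an underflow."""
    remaining = bytes_remaining - count
    return remaining, remaining < 0


if __name__ == "__main__":
    src1 = pack_init_cavlc_src1(35, 16, 1, 5, 1)
    print(hex(src1), unpack_init_cavlc_src1(src1))
    print(consume_bytes(2, 3))
```

A driver-side caller could build SRC1 this way before issuing INIT_CAVLC and treat an underflow flag as the cue to stop decoding the slice and hand control back for recovery.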
The INIT_CAVLC instruction also initializes various storage structures of the variable length decoding unit 530b, including structures that in some respects resemble those described earlier for the CABAC process: the neighbor context memory 564, the mbNeighCtxLeft register 605, and the mbNeighCtxCurrent register 603. Given the context-dependent nature of CAVLC decoding, the current macroblock is decoded according to information gathered by the CAVLC_TOTC instruction when previous macroblocks were decoded. That is, the left macroblock is stored in the mbNeighCtxLeft register 605 and pointed to by pointer 607b, and the top macroblock is stored in array element [i] 601 and pointed to by pointer 607c. The INIT_CAVLC instruction is used to initialize the top pointer 607c and the left pointer 607b, and to update the global register 614.

To determine whether a neighboring macroblock (e.g., the left neighbor) is present (i.e., valid), the CAVLC_TOTC instruction can perform an operation (e.g., mbCurrAddr % mbPerLine) similar to the one performed in the CABAC embodiment, so it is not described again.

As in the CABAC process described above, the CWRITE instruction can be used to remove the contents of the neighbor context memory 564, and the INSERT instruction can be used to update the contents of the neighbor context memory 564, the local registers 612, and the global register 614; the INSERT instruction can also be used to write to the mbNeighCtxCurrent register 603. The structure of the data held in the neighbor context memory 564 can be described as follows:

mbNeighCtxCurrent[01:00] : 2'b : mbType
mbNeighCtxCurrent[65:02] : 4'b : TC[16]
mbNeighCtxCurrent[81:66] : 4'b : TCC[cb][4]
mbNeighCtxCurrent[97:82] : 4'b : TCC[cr][4]

When the CWRITE instruction is executed, the mbNeighCtx[] neighbor values are updated, and the mbNeighCtxCurrent register 603 is then initialized.
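The neighbor context record and the left-neighbor test can be sketched as follows. This is only an illustration: the function names are hypothetical, the ordering of entries inside each field (entry 0 in the lowest bits) is an assumption, and the availability rule simply treats a macroblock in the first column of a row as having no left neighbor, ignoring slice boundaries.

```python
FIELD_BITS = 4  # each TC / TCC entry is a 4-bit count in the layout above


def pack_mb_neigh_ctx(mb_type, tc, tcc_cb, tcc_cr):
    """Pack mbType, TC[16], TCC[cb][4], TCC[cr][4] into one 98-bit integer."""
    if len(tc) != 16 or len(tcc_cb) != 4 or len(tcc_cr) != 4:
        raise ValueError("expected 16 TC, 4 Cb and 4 Cr entries")
    word = mb_type & 0x3  # bits [1:0]
    pos = 2               # TC starts at bit 2, TCC[cb] at 66, TCC[cr] at 82
    for count in list(tc) + list(tcc_cb) + list(tcc_cr):
        if not 0 <= count < (1 << FIELD_BITS):
            raise ValueError("count does not fit in 4 bits")
        word |= count << pos
        pos += FIELD_BITS
    return word


def unpack_mb_neigh_ctx(word):
    """Inverse of pack_mb_neigh_ctx."""
    mb_type = word & 0x3
    counts = [(word >> (2 + FIELD_BITS * k)) & 0xF for k in range(24)]
    return mb_type, counts[:16], counts[16:20], counts[20:24]


def left_neighbor_available(mb_curr_addr, mb_per_line):
    """Assumed rule: column 0 of a macroblock row has no left neighbor."""
    return mb_curr_addr % mb_per_line != 0
```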
Having described the context memory structures initialized by the variable length decoding unit 530b and their initialization, the following describes how the variable length decoding unit 530b (in particular, the CAVLC_TOTC instruction) uses the neighbor context information to compute the total coefficient count (TotalCoeff, TC), which is subsequently used to determine which CAVLC table should be used to decode a symbol. In general, CAVLC decoding uses the variable length decoding tables described in the H.264 specification (referred to herein as CAVLC tables), where a CAVLC table is selected to decode each symbol according to the context of previously decoded symbols. That is, a different CAVLC table may apply to each symbol. Fig. 7B shows the basic table structure, which is a variable-size two-dimensional array. An array of tables is provided (each table may be for a particular symbol), and each symbol is Huffman coded. The Huffman codes are stored as tables with the following structure:

struct Table {
    unsigned head;
    struct table {
        unsigned val;
        unsigned shv;
    } table[];
} Table[];

The following describes a matching method based on unique prefix codes (the MatchVLC function). In general, a CAVLC table includes a variable-length part and a fixed-length part. Matching can be simplified by performing a number of fixed-size index lookups. In the MatchVLC function, a READ operation can be performed without removing bits from the SREG register 562a. Thus, for the bitstream buffer 562b that handles the bitstream, this READ operation differs from the READ instruction described earlier. In the MatchVLC function described below, a number of bits (fixL) are copied from the bitstream buffer 562b and then looked up in a specified table. Each entry in the specified table has a particular format (e.g., a value and a size in bits). The size of the entry is used to advance the bitstream.

FUNCTION MatchVLC(Table, maxIdx)
    INPUT Table;
    INPUT maxIdx;
    Idx1 = CLZ(sREG);                        // count number of leading zeros
    Idx1 = (Idx1 > maxIdx) ? maxIdx : Idx1;
    fixL = Table[Idx1].head;
    SHL(sREG, Idx1 + #1);                    // shift buffer Idx1 + 1 bits left
    Idx2 = (!fixL) ? 0 : READ(fixL);
    (val, shv) = Table[Idx1][Idx2];
    SHL(sREG, shv);
    return val;
ENDFUNCTION

Fig. 7B is a block diagram of an exemplary two-dimensional array with the above table structure, used to illustrate the MatchVLC function in the context of CAVLC decoding. The example is taken from Table 9-5 of the H.264 standard for the case nC == -1, and is described as follows:
Coeff_token  TrailingOnes  TotalCoeff  Head  Value  Shift
1            1             1           0     33     0
01           0             0           0     0      0
001          2             2           0     66     0
000100       0             2           2     2      2
000101       3             3           2     99     2
000110       1             2           2     34     2
000111       0             1           2     1      2
000010       0             4           1     4      1
000011       0             3           1     3      1
0000010      2             3           1     67     1
0000011      1             3           1     35     1
00000010     2             4           1     68     1
00000011     1             4           1     36     1
0000000      3             4           0     100    0

In pseudocode, the above table can be expressed as follows:

Table9-5[8] = {
    0, {{33, 0}},
    0, {{0, 0}},
    0, {{66, 0}},
    2, {{2, 2}, {99, 2}, {34, 2}, {1, 2}},
    1, {{4, 1}, {3, 1}},
    1, {{67, 1}, {35, 1}},
    1, {{68, 1}, {36, 1}},
    0, {{100, 0}}
};

Using the above table structure, CAVLC decoding can be implemented with the MatchVLC function described above. In the MatchVLC function, a count-leading-zeros operation is performed on the bitstream to access the table for a given syntax element. Further, by checking whether the leading-zero count is greater than the maximum value of Idx1, the MatchVLC function can invoke the count-leading-zeros operation (e.g., in some embodiments, using the count leading zeros module 576 and the read module 572) and then return maxIdx (which handles the 0000000 case, as shown in the table of Fig. 7B).
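A minimal software model of the MatchVLC lookup over the nC == -1 table is sketched below. It is an illustration under stated assumptions, not the hardware design: `BitReader` is a hypothetical stand-in for sREG that holds the stream as a string of '0'/'1' characters; the packing of Value as TrailingOnes * 32 + TotalCoeff is inferred from the table values; and, as a deliberate deviation from the pseudocode, when the zero run is capped at maxIdx the sketch consumes only maxIdx bits, because the 7-bit code 0000000 has no terminating 1 bit.

```python
# Rows are indexed by the capped leading-zero count: (head, [(val, shv), ...]).
TABLE_9_5 = [
    (0, [(33, 0)]),
    (0, [(0, 0)]),
    (0, [(66, 0)]),
    (2, [(2, 2), (99, 2), (34, 2), (1, 2)]),
    (1, [(4, 1), (3, 1)]),
    (1, [(67, 1), (35, 1)]),
    (1, [(68, 1), (36, 1)]),
    (0, [(100, 0)]),
]


class BitReader:
    """Toy stand-in for sREG: a string of '0'/'1' characters plus a cursor."""

    def __init__(self, bits):
        self.bits = bits
        self.pos = 0

    def clz(self, limit):
        """Count leading zeros at the cursor, stopping at `limit`."""
        n = 0
        while n < limit and self.bits[self.pos + n:self.pos + n + 1] == "0":
            n += 1
        return n

    def peek(self, n):
        """Return the next n bits as an integer without consuming them."""
        chunk = self.bits[self.pos:self.pos + n].ljust(n, "0")
        return int(chunk, 2) if n else 0

    def skip(self, n):
        """Consume n bits (the SHL step in the pseudocode)."""
        self.pos += n


def match_vlc(reader, table, max_idx):
    """Decode one symbol: prefix by leading zeros, then a fixed-size index."""
    idx1 = reader.clz(max_idx)
    head, entries = table[idx1]
    # Assumption: a capped zero run has no terminating 1 bit to consume.
    reader.skip(idx1 if idx1 == max_idx else idx1 + 1)
    idx2 = reader.peek(head) if head else 0
    val, shv = entries[idx2]
    reader.skip(shv)
    return val


def split_coeff_token(val):
    """Inferred packing: (TrailingOnes, TotalCoeff) = (val >> 5, val & 31)."""
    return val >> 5, val & 0x1F
```

For instance, feeding the concatenated codes 1, 000110 and 0000011 through `match_vlc` yields the values 33, 34 and 35, which split into (1, 1), (1, 2) and (1, 3).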
Another advantage of the MatchVLC function and the table structure is that multiple instructions are not needed to handle these cases; they are handled by the following parts of MatchVLC: Idx1 = CLZ(sREG), which counts the number of leading zeros, and Idx1 = (Idx1 > maxIdx) ? maxIdx : Idx1. Next, the following part of MatchVLC removes the bits already consumed: SHL(sREG, Idx1 + #1). The header of the sub-array is read with the following parts of MatchVLC: fixL = Table[Idx1].head, and Idx2 = (!fixL) ? 0 : READ(fixL), which conveys the maximum number of bits to be speculatively read. The leading zeros may be the same, but the size of the trailing bits may vary. Thus, in one embodiment, a CASEX-style case statement can be implemented (using more memory but a simpler code structure).

The actual table value is read using (val, shv) = Table[Idx1][Idx2] and SHL(sREG, shv), which also indicates how many bits are actually used by the syntax element. These bits are removed from the bitstream, and the value of the syntax element is returned to the destination register.

Having described the VLC matching method and the arrangement of the table structure, reference is again made to Fig. 7A to describe the CAVLC decoding engine or process (e.g., the CAVLC module 582). Once the bitstream is loaded, and the decoding engine, memory structures, and registers are loaded, the coefficient token module 710 is started by the driver software 128 issuing the CAVLC_TOTC instruction. In one embodiment, the CAVLC_TOTC instruction has the following exemplary format:

CAVLC_TOTC DST, S1

where S1 and DST are an input register and an internal output register, respectively, with the exemplary formats given below:

SRC1[3:0] = blkIdx
SRC1[18:16] = blkCat
SRC1[24] = iCbCr

The remaining bits are undefined. The output format is described as follows:
DST[31:16] = TrailingOnes
DST[15:0] = TotalCoeff

Thus, as shown, the coefficient token module 710 receives information corresponding to mbCurrAddr, mbType, an indication of whether a chroma channel is being processed (e.g., iCbCr), and blkIdx (e.g., a block index, since a picture may be divided into many blocks). For a given macroblock accessed from the bitstream buffer 562b, blkIdx conveys whether an 8x8 pixel block or a 4x4 pixel block is being processed at a given position. This information is provided by the driver software 128. The coefficient token module 710 includes a lookup table. From the lookup table, using the inputs described above, the number of trailing ones (TrailingOnes) and the number of nonzero coefficients (TotalCoeff) are obtained. TrailingOnes conveys how many ones are in a row, and TotalCoeff conveys how many run/level pair coefficients are in the block of data extracted from the bitstream. TrailingOnes and TotalCoeff are provided to the CAVLC level module 714 and the zero level module 718, respectively. TrailingOnes is also provided to the level 0 module 716, which corresponds to the first level (e.g., the direct current (DC) value) extracted from the bitstream buffer 562b.

The level module 714 keeps track of the suffix length of the symbol (e.g., the number of trailing ones) and combines it with the level code (levelCode) to compute the level value (level[Idx]), which is then stored in the level array 722 and the run array 724. The level module 714 operates under the CAVLC_LVL instruction, which has the following format:

CAVLC_LVL DST, S2, S1

where:

S1 = Idx (16-bit);
S2 = suffixLength (16-bit); and
DST = suffixLength (16-bit).

The suffix length (suffixLength) conveys how large the code word is. Input from the driver software 128 provides information specifying the size of the suffix length. In addition, in one embodiment, because the suffixLength value is updated, DST and S2 can be chosen to be the same register.

Note further that forwarding registers (e.g., holding data generated internally by a given module), such as F1 and F2, can also be used. A forwarding flag in a given instruction indicates whether the instruction and the corresponding module use a forwarding register. The symbol F1 (i.e., use the value of forwarding source 1, indicated in one embodiment by bit 26 of the instruction) and the symbol F2 (i.e., use the value of forwarding source 2, indicated in one embodiment by bit 27 of the instruction) denote the forwarding registers. When forwarding registers are used, the CAVLC_LVL instruction can have the following exemplary format:

CAVLC_LVL.F1.F2 DST, SRC2, SRC1

where, if either F1 or F2 is set (e.g., asserted), the specified forwarding source is taken as the input. In the case of the level module 714, forwarding register F1 corresponds to the level index (level[Idx]) generated by the level module 714, which is incremented in the increment module and input to multiplexer 730. Likewise, forwarding register F2 corresponds to the suffix length (suffixLength), which is generated by the level module 714 and input to multiplexer 728. Other inputs to multiplexer 730 and multiplexer 728 include execution unit register inputs (labeled EU in Fig. 7A), as described below.

Another input to the level module 714 is the level code provided by the level code module 712. The combined operation of the level code module 712 and the level module 714 decodes the level values (a level being the transform coefficient value before scaling). The level code module 712 is enabled by an instruction with the following exemplary format:

CAVLC_LC SRC1

where SRC1 = suffixLength (16 bits). When forwarding register F1 is used, the instruction can be expressed as follows:

CAVLC_LVL.F1 SRC1

where, if F1 is set, the forwarded SRC1 is taken as the input. As shown in Fig. 7A, when F1 is set (e.g., F1 = 1), the level code module 712 takes the forwarded SRC1 value (e.g., the suffix length from the level module 714) as its input; otherwise (e.g., F1 = 0), the input is obtained from the execution unit register.

Returning to the level module 714, the suffixLength input can either be forwarded by the level module 714 via multiplexer 728 or provided by an execution unit register through multiplexer 728. Likewise, the Idx input can either be forwarded by the level module 714 via multiplexer 730 (and incremented by the increment module or, in some embodiments, incremented automatically without an increment module) or provided by an execution unit register through multiplexer 730.
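The forwarding selection performed by multiplexers 728 and 730 can be modeled with a short sketch. The bit positions 26 and 27 come from the embodiment described in the text; the function name, the argument names, and the idea of passing the forwarded values explicitly are assumptions made only for illustration.

```python
F1_BIT = 26  # forwarding source 1 flag (one embodiment)
F2_BIT = 27  # forwarding source 2 flag (one embodiment)


def select_operands(instr_word, eu_src1, eu_src2, fwd1, fwd2):
    """Pick each source from the forwarding register or the EU register.

    Models the two multiplexers: a set flag selects the value produced by
    the previous instruction, a clear flag selects the EU register value.
    """
    use_f1 = (instr_word >> F1_BIT) & 1
    use_f2 = (instr_word >> F2_BIT) & 1
    src1 = fwd1 if use_f1 else eu_src1
    src2 = fwd2 if use_f2 else eu_src2
    return src1, src2
```

In a decode loop, the first CAVLC_LVL of a block would typically read Idx and suffixLength from EU registers (both flags clear), and later iterations would set both flags so the incremented index and updated suffix length stay inside the module.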
Further, the level module 714 also receives the level code input directly from the level code module 712. In addition to the output to the forwarding registers, the level module 714 provides the level index (level[Idx]) output to the level array 722.

As noted above, TrailingOnes is output to the level 0 module 716. The level 0 module 716 is enabled by the following instruction:
CAVLC_LVL0 SRC

where SRC = trailingOnes(coeff_token). The output of the level 0 module 716 includes the level index (level[Idx]), which is provided to the level array 722. Coefficient values are coded as a sign and a magnitude. The level 0 module 716 provides the sign values of the coefficients. The magnitude values from the CAVLC level module 714 and the sign values from the level 0 module 716 are combined and written to the level array 722. The level index (level[Idx]) specifies the write location. In one embodiment, the coefficients lie in a 4x4 matrix of a sub-block (the block being 8x8) and are not in raster order. The array is later converted into a 4x4 matrix. In other words, the decoded coefficient levels and runs are not in raster format. From the level-run data, a 4x4 matrix can be reconstructed (but in zigzag scan order) and then rearranged into a raster-order 4x4 matrix.

The TotalCoeff output from the coefficient token module 710 is provided to the zero level module 718. The zero level module 718 is enabled by the following instruction:

CAVLC_ZL DST, SRC1

where SRC1 = maxNumCoeff (16 bits) and DST = ZerosLeft (16 bits). maxNumCoeff is given by the H.264 standard and is passed in as a source value of the instruction. In other words, maxNumCoeff is set by software. In some embodiments, maxNumCoeff may be stored in hardware. The transform coefficients are coded in (level, run) format, where the run relates to the number of coefficients (levels) coded as 0. The zero level module 718 provides two outputs, ZerosLeft and Reset (reset = 0), which are provided to multiplexer 740 and multiplexer 742, respectively. Multiplexer 740 also receives forwarding register F2 from the run module 720. Multiplexer 742 receives forwarding register F1 from the run module 720 after it has been incremented (in some embodiments via an increment module, or otherwise).

The run module 720 receives the ZerosLeft and Idx inputs from multiplexer 740 and multiplexer 742, respectively, and provides the run index (Run[Idx]) output to the run array 724. As described above, because run-length coding is used for further compression, the coefficients are coded in (level, run) format. For example, the values 10 12 12 15 19 1 1 1 0 0 0 0 0 0 1 0 can be coded as (10,0) (12,1) (15,0) (19,0) (1,2) (0,5) (1,0) (0,0). This code is usually shorter. The index is the corresponding index of the level index. The run module 720 is enabled by the following instruction:

CAVLC_RUN DST, S2, S1

where, because the ZerosLeft value is updated, DST and S2 can be chosen to be the same register. Exemplary unsigned values for the CAVLC_RUN instruction are as follows:

S1 = Idx (16-bit),
S2 = ZerosLeft (16-bit),
DST = ZerosLeft (16-bit).
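The (level, run) example in the text pairs each value with the number of additional repeats that follow it. A minimal sketch of that pairing is given below; the function names are hypothetical, and the sketch reproduces only the illustrative example, not the run_before semantics of the H.264 bitstream.

```python
def run_length_encode(values):
    """Collapse repeats into (value, extra_repeats) pairs."""
    pairs = []
    for v in values:
        if pairs and pairs[-1][0] == v:
            pairs[-1] = (v, pairs[-1][1] + 1)  # one more repeat of v
        else:
            pairs.append((v, 0))               # new value, no repeats yet
    return pairs


def run_length_decode(pairs):
    """Expand (value, extra_repeats) pairs back into the flat sequence."""
    out = []
    for v, run in pairs:
        out.extend([v] * (run + 1))
    return out
```

Encoding the 16 values of the example gives eight pairs, and decoding those pairs restores the original sequence.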
Referring to Fig. 7A, when forwarding registers are used, the CAVLC_RUN instruction can take the following format:

CAVLC_RUN.F1.F2 DST, SRC2, SRC1

where, if either F1 or F2 is set, the appropriate forwarding source is taken as the input.

As to the two register arrays, the level array 722 corresponds to the levels and the run array 724 corresponds to the runs. In one embodiment, each array contains 16 elements. For the level array 722, each element is a 16-bit signed value; for the run array 724, each element is a 4-bit unsigned value. The level and run values are read from the level array 722 and the run array 724, respectively, using the following instruction:

READ_LRUN DST

where, in one embodiment, DST comprises four consecutive 128-bit temporary registers (e.g., execution unit temporary or common registers). This operation reads the level and run registers of the variable length decoding unit 530b and stores them in the destination registers. When the runs are read out and stored in the temporary registers, the run values are converted to 16-bit unsigned values. For example, the first two registers hold the 16 16-bit level values (the array stores the first coefficients), and the third and fourth registers hold the 16 16-bit run values. When there are more than 16 coefficients, they are decoded to memory. In one embodiment, the values are written in the following order: in the first register, the least significant 16 bits contain the LEVEL[0] value, bits 16-31 contain the LEVEL[1] value, and so on, up to bits 112-127, which contain the LEVEL[7] value. Then, for the second register, the least significant 16 bits contain LEVEL[8], and so on. The same approach applies to the RUN values.

The CLR_LRUN instruction can be used to clear the registers of the level array 722 and the run array 724.

The software (shader program) and hardware operations (e.g., modules) of the variable length decoding unit 530b described above, and in particular of the CAVLC module 582, can be described using the following pseudocode:

Residual_block_cavlc(coeffLevel, maxNumCoeff) {
    CLR_LEVEL_RUN
    coeff_token
    if (TotalCoeff(coeff_token) > 0) {
        if (TotalCoeff(coeff_token) > 10 && TrailingOnes(coeff_token) < 3)
            suffixLength = 1
        else
            suffixLength = 0
        CAVLC_level0();
        for (i = TrailingOnes(coeff_token); i < TotalCoeff(coeff_token); i++) {
            CAVLC_levelCode(levelCode, suffixLength);
            CAVLC_level(suffixLength, i, levelCode)
        }
        CAVLC_ZerosLeft(ZerosLeft, maxNumCoeff)
        for (i = 0; i < TotalCoeff(coeff_token) - 1; i++) {
            CAVLC_run(i, ZerosLeft)
        }
        READ_LEVEL_RUN(level, run)
        run[TotalCoeff(coeff_token) - 1] = zerosLeft
        coeffNum = -1
        for (i = TotalCoeff(coeff_token) - 1; i >= 0; i--) {
            coeffNum += run[i] + 1
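The readout packing and the final coefficient placement can be sketched in Python as follows. Two points are assumptions rather than statements from the text: 128-bit registers are modeled as plain Python integers, and the body of the placement loop (writing level[i] at position coeffNum) follows the residual_block_cavlc syntax of the H.264 standard. All function names are hypothetical.

```python
def pack_levels(levels):
    """Pack 16 signed 16-bit levels into two 128-bit register images.

    LEVEL[0] goes in bits 15:0 of the first register, LEVEL[7] in bits
    127:112, and LEVEL[8] starts the second register.
    """
    if len(levels) != 16:
        raise ValueError("expected 16 level values")
    regs = []
    for base in (0, 8):
        reg = 0
        for lane in range(8):
            reg |= (levels[base + lane] & 0xFFFF) << (16 * lane)
        regs.append(reg)
    return regs


def unpack_levels(regs):
    """Recover signed 16-bit levels from the register images."""
    out = []
    for reg in regs:
        for lane in range(8):
            raw = (reg >> (16 * lane)) & 0xFFFF
            out.append(raw - 0x10000 if raw & 0x8000 else raw)
    return out


def place_coefficients(level, run, total_coeff, max_num_coeff=16):
    """Expand level/run arrays into scan-order coefficients.

    Walks from the last decoded index down to 0, skipping run[i] zeros
    before writing level[i], as in the H.264 residual syntax.
    """
    coeff = [0] * max_num_coeff
    coeff_num = -1
    for i in range(total_coeff - 1, -1, -1):
        coeff_num += run[i] + 1
        coeff[coeff_num] = level[i]
    return coeff
```

With level = [3, -1, 2] and run = [1, 0, 2], the three coefficients land at scan positions 5, 3 and 2 respectively; the resulting list is still in scan order and would need the zigzag-to-raster rearrangement mentioned in the text.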
MPEG decoding

Having described the decoding system 200 as used for CABAC decoding (the variable length decoding unit 530a, via the CABAC module) and for CAVLC decoding (the variable length decoding unit 530b, via the CAVLC module 582), an MPEG embodiment of the decoding system 200, referred to herein as the variable length decoding unit 530c, is described next. The variable length decoding unit 530c operates according to the operations performed by the MPEG module 578 (shown in Fig. 5C). For brevity, features shared with the CABAC and CAVLC embodiments (including the bitstream buffer and the corresponding instructions) are omitted, except as noted below. The INIT instruction places the variable length decoding unit 530c in MPEG mode, and a mix of the READ, INPSTR, and INPTRB instructions (explained above) and the VLC_MPEG2 instruction is used to decode the MPEG-2 bitstream. The shader program determines which method to use. The MPEG-2 bitstream has a fully deterministic grammar, and the shader code implements the method for parsing the grammar.

In one embodiment, for MPEG-2 processing, the tables are implemented, with Huffman decoding, in the MatchVLC_X functions described below. Accordingly, two instructions are loaded into the MPEG module 578: the INIT_MPEG2 instruction and the VLC_MPEG2 instruction. The INIT_MPEG2 instruction places the variable length decoding unit 530 in MPEG2 mode. In this mode, the global register 614 holds a value when the first coefficient is direct current (DC). In MPEG-2 there are one or more streams that are identical but are interpreted differently depending on whether they are DC or AC. A bit loaded into the VLD_globalRegister.InitDC register is used rather than creating another instruction. Note that this register corresponds to the global register 614 (e.g., it maps to the global register 614, such as globalregister[0]) used in the CABAC and CAVLC modes, but it is interpreted differently (and is therefore labeled differently) in MPEG2 mode. Thus, at the start of a macroblock, the value (the bit in the VLD_globalRegister.InitDC register) is initialized to 1. When the MatchVLC_3 function is used, it is determined whether the bit in the VLD_globalRegister.InitDC register is 1 or 0. If it is 1, the bit is changed to 0 for decoding the subsequent discrete cosine transform (DCT) symbols of the given macroblock. This value is set by the shader and reset internally.
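The InitDC behavior amounts to a one-shot flag per macroblock. The sketch below models only that behavior; the class and method names are hypothetical, and the split between the shader setting the bit and the decoder clearing it is represented by two separate methods.

```python
class InitDCFlag:
    """Assumed model of the VLD_globalRegister.InitDC bit."""

    def __init__(self):
        self.init_dc = 0

    def start_macroblock(self):
        """Shader side: mark that the next DCT symbol is the first one."""
        self.init_dc = 1

    def decoding_first_symbol(self):
        """Decoder side: report the flag, then clear it internally."""
        first = self.init_dc == 1
        if first:
            self.init_dc = 0  # later symbols of this macroblock see 0
        return first
```

A decoder loop would call `decoding_first_symbol` before each DCT symbol and choose the DC or AC interpretation of the shared table accordingly.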
In the entity part, the VLD_globalRegister.InitDC bit is a flag value that conveys whether the decoded DCT symbol is the beginning of the DCT symbol of the known macroblock. The MPEG module 578 decodes using a very specific grammar with symbols, wherein the above symbols use a limited number of Huffman tables. The analysis of the grammar is performed in a colorimeter having a particular symbol value, the + specific symbol value being obtained using a VLC-MPEG2 instruction having a #Imml6 value for a particular Huffman table, which should be used to solve the specific symbol. A brief description of the hardware and software structures used to implement the different tables of the MPEG-2 standard is described below before describing the different elements of the variable length decoding unit 530c. In the MPEG-2 standard (ISO-IEC 13818-2 (1995))
Client's Docket No.: S3U06-0013-TW TT,s Docket No:0608-A41246twf.doc/NikeyChen 84 1354239 中,所使用的編碼被定義在表;B_ 1至表丨5,其為 標準所提供之已知表格。在可變長度解碼單元兄㈦的不同 實施例中,一或多個表B-1至表B_15以專業硬體型式而實 施,例如合成為邏輯閘。根據實施方式(例如:hdtv、 HDDVD f)或今所需之硬體安排,部分表格可以〒用硬 體方式來實施,而是可以使用其他指令(例如:將描述於 後的EXP-GOL—UD指令,或是透過READ指令)來實施。 舉例來δ兒,雖然表B-2、表B-3以及表B-l 1的邏輯閘數量 不大,所使用到的加法可能需要額外的多工器階段,其意 味有關速度以及延遲。在部分實施例中,表Β_5至表Β_8 不由硬體所支援,因為其不需要支援設定檔。然而,部分 實施例可透過對效能具有最小影響之不同指令(例如: INPSTR、EXP_GOL—UD以及READ指令)而提供上述支 援。 繼續參考已知的 MPEG 表格,表 B-1 (Macroblock_address—increment)、表 B-10( motion—code) 以及表B-9 ( coded_block_pattern )具有相似的結構。由於 部分相似,上述三個表格可使用由MPEG模組578執行的 MatchVLC函數而實施以及描述於後。對表B-9以及表B-10 而言,示範的表格結構表示如下: struct Table { unsigned head; //表格位址之位元數 struct table { unsigned val:6; //表 B-10 中為 5 位元Client's Docket No.: S3U06-0013-TW TT, s Docket No: 0608-A41246twf.doc/NikeyChen 84 1354239, the codes used are defined in the table; B_1 to Table 5, which are provided by the standard Know the form. In a different embodiment of the variable length decoding unit (7), one or more of Tables B-1 through B_15 are implemented in a professional hardware version, such as a logical gate. Depending on the implementation (eg hdtv, HDDVD f) or the hardware arrangement required today, some tables can be implemented in hardware, but other instructions can be used (eg: EXP-GOL-UD, which will be described later) The instruction is executed by the READ command. For example, δ, although the number of logic gates in Table B-2, Table B-3, and Table B-1 is not large, the addition used may require an additional multiplexer stage, which means speed and delay. In some embodiments, the tables _5 to Β_8 are not supported by the hardware because they do not need to support the profile. However, some embodiments may provide such support through different instructions that have minimal impact on performance (e.g., INPSTR, EXP_GOL-UD, and READ instructions). 
Continuing with the known MPEG tables, Table B-1 (macroblock_address_increment), Table B-10 (motion_code), and Table B-9 (coded_block_pattern) have similar structures. Because of this similarity, the three tables can be implemented using the MatchVLC function executed by MPEG module 578 and described below. For Tables B-9 and B-10, an exemplary table structure is as follows:

  struct Table {
    unsigned head;        // number of bits of the table address
    struct table {
      unsigned val:6;     // 5 bits in Table B-10
      unsigned shv:2;     // actual number of bits
    } table[];
  } Table[];

For Table B-1, an exemplary table structure is as follows:

  struct Table {
    unsigned head;        // number of bits of the table address
    struct table {
      unsigned val:5;
      unsigned shv:3;     // actual number of bits
    } table[];
  } Table[];

In the functions below, only the SHL operation removes data from sREG register 562a. Unlike the READ instruction of the shader, the READ function used in the MatchVLC functions reads bits from sREG register 562a without removing any bits from it. The MatchVLC functions below implement the MPEG-2 tables to provide Huffman decoding.

  FUNCTION MatchVLC_1 {
    T = READ(2);                 // read 2 bits
    SHL(2);
    CASE (T) {
      00: OUTPUT(1);
      01: OUTPUT(2);
      10: {
        Q = READ(1);
        SHL(1);
        CASE (Q) {
          0: OUTPUT(0);
          1: OUTPUT(3);
        }
      }
      11: {
        Idx = CLO(sREG);         // count leading ones
        Idx = min(Idx, 7);
        shv = (Idx != 7) ? Idx+1 : Idx;
        SHL(shv);
        OUTPUT(4+Idx);
      }
    }
  }

  FUNCTION MatchVLC_2 {
    T = READ(2);                 // read 2 bits
    SHL(2);
    CASE (T) {
      00: OUTPUT(0);
      01: OUTPUT(1);
      10: OUTPUT(2);
      11: {
        Idx = CLO(sREG);         // count leading ones
        Idx = min(Idx, 8);
        shv = (Idx != 8) ? Idx+1 : Idx;
        SHL(shv);
        OUTPUT(3+Idx);
      }
    }
  }
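As a rough illustration of MatchVLC_1 and MatchVLC_2, the following Python sketch models sREG as a string of '0'/'1' characters. The function names, the string-based bit model, and the returned (value, bits consumed) pair are assumptions made for illustration and are not part of the described hardware.

```python
def count_leading_ones(bits: str) -> int:
    """Count the '1' characters at the start of a bit string (models CLO)."""
    return len(bits) - len(bits.lstrip("1"))


def match_vlc_dc_size(bits: str, chroma: bool = False) -> tuple[int, int]:
    """Decode one DC-size code following MatchVLC_1 (luma) or MatchVLC_2 (chroma).

    Returns (decoded value, number of bits consumed).
    """
    t, rest = bits[:2], bits[2:]              # T = READ(2); SHL(2)
    if chroma:
        if t != "11":
            return int(t, 2), 2               # 00 -> 0, 01 -> 1, 10 -> 2
        base, cap = 3, 8
    else:
        if t == "00":
            return 1, 2
        if t == "01":
            return 2, 2
        if t == "10":                         # Q = READ(1); SHL(1)
            return (3 if rest[0] == "1" else 0), 3
        base, cap = 4, 7
    idx = min(count_leading_ones(rest), cap)  # Idx = min(CLO(sREG), cap)
    shv = idx + 1 if idx != cap else idx      # no terminating 0 at the cap
    return base + idx, 2 + shv
```

The only difference between the luma and chroma paths is the handling of the short two-bit codes and the base/cap pair used once the leading-ones count takes over, which is why a single helper can serve both in this sketch.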
  FUNCTION MatchVLC_3 {
    INIT_MB DC = TRUE;
    T = CLZ(sREG);
    SHL(T+1);
    CASE (T) {
      0: IF (DC) {
           DC = FALSE;
           Q = READ(1); SHL(1);
           OUTPUT({0, SGN(Q)*1});
         }
         ELSE {
           Q = READ(1);
           IF (!Q) { OUTPUT({63,0}); shv = 1; }    // EOB
           ELSE { R = READ(1); OUTPUT({0, SGN(R)*1}); shv = 2; }
           SHL(shv);
         }
      1: {
        Q = READ(3);
        CASE (Q) {
          1XX: OUTPUT({1, SGN(Q[1])*1}); shv = 2;
          01X: OUTPUT({2, SGN(Q[0])*1}); shv = 3;
          00X: OUTPUT({0, SGN(Q[0])*2}); shv = 3;
        }
        SHL(shv);
      }
      2: {
        Q = READ(2); SHL(2);
        CASE (Q) {
          00: {
            R = READ(4);
            CASE (R) {
              000X: OUTPUT({16, SGN(R[0])*1});
              001X: OUTPUT({5, SGN(R[0])*2});
              010X: OUTPUT({0, SGN(R[0])*7});
              011X: OUTPUT({2, SGN(R[0])*3});
              100X: OUTPUT({1, SGN(R[0])*4});
              101X: OUTPUT({15, SGN(R[0])*1});
              110X: OUTPUT({14, SGN(R[0])*1});
              111X: OUTPUT({4, SGN(R[0])*2});
            }
            shv = 4;
          }
          01: SGN = READ(1); OUTPUT({0, SGN*3}); shv = 1;
          10: SGN = READ(1); OUTPUT({4, SGN*1}); shv = 1;
          11: SGN = READ(1); OUTPUT({3, SGN*1}); shv = 1;
        }
        SHL(shv);
      }
      3: {
        Q = READ(3);
        CASE (Q) {
          00X: OUTPUT({7, SGN(Q[0])*1});
          01X: OUTPUT({6, SGN(Q[0])*1});
          10X: OUTPUT({1, SGN(Q[0])*2});
          11X: OUTPUT({5, SGN(Q[0])*1});
        }
        SHL(3);
      }
      4: {
        Q = READ(3);
        CASE (Q) {
          00X: OUTPUT({2, SGN(Q[0])*2});
          01X: OUTPUT({9, SGN(Q[0])*1});
          10X: OUTPUT({0, SGN(Q[0])*4});
          11X: OUTPUT({8, SGN(Q[0])*1});
        }
        SHL(3);
      }
      5: Q = READ(19); OUTPUT({Q[18:13], Q[12:0]});
      6: {
        Q = READ(4);
        CASE (Q) {
          000X: OUTPUT({16, SGN(Q[0])*1});
          001X: OUTPUT({5, SGN(Q[0])*2});
          010X: OUTPUT({0, SGN(Q[0])*7});
          011X: OUTPUT({2, SGN(Q[0])*3});
          100X: OUTPUT({1, SGN(Q[0])*4});
          101X: OUTPUT({15, SGN(Q[0])*1});
          110X: OUTPUT({14, SGN(Q[0])*1});
          111X: OUTPUT({4, SGN(Q[0])*2});
        }
        SHL(4);
      }
      7,8,9,10,11: JVLC(TableC[T]);
    }
  }
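The role of the InitDC flag in the T = 0 branch of MatchVLC_3 can be sketched in Python as follows. The function names, the string bit model, and the return shape are illustrative assumptions; the sketch also assumes the sign bit follows the continuation bit in the non-DC case, which is how the two-bit shift in the pseudocode is read here.

```python
def sgn(bit: str) -> int:
    """SGN from the pseudocode: a sign bit of 1 means negative."""
    return -1 if bit == "1" else 1


def decode_leading_one(bits: str, init_dc: bool):
    """Decode a code whose first bit is 1 (CLZ == 0).

    Returns ((run, level), bits consumed, updated init_dc flag).
    """
    assert bits[0] == "1"
    rest = bits[1:]                        # SHL(T+1) with T = 0
    if init_dc:                            # first DCT symbol of the macroblock
        return (0, sgn(rest[0])), 2, False
    if rest[0] == "0":                     # end of block, output {63, 0}
        return (63, 0), 2, False
    return (0, sgn(rest[1])), 3, False     # continuation bit, then sign bit
```

The same two bits "10" therefore decode to run 0, level +1 when the flag is set and to end-of-block when it is clear, which is the reinterpretation the flag exists to select.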
  FUNCTION MatchVLC_4 {
    T = CLZ(sREG);
    SHL(T+1);
    CASE (T) {
      0: {
        Q = CLO(sREG);
        R = min(Q, 7);
        shv = (R != 7) ? R+1 : R;
        SHL(shv);
        CASE (R) {
          0: S = READ(1); OUTPUT({0, SGN(S)*1}); shv = 1;
          1: S = READ(1); OUTPUT({0, SGN(S)*2}); shv = 1;
          2: {
            R = READ(2); SHL(2);
            CASE (R) {
              0X: OUTPUT({0, SGN(R[0])*4});
              1X: OUTPUT({0, SGN(R[0])*5});
            }
          }
          3: {
            R = READ(3); SHL(3);
            CASE (R) {
              00X: OUTPUT({9, SGN(R[0])*1});
              01X: OUTPUT({1, SGN(R[0])*3});
              10X: OUTPUT({10, SGN(R[0])*1});
              11X: OUTPUT({0, SGN(R[0])*8});
            }
          }
          4: {
            R = READ(3);
            CASE (R) {
              0XX: OUTPUT({0, SGN(R[0])*9}); shv = 2;
              10X: OUTPUT({0, SGN(R[0])*12}); shv = 3;
              11X: OUTPUT({0, SGN(R[0])*13}); shv = 3;
            }
            SHL(shv);
          }
          5: {
            R = READ(2); SHL(2);
            CASE (R) {
              0X: OUTPUT({2, SGN(R[0])*3});
              1X: OUTPUT({4, SGN(R[0])*2});
            }
          }
          6: S = READ(1); OUTPUT({0, SGN(S)*14}); shv = 1;
          7: S = READ(1); OUTPUT({0, SGN(S)*15}); shv = 1;
        }
        SHL(shv);
      }
      1: {
        Q = READ(2); SHL(2);
        CASE (Q) {
          0X: OUTPUT({1, SGN(Q[0])*1});
          10: OUTPUT({63,0});                  // <EOB>
          11: R = READ(1); SHL(1); OUTPUT({0, SGN(R)*3});
        }
      }
      2: {
        Q = READ(2); SHL(2);
        CASE (Q) {
          00: {
            R = READ(4); shv = 4;
            CASE (R) {
              000X: OUTPUT({1, SGN(R[0])*5});
              001X: OUTPUT({11, SGN(R[0])*1});
              010X: OUTPUT({0, SGN(R[0])*11});
              011X: OUTPUT({0, SGN(R[0])*10});
              100X: OUTPUT({13, SGN(R[0])*1});
              101X: OUTPUT({12, SGN(R[0])*1});
              110X: OUTPUT({3, SGN(R[0])*2});
              111X: OUTPUT({1, SGN(R[0])*4});
            }
          }
          01: R = READ(1); OUTPUT({2, SGN(R)*1}); shv = 1;
          10: R = READ(1); OUTPUT({1, SGN(R)*2}); shv = 1;
          11: R = READ(1); OUTPUT({3, SGN(R)*1}); shv = 1;
        }
        SHL(shv);
      }
      3: {
        Q = READ(3); SHL(3);
        CASE (Q) {
          00X: OUTPUT({0, SGN(Q[0])*7});
          01X: OUTPUT({0, SGN(Q[0])*6});
          10X: OUTPUT({4, SGN(Q[0])*1});
          11X: OUTPUT({5, SGN(Q[0])*1});
        }
      }
      4: {
        Q = READ(3); SHL(3);
        CASE (Q) {
          00X: OUTPUT({7, SGN(Q[0])*1});
          01X: OUTPUT({8, SGN(Q[0])*1});
          10X: OUTPUT({6, SGN(Q[0])*1});
          11X: OUTPUT({2, SGN(Q[0])*2});
        }
      }
      5: Q = READ(19); OUTPUT({Q[18:13], Q[12:0]});
      6: {
        Q = READ(2); SHL(2);
        CASE (Q) {
          00: R = READ(1); OUTPUT({5, SGN(R)*2}); shv = 1;
          01: R = READ(1); OUTPUT({14, SGN(R)*1}); shv = 1;
          10: {
            R = READ(2); shv = 2;
            CASE (R) {
              0X: OUTPUT({2, SGN(R[0])*4});
              1X: OUTPUT({16, SGN(R[0])*1});
            }
          }
          11: R = READ(1); OUTPUT({15, SGN(R)*1}); shv = 1;
        }
        SHL(shv);
      }
      7,8,9,10,11: JVLC(TableC[T]);
    }
  }

Note from the MatchVLC functions above that the least significant bit decoded usually determines the sign of the value, which can be checked with the SGN function, described as follows:

  FUNCTION SGN(R) {
    RETURN (R == 1) ? -1 : 1;
  }

Note further that for MatchVLC_3 and MatchVLC_4 the tables are common (or at least one is a superset), so the following function can be used to access the table.

  FUNCTION JVLC(Table) {
    Q = READ(5);
    SHL(5);
    {R,L} = Table[Q];
    RETURN {R,L};
  }

The interface to the MatchVLC functions, or rather to the MatchVLC_X functions (where X equals 1, 2, etc.), is the following instruction:

  VLC_MPEG2 DST, #Imm16

where the #Imm16 value selects the appropriate table and thereby decodes a particular syntax element. The table is accessed from the instruction by using #Imm16 as the table index (e.g., 0, 1, 2, 3). The #Imm16 values and the corresponding methods, syntax elements, and MPEG-2 tables are listed in Table 5 below.

  #Imm16  Method             Syntax element                 MPEG-2 VLC table
  0       MatchVLC(B-1,7)    Macroblock_address_increment   B-1
  1       MatchVLC(B-9,8)    Coded_block_pattern            B-9
  2       MatchVLC(B-10,6)   Motion_code                    B-10
  3       MatchVLC_1         Dct_dc_size_luminance          B-12
  4       MatchVLC_2         Dct_dc_size_chrominance        B-13
  5       MatchVLC_3         DCT coefficients (Table 0)     B-14
  6       MatchVLC_4         DCT coefficients (Table 1)     B-15
  Table 5

EXP-GOLOMB Decoding

Having described decoding system 200 for CABAC decoding (variable length decoding unit 530a, via CABAC module 580), CAVLC decoding (variable length decoding unit 530b, via CAVLC module 582), and MPEG decoding (variable length decoding unit 530c, via MPEG module 578), the EXP-Golomb embodiment of decoding system 200, referred to here as variable length decoding unit 530d, is described next. Variable length decoding unit 530d operates according to the operation of EXP-Golomb module 584 (shown in FIG. 5C). It uses the same hardware and the same bitstream buffer arrangement as the CABAC and CAVLC embodiments. Features shared with the CABAC and CAVLC embodiments are therefore omitted, except as noted below. Before describing variable length decoding unit 530d, a brief description of EXP-Golomb coding is given.

In EXP-Golomb coding, data consists of a prefix and a suffix, in the following format:
  Codeword                      codeNum range
  1                             0
  0 1 x0                        1-2
  0 0 1 x1 x0                   3-6
  0 0 0 1 x2 x1 x0              7-14
  0 0 0 0 1 x3 x2 x1 x0         15-30
  0 0 0 0 0 1 x4 x3 x2 x1 x0    31-62

Because most codewords are short, compression is achieved. Moreover, the codewords are unique and easy to decode. In H.264, four EXP-Golomb coding methods are used: unsigned (unary), signed, mapped (the codeword is mapped to a table; this method is used to encode the coded macroblock pattern), and truncated. In variable length decoding unit 530d, a single instruction is provided to decode the different types of EXP-Golomb codes shown in Table 6 below. Truncated EXP-Golomb decoding is described afterwards.

  codeNum = EXP_GOLOMB_UD
      t = CLZ
      SHL(t+1)
      val = READ(t)                  // val is unsigned
      codeNum = (1 << t) - 1 + val

  codeNum = EXP_GOLOMB_CD(kOrder)
      lz := CountLeadingZero(sREG);
      sREG := {(sREG << (lz+1)), bitStreamBuffer[0:lz]};
      J := lz + kOrder - 1;
      val := (J >= 0) ? ZeroExtend(sREG[0:J]) : 0;
      sREG := {(sREG << (lz+1)), bitStreamBuffer[0:lz]};
      codeNum := (1 << (lz + kOrder)) + (0xFFFFFFF << kOrder) + val;

  Seval = EXP_GOLOMB_SD
      k = EXP_GOLOMB_UD
      Seval = (-1)^(k+1) * Ceil(k/2)

  cbp = EXP_GOLOMB_MD(Type)
      k = EXP_GOLOMB_UD
      cbp = TableCBP[Type][k]
  Table 6

To explain these instructions further, the EXP_GOLOMB_UD instruction decodes unsigned (unary-coded) symbols. The EXP_GOLOMB_SD instruction decodes signed symbols. As shown in Table 6, for the EXP_GOLOMB_SD instruction, when k = 0 there is no difference between positive zero and negative zero, so the value returned is 0. The EXP_GOLOMB_MD (SRC1) instruction decodes mapped symbols, where SRC1 = Type, which relates to the macroblock parameters and coded_block_pattern. The value of Type results in the following coded_block_parameter:

  Type = 0: Intra 4x4
  Type = 1: Inter

A table (e.g., in on-chip memory or in remote memory) can be used to assign a value to coded_block_parameter according to the macroblock prediction mode (e.g., code number, k).

The EXP-Golomb instruction that decodes truncated Exp-Golomb symbols is described as follows:

  EXP_GOLOMB_TD DST, SRC1

where SRC1 is the range. In at least one embodiment, the range must be known before truncated Exp-Golomb decoding is performed. Truncated Exp-Golomb decoding can then be derived as follows:

  codeNum = EXP_GOLOMB_TD(range) {
    else if (range == 1) return READ(1)^1;
    else return EXP_GOLOMB_UE;
  }

Therefore, the EXP_GOLOMB_D instruction is provided.

It is useful to explain the difference between opcodes and the software instructions issued by the driver. In general, when designing an ISA, at least two forces are at work: (1) keeping the instruction decoder simple and able to complete in a single pipeline stage (i.e., fast); and (2) keeping the mnemonics simple for the programmer. Considering the five EXP-Golomb-based operations, these operations are distinct from the user's point of view. Furthermore, there are two different formats: all of the EXP-Golomb-based operations output the same kind of value, but only some of them take an input (other than the bitstream implicit in the operation), which provides at least one basic distinction. Traditionally, CPU instructions take no implicit inputs; inputs are supplied through operands. The bitstream, however, is not exposed as an operand; it is managed automatically internally and initialized using the INIT instruction.
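A minimal Python sketch of the unsigned, signed, and truncated decodes described above, again modelling the bitstream as a string of '0'/'1' characters. The function names and the (value, bits consumed) return shape are hypothetical, and the truncated decode covers only the two branches stated in the text (a range equal to 1, and the fallback to the unsigned decode).

```python
def exp_golomb_ud(bits: str) -> tuple[int, int]:
    """Unsigned decode: t leading zeros, a 1, then t suffix bits."""
    t = len(bits) - len(bits.lstrip("0"))     # t = CLZ
    suffix = bits[t + 1 : 2 * t + 1]          # SHL(t+1); val = READ(t)
    val = int(suffix, 2) if t else 0
    return (1 << t) - 1 + val, 2 * t + 1


def exp_golomb_sd(bits: str) -> tuple[int, int]:
    """Signed decode: (-1)^(k+1) * ceil(k/2), with k the unsigned value."""
    k, used = exp_golomb_ud(bits)
    magnitude = (k + 1) // 2                  # ceil(k/2) for k >= 0
    return (magnitude if k % 2 else -magnitude), used


def exp_golomb_td(bits: str, value_range: int) -> tuple[int, int]:
    """Truncated decode: one inverted bit when the range is 1, else unsigned."""
    if value_range == 1:
        return int(bits[0]) ^ 1, 1            # READ(1) ^ 1
    return exp_golomb_ud(bits)
```

For instance, the bit string 0001010 has three leading zeros and suffix 010, so the unsigned decode gives (1 << 3) - 1 + 2 = 9 and the signed decode gives +5.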
From a hardware point of view, all of the other EXP-Golomb operations can be performed with the same core hardware as EXP_GOLOMB_UD plus a small addition around that core (similar, for example, to a CASE/SWITCH in software). The compiler/translator can therefore map all of the operations to a single instruction. Moreover, the operations are fixed (they do not change dynamically). Referring to the pseudonym column of Table 7 below, note that for the EXP_GOLOMB_UD and EXP_GOLOMB_SD operations, SRC1 can be added (and ignored by the core), given a mechanism for distinguishing the operations. Likewise, note that no single-source instruction group exists, but the operations can be mapped to the register-immediate group. By using an explicit immediate number for each instruction, as shown in Table 7, the instructions can be distinguished, which results in only one major/minor opcode instead of five, a meaningful saving. That is, only one minor opcode is used, because the immediate instruction format can be used and the different EXP-Golomb instructions are distinguished by encoding the immediate field with the appropriate data and assigning a pseudonym.

  EXP_GOLOMB_D Dst, #Type, Src1.lane

where #Type is determined from Table 7 below:

  #Type  Pseudonym                 Instruction
  0x0    EXP_GOLOMB_UD Dst         EGOLD Dst, 0x0, Src1
  0x1    EXP_GOLOMB_SD Dst         EGOLD Dst, 0x1, Src1
  0x2    EXP_GOLOMB_TD Dst, Src1   EGOLD Dst, 0x2, Src1
  0x3    EXP_GOLOMB_MD Dst, Src1   EGOLD Dst, 0x3, Src1
  0x4    EXP_GOLOMB_CD Dst, Src1   EGOLD Dst, 0x4, Src1
  Table 7

To explain Table 7 further, for #Type = 0x0 or #Type = 0x1 no Src1 field is needed, and these instructions do not need to be assigned to another major or minor opcode group, because a dummy Src can be specified or Src and Dst can be marked as the same.

EXP-Golomb coded symbols are encoded as shown below (zero or more leading 0s, followed by a 1, and then a number of bits corresponding to the number of leading 0s):
  Codeword                      codeNum range
  1                             0
  0 1 x0                        1-2
  0 0 1 x1 x0                   3-6
  0 0 0 1 x2 x1 x0              7-14
  0 0 0 0 1 x3 x2 x1 x0         15-30
  0 0 0 0 0 1 x4 x3 x2 x1 x0    31-62

How these bits are interpreted depends on the particular Golomb type (three types per H.264 and a fourth type for AVS). The value is computed using the UD and SD (unsigned and signed) computation logic units. For example, when the bitstream is 0001010, the UD value is (1<<3)-1+2 = 9, and the SD value is (-1)^10 * ceil(9/2) = +5. A similar procedure applies to CD. For MD, however, a table lookup is performed: the value is decoded as for UD and then used as an index into a table, which returns a 6-bit value (stored as 6 bits in the table, but returned zero-extended to the width of the register). In one embodiment there are two tables, one for Intra coding and the other for Inter coding.

An example of how the above instruction conversion is used in the context of EXP-Golomb decoding is shown by the following exemplary pseudocode for decoding part of an H.264 slice header.

  sliceHeaderDecode:
    EXP_GOLOMB_UD firstMBSlice
    EXP_GOLOMB_UD sliceType
    EXP_GOLOMB_UD picParameterSetID
    READ frameNum, Nval
    IB_GT frameMbsOnlyFlag, ZERO, $Label1
    READ fieldPicFlag, ONE
    IB_EQ fieldPicFlag, ZERO, $Label1
    READ bottomFieldFlag, ONE
  Label1:
    ISUBI t1, #5, nalUnitType
    IB_NEQ ZERO, t1, $Label2
    EXP_GOLOMB_UD idrPicID
  Label2:
    IB_NEQ ZERO, picOrderCntType, $Label3
    READ picOrderCntLSB, Nvalt
  Label3:
    ICMPI_EQ p1, ONE, fieldPicFlag
    [p1]MOV nfieldPicFlag, ZERO
    [!p1]MOV nfieldPicFlag, ONE
    AND t1, picOrderPresentFlag, nfieldPicFlag
    B_NEQ ONE, t1, $Label4
    EXP_GOLOMB_SD deltaPicOrderCntBottom
  Label4:

is converted to:

  sliceHeaderDecode:
    EGOLD firstMBSlice, #0, ZERO
    EGOLD sliceType, #0, ZERO
    EGOLD picParameterSetID, #0, ZERO
    READ frameNum, Nval
    IB_GT frameMbsOnlyFlag, ZERO, $Label1
    READ fieldPicFlag, ONE
    IB_EQ fieldPicFlag, ZERO, $Label1
    READ bottomFieldFlag, ONE
  Label1:
    ISUBI t1, #5, nalUnitType
    IB_NEQ ZERO, t1, $Label2
    EGOLD idrPicID, #0, ZERO
  Label2:
    IB_NEQ ZERO, picOrderCntType, $Label3
    READ picOrderCntLSB, Nvalt
  Label3:
    ICMPI_EQ p1, ONE, fieldPicFlag
    [p1]MOV nfieldPicFlag, ZERO
    [!p1]MOV nfieldPicFlag, ONE
    AND t1, picOrderPresentFlag, nfieldPicFlag
    B_NEQ ONE, t1, $Label4
    EGOLD deltaPicOrderCntBottom, #1, ZERO
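The pseudonym-to-EGOLD rewrite illustrated by the two listings can be sketched in Python as below. The #Type numbers come from Table 7; the function name, the regular-expression parsing, and the choice of ZERO as the dummy source for pseudonyms without a source operand (as in the converted listing) are illustrative assumptions rather than the described compiler/translator.

```python
import re

# #Type immediates for each pseudonym, taken from Table 7.
EGOLD_TYPE = {
    "EXP_GOLOMB_UD": 0,
    "EXP_GOLOMB_SD": 1,
    "EXP_GOLOMB_TD": 2,
    "EXP_GOLOMB_MD": 3,
    "EXP_GOLOMB_CD": 4,
}


def to_egold(line: str) -> str:
    """Rewrite one pseudonym instruction as an EGOLD instruction.

    Lines that are not EXP-Golomb pseudonyms are returned unchanged.
    """
    match = re.fullmatch(
        r"\s*(EXP_GOLOMB_[A-Z]D)\s+(\w+)(?:\s*,\s*(\w+))?\s*", line
    )
    if not match or match.group(1) not in EGOLD_TYPE:
        return line
    name, dst, src = match.groups()
    src = src or "ZERO"                   # dummy source when none is given
    return f"EGOLD {dst}, #{EGOLD_TYPE[name]}, {src}"
```

Because the mapping is a fixed table lookup, the five pseudonyms can share one opcode and differ only in the immediate field, which is the saving the text describes.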
VC-1 Decoding

Having described decoding system 200 for CABAC decoding (variable length decoding unit 530a, via CABAC module 580), CAVLC decoding (variable length decoding unit 530b, via CAVLC module 582), MPEG decoding (variable length decoding unit 530c, via MPEG module 578), and EXP-Golomb decoding (variable length decoding unit 530d, via EXP-Golomb module 584), the VC-1 embodiment of decoding system 200, referred to here as variable length decoding unit 530e, is described next. Variable length decoding unit 530e operates according to the operations of count-leading-ones module 574 and count-leading-zeros module 576. VC-1 uses Huffman coding and has more tables. Rather than building and testing these tables (the required bit rate is lower, but the verification cost is higher), the necessary tables are loaded into neighbor context memory 564. The table format is the same as that used for MPEG-2, and the READ, VLC_CLZ, VLC_CLO, and INPSTR instructions are used to decode the bitstream. For example, a particular table can be implemented with the following pseudocode:
//TABLE -1 Picture CBPCY VLC TABLE
VLC_CLZ DST0, #8
CASE DST0
  0: VALUE = 0; BREAK; //USE MOV
  1: VLC_CLZ DST1, #5
     CASE DST1
       1: T = READ(2);
          CASE T
            0: VALUE = 48; BREAK;
            1: VALUE = 56; BREAK;
            2: GO20; BREAK;
            3: VALUE = 1; BREAK;
          CASE_END
       2: VALUE = 2; BREAK;
       3: VLC_CLO DST2, #5
          CASE DST2
            0: VALUE = 28; BREAK;
            1: VALUE = 22; BREAK;
            2: VALUE = 43; BREAK;
            3: VALUE = 30; BREAK;
            4: VALUE = 41; BREAK;
            5: VALUE = 49; BREAK;
          CASE_END
       4: T = READ(1); VALUE = (T)? (READ(1) ? 31 : 54) : 27; BREAK;
       5: VALUE = 6; BREAK;
     CASE_END
  2: VLC_CLZ DST1, #4
     CASE DST1
       1: VALUE = 3; BREAK;
       2: T = READ(1); VALUE = (T)? 19 : 36; BREAK;
       3: T = READ(2);
          CASE T
            0: VALUE = 38; BREAK;
            1: VALUE = 47; BREAK;
            2: VALUE = 59; BREAK;
            3: VALUE = 5; BREAK;
          CASE_END
       4: VALUE = 7; BREAK;
     CASE_END
  3: T = READ(1); VALUE = (T)? 16 : 8; BREAK;
  4: T = READ(1); VALUE = (T)? GO10 : 12; BREAK;
  5: VALUE = 20; BREAK;
  6: VALUE = 44; BREAK;
  7: T = READ(1); VALUE = (T)? 33 : 58; BREAK; //USE SEL??
  8: VALUE = 15; BREAK;
CASE_END
GO10:
INPSTR S1, #3
READ_NCM S2, #0, off+S1>>2
VALUE = S2 & 0x63;
Q = (S2 >> 6) & 0x3;
READ S0, Q
RETURN;
GO20:
INPSTR S1, #4
READ_NCM S2, #0, off+S1>>2
VALUE = S2 & 0x63;
Q = (S2 >> 6) & 0x3;
READ S0, Q
RETURN;

In some embodiments, branch instructions may be used in place of the CASE statements. Thus VC-1, like MPEG-2, has an easily defined grammar. Each symbol in the grammar has a specific method (table), which can be implemented as a shader, as shown by the pseudocode above.

Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit the scope of the invention. Anyone skilled in the art may make minor changes and refinements without departing from the spirit and scope of the invention; the scope of protection is therefore defined by the appended claims.

【Brief Description of the Drawings】
FIG. 1 is a block diagram of an embodiment of a graphics processor system in which the various decoding systems (and methods) can be implemented;
FIG. 2 is a block diagram of an exemplary processing environment in which various embodiments of the decoding system can be implemented;
FIG. 3 is a block diagram of selected components of the exemplary processing environment shown in FIG. 2;
FIG. 4 is a block diagram of the computational core of the exemplary processing environment shown in FIGS. 2 and 3, in which various embodiments of the decoding system can be implemented;
FIG. 5A is a block diagram of selected components of an execution unit of the computational core of FIG. 4, in which various embodiments of the decoding system can be implemented;
FIG. 5B is a block diagram of the execution unit data path, in which various embodiments of the decoding system can be implemented;
FIG. 5C is a block diagram of an embodiment of the decoding system of FIG. 5B, which is applicable to a plurality of coding standards, and further shows an embodiment of the corresponding bitstream buffer;
FIG. 6A is a block diagram of an embodiment of the decoding system of FIG. 5C, used for CABAC decoding;
FIG. 6B is a block diagram of an embodiment of the decoding system of FIG. 6A;
FIG. 6C is a block diagram of embodiments of the context memory structure and associated registers of the decoding system of FIG. 6A;
FIG. 6D shows the macroblock partitioning scheme used by the decoding system of FIG. 6A;
FIG. 6E is a block diagram of an exemplary macroblock decoding mechanism implemented by the decoding system of FIG. 6A;
FIG. 7A is a block diagram of an embodiment of the decoding system of FIG. 5C, used for CABAC decoding; and
FIG. 7B is a block diagram of an embodiment of the table structure used by the decoding system of FIG. 7A.

【Description of Main Reference Numerals】
100 ~ graphics processor system
102 ~ display device
104 ~ display interface unit
106 ~ local memory
110 ~ memory interface unit
114 ~ graphics processing unit
118 ~ PCI-E bus interface unit
122 ~ chipset
124 ~ system memory
126 ~ central processing unit
128 ~ driver software
200 ~ decoding system
202 ~ graphics processor
204 ~ computational core
206 ~ execution unit pool control and vertex/stream cache unit
208 ~ graphics pipeline
302 ~ texture filtering unit
304 ~ pixel packer
306 ~ command stream processor
308 ~ write-back unit
310 ~ texture address generator
402 ~ execution unit input
404a ~ execution unit even output
404b ~ execution unit odd output
406 ~ memory access unit
408 ~ L2 cache
410 ~ memory interface arbiter
412 ~ execution unit pool
504 ~ instruction cache controller
506 ~ thread controller
508 ~ buffer
510 ~ common register file
512 ~ execution unit data path
514 ~ execution unit data path FIFO
516 ~ predicate register file
518 ~ scalar register file
520 ~ data output controller
524 ~ thread task interface
526 ~ register file
530 ~ variable length decoding unit
532 ~ vector floating-point unit
534 ~ vector integer arithmetic logic unit
536 ~ special purpose unit
540 ~ register file
562 ~ SREG stream buffer/DMA engine
562a ~ SREG register
562b ~ bitstream buffer
564 ~ neighbor context memory
568 ~ read neighbor context memory module
570 ~ inspect string module
572 ~ read module
574 ~ count-leading-ones module
576 ~ count-leading-zeros module
578 ~ MPEG module
580 ~ CABAC module
582 ~ CAVLC module
584 ~ Exp-Golomb module
602 ~ state index
604 ~ most probable symbol value
606 ~ code length range
608 ~ code length offset
612 ~ local registers
614 ~ global registers
616 ~ binary string register
620 ~ binarization module
622 ~ get context module
624 ~ binary arithmetic decoding engine
628 ~ destination
630 ~ SRC2
632 ~ SRC1
634 ~ common and thread data
636 ~ stall/reset
638 ~ address
640 ~ data
650 ~ memory module
654 ~ bin index
710 ~ coefficient token module
712 ~ level code module
714 ~ level module
716 ~ level 0 module
718 ~ zero level module
720 ~ run module
722 ~ level array
724 ~ run array
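The table walk used in the VC-1 description (a leading-zero count selecting a CASE branch, followed by short fixed-length reads) can be illustrated in software. The Python sketch below mirrors only the top-level branches 0, 3, 5, 6, 7, and 8 of the CBPCY pseudocode. It assumes that VLC_CLZ counts leading zeros up to the given limit and consumes the terminating 1 bit when one is found; the description does not state this, so the bit patterns here are illustrative and have not been checked against the VC-1 tables. `Bits` and `decode_fragment` are hypothetical names.

```python
class Bits:
    """Hypothetical bit source backed by a string of '0' and '1' characters."""

    def __init__(self, bits: str):
        self.bits = bits
        self.pos = 0

    def read(self, n: int) -> int:
        """Return the next n bits as an unsigned integer (0 if none remain)."""
        chunk = self.bits[self.pos:self.pos + n]
        self.pos += n
        return int(chunk, 2) if chunk else 0

    def clz(self, limit: int) -> int:
        """Count leading zeros, stopping at `limit` or after consuming a 1 bit.

        Assumed semantics for VLC_CLZ; not confirmed by the description.
        """
        count = 0
        while count < limit and self.read(1) == 0:
            count += 1
        return count


def decode_fragment(b: Bits) -> int:
    """Decode one symbol using a subset of the top-level CASE DST0 branches."""
    dst0 = b.clz(8)
    if dst0 == 0:
        return 0
    if dst0 == 3:
        return 16 if b.read(1) else 8
    if dst0 == 5:
        return 20
    if dst0 == 6:
        return 44
    if dst0 == 7:
        return 33 if b.read(1) else 58
    if dst0 == 8:
        return 15
    raise NotImplementedError("branch omitted from this sketch")


for pattern in ("1", "00011", "00010", "000001", "00000000"):
    print(pattern, decode_fragment(Bits(pattern)))
```

Each `if` here plays the role of one CASE label; replacing the chain with explicit conditional jumps corresponds to the branch-instruction variant mentioned in the description.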
Claims (1)
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US81182106P | 2006-06-08 | 2006-06-08 |
Publications (2)
Publication Number | Publication Date |
---|---|
TW200809689A (en) | 2008-02-16 |
TWI354239B (en) | 2011-12-11 |
Family
ID=38899303
Family Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW96120899A TWI344795B (en) | 2006-06-08 | 2007-06-08 | Decoding of context adaptive variable length codes in computational core of programmable graphics processing unit |
TW96120728A TWI354239B (en) | 2006-06-08 | 2007-06-08 | Decoding system unit |
TW096120896A TWI348653B (en) | 2006-06-08 | 2007-06-08 | Decoding of context adaptive binary arithmetic codes in computational core of programmable graphics processing unit |
TW96120726A TWI428850B (en) | 2006-06-08 | 2007-06-08 | Decoding method |
Family Applications Before (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW96120899A TWI344795B (en) | 2006-06-08 | 2007-06-08 | Decoding of context adaptive variable length codes in computational core of programmable graphics processing unit |
Family Applications After (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW096120896A TWI348653B (en) | 2006-06-08 | 2007-06-08 | Decoding of context adaptive binary arithmetic codes in computational core of programmable graphics processing unit |
TW96120726A TWI428850B (en) | 2006-06-08 | 2007-06-08 | Decoding method |
Country Status (2)
Country | Link |
---|---|
CN (4) | CN101087411A (en) |
TW (4) | TWI344795B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI479869B (en) * | 2012-04-03 | 2015-04-01 | Qualcomm Inc | Chroma slice-level qp offset and deblocking |
Families Citing this family (38)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8156410B2 (en) * | 2008-03-05 | 2012-04-10 | Himax Technologies Limited | Fast debugging tool for CRC insertion in MPEG-2 video decoder |
US8686921B2 (en) * | 2008-12-31 | 2014-04-01 | Intel Corporation | Dynamic geometry management of virtual frame buffer for appendable logical displays |
CN101577629B (en) * | 2009-05-14 | 2011-05-25 | 北京邮电大学 | Dynamic allocation method of coding vector based on graph coloring in multicast network |
CN101908200B (en) * | 2009-06-05 | 2012-08-08 | 财团法人资讯工业策进会 | Graphics processing system with power gating function and method |
US8681162B2 (en) * | 2010-10-15 | 2014-03-25 | Via Technologies, Inc. | Systems and methods for video processing |
GB2488159B (en) * | 2011-02-18 | 2017-08-16 | Advanced Risc Mach Ltd | Parallel video decoding |
US9378560B2 (en) | 2011-06-17 | 2016-06-28 | Advanced Micro Devices, Inc. | Real time on-chip texture decompression using shader processors |
US9231616B2 (en) * | 2011-08-05 | 2016-01-05 | Broadcom Corporation | Unified binarization for CABAC/CAVLC entropy coding |
CN103037213B (en) * | 2011-09-28 | 2016-02-17 | 晨星软件研发(深圳)有限公司 | Boolean entropy decoding method of a Boolean entropy decoder, and image playback system |
KR20130050904A (en) | 2011-11-08 | 2013-05-16 | 삼성전자주식회사 | Method and apparatus for arithmetic encoding of video, and method and apparatus for arithmetic decoding of video |
US20130307860A1 (en) * | 2012-03-30 | 2013-11-21 | Mostafa Hagog | Preempting Fixed Function Media Devices |
US9942571B2 (en) * | 2012-05-29 | 2018-04-10 | Hfi Innovations Inc. | Method and apparatus for coding of sample adaptive offset information |
US9196014B2 (en) * | 2012-10-22 | 2015-11-24 | Industrial Technology Research Institute | Buffer clearing apparatus and method for computer graphics |
CN103813177A (en) * | 2012-11-07 | 2014-05-21 | 辉达公司 | System and method for video decoding |
US9947084B2 (en) | 2013-03-08 | 2018-04-17 | Nvidia Corporation | Multiresolution consistent rasterization |
JP6379107B2 (en) * | 2013-05-21 | 2018-08-22 | 株式会社スクウェア・エニックス・ホールディングス | Information processing apparatus, control method therefor, and program |
CN107037984B (en) * | 2013-12-27 | 2019-10-18 | 威盛电子股份有限公司 | Data memory device and its method for writing data |
US9455743B2 (en) * | 2014-05-27 | 2016-09-27 | Qualcomm Incorporated | Dedicated arithmetic encoding instruction |
TW201626218A (en) | 2014-09-16 | 2016-07-16 | 輝達公司 | Techniques for passing dependencies in an API |
US10205957B2 (en) | 2015-01-30 | 2019-02-12 | Mediatek Inc. | Multi-standard video decoder with novel bin decoding |
US10250912B2 (en) * | 2015-02-17 | 2019-04-02 | Mediatek Inc. | Method and apparatus for entropy decoding with arithmetic decoding decoupled from variable-length decoding |
CN104869398B (en) * | 2015-05-21 | 2017-08-22 | 大连理工大学 | A kind of CABAC realized based on CPU+GPU heterogeneous platforms in HEVC parallel method |
GB2542162B (en) * | 2015-09-10 | 2019-07-17 | Imagination Tech Ltd | Trailing or leading digit anticipator |
US9537504B1 (en) * | 2015-09-25 | 2017-01-03 | Intel Corporation | Heterogeneous compression architecture for optimized compression ratio |
US10467006B2 (en) * | 2015-12-20 | 2019-11-05 | Intel Corporation | Permutating vector data scattered in a temporary destination into elements of a destination register based on a permutation factor |
US10375395B2 (en) | 2016-02-24 | 2019-08-06 | Mediatek Inc. | Video processing apparatus for generating count table in external storage device of hardware entropy engine and associated video processing method |
CN106921859A (en) * | 2017-05-05 | 2017-07-04 | 郑州云海信息技术有限公司 | A kind of CABAC entropy coding methods and device based on FPGA |
CN107277505B (en) * | 2017-05-19 | 2020-06-16 | 北京大学 | AVS-2 video decoder device based on software and hardware partition |
CN107242882A (en) * | 2017-06-05 | 2017-10-13 | 上海瓴舸网络科技有限公司 | A kind of B ultrasound shows auxiliary equipment and its control method |
CN110710219B (en) * | 2017-12-08 | 2022-02-11 | 谷歌有限责任公司 | Method and apparatus for context derivation for coefficient coding |
TWI674558B (en) | 2018-06-12 | 2019-10-11 | 財團法人工業技術研究院 | Device and method for processing numerical array data, and color table generation method thereof |
CN109818855B (en) * | 2019-01-14 | 2020-12-25 | 东南大学 | Method for obtaining content by supporting pipeline mode in NDN (named data networking) |
CN110458120B (en) * | 2019-08-15 | 2022-01-04 | 中国水利水电科学研究院 | Method and system for identifying different vehicle types in complex environment |
CN111028135B (en) * | 2019-12-10 | 2023-06-02 | 国网重庆市电力公司电力科学研究院 | Image file repairing method |
CN112582009B (en) * | 2020-12-11 | 2022-06-21 | 武汉新芯集成电路制造有限公司 | Monotonic counter and counting method thereof |
US11748011B2 (en) | 2021-03-31 | 2023-09-05 | Silicon Motion, Inc. | Control method of flash memory controller and associated flash memory controller and storage device |
US11733895B2 (en) | 2021-03-31 | 2023-08-22 | Silicon Motion, Inc. | Control method of flash memory controller and associated flash memory controller and storage device |
CN114816434B (en) * | 2022-06-28 | 2022-10-04 | 之江实验室 | Programmable switching-oriented hardware parser and parser implementation method |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7742544B2 (en) * | 2004-05-21 | 2010-06-22 | Broadcom Corporation | System and method for efficient CABAC clock |
EP1599049A3 (en) * | 2004-05-21 | 2008-04-02 | Broadcom Advanced Compression Group, LLC | Multistandard video decoder |
KR100612015B1 (en) * | 2004-07-22 | 2006-08-11 | 삼성전자주식회사 | Method and apparatus for Context Adaptive Binary Arithmetic coding |
US7800620B2 (en) * | 2004-11-05 | 2010-09-21 | Microsoft Corporation | Optimizing automated shader program construction |
- 2007
- 2007-06-08 CN CN 200710126453 patent/CN101087411A/en active Pending
- 2007-06-08 CN CN 200710110297 patent/CN101072350B/en active Active
- 2007-06-08 TW TW96120899A patent/TWI344795B/en active
- 2007-06-08 TW TW96120728A patent/TWI354239B/en active
- 2007-06-08 TW TW096120896A patent/TWI348653B/en active
- 2007-06-08 CN CN 200710126452 patent/CN101072353B/en active Active
- 2007-06-08 TW TW96120726A patent/TWI428850B/en active
- 2007-06-08 CN CN 200710110295 patent/CN101072349B/en active Active
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI479869B (en) * | 2012-04-03 | 2015-04-01 | Qualcomm Inc | Chroma slice-level qp offset and deblocking |
US9451258B2 (en) | 2012-04-03 | 2016-09-20 | Qualcomm Incorporated | Chroma slice-level QP offset and deblocking |
Also Published As
Publication number | Publication date |
---|---|
CN101072350B (en) | 2012-12-12 |
CN101087411A (en) | 2007-12-12 |
CN101072350A (en) | 2007-11-14 |
CN101072353B (en) | 2013-02-20 |
TWI344795B (en) | 2011-07-01 |
CN101072349B (en) | 2012-10-10 |
TWI348653B (en) | 2011-09-11 |
TW200809689A (en) | 2008-02-16 |
TW200813884A (en) | 2008-03-16 |
TWI428850B (en) | 2014-03-01 |
TW200821982A (en) | 2008-05-16 |
TW200803526A (en) | 2008-01-01 |
CN101072349A (en) | 2007-11-14 |
CN101072353A (en) | 2007-11-14 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
TWI354239B (en) | Decoding system unit | |
US7626518B2 (en) | Decoding systems and methods in computational core of programmable graphics processing unit | |
US7626521B2 (en) | Decoding control of computational core of programmable graphics processing unit | |
US7656326B2 (en) | Decoding of context adaptive binary arithmetic codes in computational core of programmable graphics processing unit | |
US7623049B2 (en) | Decoding of context adaptive variable length codes in computational core of programmable graphics processing unit | |
US9392292B2 (en) | Parallel encoding of bypass binary symbols in CABAC encoder | |
US6842124B2 (en) | Variable length decoder | |
US8520740B2 (en) | Arithmetic decoding acceleration | |
US20140153635A1 (en) | Method, computer program product, and system for multi-threaded video encoding | |
US20080048893A1 (en) | Entropy decoding methods and apparatus | |
US6781529B1 (en) | Methods and apparatuses for variable length encoding | |
US20110235699A1 (en) | Parallel entropy coding | |
Juurlink et al. | Scalable parallel programming applied to H.264/AVC decoding | |
JP4896944B2 (en) | Image decoding device | |
US6781528B1 (en) | Vector handling capable processor and run length encoding | |
Cho et al. | Parallelizing the H.264 decoder on the Cell BE architecture | |
US6707398B1 (en) | Methods and apparatuses for packing bitstreams | |
Jia et al. | An AVS HDTV video decoder architecture employing efficient HW/SW partitioning | |
US6707397B1 (en) | Methods and apparatus for variable length codeword concatenation | |
KR100731640B1 (en) | Apparatus for bitstream processing | |
Golston et al. | C64x VelociTI.2 extensions support media-rich broadband infrastructure and image analysis systems | |
Nolte et al. | Memory efficient programmable processor for bitstream processing and entropy decoding of multiple-standard high-bitrate HDTV video bitstreams | |
Wu et al. | Hardware-assisted syntax decoding model for software AVC/H.264 decoders | |
Choi et al. | Design of an application specific instruction set processor for a universal bitstream codec | |
XIAOHUA | System-on-Chip design of a high performance low power full hardware CABAC encoder in H.264/AVC |