TW200816820A

TW200816820A - VPU with programmable core

Info

Publication number: TW200816820A
Application number: TW096121865A
Authority: TW
Inventors: Jiangming Xu; Brother John; Hussain Zahid
Original assignee: Via Tech Inc
Priority date: 2006-06-16
Filing date: 2007-06-15
Publication date: 2008-04-01
Also published as: CN101068353A; TW200821986A; TWI482117B; CN101083763A; TW200816082A; CN101068353B; TW200803525A; CN101068365A; TWI348654B; TW200803527A; CN101072351B; CN101068364B; CN101072351A; TWI383683B; TWI350109B; TWI444047B; CN101083764A; CN101083764B; CN101083763B; TWI395488B

Abstract

Included are embodiments for processing video data. At least one embodiment includes a programmable Video Processing Unit that includes logic configured to receive video data having a format chosen from at least two formats and logic configured to receive an instruction from an instruction set, the instruction including an indication of the format of the video data. Some embodiments include first parallel logic configured to process the video data according to a first format when the indication is the first format, and second parallel logic configured to process the video data according to a second format when the indication is the second format.

Description


200816820 S3U06-0010I00-TW 24381twf.doc/n

IX. Description of the invention

[Technical field of the invention]

The present disclosure relates to processing video and graphics data. More specifically, the present disclosure provides a video processing unit with a programmable core.

[Prior art]

As computer technology continues to develop, the demands placed on computing devices have risen accordingly. More specifically, many computer applications and/or data streams require video data to be processed, and as video data becomes more complex, the processing requirements for that data increase as well.

Currently, many computing architectures provide a central processing unit (CPU) for processing data, including video and graphics data. Although a CPU can provide adequate processing capability for some video and graphics, the CPU must also process other data. The demands placed on the CPU by complex video and graphics processing may therefore adversely affect the performance of the entire system.

In addition, many computing architectures include one or more execution units (EUs) for processing data. More specifically, in at least one architecture, an EU may be used to process multiple different types of data. As with the CPU, the demands placed on the EU by complex video and graphics data may adversely affect the performance of the entire computing system. Processing complex video and graphics data in the EU may also increase power consumption beyond an acceptable threshold. Furthermore, differing protocols or standards for the data can limit the EU's ability to process video and graphics data. Many current computing architectures also provide 32-bit instructions, which may reduce efficiency and thus affect processing speed. The use of multiple operations in a single component is another need.

Thus, a heretofore unaddressed need exists in the industry to address the aforementioned deficiencies and inadequacies.

[Summary of the invention]

The present invention includes embodiments for processing video data. At least one embodiment includes a programmable video processing unit comprising: logic circuitry for receiving video data whose format is selected from at least two formats; logic circuitry for receiving an instruction from an instruction set, the instruction including an indication field that indicates the format of the video data; first parallel logic circuitry that processes the video data according to a first format when the indication field indicates the first format; and second parallel logic circuitry that processes the video data according to a second format when the indication field indicates the second format. The format of the video data may be one of MPEG-2, VC-1, and H.264.

Another embodiment includes a programmable video processing unit for processing video data in at least two formats, comprising: filter logic circuitry for filtering the video data according to its format; transform logic circuitry for transforming the video data according to its format; and logic circuitry for outputting the video data for subsequent processing. The filter logic circuitry and the transform logic circuitry may operate in parallel.

The present invention also includes embodiments of a method for processing video data. At least one embodiment includes: receiving an instruction from an instruction set; receiving video data whose format is selected from at least two formats; and processing the video data according to the instruction. The instruction includes an identification field that indicates the format of the video data, and the processing step uses a plurality of algorithms according to the identification field.

Other systems, methods, features, and advantages of this disclosure will be or become apparent to one skilled in the art upon examination of the following drawings and detailed description. It is intended that all such additional systems, methods, features, and advantages be included within this description and within the scope of this disclosure.

[Embodiments]

FIG. 1 is an embodiment of a computing architecture for processing video data. As shown in FIG. 1, a computing device may include a pool 146 of execution units (EUs). The pool 146 of execution units may include one or more execution units for executing data in the computing architecture of FIG. 1. The pool 146 of execution units (referred to herein as "EUP 146") may be coupled to the stream cache 116 and receive data from the stream cache 116. EUP 146 may also be coupled to input port 142 and output port 144. Input port 142 may receive data from the EUP controller 118 with cache subsystem. Input port 142 may also receive data from the L2 cache 114 and the post-packer 160. EUP 146 may process the received data and output the processed data to output port 144.

In addition, the EUP controller 118 with cache subsystem may send data to memory access unit (MXU) A 164a and to the triangle and attribute setup unit 134. The L2 cache 114 may also send data to, and receive data from, MXU A 164a. The vertex cache 112 and the stream cache 110 may also communicate with MXU A 164a, as may the memory access port 108. The memory access port 108 may communicate data with the bus interface unit (BIU) 90, memory interface unit (MIU) A 106a, MIU B 106b, MIU C 106c, and MIU D 106d. The memory access port 108 may also be coupled to MXU B 164b.

MXU A 164a is also coupled to the command stream processor (CSP) front end 120 and the CSP back end 128. The CSP front end 120 is coupled to the 3D and state component 122, which is coupled to the EUP controller 118 with cache subsystem. The CSP front end 120 is also coupled to the 2D pre component 124, which is coupled to the 2D first-in, first-out (FIFO) component 126. The CSP front end 120 also communicates data with the clear and type texture processor 130 and the Advanced Encryption System (AES) encryption/decryption component 132. The CSP back end 128 is coupled to the span-tile generator 136.

The triangle and attribute setup unit 134 is coupled to the 3D and state component 122, the EUP controller 118 with cache subsystem, and the span-tile generator 136. The span-tile generator 136 may send data to the ZL1 cache 128. The span-tile generator 136 may also be coupled to ZL1 138, which may send data to the ZL1 cache 128. ZL2 140 may be coupled to the Z (e.g., depth buffer cache) and stencil (ST) cache 148. The Z and ST cache 148 may send and receive data via the write-back unit 162 and may be coupled to the bandwidth (BW) compressor 146. The BW compressor 146 may also be coupled to MXU B 164b, which may be coupled to the texture cache and controller 166. The texture cache and controller 166 may be coupled to the texture filter unit (TFU) 168, which may send data to the post-packer 160. The post-packer 160 may be coupled to the interpolator 158. The pre-packer 156 may be coupled to the interpolator 158 and the texture address generator 150. The write-back unit 162 may be coupled to the 2D pro component 154, the D cache 152, the Z and ST cache 148, the input port 142, and the CSP back end 128.

The embodiment of FIG. 1 processes video data by using EUP 146. More specifically, in at least one embodiment, one or more of the execution units may be used to process video data. While this architecture may be suitable for some applications, it may consume excessive power; in addition, it may have difficulty processing H.264 data.

FIG. 2 is an embodiment of a computing architecture similar to that of FIG. 1 with the addition of a video processing unit (VPU). More specifically, in the embodiment of FIG. 2, a VPU 199 with a programmable core may be provided in the computing architecture of FIG. 1. VPU 199 may be coupled to the CSP front end 120 and to the TFU 168. VPU 199 may serve as a dedicated processor for video data. In addition, VPU 199 may be used to process video data encoded in the Moving Picture Experts Group (MPEG), VC-1, and H.264 protocols.

More specifically, in at least one embodiment, shader code may be executed on one or more of the execution units (EUs) 146. Instructions may be decoded and operands fetched from the registers; the major and minor opcodes may be used to determine the EU to which the operands are routed and the function that operates on those operands. If the operation is of the SAMPLE type (for example, all VPU instructions are of the SAMPLE type), the instruction may be dispatched from EUP 146. Although the operation is performed in VPU 199, it is determined by information in the COMMAND FIFO, and the result (at most 256 bits) may be returned to EUP 146 and the EU registers, using the ThreadID and CRFIndex as the return address.

In addition, the present disclosure includes an instruction set provided by EUP 146 and available to VPU 199. The instructions may be formatted as 64 bits, although this is not required. More specifically, in at least one embodiment, the VPU instruction set may include one or more motion compensation filter (MCF) instructions. In this embodiment, one or more of the following MCF instructions may be present:

SAMPLE_MCF_BLR DST, S#, T#, SRC2, SRC1
SAMPLE_MCF_VC1 DST, S#, T#, SRC2, SRC1
SAMPLE_MCF_H264 DST, S#, T#, SRC2, SRC1

The first 32 bits of SRC1 contain the U and V coordinates, with U in the least significant 16 bits. Because SRC2 may be unused or ignored, SRC2 may hold any value, for example a 32-bit value containing a 4-element filter kernel, each element being a signed 8-bit value as shown below.

SRC2 bits | 31:24 | 23:16 | 15:8 | 7:0
Content | Kernel[3] | Kernel[2] | Kernel[1] | Kernel[0]

Table 2: MCF filter kernel

In addition, the instruction set of VPU 199 includes in-loop deblocking filtering (IDF) instructions, such as one or more of the following:

SAMPLE_IDF_VC1 DST, S#, T#, SRC2, SRC1
SAMPLE_IDF_H264_0 DST, S#, T#, SRC2, SRC1
SAMPLE_IDF_H264_1 DST, S#, T#, SRC2, SRC1
SAMPLE_IDF_H264_2 DST, S#, T#, SRC2, SRC1

For VC-1 IDF operations, TFU 168 may provide 8x4x8-bit (or 4x8x8-bit) data to the filter buffer. For H.264, however, the amount of data delivered by TFU 168 may be controlled by the type of H.264 IDF operation. For the SAMPLE_IDF_H264_0 instruction, the TFU supplies an 8x4x8-bit (or 4x8x8-bit) block of data. For the SAMPLE_IDF_H264_1 instruction, TFU 168 supplies one 4x4x8-bit block of data, and the other 4x4x8-bit block is supplied by the shader (EU) 146 (FIG. 2). With SAMPLE_IDF_H264_2, both 4x4x8-bit blocks may be supplied by the shader (in the EU) 146 rather than by TFU 168.

In addition, the instruction set of VPU 199 includes motion estimation (ME) instructions, which may include instructions such as the following:

SAMPLE_SAD DST, S#, T#, SRC2, SRC1

The above instructions may be mapped to the following major and minor opcodes and take the format described above. Details of the SRC and DST formats are discussed below in the relevant instruction sections.
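As an illustration of the arithmetic behind a motion estimation instruction such as SAMPLE_SAD, the following is a minimal Python sketch of a sum of absolute differences between a reference block and a predicted block. It is not the VPU's implementation: the function names `sad` and `best_match` are hypothetical, blocks are modeled as plain lists of 8-bit samples, and the way the hardware partitions the work into several SAD operations is not modeled.

```python
from typing import Sequence

# A block is a list of rows; each row is a list of 8-bit samples (0..255).
Block = Sequence[Sequence[int]]


def sad(reference: Block, predicted: Block) -> int:
    """Sum of absolute differences between two equally sized blocks."""
    if len(reference) != len(predicted):
        raise ValueError("blocks must have the same number of rows")
    total = 0
    for ref_row, pred_row in zip(reference, predicted):
        if len(ref_row) != len(pred_row):
            raise ValueError("rows must have the same length")
        # Accumulate |reference - predicted| for every sample in the row.
        total += sum(abs(r - p) for r, p in zip(ref_row, pred_row))
    return total


def best_match(reference: Block, candidates: Sequence[Block]) -> int:
    """Index of the candidate with the smallest SAD (first one wins ties)."""
    scores = [sad(reference, candidate) for candidate in candidates]
    return scores.index(min(scores))
```

A motion search would typically evaluate `sad` for several candidate positions and keep the position with the lowest score, which is what the hypothetical `best_match` helper does.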

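Returning to the MCF operands described earlier in this section, the packing of SRC1 and SRC2 can be sketched in Python as follows. This is a sketch under stated assumptions rather than a definitive decoding: the text only says that U occupies the least significant 16 bits of the first 32 bits of SRC1, so placing V in the upper 16 bits is an assumption, as is placing Kernel[0] in bits 7:0 of SRC2. The coordinates are returned as raw unsigned fields because their numeric format is not given here. The function names are hypothetical.

```python
from typing import List, Tuple


def unpack_uv(src1_low32: int) -> Tuple[int, int]:
    """Split the first 32 bits of SRC1 into raw (U, V) fields.

    U is the least significant 16 bits (stated in the text);
    V in the upper 16 bits is an assumption.
    """
    u = src1_low32 & 0xFFFF
    v = (src1_low32 >> 16) & 0xFFFF
    return u, v


def unpack_kernel(src2: int) -> List[int]:
    """Return [Kernel[0], Kernel[1], Kernel[2], Kernel[3]] as signed 8-bit values.

    Assumes Kernel[0] sits in bits 7:0 and Kernel[3] in bits 31:24.
    """
    kernel = []
    for i in range(4):
        byte = (src2 >> (8 * i)) & 0xFF
        # Interpret the byte as a two's-complement signed 8-bit value.
        kernel.append(byte - 256 if byte >= 128 else byte)
    return kernel
```

For example, under these assumptions a SRC2 value of 0xFF7F8001 would decode to the taps 1, -128, 127, and -1 for Kernel[0] through Kernel[3].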
SAMPLE_MC_H264 may be used for the Y plane only; it is not required for the CrCb plane.

Instruction | Comment | Y plane | CrCb plane
SAMPLE_MC_BLR | 8x8x8-bit block from the texture cache | yes | yes
SAMPLE_MC_VC1 | 12x12x8-bit block from the texture cache | (illegible) | yes
SAMPLE_MC_H264 | 12x12x8-bit block from the texture cache | yes | no
SAMPLE_SAD | 8x4x8-bit block from the texture cache; V may have any alignment | yes | yes
SAMPLE_IDF_VC1 | 8x4x8 bits (or 4x8x8 bits) from the texture cache, 32-bit aligned | yes | yes
SAMPLE_IDF_H264_0 | 8x4x8 bits (or 4x8x8 bits) from the texture cache, 32-bit aligned | (illegible) | yes
SAMPLE_IDF_H264_1 | 4x4x8 bits from the texture cache, 32-bit aligned | yes | yes
SAMPLE_IDF_H264_2 | No data from the texture cache | - | -
SAMPLE_TCF_I4x4 | No data from the texture cache | - | -
SAMPLE_TCF_M4x4 | No data from the texture cache | - | -
SAMPLE_TCF_MPEG2 | No data from the texture cache | - | -
SAMPLE_MADD | No data from the texture cache | - | -
SAMPLE_SIMMUL | No data from the texture cache | - | -

Table 7: Data loading for video

In at least one embodiment disclosed herein, the Y plane may use the HSF_Y0Y1Y2Y3_32BPE_VIDEO2 tile format. The CrCb plane contains interleaved Cr and Cb channels and is treated as the HSF_CrCb_16BPE_VIDEO tile format. If an interleaved CbCr plane is not required, the same format as the Y plane may be used for either Cb or Cr.

In addition, the following instructions have been added to the shader instruction set architecture (ISA):

SAMPLE_MCF_BLR DST, S#, T#, SRC2, SRC1
SAMPLE_MCF_VC1 DST, S#, T#, SRC2, SRC1
SAMPLE_MCF_H264 DST, S#, T#, SRC2, SRC1
SAMPLE_IDF_VC1 DST, S#, T#, SRC2, SRC1
SAMPLE_IDF_H264_0 DST, S#, T#, SRC2, SRC1
SAMPLE_IDF_H264_1 DST, S#, T#, SRC2, SRC1
SAMPLE_SAD DST, S#, T#, SRC2, SRC1
SAMPLE_TCF_MPEG2 DST, #ctrl, SRC2, SRC1
SAMPLE_TCF_I4x4 DST, #ctrl, SRC2, SRC1
SAMPLE_TCF_M4x4 DST, #ctrl, SRC2, SRC1
SAMPLE_MADD DST, #ctrl, SRC2, SRC1
SAMPLE_IDF_H264_2 DST, #ctrl, SRC2, SRC1

For SAMPLE_IDF_H264_2, #ctrl should be zero.

SRC1, SRC2, and #ctrl (when available) may be used to form the 512-bit data field in the EU/TAG/TCC interface, as shown in Table 8 below.
[Table 8: layout of the 512-bit data field in the EU/TAG/TCC interface. The table body did not survive extraction. Legible labels include Control_0 through Control_5, SRC1, SRC2 (even), SRC2+1 (odd), IndexA, IndexB, bS, PQUANT, MMODE, CBCR, and matrix elements m20 through m33.]

Referring to Table 8: Tr = transpose; FD = filter direction (vertical ...); bS = boundary strength; bR = bR control; YC = YC bit (YC = 1 for the CbCr plane, YC = 0 for the Y plane); and CEF = chroma edge flag. In addition, when 32 bits (or fewer) are used for SRC1 or SRC2 (with the remainder undefined), a lane select may be specified to reduce register usage.

While the instruction formats are described above, Table 10 below summarizes the instruction operations.

Instruction name | Instruction format | Instruction operation
SAMPLE_MCF_BLR | SAMPLE_MCF_BLR DST, S#, T#, SRC2, SRC1 | MC filter implementation
SAMPLE_MCF_VC1 | SAMPLE_MCF_VC1 DST, S#, T#, SRC2, SRC1 | MC filter implementation for VC-1
SAMPLE_MCF_H264 | SAMPLE_MCF_H264 DST, S#, T#, SRC2, SRC1 | MC filter implementation for H.264
SAMPLE_IDF_VC1 | SAMPLE_IDF_VC1 DST, S#, T#, SRC2, SRC1 | VC-1 deblocking operation
SAMPLE_IDF_H264_0 | SAMPLE_IDF_H264_0 DST, S#, T#, SRC2, SRC1 | H.264 deblocking operation. A 4x4x8 (vertical filter) or 8x4x8 block is provided from the texture cache 166.
SAMPLE_IDF_H264_1 | SAMPLE_IDF_H264_1 DST, S#, T#, SRC2, SRC1 | H.264 operation. One 4x4x8-bit block is provided by the shader and another 4x4x8-bit block by the texture cache 166. This allows an 8x4 (or 4x8) block to be constructed.
SAMPLE_IDF_H264_2 | SAMPLE_IDF_H264_2 DST, #ctrl, SRC2, SRC1 | H.264 deblocking operation. Both 4x4 blocks are provided by the shader to construct an 8x4 block.
SAMPLE_SAD | SAMPLE_SAD DST, S#, T#, SRC2, SRC1 | Performs four sum-of-absolute-differences (SAD) operations on the reference (SRC2) and predicted data.
SAMPLE_TCF_I4x4 | SAMPLE_TCF_I4x4 DST, #ctrl, SRC2, SRC1 | Transform coding implementation
SAMPLE_TCF_M4x4 | SAMPLE_TCF_M4x4 DST, #ctrl, SRC2, SRC1 | Transform coding implementation
SAMPLE_TCF_MPEG2 | SAMPLE_TCF_MPEG2 DST, #ctrl, SRC2, SRC1 | Transform coding implementation
SAMPLE_MADD | SAMPLE_MADD DST, #ctrl, SRC2, SRC1 | See below.
SAMPLE_SIMMUL | SAMPLE_SIMMUL DST, #ctrl, SRC2, SRC1 | Performs scalar matrix multiplication. #ctrl is an 11-bit immediate value. This may be 0 (e.g., the #ctrl signal is ignored). Also see below.

Table 10: Instruction overview

In addition, for SAMPLE_MADD, #ctrl may be an 11-bit immediate value, and the instruction adds two 4x4 matrices (SRC1 and SRC2). Each element of either matrix may be a 16-bit signed integer, and the result (DST) is a 4x4 matrix of 16-bit elements. The matrices may be laid out in the source and destination registers as shown in Table 11 below; this operation may be an individual unit within the VPU. In addition, the SRC1 and #ctrl data are available in cycle 1 and SRC2 in the following cycle, so one operation may be issued every two cycles.

#ctrl[0] indicates whether a saturation (SAT) operation is performed.
#ctrl[1] indicates whether a rounding (R) operation is performed.
#ctrl[2] indicates whether a 1-bit right shift (S) operation is performed.
#ctrl[10:3] is ignored.

Bits | Element
255:240 | M33
239:224 | M32
223:208 | M31
207:192 | M30
191:176 | M23
175:160 | M22
159:144 | M21
143:128 | M20
127:112 | M13
111:96 | M12
95:80 | M11
79:64 | M10
63:48 | M03
47:32 | M02
31:16 | M01
15:0 | M00

Table 11: Registers for the source and destination matrices

In addition, the logic associated with this instruction may include the following:

#Lanes := 16; #Lanewidth := 16;

If (#ctrl[1]) R = 1; ELSE R = 0;
If (#ctrl[2]) S = 1; ELSE S = 0;
IF (#ctrl[0]) SAT = 1; ELSE SAT = 0;
For (I := 0; I < #Lanes; I += 1){
    Base := I * #Lanewidth;
    Top := Base + #Lanewidth - 1;
    Source1[I] := SRC1[Top..Base];
    Source2[I] := SRC2[Top..Base];
    Destination[I] := (Source1[I] + Source2[I] + R) >> S;
    IF (SAT) Destination[I] = MIN(MAX(Destination[I], 0), 255);
    DST[Top..Base] = Destination[I];
}

Referring again to FIG. 9, the SAMPLE_SIMMUL instruction performs scalar matrix multiplication. #ctrl is an 11-bit immediate value, which may be 0 (that is, the #ctrl signal is ignored). This instruction is in the same group as SAMPLE_TCF and SAMPLE_IDF_H264_2. The logic associated with this instruction may include the following:

#Lanes := 16; #Lanewidth := 16;
MMODE = Control_4[17:16];
SM = Control_4[7:0];
SP = Control_4[15:8]; // only the least significant 5 bits are used

For (I := 0; I < #Lanes; I += 1){

Base := I * #Lanewidth;

Top := Base + #Lanewidth - 1;

Source2[I] := SRC2[Top..Base];

Destination[I] := (SM * Source2[I]) >> SP;

DST[Top..Base] = Destination[I];

}

This is implemented using the FIR_FILTER_BLOCK unit in the VPU that is used to perform MCF/TCF. SM is the weight applied to all lanes (for example, W[0] = W[1] = W[2] = W[3] = SM), and Pshift is SP. When this operation is performed, the summation adder in the FIR_FILTER_BLOCK is bypassed, the four results of the 16x8-bit multiplications can be shifted, and the least significant 16 bits of each result are gathered into 16 16-bit results that are passed back to the EU.
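The two lane-wise operations described above (the matrix addition controlled by #ctrl and the scalar-matrix multiplication controlled by SM and SP) can be sketched in software as follows. This is only an illustrative model, not the hardware implementation: the function names `unpack`, `pack`, `matrix_add`, and `scalar_matrix_mul` are hypothetical, registers are modeled as 256-bit Python integers, lanes are assumed to hold signed 16-bit values, and the right shift is assumed to be arithmetic.

```python
LANES = 16        # number of lanes per register, as in the pseudocode
LANE_WIDTH = 16   # bits per lane, as in the pseudocode
MASK = (1 << LANE_WIDTH) - 1


def unpack(reg):
    """Split a 256-bit integer into 16 signed 16-bit lanes (lane 0 = bits 15:0)."""
    lanes = []
    for i in range(LANES):
        v = (reg >> (i * LANE_WIDTH)) & MASK
        # Assumption: lanes are two's-complement signed values.
        lanes.append(v - (1 << LANE_WIDTH) if v & 0x8000 else v)
    return lanes


def pack(lanes):
    """Pack 16 lane values back into a 256-bit integer, truncating to 16 bits each."""
    reg = 0
    for i, v in enumerate(lanes):
        reg |= (v & MASK) << (i * LANE_WIDTH)
    return reg


def matrix_add(src1, src2, ctrl):
    """Hypothetical model of the lane-wise add with rounding, shift, and saturation."""
    sat = ctrl & 1          # #ctrl[0]: saturate to 0..255
    r = (ctrl >> 1) & 1     # #ctrl[1]: add rounding term
    s = (ctrl >> 2) & 1     # #ctrl[2]: 1-bit right shift
    out = []
    for a, b in zip(unpack(src1), unpack(src2)):
        d = (a + b + r) >> s
        if sat:
            d = min(max(d, 0), 255)
        out.append(d)
    return pack(out)


def scalar_matrix_mul(src2, sm, sp):
    """Hypothetical model of the scalar multiply: every lane is scaled by SM, then shifted by SP."""
    sp &= 0x1F  # only the least significant 5 bits of SP are used
    return pack([(sm * v) >> sp for v in unpack(src2)])
```

For example, `matrix_add(pack([3] * 16), pack([4] * 16), 0b110)` rounds and halves each lane sum, and setting bit 0 of `ctrl` clamps each lane to the 0..255 range used for 8-bit pixel values.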
FIG. 3 is an embodiment of a flowchart illustrating a process for processing video data in a computing architecture such as that of FIG. 2. More specifically, as illustrated in the embodiment of FIG. 3, the command stream processor can send data and instructions to the EUP 146. The EUP 146 can accordingly read the instructions and process the received data. The EUP 146 can then send the instructions, the processed data, and data from the EUP texture address generator (TAG) interface 242 to the texture address generator (TAG) 150. The TAG 150 can be used to generate addresses for the processed data. The TAG 150 can then send the data and instructions to the texture cache controller (TCC) 166. The TCC 166 can be used to cache data for the texture filter unit (TFU) 168. The TFU 168 can filter the received data in accordance with the received instructions and send the filtered data to the video programmable unit (VPU) 199. The VPU 199 can process the received data in accordance with the received instructions and send the processed data to the postpacker (PSP) 160. The PSP 160 can collect pixel packets from various components such as the TFU 168. If a tile is partially full, the PSP 160 can pack a plurality of tiles and send the tiles back to the EUP 146 using a specific identifier that was sent down the pipeline.

FIG. 4A is an embodiment of a functional flow diagram illustrating data flow in a computing device, such as a computing device having the computing architecture of FIG. 2. As illustrated in the embodiment of FIG. 4A, an encrypted data stream can be sent to a decryption component 236 on the CSP 120, 128. In at least one embodiment, the encrypted bitstream can be decrypted and written back to video memory. The decrypted video can then be decoded using variable length decoder (VLD) hardware. The decryption component 236 can decrypt the received bitstream to form a coded bitstream 238. The coded bitstream 238 can be sent to a VLD, Huffman decoder, context adaptive variable length coding (CAVLC) decoder, and/or context based binary arithmetic coder (CABAC) 240 (referred to herein as the "decoder"). The decoder 240 decodes the received bitstream and sends the decoded bitstream to a DirectX Video Acceleration (DXVA) data structure 242. In addition, the data received at the DXVA data structure 242 includes the external MPEG-2 VLD inverse scan, inverse quantization, and inverse DC prediction, as well as the external VC-1 VLD inverse scan, inverse quantization, and inverse DC/AC prediction. This data can then be captured in the DXVA data structure 242 via a picture header 244, memory buffer 0 (MB0) 246a, MB1 246b, MB2 246c, ..., MBN 246n, and so on. The data can then proceed to jump blocks 250, 252, and 254, to continue in FIG. 4B and FIG. 4C.

FIG. 4B is a continuation of the functional flow diagram of FIG. 4A. As shown, data from jump blocks 250, 252, and 254 of FIG. 4A is received at the inverse scan/inverse Q component 264 and the inverse DC/AC prediction component 262. This data is processed and sent to switch 265. Switch 265 determines, by way of the Intra/Inter input, which data is sent, and sends the selected data to jump block 270. In addition, data from jump block 260 is sent to the coded pattern block reconstruction component 266.

FIG. 4C is a continuation of the functional flow diagrams of FIG. 4A and FIG. 4B. As shown, data from jump blocks 272, 274 (FIG. 4A) is received at filter component 280. This data is filtered by the MC filter 282 according to any of a plurality of protocols. More specifically, if the data is received in MPEG-2 format, the data is constructed at 1/2-pixel offsets. If the data is received in VC-1 format, a 4-tap filter is used. On the other hand, if the data is received in H.264 format, a 6-tap filter can be used. The filtered data is then sent to the reconstructed reference component 284, and data associated with filter component 280 is sent to switch component 288. Switch component 288 also receives Intra/Inter data, and based on that data the switch component can determine which data is sent to adder 298.

In addition, the inverse transform component 296 receives data from the coded pattern block reconstruction component 286 and receives data from switch 265 (FIG. 4B) via jump block 276. The inverse transform component 296 performs an 8x8 inverse discrete cosine transform (IDCT) for MPEG-2 data, 8x8, 8x4, 4x8, and/or 4x4 integer transforms for VC-1 data, and a 4x4 integer transform for H.264 data. Depending on the transform to be performed, this data is sent to adder 298.

Adder 298 sums the data from the inverse transform component 296 and switch component 288 and sends the summed data to the in-loop filter. The in-loop filter filters the received data and sends the filtered data to the reconstructed frame component 290. The reconstructed frame component 290 sends data to the reconstructed reference component 284. The reconstructed frame component 290 can also send data to the deblocking and deringing filter 292, which can send the filtered data to the de-interlacing component 294 for de-interlacing. This data can then be provided for display.

FIG. 5A is a functional block diagram illustrating an embodiment of components that can be used to provide motion compensation (MC) and/or discrete cosine transform (DCT) operations in a VPU, such as in the computing architecture of FIG. 2. More specifically, as illustrated in the embodiment of FIG. 5A, bus A can be used to send 16-bit data to input port b of PE 3 314d. Bus A also sends data to Z^-1 delay component 300, which sends 16-bit data to the second input of PE 2 314c. Bus A also sends this data to Z^-1 delay component 302 to send the data to PE 1 314b. This data is also sent to Z^-1 delay component 304, from which it proceeds to PE 0 314a and Z^-1 delay component 306. After passing through Z^-1 delay component 306, the low 8 bits of the bus A data are sent to PE 0 314a. This data is delayed by Z^-1 306 and sent to PE 1 314b and a Z^-1 delay component. After reaching Z^-1 delay component 310, the low 8 bits of this data are sent to PE 2 314c and Z^-1 delay component 312; after reaching Z^-1 delay component 312, the low 8 bits of this data are sent to PE 3 314d. In addition, bus B sends 64-bit data to each of PE 3 314d, PE 2 314c, PE 1 314b, and PE 0 314a.

Processing element 0 (PE 0) 314a can facilitate filtering of the received data. More specifically, a PE can be one element of a FIR filter. When PE 0 314a, PE 1 314b, PE 2 314c, and PE 3 314d are combined with adder 330, they can form a 4-tap/8-tap FIR filter. A portion of the data is first sent to delay component 316. Multiplexer 318 selects the data to output according to the field input response component (Field Input

Response, FIR) input received at the select port of multiplexer 318, and this data is sent from multiplexer 318 to adder 330. Similarly, data from PE 1 314b is sent to multiplexer 322, some of which is first received at Z^-2 delay component 320. Multiplexer 322 selects from the received data according to the received FIR input, and the selected data is sent to adder 330. Data from PE 2 314c is sent to multiplexer 326, some of which is first sent to Z^-1 delay component 324. The select input selects the data to be sent to adder 330. Data from PE 3 314d is sent to adder 330.

Also input to adder 330 is the feedback loop from N shifter 332. This data is received at multiplexer 328 via Z^-1 delay component 326. Rounding data is also received at multiplexer 328. Multiplexer 328 selects from the received data according to the wider input at its select port. Multiplexer 328 sends the selected data to adder 330. Adder 330 sums the received data and sends the sum to N shifter 332, and the 16-bit shifted data is sent to the output.

FIG. 5B is a continuation of the diagram of FIG. 5A. More specifically, as illustrated in the embodiment of FIG. 5B, data from memory buffers 340a, 340b, 340c, and 340d is sent to multiplexer 342a. Multiplexer 342a sends 16-bit data to jump blocks 344a and 346a. Similarly, multiplexer 342b receives data from memory buffers 340b, 340c, 340d, and 340e and sends data to jump blocks 344b and 346b; multiplexer 342c receives data from 340c, 340d, 340e, and 340f and sends data to 344c and 346c; multiplexer 342d receives data from 340d, 340e, 340f, and 340g and sends data to jump blocks 344d and 346d; multiplexer 342e receives data from 340e, 340f, 340g, and 340h and sends data to 344e and 346e; multiplexer 342f receives data from 340f, 340g, 340h, and 340i and sends data to 344f and 346f; multiplexer 342g receives data from 340g, 340h, 340i, and 340j and sends data to jump blocks 344g and 346g; multiplexer 342h receives data from 340h, 340i, 340j, and 340k and sends data to 344h and 346h; and multiplexer 342i receives data from 340i, 340j, 340k, and 340l and sends data to jump blocks 344i and 346i.

FIG. 5C is a continuation of the diagrams of FIG. 5A and FIG. 5B. More specifically, data from multiplexer 342a (via jump block 348a) is sent to memory buffer B, slot 350a; data from multiplexer 342b (via jump block 348b) is sent to memory buffer B, slot 350b; data from multiplexer 342c (via jump block 348c) is sent to memory buffer B, slot 350c; data from multiplexer 342d (via jump block 348d) is sent to memory buffer B, slot 350d; data from multiplexer 342e (via jump block 348e) is sent to memory buffer B, slot 350e; data from multiplexer 342f (via jump block 348f) is sent to memory buffer B, slot 350f; data from multiplexer 342g (via jump block 348g) is sent to memory buffer B, slot 350g; data from multiplexer 342h (via jump block 348h) is sent to memory buffer B, slot 350h; and data from multiplexer 342i (via jump block 348i) is sent to memory buffer B, slot 350i.

Similarly, data from jump blocks 362j-362r (from FIG. 5D, discussed below) is sent to the transpose network 360. The transpose network 360 transposes the received data and sends it to memory buffer B, and memory buffer B sends data to jump blocks 366j-366r.

FIG. 5D is a continuation of the diagrams of FIGS. 5A-5C. More specifically, data is received at multiplexer 369a from jump block 368a (FIG. 5B, via multiplexer 342a) and jump block 368j (FIG. 5C, via memory buffer B). This data is selected by the vert signal and sent via bus A (see FIG. 5A) to FIR filter block 0 370a. Similarly, multiplexers 369b-369i receive data from jump blocks 368b-368i and 368k-368r. This data is sent to FIR filter blocks 370b-370i and processed as described with respect to FIG. 5A. Data output from FIR filter block 0 370a is sent to jump blocks 372b and 372j; FIR filter block 370b outputs to jump blocks 372c and 372k; FIR filter block 370c outputs to jump blocks 372d and 372l; FIR filter block 370d outputs to jump blocks 372e and 372m; FIR filter block 370e outputs to jump blocks 372f and 372n; FIR filter block 370f outputs to jump blocks 372g and 372o; FIR filter block 370g outputs to jump blocks 372h and 372p; FIR filter block 370h outputs to jump blocks 372i and 372q; and FIR filter block 370i outputs to jump blocks 372j and 372r. As discussed above, data from jump blocks 372j-372r is received by the transpose network 360 of FIG. 5C. Jump blocks 372b-372j continue in FIG. 5E.

FIG. 5E is a continuation of the diagrams of FIGS. 5A-5D. More specifically, as illustrated in the embodiment of FIG. 5E, data from jump block 376b (via FIR filter block 370a of FIG. 5D) is sent to memory buffer C, slot 380b. Similarly, data from jump block 376c (via FIR filter block 370b of FIG. 5D) is sent to memory buffer C, slot 380c; data from jump block 376d (via FIR filter block 370c) is sent to memory buffer C, slot 380d; data from jump block 376e (via FIR filter block 370d) is sent to memory buffer C, slot 380e; data from jump block 376f (via FIR filter block 370e) is sent to memory buffer C, slot 380f; data from jump block 376g (via FIR filter block 370f) is sent to memory buffer C, slot 380g; data from jump block 376h (via FIR filter block 370g) is sent to memory buffer C, slot 380h; data from jump block 376i (via FIR filter block 370h) is sent to memory buffer C, slot 380i; and data from jump block 376j (via FIR filter block 370i) is sent to memory buffer C, slot 380j.

Multiplexer 382a receives data from memory buffer C, slots 380b, 380c, and 380d; multiplexer 382b receives data from memory buffer C, slots 380d, 380e, and 380f; multiplexer 382c receives data from memory buffer C, slots 380f, 380g, and 380h; and multiplexer 382d receives data from memory buffer C, slots 380h, 380i, and 380j. Once the data is received, multiplexers 382a-382d send the data to ALUs 384a-384d. The adders 384a-384d receive this data and the value "1", process the received data, and send the processed data to shifters 386a-386d, respectively. Shifters 386a-386d shift the received data and send the shifted data to Z blocks 388a-388d, and the data is then sent from Z blocks 388a-388d to multiplexers 390a-390d, respectively.

In addition, Z block 388a receives data from jump block 376b and sends data to multiplexer 390a; Z block 388b receives data from jump block 376c and sends data to multiplexer 390b; Z block 388c receives data from jump block 376d and sends data to multiplexer 390c; and Z block 388d receives data from a jump block and sends data to multiplexer 390d. Multiplexers 390a-390d also receive a select input and send the selected data to the output.

FIG. 5F is an embodiment of an overall diagram of the components of FIGS. 5A-5E. More specifically, as illustrated in the embodiment of FIG. 5F, data is received at memory buffer A 340. This data is multiplexed at multiplexer 342 with other data from memory buffer A 340. Multiplexer 342 selects data and sends the selected data to memory buffer B 350. Memory buffer B 350 also receives data from the transpose network 360. Memory buffer B 350 sends data

to multiplexer 369, which also receives data from multiplexer 342. Multiplexer 369 selects data and sends the selected data to FIR filter 370. The FIR filter filters the received data and sends the filtered data to memory buffer C 380, Z component 388, and the transpose network 360. Memory buffer C 380 sends data to multiplexer 382, which selects from the data received from memory buffer C 380. The selected data is sent to ALU 384, which computes a result from the received data and sends the computed data to shifter 386. The shifted data is then sent to multiplexer 390, which also receives data from Z component 388. Multiplexer 390 selects a result and sends this result to the output.

The components shown in FIG. 5F can be used to provide motion compensation (MC) and/or discrete cosine transform (DCT) operations. More specifically, depending on

the particular operation and data format, the data can pass through the components of FIGS. 5A-5F multiple times in a recursive operation to achieve the desired result. In addition, depending on the particular operation and data format, data can be received from the EU 146 and/or the TFU 168.

As a non-limiting example, in actual operation the components can be used to receive an indication of the operation to be performed (for example, motion compensation, discrete cosine transform, and so on). In addition, motion compensation (MC) data can, for example, be converted to a pixel format. As discussed in more detail below, other operations or other data can use the same components in the same or a different manner. In addition, the multiplier array can be used as an array of multipliers to perform 16 16-bit multiplications and/or as a vector or matrix multiplier.

FIG. 6 is a functional block diagram of a pixel processing engine that can be used in a computing architecture, such as the computing architecture of FIG. 2. More specifically, as illustrated in the embodiment of FIG. 6, bus A (before the shift register) and bus B (see FIG. 5A) send 16-bit data to multiplexer 400. A negate signal from FIR filter 370 is received at the select port of multiplexer 400, which selects one item of 16-bit data and sends this data to multiplexer 406. In addition, multiplexer 402 can be used to receive bus A data (after the shift register) and zero data. Multiplexer 402 can select the desired result according to the 6-tap data at its select port, and this 16-bit result can be sent to the 16-bit unsigned adder 404. The 16-bit unsigned adder 404 can also be used to receive data from bus A (before the shift register).

The 16-bit unsigned adder 404 can sum the received data and send the result to multiplexer 406. Multiplexer 406 can be used to select from the received data according to the inverted 6-tap data at its select port. The selected data can be sent to the 16x8 multiplier 410, which can also receive mode data. The 24-bit result can then be sent to shifter 412 to provide the shifted result.

FIG. 7A is a functional block diagram of components that can be used in a VC-1 in-loop filter, such as in the computing architecture of FIG. 2. As illustrated in the embodiment of FIG. 7A, multiplexer 420 can receive a "1" value and a "0" value at its input ports. Multiplexer 420 can also receive, as a select input, whether the absolute value of A0 is less than PQUANT. Similarly, multiplexer 422 can receive a "1" value and a "0" value, as well as whether A3 is less than the absolute value of A0 490c. Multiplexer 424 can receive a "1" value and a "0" value as inputs, and whether the clip value is not equal to 0 (from shifter 468 of FIG. 7C) as a select input. In addition, data output from multiplexer 420 can be sent to logical OR gate 426, which can send data to multiplexer 428. Multiplexer 428 can also receive filter_other_3 data as an input. More specifically, a filter_other_3 signal can be generated as shown in FIG. 7A. If this signal is nonzero, it indicates that the other

three rows of pixels need to be filtered; otherwise, this 4x4 block may be left unfiltered (unmodified). Multiplexer 428 selects the output data according to the processing pixel 3 data received at its select input.

FIG. 7B is a continuation of the diagram of FIG. 7A. More specifically, as illustrated in the embodiment of FIG. 7B, absolute value component 430 receives the 9-bit input A1 490a (from FIG. 7D), and absolute value component 432 receives the 9-bit input A2 490b (from FIG. 7D). After the absolute values of the received data are computed, minimum component 434 determines the minimum of the received data and sends this data, as output A3, to the 2's complement component 436. The 2's complement component 436 computes the 2's complement of the received data and sends this data to subtraction component 438. Subtraction component 438 subtracts this data from input data A0 490c (from FIG. 7D). The result is then sent to a shifter, which shifts the result left by two bits and sends it to adder 442. The unshifted output is also input to adder 442, so that the circuit can perform a multiply-by-5 operation.

Adder 442 sums the received data and sends the result to shifter 444. Shifter 444 shifts the received data right by three bits and sends the data to clamp component 446. Clamp component 446 also receives the clip value (from shifter 468, FIG. 7C) and sends the result to the output. It should be noted that the result of the filter may be negative or greater than 255. Clamp component 446 can therefore be used to clamp the result to an unsigned 8-bit value. Thus, if the input d is negative, d is set to 0. If d > clip, d can be set to clip.

FIG. 7C is a continuation of the diagrams of FIG. 7A and FIG. 7B. As in the embodiment of FIG. 7C, P1 data 450a, P5 data 450e, and P3 data 450c are sent to multiplexer 452. Multiplexer 452 receives a select input and selects the data to send to subtraction component 460. Multiplexer 452 also sends output data to the select input of multiplexer 454.

Multiplexer 454 also receives input data from P4 450d, P8 450h, and P6 450f. Multiplexer 454 sends output data to subtraction component 460. Subtraction component 460 subtracts the received data and sends the result to shifter 466. Shifter 466 shifts the received data left by one bit and sends this result to jump block 474.

Similarly, multiplexer 456 receives inputs P2 450b, P3 450c, and P4 450d. Multiplexer 456 receives a select input from multiplexer 454 and sends the selected data to subtraction component 464. Multiplexer 458 receives a select input from multiplexer 456 and receives input data from P3 450c, P7 450g, and P5 450e. Multiplexer 458 sends output data to subtraction component 464. Subtraction component 464 subtracts the received data and sends this data to shifter 470 and adder 472. Shifter 470 shifts the received data left by two bits and sends the shifted data to adder 472. Adder 472 sums the received data and sends the result to jump block 480.

In addition, subtraction component 462 receives data from P4 450d and P5 450e, subtracts the received data, and sends the result to shifter 468. Shifter 468 shifts the received data right by one bit and outputs this data as the clip value, which is input to clamp component 446 and multiplexer 424. In addition, P4 450d is sent to jump block 476, and P5 450e data is sent to jump block 478.

FIG. 7D is a continuation of the diagrams of FIGS. 7A-7C. More specifically, as in the embodiment of FIG. 7D, subtraction component 486 receives data from jump block 482 and jump block 484. Subtraction component 486 subtracts the received data and sends the result to shifter 488. Shifter 488 shifts the received data right by three bits and sends the result to A1 490a, A2 490b, and A0 490c. In addition, multiplexer 496 receives the input data "0" and "d". This operation may include:

    If (do_filter) {
        P4[I] = P4[I] - D[I]
        P5[I] = P5[I] + D[I]
    }
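The conditional update above, together with the multiply-by-5 and clamp stages described for the filter, can be sketched in Python as follows. This is only an illustrative model under stated assumptions: the function names (`times5`, `clamp_d`, `apply_update`) are hypothetical, a multiplexer is modeled as a plain conditional, and the value fed into the multiply-by-5 path is left to the caller rather than derived from the figures.

```python
def times5(x):
    """Multiply by 5 using a left shift by two bits plus an add (shifter feeding adder 442)."""
    return (x << 2) + x


def clamp_d(d, clip):
    """Clamp d to [0, clip]: negative values become 0, values above clip become clip."""
    if d < 0:
        return 0
    if d > clip:
        return clip
    return d


def apply_update(p4, p5, d, do_filter):
    """Model multiplexers 496 and 498: pass d when do_filter is set, otherwise pass 0."""
    selected = d if do_filter else 0
    # Subtraction component 500 updates P4; adder 502 updates P5.
    return p4 - selected, p5 + selected
```

For example, `apply_update(120, 100, clamp_d(times5(9) >> 3, 10), True)` scales 9 by 5, shifts right by three bits, clamps against a clip value of 10, and then moves P4 and P5 toward each other by the resulting amount.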
The multiplexer 496 selects the desired result according to the do_filter select input and sends the result to subtraction component 500. Subtraction component 500 also receives data from jump block 492 (via jump block 476, Figure 7C), subtracts the received data, and sends the result to P4 450d. Multiplexer 498 also receives "0" and "d" as inputs and do_filter as the select input. Multiplexer 498 multiplexes this data and sends the result to adder 502. Adder 502 also receives data from jump block 494 (via jump block 478, Figure 7C), adds the received inputs, and sends the result to P5 450e.

Figure 8 is a block diagram of logic blocks that can be used to perform a sum of absolute differences (SAD) calculation in a computing architecture, such as the computing architecture of Figure 2. More specifically, in the embodiment of Figure 8, component 504 receives a portion of 32-bit data A[31:0] and a portion of 32-bit data B. Component 504 provides its output to adder 512 by computing {C,S} ← A − B and, if (C), S = Not(S) + 1. Similarly, component 506 receives the A data and the B data and sends its output to adder 512 based on a determination similar to that of component 504, except that component 506 receives the [23:16] bit portion of the A and B data, whereas component 504 receives the [31:24] bit portion. Similarly, component 508 receives the [15:8] bit portion of the data, performs a calculation similar to that of components 504 and 506, and sends the result to adder 512. Component 510 receives the [7:0] bit portion of the data, performs a calculation similar to that of components 504, 506, and 508, and sends the result to adder 512.
In addition, components 514, 516, 518, and 520 receive the 32-bit portion of data A and data B corresponding to bits [63:32] (as opposed to the [31:0] bit portion received at components 504-510). More specifically, component 514 receives the [31:24] bit portion of data A and data B. Component 514 performs a calculation similar to that discussed above and sends an 8-bit result to adder 522. Similarly, component 516 receives the [23:16] bit portion of the data, performs a similar calculation, and sends the resulting data to adder 522. Component 518 receives the [15:8] bit portion of data A and data B, processes the received data as described above, and sends the result to adder 522. Component 520 receives the [7:0] bit portion of data A and data B, processes the received data as discussed above, and sends the result to adder 522.

Similarly, components 524-530 receive the 32 bits of the A data and the B data corresponding to bits [95:64]. More specifically, component 524 receives bits [31:24], component 526 receives bits [23:16], component 528 receives bits [15:8], and component 530 receives bits [7:0]. Once this data is received, components 524-530 can process it as described above, and the processed data can then be sent to adder 532. Similarly, components 534-540 receive the 32 bits of the A data and the B data corresponding to bits [127:96]. More specifically, component 534 receives the [31:24] bit portion of the A and B data, component 536 receives the [23:16] bit portion, component 538 receives the [15:8] bit portion, and component 540 receives the [7:0] bit portion. The received data is processed as discussed above and sent to adder 542. In addition, adders 512, 522, 532, and 542 add the received data and send 10-bit results to adder 544. Adder 544 adds the received data and sends 12-bit data to the output.
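As a rough software model of Figure 8, the sketch below splits two 128-bit operands into sixteen byte lanes, takes an absolute difference per lane using the subtract-then-negate rule described for component 504, sums each group of four lanes (the role of adders 512, 522, 532, and 542), and then sums the four partial results (the role of adder 544). The function names and the use of Python integers to hold 128-bit words are assumptions made for illustration; they are not part of the described hardware.

```python
def abs_diff_byte(a, b):
    """|a - b| for 8-bit inputs: {C, S} <- a - b; if C is set, S = not(S) + 1."""
    diff = (a - b) & 0x1FF      # 9-bit result; bit 8 acts as the borrow flag C
    c = diff >> 8
    s = diff & 0xFF
    if c:
        s = (~s + 1) & 0xFF     # two's complement negate within 8 bits
    return s


def sad128(a, b):
    """Sum of absolute differences over the sixteen byte lanes of two 128-bit words."""
    partial_sums = []
    for word in range(4):                   # four 32-bit groups
        partial = 0
        for lane in range(4):               # four byte lanes per group
            shift = 32 * word + 8 * lane
            partial += abs_diff_byte((a >> shift) & 0xFF, (b >> shift) & 0xFF)
        partial_sums.append(partial)        # at most 4 * 255 = 1020, fits in 10 bits
    return sum(partial_sums)                # at most 4080, fits in 12 bits
```

The bit widths in the comments match the 10-bit partial sums and 12-bit total mentioned in the text: four byte differences cannot exceed 1020, and four such partial sums cannot exceed 4080.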
Figure 9 is a flowchart of another embodiment of a process, similar to that shown in Figure 8, that can be used to perform a sum of absolute differences (SAD) calculation. More specifically, in the embodiment of Figure 9, i is defined as the block size BlkSize and suma is initialized to "0" (block 550). It is first determined whether i is greater than "0" (block 552). If i is greater than "0", then vecx[i] = Tablex[i], vecy[i] = Tabley[i], vectx = mv_x + vecx[i], and vecty = mv_y + vecy[i] (block 554). An address can then be calculated using vectx and vecty, and 4x4 memory data (byte aligned) can be fetched from PredImage (block 556). The 128-bit prediction data can be sent to SAD 44 (see Figure 8), as illustrated in block 558. In addition, block 560 can receive block data and calculate an address. At block 560, 4x4 memory data can also be fetched from RefImage and byte aligned. The 128-bit Ref[i] data can then be sent to SAD 44 (block 558). The sum value can be sent from SAD 44 to block 562, where the running sum suma is increased and i is decreased by "1". It can then be determined whether suma is greater than a threshold (block 564). If so, the process may stop. If, on the other hand, suma is not greater than the threshold, the process may return to block 552 to determine whether i is greater than 0. If i is not greater than 0, the process may end.
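The loop of Figure 9 can be sketched as follows, under several simplifying assumptions: the address calculation from vectx and vecty and the fetches from PredImage and RefImage are replaced by two pre-fetched lists of 16-byte blocks, the running sum is assumed to grow by each block's SAD value, and the names `sad_bytes` and `accumulate_sad` are hypothetical.

```python
def sad_bytes(pred, ref):
    """Sum of absolute differences between two equal-length byte sequences."""
    return sum(abs(p - r) for p, r in zip(pred, ref))


def accumulate_sad(pred_blocks, ref_blocks, threshold):
    """Accumulate per-block SAD values, stopping early once the sum exceeds threshold.

    Returns (suma, stopped_early).
    """
    i = len(pred_blocks)        # plays the role of BlkSize (block 550)
    suma = 0
    while i > 0:                # block 552
        suma += sad_bytes(pred_blocks[i - 1], ref_blocks[i - 1])  # blocks 558 and 562
        i -= 1
        if suma > threshold:    # block 564
            return suma, True
    return suma, False
```

The early exit is the point of the threshold test: once the accumulated difference is already too large for the candidate to be useful, the remaining blocks need not be fetched or compared.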

Figure 10A is a block diagram of a plurality of components that can be used in a deblocking operation, such as may be performed in the computer architecture of Figure 2. In the embodiment of Figure 10A, ALU 580 receives input data p2 and p0 and sends data to absolute value component 586. Absolute value component 586 calculates the absolute value of the received data and outputs data ap. Decision component 590 determines whether ap is less than β and sends data to jump block 596. ALU 580 also sends data to jump block 594. Similarly, ALU 582 receives data from q0 and q2. After calculating a result, ALU 582 sends data to absolute value component 588, which determines the absolute value of the received data and sends aq to decision component 592. Decision component 592 determines whether aq is less than β and sends data to jump block 598. ALU 600 receives data from q0 and p0, calculates a result, and sends the result to absolute value component 606. Absolute value component 606 determines the absolute value of the received data and sends it to decision component 612. Decision component 612 determines whether the received value is less than α and sends the result to AND gate 620.

ALU 602 receives data from p0 and p1, calculates a result, and sends the result to absolute value component 608. Absolute value component 608 determines the absolute value of the received data and sends this value to decision component 614. Decision component 614 determines whether the received data is less than β and sends the result to AND gate 620. ALU 604 receives data from q0 and q1, calculates a result, and sends the result to absolute value component 610. Absolute value component 610 determines the absolute value of the received data and sends the result to decision component 616. Decision component 616 determines whether the received data is less than β and sends the result to AND gate 620. In addition, AND gate 620 receives data from decision component 618, which receives bS data and determines whether this data is not equal to zero.

Figure 10B is a continuation of the diagram of Figure 10A. More specifically, ALU 622 receives data from p1 and q1, calculates a result, and sends data to ALU 624. ALU 624 also receives data from jump block 646 (via ALU 580 of Figure 10A) and 4-bit data at its carry input. ALU 624 then calculates a result and sends it to shifter 626, which shifts the received data to the right by three bits.
The shifter 626 then sends the data to the clip3 component 628, which also receives data from jump block 630 (via ALU 744 of Figure 10D, described in more detail below). The clip3 component 628 sends data to multiplexer 634 and to NOT gate 632. NOT gate 632 inverts the received data and sends the inverted data to multiplexer 634. Multiplexer 634 also receives tc0 data at its select input and sends the selected data to ALU 636. ALU 636 also receives data from multiplexer 640. Multiplexer 640 receives data from q0 and p0 and receives its select input from !left_top. The carry input of ALU 636 receives data from multiplexer 642. Multiplexer 642 receives "1" and "0" as well as !left_top data. ALU 636 sends its result to SAT(0,255) 638, which sends data to jump block 644 (continued at multiplexer 790, Figure 10E).

In addition, ALU 648 receives input data together with one bit of data at its select input. ALU 648 calculates a result and sends this data to shifter 650. Shifter 650 shifts the received data one bit to the right and sends the shifted data to ALU 652. Similarly, multiplexer 656 receives data from p1 and q1, with !left_top as its select input. Multiplexer 656 determines a result and sends it to shifter 658. Shifter 658 shifts the received data one bit to the left and sends the shifted data to ALU 652, which calculates a result and sends data to ALU 662. ALU 662 also receives data from multiplexer 660, which receives q2 and p2 as well as data from jump block 680 (via NOT gate 802 of Figure 10E). ALU 662 calculates a result and sends this data to shifter 664, which shifts the received data one bit to the right and sends the shifted data to the clip3 component 668.
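The clip3, NOT gate, carry-input, and SAT(0,255) stages named above can be modeled in a few lines. This is a minimal sketch under assumptions: `clip3`, `sat_0_255`, and `add_or_subtract` are hypothetical helper names, the 16-bit internal width is chosen arbitrarily, and the reading of the NOT gate plus a carry input of "1" as a two's complement subtraction is an interpretation of the text rather than something it states outright.

```python
def clip3(lo, hi, x):
    """Clamp x into the inclusive range [lo, hi]."""
    return max(lo, min(hi, x))


def sat_0_255(x):
    """Saturate x to an unsigned 8-bit value, as a SAT(0,255) stage would."""
    return clip3(0, 255, x)


def add_or_subtract(base, delta, subtract, width=16):
    """Add delta to base, or subtract it by adding the inverted delta with carry-in 1."""
    mask = (1 << width) - 1
    operand = (~delta & mask) if subtract else (delta & mask)  # NOT gate path or direct path
    carry_in = 1 if subtract else 0                            # multiplexer selecting "1" or "0"
    total = (base + operand + carry_in) & mask
    if total >> (width - 1):                                   # reinterpret as a signed value
        total -= 1 << width
    return total
```

For instance, `sat_0_255(add_or_subtract(3, 7, True))` evaluates 3 − 7 through the inverted path and then saturates the negative result to 0, which is why a saturation stage follows the ALU.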
The clip3 component 668 also receives tc0 and sends data to ALU 670. ALU 670 also receives data from multiplexer 656 and, after calculating a result, sends this data to multiplexer 672. Multiplexer 672 also receives data from multiplexer 656 and from jump block 678 (via multiplexer 754 of Figure 10E), and sends data to jump block 674.

Figure 10C is a continuation of the diagrams of Figures 10A and 10B. In the embodiment of Figure 10C, multiplexer 682 receives data from p2, p1, and !left_top, and sends the selected data to adder 706. Multiplexer 684 receives p1 and p0 together with !left_top and sends the result to shifter 700. Shifter 700 shifts the received data one bit to the left and sends it to adder 706. Multiplexer 686 receives data from p0 and q1 together with !left_top. Multiplexer 686 sends data to shifter 702, which shifts the received data one bit to the left and sends the shifted data to adder 706. Multiplexer 688 receives data from q0 and q1 together with !left_top and sends the selected data to shifter 704, which shifts the received data one bit to the left and sends it to adder 706. A further multiplexer receives data from q1 and q2 together with !left_top and sends data to adder 706. Adder 706 also receives 4 bits at its carry input and sends its output to a jump block.

Similarly, multiplexer 691 receives q2, p0, and !left_top, selects a result, and sends it to adder 698. Multiplexer 692 receives p0 and !left_top and sends the selected result to adder 698. Multiplexer 694 receives data from q0, q1, and !left_top, selects a result, and sends it to adder 698. Multiplexer 696 receives q0, q2, and !left_top, selects the desired result, and sends this data to adder 698. Adder 698 also receives 2 bits at its carry input and sends its output to jump block 710.
Multiplexer 712 receives p3, q3, and !left_top and sends the result to shifter 722. Shifter 722 shifts the received data one bit to the left and sends it to adder 726. Multiplexer 714 receives p2, q2, and !left_top and sends the selected result to shifter 724 and adder 726. Shifter 724 shifts the received data one bit to the left and sends the shifted result to adder 726. Multiplexer 716 receives p1, q1, and !left_top and sends the selected result to adder 726. Multiplexer 718 receives p0, q0, and !left_top and sends the selected result to adder 726. Multiplexer 720 receives p0, q0, and !left_top and sends the selected result to adder 726. Adder 726 receives four bits at its carry input, adds them to the received data, and sends the summed data to jump block 730.

Figure 10D is a continuation of the diagrams of Figures 10A-10C. More specifically, in the embodiment of Figure 10D, α table 750 receives IndexA and outputs α. The β table receives IndexB and outputs data to the zero extend component 752, which outputs β. Similarly, multiplexer 736 receives "1" and "0" as well as data from jump block 732 (via decision block 590 of Figure 10A), selects a result, and sends it to ALU 740. Multiplexer 738 also receives "1" and "0" as well as data from jump block 734 (via decision block 592 of Figure 10A), and sends the selected result to ALU 740. ALU 740 calculates a result and sends data to multiplexer 742. Multiplexer 742 also receives "1" as well as chroma edge flag data, selects a result, and sends it to ALU 744. ALU 744 also receives tc0, calculates the result tc, and sends the result to jump block 746.

Figure 10E is a continuation of the diagrams of Figures 10A-10D. More specifically, in the embodiment of Figure 10E, multiplexer 754 receives data associated with the relation "(ChromaEdgeFlag == 0) && (ap < β)" and data associated with the relation "(ChromaEdgeFlag == 0) && (aq < β)", receives data from NOT component 802, and sends the selected data to jump block 756 (to multiplexer 672 of Figure 10B). In addition, multiplexer 780 receives data associated with the relation "(ChromaEdgeFlag == 0) && (ap < β) && (abs(p0 − q0) < ((α >> 2) + 2))" and data associated with the relation "(ChromaEdgeFlag == 0) && (aq < β) && (abs(p0 − q0) < ((α >> 2) + 2))". Multiplexer 780 also receives a select input from NOT component 802, selects the desired result accordingly, and sends it to multiplexers 782, 784, and 786.

Multiplexer 757 receives data from p1, q1, and NOT component 802, and sends the selected data to shifter 763, which shifts the received data one bit to the left and sends it to adder 774. Multiplexer 759 receives p0, q0, and data from NOT component 802, and sends the selected data to adder 774. Multiplexer 761 receives data from q1, p1, and NOT component 802, and sends data to adder 774. Adder 774 also receives two bits of data at its carry input and sends its output to multiplexer 782.

Shifter 764 receives data from jump block 758 (via an adder in a preceding figure), shifts the received data to the right by three bits, and sends the shifted data to multiplexer 782. Shifter 766 receives data from a jump block (via adder 698), shifts the received data to the right, and sends the shifted data to multiplexer 784. Shifter 768 receives data from jump block 762 (via adder 726 of Figure 10C), shifts the received data to the right by two bits, and sends the shifted data to multiplexer 786. Multiplexer 782 sends its selected data to multiplexer 790.
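The threshold logic of Figures 10D and 10E can be read as the following sketch. It is an interpretation under assumptions: `compute_tc` models multiplexers 736, 738, and 742 with ALUs 740 and 744 as adding one to tc0 for each satisfied comparison (or a fixed one when the chroma edge flag is set), and `strong_filter_condition` evaluates the relation quoted for multiplexer 780. Both function names are hypothetical, and the same predicate is reused for the q side by passing aq in place of ap.

```python
def compute_tc(tc0, ap, aq, beta, chroma_edge_flag):
    """Derive tc from tc0: add 1 per satisfied luma comparison, or a fixed 1 for chroma edges."""
    if chroma_edge_flag:
        return tc0 + 1
    return tc0 + (1 if ap < beta else 0) + (1 if aq < beta else 0)


def strong_filter_condition(chroma_edge_flag, a_side, p0, q0, alpha, beta):
    """(ChromaEdgeFlag == 0) && (a_side < beta) && (abs(p0 - q0) < ((alpha >> 2) + 2))."""
    return (
        chroma_edge_flag == 0
        and a_side < beta
        and abs(p0 - q0) < ((alpha >> 2) + 2)
    )
```

Here `a_side` stands for ap when selecting the filtered p samples and for aq when selecting the filtered q samples, mirroring the two relations fed into each multiplexer.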
Similarly, multiplexer 784 receives data from shifter 766, multiplexer 780, and multiplexer 776. Multiplexer 776 receives p1, q1, and data from NOT component 802. The selected result is then sent to multiplexer 798. Multiplexer 786 receives data from shifter 768, multiplexer 780, and multiplexer 778. Multiplexer 778 receives p2, q2, and data from NOT component 802. Multiplexer 786 sends the selected data to multiplexer 800. As discussed above, multiplexer 790 receives data from multiplexer 782. In addition, multiplexer 790 receives data from a jump block (via SAT component 638 of Figure 10B) and from multiplexer 794. Multiplexer 794 receives p0, q0, and data from NOT component 802. Multiplexer 790 also receives bSn & nfilterSampleFlag data as its select input and sends the selected data to buffers 808 and 810. Similarly, multiplexer 798 receives data from multiplexer 784, jump block 755 (via multiplexer 674 of Figure 10B), and multiplexer 792, and receives bSn & nfilterSampleFlag data as its select input. Multiplexer 792 receives p1, q1, and data from NOT component 802. Multiplexer 798 sends data to buffers 806 and 812. Similarly, multiplexer 800 receives data from multiplexer 786 and receives bSn & nfilterSampleFlag data as its select input. In addition, multiplexer 800 receives data from multiplexer 788, which receives p2, q2, and data from NOT component 802. Multiplexer 800 selects the desired data and sends it to buffers 804 and 814. Buffers 804-814 also receive data from NOT component 802 and send data to p2, p1, p0, q0, q1, and q2, respectively.

Figure 11 is a flowchart illustrating an embodiment of a process that can be used to execute data in a computing architecture, such as the computing architecture of Figure 2. In the embodiment of Figure 11, odd block 880 and even block 882 of the texture address generator (TAG) (see also 150 of Figure 2) receive data from output port 144 (Figure 2). An address is then generated for the received data, and the process proceeds to texture cache and controller (TCC) 884, 886 (see also 166 of Figure 2). The data can then be sent to cache 890 and to texture filter first-in first-out components (Texture Cache First In First Out, TFF) 888, 892, which can act as delay queues/buffers. The data is then sent to texture filter units (TFU) 894, 896 (see also 168 of Figure 2). Once the data has been filtered, TFUs 894, 896 send the data to VPUs 898, 900 (see also 199 of Figure 2). Depending on whether the instruction calls for motion compensation filtering, texture cache filtering, inter-deblocking filtering, and/or a sum of absolute differences, the data can be sent to different VPUs and/or to different portions of the same VPU. After processing the received data, VPUs 898, 900 can send the data to the outputs for input ports 902, 904 (see also 142 of Figure 2).

The embodiments disclosed herein can be implemented in hardware, software, firmware, or a combination thereof. At least one embodiment disclosed herein is implemented in software and/or firmware that is stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in an alternative embodiment, the embodiments disclosed herein can be implemented with any one or a combination of the following technologies: discrete logic circuits having logic gates for implementing logic functions upon data signals, an application-specific integrated circuit (ASIC) having appropriate combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.

Note that the flowcharts included herein show the architecture, functionality, and operation of possible implementations of software and/or hardware.
In this regard, each block can be interpreted as representing a module, segment, or portion of code, which includes one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the blocks may occur out of the order shown. For example, depending on the functionality involved, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order.

It should be noted that any of the programs listed herein, which can include an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. In the context of this document, a "computer-readable medium" can be any means that can contain, store, communicate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. The computer-readable medium can be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium can include an electrical connection (electronic) having one or more wires, a portable computer diskette (magnetic), a random access memory (RAM) (electronic), a read-only memory (ROM) (electronic), an erasable programmable read-only memory (EPROM or flash memory) (electronic), an optical fiber (optical), and a portable compact disc read-only memory (CDROM) (optical). In addition, the scope of certain embodiments of this disclosure can include embodying the described functionality in logic embodied in hardware- or software-configured media.

It should also be noted that conditional language, such as, among others, "can," "could," "might," or "may," unless specifically stated otherwise or otherwise understood within the context as used, is generally intended to convey that certain embodiments include, while other embodiments do not include, certain features, elements, and/or steps. Thus, such conditional language is not generally intended to imply that features, elements, and/or steps are in any way required for one or more particular embodiments, or that one or more particular embodiments necessarily include logic for deciding, with or without user input or prompting, whether these features, elements, and/or steps are included or are to be performed in any particular embodiment.

It should be emphasized that the above-described embodiments are merely possible examples of implementations, set forth only for a clear understanding of the principles of this disclosure. Many variations and modifications may be made to the above-described embodiments without departing substantially from the spirit and scope of the disclosure. All such modifications and variations are intended to be included herein within the scope of this disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is an embodiment of a computing architecture for processing video data.
Figure 2 is an embodiment of a computing architecture, similar to the architecture of Figure 1, that incorporates a video processing unit (VPU).
Figure 3 is a flowchart embodiment of a process for processing video and graphics data, such as in the computing architecture of Figure 2.
Figure 4A is a functional flow diagram embodiment of data flow in a computing device, such as a computing device having the computing architecture of Figure 2.
Figure 4B is a continuation of the functional flow diagram of Figure 4A.
Figure 4C is a continuation of the functional flow diagrams of Figures 4A and 4B.
Figure 5A is a functional block diagram of an embodiment of components that can be used to provide motion compensation (MC) and/or discrete cosine transform (DCT) operations, such as in the computing architecture of Figure 2.
Figure 5B is a continuation of the diagram of Figure 5A.
Figure 5C is a continuation of the diagrams of Figures 5A and 5B.
Figure 5D is a continuation of the diagrams of Figures 5A-5C.
Figure 5E is a continuation of the diagrams of Figures 5A-5D.
Figure 5F is an embodiment of an overall diagram of the components of Figures 5A-5E.
Figure 6 is a functional block diagram of a pixel processing engine that can be used in a computing architecture, such as the computing architecture of Figure 2.
Figure 7A is a functional block diagram illustrating components that can be used in a VC-1 in-loop filter, such as in the computing architecture of Figure 2.
Figure 7B is a continuation of the diagram of Figure 7A.
Figure 7C is a continuation of the diagrams of Figures 7A and 7B.
Figure 7D is a continuation of the diagrams of Figures 7A-7C.
Figure 8 is a block diagram of components that can be used to perform a sum of absolute differences calculation in a computing architecture, such as the computing architecture of Figure 2.
Figure 9 is a flowchart of an embodiment, similar to Figure 8, of a process that can be used to perform a sum of absolute differences calculation.
Figure 10A is a block diagram illustrating a plurality of components that can be used in a deblocking operation, such as may be performed in the computer architecture of Figure 2.
Figure 10B is a continuation of the diagram of Figure 10A.
Figure 10C is a continuation of the diagrams of Figures 10A and 10B.
Figure 10D is a continuation of the diagrams of Figures 10A-10C.
Figure 10E is a continuation of the diagrams of Figures 10A-10D.
Figure 11 is a flowchart embodiment of a process that can be used to execute data in a computing architecture, such as the computing architecture of Figure 2.

[Description of main component symbols]
88, 102: internal logic analyzer

90, 104: bus interface unit (BIU)
106a, 106b, 106c, 106d: memory interface unit (MIU)
108: memory access port
110, 116: data stream cache
112: vertex cache
114: L2 cache
118: EU pool controller with cache subsystem
120: command stream processor (CSP) front end
122: 3D and state component
124: 2D pre-component
126: 2D first-in-first-out (FIFO) component
128: CSP back end/ZL1 cache
130: definition and model texture processor
132: Advanced Encryption System (AES) encryption/decryption component
134: triangle and attribute setup unit
136: span-tile generator
138: ZL1
140: ZL2
142, 902, 904: input port
144: output port
146: pool of execution units (EUP)/BW compressor
148: Z and ST cache
150: texture address generator (TAG)
152: D cache
154: 2D processing component
156: pre-packer
158: interpolator
160: post-packer
162: write-back unit

164a, 164b: memory access unit (MXU)
166, 884, 886: texture cache and controller (TCC)

168, 894, 896: texture filter unit (TFU)
199, 898, 900: video processing unit (VPU)
234: encrypted bitstream
236: decryption component
238: coded bitstream

240: VLD, Huffman decoder, CAVLC, CABAC
242: EUP TAG interface
244: image header

246a, 246b, 246c, 246n: memory buffer MB
250, 252, 254, 256, 258, 260, 270, 272, 274, 276, 344a~i, 346a~i, 348a~i, 362j~r, 366j~r, 368a~r, 372b~r, 376b~j, 474, 476, 478, 480, 482, 484, 492, 494, 594, 596, 598, 630, 644, 646, 674, 678, 680, 708, 710, 730, 732, 734, 746, 755, 756, 758, 760, 762, 770, 772: jump block
262: inverse DC/AC prediction component
264: inverse scan/inverse Q component
265: switch
266: coded pattern block reconstruction component
280: filter component
282: MC filter
284: reconstruction reference component
286: coded pattern block reconstruction
288: switch component
290: reconstructed frame component
292: deblocking and deringing filter
294: de-interlacing component
296: inverse transform component/in-loop filter
298, 330, 442, 472, 502, 512, 522, 532, 542, 544, 698, 706, 726, 774: adder
300, 302, 304, 306, 308, 310, 312, 324: Z^-1 delay component
314a, 314b, 314c, 314d: PE
316: Z^-3 delay component
320: Z^-2 delay component
318, 322, 326, 328, 342, 342a~i, 369, 369a~i, 382, 382a~d, 390, 390a~d, 400, 402, 404, 406, 408, 420, 422, 424, 428, 452, 454, 456, 458, 496, 498, 634, 640, 642, 656, 660, 672, 682, 684, 686, 690, 691, 692, 694, 696, 712, 714, 716, 718, 720, 736, 738, 742, 754, 757, 759, 761, 776, 778, 780, 782, 784, 786, 788, 790, 792, 794, 796, 798, 800: multiplexer
332: N shifter
340, 304a~1: memory buffer
350, 350a~i: memory B, slot
360: transpose network
370, 370a~i: FIR filter block
380, 380b~j: memory buffer C, slot
384, 384a~d, 580, 582, 600, 602, 604, 622, 624, 636, 648, 652, 662, 670, 740, 744: ALU
386, 386a~d, 412, 440, 444, 466, 468, 470, 488, 626, 650, 658, 664, 700, 702, 704, 722, 724, 763, 764, 766, 768: shifter
388, 388a~d: Z block
410: multiplier
426: logical OR gate
430, 432, 586, 606, 608, 610: absolute value component
434: minimum value component
436: two's complement component
438, 460, 462, 464, 486, 500: subtraction component
446: clamp component

450a~h: P1~8 data
490a: A1
490b: A2
490c: A0
504, 506, 508, 510, 514, 516, 518, 520, 524, 526, 528, 530, 534, 536, 538, 540: component
590, 592, 612, 614, 616, 618: decision component
620: AND gate
628, 668: clip3 component
632: NOT gate
638: SAT component
748: β table
750: α table
752: zero-extension component
802: NOT component
804, 806, 808, 810, 812, 814: buffer
880, 882: texture address generator (TAG) block
888, 891: texture filter first-in-first-out component (TFF)
890: cache

Claims

1. A programmable video processing unit, comprising: a logic circuit for receiving video data in a format selected from at least two formats; a logic circuit for receiving an instruction from an instruction set, the instruction including an indication field for indicating the format of the video data; a first parallel logic circuit that processes the video data according to a first format when the indication field indicates the first format; and a second parallel logic circuit that processes the video data according to a second format when the indication field indicates the second format.

2. The programmable video processing unit of claim 1, wherein the first format includes H.264, and the second format is selected from [illegible] and MPEG-2.

3. The programmable video processing unit of claim 2, wherein the first parallel logic circuit is a combination of the following items: [partly illegible] a transform filter and an in-loop filter.

4. The programmable video processing unit of claim 2, wherein the second parallel logic circuit is any combination of the following items: a motion compensation filter, an in-loop filter, a transform filter [partly illegible], and an inverse discrete cosine transform filter.

5. [Partly illegible] wherein the logic circuit is any combination of the following items: a motion compensation filter, an integer transform filter, and an in-loop filter.

6. The programmable video processing unit of claim 1, further comprising a logic circuit for performing a sum-of-absolute-differences calculation.

7. The programmable video processing unit of claim 1, further comprising a logic circuit for performing in-loop deblocking filtering.

8. The programmable video processing unit of claim 1, further comprising a logic circuit for performing texture cache filtering.

9. A programmable video processing unit for processing video data of at least two formats, comprising: a filter logic circuit for filtering the video data according to the format of the video data; a transform logic circuit for transforming the video data according to the format of the video data; and a logic circuit for outputting the video data for subsequent processing; wherein the filter logic circuit and the transform logic circuit can operate in parallel.

10. The programmable video processing unit of claim 9, wherein the format of the video data is selected from at least one of the following: H.264, VC-1, and MPEG-2.

11. The programmable video processing unit of claim 9, wherein the filter logic circuit performs motion compensation filtering.

12. The programmable video processing unit of claim 9, wherein the [illegible] logic circuit, when the format is H.264, [remainder illegible].

13. [Largely illegible; the recoverable text refers to an integer transform.]

14. The programmable video processing unit of claim 9, further comprising a deblocking logic circuit that performs deblocking on the video data, wherein the deblocking logic circuit operates in parallel with the filter logic circuit and the transform logic circuit.

15. A method for processing video data, comprising: receiving an instruction from an instruction set; receiving video data in a format selected from at least two formats; and processing the video data according to the instruction; wherein the instruction includes an identification field for indicating the format of the video data; and wherein the step of processing the video data is performed according to the identification field using a plurality of different algorithms.

16. The method of processing video data of claim 15, wherein the format of the video data comprises at least one of the following: H.264, VC-1, and MPEG-2.

17. The method of processing video data of claim 15, wherein the step of processing the video data comprises using at least two of the following algorithms: motion compensation filtering, integer transform, inverse discrete cosine transform, and in-loop filtering.

18. The method of processing video data of claim 17, wherein, when the identification field indicates MPEG-2, the step of processing the video data comprises motion compensation filtering and inverse discrete cosine transform.

19. The method of processing video data of claim [illegible], wherein, when the identification field indicates one of VC-1 and H.264, the step of processing the video data comprises motion compensation filtering, integer transform, and in-loop filtering.

20. The method of processing video data of claim 15, further comprising any combination of the following: performing a sum-of-absolute-differences calculation; performing texture cache filtering; and performing in-loop deblocking filtering.
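The format-dependent selection recited in claims 17 to 19, and the sum-of-absolute-differences calculation mentioned in claim 20, can be illustrated with a minimal sketch. It is not the claimed hardware: the names `VideoFormat`, `ALGORITHMS`, `select_algorithms`, and `sum_of_absolute_differences` are hypothetical, the numeric values of the identification field are assumed (the claims give no encoding), and the algorithms are represented only by name.

```python
from enum import Enum
from typing import Dict, Sequence, Tuple


class VideoFormat(Enum):
    # Assumed field values; the claims do not specify an encoding.
    MPEG2 = 0
    VC1 = 1
    H264 = 2


# Algorithm names only; no actual filtering or transform is implemented here.
MC_FILTER = "motion_compensation_filter"
INVERSE_DCT = "inverse_dct"
INTEGER_TRANSFORM = "integer_transform"
IN_LOOP_FILTER = "in_loop_filter"

# Algorithm sets per format, following claims 18 and 19.
ALGORITHMS: Dict[VideoFormat, Tuple[str, ...]] = {
    VideoFormat.MPEG2: (MC_FILTER, INVERSE_DCT),
    VideoFormat.VC1: (MC_FILTER, INTEGER_TRANSFORM, IN_LOOP_FILTER),
    VideoFormat.H264: (MC_FILTER, INTEGER_TRANSFORM, IN_LOOP_FILTER),
}


def select_algorithms(format_id: int) -> Tuple[str, ...]:
    """Map an instruction's identification field to its algorithm set."""
    return ALGORITHMS[VideoFormat(format_id)]


def sum_of_absolute_differences(
    block_a: Sequence[int], block_b: Sequence[int]
) -> int:
    """Sum |a - b| over corresponding samples of two equally sized blocks."""
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same number of samples")
    return sum(abs(a - b) for a, b in zip(block_a, block_b))
```

In this sketch an unknown field value raises `ValueError` from the `VideoFormat` constructor; how the claimed unit treats an unrecognized format is not stated in the source.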