TW200816082A

TW200816082A - Filtering for VPU

Info

Publication number: TW200816082A
Application number: TW096121890A
Authority: TW
Inventors: Zahid Hussain
Original assignee: Via Tech Inc
Priority date: 2006-06-16
Filing date: 2007-06-15
Publication date: 2008-04-01
Also published as: CN101068353A; TW200821986A; TWI482117B; CN101083763A; TW200816820A; CN101068353B; TW200803525A; CN101068365A; TWI348654B; TW200803527A; CN101072351B; CN101068364B; CN101072351A; TWI383683B; TWI350109B; TWI444047B; CN101083764A; CN101083764B; CN101083763B; TWI395488B

Abstract

Included are embodiments for processing video data. At least one embodiment includes receive logic configured to receive video data chosen from a plurality of formats, and filter logic configured to filter the video data according to an instruction. Similarly, some embodiments include transform logic configured to transform the video data according to the instruction, where the instruction contains a mode indication that directs the filter logic and the transform logic to operate based on the format of the video data.

Description

IX. Description of the Invention

[Technical Field]

The present invention relates to processing video and graphics data and, more particularly, to providing a video processing unit with a programmable core.

[Prior Art]

With the continuing development of computer technology, the demands placed on computing devices have also increased. More specifically, many computer applications and/or data streams require the processing of video data, and as video data becomes more complex, the processing requirements for that data increase.

Currently, many computing architectures provide a central processing unit (CPU) for processing data, including video and graphics data. Although a CPU may provide adequate processing power for some video and graphics, the CPU must also process other data. Therefore, the demands placed on the CPU in processing complex video and graphics may adversely affect the performance of the entire system.

In addition, many computing architectures include one or more execution units (EUs) for processing data. More specifically, in at least one architecture an EU may be used to process multiple different types of data. As with the CPU, the demands placed on the EUs by processing complex video and graphics data may adversely affect the performance of the entire computing system. Moreover, processing complex video and graphics data in the EUs may increase power consumption beyond an acceptable threshold. The existence of different protocols or standards for the data may further limit the ability of the EUs to process video and graphics data. Many computing architectures also provide 32-bit instructions, and support for multiple operations is a further need.

Therefore, a heretofore unaddressed need exists in the industry to address the deficiencies and inadequacies described above.

[Summary of the Invention]

The invention includes a programmable video processing unit for processing video data according to an instruction. The unit comprises receive logic configured to receive video data in a format selected from a plurality of formats; filter logic configured to filter the video data according to the instruction; and transform logic configured to transform the filtered data according to the instruction. The instruction contains a mode indication field that directs the filter logic and the transform logic to operate according to the format of the video data.

Another embodiment of the invention includes a programmable video processing unit for processing video data in at least two formats. The unit comprises identification logic configured to identify the format of the video data; motion compensation logic configured to perform a motion compensation operation; inverse discrete cosine transform logic configured to perform an inverse discrete cosine transform operation; and integer transform logic configured to perform an integer transform operation. The inverse discrete cosine transform logic and the integer transform logic are each disabled according to the identification result of the identification logic.

The invention also includes embodiments of a method for processing video data. At least one embodiment includes receiving an instruction; receiving video data in a format selected from at least two formats; filtering the video data according to the instruction; and transforming the video data according to the instruction. The instruction contains a mode indication field that directs the filtering and transforming steps to operate according to the format of the video data.

Other systems, methods, features, and advantages of this disclosure will be or become apparent to one skilled in the art upon examination of the following drawings and detailed description. It is intended that all such additional systems, methods, features, and advantages be included within this description and within the scope of the present disclosure.

[Embodiment]

FIG. 1 is an embodiment of a computing architecture for processing video data. As shown in FIG. 1, a computing device may include a pool of execution units (EUs) 146. The pool of execution units 146 may include one or more execution units for executing data in the computing architecture of FIG. 1. The pool of execution units 146 (referred to herein as "EUP 146") may be coupled to stream cache 116 and may receive data from stream cache 116. EUP 146 may also be coupled to input port 142 and output port 144. Input port 142 may be configured to receive data from EUP controller with cache subsystem 118. Input port 142 may also receive data from L2 cache 114 and post-packer 160. EUP 146 may process the received data and output the processed data to output port 144.

In addition, EUP controller with cache subsystem 118 may send data to memory access unit (MXU) A 164a and to triangle and attribute setup unit 134. L2 cache 114 may also send data to, and receive data from, MXU A 164a. Vertex cache 112 and stream cache 110 may also communicate with MXU A 164a, as may memory access port 108. Memory access port 108 may communicate data with bus interface unit (BIU) 90, memory interface unit (MIU) A 106a, MIU B 106b, MIU C 106c, and MIU D 106d. Memory access port 108 may also be coupled to MXU B 164b.

MXU A 164a is also coupled to command stream processor (CSP) front end 120 and CSP back end 128. CSP front end 120 is coupled to 3D and state component 122, which is coupled to EUP controller with cache subsystem 118. CSP front end 120 is also coupled to 2D pre component 124, which is coupled to 2D first-in first-out (FIFO) component 126. CSP front end 120 also communicates data with clear and type texture processor 130 and with advanced encryption system (AES) encrypt/decrypt component 132. CSP back end 128 is coupled to span-tile generator 136.

Triangle and attribute setup unit 134 is coupled to 3D and state component 122, EUP controller with cache subsystem 118, and span-tile generator 136. Span-tile generator 136 may be configured to send data to ZL1 cache 128, and may also be coupled to ZL1 138, which may send data to ZL1 cache 128. ZL2 140 may be coupled to Z (for example, depth buffer cache) and stencil (ST) cache 148. Z and ST cache 148 may send and receive data via write-back unit 162 and may be coupled to bandwidth (BW) compressor 146. BW compressor 146 may also be coupled to MXU B 164b, which may be coupled to texture cache and controller 166. Texture cache and controller 166 may be coupled to texture filter unit (TFU) 168, which may send data to post-packer 160. Post-packer 160 may be coupled to interpolator 158. Pre-packer 156 may be coupled to interpolator 158 and to texture address generator 150. Write-back unit 162 may be coupled to 2D pro component 154, D cache 152, Z and ST cache 148, input port 142, and CSP back end 128.

The embodiment of FIG. 1 processes video data by utilizing EUP 146. More specifically, in at least one embodiment, one or more of the execution units may be used to process video data. Although this architecture may be suitable for some applications, it may consume an excessive amount of power; in addition, this architecture may have considerable difficulty processing H.264 data.

FIG. 2 is an embodiment of a computing architecture similar to that of FIG. 1, with the introduction of a video processing unit (VPU). More specifically, in the embodiment of FIG. 2, a VPU 199 with a programmable core may be provided in the computing architecture of FIG. 1. VPU 199 may be coupled to CSP front end 120 and to TFU 168. VPU 199 may serve as a dedicated processor for video data. In addition, VPU 199 may be configured to process video data encoded in the Moving Picture Experts Group (MPEG), VC-1, and H.264 protocols.

More specifically, in at least one embodiment, shader code may be executed on one or more of the execution units (EUs) 146. Instructions may be decoded and their operands fetched from registers. The major and minor opcodes may be used to determine the EU to which the operands are routed and the function that operates on those operands. If the operation is of the SAMPLE type (for example, all VPU instructions are of the SAMPLE type), the instruction may be dispatched from EUP 146. Although VPU 199 may be used to reduce the use of the TFU filter hardware, VPU 199 may also reside alongside TFU 168.

For a SAMPLE operation, EUP 146 builds a 580-bit data structure (see Table 1). EUP 146 fetches the source registers specified by the SAMPLE instruction, and this data is placed in the least significant 512 bits of the EUP-TAG interface structure. The other relevant data that EUP 146 inserts into this structure are:

REG_TYPE - this should be 0
ThreadID - used to route the result back to the correct shader program
ShaderResID
ShaderType = PS
CRFIndex - the destination register
SAMPLE_MODE - the VPU filter operation to be performed
ExeMode = Vertical

This data structure may then be sent to texture address generator (TAG) 150. TAG 150 may be configured to examine the SAMPLE_MODE bits to determine whether the data fields contain texture sample information or actual data. If they contain actual data, TAG 150 forwards the data directly to VPU 199; otherwise, TAG 150 may initiate a texture fetch.
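The construction of the 580-bit EUP-TAG structure can be sketched as follows. This is a minimal illustration, not the hardware implementation: the bit positions are the MSB/LSB values printed in Table 1 for a subset of the fields, and the names `FIELDS`, `pack_eup_tag`, and `unpack_field` are hypothetical helpers introduced only for this example.

```python
# Hypothetical sketch of the EUP-TAG structure described in the text.
# Field positions (msb, lsb) are taken from Table 1; only a subset is shown.
FIELDS = {
    "thread_id": (547, 542),
    "shader_res_id": (551, 550),
    "crf_index": (565, 558),
    "sample_mode": (570, 566),
    "exe_mode": (571, 571),  # Horizontal = 1, Vertical = 0
    "bx2": (572, 572),
}
DATA_BITS = 512    # source-register data occupies the least significant 512 bits
TOTAL_BITS = 580   # size of the whole structure


def pack_eup_tag(data, **fields):
    """Place `data` in bits [511:0] and each named field at its Table 1 position."""
    if not 0 <= data < (1 << DATA_BITS):
        raise ValueError("data must fit in 512 bits")
    word = data
    for name, value in fields.items():
        msb, lsb = FIELDS[name]
        width = msb - lsb + 1
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} must fit in {width} bits")
        word |= value << lsb
    return word


def unpack_field(word, name):
    """Read one named field back out of a packed structure."""
    msb, lsb = FIELDS[name]
    return (word >> lsb) & ((1 << (msb - lsb + 1)) - 1)


if __name__ == "__main__":
    w = pack_eup_tag(0xABCD, thread_id=5, crf_index=0x12, sample_mode=3, exe_mode=0)
    print(hex(unpack_field(w, "crf_index")))
```

Keeping the field map as data rather than hard-coded shifts makes it easy to adjust a position if the actual hardware layout differs from the one assumed here.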


Field            Width  MSB  LSB  Notes
Thread Id        6      547  542
Shader Res ID    2      551  550
Shader Type      3      553  552
CRF index        8      565  558
Sample Mode      5      570  566
exe mode         1      571  571  Horizontal = 1, Vertical = 0
Bx2              1      572  572  Note: for sample_ld, this flag indicates whether a sampler is used (0 = no S#, 1 = S# present)
<R>              9      579  573

Table 1: EUP-TAG interface for video processing

If SAMPLE_MODE is one of MCF, SAD, IDF_VC-1, IDF_H264_0, or IDF_H264_1, texture data must be fetched; otherwise, the data is in the Data field.

The information that TAG 150 needs in order to generate an address, and that is passed to texture cache controller (TCC) 166, can be found in the least significant 128 bits of the Data field:

Bits [31:0] - U, V coordinates, which constitute the address of the texture block (4x4x8 bits)
Bits [102:96] - T#
Bits [106:103] - S#

T#, S#, U, and V are sufficient information to fetch a texture from a particular surface. U, V, T#, and S# may be extracted from the SRC1 field of the INSTRUCTION during decode and may be used to fill the fields above. U, V, T#, and S# can therefore be modified dynamically during execution.
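The address fields described above can be pulled out of the Data field with a few shifts and masks. The sketch below is illustrative only: it assumes U occupies the low 16 bits and V the high 16 bits of bits [31:0] (the text states the U position for SRC1; the V position is an assumption), and the function names are hypothetical.

```python
# Modes that the text says require a texture fetch.
TEXTURE_FETCH_MODES = {"MCF", "SAD", "IDF_VC-1", "IDF_H264_0", "IDF_H264_1"}


def needs_texture_fetch(sample_mode):
    """True if TAG must fetch texture data; False if the Data field holds the data."""
    return sample_mode in TEXTURE_FETCH_MODES


def decode_tag_address(data_field):
    """Extract U, V, T# and S# from the least significant 128 bits of the Data field."""
    low128 = data_field & ((1 << 128) - 1)
    uv = low128 & 0xFFFFFFFF             # bits [31:0]
    return {
        "u": uv & 0xFFFF,                # assumed: low 16 bits
        "v": (uv >> 16) & 0xFFFF,        # assumed: high 16 bits
        "t_num": (low128 >> 96) & 0x7F,  # bits [102:96], 7 bits
        "s_num": (low128 >> 103) & 0xF,  # bits [106:103], 4 bits
    }


if __name__ == "__main__":
    data = (0x9 << 103) | (0x55 << 96) | (0x0012 << 16) | 0x0034
    print(decode_tag_address(data), needs_texture_fetch("SAD"))
```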

SAMPLE_MODE and the least significant 128 bits of the data containing this information may then be placed in the command first-in first-out memory (COMMAND FIFO) of VPU 199. The corresponding data first-in first-out memory (DATA FIFO) may be filled either with the forwarded data (bits [383:128]) or with up to 256 bits from the texture cache. This data is operated on in VPU 199, with the operation determined by the information in the COMMAND FIFO. The result (up to 256 bits) may be returned to EUP 146 and the EU registers, using ThreadID and CRFIndex as the return address.

In addition, this disclosure includes an instruction set provided by EUP 146 and usable by VPU 199. The instructions may be formatted as 64 bits, although this is not a requirement. More specifically, in at least one embodiment, the VPU instruction set may include one or more motion compensation filter (MCF) instructions. One or more of the following MCF instructions may be present in this embodiment:

SAMPLE_MCF_BLR    DST, S#, T#, SRC2, SRC1
SAMPLE_MCF_VC1    DST, S#, T#, SRC2, SRC1
SAMPLE_MCF_H264   DST, S#, T#, SRC2, SRC1

The first 32 bits of SRC1 contain the U, V coordinates, with the least significant 16 bits being U. Because SRC2 may be unused or ignored, SRC2 may be any value, for example a 32-bit value containing a 4-element filter kernel in which each element is a signed 8-bit value, as shown below.

Filter kernel (SRC2)
Bits 31:24   Bits 23:16   Bits 15:8    Bits 7:0
Kernel[3]    Kernel[2]    Kernel[1]    Kernel[0]

Table 2: MCF filter kernel

The instruction set of VPU 199 also includes instructions for inloop deblocking filtering (IDF), such as one or more of the following:

SAMPLE_IDF_VC1      DST, S#, T#, SRC2, SRC1
SAMPLE_IDF_H264_0   DST, S#, T#, SRC2, SRC1
SAMPLE_IDF_H264_1   DST, S#, T#, SRC2, SRC1
SAMPLE_IDF_H264_2   DST, S#, T#, SRC2, SRC1

For VC-1 IDF operations, TFU 168 may provide 8x4x8 bits (or 4x8x8 bits) of data to the filter buffer. For H.264, however, the amount of data delivered by TFU 168 may be controlled according to the type of H.264 IDF operation.

For the SAMPLE_IDF_H264_0 instruction, the TFU supplies an 8x4x8-bit (or 4x8x8-bit) block of data. For the SAMPLE_IDF_H264_1 instruction, TFU 168 supplies one 4x4x8-bit block of data, and the other 4x4x8-bit block is supplied by the shader (EU) 146 (FIG. 2). With SAMPLE_IDF_H264_2, both 4x4x8-bit blocks may be supplied by the shader (in EU) 146 rather than by TFU 168.

The instruction set of VPU 199 also includes motion estimation (ME) instructions, which may include an instruction such as the following:

SAMPLE_SAD   DST, S#, T#, SRC2, SRC1

The instructions above may be mapped to the major and minor opcodes below and take the format described above. Details of the SRC and DST formats are discussed below in the relevant instruction sections.

[Instruction encoding diagram: a 64-bit word shown as bits 31 to 0 and bits 63 to 32, with fields that include SRC1 and SRC2.]

Table 3: Motion estimation and corresponding opcodes. Setting the LOCK bit locks the EU data path and does not allow another thread to enter the pipeline. NEG indicates inversion of the predicate register.
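The SRC2 filter kernel described for the MCF instructions, four signed 8-bit elements in one 32-bit value, can be sketched as follows. The placement of Kernel[0] in bits 7:0 up to Kernel[3] in bits 31:24 is an assumption made for this example, as are the function names and the coefficient values.

```python
def pack_kernel(kernel):
    """Pack four signed 8-bit coefficients into a 32-bit SRC2 value.

    Assumed layout: kernel[0] in bits 7:0 ... kernel[3] in bits 31:24.
    """
    if len(kernel) != 4:
        raise ValueError("kernel must have exactly 4 elements")
    word = 0
    for i, coeff in enumerate(kernel):
        if not -128 <= coeff <= 127:
            raise ValueError("each coefficient must be a signed 8-bit value")
        word |= (coeff & 0xFF) << (8 * i)  # two's-complement byte
    return word


def unpack_kernel(word):
    """Recover the four signed 8-bit coefficients from a 32-bit value."""
    out = []
    for i in range(4):
        byte = (word >> (8 * i)) & 0xFF
        out.append(byte - 256 if byte >= 128 else byte)
    return out


if __name__ == "__main__":
    src2 = pack_kernel([-1, 5, 5, -1])  # illustrative coefficients only
    print(hex(src2), unpack_kernel(src2))
```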

The S# and T# fields are ignored by the VPU SAMPLE instructions. The T# and S# fields encoded in SRC1 are used instead.

Instruction          Minor opcode       Comment
SAMPLE_MCF_BLR       0 0 0 0 0 0 0 0
SAMPLE_MCF_VC1       0 0 0 0 0 0 0 1
SAMPLE_MCF_H264      0 0 0 0 0 1 0 0
SAMPLE_IDF_VC1       0 0 0 0 0 1 0 1
SAMPLE_IDF_H264_0    0 0 0 1 0 0 0 0
SAMPLE_IDF_H264_1    0 0 0 1 1 0 1 1
SAMPLE_IDF_H264_2    0 0 1 1 0 1 0 0
SAMPLE_SAD           0 0 1 1 1 1 1 1

Table 4: Motion compensation filtering and corresponding opcodes

Instruction          Minor opcode   Comment
SAMPLE_TCF_I4x4      0 0 0 0        No data from the texture cache
SAMPLE_TCF_M4x4      0 0 0 1        No data from the texture cache
SAMPLE_TCF_MPEG2     0 0 1 0        No data from the texture cache

Table 5: Transform coefficient filtering (TCF) and corresponding opcodes

The SAMPLE instructions follow the execution path shown in FIG. 3. The EUP-TAG interface is as shown in Table 6 below; other interfaces are described in more detail below.

Field       Width  Bit  Notes
exe_mode    1      571  Horizontal = 1, Vertical = 0

Table 6: EUP-TAG interface for video processing

Note that texture sample filtering operations may also be mapped to the Sample_Mode field, in which case the value is 00XXX. The value 11XXX is currently reserved for future use. In addition, in at least one embodiment disclosed herein, some video functions may be inserted into the texture pipeline in order to reuse the L2 cache logic and some of the L2-to-filter data-loading MUX, such as ME (motion estimation), MC (motion compensation), TC (transform coding), and ID (inloop deblocking).

The following table summarizes the data loading guidelines from TCC 166 and/or TFU 168 for the different sample instructions. Note that, depending on the particular architecture, SAMPLE_MC_H264 may be used only for the Y plane and is not necessarily required for the CrCb plane.

Instruction          Data from texture cache                               Y plane  CrCb plane
SAMPLE_MC_BLR        8x8x8-bit block                                       Yes      Yes
SAMPLE_MC_VC1        12x12x8-bit block                                     Yes      Yes
SAMPLE_MC_H264       12x12x8-bit block                                     Yes      No
SAMPLE_SAD           8x4x8-bit block; V may have any alignment             Yes      Yes
SAMPLE_IDF_VC1       8x4x8 bits (or 4x8x8 bits), 32-bit aligned            Yes      Yes
SAMPLE_IDF_H264_0    8x4x8 bits (or 4x8x8 bits), 32-bit aligned            Yes      Yes
SAMPLE_IDF_H264_1    4x4x8 bits, 32-bit aligned                            Yes      Yes
SAMPLE_IDF_H264_2    No data from the texture cache
SAMPLE_TCF_I4x4      No data from the texture cache
SAMPLE_TCF_M4x4      No data from the texture cache
SAMPLE_TCF_MPEG2     No data from the texture cache
SAMPLE_MADD          No data from the texture cache
SAMPLE_SMMUL         No data from the texture cache

Table 7: Data loading for video

In at least one embodiment disclosed herein, the Y plane may use the HSF_Y0Y1Y2Y3_32BPE_VIDEO2 tiling format. The CrCb plane includes interleaved CrCb channels and is treated as the HSF_CrCb_16BPE_VIDEO tiling format. If an interleaved CbCr plane is not required, the same format as the Y plane may be used for either Cb or Cr.

In addition, the following instructions have been added to the shader instruction set architecture (ISA).
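The data loading rules in Table 7 lend themselves to a simple lookup. The sketch below records the block dimensions listed for each instruction and computes the number of bits loaded from the texture cache; the dictionary and function names are hypothetical, and instructions that Table 7 lists as loading no texture data return zero.

```python
# (width, height) of the 8-bit-per-element block loaded from the texture cache,
# as listed in Table 7. Instructions absent from this map load no texture data.
TEXTURE_BLOCKS = {
    "SAMPLE_MC_BLR": (8, 8),
    "SAMPLE_MC_VC1": (12, 12),
    "SAMPLE_MC_H264": (12, 12),
    "SAMPLE_SAD": (8, 4),
    "SAMPLE_IDF_VC1": (8, 4),       # may also be 4x8
    "SAMPLE_IDF_H264_0": (8, 4),    # may also be 4x8
    "SAMPLE_IDF_H264_1": (4, 4),
}


def texture_block_bits(instruction):
    """Bits loaded from the texture cache for one instruction (0 if none)."""
    dims = TEXTURE_BLOCKS.get(instruction)
    if dims is None:
        return 0
    width, height = dims
    return width * height * 8


if __name__ == "__main__":
    for name in ("SAMPLE_MC_BLR", "SAMPLE_IDF_H264_1", "SAMPLE_IDF_H264_2"):
        print(name, texture_block_bits(name))
```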

SAMPLE_MCF_BLR      DST, S#, T#, SRC2, SRC1
SAMPLE_MCF_VC1      DST, S#, T#, SRC2, SRC1
SAMPLE_MCF_H264     DST, S#, T#, SRC2, SRC1
SAMPLE_IDF_VC1      DST, S#, T#, SRC2, SRC1
SAMPLE_IDF_H264_0   DST, S#, T#, SRC2, SRC1
SAMPLE_IDF_H264_1   DST, S#, T#, SRC2, SRC1
SAMPLE_SAD          DST, S#, T#, SRC2, SRC1
SAMPLE_TCF_MPEG2    DST, #ctrl, SRC2, SRC1
SAMPLE_TCF_I4x4     DST, #ctrl, SRC2, SRC1
SAMPLE_TCF_M4x4     DST, #ctrl, SRC2, SRC1
SAMPLE_MADD         DST, #ctrl, SRC2, SRC1
SAMPLE_IDF_H264_2   DST, #ctrl, SRC2, SRC1

The #ctrl value for SAMPLE_IDF_H264_2 should be zero.

SRC1, SRC2, and #ctrl (when available) may be used to form the 512-bit data field in the EU/TAG/TCC interface, as shown in Table 8 below.

[Table 8: layout of the 512-bit data field in the EU/TAG/TCC interface for each SAMPLE instruction. The fields named in the table include Control_0 through Control_5, SRC1, SRC2 and SRC2+1 (odd, second register pair), IndexA, IndexB, bS, PQUANT, MMODE, the matrix index, and matrix elements m20 through m33. Several entries are marked undefined.]

Referring to Table 8: TR = transpose; FD = filter direction (vertical); bS = boundary strength; bR = bR control; YC bit (YC = 1 in the CbCr plane, YC = 0 in the Y plane); and CEF = chroma edge flag. In addition, when 32 bits (or fewer) are used for SRC1 or SRC2 (the remainder being undefined), a lane selection may be specified to reduce register usage.

While the instruction formats are described above, Table 10 below summarizes the instruction operations.

Instruction name | Instruction format | Instruction operation
SAMPLE_MCF_BLR | SAMPLE_MCF_BLR DST, SRC2, SRC1 | MC filter implementation
SAMPLE_MCF_VC1 | SAMPLE_MCF_VC1 DST, SRC2, SRC1 | MC filter implementation for VC-1
SAMPLE_MCF_H264 | SAMPLE_MCF_H264 DST, SRC2, SRC1 | MC filter implementation for H.264
SAMPLE_IDF_VC1 | SAMPLE_IDF_VC1 DST, SRC2, SRC1 | Deblocking operation
SAMPLE_IDF_H264_0 | SAMPLE_IDF_H264_0 DST, SRC2, SRC1 | H.264 deblocking operation. A 4x4x8 (vertical filter) or 8x4x8 block is provided from texture cache 166.
SAMPLE_IDF_H264_1 | SAMPLE_IDF_H264_1 DST, SRC2, SRC1 | H.264 operation. One 4x4x8-bit block is provided from the shader and another 4x4x8-bit block from texture cache 166. This allows an 8x4 (or 4x8) block to be constructed.
SAMPLE_IDF_H264_2 | SAMPLE_IDF_H264_2 DST, #ctrl, SRC2, SRC1 | H.264 operation. Both 4x4 blocks are provided by the shader, giving an 8x4 block.
SAMPLE_SAD | SAMPLE_SAD DST, S#, T#, SRC2, SRC1 | Performs four sum-of-absolute-differences (SAD) operations on the reference (SRC2) and the prediction data.
SAMPLE_TCF_I4x4 | SAMPLE_TCF_I4x4 DST, #ctrl, SRC2, SRC1 | Transform coding implementation
SAMPLE_TCF_M4x4 | SAMPLE_TCF_M4x4 DST, #ctrl, SRC2, SRC1 | Transform ...
SAMPLE_TCF_MPEG2 | SAMPLE_TCF_MPEG2 DST, #ctrl, SRC2, SRC1 | Transform ...
SAMPLE_MADD | SAMPLE_MADD DST, #ctrl, SRC2, SRC1 | See below.
SAMPLE_SMMUL | SAMPLE_SMMUL DST, #ctrl, SRC2, SRC1 | Performs scalar-matrix multiplication. #ctrl is an 11-bit immediate value. This may be 0 (i.e., the #ctrl signal is ignored). See also below.

Table 10: Instruction summary

In addition, for SAMPLE_MADD, #ctrl may be an 11-bit immediate value, and the instruction adds two 4x4 matrices (SRC1 and SRC2). One or more elements of either matrix may be a 16-bit signed integer, and the result (DST) is a 4x4 matrix of 16-bit elements. The matrices may be placed in the source and destination registers as shown in Table 11 below; this may be a separate unit within the VPU. In addition, the SRC1 and #ctrl data are available in cycle 1 and SRC2 is available in the following cycle, so one operation can be issued every two cycles.

#ctrl[0] indicates whether to perform a saturation (SAT) operation.
#ctrl[1] indicates whether to perform a rounding (R) operation.
#ctrl[2] indicates whether to perform a 1-bit right shift (S) operation.
#ctrl[10:3] is ignored.
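As an illustration of how the three control bits interact, the following Python sketch models SAMPLE_MADD on 16 lanes. It is not taken from the specification: the function name `sample_madd`, the flat-list lane layout, and the use of unbounded Python integers (16-bit wrap-around is not modeled) are assumptions, while the rounding, shift, and 0..255 clamp follow the lane logic this description lists for the instruction.

```python
def sample_madd(src1, src2, ctrl=0):
    """Lane-wise SRC1 + SRC2 with optional round, shift, and saturate."""
    sat = ctrl & 1            # #ctrl[0]: saturate the result
    rnd = (ctrl >> 1) & 1     # #ctrl[1]: add 1 before shifting (rounding)
    shift = (ctrl >> 2) & 1   # #ctrl[2]: shift right by one bit
    dst = []
    for a, b in zip(src1, src2):
        value = (a + b + rnd) >> shift
        if sat:
            value = min(max(value, 0), 255)  # clamp as in the lane logic
        dst.append(value)
    return dst


# Two 4x4 matrices flattened row by row into 16 lanes each (assumed layout).
m1 = list(range(16))
m2 = [1] * 16
averaged = sample_madd(m1, m2, ctrl=0b110)  # round, then halve
```

With rounding and the shift both enabled, each lane becomes a rounded average-style result; enabling only bit 0 turns the same call into a saturating add.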

Table 11: Registers for the source matrices and the destination matrix

In addition, the logic associated with this data may include the following:

#Lanes := 16; #Lanewidth := 16;
IF (#ctrl[1]) R := 1; ELSE R := 0;
IF (#ctrl[2]) S := 1; ELSE S := 0;
IF (#ctrl[0]) SAT := 1; ELSE SAT := 0;
FOR (I := 0; I < #Lanes; I += 1) {
    Base := I * #Lanewidth;
    Top := Base + #Lanewidth - 1;
    Source1[I] := SRC1[Top..Base];
    Source2[I] := SRC2[Top..Base];
    Destination[I] := (Source1[I] + Source2[I] + R) >> S;
    IF (SAT) Destination[I] := MIN(MAX(Destination[I], 0), 255);
    DST[Top..Base] := Destination[I];
}

Referring again to FIG. 9, scalar-matrix multiplication is performed. #ctrl is an 11-bit immediate value; this value may be 0 (i.e., the #ctrl signal is ignored). This instruction is in the same group as SAMPLE_TCF and SAMPLE_IDF_H264_2. The logic associated with this instruction may include the following:

#Lanes := 16; #Lanewidth := 16;
MMODE := Control_4[17:16];
SM := Control_4[7:0];
SP := Control_4[15:8]; // only the least significant 5 bits are used
FOR (I := 0; I < #Lanes; I += 1) {
    Base := I * #Lanewidth;
    Top := Base + #Lanewidth - 1;
    Source2[I] := SRC2[Top..Base];
    Destination[I] := (SM * Source2[I]) >> SP;
    DST[Top..Base] := Destination[I];
}

This is implemented using the FIR_FILTER_BLOCK unit in the VPU that performs MCF/TCF. SM is the weight applied to all lanes (e.g., W[0] = W[1] = W[2] = W[3] = SM), and Pshift is SP. When this operation is performed, the summing adder in FIR_FILTER_BLOCK is bypassed, the four results of the 16x8-bit multiplications may be shifted, and the least significant 16 bits of each result are gathered into sixteen 16-bit results that are passed back to the EU.

FIG. 3 is an embodiment of a flowchart illustrating a process for processing video data in the computing architecture of FIG. 2. More specifically, as illustrated in the embodiment of FIG. 3, the command stream processor may send data and instructions to EUP 146. EUP 146 may read the instructions and process the received data accordingly. EUP 146 may then send the instructions, the processed data, and data from EUP texture address generator (TAG) interface 242 to texture address generator (TAG) 150. TAG 150 may generate addresses for the processed data. TAG 150 may then send the data and instructions to texture cache controller (TCC) 166. TCC 166 may cache the data for texture filter unit (TFU) 168. TFU 168 may filter the received data according to the received instructions and send the filtered data to video programmable unit (VPU) 199. VPU 199 may process the received data according to the received instructions and send the processed data to postpacker (PSP) 160. PSP 160 may collect pixel packets from components such as TFU 168. If a tile is partially full, PSP 160 may pack a plurality of tiles and send the tiles back to EUP 146 using a specific identifier that was sent down the pipeline.

FIG. 4A is an embodiment of a functional flow diagram illustrating data flow in a computing device, such as a computing device with the computing architecture of FIG. 2. As illustrated in the embodiment of FIG. 4A, an encrypted data stream may be sent to decryption component 236 on CSP 120, 128. In at least one embodiment, the encrypted bitstream may be decrypted and written back to video memory. The decrypted video may then be decoded using variable length decoder (VLD) hardware. Decryption component 236 may decrypt the received bitstream to form coded bitstream 238. Coded bitstream 238 may be sent to a VLD, a Huffman decoder, a context-adaptive variable length decoder (CAVLC), and/or a Context Based Binary Arithmetic Coder (CABAC) 240 (referred to herein as the "decoder"). Decoder 240 decodes the received bitstream and sends the decoded bits to the DirectX Video
Acceleration (DXVA) data structure 242. In addition, the data received at DXVA data structure 242 is external MPEG-2 VLD inverse scan, inverse quantization, and inverse DC prediction, as well as external VC-1 VLD inverse scan, inverse quantization, and inverse DC/AC prediction. This data may then be captured in DXVA data structure 242 via picture header 244, memory buffer 0 (MB0) 246a, MB1 246b, MB2 246c, ..., MBN 246n, etc. The data may then proceed to jump blocks 250, 252, and 254, continuing in FIGS. 4B and 4C.

FIG. 4B is a continuation of the functional flow diagram of FIG. 4A. As shown, data from jump blocks 250, 252, and 254 of FIG. 4A is received at inverse scan / inverse Q component 264 and inverse DC/AC prediction component 262. This data is processed and sent to switch 265. Switch 265 determines which data to pass according to the Intra/Inter input and sends the selected data to jump block 270. In addition, data from jump block 260 is sent to coded pattern block reconstruction component 266.

FIG. 4C is a continuation of the functional flow diagrams of FIGS. 4A and 4B. As shown, data from jump blocks 272, 274 (FIG. 4A) is received at filter component 280. This data is filtered by MC filter 282 according to any of a plurality of protocols. More specifically, if the data is received in MPEG-2 format, the data is constructed with 1/2-pixel offsets, and a two-pass filter may be used to perform both vertical and horizontal filtering. If the data is received in VC-1 format, a 4-tap filter is used; it operates in bilinear mode when the data is at 1/2 precision and in bicubic mode when the data is at 1/4 precision.
On the other hand, if the data is received in H.264 format, a 6-tap filter may be used; luma interpolation is used when the data is sampled at quarter-pixel precision, and chroma interpolation is used when the data is sampled at eighth-pixel precision. The filtered data is then sent to reconstructed reference component 284, and data associated with filter component 280 is sent to switch component 288. Switch component 288 also receives a zero. The switch component may determine, based on the received Intra/Inter data, which data is sent to adder 298.

In addition, inverse transform component 296 receives data from coded pattern block reconstruction component 286 and receives data from switch 265 (FIG. 4B) via jump block 276. Inverse transform component 296 performs an 8x8 inverse discrete cosine transform (IDCT) for MPEG-2 data, an 8x8, 8x4, 4x8, and/or 4x4 integer transform for VC-1 data, and a 4x4 integer transform for H.264 data, and, depending on the transform performed, sends this data to adder 298.

Adder 298 sums the data from inverse transform component 296 and switch 288 and sends the summed data to in-loop filter 296. In-loop filter 296 filters the received data and sends the filtered data to reconstructed frame component 290. Reconstructed frame component 290 sends data to reconstructed reference component 284. Reconstructed frame component 290 may also send data to deblocking and deringing filter 292, which may send the filtered data to de-interlacing component 294 for de-interlacing. This data may then be provided for display.

FIG. 5A is a functional block diagram illustrating an embodiment of components that may be used to provide motion compensation (MC) and/or discrete cosine transform (DCT) operations in a VPU, such as in the computing architecture of FIG. 2. More specifically, as illustrated in the embodiment of FIG. 5A, bus A may be used to send 16-bit data to input port b of PE 3 314d. Bus A also sends data to Z^-1 delay component 300, which sends 16-bit data to the second input of PE 2 314c. Bus A also sends this data to Z^-1 delay component 302 to send 16-bit data to PE 1 314b; this data is also sent to Z^-1 delay component 304, and from there to PE 0 314a and Z^-1 delay component 306. After passing through Z^-1 delay component 306, the low 8 bits of the bus A data are sent to PE 0 314a; this data is delayed by Z^-1 306 and sent to PE 1 314b and Z^-1 delay component 310. After reaching Z^-1 delay component 310, the low 8 bits of this data are sent to PE 2 314c and Z^-1 delay component 312; after reaching Z^-1 delay component 312, the low 8 bits of this data are sent to PE 3 314d. In addition, bus B sends 64-bit data to each of PE 3 314d, PE 2 314c, PE 1 314b, and PE 0 314a.

Processing element 0 (PE 0) 314a may facilitate filtering of the received data. More specifically, a PE may be one element of an FIR filter. When PE 0 314a, PE 1 314b, PE 2 314c, and PE 3 314d are combined with adder 330, they may form a 4-tap/8-tap FIR filter. A portion of the data is first sent to Z^-3 delay component 316. Multiplexer 318 selects the data to output according to the Field Input Response (FIR) input at its select port; from multiplexer 318 this data is sent to adder 330. Similarly, data from PE 1 314b is sent to multiplexer 322, with some of the data first received at Z^-2 delay component 320. Multiplexer 322 selects from the received data according to the received FIR input, and the selected data is sent to adder 330. Data from PE 2 314c is sent to multiplexer 326, with some of the data first sent to Z^-1 delay component 324. The FIR input selects the data to be sent to adder 330. Data from PE 3 314d is sent to adder 330.
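The tap structure described above can be sketched in Python as a four-tap direct-form FIR filter. This is a simplified model under stated assumptions, not the hardware itself: `fir_4tap` and its parameters are hypothetical names, the coefficients are assumed to be the four weights supplied to the PEs, and the multiplexers, the additional delay components, rounding, and shifting are left out.

```python
from collections import deque


def fir_4tap(samples, coeffs):
    """Direct-form FIR: each output is the sum of four tap products."""
    if len(coeffs) != 4:
        raise ValueError("this sketch models exactly four taps")
    # The deque stands in for the chain of one-sample delay elements.
    delay_line = deque([0, 0, 0, 0], maxlen=4)
    out = []
    for x in samples:
        delay_line.appendleft(x)  # newest sample enters, oldest drops off
        # One multiply per processing element, then a single summing adder.
        out.append(sum(c * s for c, s in zip(coeffs, delay_line)))
    return out


# Feeding an impulse returns the coefficients in order.
impulse_response = fir_4tap([1, 0, 0, 0, 0], [1, 2, 3, 4])
```

Feeding a unit impulse is a quick way to check the tap order, since the output simply replays the coefficient list followed by zeros.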
Also input to adder 330 is a feedback loop from N shifter 332. This data is received at multiplexer 328 via Z^-1 delay component 326. Rounding data is also received at multiplexer 328. Multiplexer 328 selects from the received data according to the input at its select port and sends the selected data to adder 330. Adder 330 adds the received data and sends the sum to N shifter 332; the 16-bit shifted data is sent to the output.

FIG. 5B is a continuation of the diagram of FIG. 5A. More specifically, as illustrated in the embodiment of FIG. 5B, data from memory buffers 340a, 340b, 340c, and 340d is sent to multiplexer 342a. Multiplexer 342a sends 16-bit data to jump blocks 344a and 346a. Similarly, multiplexer 342b receives data from memory buffers 340b, 340c, 340d, and 340e and sends data to jump blocks 344b and 346b; multiplexer 342c receives data from 340c, 340d, 340e, and 340f and sends data to 344c and 346c; multiplexer 342d receives data from 340d, 340e, 340f, and 340g and sends data to jump blocks 344d and 346d; multiplexer 342e receives data from 340e, 340f, 340g, and 340h and sends data to 344e and 346e; multiplexer 342f receives data from 340f, 340g, 340h, and 340i and sends data to 344f and 346f; multiplexer 342g receives data from 340g, 340h, 340i, and 340j and sends data to jump blocks 344g and 346g; multiplexer 342h receives data from 340h, 340i, 340j, and 340k and sends data to 344h and 346h; multiplexer 342i receives data from 340i, 340j, 340k, and 340l and sends data to jump blocks 344i and 346i.

FIG. 5C is a continuation of the diagrams of FIGS. 5A and 5B.
More specifically, data from multiplexer 342a (via jump block 348a) is sent to memory buffer B, slot 350a; data from multiplexer 342b (via jump block 348b) is sent to memory buffer B, slot 350b; data from multiplexer 342c (via jump block 348c) is sent to memory buffer B, slot 350c; data from multiplexer 342d (via jump block 348d) is sent to memory buffer B, slot 350d; data from multiplexer 342e (via jump block 348e) is sent to memory buffer B, slot 350e; data from multiplexer 342f (via jump block 348f) is sent to memory buffer B, slot 350f; data from multiplexer 342g (via jump block 348g) is sent to memory buffer B, slot 350g; data from multiplexer 342h (via jump block 348h) is sent to memory buffer B, slot 350h; data from multiplexer 342i (via jump block 348i) is sent to memory buffer B, slot 350i. Similarly, data from jump blocks 362j-362r (from FIG. 5D, discussed below) is sent to transpose network 360. Transpose network 360 transposes the received data and sends it to memory buffer B, and memory buffer B sends data to jump blocks 366j-366r.

FIG. 5D is a continuation of the diagrams of FIGS. 5A-5C. More specifically, data is received at multiplexer 369a from jump block 368a (FIG. 5B, via multiplexer 342a) and jump block 368j (FIG. 5C, via memory buffer B). This data is selected by the vert signal and sent via bus A (see FIG. 5A) to FIR filter block 0 370a. Similarly, multiplexers 369b-369i receive data from jump blocks 368b-368i and 368k-368r; this data is sent to FIR filter blocks 370b-370i and processed as described with respect to FIG. 5A.
Data output from FIR filter block 0 370a is sent to jump blocks 372b and 372j; FIR filter block 370b outputs to jump blocks 372c and 372k; FIR filter block 370c outputs to jump blocks 372d and 372l; FIR filter block 370d outputs to jump blocks 372e and 372m; FIR filter block 370e outputs to jump blocks 372f and 372n; FIR filter block 370f outputs to jump blocks 372g and 372o; FIR filter block 370g outputs to jump blocks 372h and 372p; FIR filter block 370h outputs to jump blocks 372i and 372q; FIR filter block 370i outputs to jump blocks 372j and 372r. As discussed above, data from jump blocks 372j-372r is received by transpose network 360 of FIG. 5C. Jump blocks 372b-372j continue in FIG. 5E.

FIG. 5E is a continuation of the diagrams of FIGS. 5A-5D. More specifically, as illustrated in the embodiment of FIG. 5E, data from jump block 376b (via FIR filter block 370a of FIG. 5D) is sent to memory buffer C, slot 380b. Similarly, data from jump block 376c (via FIR filter block 370b of FIG. 5D) is sent to memory buffer C, slot 380c; data from jump block 376d (via FIR filter block 370c of FIG. 5D) is sent to memory buffer C, slot 380d; data from jump block 376e (via FIR filter block 370d of FIG. 5D) is sent to memory buffer C, slot 380e; data from jump block 376f (via FIR filter block 370e of FIG. 5D) is sent to memory buffer C, slot 380f; data from jump block 376g (via FIR filter block 370f of FIG. 5D) is sent to memory buffer C, slot 380g; data from jump block 376h (via FIR filter block 370g of FIG. 5D) is sent to memory buffer C, slot 380h; data from jump block 376i (via FIR filter block 370h of FIG. 5D) is sent to memory buffer C, slot 380i; data from jump block 376j (via FIR filter block 370i of FIG. 5D) is sent to memory buffer C, slot 380j.
Multiplexer 382a receives data from memory buffer C, slots 380b, 380c, and 380d; multiplexer 382b receives data from memory buffer C, slots 380d, 380e, and 380f; multiplexer 382c receives data from memory buffer C, slots 380f, 380g, and 380h; multiplexer 382d receives data from memory buffer C, slots 380h, 380i, and 380j. Once the data is received, multiplexers 382a-382d send the data to ALUs 384a-384d. ALUs 384a-384d receive this data together with a value of "1", process the received data, and send the processed data to shifters 386a-386d, respectively. Shifters 386a-386d shift the received data and send the shifted data to Z blocks 388a-388d, and the data is then sent from Z blocks 388a-388d to multiplexers 390a-390d, respectively.

In addition, Z block 388a receives data from jump block 376b and sends data to multiplexer 390a; Z block 388b receives data from jump block 376c and sends data to multiplexer 390b; Z block 388c receives data from jump block 376d and sends data to multiplexer 390c; Z block 388d receives data from a jump block and sends data to multiplexer 390d. Multiplexers 390a-390d also receive a select input and send the selected data to the output.

FIG. 5F is an embodiment of an overall diagram of the components of FIGS. 5A-5E. More specifically, as illustrated in the embodiment of FIG. 5F, data is received at memory buffer A 340. This data is multiplexed at multiplexer 342 with other data in memory buffer A 340. Multiplexer 342 selects data and sends the selected data to memory buffer B 350. Memory buffer B 350 also receives data from transpose network 360. Memory buffer B 350 sends data to multiplexer 369, which also receives data from multiplexer 342.
Multiplexer 369 selects data and sends the selected data to FIR filter 370. The FIR filter filters the received data and sends the filtered data to memory buffer C 380, Z component 388, and transpose network 360. Memory buffer C 380 sends data to multiplexer 382, which selects from the data received from memory buffer C 380. The selected data is sent to ALU 384, which computes a result from the received data and sends the computed data to shifter 386. The shifted data is sent to multiplexer 390, which also receives data from Z component 388; multiplexer 390 selects a result and sends it to the output.

The components shown in FIGS. 5A-5F may be used to provide motion compensation and/or discrete cosine transform (DCT). More specifically, depending on the particular embodiment and/or data format, the data may be processed in a recursive operation to obtain the desired result. In addition, depending on the particular operation and/or data format, the data may be received from EU 146 and/or TFU 168.

As a non-limiting example, in operation, the components of FIG. 5A may be used to receive an indication of the operation to be performed (e.g., motion compensation, discrete cosine transform, etc.). An indication of the data format (e.g., H.264, VC-1, MPEG-2, etc.) may also be received. As an example, for the H.264 format, motion compensation (MC) data may pass through FIR filter 370 over multiple cycles and then enter the memory buffer for conversion to 1/4-pixel format. As discussed in more detail below, other operations in the H.264 format, or other data, may use the components of FIGS. 5A-5F in the same or a different manner. In addition, the multiplier array may be used as an array of multipliers to perform sixteen 16-bit multiplications and/or as a vector or matrix multiplier. An example of this is the SMMUL instruction.
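As a rough illustration of the SMMUL use of the multiplier array, the Python sketch below multiplies every lane by one shared weight, shifts, and keeps the low 16 bits of each product, following the lane logic given earlier for this instruction. The function name `sample_smmul`, the flat-list lane layout, and the treatment of SM and the lanes as unsigned values are assumptions made for the example.

```python
def sample_smmul(src2, control_4):
    """Multiply each lane by the shared weight SM, then shift right by SP."""
    sm = control_4 & 0xFF          # SM = Control_4[7:0], weight for all lanes
    sp = (control_4 >> 8) & 0x1F   # SP = Control_4[15:8], low 5 bits used
    # Keep only the least significant 16 bits of each shifted product.
    return [((sm * lane) >> sp) & 0xFFFF for lane in src2]


# Sixteen lanes holding a flattened 4x4 matrix (assumed layout).
lanes = [4, 8, 100] + [0] * 13
scaled = sample_smmul(lanes, (2 << 8) | 3)   # SM = 3, SP = 2
```

Because the same SM is applied to every lane, the call behaves like a fixed-point scalar times matrix product, with SP choosing the position of the binary point.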

FIG. 6 is a functional block diagram of a pixel processing engine that may be used in a computing architecture, such as the computing architecture of FIG. 2. More specifically, as illustrated in the embodiment of FIG. 6, bus A (before the shift register) and a second bus (see FIG. 5A) send 16-bit data to multiplexer 400. The select port of multiplexer 400 receives a negate signal from FIR filter 370, and multiplexer 400 selects one 16-bit value and sends it to multiplexer 406. In addition, multiplexer 402 may receive bus A data (after the shift register) and zero data. Multiplexer 402 may select the desired result according to the 6-tap data at its select port, and the 16-bit result may be sent to 16-bit unsigned adder 404. The 16-bit unsigned adder 404 may also receive data from bus A (before the shift register).

The 16-bit unsigned adder 404 may sum the received data and send the result to multiplexer 406. Multiplexer 406 may select according to the inverted 6-tap data received at its select port, and the selected data may be sent to 16x8 multiplier 410, which may also receive mode data. The 24-bit result may then be sent to shifter 412 to provide the output result.
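As a rough illustration of how an adder, multiplier, and shifter chain like the one in FIG. 6 could evaluate a 6-tap filter, the sketch below adds symmetric sample pairs first, scales each pair sum by a small coefficient, and then shifts the accumulated value. The coefficients (1, -5, 20, 20, -5, 1), the rounding offset of 16, and the 5-bit shift are those of the H.264 half-sample luma filter and are assumptions here; the text above does not state them, and the function name is hypothetical.

```python
def six_tap_half_pel(a, b, c, d, e, f):
    """Filter six neighboring 8-bit samples with taps (1, -5, 20, 20, -5, 1).

    Assumed coefficients: H.264 half-sample luma interpolation.
    """
    # Symmetric taps share a coefficient, so add each pair first
    # (the role an unsigned adder could play ahead of the multiplier).
    outer = a + f
    middle = b + e
    inner = c + d
    # Scale the pair sums and accumulate.
    acc = outer - 5 * middle + 20 * inner
    # Round, divide by 32 with a shift, then saturate to 8 bits.
    value = (acc + 16) >> 5
    return max(0, min(255, value))
```

Adding the pairs before multiplying halves the number of multiplications, which is one plausible reason to place an adder ahead of a single 16x8 multiplier.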

FIG. 7A is a functional block diagram of components that may be used in a VC-1 in-loop filter, such as in the computing architecture of FIG. 2. As illustrated in the embodiment of FIG. 7A, multiplexer 420 may receive a "1" value and a "0" value at its input ports, and may receive, as its select input, whether the absolute value of A0 is less than PQUANT. Similarly, multiplexer 422 may receive a "1" value and a "0" value, together with whether A3 is less than the absolute value of A0 490c. Multiplexer 424 may receive a "1" value and a "0" value as inputs, and whether the clip value (from shifter 468 of FIG. 7C) is not equal to 0 as its select input. Additionally, the data output from multiplexer 420 may be sent to logical OR gate 426, which may send data to multiplexer 428. Multiplexer 428 may also receive filter_other_3 data as an input. More specifically, a filter_other_3 signal may be generated as shown in FIG. 7A. If this signal is non-zero, it indicates that the other rows of pixels need to be filtered; otherwise, the 4x4 block may be left unfiltered (unmodified). Multiplexer 428 selects its output data according to the "processing pixel 3" data received at the select input.

FIG. 7B is a continuation of the diagram of FIG. 7A. More specifically, as illustrated in the embodiment of FIG. 7B, absolute value component 430 receives 9-bit input A1 490a (from FIG. 7D), and absolute value component 432 receives 9-bit input A2 490b (from FIG. 7D). Once the absolute values of the received data are computed, minimum component 434 determines the minimum of the received data and sends it, as output A3, to two's complement component 436. Two's complement component 436 computes the two's complement of the received data and sends it to subtraction component 438. Subtraction component 438 subtracts this data from input data A0 490c (from FIG. 7D). The result is sent to shifter 440, which shifts it left by two bits and sends it to adder 442. In addition, the output of subtraction component 438 is input directly to adder 442, which allows the circuit to multiply by 5 without using a multiplier.

Adder 442 sums the received data and sends the result to shifter 444. Shifter 444 shifts the received data right by three bits and sends it to clamp component 446. Clamp component 446 also receives the clip value (from shifter 468, FIG. 7C) and sends the result to the output. Note that the result of the filter may be negative or greater than 255, so clamp component 446 may be used to clamp the result to an unsigned 8-bit value. Thus, if the input d is negative, d is set to 0. If d is greater than clip, d may be set to clip.

FIG. 7C is a continuation of the diagrams of FIGS. 7A and 7B. As illustrated in the embodiment of FIG. 7C, P1 data 450a, P5 data 450e, and P3 data 450c are sent to multiplexer 452. Multiplexer 452 receives a select input and selects the data to send to subtraction component 460. Multiplexer 452 also sends output data to the select input of multiplexer 454.

Multiplexer 454 also receives input data from P4 450d, P8 450h, and P6 450f, and sends output data to subtraction component 460. Subtraction component 460 subtracts the received data and sends the result to shifter 466. Shifter 466 shifts the received data left by one bit and sends the result to jump block 474.

Similarly, multiplexer 456 receives inputs P2 450b, P3 450c, and P4 450d. Multiplexer 456 receives a select input from multiplexer 454 and sends the selected data to subtraction component 464. Multiplexer 458 receives a select input from multiplexer 456 and receives input data from P3 450c, P7 450g, and P5 450e. Multiplexer 458 sends its output data to subtraction component 464, which subtracts the received data and sends the result to shifter 470 and adder 472. Shifter 470 shifts the received data left by two bits and sends the shifted data to adder 472, which adds the received data and sends the result to jump block 480.

In addition, subtraction component 462 receives data from P4 450d and P5 450e, subtracts the received data, and sends the result to shifter 468. Shifter 468 shifts the received data right by one bit and outputs it as the clip value, for input to clamp component 446 and multiplexer 424 (FIG. 7A).
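The multiplier-free scaling and clamping path described for FIG. 7B (shifter 440, adder 442, shifter 444, clamp component 446) can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, `x` stands for the output of subtraction component 438, `clip` is assumed to be non-negative, and the clamp follows the rule given in the text (negative values become 0, values above clip become clip).

```python
def times5(x):
    """Multiply by 5 using a left shift by two and one addition (4x + x)."""
    return (x << 2) + x


def scale_and_clamp(x, clip):
    """Scale x by 5/8 with shifts, then clamp the result to [0, clip]."""
    # Arithmetic right shift by three, standing in for shifter 444.
    d = times5(x) >> 3
    if d < 0:
        return 0
    if d > clip:
        return clip
    return d
```

Feeding both the shifted and the unshifted value into one adder is what removes the need for a hardware multiplier in this path.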

In addition, P4 450d is sent to jump block 476, and P5 450e data is sent to jump block 478.

FIG. 7D is a continuation of the diagrams of FIGS. 7A-7C. More specifically, as illustrated in the embodiment of FIG. 7D, subtraction component 486 receives data from jump block 482 and jump block 484. Subtraction component 486 subtracts the received data and sends the result to shifter 488. Shifter 488 shifts the received data right by three bits and sends the result to A1 490a, A2 490b, and A0 490c. In addition, multiplexer 496 receives input data "0" and "d". This operation may include:

    if (do_filter) {
        P4[I] = P4[I] - D[I]
        P5[I] = P5[I] + D[I]
    }

Multiplexer 496 selects the desired result via the do_filter select input. The result is sent to subtraction component 500. Subtraction component 500 also receives data from jump block 492 (via jump block 476, FIG. 7C), subtracts the received data, and sends the result to P4 450d.

Multiplexer 498 also receives "0" and "d" as inputs and do_filter as its select input. Multiplexer 498 multiplexes this data and sends the result to adder 502. Adder 502 also receives data from jump block 494 (via jump block 478, FIG. 7C), adds the received inputs, and sends the result to P5 450e.

FIG. 8 is a block diagram of logic blocks that may be used to perform a sum of absolute differences (SAD) calculation in a computing architecture, such as the computing architecture of FIG. 2.
More specifically, as illustrated in the embodiment of FIG. 8, component 504 receives a portion of the 32-bit data A[31:0] and a portion of the 32-bit data B. Component 504 provides its output to adder 512 by evaluating A - B and applying the rule: if (C) S = Not(S) + 1. Similarly, component 506 receives A data and B data and sends its output to adder 512 based on the same kind of determination, except that component 506 receives the [23:16] bits, whereas component 504 receives the [31:24] bits. Likewise, component 508 receives the [15:8] bits, performs a calculation similar to that of components 504 and 506, and sends the result to adder 512. Component 510 receives the [7:0] bits, performs a calculation similar to that of components 504, 506, and 508, and sends the result to adder 512.

In addition, components 514, 516, 518, and 520 receive portions of the [63:32] 32-bit part of data A and data B (as opposed to the [31:0] part received at components 504-510). More specifically, component 514 receives the [31:24] bits of data A and data B, performs the calculation discussed above, and sends an 8-bit result to adder 522. Similarly, component 516 receives the [23:16] bits, performs a similar calculation, and sends the resulting data to adder 522. Component 518 receives the [15:8] bits of data A and data B, processes the received data as described above, and sends the result to adder 522. Component 520 receives the [7:0] bits of data A and data B, processes the received data as discussed above, and sends the result to adder 522.

Components 524-530 receive the [95:64] 32-bit part of the A data and B data. More specifically, component 524 receives the [31:24] bits, component 526 receives the [23:16] bits, component 528 receives the [15:8] bits, and component 530 receives the [7:0] bits. Once this data is received, components 524-530 process it as described above, and the processed data is sent to adder 532. Similarly, components 534-540 receive the [127:96] 32-bit part of the A data and B data. More specifically, component 534 receives the [31:24] bits, component 536 receives the [23:16] bits, component 538 receives the [15:8] bits, and component 540 receives the [7:0] bits. The received data is processed as discussed above and sent to adder 542. Adders 512, 522, 532, and 542 each add the data they receive and send a 10-bit result to adder 544. Adder 544 adds the received data and sends 12-bit data to the output.

FIG. 9 is a flowchart of another embodiment of a process, similar to that of FIG. 8, that may be used to perform a sum of absolute differences (SAD) calculation. More specifically, as illustrated in the embodiment of FIG. 9, i is defined as the block size BlkSize and suma is initialized to "0" (block 550). It is first determined whether i is greater than "0" (block 552). If i is greater than "0", then vecx[i] = Tabelx[i], vecy[i] = Tabely[i], vectx = mv_x + vecx[i], and vecty = mv_y + vecy[i] (block 554). An address may then be calculated using vectx and vecty, and 4x4 memory data (byte aligned) may be fetched from PredImage (block 556). The 128-bit prediction data may be sent to SAD 44 (see FIG. 8), as illustrated in block 558.
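The byte-lane structure of the FIG. 8 datapath can be sketched in software as follows: sixteen per-byte absolute differences, four group sums (one per 32-bit word), and one final total. This is a minimal sketch under stated assumptions; the function name and the use of Python `bytes` for the two 128-bit operands are hypothetical. The widths are consistent with the adders described above, since 4 x 255 = 1020 fits in 10 bits and 16 x 255 = 4080 fits in 12 bits.

```python
def sad128(a: bytes, b: bytes) -> int:
    """Sum of absolute differences over two 16-byte (128-bit) operands."""
    if len(a) != 16 or len(b) != 16:
        raise ValueError("expected two 16-byte (128-bit) operands")
    group_sums = []
    for g in range(4):
        # Four byte lanes per 32-bit word; each lane yields an 8-bit |a - b|.
        lanes = range(4 * g, 4 * g + 4)
        group_sums.append(sum(abs(a[i] - b[i]) for i in lanes))
    # Each group sum is at most 1020 (10 bits); the total is at most 4080 (12 bits).
    return sum(group_sums)
```

A 4x4 block of 8-bit pixels is exactly 128 bits, so one call corresponds to one pass of prediction and reference data through the SAD logic.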

In addition, block 560 may receive block data and calculate an address. In block 560, 4x4 memory data may also be fetched from RefImage and byte aligned. The 128-bit Ref[i] data may then be sent to SAD 44 (block 558). The sum may be sent from SAD 44 to block 562, where the sum value suma is increased and i is decremented by 1. It may then be determined whether suma is greater than a threshold (block 564). If so, the process may stop. If suma is not greater than the threshold, the process may return to block 552 to determine whether i is greater than 0. If i is not greater than 0, the process may end.

FIG. 10A is a block diagram of components that may be used in a deblocking operation, such as may be performed in the computing architecture of FIG. 2. As illustrated in the embodiment of FIG. 10A, ALU 580 receives input data p2 and p0 and sends data to absolute value component 586. Absolute value component 586 computes the absolute value of the received data and outputs data ap. Decision component 590 determines whether ap is less than β and sends data to jump block 596. ALU 580 also sends data to jump block 594. Similarly, ALU 582 receives data from q0 and q2. After computing a result, ALU 582 sends data to absolute value component 588, which determines the absolute value of the received data and sends aq to decision component 592. Decision component 592 determines whether aq is less than β and sends data to jump block 598.

ALU 600 receives data from q0 and p0, computes a result, and sends the result to absolute value component 606. Absolute value component 606 determines the absolute value of the received data and sends it to decision component 612. Decision component 612 determines whether the received value is less than α and sends the result to AND gate 620. ALU 602 receives data from p0 and p1, computes a result, and sends the result to absolute value component 608. Absolute value component 608 determines the absolute value of the received data and sends this value to decision component 614. Decision component 614 determines whether the received value is less than β and sends the result to AND gate 620. ALU 604 receives data from q0 and q1, computes a result, and sends the result to absolute value component 610. Absolute value component 610 determines the absolute value of the received data and sends the result to decision component 616. Decision component 616 determines whether the received data is less than β and sends the result to AND gate 620. In addition, AND gate 620 receives data from decision component 618, which receives bS data and determines whether this data is not equal to zero.

FIG. 10B is a continuation of the diagram of FIG. 10A. More specifically, ALU 622 receives data from p1 and q1, computes a result, and sends data to ALU 624. ALU 624 also receives data from jump block 646 (via ALU 580 of FIG. 10A) and a 4-bit value at its carry input. ALU 624 then computes a result and sends it to shifter 626, which shifts the received data right by three bits. Shifter 626 then sends data to clip3 component 628, which also receives data from jump block 630 (via ALU 744 of FIG. 10D, described in more detail below). The clip3 component 628 sends data to multiplexer 634 and to NOT gate 632. NOT gate 632 inverts the received data and sends the inverted data to multiplexer 634. Multiplexer 634 also receives data at its select input and sends the selected data to ALU 636. ALU 636 also receives data from multiplexer 640. Multiplexer 640 receives data from q0 and p0 and receives its select input from !left_top. The carry input of ALU 636 receives data from multiplexer 642. Multiplexer 642 receives "1" and "0" as well as !left_top data. ALU 636 sends its result to SAT(0,255) 638, which sends data to jump block 644 (continued at multiplexer 790, FIG. 10E).

In addition, ALU 648 receives data from q0 and p0 and receives one bit of data at its select input. ALU 648 computes a result and sends this data to shifter 650. Shifter 650 shifts the received data right by one bit and sends the shifted data to ALU 652. Similarly, multiplexer 656 receives data from p1 and q1, with !left_top as its select input. Multiplexer 656 determines a result and sends the result to shifter 658. Shifter 658 shifts the received data left by one bit and sends the shifted data to ALU 652. ALU 652 computes a result and sends data to ALU 662. ALU 662 also receives data from multiplexer 660, which receives q2 and p2 as well as data from jump block 680 (via NOT gate 802 of FIG. 10E).

ALU 662 computes a result and sends this data to shifter 664. Shifter 664 shifts the received data right by one bit and sends the shifted data to clip3 component 668. The clip3 component 668 also receives tc0 and sends data to ALU 670. ALU 670 also receives data from multiplexer 656 and, after computing a result, sends this data to multiplexer 672. Multiplexer 672 also receives data from multiplexer 656 and from jump block 678 (via multiplexer 754 of FIG. 10E), and sends data to jump block 674.

FIG. 10C is a continuation of the diagrams of FIGS. 10A and 10B. As illustrated in the embodiment of FIG. 10C, multiplexer 682 receives data from p2, p1, and !left_top, and sends the selected data to adder 706. Multiplexer 684 receives p1 and p0 together with !left_top and sends the result to shifter 700. Shifter 700 shifts the received data left by one bit and sends it to adder 706. Multiplexer 686 receives data from p0 and q1 as well as !left_top. Multiplexer 686 sends data to shifter 702, which shifts the received data left by one bit and sends the shifted data to adder 706. Multiplexer 688 receives data from q0 and q1 as well as !left_top and sends the selected data to shifter 704. Shifter 704 shifts the received data left by two bits and sends it to adder 706. A further multiplexer receives pixel data and !left_top and sends data to adder 706. Adder 706 also receives a value of 4 at its carry input and sends its output to jump block 708.

Similarly, a second group of multiplexers, including multiplexers 691 and 694, receives pixel data such as p0, q0, q1, and q2 together with !left_top and sends the selected results to adder 698, which also receives a 2-bit value at its carry input.

Multiplexer 712 receives p3, q3, and !left_top and sends the result to shifter 722. Shifter 722 shifts the received data left by one bit and sends it to adder 726. Multiplexer 714 receives p2, q2, and !left_top and sends the selected result to shifter 724 and adder 726. Shifter 724 shifts the received data left by one bit and sends the shifted result to adder 726. Multiplexer 716 receives q1 and !left_top and sends the selected result to adder 726. Multiplexer 718 receives p0, q0, and !left_top and sends the selected result to adder 726. Multiplexer 720 receives p0, q0, and !left_top and sends the selected result to adder 726. Adder 726 receives a 4-bit value at its carry input, adds it to the received data, and sends the sum to jump block 730.

FIG. 10D is a continuation of the diagrams of FIGS. 10A-10C. More specifically, as illustrated in the embodiment of FIG. 10D, α table 750 receives indexA and outputs α. β table 748 receives indexB and outputs data to zero-extend component 752, which outputs β.

Similarly, multiplexer 736 receives "1" and "0" as well as data from jump block 732 (via decision block 590 of FIG. 10A), selects a result, and sends it to ALU 740. Multiplexer 738 also receives "1" and "0" as well as data from jump block 734 (via decision block 592 of FIG. 10A), and sends the selected result to ALU 740. ALU 740 computes a result and sends data to multiplexer 742. Multiplexer 742 also receives "1" and chroma edge flag data, selects a result, and sends it to ALU 744. ALU 744 also receives tc0, computes the result tc, and sends the result to jump block 746.

FIG. 10E is a continuation of the diagrams of FIGS. 10A-10D. More specifically, as illustrated in the embodiment of FIG. 10E, multiplexer 754 receives data related to the expression "(ChromaEdgeFlag == 0) && (ap < β)" and data related to the expression "(ChromaEdgeFlag == 0) && (aq < β)", receives data from NOT component 802, and sends the selected data to jump block 756 (to multiplexer 672 of FIG. 10B).
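The threshold tests of FIG. 10A and the adder and shifter chains of FIGS. 10B and 10C closely resemble the luma edge filter equations of the H.264 deblocking filter. The sketch below writes those standard equations out directly. It is offered only as an illustration, on the assumption that the figures implement the standard's formulas: all function names are hypothetical, only the luma p-side is shown for the strong filter, and the rounding constants (+4 before a 3-bit shift, +2 before a 2-bit shift) are taken from the standard rather than from the text.

```python
def clip3(lo, hi, x):
    """Clamp x to the inclusive range [lo, hi]."""
    return max(lo, min(hi, x))


def filter_samples_flag(p1, p0, q0, q1, alpha, beta, bs):
    """Edge test: bS != 0 and all three differences under their thresholds."""
    return (bs != 0 and abs(p0 - q0) < alpha
            and abs(p1 - p0) < beta and abs(q1 - q0) < beta)


def normal_filter_luma(p2, p1, p0, q0, q1, q2, beta, tc0):
    """Luma filter for bS < 4; returns the new (p0, q0)."""
    ap = abs(p2 - p0)
    aq = abs(q2 - q0)
    # tc grows by one for each side that is smooth enough.
    tc = tc0 + (1 if ap < beta else 0) + (1 if aq < beta else 0)
    delta = clip3(-tc, tc, (((q0 - p0) << 2) + (p1 - q1) + 4) >> 3)
    return clip3(0, 255, p0 + delta), clip3(0, 255, q0 - delta)


def strong_filter_p(p3, p2, p1, p0, q0, q1):
    """Luma filter for bS == 4 on the p side; returns the new (p0, p1, p2)."""
    new_p0 = (p2 + 2 * p1 + 2 * p0 + 2 * q0 + q1 + 4) >> 3
    new_p1 = (p2 + p1 + p0 + q0 + 2) >> 2
    new_p2 = (2 * p3 + 3 * p2 + p1 + p0 + q0 + 4) >> 3
    return new_p0, new_p1, new_p2
```

Under this reading, the right shifts by three, two, and three bits applied to the three adder outputs correspond to the divisions in `new_p0`, `new_p1`, and `new_p2`, and the saturation to [0, 255] corresponds to the SAT(0,255) component.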

另外’多工态780接收與關係式“chromaEdgeFlag=0) &&(ap<p) &&(abs(pO-qO)<((a»2)+2)” 相關的資料以及與關係式 “ ChromaEdgeFlag==〇)&&(aq<p)&&(abs(p0_q0)< ((α»2) +2))”相關的資料，多工器78〇亦自非組件8〇2接收選擇輸入，依此選擇所要結果且將其發送至多工器 782、784 以及 786。一多工态757自pi、qi以及非組件8〇2接收資料，將選定貧料發送至移位器763，移位器763將所接收之資料向左移一位，且將其發送至加法器774。多工器759自非组件802接收p0、q0以及資料，且將選定資料發送至加法器 774〜。多工器761自ql、pl以及非組件8〇2接收資料且發达至加法11 774。加法11 774亦在進位輸入端接收兩位兀之資料，且將輸出發送至多工器782。法-移:器爾自跳躍塊758接收資料（經由圖的加收之資料向右移三位，接著將所移位次料782。移位11 766自跳躍塊760接收 ;Γ的加法器698)且將所接收之資料向右移兩位，接讀所錄之資料發駐多工H784。移^ 47 200816082 S3U06-0022I00-TW 24598tw£doc/p 768自跳躍塊762接收資料（自圖1〇c的加法器726)且將所接收之資料向右移三位，接著將所移位之資料發送至多工器786。、又如以上所論述，多工器782自移位器764以及加法器 : 782以及多工器780接收資料，自此資料選擇結果且將i - 發送至多工器790。同樣地，多工器784自移位器冗卜; 料多工器780與多工器776接收資料。多工器776接收pi、 • ql以及來自非組件802之資料，接著將選定結果發送至多工器798。多工器786自移位器768、多工器78〇與多工哭 77^接收資料。多工器778接收p2、q2a及來自非組件8〇2 之資料。多工态786將選定資料發送至多工器8⑻。如上所論述，多工器790自多工器782接收資料。另外，多工器790自跳躍塊772(經由圖1〇B的SAT組件638 ) 以及多工器794接收資料。多工器794接收p〇、q〇以及非組件802之資料。多工器79〇亦接收bSn & nfllterSampleFlag資料作為選擇輸入，並將選定資料發送至緩衝斋808以及810。同樣地，多工器798自多工器784、跳躍塊755 (纽由圖10B的多工器674)與多工器792接收資料以及選擇輸入的bSn & nfilterSampleFlag資料。多工器792接收P1、qi以及非組件8〇2之資料。多工器798 將貢料發送至緩衝器806以及812。同樣地，多工器8〇〇自夕工裔786接收資料且接收bSn & nfilterSampleFlag資料作為選擇輸入。另外，多工器8〇〇自多工器788接收資料°多工|§ 788接收p2、q2以及非組件802之資料。多工 48 200816082 S3UO6.O022IOO.TW 24598twf.doc/p 器800選擇所要資料，且將資料發送至缓衝器806以及 814。緩衝器804-814亦自非組件802接收資料，且將資料分別發送至p2、pi、p〇、q〇、ql以及q2。圖11為說明可用於在計算架構（諸如圖2之計算架構） . 
中執行資料之過程之實施例流程圖。

In addition, multiplexer 780 receives information related to the relationship "(chromaEdgeFlag == 0) && (ap < β) && (abs(p0 - q0) < ((α >> 2) + 2))" and information related to the relationship "(chromaEdgeFlag == 0) && (aq < β) && (abs(p0 - q0) < ((α >> 2) + 2))". Multiplexer 780 also receives a select input from NOT component 802, selects the desired result accordingly, and sends it to multiplexers 782, 784, and 786.

Multiplexer 757 receives p1, q1, and data from NOT component 802, and sends the selected data to shifter 763, which shifts the received data left by one bit and sends it to adder 774. Multiplexer 759 receives p0, q0, and data from NOT component 802, and sends the selected data to adder 774. Multiplexer 761 receives q1, p1, and data from NOT component 802, and sends the selected data to adder 774. Adder 774 also receives two bits of data at its carry input and sends its output to multiplexer 782.

Shifter 764 receives data from jump block 758 (fed by an adder of a preceding figure), shifts the received data right by three bits, and sends the shifted data to multiplexer 782. Shifter 766 receives data from jump block 760 (from adder 698), shifts the received data right by two bits, and sends the shifted data to multiplexer 784. Shifter 768 receives data from jump block 762 (from adder 726 of FIG. 10C), shifts the received data right by three bits, and sends the shifted data to multiplexer 786.

As discussed above, multiplexer 782 receives data from shifter 764, adder 774, and multiplexer 780, selects a result from this data, and sends it to multiplexer 790. Similarly, multiplexer 784 receives data from shifter 766, multiplexer 780, and multiplexer 776. Multiplexer 776 receives p1, q1, and data from NOT component 802. Multiplexer 784 then sends the selected result to multiplexer 798. Multiplexer 786 receives data from shifter 768, multiplexer 780, and multiplexer 778. Multiplexer 778 receives p2, q2, and data from NOT component 802. Multiplexer 786 sends the selected data to multiplexer 800.

As discussed above, multiplexer 790 receives data from multiplexer 782. In addition, multiplexer 790 receives data from jump block 772 (via SAT component 638 of FIG. 10B) and from multiplexer 794. Multiplexer 794 receives p0, q0, and data from NOT component 802. Multiplexer 790 also receives the bSn & nfilterSampleFlag data as a select input and sends the selected data to buffers 808 and 810. Similarly, multiplexer 798 receives data from multiplexer 784, jump block 755 (from multiplexer 674 of FIG. 10B), and multiplexer 792, and receives the bSn & nfilterSampleFlag data as a select input. Multiplexer 792 receives p1, q1, and data from NOT component 802. Multiplexer 798 sends the selected data to buffers 806 and 812. Likewise, multiplexer 800 receives data from multiplexer 786 and receives the bSn & nfilterSampleFlag data as a select input. In addition, multiplexer 800 receives data from multiplexer 788. Multiplexer 788 receives p2, q2, and data from NOT component 802. Multiplexer 800 selects the desired data and sends it to buffers 804 and 814. Buffers 804-814 also receive data from NOT component 802 and send the data to p2, p1, p0, q0, q1, and q2, respectively.

FIG. 11 is a flow diagram illustrating an embodiment of a process that may be used to execute data in a computing architecture, such as the computing architecture of FIG. 2. As illustrated in the embodiment of FIG. 11, odd block 880 and even block 882 of the texture address generator (TAG) (see also 150 of FIG. 2) receive data from output port 144 (FIG. 2). An address is then generated for the received data, and the process proceeds to texture cache and controller (TCC) 884, 886 (see also FIG. 2, 166). The data can then be sent to cache 890 and to texture filter first-in-first-out (TFF) components 888, 892, which can act as a delay queue/buffer. The data is then sent to texture filter units (TFU) 894, 896 (see also FIG. 2, 168). Once the data has been filtered, TFUs 894, 896 send the data to VPUs 898, 900 (see also FIG. 2, 199). Depending on whether the instruction calls for motion compensation filtering, texture cache filtering, inter-deblocking filtering, and/or sum of absolute differences, the data can be sent to different VPUs and/or to different portions of the same VPU. After processing the received data, VPUs 898, 900 can send the data to the output of input ports 902, 904 (see also FIG. 2, 142).

Embodiments disclosed herein can be implemented in hardware, software, firmware, or a combination thereof. At least one embodiment disclosed herein is implemented in software and/or firmware that is stored in a memory and executed by a suitable instruction execution system.
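The two select conditions quoted above for multiplexer 780 can be written out in software. The following is a minimal sketch, not the circuit of FIG. 10E: the function name and argument layout are hypothetical, and ap and aq are assumed to be abs(p2 - p0) and abs(q2 - q0), the definitions used by the H.264 deblocking filter, since this passage does not define them.

```python
def filter_select_flags(p, q, alpha, beta, chroma_edge_flag):
    """Evaluate the two relationships quoted for multiplexer 780.

    p and q are (p0, p1, p2) and (q0, q1, q2), the samples on either
    side of the block edge. alpha and beta are integer thresholds.
    Returns (p_side, q_side) as booleans.
    """
    p0, _p1, p2 = p
    q0, _q1, q2 = q

    # Assumption: ap and aq follow the H.264 definitions.
    ap = abs(p2 - p0)
    aq = abs(q2 - q0)

    # Shared term: abs(p0 - q0) < ((alpha >> 2) + 2)
    small_step = abs(p0 - q0) < ((alpha >> 2) + 2)

    # Both relationships require chromaEdgeFlag == 0 (a luma edge).
    luma_edge = chroma_edge_flag == 0

    p_side = luma_edge and ap < beta and small_step
    q_side = luma_edge and aq < beta and small_step
    return p_side, q_side
```

Each flag corresponds to one of the two relationships in the text; in hardware they act as select information for the downstream multiplexers rather than as return values.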
If implemented in hardware, as in an alternative embodiment, the embodiments disclosed herein can be implemented with any one or a combination of the following technologies: a discrete logic circuit having logic gates for implementing logic functions upon data signals, an application-specific integrated circuit (ASIC) having appropriate combinational logic gates, a programmable gate array (PGA), a field programmable gate array (FPGA), and so on.

It should be noted that the flowcharts included herein show the architecture, functionality, and operation of a possible implementation of software and/or hardware. In this regard, each block can be interpreted as representing a module, segment, or portion of code that includes one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations, the functions noted in the blocks may occur out of order and/or not at all. For example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending on the functionality involved.

It should be noted that any of the programs listed herein, which can include an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch the instructions from the instruction execution system, apparatus, or device and execute them. In the context of this document, a "computer-readable medium" can be any means that can contain, store, communicate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. The computer-readable medium can be, for example, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device. More specific examples include an erasable programmable read-only memory (EPROM or flash memory) (electronic), an optical fiber (optical), and a portable compact disc read-only memory (CD-ROM) (optical). In addition, the scope of certain embodiments of this disclosure can include embodying the functionality described in logic embodied in hardware- or software-configured media.

It should also be noted that conditional language, such as, among others, "can," "could," "might," or "may," unless specifically stated otherwise or otherwise understood within the context as used, is generally intended to convey that certain embodiments include, while other embodiments do not include, certain features, elements, and/or steps. Thus, such conditional language is not generally intended to imply that features, elements, and/or steps are in any way required for one or more particular embodiments, or that one or more particular embodiments necessarily include logic for deciding, with or without user input or prompting, whether these features, elements, and/or steps are included or are to be performed in any particular embodiment.

It should be emphasized that the above-described embodiments are merely possible examples of implementations, set forth only for a clear understanding of the principles of this disclosure. Many variations and modifications may be made to the above-described embodiments without departing substantially from the spirit and scope of the disclosure. All such modifications and variations are intended to be included herein within the scope of this disclosure.

[Brief Description of the Drawings]

FIG. 1 is an embodiment of a computing architecture for processing video data.

FIG. 2 is an embodiment of a computing architecture, similar to the architecture of FIG. 1, that introduces a video processing unit (VPU).

FIG. 3 is a flowchart embodiment of a process for processing video and graphics data, such as in the computing architecture of FIG. 2.

FIG. 4A is a functional flow diagram embodiment of data flow in a computing device, such as a computing device having the computing architecture of FIG. 2.

FIG. 4B is a continuation of the functional flow diagram of FIG. 4A.

FIG. 4C is a continuation of the functional flow diagram of FIGS. 4A and 4B.

FIG. 5A is a functional block diagram of an embodiment of components that can be used to provide motion compensation (MC) and/or discrete cosine transform (DCT) operations, such as in the computing architecture of FIG. 2.

FIG. 5B is a continuation of the diagram of FIG. 5A.

FIG. 5C is a continuation of the diagrams of FIGS. 5A and 5B.

FIG. 5D is a continuation of the diagrams of FIGS. 5A-5C.

FIG. 5E is a continuation of the diagrams of FIGS. 5A-5D.

FIG. 5F is an embodiment of an overall diagram of the components of FIGS. 5A-5E.

FIG. 6 is a functional block diagram of a pixel processing engine that can be used in a computing architecture, such as the computing architecture of FIG. 2.

FIG. 7A is a functional block diagram illustrating components that can be used for a VC-1 in-loop filter, such as in the computing architecture of FIG. 2.

FIG. 7B is a continuation of the diagram of FIG. 7A.

FIG. 7C is a continuation of the diagrams of FIGS. 7A and 7B.

FIG. 7D is a continuation of the diagrams of FIGS. 7A-7C.

FIG. 8 is a block diagram of components that can be used to perform a sum of absolute differences calculation in a computing architecture, such as the computing architecture of FIG. 2.

FIG. 9 is a flowchart of an embodiment of a process, similar to FIG. 8, that can be used to perform a sum of absolute differences calculation.

FIG. 10A is a block diagram illustrating a plurality of components that can be used in a deblocking operation, such as can be performed in the computing architecture of FIG. 2.

FIG. 10B is a continuation of the diagram of FIG. 10A.

FIG. 10C is a continuation of the diagrams of FIGS. 10A and 10B.

FIG. 10D is a continuation of the diagrams of FIGS. 10A-10C.

FIG. 10E is a continuation of the diagrams of FIGS. 10A-10D.

FIG. 11 is a flowchart embodiment of a process that can be used to execute data in a computing architecture, such as the computing architecture of FIG. 2.

[Description of Main Component Symbols]

88, 102: internal logic analyzer

90, 104: bus interface unit (BIU)

106a, 106b, 106c, 106d: memory interface unit (MIU)
108: memory access port
110, 116: stream cache
112: vertex cache
114: L2 cache
118: EUP controller with cache subsystem
120: command stream processor (CSP) front end
122: 3D and state component
124: 2D pre component
126: 2D first-in-first-out (FIFO) component
128: CSP back end/ZL1 cache
130: definition and model texture processor
132: advanced encryption system (AES) encryption/decryption component
134: triangle and attribute setup unit
136: span-tile generator
138: ZL1
140: ZL2
142, 902, 904: input port
144: output port
146: execution unit pool (EUP)/BW compressor
148: Z and ST cache

150: texture address generator (TAG)
152: D cache
154: 2D processing component
156: pre-packer
158: interpolator
160: post-packer
162: write-back unit
164a, 164b: memory access unit (MXU)
166, 884, 886: texture cache and controller (TCC)

168, 894, 896: texture filter unit (TFU)
199, 898, 900: video processing unit (VPU)
234: encrypted bitstream
236: decryption component
238: coded bitstream
240: VLD, Huffman decoder, CAVLC, CABAC
242: EUP TAG interface
244: picture header
246a, 246b, 246c, 246n: memory buffer MB
250, 252, 254, 256, 258, 260, 270, 272, 274, 276, 344a~i, 346a~i, 348a~i, 362j~r, 366j~r, 368a~r, 372b~r, 376b~j, 474, 476, 478, 480, 482, 484, 492, 494, 594, 596, 598, 630, 644, 646, 674, 678, 680, 708, 710, 730, 732, 734, 746, 755, 756, 758, 760, 762, 770, 772: jump block
262: inverse DC/AC prediction component
264: inverse scan, inverse Q component
265: switch
266: coded pattern block reconstruction component
280: filter component
282: MC filter
284: reconstructed reference component
286: coded pattern block reconstruction
288: switch component
290: reconstructed frame component
292: deblocking and deringing filter
294: de-interlacing component
296: inverse transform component/in-loop filter
298, 330, 442, 472, 502, 512, 522, 532, 542, 544, 698, 706, 726, 774: adder
300, 302, 304, 306, 308, 310, 312, 324: Z^-1 delay component
314a, 314b, 314c, 314d: PE
316: Z^-3 delay component
320: Z^-2 delay component
318, 322, 326, 328, 342, 342a~i, 369, 369a~i, 382, 382a~d, 390, 390a~d, 400, 402, 404, 406, 408, 420, 422, 424, 428, 452, 454, 456, 458, 496, 498, 634, 640, 642, 656, 660, 672, 682, 684, 686, 690, 691, 692, 694, 696, 712, 714, 716, 718, 720, 736, 738, 742, 754, 757, 759, 761, 776, 778, 780, 782, 784, 786, 788, 790, 792, 794, 796, 798, 800: multiplexer
332: N shifter
340, 304a~1: memory buffer
350, 350a~i: memory B, slot
360: transpose network
370, 370a~i: FIR filter block
380, 380b~j: memory buffer C, slot
384, 384a~d, 580, 582, 600, 602, 604, 622, 624, 636, 648, 652, 662, 670, 740, 744: ALU
386, 386a~d, 412, 440, 444, 466, 468, 470, 488, 626, 650, 658, 664, 700, 702, 704, 722, 724, 763, 764, 766, 768: shifter
388, 388a~d: Z block
410: multiplier
426: logical OR gate
430, 432, 586, 606, 608, 610: absolute value component
434: minimum component
436: two's complement component
438, 460, 462, 464, 486, 500: subtraction component
446: clamp component
450a~h: P1~8 data
490a: A1
490b: A2

490c: A0
504, 506, 508, 510, 514, 516, 518, 520, 524, 526, 528, 530, 534, 536, 538, 540: component
590, 592, 612, 614, 616, 618: decision component
620: AND gate
628, 668: clip3 component
632: NOT gate
638: SAT component
748: β table
750: α table
752: extension component
802: NOT component
804, 806, 808, 810, 812, 814: buffer
880, 882: texture address generator (TAG) block
888, 891: texture filter first-in-first-out (TFF) component
890: cache

Claims

X. Scope of Patent Application:

1. A programmable video processing unit, comprising: [...] video data in one of [...] formats, wherein the instruction [...] transforms the video data; [...] and the filter logic circuit and the transform logic circuit operate according to the format [...].

2. The programmable video processing unit of claim 1, wherein the filter logic circuit performs motion compensation filtering.

3. The programmable video processing unit of claim 2, wherein when the mode indication field is [...] format, the filter logic circuit operates in a two-pass mode of vertical filtering and horizontal filtering.

4. The programmable video processing unit of claim 2, wherein when the mode indication field is VC-1 format at 1/2 precision, the filter logic circuit operates in a [...] mode; and when the mode indication field is VC-1 format at 1/4 precision, the filter logic circuit operates in a bicubic mode.

5. The programmable video processing unit of claim [...], wherein when the mode indication field is H.264 format at quarter-pixel, the filter logic circuit operates in a luma mode; and when the mode indication field is H.264 format at eighth-pixel, the filter logic circuit operates in a chroma mode.

6. The programmable video processing unit of claim [...], wherein when the mode indication field is MPEG-2 format, the transform logic circuit performs an inverse discrete cosine transform operation.

7. The programmable video processing unit of claim 1, wherein when the mode indication field is VC-1 or H.264 format, the transform logic circuit performs an integer transform operation.

8. The programmable video processing unit of claim 1, further comprising a deblocking logic circuit for performing in-loop filtering.

9. A programmable video processing unit, comprising:
an identification logic circuit for identifying a format of the video data;
a motion compensation logic circuit for performing a motion compensation operation;
an inverse discrete cosine transform logic circuit for performing an inverse discrete cosine transform operation; and
an integer transform logic circuit for performing an integer transform operation;
wherein the inverse discrete cosine transform logic circuit and the integer transform logic circuit are each turned off according to the identification result of the identification logic circuit.

10. The programmable video processing unit of claim 9, wherein when the identification result is VC-1 or H.264 format, the inverse discrete cosine transform logic circuit is turned off.

11. The programmable video processing unit of claim 9, wherein when the identification result is MPEG-2 format, the integer transform logic circuit is turned off.

12. The programmable video processing unit of claim 9, further comprising a deblocking logic circuit for performing an in-loop filtering operation when the identification result is VC-1 or H.264 format.

13. The programmable video processing unit of claim 9, wherein when the identification result is MPEG-2 format, the motion compensation logic circuit operates in a two-pass mode.

14. The programmable video processing unit of claim 9, wherein when the identification result is VC-1 format, the motion compensation logic circuit operates in one of the following modes: a bilinear mode and a bicubic mode.

15. The programmable video processing unit of claim 9, wherein [...].

16. A video data processing method, comprising:
receiving an instruction;
receiving video data selected from one of at least two formats;
filtering the video data according to the instruction; and
transforming the video data according to the instruction;
wherein the instruction includes a mode indication field, and the filtering and transforming steps are performed according to the format of the video data.

17. The video data processing method of claim 16, wherein the step of filtering the video data includes performing motion compensation filtering.

18. The video data processing method of claim [...], wherein when the mode indication field is [...] format, the motion compensation filtering operates in a two-pass mode.

19. The video data processing method of claim 17, wherein when the mode indication field is [...] format [...], the motion compensation filtering operates in a [...] mode, and at 1/4 precision it operates in a bicubic mode.

20. The video data processing method of claim 17, wherein when the mode indication field is H.264 format at quarter-pixel, the motion compensation filtering operates in a luma mode; and when the mode indication field is H.264 format at eighth-pixel, the motion compensation filtering operates in a chroma mode.

21. The video data processing method of claim 16, wherein when the mode indication field is MPEG-2 format, the step of transforming includes performing [...].

22. The video data processing method of claim 16, wherein when the mode indication field is VC-1 or H.264 format, the step of transforming includes performing an integer transform operation.

23. The video data processing method of claim 16, further comprising performing in-loop filtering.
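The format-dependent behaviour recited in the claims above can be summarised as a lookup from video format to the logic that stays active. The sketch below is illustrative only and is not part of the claims: the names ModeConfig, MODE_TABLE, and configure are hypothetical, and the table records only what is legible in the claims as printed here (inverse DCT and two-pass filtering for MPEG-2, integer transform for VC-1 and H.264, bilinear and bicubic modes for VC-1, luma and chroma modes for H.264).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeConfig:
    transform: str          # transform logic left enabled for this format
    mc_filter_modes: tuple  # motion compensation filter modes named in the claims
    in_loop_filter: bool    # True where the claims recite in-loop filtering


# Hypothetical table; each entry mirrors a legible claim, nothing more.
MODE_TABLE = {
    # Inverse DCT on, integer transform off, two-pass filtering.
    "MPEG-2": ModeConfig("idct", ("two-pass",), False),
    # Integer transform on, inverse DCT off, bilinear or bicubic filtering.
    "VC-1": ModeConfig("integer", ("bilinear", "bicubic"), True),
    # Integer transform on; luma (quarter-pixel) and chroma (eighth-pixel).
    "H.264": ModeConfig("integer", ("luma", "chroma"), True),
}


def configure(fmt):
    """Return the configuration for a format named by the mode indication."""
    try:
        return MODE_TABLE[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt!r}") from None
```

Here in_loop_filter is False for MPEG-2 only because the format-specific claim names VC-1 and H.264; it is an assumption of this sketch, not a statement that the hardware cannot filter MPEG-2 data.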
