TWI289808B - Register-collecting mechanism, method for performing the same and pixel processing system employing the same - Google Patents


Info

Publication number
TWI289808B
Authority
TW
Taiwan
Prior art keywords
register
program
processing system
instruction
pixel processing
Prior art date
Application number
TW94139823A
Other languages
Chinese (zh)
Other versions
TW200719274A (en)
Inventor
R-Ming Hsu
Original Assignee
Silicon Integrated Sys Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Silicon Integrated Sys Corp
Priority to TW94139823A
Publication of TW200719274A
Application granted
Publication of TWI289808B

Abstract

The invention provides a pixel processing system and method. The system includes a register-collecting mechanism and a pixel shader. The register-collecting mechanism modifies a first program into a second program. The first program is allocated a plurality of first registers, and the second program is allocated only a portion of those first registers. The pixel shader executes the second program. The method includes loading the first instructions of the first program into the register-collecting mechanism; scanning the first instructions; decoding the first instructions to obtain first register numbers; mapping the first register numbers to second register numbers in a register mapping table; and modifying the first program by rewriting its first register numbers, so as to form a second program having a plurality of second instructions. The invention uses a given number of registers to process more pixels, thereby mitigating the latency of texture instructions.

Description

IX. Description of the Invention

[Technical Field]

The present invention relates to a register-collecting mechanism, a method for performing the same, and a pixel processing system employing the mechanism and the method, and more particularly to a register-collecting mechanism for a graphics processing unit (GPU), a method for performing the same, and a pixel processing system employing the mechanism and the method.

[Prior Art]

Fig. 1 is a block diagram of the pipeline architecture of a conventional graphics processing unit 2. The graphics processing unit 2 mainly comprises a triangle setup unit 23, a pixel processing unit 24 and a depth processing unit 25. The pixel processing unit 24 includes a pixel shader 20, together with a texture unit 241 and a color interpolator 242 that are connected to the pixel shader 20. The surface of a three-dimensional object is divided into a plurality of two-dimensional triangles, which are arranged according to their geometric relationship and may have arbitrary sizes. Each triangle has three vertices, whose data are sent to the triangle setup unit 23. The triangle setup unit 23 outputs pixel parameters to the pixel processing unit 24; such parameters may be, for example, the position of a pixel within the triangle and the texture coordinates of the triangle's vertices. Based on the pixel positions and the vertex texture coordinates, the pixel processing unit 24 uses the texture unit 241 to interpolate the texture coordinates of all pixels and feeds the interpolated texture coordinates to the pixel shader 20. The pixel shader 20 then executes a load instruction, for example the texld instruction defined by the DirectX specification, and returns the processed texture coordinates to the texture unit 241. Using the unprocessed and the processed texture coordinates, the texture unit 241 samples the texture color of each pixel from the texture map and outputs it to the pixel shader 20. Meanwhile, based on the pixel positions and the vertex texture coordinates, the color interpolator 242 interpolates the vertex colors for all pixels and outputs them to the pixel shader 20. The pixel shader 20 processes the texture colors and the per-pixel vertex colors and outputs color values and depth values to the depth processing unit 25 to form the displayed pixel colors. Drawing the final colors yields the complete frame.

Fig. 2 is a block diagram showing how the pixel shader in the graphics processing unit of Fig. 1 processes a program. The pixel shader 20 mainly includes four types of registers, namely general registers (rn) for temporary data, texture coordinate registers (tn), texture number registers (sn) and vertex color registers (vn), together with output registers (oCn), and it sends the final converted pixel color to the depth processing unit 25.

The processing flow of the pixel shader 20 mainly comprises four stages: a coordinate calculation stage, a texture load stage, a color blending stage and an output stage. First, the interpolated texture coordinates of a pixel from the texture unit 241 are stored in the texture coordinate registers (tn). In the coordinate calculation stage, arithmetic is performed on these interpolated texture coordinates using the texture coordinate registers (tn) and the general registers (rn), and the results (the processed texture coordinates) are stored in the general registers (rn). In the texture load stage, the pixel shader 20 executes a texture load instruction based on the texture coordinates held in the texture coordinate registers (tn) and the general registers (rn), so that the texture unit 241 samples a texture color from the texture map specified by a texture number register (sn) and returns the sampled color to a general register (rn). In the color blending stage, the pixel shader 20 blends the texture colors stored in the general registers (rn) with the vertex colors from the color interpolator 242 and stores the blended result in the vertex color registers (vn). In the output stage, the pixel shader 20 outputs the color values and depth values to the depth processing unit 25. Note that the coordinate calculation, texture load and color blending stages may be repeated and combined.

Instructions have data and control dependencies: when a later instruction uses the result of an earlier instruction, the later instruction can execute only after the earlier one has finished. In pixel shading, the execution latency of a texture load instruction is very long, far exceeding the time needed for address transfer, memory access and color interpolation, and this latency has a large impact on system performance. To cope with this long latency, a pipelined architecture is used to process N pixels: in each following cycle the next pixel can be issued, and by the time N pixels have been issued the first pixel to execute the instruction has completed, so execution can continue with the next instruction.

However, executing N pixels requires the registers specified by N sets of pixel shader program instructions to reside in the pixel shader at the same time. The pixel shader 20 of a conventional graphics processing unit must therefore provide additional registers to hold the pixel data, so hiding the latency of texture load instructions comes at a considerable register cost.

To address this problem, U.S. Patent No. 5,652,774 discloses a central processing unit that uses renaming registers to reduce the cycles needed to execute instructions. The central processing unit includes a renaming register set that retains the data obtained by previously executed instructions. The renaming register set is used to execute later instructions that need the previously loaded data, so the association between a renaming register and the subsequent instructions must be maintained. This occupies many operation cycles, the renaming register cannot be released promptly, and the next instruction or further instructions cannot be executed. The conventional central processing unit therefore occupies many registers. Moreover, in pixel processing the sampling step is a complex interpolation and the texture map is stored in memory; even when accelerated by a cache it takes a considerable number of cycles, and on a cache miss reading memory takes a very long time.

U.S. Patent No. 6,314,511 also addresses this problem. It discloses a system for releasing renaming registers and a method for operating it: before an instruction redefines an architectural register, the processor allocates several renaming registers to the architectural register so that the processor can execute several instructions out of order. Although the processor uses an indicator to release the renaming registers, it relies on a complex out-of-order mechanism. In other words, instructions are first fetched and decoded, and the renaming mechanism then changes registers dynamically while the program runs and reorders a buffer memory through the out-of-order mechanism, which makes renaming more complicated.

A new pixel processing system is therefore needed to solve the above problems.
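The register cost described in the background above can be made concrete with a small calculation. The sketch below is only an illustration under stated assumptions: one pixel is issued per cycle, every pixel in flight needs its own copy of the registers its program occupies, and the register file size (64) and texture latency (12 cycles) are hypothetical values that do not come from the patent. The function names are likewise invented for this example.

```python
def max_pixels_in_flight(register_file_size, registers_per_program):
    """Pixels that fit when each one holds its own copy of the program's registers."""
    return register_file_size // registers_per_program


def stall_cycles(texture_latency, pixels_in_flight):
    """Idle cycles before the first pixel's texture result is ready.

    Assumed model: one pixel is issued per cycle, so after N pixels have been
    issued, N cycles of the latency are already covered by useful work.
    """
    return max(0, texture_latency - pixels_in_flight)


# Hypothetical hardware parameters, chosen only for illustration.
REGISTER_FILE_SIZE = 64
TEXTURE_LATENCY = 12

for registers_per_program in (16, 4):
    n = max_pixels_in_flight(REGISTER_FILE_SIZE, registers_per_program)
    print(registers_per_program, n, stall_cycles(TEXTURE_LATENCY, n))
```

Under these assumed numbers, a program that occupies 16 registers allows 4 pixels in flight and leaves 8 idle cycles, while a program that occupies 4 registers allows 16 pixels and leaves none. Reducing the registers each program occupies is therefore one way to cover the latency without enlarging the register file.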
[Summary of the Invention]

One object of the present invention is to provide a pixel processing system in which a register-collecting mechanism allows a given number of registers to process more pixels, thereby mitigating the latency of texture instructions.

Another object of the present invention is to provide a method for collecting the registers allocated to a pixel shader program, so that more pixels can be processed and the latency problem of pixel texture instructions is solved.

The pixel processing system of the present invention mainly includes a pixel shader and a register-collecting mechanism connected to the pixel shader. The register-collecting mechanism modifies a first program into a second program. The first program requires a given number of registers, whereas the second program needs only a portion of them. The pixel shader fetches and executes the second program, which is formed by reallocating the registers of the first program.

The method of the present invention collects the registers allocated to a first program in a pixel processing system. The first program contains a plurality of first instructions and uses a plurality of registers. Some of these registers are in use; the others are idle but are still occupied by the first program. The method includes scanning the first instructions of the first program, decoding the first instructions to obtain a plurality of registers, and modifying the first program into a second program having a plurality of second instructions. The second program uses only a portion of the given number of registers, and the unused registers are reallocated to other programs.

The method also includes providing numbers corresponding to physical registers, so as to indicate how many physical registers the second program uses, and modifying the nominal registers of the first program so that they correspond to the physical register numbers. In a preferred embodiment the physical register numbers are consecutive positive integers, in ascending or descending order. The method also includes reporting to the pixel shader the total number of pixels processed by the pixel processing system. In addition, the pixels processed by the pixel processing system may comprise several different pixel groups, each containing the same or a different number of pixels, and the indication value may also indicate which pixel group the pixel processing system is currently processing.

[Embodiments]

Fig. 3 is a block diagram of a register-collecting mechanism according to an embodiment of the present invention. The register-collecting mechanism is placed before the fetch device that fetches the first program, or before the decode device that decodes the first program. It modifies the first program into a second program that uses fewer registers than the first program. The first program contains a plurality of instructions and has a plurality of nominal registers; here a nominal register is a register used in the first program before the program is processed by the fetch device or the decode device. The first program uses only some of its nominal registers; the others are idle but remain occupied by the first program. For example, in Fig. 4 the first program requires 16 registers, yet its instructions use only 4 of them (including r0, r1 and r3), and the remaining registers are idle. The register-collecting mechanism 10 collects the idle nominal registers of such first programs so that they can be put to use and reallocated to more programs.

The register-collecting mechanism 10 mainly includes an instruction scanning device 11, a register mapping table 12, an instruction modifying device 13 and an indication reporter 14. The instruction scanning device 11 scans a first program having a plurality of first instructions and decodes the first instructions to produce a plurality of first register numbers, which are nominal register numbers. The register mapping table 12 is connected to the instruction scanning device 11 and holds a plurality of second register numbers corresponding to the first register numbers; the second register numbers are physical register numbers. The instruction modifying device 13 is connected to both the instruction scanning device 11 and the register mapping table 12. It rewrites each first register number to the corresponding second register number, forming a second program having a plurality of second instructions. The second instructions are composed of the second register numbers in the register mapping table 12, and the second register numbers reflect the number of registers the second program actually uses. The indication reporter 14 issues a count indication value giving the number of second registers allocated to the second program.

Figs. 4A and 4B show a flow chart of the register-collecting mechanism according to a first embodiment of the present invention. In step 300 the first program is loaded into the register-collecting mechanism. In step 301 the mapping information in the register mapping table is cleared, resetting the correspondence between nominal and physical register numbers previously stored in the mapping table 12. In step 302 the instruction scanning device 11 statically scans the first instructions of the first program, for example from the first instruction to the last; that is, the instruction scanning device 11 scans each first instruction in order of its position in the first program. In step 303 the scanned first instruction is decoded to extract its nominal register numbers. In one embodiment, after the decoding of step 303, step 304 determines whether each nominal register number already corresponds to a physical register number in the register mapping table 12. When a nominal register number has no corresponding physical register number in the table, a physical register number corresponding to that nominal register number is added, as shown in step 305, and the correspondence between the nominal and the physical register number is stored in the register mapping table 12. In a preferred embodiment the physical register numbers are, for example, consecutive positive integers numbered up to n, where n is the last physical register number, and they may be in ascending or descending order. In step 306 the count indication value of the physical register numbers is incremented in response to the new correspondence. When the determination in step 304 finds that the nominal register number does correspond to a physical register number in the register mapping table 12, the nominal register number is rewritten to that physical register number, forming a second instruction that contains physical register numbers and thereby producing the second program, as shown in step 307. Specifically, the nominal register number is assigned to an already existing consecutive physical register number. In one embodiment the second program consists of physical register numbers that replace the non-consecutive nominal register numbers, and the second program may be stored in the register mapping table 12 or in a peripheral memory.

Next, step 308 determines whether the current nominal register number is the last one in the first instruction. If not, step 303 is performed again to extract the next nominal register number. Once every nominal register number of the first instruction has been processed and the last one has been found, step 309 determines whether this is the last first instruction. If not, step 302 is performed again to continue statically scanning the next first instruction of the first program.
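The scan-and-map loop of Figs. 4A and 4B (steps 300 through the final report in step 310) can be sketched in a few lines. This is a minimal illustration and not the patented implementation: it assumes instructions are plain text with registers written as `r<number>`, it numbers physical registers from 0 in order of first use, and the names `collect_registers` and `REGISTER` are invented for this sketch.

```python
import re

# Assumed textual register syntax (r0, r1, ...); not specified by the patent.
REGISTER = re.compile(r"\br(\d+)\b")


def collect_registers(first_program):
    """Rewrite nominal registers to consecutive physical ones in first-use order."""
    mapping = {}  # nominal number -> physical number; empty at start (step 301)

    def rename(match):
        nominal = int(match.group(1))        # decode one register number (step 303)
        if nominal not in mapping:           # no entry in the table yet (step 304)
            mapping[nominal] = len(mapping)  # add next consecutive number (steps 305, 306)
        return f"r{mapping[nominal]}"        # rewrite to the physical number (step 307)

    # Scan the instructions in program order (step 302); re.sub visits the
    # registers of each instruction from left to right (steps 303 and 308).
    second_program = [REGISTER.sub(rename, ins) for ins in first_program]

    # The size of the table is the count indication value (step 310).
    return second_program, mapping, len(mapping)
```

As a usage example, the first instruction "add r0, r1, r15" comes from the description; a second instruction such as "mul r3, r0, r1" is hypothetical and chosen only so that four registers are used. With that two-instruction program the sketch returns "add r0, r1, r2" and "mul r3, r0, r1" with a count of 4.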

After every first instruction has been fetched and the last first instruction of the first program has been found, in step 310 the indication reporter 14 issues the count indication value of the physical register numbers. The last consecutive physical register number represents the count of physical registers allocated to the second program. Because this last physical register number is smaller than the last nominal register number of the first program, registers are saved.

Fig. 5 is a block diagram of the register-collecting mechanism 10 according to the present invention, illustrating how the flow of Figs. 4A and 4B is applied to a first program. In this embodiment the instructions of the first program require 16 registers, numbered r0 to r15. The nominal registers of the first instructions of the first program are listed in the left column of the register mapping table 12, and the physical registers are listed in the right column.

For example, in step 300 the first program is loaded into the register-collecting mechanism 10. In step 301 the contents of the register mapping table 12 and the count indication value are cleared. In step 302 the first instruction of the first program, "add r0, r1, r15", is scanned, and in step 303 this instruction is decoded to obtain the nominal register number r0. Because the register mapping table 12 and the count indication value were cleared in step 301, the nominal register number has no corresponding physical register number in the table, so step 305 is performed: the physical register number r0 is mapped to the nominal register number r0, and this correspondence is stored in the register mapping table 12. In step 306 the count indication value is incremented by 1 in preparation for the next mapping. In step 307 the nominal register number r0 is rewritten to the physical register number r0. Because r0 is not the last nominal register number, step 303 is performed again to decode the next nominal register number of the first instruction, r1, which is rewritten to the physical register number r1, and the count indication value is increased to 2.

當處理唯名暫存器編號^時,產生實質暫存器編號~,以 對應至第二個唯名暫存器編號〜,並且在步驟術中,將唯名 暫存器編號r15修改為實質暫存器編號η。接著在步驟細中, 由於r15疋第一指令”add % η,〜,,的最後一個唯名暫存器編 號,所以接著繼續處理下一個第一指令“r〇”,然後在步 驟309決疋是否已經處理到最後一個第一指令,隨後執行步驟 310。=在步驟310中,數量報告器14通報-數量指示值為4的 值最後形成具有4個實質暫存器之第二程式,在此實施例中, I:程式必須佔用16個暫存器。換言之,暫存器收集機制, ^二程式修正成第二程式,第二㈣只需要4個實質暫存 。、餘12個暫存器可重新配置給不同的程式使用。 本U除了依序對第—指令的所有唯名暫存器編號進 一 I之外亦可同時對所有唯名暫存器編號進行解碼,例如 :指令的唯名暫存器編號rQ、ri以及〜同時在步驟期 馬。亦即經過步驟則判斷得知唯名暫存器編號〇並不是 個編唬時,執行步驟304。 曼 參考第6八及犯圖,緣示依據本發明之第一實施例中 存益收集機制的流程圖。在步驟4〇〇中,將包含第一指令的 16 1289808 第-程式載人至暫存器收集機制中。之後在步驟4G1,清除暫 存器對應表12中的對應資訊,以重置原始存放在對應表12中 唯名暫存n編號與實質暫存器編號之間對應資料的狀態。接著 在步驟402中,利用指令_裝置n以靜態方式掃瞒第一程式 中所有的第-指令。然後在步驟4G3,對已掃心成的第一指 令進行解碼步驟,以形成複數個唯名暫存器編號,並且取得配 置給第-程式的唯名暫存器之總數量。接著在步驟楊中,建 立-暫存器對應表12,暫存n對應表12主要包含複數個對應 於上述之唯名暫存器的總數量。然後在步驟4〇5,決定每一個 唯名暫存器編號是否對應於儲存在暫存器對應表12中的實質 暫存器之編號。在步驟405進行判斷時,當唯名暫存器編號對 應於暫存器對應表12中的實質暫存器編號,執行步驟4〇6,以 記錄唯名暫存器編號與實質暫存器編號之間的對應關係。然後 在步驟407中,增加實質暫存器編號的數量指示值。接著在步 驟408,判斷每個唯名暫存器編號是否為一第一指令中的最後 一個唯名暫存器編號,假如不是的話,執行步驟4〇5。 在步驟405,當唯名暫存器編號與暫存器對應表12中的實 質暫存器編號不同而沒有對應時,將沒有對應的唯名暫存器編 號儲存在暫存器收集機制的記憶體中。假如在步驟4〇8判斷出 最後一個唯名暫存器編號,則執行步驟41〇。在步驟41〇中, 除了唯名暫存器編號與實質暫存器編號相同的對應編號之 外’將儲存在暫存器收集機制的記憶體中之唯名暫存器編號以 隨機(Random)或是循序(Sequentially)方式對應至實質暫存器編 號。然後執行步驟411,增加實質暫存器編號的數量指示值。 接著將唯名暫存器編號修改成實質暫存器編號,如步驟412所 17 1289808 示。然後在步驟41”’發出實質暫存器的數量指示值。最後 在步驟414中,產生可以執行的第二程式。 參考第7圖’繪示依據本發明之暫存器收集機制的方塊 圖’係利用第6A及6B圖所示之第一實施例來執行第一程式暫 存器收集機制的流程圖。此實施例中,第一程式的指令佔用” 個唯名暫存器,第—程式的第—指令中之唯名暫存器係 暫存器對絲12的左側行列中,而實質暫存器係列舉 在暫存器對應表12的右側行列中。When the unique register number ^ is processed, the physical register number ~ is generated to correspond to the second unique register number ~, and in the step, the unique register number r15 is modified to be a substantial temporary Register number η. Then in the step, since r15疋 the first instruction "add % η, ~," the last unique register number, then continue to process the next first instruction "r〇", and then in step 309 Whether the last first instruction has been processed, and then step 310 is performed. 
In step 310, the indication reporter 14 issues a quantity indication value of 4, and a second program that uses four physical registers is finally formed. In this embodiment the first program must occupy 16 registers. In other words, the register-collecting mechanism modifies the first program into the second program, and the second program needs only 4 physical registers; the remaining 12 registers can be reallocated for use by different programs. Besides decoding the nominal register numbers of the first instructions one at a time, the mechanism can also decode all of the nominal register numbers at once, for example the nominal register numbers of one instruction (such as r0 and r1) within the same step. That is, after it is determined that a nominal register number is not the last one, step 304 is performed.

Figures 6A and 6B show a flowchart of the register-collecting mechanism according to the first embodiment of the invention. In step 400, the first program containing the first instructions is loaded into the register-collecting mechanism. In step 401, the mapping data in the register mapping table 12 is cleared, which resets the state of the mapping data originally stored in table 12 between nominal register numbers and physical register numbers. In step 402, the instruction scanning device 11 statically scans all first instructions of the first program. In step 403, the scanned first instructions are decoded to form a plurality of nominal register numbers and to obtain the total number of registers allocated to the first program. In step 404, the register mapping table 12 is set up; it mainly contains physical register numbers equal in count to the total number of nominal registers.

In step 405, it is determined whether each nominal register number corresponds to a physical register number stored in the register mapping table 12. If it does, step 406 records the correspondence between the nominal register number and the physical register number, and step 407 increments the quantity indication value of the physical register numbers. Step 408 then determines whether the nominal register number is the last one in the first instructions; if it is not, the flow returns to step 405. If step 405 finds no corresponding physical register number in table 12, step 409 stores the unmatched nominal register number in the memory of the register-collecting mechanism. If step 408 finds the last nominal register number, step 410 is performed: apart from the nominal register numbers that are identical to physical register numbers, the nominal register numbers stored in the memory are mapped to physical register numbers in a random or sequential manner. Step 411 then increases the quantity indication value of the physical register numbers. Step 412 changes the nominal register numbers into physical register numbers. Step 413 issues the quantity indication value of the physical registers. Finally, step 414 generates the executable second program.

Figure 7 is a block diagram of the register-collecting mechanism according to the invention, in which the first program is processed using the flowchart of the first embodiment shown in Figures 6A and 6B. In this embodiment, the first instructions of the first program occupy nominal registers; the nominal register numbers of the first instructions appear in the left column of the register mapping table 12, and the physical register numbers appear in the right column.
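The flow of steps 400 to 414 described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the patented implementation: the function name `collect_registers` is hypothetical, register numbers are plain integers, physical numbers are assumed to run from 1 to the number of distinct nominal registers, and the sequential variant of step 410 is used.

```python
def collect_registers(nominal_numbers):
    """Map nominal register numbers onto consecutive physical numbers 1..n.

    Hypothetical helper: returns (mapping table, quantity indication value).
    """
    # Steps 401-403: start from an empty table and list each distinct
    # nominal number once, in order of first appearance.
    unique = list(dict.fromkeys(nominal_numbers))
    total = len(unique)

    # Step 404: the table offers the physical numbers 1..total.
    free = set(range(1, total + 1))
    table = {}     # nominal number -> physical number
    pending = []   # step 409: nominal numbers with no identical physical number
    count = 0      # quantity indication value

    # Steps 405-408: a nominal number keeps its number if the table offers it.
    for n in unique:
        if n in free:
            table[n] = n
            free.remove(n)
            count += 1
        else:
            pending.append(n)

    # Steps 410-411: give the unused physical numbers to the pending
    # nominal numbers, here in ascending (sequential) order.
    for n, p in zip(pending, sorted(free)):
        table[n] = p
        count += 1

    # Step 413: report the table together with the quantity indication value.
    return table, count


# Made-up input: nominal registers 3, 9 and 1, with 9 used twice.
table, count = collect_registers([3, 9, 1, 9])
```

With this made-up input, nominal registers 3 and 1 keep their numbers, 9 is moved to the unused physical number 2, and the reported count is 3.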

In other words, the register-collecting mechanism 10 modifies the first program into the second program; the second program needs only 6 physical registers, and the remaining 29 registers can be reallocated to different programs.

In step 400, the first program containing the first instructions is loaded into the register-collecting mechanism. In step 401, the mapping data in the register mapping table 12 is cleared, which resets the state of the mapping data originally stored in table 12 between nominal register numbers and physical register numbers.

In step 402, the instruction scanning device 11 statically scans the first instructions of the first program. In step 403, the scanned first instructions are decoded, yielding the nominal registers allocated to the first program, six in total. The register mapping table 12 likewise contains six consecutive physical register numbers, for example number 1 to number 6. In step 405, when the nominal register number r1 corresponds to the physical register number r1 in table 12, step 406 records the correspondence between nominal register number r1 and physical register number r1, and step 407 increments the quantity indication value of the physical register numbers by 1. In step 408, when the nominal register number is not the last nominal register number in the first instructions, step 405 is performed again; the nominal register numbers r2 and r5 correspond to the physical register numbers r2 and r5 in table 12. In step 408, nominal register number r5 is the last nominal register number that corresponds to a physical register number.

In step 405, when the nominal register numbers r8, r10 and r35 do not correspond to any physical register number in table 12, step 409 temporarily stores r8, r10 and r35 in the memory of the register-collecting mechanism. In step 408, when r5 is the last nominal register number that corresponds to a physical register number, step 410 is performed. In step 410, apart from the physical register numbers r1, r2 and r5, the nominal register numbers r8, r10 and r35 are mapped, in a random or sequential manner, to the remaining physical register numbers, which under the consecutive numbering 1 to 6 are r3, r4 and r6. Step 411 then raises the quantity indication value of the physical register numbers to 6. Step 412 changes the nominal register numbers into physical register numbers, producing the second program with physical register numbers. Step 413 issues the quantity indication value 6 of the physical registers. Finally, step 414 generates the executable second program.

Those skilled in the art will appreciate that the register-collecting mechanism 10 can be implemented as a hardware circuit or in software. When implemented in software, the mechanism 10 can be a software utility running in the computer operating system, a program loader, or a driver of a computer peripheral device attached to the program compiler. In a preferred embodiment implemented in hardware, the mechanism 10 is connected to the program fetch unit or the decoding unit, that is, it sits before the instruction sorting unit 201 and the decoder 203 of the pixel shader 20; alternatively, the mechanism 10 is integrated into the pixel shader 20. The mechanism 10 increases the availability of physical registers for pixels, mainly by statically scanning the first program to regenerate a simplified second program. The pixel shader 20 of the invention is as defined in the DirectX specification; note that the invention also applies to the fragment processor defined in the OpenGL specification and to similar pixel shading systems.
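To make the rewriting of step 412 in the six-register example more concrete, the sketch below applies a finished mapping table to a small program text. Several things here are assumptions for illustration only: the three instructions and their mnemonics are invented, the table assumes the sequential variant of step 410 with physical numbers 1 to 6 (so r8, r10 and r35 take the unused numbers 3, 4 and 6), and `rewrite_program` is a hypothetical helper name.

```python
import re

# Mapping table for the worked example, assuming sequential assignment:
# r1, r2, r5 keep their numbers; r8, r10, r35 take the unused 3, 4, 6.
MAPPING = {1: 1, 2: 2, 5: 5, 8: 3, 10: 4, 35: 6}

# A register token is the letter r followed by digits, as a whole word.
REGISTER = re.compile(r"\br(\d+)\b")


def rewrite_program(first_program, mapping):
    """Step 412: replace each nominal register number by its physical one."""
    def swap(match):
        return f"r{mapping[int(match.group(1))]}"

    # Every token is looked up independently in a single pass, so a
    # renamed register is never renamed a second time.
    return [REGISTER.sub(swap, line) for line in first_program]


# Invented first program; the mnemonics are illustrative only.
first_program = [
    "mul r8, r1, r2",
    "add r10, r8, r5",
    "mov r35, r10",
]
second_program = rewrite_program(first_program, MAPPING)
```

In this sketch the second program refers only to r1 through r6, which matches the quantity indication value of 6 issued in step 413.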

Figure 8 shows a pixel processing system using the register-collecting mechanism according to one embodiment of the invention. The register-collecting mechanism 10 modifies the first program into the second program: by collecting the nominal registers allocated to the first program, it reduces the registers the first program occupies, so that the pixel processing system handles more pixels with a given number of registers. The pixel processing system is mainly used in a graphics processing unit (GPU) and includes a pixel shader 20 and the register-collecting mechanism 10 coupled to the pixel shader 20.

The pixel shader 20 mainly includes an instruction sorting unit 201, a program counter 202, a decoder 203, a plurality of registers 204 and an arithmetic logic unit (ALU) 205. The instruction sorting unit 201 receives the second program from the register-collecting mechanism 10, the program counter 202 fetches the instructions of the second program from the instruction sorting unit 201, and the decoder 203 decodes the fetched instructions. The ALU 205 controls the execution steps of the decoded instructions. The indication reporter 14 of the register-collecting mechanism 10 issues to the pixel shader 20 the quantity indication value of the registers allocated to the first program, so that the pixel processing system 100, according to that value, reallocates the idle, unused nominal registers of the first program to other programs. In other words, the pixel processing system 100 executes the second program with the minimum number of physical registers so as to process more pixels at the same time.

Before the first program is input to the register-collecting mechanism 10, the number of nominal registers allocated to the first program is defined as r, and the number of physical registers allocated to the second program is defined as r'. The ratio i between r and r' is defined as r/r' and expresses the register usage of the second program, where i is an integer. Taking pixel processing as an example, when the first program is processed the pixel shader 20 can handle only N pixels; after processing by the register-collecting mechanism 10 of the invention, this can increase to iN pixels.

Figure 9 shows a pixel processing system using the register-collecting mechanism according to another embodiment of the invention.
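As a rough check of the ratio i = r/r' defined above, the following sketch computes how many pixels could be processed after register collection. The function name and the pixel count N = 8 are hypothetical, and rounding the ratio down when r is not a multiple of r' is an assumption made here, since the text only states that i is an integer.

```python
def pixels_after_collection(n_pixels, r_nominal, r_physical):
    """Return i * N, where i = r / r' rounded down to an integer."""
    i = r_nominal // r_physical  # assumption: use the integer part of r / r'
    return i * n_pixels


# 16 nominal registers reduced to 4 physical registers, N = 8 (made-up N).
quadrupled = pixels_after_collection(8, 16, 4)

# 35 nominal registers reduced to 6 physical registers, N = 8 (made-up N).
rounded_down = pixels_after_collection(8, 35, 6)
```

Under these assumptions the 16-to-4 case gives i = 4, and the 35-to-6 case gives i = 5 because the fractional part of 35/6 is discarded.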
The pixel shader 20 executes the first program with a given number of registers to process N pixels. The register-collecting mechanism 10 modifies the first program into the second program; because the second program needs only half of the original registers, the other half can be allocated to another N pixels, so in this embodiment the number of pixels the pixel processing system processes rises to 2N.

Note that the indication value issued by the indication reporter to the pixel shader 20 can be the number of physical registers the second program needs or the total number of pixels processed in the pixel processing system 100. When the pixels processed by the pixel processing system 100 comprise several different pixel groups, each containing the same or a different number of pixels, the indication value can also indicate which pixel group the pixel processing system 100 is currently processing.

Although the invention has been disclosed above by way of preferred embodiments, they are not intended to limit it. Anyone skilled in the art may make various changes and refinements without departing from the spirit and scope of the invention, so the scope of protection is as defined by the appended claims.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a block diagram of the pipeline architecture of a conventional graphics processing unit.
Figure 2 is a block diagram of program processing by the pixel shader in the graphics processing unit of Figure 1.
Figure 3 is a block diagram of the register-collecting mechanism according to one embodiment of the invention.
Figures 4A and 4B show a flowchart of executing the register-collecting mechanism according to one embodiment of the invention.
Figure 5 is a block diagram of the register-collecting mechanism according to the invention, in which the first program is processed using the flowchart of the first embodiment.

Figures 6A and 6B show a flowchart of the register-collecting mechanism according to the first embodiment of the invention.
Figure 7 is a block diagram of the register-collecting mechanism according to the invention, in which the first program is processed using the flowchart of the first embodiment shown in Figures 6A and 6B.
Figure 8 shows a pixel processing system using the register-collecting mechanism according to one embodiment of the invention.

Figure 9 shows a pixel processing system using the register-collecting mechanism according to another embodiment of the invention.

DESCRIPTION OF MAIN REFERENCE NUMERALS

2 graphics processing unit
10 register-collecting mechanism
11 instruction scanning device
12 register mapping table
13 instruction correcting device
14 indication reporter
20 pixel shader
23 triangle setup unit
24 pixel processing unit
25 depth processing unit
100 pixel processing system
201 instruction sorting unit
202 program counter
203 decoder
204 registers
205 arithmetic logic unit
241 texture unit
242 color interpolator


Claims (1)

1. A pixel processing system, comprising at least:
a register-collecting mechanism for modifying a first program into a second program, wherein the first program is provided with a plurality of first registers and the second program is provided with a portion of the first registers of the first program; and
a pixel shader coupled to the register-collecting mechanism, which fetches and executes the second program so that another portion of the first registers is reallocated for use by different programs.

2. The pixel processing system of claim 1, wherein the pixel shader comprises at least an instruction storage memory for storing the second program.

3. The pixel processing system of claim 2, wherein the register-collecting mechanism is disposed at the input of the instruction storage memory to collect the registers allocated to the first program.

4. The pixel processing system of claim 1, wherein the first program comprises at least a plurality of first instructions, and the first instructions have a plurality of nominal registers.

5. The pixel processing system of claim 4, wherein the register-collecting mechanism comprises at least an instruction scanning device for scanning the first instructions of the first program.

6. The pixel processing system of claim 5, wherein the instruction scanning device sequentially scans the static positions of the first instructions in the first program.

7. The pixel processing system of claim 5, wherein the instruction scanning device simultaneously scans the static positions of the first instructions in the first program.

8. The pixel processing system of claim 1, wherein the register-collecting mechanism comprises at least:
a register mapping table in which first register numbers are mapped to second register numbers; and
an instruction correcting device, coupled respectively to the instruction scanning device and the register mapping table, for revising the first register numbers that correspond to the second register numbers so as to form the second program having a plurality of second instructions, wherein the second instructions are composed of the second register numbers in the register mapping table.

9. The pixel processing system of claim 8, wherein the first register numbers of the first instructions are composed of a plurality of nominal registers allocated to the first program.

10. The pixel processing system of claim 9, wherein the second register numbers are a plurality of physical register numbers allocated to the second program.

11. The pixel processing system of claim 10, wherein the second register numbers in the register mapping table are consecutive numbers.

12. The pixel processing system of claim 11, further comprising an indication reporter for issuing a quantity indication value of the second registers allocated to the second program.

13. The pixel processing system of claim 12, wherein the last number of the consecutively numbered second registers is equal to the quantity indication value of the second registers, and the last number is equal to the total number of first registers of the first instructions.

14. The pixel processing system of claim 12, wherein the quantity indication value is the total number of second registers allocated to the second program.

15. The pixel processing system of claim 12, wherein the indication reporter reports the total number of pixels processed by the pixel processing system.

16. The pixel processing system of claim 8, wherein the register-collecting mechanism is selected from a software utility being executed in a computer operating system, a program loader, and a driver of a computer peripheral device attached to a program compiler.

17. A method of executing a register-collecting mechanism in a pixel processing system, comprising at least the following steps:
loading a first program provided with a plurality of first instructions into the register-collecting mechanism;
scanning the first instructions in the first program;
decoding the first instructions to form first register numbers;
mapping the first register numbers to second register numbers in a register mapping table; and
revising the first program to reallocate a portion of the first register numbers of the first program so as to form a second program having a plurality of second instructions, wherein another portion of the first register numbers is allocated for use by different programs in the pixel shader, and the second register numbers of the second program are located in the register mapping table.

18. The method of claim 17, further comprising, before the step of scanning the first instructions in the first program, clearing the contents of the register mapping table.

19. The method of claim 17, further comprising determining whether the first register number corresponds to a register number already stored in the register mapping table.

20. The method of claim 19, wherein the step of scanning the first instructions in the first program further comprises sequentially scanning the static positions of the first instructions in the first program.

21. The method of claim 17, further comprising, when the first register number differs from the second register numbers stored in the register mapping table and therefore cannot be mapped, reassigning another second register number to the second register number.

22. The method of claim 21, wherein, in the step of reassigning another second register number to the second register number, the correspondence between the first register number and the second register number is stored in the register mapping table.

23. The method of claim 22, wherein the physical register numbers of the second registers are consecutive positive integers arranged in ascending order.

24. The method of claim 23, wherein the last number of the consecutively numbered second registers is equal to the quantity indication value of the second registers of the second program.

25. The method of claim 24, further comprising issuing the quantity indication value of the second registers allocated to the first program to the pixel shader.

26. The method of claim 19, wherein the step of decoding the first instructions further comprises forming the first register numbers.

27. The method of claim 26, wherein the step of decoding the first instructions further comprises obtaining the total number of first registers allocated to the first program.

28. The method of claim 27, further comprising setting up the register mapping table, wherein the register mapping table contains second registers equal in count to the total number of first registers.

29. The method of claim 27, further comprising, when the first register number differs from the second register numbers stored in the register mapping table and therefore cannot be mapped, collecting the unmapped first register numbers in a memory of the register mapping table.

30. The method of claim 29, further comprising reassigning the unmapped first register numbers in the memory to second register numbers in the register mapping table other than the second register numbers that are already mapped.

31. The method of claim 17, further comprising reporting the total number of pixels processed by the pixel processing system.
TW94139823A 2005-11-11 2005-11-11 Register-collecting mechanism, method for performing the same and pixel processing system employing the same TWI289808B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW94139823A TWI289808B (en) 2005-11-11 2005-11-11 Register-collecting mechanism, method for performing the same and pixel processing system employing the same

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW94139823A TWI289808B (en) 2005-11-11 2005-11-11 Register-collecting mechanism, method for performing the same and pixel processing system employing the same

Publications (2)

Publication Number Publication Date
TW200719274A TW200719274A (en) 2007-05-16
TWI289808B true TWI289808B (en) 2007-11-11

Family

ID=39295756

Family Applications (1)

Application Number Title Priority Date Filing Date
TW94139823A TWI289808B (en) 2005-11-11 2005-11-11 Register-collecting mechanism, method for performing the same and pixel processing system employing the same

Country Status (1)

Country Link
TW (1) TWI289808B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090046105A1 (en) * 2007-08-15 2009-02-19 Bergland Tyson J Conditional execute bit in a graphics processor unit pipeline

Also Published As

Publication number Publication date
TW200719274A (en) 2007-05-16

Similar Documents

Publication Publication Date Title
US8704830B2 (en) System and method for path rendering with multiple stencil samples per color sample
TWI525584B (en) Programmable blending in multi-threaded processing units
US7969446B2 (en) Method for operating low power programmable processor
US6545686B1 (en) Cache memory and method for use in generating computer graphics texture
TWI272537B (en) Method and apparatus for compressing and decompressing instructions in a computer system
JP4914829B2 (en) Low power programmable processor
US6380935B1 (en) circuit and method for processing render commands in a tile-based graphics system
US6807620B1 (en) Game system with graphics processor
US8081184B1 (en) Pixel shader program thread assembly
TWI250785B (en) Image rendering device and image rendering method
TWI437507B (en) System and method for memory access of multi-thread execution units in a graphics processing apparatus
US9218793B2 (en) Intermediate value storage within a graphics processing apparatus
US20040189651A1 (en) Programmable graphics system and method using flexible, high-precision data formats
US20100265259A1 (en) Generating and resolving pixel values within a graphics processing pipeline
JPH06348854A (en) Graphic accelerator and geometric object drawing method
CN102648450A (en) Hardware for parallel command list generation
TW201435591A (en) Technique for accessing content-addressable memory
TW200929063A (en) Unified processor architecture for processing general and graphics workload
JP4154336B2 (en) Method and apparatus for drawing a frame of a raster image
TW201007610A (en) Hybrid multisample/supersample antialiasing
JP4637640B2 (en) Graphic drawing device
US20040169650A1 (en) Digital image compositing using a programmable graphics processor
TW201435581A (en) Triggering performance event capture via pipelined state bundles
US7508396B2 (en) Register-collecting mechanism, method for performing the same and pixel processing system employing the same
TWI289808B (en) Register-collecting mechanism, method for performing the same and pixel processing system employing the same

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees