TW569138B - A method for improving instruction selection efficiency in a DSP/RISC compiler - Google Patents
A method for improving instruction selection efficiency in a DSP/RISC compiler
- Publication number
- TW569138B
- Authority
- TW
- Taiwan
- Prior art keywords
- compiler
- instruction
- dsp
- risc
- digital signal
Landscapes
- Devices For Executing Special Programs (AREA)
Abstract
Description
Field of the Invention

The present invention relates to a method for improving the instruction selection efficiency of a digital signal processor/reduced instruction set computing (DSP/RISC) compiler, so that optimized execution performance and optimized space usage are obtained at the same time.

Description of the Related Art
FIG. 1 shows the structure of a typical compiler. In FIG. 1, the structure includes human-readable source code 11, a compiler 12, and target object code 13. The compiler 12 in turn includes a front end 200, an optimizer 202, a syntax processor 204, a pattern table generator 206, and a code generator 208. As shown in FIG. 1, the front end 200 receives the human-readable source code 11, for example source code written in a high-level language such as C, C++, VB, or PASCAL (which may be stored in a storage element such as internal memory or an external hard disk), and performs a symbol analysis. The optimizer 202 translates the source code 11 into an optimized intermediate representation (IR). The syntax processor 204 performs a syntax analysis, and its result is fed to the pattern table generator 206 to produce a set of pattern matching tables (PMTs). Semantic tree pattern matching is then performed on the basis of the intermediate representation (IR) and the pattern matching tables (PMTs), so that the code generator 208 outputs the object code 13.
Those skilled in the art will understand that the object code 13 may, as desired, include assembly code or binary code.

The intermediate representation (IR) contains a number of basic blocks. A basic block is a sequence of intermediate instructions with a single entry at the top and a single exit at the bottom. Each basic block can represent one or more independent data dependency graphs, and each graph contains one or more nodes.
Each node roughly represents one instruction of a target machine (not shown) that causes the target machine to perform the function associated with that instruction. In a data dependency graph, the operation of a node depends on the data and/or variables produced in a previous node (so called because it is executed before the later node). The operation of the previous node, however, does not depend on the data and/or variables produced in the next node (unless a loop exists that causes the next node to execute before the previous node).

Traditionally, machine-specific information (such as instruction characteristics, instruction latency, and the number and type of registers an instruction uses) is embedded in the compiler. The optimizer 202 in the compiler 12 is therefore machine-dependent. This machine-dependent optimizer 202 repeatedly performs instruction selection, register allocation, and instruction reordering and parallelization. An example is given below to illustrate how the prior art and the present invention differ in selecting instructions for a semantic tree.
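The dependency rule described above can be illustrated with a minimal sketch. The node labels (`t1` to `t4`, `r`) and the `depends_on` mapping are hypothetical and chosen only for illustration; the sketch uses Python's standard `graphlib` module to produce one order in which every node runs after the nodes whose results it consumes.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical nodes of one data dependency graph. Each key is a node and
# each value is the set of earlier nodes whose results it consumes.
depends_on = {
    "t1 = a - b": set(),
    "t2 = c - d": set(),
    "t3 = abs(t1)": {"t1 = a - b"},
    "t4 = abs(t2)": {"t2 = c - d"},
    "r = t3 + t4": {"t3 = abs(t1)", "t4 = abs(t2)"},
}


def execution_order(graph):
    """Return one order in which every node runs after the nodes it depends on."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(depends_on)
for step, node in enumerate(order, start=1):
    print(step, node)
```

The two subtractions do not depend on each other, so either may come first; only the relative order of a node and its predecessors is fixed.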
FIG. 2 is a diagram of an example basic block and its semantic tree as processed by the compiler of FIG. 1. As shown in FIG. 2, the example is a basic block with one independent data dependency graph, which contains the expression pR0 = abs(pR1 - pR2) + abs(pR3 - pR4) and its semantic tree, where pR0 to pR4 are registers. To complete this semantic tree, the code generator 208 performs tree pattern matching. Tree pattern matching is a bottom-up instruction selection operation performed before register allocation. As shown in FIG. 3, node registers pR5 and pR7 are formed first, each by selecting a match pattern provided by the pattern table generator 206. Node registers pR6 and pR8 are then formed in the same way as the node registers before them. Finally, the semantic tree is complete when node register pR0 is formed and output by the code generator 208.

A conventional compiler such as the compiler 12 of FIG. 1, however, cannot provide optimized space usage and optimized execution performance at the same time. In that situation, optimized space usage is usually the one sacrificed. For example, the optimizer 202 has two schemes for forming node pR6 or pR8. The first scheme, shown in FIG. 4a, uses a conditional instruction and a jump instruction that takes 6 clocks to execute. It yields a size of 2 instructions (space usage) and an average of 4 clocks (execution performance). The second scheme, shown in FIG. 4b, uses exclusive-OR, a signed shift by 32, and subtraction. It yields a size of 3 instructions and an average of 3 clocks. Consequently, when the former is used to optimize space usage, the tree requires 11 clocks and 7 instructions, as shown in FIG. 5a. When the latter is used to optimize execution performance, the tree requires 9 clocks and 9 instructions, as shown in FIG. 5b.

As these figures show, execution performance and space usage are at odds. As shown in FIG. 6, the relationship is negative and linear (a straight line with negative slope), with better quality toward the lower left (point a) and poorer quality toward the lower right (point b). For example, when a user needs 12K of space, the user must purchase a digital signal processor with 16K of capacity, because processor capacity grows as 2^n, where n is an integer. A quarter of that 16K capacity is then wasted. The problem becomes increasingly serious as the compiler is applied to the development of digital signal processor/reduced instruction set computing systems for widely used multimedia, especially image processing.
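The 12K versus 16K arithmetic above can be checked with a short sketch. The function names are hypothetical, and the only assumption is the one stated in the text: available capacities are powers of two.

```python
def next_power_of_two_capacity(required_k):
    """Smallest capacity of the form 2**n (in K) that holds required_k."""
    capacity = 1
    while capacity < required_k:
        capacity *= 2
    return capacity


def wasted_fraction(required_k):
    """Fraction of the purchased capacity that the program leaves unused."""
    capacity = next_power_of_two_capacity(required_k)
    return (capacity - required_k) / capacity


print(next_power_of_two_capacity(12))  # 16
print(wasted_fraction(12))             # 0.25
```

A program that fits exactly in a power-of-two size wastes nothing, which is why a selection method that can target a chosen code size is useful.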
Summary of the Invention

Accordingly, an object of the present invention is to provide a method for improving the instruction selection efficiency of a digital signal processor/reduced instruction set computing compiler, so that optimized execution performance and optimized space usage are obtained at the same time.

The invention provides a method for improving the instruction selection efficiency of a DSP/RISC compiler. The method determines an optimized code size within a limited space chosen by a user, and thereby produces optimized execution performance and space usage together. The method includes the following steps: determining a semantic tree of a basic block; finding all matching combinations for the semantic tree by referring to a set of patterns; determining the cycle number and instruction length of each combination; filtering out, according to a predetermined cycle number and instruction length, the combinations that exceed the predetermined values and the redundant ones among combinations that have the same cycle number and instruction length; and selecting, from the combinations that remain after filtering, the combination with the fewest cycles as the output, which serves as the desired object code.

Detailed Description of the Preferred Embodiment

The same functional elements are denoted by the same reference numerals throughout. FIG. 7 is a flowchart of a method according to the invention for improving instruction selection efficiency in a DSP/RISC compiler.
In FIG. 7, the method includes the following steps: determining a semantic tree of a basic block (S1); finding all matching combinations for the semantic tree by referring to a set of patterns (S2); determining the cycle number and instruction length of each combination (S3); filtering out, according to a predetermined cycle number and instruction length, the combinations that exceed the predetermined values and the redundant ones among combinations that have the same cycle number and instruction length (S4); and selecting, from the combinations that remain after filtering, the combination with the fewest cycles as the output, which serves as the desired object code (S5). By comparison, conventional instruction selection completes the semantic tree of a basic block without searching all possible combinations to determine the best use of space.

According to the invention, the instruction selection logic is illustrated with the same example as shown in FIG. 2 (S1). A set of patterns is then selected. As shown in FIG. 8, the pattern set 81 contains four patterns for the node registers: abs1(pR1), abs2(pR1), pR1+pR2, and pR1-pR2. The notation (abs1),4,2 denotes a first absolute value operation abs1 that requires 4 cycles and 2 instructions. Likewise, the notation (abs2),3,3 denotes a second absolute value operation abs2 that requires 3 cycles and 3 instructions. An addition or subtraction requires 1 cycle and 1 instruction. The conventional examples complete the semantic tree using only the first or only the second absolute value operation. According to the invention, however, the semantic tree can be configured in the four combinations 91 shown in FIG. 9, which require 11 cycles and 7 instructions; 9 cycles and 9 instructions; 10 cycles and 8 instructions; and 10 cycles and 8 instructions, respectively (S3). Because the last two combinations have the same cycle number and instruction length, one of them is discarded. Under a predetermined instruction length limit of 8 instructions, the second combination, which has 9 instructions, is removed (S4). Of the remaining combinations, the one that uses abs1 and abs2 has 10 cycles, fewer than the 11 cycles of the other, so the abs1 and abs2 combination is output as the desired object code (S5).
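The enumeration, filtering, and selection steps can be reproduced with a brute-force sketch for this one example. The helper names are hypothetical. The costs are assumptions read from the text: 1 cycle and 1 instruction for an addition or subtraction, 4 cycles and 2 instructions for abs1, and 3 cycles and 3 instructions for abs2, matching the second scheme described for FIG. 4b. The expression is taken to contain two subtractions, two absolute values, and one addition.

```python
from itertools import product

# (cycles, instructions) per pattern, as read from the description.
ABS_PATTERNS = {"abs1": (4, 2), "abs2": (3, 3)}
ADD_SUB = (1, 1)  # one addition or one subtraction


def enumerate_combinations():
    """List every cover of the tree as (left abs, right abs, cycles, length)."""
    fixed_cycles = 3 * ADD_SUB[0]   # two subtractions plus one addition
    fixed_length = 3 * ADD_SUB[1]
    combos = []
    for left, right in product(ABS_PATTERNS, repeat=2):
        cycles = fixed_cycles + ABS_PATTERNS[left][0] + ABS_PATTERNS[right][0]
        length = fixed_length + ABS_PATTERNS[left][1] + ABS_PATTERNS[right][1]
        combos.append((left, right, cycles, length))
    return combos


def select(combos, max_length):
    """Drop duplicates and oversized combinations, then pick the fewest cycles."""
    seen = set()
    kept = []
    for combo in combos:
        cost = combo[2:]
        if cost in seen or combo[3] > max_length:
            continue
        seen.add(cost)
        kept.append(combo)
    return min(kept, key=lambda c: c[2], default=None)


combos = enumerate_combinations()
print(sorted({c[2:] for c in combos}))  # [(9, 9), (10, 8), (11, 7)]
print(select(combos, max_length=8))     # ('abs1', 'abs2', 10, 8)
```

This sketch filters on instruction length only, as the worked example does; a cycle limit could be added to `select` in the same way.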
The logic used to carry out the above steps is as follows:

comp_C(v)
    Cv = Φ
    for all p ∈ P, if p can match v then
        l1 = v + r1(p); l2 = v + r2(p);
        for all C_{l1,i} ∈ C_{l1} and all C_{l2,j} ∈ C_{l2}
            if size(C_{l1,i}) + size(C_{l2,j}) + size(p) is within the limited memory space then
                Cv = insert(Cv, (p, size(C_{l1,i}) + size(C_{l2,j}) + size(p), cycle(C_{l1,i}) + cycle(C_{l2,j}) + cycle(p), l1, l2));
    return Cv

As shown above, the routine is named comp_C(v). Cv is the candidate set for each node v and is initially set to the empty set (Φ). P is a set of predefined patterns.
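The comp_C routine can be sketched in Python as a bottom-up recursion. This is an illustration under stated assumptions, not the patent's implementation: the `Node` class, the `PATTERNS` table, and the `best` helper are hypothetical; every pattern is assumed to cover exactly one operator node; leaf registers are assumed to cost nothing; and abs2 is taken as 3 cycles and 3 instructions, matching the second scheme described for FIG. 4b.

```python
from dataclasses import dataclass
from itertools import product

# Assumed pattern table: operator -> list of (pattern name, cycles, length).
PATTERNS = {
    "-": [("-", 1, 1)],
    "+": [("+", 1, 1)],
    "abs": [("abs1", 4, 2), ("abs2", 3, 3)],
}


@dataclass(frozen=True)
class Node:
    op: str                # operator, or a register name for a leaf
    children: tuple = ()   # operand nodes; empty for a leaf


def insert(candidates, new):
    """Add `new` unless a candidate with the same cycles and length exists."""
    if all(c[1:3] != new[1:3] for c in candidates):
        candidates.append(new)
    return candidates


def comp_C(v, limit):
    """Candidate set for node v: (pattern, cycles, length, operand choices)."""
    if not v.children:
        return [(v.op, 0, 0, ())]   # assumption: a leaf register is free
    child_sets = [comp_C(child, limit) for child in v.children]
    cv = []
    for name, cycles, size in PATTERNS[v.op]:
        for picks in product(*child_sets):
            total_size = size + sum(c[2] for c in picks)
            if total_size <= limit:  # keep only what fits the space limit
                total_cycles = cycles + sum(c[1] for c in picks)
                insert(cv, (name, total_cycles, total_size, picks))
    return cv


def best(v, limit):
    """Fewest-cycle candidate at the root, or None if nothing fits."""
    return min(comp_C(v, limit), key=lambda c: c[1], default=None)


# pR0 = abs(pR1 - pR2) + abs(pR3 - pR4)
tree = Node("+", (
    Node("abs", (Node("-", (Node("pR1"), Node("pR2"))),)),
    Node("abs", (Node("-", (Node("pR3"), Node("pR4"))),)),
))

print([c[1:3] for c in comp_C(tree, limit=9)])  # [(11, 7), (10, 8), (9, 9)]
print(best(tree, limit=8)[:3])                  # ('+', 10, 8)
```

Each node keeps every distinct (cycles, length) pair that fits the limit, so the root set contains the whole trade-off curve and the final choice reduces to taking the minimum cycle count.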
p is a selected pattern. C_{l1,i} is the i-th element of the candidate set of the leftmost node reached from the pattern root, and C_{l2,j} is the j-th element of the candidate set of the rightmost node reached from the pattern root. The available memory space is limited to a fixed size. Each element is written C_{v,i} = (pattern name (p), cycle number (cycle), instruction length (size), left operand node, right operand node), meaning that the i-th element of the set Cv completes the pattern on the semantic tree by combining the left node and the right node at a cost of n in instruction length and m in cycles. The set Cv can be obtained from more than one pattern. Therefore, when a vector at a node has a length that falls within the limited memory space (that is, the total instruction length size(C_{l1,i}) + size(C_{l2,j}) + size(p) does not exceed the limit), the vector is inserted into the candidate set Cv. The routine is executed recursively until the unique root r has been processed.

For example, as shown in FIG. 10, a semantic tree T with nodes u, v, x, y, and w has the following possible instruction selection sets: Cu = {(-,1,1,a,b)}, Cv = {(-,1,1,c,d)}, Cx = {(abs1,5,3,u,Φ), (abs2,4,4,u,Φ)}, Cy = {(abs1,5,3,v,Φ), (abs2,4,4,v,Φ)}, and Cw = {(+,11,7,x,y), (+,10,8,x,y), (+,9,9,x,y)}. With this optimized instruction selection, as shown in FIG. 11, all candidates in the root set Cw are compared under a region boundary (shown by the dashed line, rather than the straight-line boundary of the prior art). The path from the bottom sets Cu = {(-,1,1,a,b)} and Cv = {(-,1,1,c,d)}, through Cx = {(abs1,5,3,u,Φ)} and Cy = {(abs2,4,4,v,Φ)}, to the root Cw = {(+,10,8,x,y)} is output as the compiler's object code (with the same structure as shown in FIG. 1). For the same memory size, higher execution performance than the prior art is thus achieved.

Although the present invention has been disclosed above by way of a preferred embodiment, the embodiment is not intended to limit the invention. Anyone skilled in the art may make changes and refinements without departing from the spirit and scope of the invention, so the scope of protection of the invention is defined by the appended claims.

Brief Description of the Drawings

To make the above and other objects, features, and advantages of the invention more readily apparent, a preferred embodiment is described in detail below with reference to the accompanying drawings:

FIG. 1 shows the structure of a typical compiler;
FIG. 2 is a diagram of an example basic block and its semantic tree;
FIG. 3 is a diagram of the example basic block with all nodes of the semantic tree of FIG. 2 as decomposed by the compiler;
FIG. 4a is a partial pattern diagram of the semantic tree with the first instruction selection made by the compiler;
FIG. 4b is a partial pattern diagram of the semantic tree with the second instruction selection made by the compiler;
FIG. 5a is a diagram of the semantic tree completed with the first instruction selection of FIG. 4a;
FIG. 5b is a diagram of the semantic tree completed with the second instruction selection of FIG. 4b;
FIG. 6 is a graph of cycle number versus space for FIG. 2;
FIG. 7 is a flowchart of a method according to the invention for improving instruction selection efficiency in a DSP/RISC compiler;
FIG. 8 is an example pattern set according to the invention for the example basic block of FIG. 2;
FIG. 9 is a diagram of the semantic tree completed by instruction selection according to the invention;
FIG. 10 is an example illustrating the result of executing the selection logic of the invention; and
FIG. 11 is a graph of cycle number versus space according to the invention.

Description of Reference Numerals

11: human-readable source code
12: compiler
13: object code
200: front end
202: optimizer
204: syntax processor
206: pattern table generator
208: code generator
Claims (1)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW91121431A TW569138B (en) | 2002-09-19 | 2002-09-19 | A method for improving instruction selection efficiency in a DSP/RISC compiler |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW91121431A TW569138B (en) | 2002-09-19 | 2002-09-19 | A method for improving instruction selection efficiency in a DSP/RISC compiler |
Publications (1)
Publication Number | Publication Date |
---|---|
TW569138B (en) | 2004-01-01 |
Family
ID=32590466
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
TW91121431A TW569138B (en) | 2002-09-19 | 2002-09-19 | A method for improving instruction selection efficiency in a DSP/RISC compiler |
Country Status (1)
Country | Link |
---|---|
TW (1) | TW569138B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7877741B2 (en) | 2005-04-29 | 2011-01-25 | Industrial Technology Research Institute | Method and corresponding apparatus for compiling high-level languages into specific processor architectures |
US8484442B2 (en) | 2004-03-31 | 2013-07-09 | Icera Inc. | Apparatus and method for control processing in dual path processor |
US8484441B2 (en) | 2004-03-31 | 2013-07-09 | Icera Inc. | Apparatus and method for separate asymmetric control processing and data path processing in a configurable dual path processor that supports instructions having different bit widths |
CN103440155A (en) * | 2013-07-05 | 2013-12-11 | 万高(杭州)科技有限公司 | Compiler of digital signal processor |
US9477475B2 (en) | 2004-03-31 | 2016-10-25 | Nvidia Technology Uk Limited | Apparatus and method for asymmetric dual path processing |
-
2002
- 2002-09-19 TW TW91121431A patent/TW569138B/en not_active IP Right Cessation
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8484442B2 (en) | 2004-03-31 | 2013-07-09 | Icera Inc. | Apparatus and method for control processing in dual path processor |
US8484441B2 (en) | 2004-03-31 | 2013-07-09 | Icera Inc. | Apparatus and method for separate asymmetric control processing and data path processing in a configurable dual path processor that supports instructions having different bit widths |
US9477475B2 (en) | 2004-03-31 | 2016-10-25 | Nvidia Technology Uk Limited | Apparatus and method for asymmetric dual path processing |
US7877741B2 (en) | 2005-04-29 | 2011-01-25 | Industrial Technology Research Institute | Method and corresponding apparatus for compiling high-level languages into specific processor architectures |
CN103440155A (en) * | 2013-07-05 | 2013-12-11 | 万高(杭州)科技有限公司 | Compiler of digital signal processor |
CN103440155B (en) * | 2013-07-05 | 2016-08-31 | 万高(杭州)科技有限公司 | A kind of compiler of digital signal processor |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP0926594B1 (en) | Method of using primary and secondary processors | |
Settle | High-performance dynamic programming on fpgas with opencl | |
JP4283131B2 (en) | Processor and compiling method | |
JP3896087B2 (en) | Compiler device and compiling method | |
US8056069B2 (en) | Framework for integrated intra- and inter-loop aggregation of contiguous memory accesses for SIMD vectorization | |
US9626168B2 (en) | Compiler optimizations for vector instructions | |
Buchsbaum et al. | A new, simpler linear-time dominators algorithm | |
US10802806B1 (en) | Generating vectorized control flow using reconverging control flow graphs | |
JP2004164163A (en) | Method and device for creating simd instruction sequence and program for creating simd instruction sequence | |
US6611956B1 (en) | Instruction string optimization with estimation of basic block dependence relations where the first step is to remove self-dependent branching | |
US9594668B1 (en) | Debugger display of vector register contents after compiler optimizations for vector instructions | |
TW569138B (en) | A method for improving instruction selection efficiency in a DSP/RISC compiler | |
US7376818B2 (en) | Program translator and processor | |
Johnstone et al. | Generalised parsing: Some costs | |
US20040025151A1 (en) | Method for improving instruction selection efficiency in a DSP/RISC compiler | |
Um et al. | Optimal allocation of carry-save-adders in arithmetic optimization | |
JP2018163381A (en) | Code generation apparatus, code generation method, and code generation program | |
Kuras et al. | Value cloning for architectures with partitioned register banks | |
Karuri et al. | A generic design flow for application specific processor customization through instruction-set extensions (ISEs) | |
Kovačević et al. | A solution for automatic parallelization of sequential assembly code | |
Fryza et al. | Advanced mapping techniques for digital signal processors | |
JP2004021425A (en) | Memory arrangement system in compiler | |
JP3464019B2 (en) | Register allocation method | |
Haubelt et al. | Using stream rewriting for mapping and scheduling data flow graphs onto many-core architectures | |
Boulytchev | BURS-based instruction set selection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
GD4A | Issue of patent certificate for granted invention patent | ||
MM4A | Annulment or lapse of patent due to non-payment of fees |