TW569138B - A method for improving instruction selection efficiency in a DSP/RISC compiler - Google Patents

Info

Publication number
TW569138B
TW569138B TW91121431A TW91121431A TW569138B TW 569138 B TW569138 B TW 569138B TW 91121431 A TW91121431 A TW 91121431A TW 91121431 A TW91121431 A TW 91121431A TW 569138 B TW569138 B TW 569138B
Authority
TW
Taiwan
Prior art keywords
compiler
instruction
dsp
risc
digital signal
Prior art date
Application number
TW91121431A
Other languages
Chinese (zh)
Inventor
Shan-Chyun Ku
Original Assignee
Faraday Tech Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2002-09-19
Filing date
2002-09-19
Publication date
2004-01-01
Application filed by Faraday Tech Corp
Priority to TW91121431A
Application granted
Publication of TW569138B

Abstract

A method for improving instruction selection efficiency in a DSP/RISC compiler, so that optimal performance and space usage are obtained concurrently. The method includes the following steps: determining a semantic tree for a basic block; finding all matching combinations for the semantic tree with reference to a set of patterns; determining the cycle number and instruction length of every combination; filtering out, according to the determined cycle numbers and instruction lengths, the combinations whose instruction length is greater than a predetermined instruction length and the redundant combinations that share the same cycle number and instruction length; and choosing the combination with the smallest cycle number from the remaining combinations and outputting it as the desired object code.

Description

569138 V. Description of the Invention

Field of the Invention

The present invention relates to a method for improving the instruction selection efficiency of a digital signal processor/reduced instruction set computing (DSP/RISC) compiler, so as to obtain optimized execution performance and space usage at the same time.

Description of the Related Art

Fig. 1 shows the structure of a typical compiler. In Fig. 1, the structure includes human-readable source code 11, a compiler 12, and target object code 13. The compiler 12 in turn includes a front end 200, an optimizer 202, a syntax processor 204, a pattern table generator 206, and a code generator 208. As shown in Fig. 1, the front end 200 receives the human-readable source code 11, for example source code written in a high-level language such as C, C++, VB, or PASCAL (which can be stored in a storage element such as internal memory or an external hard disk), and performs lexical analysis. The optimizer 202 translates the source code 11 into an optimized intermediate representation (IR). The syntax processor 204 performs syntax analysis, and its result is input to the pattern table generator 206 to generate a set of pattern matching tables (PMTs). Based on the IR and the PMTs, semantic tree pattern matching is performed so that the code generator 208 outputs object code 13. Those skilled in the art will appreciate that the object code 13 may, as desired, be assembly code or binary code.

The IR consists of basic blocks. A basic block is a sequence of intermediate instructions with a single entry at the top and a single exit at the bottom. Each basic block may represent one or more independent data dependency graphs, each containing one or more nodes. Each node roughly represents an instruction in a target machine (not shown) that causes the target machine to perform the function associated with that instruction. In a data dependency graph, the operation of a subsequent node depends on the data and/or a variable produced in a preceding node (so called because it is executed before the subsequent node). The operation of the preceding node, however, does not depend on the data and/or a variable produced in the subsequent node (unless a loop exists that causes the subsequent node to execute before the preceding node).

Traditionally, machine-specific information (such as instruction characteristics, instruction latency, and the number and types of registers used by instructions) is embedded in the compiler. The optimizer 202 in the compiler 12 is therefore machine-dependent. This machine-dependent optimizer 202 continually performs instruction selection, register allocation, and instruction scheduling and parallelization. An example is given below to illustrate how the prior art and the present invention differ in instruction selection for a semantic tree.
Fig. 2 shows an example basic block and the semantic tree that the compiler of Fig. 1 operates on. As shown in Fig. 2, the example is a basic block with a single independent data dependency graph, which contains the expression pR0 = abs(pR1 - pR2) + abs(pR3 - pR4) and its semantic tree, where pR0 to pR4 are registers. To complete this semantic tree, the code generator 208 performs tree pattern matching. Tree pattern matching is a bottom-up instruction selection operation performed before register allocation. As shown in Fig. 3, node registers pR5 and pR7 are formed first, each by selecting a matching pattern provided by the pattern table generator 206.
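
The semantic tree of this example and its bottom-up processing order can be sketched in Python as follows. This is only an illustration: the `Node` class and the `bottom_up` helper are hypothetical names, and the assignment of pR5 and pR7 to the two subtractions and of pR6 and pR8 to the two absolute-value nodes is an assumption based on the node registers named in the text, not a reproduction of Fig. 2 or Fig. 3.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str                      # register that holds the node's result
    op: str                        # "reg", "-", "abs" or "+"
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def reg(name: str) -> Node:
    """A leaf of the tree: an input register."""
    return Node(name, "reg")


def bottom_up(node: Optional[Node], order: Optional[List[str]] = None) -> List[str]:
    """Post-order walk: every operand is visited before the node that uses it."""
    if order is None:
        order = []
    if node is None or node.op == "reg":
        return order
    bottom_up(node.left, order)
    bottom_up(node.right, order)
    order.append(node.name)
    return order


# Assumed shape of the tree for pR0 = abs(pR1 - pR2) + abs(pR3 - pR4).
pR5 = Node("pR5", "-", reg("pR1"), reg("pR2"))
pR6 = Node("pR6", "abs", pR5)
pR7 = Node("pR7", "-", reg("pR3"), reg("pR4"))
pR8 = Node("pR8", "abs", pR7)
pR0 = Node("pR0", "+", pR6, pR8)

print(bottom_up(pR0))
```

Any order in which operands precede the node that consumes them counts as bottom-up, so the post-order walk used here is one valid order among several.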

Next, node registers pR6 and pR8 are formed in the same way as their preceding node registers. Finally, the semantic tree is completed when node register pR0 is formed and output by the code generator 208. However, a conventional compiler such as the compiler 12 of Fig. 1 cannot provide optimized space usage and optimized execution performance at the same time. In that situation, optimized space usage is usually the one that is sacrificed. For example, the optimizer 202 has two schemes for forming node pR6 or pR8. The first scheme, shown in Fig. 4a, uses a conditional instruction and a jump instruction that takes 6 cycles to execute. It yields a size of 2 instructions (space usage) and an average of 4 cycles (execution performance). The second scheme, shown in Fig. 4b, uses exclusive-OR, a 32-bit signed shift, and subtraction. It yields a size of 3 instructions and an average of 3 cycles. Therefore, when the former is used to optimize space usage, the tree requires 11 cycles and 7 instructions, as shown in Fig. 5a. When the latter is used to optimize execution performance, the tree requires 9 cycles and 9 instructions, as shown in Fig. 5b. Execution performance and space usage are thus incompatible. As shown in Fig. 6, they exhibit a negative linear relationship (a straight line), with better quality toward the lower left (point a) and poorer quality toward the lower right (point b). For example, when a user needs 12K of space, the user must purchase a digital signal processor with a capacity of 16K, because processor capacities grow as 2^n, where n is an integer. This wastes 1/4 of the 16K capacity.
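
The two absolute-value schemes discussed above can be illustrated with a minimal Python sketch. It assumes 32-bit two's-complement operands and uses the common shift, XOR, and subtract idiom with a shift count of 31; the exact instruction sequences and shift count of Figs. 4a and 4b are not reproduced here, and the function names are hypothetical.

```python
def abs_branch(x: int) -> int:
    """In the spirit of the first scheme: test the sign, then branch."""
    if x < 0:          # conditional instruction
        return -x      # path taken after the jump
    return x


def abs_branchless(x: int) -> int:
    """In the spirit of the second scheme: signed shift, XOR, subtract."""
    mask = x >> 31             # arithmetic shift: 0 if x >= 0, -1 if x < 0
    return (x ^ mask) - mask   # flips the bits and adds 1 only when x < 0


# Both schemes agree on operands that fit in a signed 32-bit register
# (the most negative value, -2**31, has no 32-bit absolute value).
for value in (-7, 0, 5, -(2**31) + 1):
    assert abs_branch(value) == abs_branchless(value) == abs(value)
```

The branching version needs fewer instructions but pays for the jump when it is taken, while the branch-free version always runs the same three operations, which matches the size-versus-cycles trade-off described in the text.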

This problem becomes increasingly serious as the compiler is applied to widely used multimedia, especially in the development of digital signal processor/reduced instruction set computing systems for image processing.

Summary of the Invention

Accordingly, an object of the present invention is to provide a method for improving the instruction selection efficiency of a DSP/RISC compiler, so as to obtain optimized execution performance and space usage at the same time.

The present invention provides a method for improving the instruction selection efficiency of a DSP/RISC compiler. The method determines an optimized code size within a limited space selected by the user, thereby producing optimized execution performance and space usage at the same time. The method includes the following steps: determining a semantic tree for a basic block; finding all matching combinations for the semantic tree with reference to a set of patterns; determining the cycle number and instruction length of each combination; filtering out, according to a predetermined cycle number and instruction length, the combinations that exceed the predetermined values and the redundant combinations that share the same cycle number and instruction length; and choosing, from the combinations remaining after filtering, the combination with the smallest cycle number and outputting it as the desired object code.

Detailed Description of the Preferred Embodiment

Like functional elements are denoted by like reference numerals throughout.

Fig. 7 is a flowchart of a method according to the present invention for improving instruction selection efficiency in a DSP/RISC compiler. In Fig. 7, the method includes the following steps: determining a semantic tree for a basic block (S1); finding all matching combinations for the semantic tree with reference to a set of patterns (S2); determining the cycle number and instruction length of each combination (S3); filtering out, according to a predetermined cycle number and instruction length, the combinations that exceed the predetermined values and the redundant combinations that share the same cycle number and instruction length (S4); and choosing, from the combinations remaining after filtering, the combination with the smallest cycle number and outputting it as the desired object code (S5). As Fig. 7 shows, the difference from conventional instruction selection is that the latter completes the semantic tree of its basic block without searching all possible combinations to determine the best use of space. In step S1, the instruction selection logic of the present invention is based on the same example shown in Fig. 2. In step S2, a set of patterns is selected.
As shown in Fig. 8, the set of patterns 81 contains four patterns for the node registers: abs(pR1) as abs1, abs(pR1) as abs2, pR1 + pR2, and pR1 - pR2. The notation (abs1),4,2 denotes a first absolute-value operation abs1 that requires 4 cycles and 2 instructions. Similarly, the notation (abs2),3,3 denotes a second absolute-value operation abs2 that requires 3 cycles and 3 instructions. An addition or subtraction requires 1 cycle and 1 instruction. In the conventional example, the semantic tree is completed using only the first or only the second absolute-value operation. According to the present invention, however, the semantic tree can be arranged in the four combinations 91 shown in Fig. 9, which have, respectively, 11 cycles and 7 instructions; 9 cycles and 9 instructions; 10 cycles and 8 instructions; and 10 cycles and 8 instructions (S3). Because the last two combinations have the same cycle number and instruction length, one of them is discarded, for example the last one. Given a predetermined instruction length limit of 8 instructions, the second combination, which has 9 instructions, is also removed (S4).
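
Steps S3 to S5 for the four combinations of this example can be sketched in Python as follows. The sketch is illustrative only: the `Combo` type, the `select` function, and the combination labels are hypothetical names introduced here, and the input is assumed to be the already enumerated list of combinations with their cycle numbers and instruction lengths.

```python
from typing import List, NamedTuple


class Combo(NamedTuple):
    label: str     # hypothetical label for the pair of abs schemes used
    cycles: int    # cycle number of the whole tree
    size: int      # instruction length of the whole tree, in instructions


def select(combos: List[Combo], max_size: int) -> Combo:
    """Filter the combinations (S4) and pick the fastest survivor (S5)."""
    seen = set()
    remaining = []
    for combo in combos:
        key = (combo.cycles, combo.size)
        if key in seen:              # S4: redundant, same cycles and size
            continue
        seen.add(key)
        if combo.size > max_size:    # S4: exceeds the instruction length limit
            continue
        remaining.append(combo)
    # S5: smallest cycle number among what is left
    # (min raises ValueError if nothing fits within the limit).
    return min(remaining, key=lambda combo: combo.cycles)


combos = [
    Combo("abs1 + abs1", 11, 7),
    Combo("abs2 + abs2", 9, 9),
    Combo("abs1 + abs2", 10, 8),
    Combo("abs2 + abs1", 10, 8),
]
best = select(combos, max_size=8)
print(best.label, best.cycles, best.size)
```

With a limit of 8 instructions the sketch keeps the 11-cycle and 10-cycle combinations and returns the 10-cycle one; raising the limit to 9 would let the 9-cycle combination win instead.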

Of the remaining combinations, the one that uses abs1 and abs2 has 10 cycles, fewer than the 11 cycles of the other, so the combination with abs1 and abs2 is output as the desired object code (S5).

The logic used to carry out the above steps is as follows:

comp_C(v)
    C_v = Φ
    for all p ∈ P, if p can match v then
        l1 = v + r1(p); l2 = v + r2(p);
        for all C_l1,i ∈ C_l1 and all C_l2,j ∈ C_l2
            if size(C_l1,i) + size(C_l2,j) + size(p) ≤ M then
                C_v = insert(C_v, (p, cycle(C_l1,i) + cycle(C_l2,j) + cycle(p), size(C_l1,i) + size(C_l2,j) + size(p), l1, l2));
    return C_v

The subroutine is named comp_C(v). C_v is the candidate set for node v and is initially the empty set (Φ). P is the set of preset patterns.

p is the selected pattern. C_l1,i is the i-th element of the set C_l1, along the path from the pattern root to its leftmost node, and C_l2,j is the j-th element of the set C_l2, along the path from the pattern root to its rightmost node. M is the limited memory space. Let C_v,i = (pattern name (p), cycle number (cycle), instruction length (size), left operand node (l1), right operand node (l2)), where C_v,i denotes the i-th element of the set C_v, obtained by combining the left node and the right node with a pattern on the semantic tree at a cost of n in size and m in cycles. More than one pattern may yield an element of C_v. Accordingly, when a vector at a node has a size that falls within the limited memory space (that is, the total instruction length size(C_l1,i) + size(C_l2,j) + size(p) ≤ M), the vector is inserted into the candidate set C_v. The above logic (subroutine) is executed recursively until the operation at the unique root r is completed. For example, as shown in Fig. 10, a semantic tree T with nodes u, v, x, y, and w has the following possible instruction selection sets: C_u = {(-,1,1,a,b)}, C_v = {(-,1,1,c,d)}, C_x = {(abs1,5,3,u,Φ), (abs2,4,4,u,Φ)}, C_y = {(abs1,5,3,v,Φ), (abs2,4,4,v,Φ)}, and C_w = {(+,11,7,x,y), (+,10,8,x,y), (+,9,9,x,y)}.
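
The candidate sets of this example can be reproduced with a small Python sketch of the recursion. It is a sketch under stated assumptions rather than the patented implementation: every pattern is assumed to cover exactly one operator, so the r1(p) and r2(p) addressing is replaced by direct child links; registers are assumed to cost nothing; and insertion is assumed to skip a candidate whose cycle number and size are already present, which is consistent with the three candidates listed for the root. The names `PATTERNS`, `Node`, and `comp_c` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Pattern name -> (operator it matches, cycles, size), from the example costs.
PATTERNS = {
    "-": ("sub", 1, 1),
    "+": ("add", 1, 1),
    "abs1": ("abs", 4, 2),
    "abs2": ("abs", 3, 3),
}

Candidate = Tuple[str, int, int, str, str]   # (pattern, cycles, size, l1, l2)


@dataclass
class Node:
    name: str
    op: str                        # "leaf", "sub", "add" or "abs"
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def comp_c(v: Optional[Node], limit: int,
           table: Dict[str, List[Candidate]]) -> List[Candidate]:
    """Return the candidate set of v and record it in `table` by node name."""
    if v is None or v.op == "leaf":
        return [("", 0, 0, "", "")]          # registers and missing operands are free
    c_left = comp_c(v.left, limit, table)
    c_right = comp_c(v.right, limit, table)
    c_v: List[Candidate] = []
    for p, (op, p_cycles, p_size) in PATTERNS.items():
        if op != v.op:                       # pattern p cannot match node v
            continue
        for (_, lc, ls, _, _) in c_left:
            for (_, rc, rs, _, _) in c_right:
                cycles, size = lc + rc + p_cycles, ls + rs + p_size
                if size > limit:             # outside the memory bound M
                    continue
                if any((c[1], c[2]) == (cycles, size) for c in c_v):
                    continue                 # redundant candidate
                c_v.append((p, cycles, size,
                            v.left.name if v.left else "Φ",
                            v.right.name if v.right else "Φ"))
    table[v.name] = c_v
    return c_v


a, b, c, d = (Node(n, "leaf") for n in "abcd")
u, v = Node("u", "sub", a, b), Node("v", "sub", c, d)
x, y = Node("x", "abs", u), Node("y", "abs", v)
w = Node("w", "add", x, y)

table: Dict[str, List[Candidate]] = {}
root_set = comp_c(w, limit=9, table=table)
for name in "uvxyw":
    print(name, table[name])
```

With a bound of 9 instructions the sketch yields the sets listed above; calling `comp_c(w, limit=8, table={})` drops the 9-instruction candidate at the root, after which the candidate with the fewest cycles is the one with 10 cycles and 8 instructions.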

With the optimized instruction selection described above, as shown in Fig. 11, all candidates in the root set C_w are compared under a region boundary (shown as a dashed line, rather than the straight-line boundary of the prior art), and the path from the bottom sets C_u = {(-,1,1,a,b)} and C_v = {(-,1,1,c,d)} through C_x = {(abs1,5,3,u,Φ)} and C_y = {(abs2,4,4,v,Φ)} to the root C_w = {(+,10,8,x,y)} is output as the object code of the compiler (with the same structure as shown in Fig. 1). In this way, higher execution performance than the prior art can be achieved with the same memory size.

Although the present invention has been disclosed above by way of a preferred embodiment, the embodiment is not intended to limit the invention; the scope of protection of the invention is defined by the appended claims.

Brief Description of the Drawings

To make the above and other objects, features, and advantages of the present invention more apparent, a preferred embodiment is described in detail below with reference to the accompanying drawings:

Fig. 1 shows the structure of a typical compiler;
Fig. 2 shows an example basic block and its semantic tree;
Fig. 3 shows the example basic block decomposed by the compiler into all nodes of the semantic tree of Fig. 2;
Fig. 4a shows part of the semantic tree with the first instruction selection made by the compiler;
Fig. 4b shows part of the semantic tree with the second instruction selection made by the compiler;
Fig. 5a shows the semantic tree completed with the first instruction selection of Fig. 4a;
Fig. 5b shows the semantic tree completed with the second instruction selection of Fig. 4b;
Fig. 6 is a graph of cycle number versus space for Fig. 2;
Fig. 7 is a flowchart of a method according to the present invention for improving instruction selection efficiency in a DSP/RISC compiler;
Fig. 8 shows an example set of patterns according to the present invention for the example basic block of Fig. 2;
Fig. 9 shows the semantic trees completed by instruction selection according to the present invention;
Fig. 10 illustrates an example result after executing the selection logic of the present invention; and
Fig. 11 is a graph of cycle number versus space according to the present invention.

[Description of Symbols]
11: human-readable source code
12: compiler
13: object code
200: front end
202: optimizer
204: syntax processor
206: pattern table generator
208: code generator

Claims (1)

569138 VI. Scope of Patent Application

1. A method for improving instruction selection efficiency in a digital signal processor/reduced instruction set computing (DSP/RISC) compiler, comprising the following steps:
determining a semantic tree for a basic block;
finding all matching combinations for the semantic tree with reference to a set of patterns;
determining the cycle number and instruction length of each combination;
filtering out, according to a predetermined cycle number and instruction length, the combinations that exceed the predetermined values and the redundant combinations that share the same cycle number and instruction length; and
choosing, from the combinations remaining after filtering, the combination with the smallest cycle number and outputting it as the desired object code.

2. The method for improving instruction selection efficiency in a DSP/RISC compiler as claimed in claim 1, wherein the basic block represents one or more independent data dependency graphs, each containing one or more nodes.

3. The method for improving instruction selection efficiency in a DSP/RISC compiler as claimed in claim 2, wherein each node represents an instruction.

4. The method for improving instruction selection efficiency in a DSP/RISC compiler as claimed in claim 1, wherein each of the patterns includes an entry node at the top and one node connected to the entry node.

5. The method for improving instruction selection efficiency in a DSP/RISC compiler as claimed in claim 1, wherein each of the patterns includes an entry node at the top and a plurality of nodes connected to the entry node.

6. The method for improving instruction selection efficiency in a DSP/RISC compiler as claimed in claim 1, wherein the set of patterns is machine-dependent.

7. The method for improving instruction selection efficiency in a DSP/RISC compiler as claimed in claim 1, wherein the instruction length is machine-dependent.

8. The method for improving instruction selection efficiency in a DSP/RISC compiler as claimed in claim 1, wherein the predetermined instruction length is determined by the capacity of the DSP/RISC compiler.

9. The method for improving instruction selection efficiency in a DSP/RISC compiler as claimed in claim 1, wherein the desired object code is assembly code.

10. The method for improving instruction selection efficiency in a DSP/RISC compiler as claimed in claim 1, wherein the desired object code is binary code.

11. The method for improving instruction selection efficiency in a DSP/RISC compiler as claimed in claim 1, wherein the semantic tree matching is performed from the bottom up until the single root at which the arrangement of the basic block is completed.

12. The method for improving instruction selection efficiency in a DSP/RISC compiler as claimed in claim 1, further comprising using an optimizer to configure the method.

13. The method for improving instruction selection efficiency in a DSP/RISC compiler as claimed in claim 11, further comprising using a code generator to execute the method so as to output the desired object code.
Priority Applications (1)

Application Number Priority Date Filing Date Title
TW91121431A TW569138B (en) 2002-09-19 2002-09-19 A method for improving instruction selection efficiency in a DSP/RISC compiler

Publications (1)

Publication Number Publication Date
TW569138B (en) 2004-01-01

Family

ID=32590466

Family Applications (1)

Application Number Title Priority Date Filing Date
TW91121431A TW569138B (en) 2002-09-19 2002-09-19 A method for improving instruction selection efficiency in a DSP/RISC compiler

Country Status (1)

Country Link
TW (1) TW569138B (en)

Legal Events

Date Code Title Description
GD4A Issue of patent certificate for granted invention patent
MM4A Annulment or lapse of patent due to non-payment of fees