TW200809612A

TW200809612A - Circular register arrays of a computer

Info

Publication number: TW200809612A
Application number: TW96105677A
Authority: TW
Inventors: Charles H Moore; Jeffrey Arthur Fox; John W Rible
Original assignee: Technology Properties Ltd
Priority date: 2006-02-16
Filing date: 2007-02-15
Publication date: 2008-02-16

Abstract

A stack processor comprises a data stack with a T register, an S register, and eight hardwired bottom registers which function in a circular repeating pattern. The stack processor also comprises a return stack containing an R register, and eight hardwired bottom registers which function in a circular repeating pattern. The circular register arrays described herein eliminate overflow and underflow stack conditions.

Description

This application claims priority to a provisional application filed in June 2006, No. 60/818,084, and is a continuation-in-part of the application filed on May 26, 2006, No. 11/441,818, entitled "Method and Apparatus for Monitoring Inputs to a Computer". Application No. 11/441,818 is in turn a continuation-in-part of the application filed on February 16, 2006, No. 11/355,513, entitled "Asynchronous Power Saving Computer". This application also claims priority to a provisional application filed on May 31, 2006 [number illegible in source], and to a provisional application filed on May 3, 2006, No. 60/797,345. All of the applications cited above are incorporated herein by reference.

Background of the Invention

Technical Field

The present invention relates to the field of computers and computer processors, and in particular to a method and apparatus for using the stacks of a stack computer processor more efficiently.

Prior Art

Stack machines offer processor complexity far lower than that of Complex Instruction Set Computers (CISCs), and overall system complexity lower than that of either Reduced Instruction Set Computers (RISCs) or CISC machines. They achieve this low complexity without requiring complex high-performance compilers or cache control hardware.

Stack machines also attain competitive performance, and superior performance for a given price in most programming environments. Their first successful application was in real-time embedded control environments, where they outperform other system designs by a wide margin. Whereas earlier stacks were kept mostly in program memory, newer stack machines keep their stacks in separate memory chips or even in an area of on-chip memory. These stack machines provide extremely fast subroutine calling and perform well in interrupt handling and task switching. However, they have no hardware detection of stack overflow or underflow.

Stack overflow occurs when data continues to be pushed onto the stack although too few registers are available, so that lower registers are repeatedly overwritten. Stack underflow occurs when all of the registers are unused and the contents of the stack continue to be popped, producing meaningless or incorrect results.

Some other stack processors use a stack pointer and memory management so that an error flag is raised when the stack pointer points outside the memory range reserved for the stack. U.S. Patent No. 6,367,005 to Zahir et al. discloses a register stack engine that saves registers of the register stack to memory in order to provide more available registers when a stack overflow occurs. The register stack engine also imposes a delay that ends only once the engine can restore the appropriate number of registers when a stack overflow occurs. U.S. Patent No. 6,219,685 discloses a method that compares the result of an operation with a threshold value. However, that approach does not distinguish between results below the threshold (which may produce an overflow) and results that merely happen to equal the threshold.

Another way to identify overflow or underflow conditions is to read and write hardware flags. However, the flags, and any pointers updated after a register read or write, must be handled before execution can continue, which reduces processing speed. When the stacks are kept in memory, an overflow or underflow may overwrite a stack item or use an item that was never part of the stack. There is therefore a need to reduce or eliminate overflow and underflow within the stacks.

Summary of the Invention

An object of the present invention is to provide an apparatus and method in which the data stack and the return stack are not kept in memory and accessed through a stack pointer, but are instead accessed in hardwired fashion through a dedicated shift register.

Another object of the invention is to reduce or eliminate the occurrence of overflow and underflow in the data stack and the return stack.

Another object of the invention is to minimize the length of the electrical connections between the single-bit shift registers of a bidirectional shift register, thereby minimizing driver size and buffering.

These and other objects are achieved in the present invention, in which conventional stacks are replaced by register arrays that operate in a circular, repeating pattern. This pattern is implemented with an associated bidirectional shift register comprising a plurality of single-bit shift registers that are electrically interconnected in an alternating pattern. This structure prevents data from being read from outside the stack and prevents an empty register value from being read.

The dual stack processor described above can be used as a stand-alone processor, or together with several other like or different processors in an interconnected computer array.

Detailed Description

The invention is described with reference to the figures and to the elements labeled in them. Although the invention is described in terms of modes for achieving its objectives, those skilled in the art may vary it in light of these teachings without departing from the spirit or scope of the invention.

The preferred embodiments and variations described here with reference to the figures are presented by way of example only and do not limit the scope of the invention. Unless otherwise specifically stated, individual aspects and components of the invention may be omitted or modified for different applications while remaining within the spirit and scope of the claimed invention, since the invention is intended to be adaptable to many variations.

FIG. 1 is a block diagram of the general layout of a dual stack computer 12 used in the present invention. The computer 12 is generally self-contained, with its own RAM 24 and ROM 26. Its other basic components are a return stack 28 including an R register 29, an instruction area 30, an arithmetic logic unit (ALU or processor) 32, a data stack 34, and a decode logic section 36 for decoding instructions. Computer 12 is a dual stack computer having a data stack 34 and a return stack 28. Those skilled in the art will generally be familiar with the operation of stack-based computers such as computer 12 of this embodiment.

In this embodiment the instruction area 30 includes a number of registers 40, namely an A register 40a, a B register 40b, and a C register 40c. The A register 40a is a full 18-bit register, while the B register 40b and the C register 40c are 9-bit registers.

In the stack computer processor of this embodiment, both the data stack and the return stack comprise register arrays. These arrays operate in a circular, repeating pattern. Unlike in prior-art computers, the data stack and the return stack are not arrays in memory accessed through a stack pointer.

FIG. 2 shows an 18-bit data stack according to the present invention. The top two registers of the data stack are an 18-bit T register and an 18-bit S register. Below them are eight additional 18-bit hardware registers, labeled S2 through S9, which operate as a circular register array. The circular register array S2-S9 could operate without the T and S registers. However, combining at least the S register with S2-S9 speeds up circuit access and allows timing to be optimized, giving the circular register array a higher operating speed. In addition, the S register can serve as a buffer between the circular register array and the rest of the processor system, so that the timing of the array is independent of the timing of the rest of the processor.

This embodiment also includes a bidirectional shift register made up of a plurality of single-bit shift registers. The number of single-bit shift registers equals the number of bottom stack registers below the S register. As shown in FIG. 2, each single-bit shift register is connected to a corresponding one of the S2-S9 stack registers. The single-bit shift registers are electrically interconnected in an alternating pattern, so that the S2-S9 registers are connected in the sequential circular order S2, S4, S6, S8, S9, S7, S5, S3, and back to S2.

Sequential selection of the bottom stack registers proceeds in a circular repeating pattern. The connections between single-bit shift registers never reach beyond three adjacent shift registers, which avoids long wires running from a bottom shift register to an upper one. These shorter wires need only small drivers and keep buffering to a minimum. The circular register array of this embodiment uses eight additional stack registers, but other numbers of bottom registers that are multiples of four can also be used.
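The benefit of the alternating interconnection can be checked with a short calculation. The sketch below is illustrative only. It assumes that the eight bottom registers are laid out physically in numeric order (S2 next to S3, and so on), which the text does not state, and it takes the circular order S2, S4, S6, S8, S9, S7, S5, S3 as read from the description. The names `PHYSICAL_ORDER`, `ALTERNATING_RING`, `NAIVE_RING`, and `max_link_span` are hypothetical.

```python
# Assumed physical layout: registers side by side in numeric order.
PHYSICAL_ORDER = ["S2", "S3", "S4", "S5", "S6", "S7", "S8", "S9"]

# Circular selection order produced by the alternating interconnection.
ALTERNATING_RING = ["S2", "S4", "S6", "S8", "S9", "S7", "S5", "S3"]

# For comparison: a ring wired in plain numeric order, closing S9 back to S2.
NAIVE_RING = list(PHYSICAL_ORDER)


def max_link_span(ring, layout=PHYSICAL_ORDER):
    """Return the longest wire, in register positions, needed to close the ring."""
    position = {name: index for index, name in enumerate(layout)}
    # Pair every register with its successor, including last -> first.
    links = zip(ring, ring[1:] + ring[:1])
    return max(abs(position[a] - position[b]) for a, b in links)


if __name__ == "__main__":
    print("alternating ring:", max_link_span(ALTERNATING_RING))  # 2
    print("naive ring:", max_link_span(NAIVE_RING))              # 7
```

Under these assumptions every link in the alternating ring reaches at most two positions away, so it stays within a group of three adjacent registers, whereas the numerically ordered ring needs one wire across the whole array to close the loop.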
FIG. 2 also shows the read lines and write lines that connect the S2 to S9 registers with the T and S registers. Each single-bit shift register of the bidirectional shift register is connected to its corresponding bottom stack register in the S2-S9 array. At any one time only one bit of the shift register is on (reads as 1), and all other bits read as 0. At power-up the shift register is initialized with exactly one bit set to 1 and all other bits set to 0. In this embodiment the top bit of the shift register points to, or reads, S2 and writes the adjacent register S4.

Registers T and S together with S2-S9 form a push-down stack of ten cells. Because the bottom eight registers are a circular buffer, the hardware wraps rather than overflowing or underflowing. One cannot expect to push more than ten items and get them all back, but the values of the last eight registers can always be obtained from the bottom of the stack. Nor can an underflow error occur: a program can keep reading the bottom of the stack and will keep obtaining the same eight words, which makes this the fastest way to replicate eight (or four, or two, or one) words. Likewise, because the stack pointer allows the stack to advance into any register, there is no stack overflow. The stack is finite, however: if more than ten items are pushed, only the last ten are retained, and every store after the first ten writes over one of the S2 to S9 registers. The stack does not need to be initialized to a current position; it only needs to be declared empty at the start.
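The wrap-around behavior of the ten-cell data stack can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the hardware itself: the class name `CircularDataStack`, its methods, and the convention that a push advances the one-hot selector before writing are all hypothetical, and the ring index stands for a position in the circular selection order rather than a particular S register.

```python
WORD_MASK = 0x3FFFF  # values are kept to 18 bits, matching the register width


class CircularDataStack:
    """Toy model: T and S on top of an eight-register circular array."""

    def __init__(self, ring_size=8):
        self.t = 0
        self.s = 0
        self.ring = [0] * ring_size
        # One-hot selector standing in for the shift register:
        # exactly one bit is 1 at any time.
        self.select = [1] + [0] * (ring_size - 1)

    def _selected(self):
        return self.select.index(1)

    def _shift(self, step):
        # Move the single 1 bit forward (+1) or backward (-1) around the ring.
        n = len(self.select)
        new = (self._selected() + step) % n
        self.select = [1 if i == new else 0 for i in range(n)]

    def push(self, value):
        self._shift(+1)
        self.ring[self._selected()] = self.s  # the oldest entry is overwritten
        self.s = self.t
        self.t = value & WORD_MASK

    def pop(self):
        value = self.t
        self.t = self.s
        self.s = self.ring[self._selected()]  # the register is read, not cleared
        self._shift(-1)
        return value


if __name__ == "__main__":
    stack = CircularDataStack()
    for n in range(1, 13):           # push twelve items onto a ten-cell stack
        stack.push(n)
    print([stack.pop() for _ in range(10)])  # [12, 11, 10, 9, 8, 7, 6, 5, 4, 3]
    print(stack.pop())                       # 10: the ring has wrapped
```

In this model the first two items pushed (1 and 2) are silently overwritten, the last ten come back in order, and further pops keep cycling through the eight bottom registers without any error condition.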
FIG. 3 is a detailed view of a single register in the data stack or return stack. Each 18-bit register comprises 18 latches, latch 0 through latch 17. Eighteen input pass gates (numbered 0 to 17) and eighteen output pass gates (numbered 0 to 17) connect the latches to the write bus and the read bus. The input pass gates are controlled by the write control through an inverting amplifier, and the output pass gates are controlled by the read control through an inverting amplifier.

FIG. 4 shows an 18-bit return stack according to the present invention. The top register of the return stack is the 18-bit R register, and below it are eight additional 18-bit hardware registers, labeled R1 through R8. As with the data stack described above, the eight lower registers operate as a circular repeating array in an alternating pattern. The circular register array R1-R8 could operate without the R register. However, combining the R register with R1-R8 speeds up circuit access and allows timing to be optimized, giving the circular register array a higher operating speed. In addition, the R register can serve as a buffer between the R1-R8 registers and the rest of the processor system, so that their timing is mutually independent.

This embodiment also includes a bidirectional shift register made up of single-bit shift registers. The number of single-bit shift registers equals the number of bottom stack registers below the R register.

As shown in FIG. 4, each single-bit shift register is connected to a corresponding one of the R1 to R8 registers. The single-bit shift registers of the bidirectional shift register are electrically interconnected in an alternating pattern, so that the R1-R8 registers are connected in a sequential circular order and sequential selection of the registers proceeds in a circular repeating pattern. The connections between shift registers never reach beyond three adjacent single-bit shift registers, which avoids long wires running from a bottom single-bit shift register to an upper one. These shorter wires need only small drivers and keep buffering to a minimum. Although this embodiment uses eight additional return registers, other numbers of bottom registers that are multiples of four can also be used in the circular register array.

A read line and a write line connect the R register with the R1-R8 registers. Each single-bit shift register of the shift register is connected to a stack register of the R1-R8 array. At any one time only one bit of the shift register is on (reads as 1), and all other bits read as 0. At power-up the shift register is initialized with exactly one bit set to 1 and all other bits set to 0. In this embodiment the top bit of the shift register points to, or reads, R1 and writes the adjacent register connected to it.

In the present invention there is no hardware detection of overflow or underflow. Prior-art processors generally use stack pointers and memory management, or similar schemes, so that an error condition is flagged when the stack pointer moves outside the memory range allocated to the stack. When a stack resides in memory, an overflow or underflow may overwrite a stack item or use an item that is not part of the stack. Because the bottom registers of the present invention form a circular array, however, the stack cannot overflow or underflow. Instead, the circular array simply wraps around the register array.
Because the stacks have finite depth, pushing anything onto the top of a stack means that something at the bottom may be overwritten. Pushing more than ten items onto the data stack, or more than nine items onto the return stack, overwrites the item at the bottom. Software must keep track of the number of items on each stack and must not push more than the stack can hold. The hardware does not detect the overwriting of items at the bottom of a stack, nor does it flag an error.

It should be noted that software can exploit the circular arrays at the bottom of the stacks to advantage in several ways. For example, software can simply assume that a stack is empty at any time. Old items pushed down toward the bottom need not be cleared, so it is harmless for a program to assume that the stack is empty when it starts. A further advantage is that items which have already been used need not be reloaded.

The bottom eight items of these stacks can also be read, or read and written, in a loop. After two items have been read from the data stack, T and S hold copies of two items from the circular array of eight stack registers. After eight or more items have been read, the stack has wrapped and T and S are reloaded with items from the bottom array. There is no limit on how many times these eight items can be read, and no copying or writing back to the stack is needed. Since this wrapping is not a stack error, a set of coefficients held in the data stack or return stack can be read from the stack repeatedly in groups of eight, four, or two cells.

Although the embodiment has been described in terms of the data stack and return stack of a dual stack 18-bit processor, the invention is equally applicable to the stacks of processors with other word widths.
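The repeated reading of coefficients described above can be sketched in software. The model below is an assumption-laden illustration, not the circuit: `WrapStack`, `load_coefficients`, and the choice to fill the stack by pushing the pattern until T, S, and all eight bottom registers hold it are hypothetical. It relies on the pattern length dividing eight, which corresponds to the groups of eight, four, or two cells mentioned in the text.

```python
from collections import deque


class WrapStack:
    """Toy ten-cell stack: T, S, and an eight-register circular array."""

    def __init__(self):
        self.t = 0
        self.s = 0
        self.ring = deque([0] * 8)  # ring[0] is the register just below S

    def push(self, value):
        self.ring.rotate(1)         # oldest register comes to the front
        self.ring[0] = self.s       # and is overwritten
        self.s = self.t
        self.t = value

    def pop(self):
        value = self.t
        self.t = self.s
        self.s = self.ring[0]       # copy the register; it is not cleared
        self.ring.rotate(-1)        # select the next register in the ring
        return value


def load_coefficients(stack, coefficients, cells=10):
    """Push the pattern repeatedly until every one of the ten cells holds it."""
    if 8 % len(coefficients) != 0:
        raise ValueError("pattern length must divide eight")
    pushes = 0
    # Keep pushing until all cells are filled and a whole pattern has just ended.
    while pushes < cells or pushes % len(coefficients) != 0:
        stack.push(coefficients[pushes % len(coefficients)])
        pushes += 1


if __name__ == "__main__":
    stack = WrapStack()
    load_coefficients(stack, [1, 2, 3, 4])
    # Reading never exhausts the stack: the four values repeat indefinitely.
    print([stack.pop() for _ in range(12)])  # [4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1]
```

Once loaded, the coefficients come back in last-in, first-out order for as long as the program keeps reading, with no reloading and no pointer to reset.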
The circular register arrays have been described above with respect to a single dual stack processor. They can also be used in a computer array 10 as shown in FIG. 5. The computer array 10 has a plurality (24 in this embodiment) of computers 12, sometimes called "cores" or "nodes" in the context of an array. In this embodiment all of the computers 12 are located on a single die 14. According to the invention, each computer 12 is an independently functioning computer. The computers 12 are interconnected by a plurality of data buses 16. In this embodiment the data buses 16 are bidirectional, asynchronous, high-speed parallel data buses. This is only one embodiment of the invention, and other means of interconnection may be used.

Computer 12e is an example of a computer 12 that is not on the periphery of the array 10. Its four orthogonally adjacent computers are 12a, 12b, 12c, and 12d. The group of computers 12a to 12e is used below as an example to describe in more detail how the computers 12 of the array 10 communicate. As shown in FIG. 5, an interior computer such as 12e has four other computers 12 with which it is connected by buses 16. In the following discussion the principles apply to all of the computers 12, except that computers 12 on the periphery of the array 10 are directly connected to only three other computers, and corner computers 12 to only two.

FIG. 6 shows part of FIG. 5 in more detail, with only some of the computers 12, namely computers 12a through 12e. FIG. 6 also shows that each data bus 16 comprises a read line 18, a write line 20, and a plurality of data lines 22 (18 in this embodiment). The data lines 22 can transfer all of the bits of an 18-bit instruction word in parallel.
According to the method of the invention, a computer 12 such as computer 12e can set one, two, three, or all four of its read lines 18 high, so that it is prepared to receive data from one, two, three, or all four adjacent computers 12 respectively. Similarly, a computer 12 can set one, two, three, or all four of its write lines 20 high.

When one of the adjacent computers 12a, 12b, 12c, or 12d sets the write line 20 between itself and computer 12e high, and computer 12e has already set the corresponding read line 18 high, a word is transferred from that computer to computer 12e over the associated data lines 22. The sending computer 12 then releases the write line 20, and the receiving computer 12 (computer 12e in this example) pulls both the write line 20 and the read line 18 low. The latter action tells the sending computer 12 that the data has been received.

In this embodiment, as shown in FIG. 1, the computer 12 has four communication ports 38 for communicating with adjacent computers 12. The communication ports 38 are tri-state drivers with an off state, a receive state (for driving signals into the computer 12), and a send state (for driving signals out of the computer 12). If a particular computer 12 is not in the interior of the array (FIG. 5), as computer 12e is, then one or more of its communication ports 38 will not be used, for the reasons given above. However, computers 12 next to the edge of the die can have additional circuitry, designed either into the computer 12 or external to it, so that such a communication port 38 can act as an external I/O port 39 (FIG. 5). Examples of such external I/O ports 39 include, but are not limited to, USB (universal serial bus) ports, RS232 serial bus ports, parallel communication ports, analog-to-digital and/or digital-to-analog conversion ports, and many other possible variations.
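The read-line and write-line handshake described above can be sketched as a tiny state machine. This is a simplified, synchronous model of an asynchronous protocol, offered only as an illustration: the class `BusLink` and its methods `receiver_ready` and `sender_write` are hypothetical names, and the release and acknowledge steps are collapsed into a single action.

```python
WORD_MASK = 0x3FFFF  # 18 data lines carry one 18-bit word


class BusLink:
    """Toy model of one data bus between a sending and a receiving computer."""

    def __init__(self):
        self.read_line = False   # set high by the receiver when it is ready
        self.write_line = False  # set high by the sender when it has a word
        self.data_lines = 0
        self.received = []       # words accepted by the receiver

    def receiver_ready(self):
        self.read_line = True
        self._transfer_if_both_high()

    def sender_write(self, word):
        self.data_lines = word & WORD_MASK
        self.write_line = True
        self._transfer_if_both_high()

    def _transfer_if_both_high(self):
        # A word moves only when both lines are high, whichever side was first.
        if self.read_line and self.write_line:
            self.received.append(self.data_lines)
            # The sender releases the write line; the receiver then pulls both
            # lines low, which acknowledges receipt to the sender.
            self.write_line = False
            self.read_line = False


if __name__ == "__main__":
    link = BusLink()
    link.sender_write(0x12345)            # sender waits: no reader yet
    print(link.received == [])            # True
    link.receiver_ready()                 # both lines high: word transfers
    print(link.received == [0x12345])     # True
    print(link.read_line, link.write_line)  # False False
```

The model shows the property that matters for the array: neither side needs to know which arrived first, because the transfer happens only once both the read line and the write line are high.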
FIG. 5 depicts an "edge" computer 12f with associated interface circuitry 80, which communicates with an external device 82 through an external I/O port 39.

Various modifications may be made without altering the value or scope of the invention. For example, although the invention has been described here using a specific computer 12, many or all of its aspects can be adapted to other computer designs and other kinds of computers.

Although the invention has been disclosed here primarily in terms of communication between the computers 12 of an array 10 on a single die 14, the same principles and methods can be used, or adapted, for other communication between devices, such as communication between a computer 12 and its dedicated memory, or between a computer 12 in the array 10 and an external device.

Likewise, although the invention has been described using a dual stack processor, it can also be used in a single stack processor or in a processor with more than two stacks.

Although the invention has been disclosed above by way of preferred embodiments, they are not intended to limit it. Anyone skilled in the art may make minor changes and refinements without departing from the spirit and scope of the invention, so the scope of protection is defined by the appended claims.

Brief Description of the Drawings

FIG. 1 is a block diagram of the general layout of a stack computer.
FIG. 2 shows a data stack according to the present invention.
FIG. 3 shows a single register of a stack in more detail.
FIG. 4 shows a return stack according to the present invention.
FIG. 5 is a diagram of a computer array according to the present invention.
FIG. 6 shows a subset of the computers of FIG. 5, and the data buses connecting them, in more detail.

Description of the Main Reference Numerals

10: computer array
12, 12a, 12b, 12c, 12d, 12e, 12f: computers
14: die
16: data bus
18: read line
20: write line
22: data lines
24: RAM
26: ROM
28: return stack
29: R register
30: instruction area
32: arithmetic logic unit
34: data stack
36: decode logic section
38: communication port
39: external I/O port
40: registers
40a: A register
40b: B register
40c: C register
80: interface circuit
82: external device
R1 to R8, S2 to S9: circular registers
Latch 0 to latch 17: latches

Claims

200809612 X. Applying for a patent garden: 1. A stacked computer processor, comprising: a data stack, a slab, including at least one data register; and a reply stack, including 5 | and at least one reply a register; to accommodate an 18' wherein the at least one of the at least locations. Wherein the at least one of the data, wherein the reply is in the array /, each of the data stacks and the reply stack, the instruction characters of the bit. 2. The processing number as described in item 1 of the patent application - the poor material register includes the top layer of the stack (7) register. 3. If the processing number mentioned in item 2 of the patent application scope - the data register further includes the stack (8) register - the number _ 4. If the patent application scope is the first! The processor described in the item has a reply register including a top register (8). ^ 5. The processor stack as described in claim 1 further includes an array of hardware registers. 6. The processing stack as described in claim 1 of the patent application further includes an array of hardware registers. 7. The processor described in item 5 of the scope of the patent application functions in a cyclic mode. 8. The processor of claim 2, wherein the array acts in a cyclic mode. 9. A method of operating a computer processor, comprising: inputting a plurality of instruction characters to a complex instruction early element corresponding to the processor; and processing the plurality of instruction characters; 3019-8669-PF 19 200809612 wherein The method of claim 12, wherein the input includes filling all available instruction units. II. The method of claim 10, wherein the input further includes inputting additional instruction characters after all available instruction units have been filled. 1 2. The method of claim 11, wherein the input and the processing do not use an indicator of software execution. 
13. The method of claim 9, wherein the processing comprises not reloading the plurality of instruction words, but repeatedly using the plurality of instruction words.
14. A computer processor comprising: a register array; and a shift register; wherein the shift register comprises a plurality of single-bit shift registers interconnected using electronic wiring, and wherein the number of the plurality of single-bit shift registers is the same as the number of registers in the register array.
15. The processor of claim 14, further comprising at least one register at a top level of the register array.
16. The processor of claim 14, wherein the register array further comprises connections to a read bus and a write bus.
17. The processor of claim 14, wherein the register array functions in a circular pattern.
18. The processor of claim 14, wherein the register array is stacked.
19. The processor of claim 18, wherein the plurality of single-bit shift registers are interconnected by the electronic wiring in an alternating pattern.
20. The processor of claim 14, wherein the processor is a data stack.
21. The processor of claim 20, wherein the register array comprises eight data registers.
22. The processor of claim 20, wherein the register array comprises a multiple of four data registers.
23. The processor of claim 14, wherein the processor is a return stack.
24. The processor of claim 2, wherein the register array comprises eight return registers.
25. The processor of claim 2, wherein the register array comprises [illegible] return registers.
26. A computer processor comprising: a register array; and a bidirectional shift register connected in a hardwired manner to the register array; wherein the bidirectional shift register comprises a plurality of single-bit shift registers interconnected by electronic wiring, and wherein the number of the plurality of single-bit shift registers is the same as the number of registers in the register array.
27. The processor of claim 26, further comprising at least one register at the top level of the register array.
28. The processor of claim 26, wherein the shift register serves as a hardware pointer for the register array.
29. The processor of claim 26, further comprising a read bus and a write bus.
30. The processor of claim 26, wherein each of the plurality of single-bit shift registers corresponds to an associated register of the register array.
31. The processor of claim 3, wherein each of the plurality of single-bit shift registers actuates [illegible].
32. [illegible]
33. The processor of claim 26, wherein the plurality of single-bit shift registers are interconnected by electronic wiring [illegible] to size the drive and the buffer.
34. The processor of claim 26, wherein the electronic wiring connects to at most one adjacent [illegible] adjacent single-bit shift register.
35. The processor of claim [illegible], wherein the register array comprises eight registers.
36. The processor of claim 26, wherein the processor is a data stack.
37. The processor of claim 26, wherein the processor is a return stack.
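
The arrangement recited in the register-array claims (a register array addressed by as many single-bit shift registers as it has registers, shifting in both directions and functioning in a circular pattern) can be pictured with a small software model. The following Python sketch is illustrative only and is not taken from the specification: the class name `CircularDataStack`, its methods, the one-hot list `select`, and the exact order in which values spill from S into the array are all assumptions. It models a data stack with a T register, an S register, and eight bottom registers, as summarized in the abstract.

```python
class CircularDataStack:
    """Hypothetical model: T and S registers above a circular register array."""

    def __init__(self, size=8):
        self.t = 0                     # top-of-stack register (T)
        self.s = 0                     # second-of-stack register (S)
        self.regs = [0] * size         # bottom registers, used circularly
        # One-hot "shift register": one single-bit cell per array register.
        # Exactly one cell holds 1 and selects the current array register.
        self.select = [1] + [0] * (size - 1)

    def _selected(self):
        # Index of the array register chosen by the one-hot bit.
        return self.select.index(1)

    def push(self, value):
        # Shift the one-hot bit forward with wraparound, then spill S into
        # the newly selected register (assumed ordering, for illustration).
        self.select = self.select[-1:] + self.select[:-1]
        self.regs[self._selected()] = self.s
        self.s = self.t
        self.t = value

    def pop(self):
        value = self.t
        self.t = self.s
        # Refill S from the selected register; the register keeps its value,
        # so popping past the pushed items simply keeps cycling.
        self.s = self.regs[self._selected()]
        # Shift the one-hot bit backward with wraparound.
        self.select = self.select[1:] + self.select[:1]
        return value


stack = CircularDataStack()
for v in range(1, 13):                 # twelve pushes onto ten registers
    stack.push(v)
popped = [stack.pop() for _ in range(10)]
print(popped)
```

In this model there is no counter and no bounds check, so neither push nor pop can fail. Pushing twelve values onto the ten available registers overwrites the two oldest, and the ten pops return 12 down to 3; further pops keep cycling through the eight array registers rather than signalling underflow.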