TW502163B - Device to increase the efficiency of processor-systems - Google Patents

Device to increase the efficiency of processor-systems

Info

Publication number
TW502163B
Authority
TW
Taiwan
Prior art keywords
state
cache
cache memory
mesi
tlc
Application number
TW089103056A
Other languages
Chinese (zh)
Inventor
Annie Stoess
Johann Schachtner
Wolfgang Ziemann
Original Assignee
Fujitsu Siemens Computers Gmbh
Application filed by Fujitsu Siemens Computers Gmbh
Application granted
Publication of TW502163B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806 Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0811 Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806 Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0815 Cache consistency protocols
    • G06F12/0831 Cache consistency protocols using a bus scheme, e.g. with bus monitoring or watching means

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

To increase the efficiency of processor systems built from many single processors, in which cache coherence is maintained with the MESI state model, it is proposed to extend the MESI state model, which is limited to four states, with further states. In this way, state situations within the processor system can be handled in a more differentiated manner and unnecessary bus accesses can be avoided. The processing capacity thus freed can be used to increase system performance.

Description


V. Description of the invention

The present invention relates to a device for increasing the efficiency of a processor system according to the preamble of claim 1.

Large servers are built from a processor system with many single processors. The number of single processors may be, for example, 16, 32, 64, 128, and so on. Such servers are known as UNIX systems or NT systems.
Most single processors today have a cache memory located on the chip, which is therefore also called the on-chip first-level cache or first-level cache. In addition, further cache memories can be assigned to a single processor outside the chip. The first of these further cache memories is the so-called off-chip second-level cache, or second-level cache. Beyond the first- and second-level caches, yet further cache memories can be provided, which may be called third-level caches, for example.

In particular, such further cache memories can be used to combine individual single processors, or groups of single processors, into a cluster. Today's standard processors only allow 1 to 4 single processors to be combined. If several such additional cache memories are each provided with, for example, 4 single processors, the large processor systems mentioned above can be built. These additional cache memories must therefore be coupled to one another in a cache-coherent manner. This coupling can be continued hierarchically, so that not only 4 single processors but also several additional cache memories are coupled to one another. The result is a processor system with many single processors and high performance.

As mentioned above, the processor system must operate in a cache-coherent manner. To this end, bus activity can be observed and interpreted. By analyzing bus activity, it can be inferred which state the cached data blocks of the cache memories involved in a bus operation are in. If the states of these cached data blocks are known, it can be ensured that cache coherence is established and maintained within the processor system.
The states of the cached data blocks can be described by a cache protocol. One well-known cache protocol is the so-called MESI standard, under which each cached data block is assigned one of four predefined MESI states: MODIFIED (M), EXCLUSIVE (E), SHARED (S) and INVALID (I). These states can be encoded with two MESI bits.

MODIFIED means that the associated cache block is held only in this cache memory and must be written back; its current contents are not known to the rest of the processor system. EXCLUSIVE means that the associated cache block has not been modified and matches the contents of the system main memory. SHARED means that the associated cache block may also be present in other cache memories and is valid. INVALID means that the associated cache block is invalid or is not present in the cache's TAG RAM.

The state of a cache block can be changed by the associated single processor through read or write operations. It can also be changed by system-internal queries, commonly called snooping, and likewise by an external logic unit (for example, another single processor) or a second-level cache, i.e. by external snooping. In every case, at any point in time, each cache block is in exactly one of the four MESI states described above.

A detailed functional description of the MESI protocol can be found, for example, on the Internet under bin/query?pg=q&kl=de&q=MESI&search=Search, item 1, "Pentium: Neuerungen in der X86 Architektur", in the associated document (last modified 28 March 1997) at http://plweb.htu.tuwien.ac.at/pentium, section 2.7.1 "M.E.S.I.-Protocol".

One possible way to ensure cache coherence in a processor system with additional cache memories is to require that each additional cache memory hold all of the cached data blocks of the other cache memories directly connected to it. Every eviction from an additional cache memory then requires update requests to be issued to the connected single processors or their caches, so that the cache blocks affected by the eviction are returned. Processing these update requests consumes system capacity that is not available to the user as useful capacity, which limits the efficiency of the system as a whole.

The object of the present invention is to provide a technical measure by which the efficiency of, in particular, large processor systems can be increased.

According to the invention, this object is achieved by a device in a processor system having the features of claim 1.

The additional cache memories thus do not merely manage cache blocks with the states M, E, S and I, but implement a combined state management across the additional cache memories and the caches of the single processors. This combined state management takes the form of an extension of the MESI states in the cache protocol of the additional cache memories. Its advantage is that the number of update requests directed toward other parts of the processor system can be reduced, because the extended MESI states make it possible to determine in advance that a requested cache data block cannot be present elsewhere in the processor system.
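For reference, the conventional four-state MESI behavior summarized above can be sketched in Python. This is a generic textbook model, not the transition diagram of Fig. 2: the two-bit code values, the function names and the assumption that other copies are invalidated before a local write are illustrative choices.

```python
from enum import IntEnum


class Mesi(IntEnum):
    """The four MESI states, held in two bits (the bit assignment is an assumption)."""
    I = 0b00  # INVALID
    S = 0b01  # SHARED
    E = 0b10  # EXCLUSIVE
    M = 0b11  # MODIFIED


def on_local_read(state: Mesi, other_copy_exists: bool) -> Mesi:
    """Read by the processor that owns this cache."""
    if state == Mesi.I:
        # Miss: the block is fetched; Shared if another cache holds it, else Exclusive.
        return Mesi.S if other_copy_exists else Mesi.E
    return state  # a read hit leaves M, E and S unchanged


def on_local_write(state: Mesi) -> Mesi:
    """Write by the owning processor; other copies are assumed to be invalidated first."""
    return Mesi.M  # any local write ends in MODIFIED, whatever the prior state


def on_snoop(state: Mesi, remote_write: bool) -> Mesi:
    """Access to the same block by another processor, observed by snooping."""
    if remote_write:
        return Mesi.I  # the local copy is now stale
    if state in (Mesi.M, Mesi.E):
        return Mesi.S  # remote read: M (after a write-back, not modeled) and E drop to S
    return state


print(on_local_read(Mesi.I, other_copy_exists=False).name)  # E
print(on_snoop(Mesi.M, remote_write=False).name)            # S
```

A block that misses with no other copy in the system becomes E; a snooped remote read of an M block leaves it in S.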
The capacity previously consumed by such update requests thus becomes available as useful capacity of the processor system, and the efficiency of the processor system increases.

The efficiency of the processor system is further improved because the extended MESI states make it possible to distinguish cases in which some processing is still required but can be completed faster overall. For example, there are cases in which an update request previously issued by one part of the processor system to an additional cache memory requires a subsequent update request from that additional cache memory to the lower cache level. The relevant cache block may, for instance, be assigned MESI state I, while the extended MESI states reveal that what the cache block will deliver is not previously modified data. In this case the additional cache memory can immediately inform the part of the processor system that issued the original update request that the MESI assignment has been made, even though the additional cache memory actually brings that assignment about only in subsequent steps. The processor system therefore need not wait until the actual completion of the update request has been confirmed, but can immediately start its next task.

Advantageous embodiments of the invention are described in the dependent claims.
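The update-request cost of an inclusive additional cache memory, and the saving that comes from knowing in advance that a block is held nowhere below, can be illustrated with a small counting model. The sketch assumes a hypothetical inclusive TLC with LRU replacement and a single per-line flag recording whether a lower-level cache may hold the block; this is a simplification in the spirit of the extended states, not the claimed device, and all names are illustrative.

```python
from collections import OrderedDict


class InclusiveTlc:
    """Hypothetical inclusive third-level cache with LRU replacement.

    Each line keeps one flag: may a lower-level cache (SLC) hold a copy?
    With track_lower=False the TLC cannot tell (plain four-state MESI), so
    every eviction must be followed by an update request to the SLCs.
    """

    def __init__(self, capacity: int, track_lower: bool) -> None:
        self.capacity = capacity
        self.track_lower = track_lower
        self.lines = OrderedDict()  # address -> "lower level may hold a copy"
        self.update_requests = 0    # requests sent down towards the SLCs

    def _fill(self, addr: int) -> None:
        if addr in self.lines:
            self.lines.move_to_end(addr)  # hit: refresh LRU position
            return
        if len(self.lines) >= self.capacity:
            _, lower_may_hold = self.lines.popitem(last=False)  # evict LRU line
            if lower_may_hold or not self.track_lower:
                self.update_requests += 1  # recall/invalidate the SLC copy
        self.lines[addr] = False

    def prefetch(self, addr: int) -> None:
        """Bring the block into the TLC only; no SLC receives it."""
        self._fill(addr)

    def read(self, addr: int) -> None:
        """Bring the block into the TLC and hand it to an SLC."""
        self._fill(addr)
        self.lines[addr] = True


def count_requests(trace, track_lower: bool) -> int:
    tlc = InclusiveTlc(capacity=2, track_lower=track_lower)
    for op, addr in trace:
        if op == "read":
            tlc.read(addr)
        else:
            tlc.prefetch(addr)
    return tlc.update_requests


trace = [("prefetch", 1), ("prefetch", 2), ("read", 3), ("read", 4), ("prefetch", 5)]
print(count_requests(trace, track_lower=False))  # 3
print(count_requests(trace, track_lower=True))   # 1
```

In the sample trace only the last eviction removes a block that an SLC actually received, so the TLC that tracks the lower level issues one update request instead of three.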
With respect to its cache structure, the processor system is organized hierarchically, so that by intercepting update requests at as high a hierarchy level as possible, update requests toward the lower levels are reduced as far as possible.

By choosing suitable MESI states, the number of MESI states required can be kept as small as possible. Keeping the number of MESI states small also keeps the number of bits needed to encode them digitally small, which reduces hardware cost.

The invention is described in detail below with reference to the drawings, in which:

Fig. 1 is a schematic diagram of a large processor system according to the prior art;
Fig. 2 is a state transition diagram of the MESI states according to the prior art;
Fig. 3 is a state transition diagram of the extended MESI states according to the invention;
Fig. 4 is a state table showing where, compared with the MESI states of Fig. 2, the extended MESI states of Fig. 3 save system capacity, which then becomes available as useful capacity of the processor system of Fig. 1.

Figs. 1 to 4 are intended to aid understanding of the essence of the invention. They make no claim to completeness and cover only what the following description of the prior art and of possible embodiments of the invention requires.
Fig. 1 shows two groups of single processors EP, each group connected via its own first bus BS1 to an additional cache memory CS. The additional cache memories CS are connected to a main memory component MEM via a common second bus BS2. Besides the additional cache memories, I/O systems of all kinds or coupling systems can be connected to the second bus BS2 as further system components; in this embodiment, no other systems are shown connected. The second bus BS2 can thus be called the cache bus or system bus, and the first bus BS1 the processor bus.

The processor system shown in Fig. 1 is hierarchical. Viewed from an additional cache memory CS, the lower level lies in the direction of the first bus BS1 and the higher level in the direction of the second bus BS2.

The connections between each single processor EP and its first bus BS1, between the first bus BS1 and the additional cache memory CS, between the additional cache memory CS and the second bus BS2, and between the second bus BS2 and the main memory component MEM are all bidirectional, so that data, code and other required information can be transferred in all directions between all components.

Each single processor EP has an internal and an external cache memory, which are not shown in detail in Fig. 1. The cache level formed jointly by these two cache memories of a single processor EP is referred to below as the second-level cache (SLC). Wherever SLC is mentioned below, this also refers to the associated single processor EP that controls the two cache memories.
Conversely, wherever a single processor EP is mentioned, this also includes its two cache memories, internal and external.

The abbreviation SLC is used in Figs. 2 to 4. The additional cache memory CS is called the third-level cache (TLC); this abbreviation is also used in Figs. 2 to 4, and below TLC is used interchangeably with "additional cache memory CS".

Requests between an additional cache memory CS and the single processors EP are called internal requests. Requests between an additional cache memory CS and the main memory component MEM, or between a CS and other components coupled to the cache bus BS2 (for example, I/O components), are called external requests.

The commands that initiate the various requests and their processing in the processor system, and that play a role in maintaining cache coherence, may be, for example: PREFETCH, READSHARED (RDS), READEXCLUSIVE (RDE), READMODIFIED (RDM) and WRITE (WR). These commands cause transitions between the states that a cache protocol operating on the MESI principle assigns to the relevant cache data blocks in order to maintain cache coherence. Possible state transition diagrams are shown in Figs. 2 and 3.

The state transition diagram of Fig. 2 assumes that the cache protocol operates with the four MESI states. The state transition diagram of Fig. 3 assumes that the cache protocol operates with extended MESI states, in the present case eight. Both diagrams are drawn from the viewpoint of a cache block in an additional cache memory CS.
The cache memories of the single processors EP have corresponding state transition diagrams.

Fig. 2 is described first.

Assume the relevant cache block in the additional cache memory CS is in state I, meaning that its entry is invalid. An internal RDS command to this additional cache memory CS then ultimately changes the state of the cache block from I to S. Because the entry in the additional cache memory CS is invalid, however, the CS must first obtain the entry itself, which requires querying the other cache memories to determine where the current entry can be obtained. This process is referred to below as a TLC command. Once the TLC command has completed, the entry is present in the additional cache memory CS as a valid, normal entry. The same applies, from the same initial state, to a Prefetch request instead of an RDS command (provided this option is enabled at all).

The left-hand table below the state transition diagram lists the four possible states. Next to each, it shows the states possible in the SLC in each case, alongside the possible states of the TLC. Since the analysis starts from the TLC, the TLC states are those given in the first column of the table. For simplicity, where not stated explicitly below, a MESI state refers to the cache block in the cache memory under discussion.

The first row of the table shows that when the TLC is in state I, the SLC is also in state I. This is because in this embodiment the TLC holds a superset (not necessarily of actual entries, but of state information) of the entries of the connected SLCs. As a result, the SLC can only ever be in the same state or a lower-valued state; conversely, the state value in the TLC is always greater than or equal to that in the SLC.

The Exclusive state is an exception, because this state is essentially meaningful only if there is at least an intention to modify the relevant entry. Since on an Exclusive request the TLC does not know whether the requested entry has actually been modified, it assumes that the entry has been modified. On an Exclusive request the actual state in the SLC can therefore also be M.
This does not matter, because a Modify request is ultimately an Exclusive request in which it is merely explicit that the entry will be modified.

If an entry in the TLC is marked S and an RDS command is issued to the TLC for this entry, the MESI state of the entry does not change. On the other hand, after the TLC has stored the entry, the entry may have been removed again from the SLC, so the associated single processor EP may downgrade the MESI state of this entry in the SLC. This is irrelevant to the TLC, because the entry itself does not change and therefore remains valid overall. A change would be noticed by the TLC, because the entry would first have to be loaded into state E or M by the relevant single processor; the right to modify is not associated with the RDS command or the Prefetch command. Once the entry is recorded in the TLC in state S, the corresponding entry in the lower cache level can in practice be in either MESI state I or S. This case is shown in the second row of the lower-left table.

The above principles also apply to the considerations that follow and are not repeated there.
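The ordering rule for Fig. 2 described above (the TLC state value is never lower than the SLC state value, except that the SLC may be in M while the TLC records E) can be written as a small consistency check. A minimal sketch, assuming the ranking I < S < E < M used in the text; the function name is hypothetical.

```python
# Ranking of the MESI states as used in the text: I < S < E < M.
RANK = {"I": 0, "S": 1, "E": 2, "M": 3}


def tlc_slc_consistent(tlc: str, slc: str) -> bool:
    """True if an SLC state is permitted under the given TLC state (Fig. 2 rule)."""
    if RANK[slc] <= RANK[tlc]:
        return True
    # Exception: after an Exclusive request the TLC records E, but the
    # SLC may already have modified the block.
    return tlc == "E" and slc == "M"


for tlc in "ISEM":
    allowed = [slc for slc in "ISEM" if tlc_slc_consistent(tlc, slc)]
    print(tlc, allowed)
```

Running it prints, for each TLC state, the SLC states it permits: I allows only I, S allows I and S, and E and M allow all four, which matches rows 1 to 4 of the lower-left table as described in the text.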

The commands RDE and RDM request an Exclusive entry from the TLC. Regardless of whether the entry previously held state I or S in the TLC, it is then marked with state E in the TLC. In line with the above, this entry can ultimately be in any of the states I, S, E or M in the SLC. This case is shown in the third row of the lower-left table.

Every RDE or RDM command generates a TLC command, namely a bus request, because before handing over the entry the TLC must ensure that, apart from the lower cache level below the relevant TLC, the entry is marked with state I throughout the processor system.

If an SLC requests an entry from the TLC with an RDE or RDM command and then modifies it, the modified entry is written back to the TLC with a WR command, and the TLC therefore marks it with state M.

After modifying the requested entry, the requesting SLC can keep the modified entry and mark it accordingly. It can also write the entry back, which downgrades the entry's state, for example only as far as state E. If the entry is still needed in the SLC, or must be removed from it, it can correspondingly be given state S or state I in the SLC. This is possible because in this ordering state M ranks highest, above E, S and I. These cases are shown in the fourth row of the lower-left table of Fig. 2.

If an RDS, RDE or WR command is issued to the TLC for a cache block in state M, the state of this entry in the TLC does not change. If an RDM command is issued to the TLC, then in an advantageous embodiment of the invention the entry can be marked E instead of M. This has the advantage that in certain cases the cache block is not read out unnecessarily. If a single processor EP in a lower cache level requests a cache block with an RDM request, the TLC knows that this cache block will actually be changed; this information is part of the RDM request. If the cache block were certain not to change, it would be requested with an RDE command. In other words, when a cache block is requested with an RDM command, the requesting single processor, or its SLC, ends up holding the most recent data exclusively. Once this is known, accesses to the data need not look in the TLC but can go straight to the connected SLC, correspondingly faster.

The lower right of Fig. 2 shows a second table, which lists the above commands again together with the main actions they trigger in the TLC or SLC, starting from the various possible initial states of the relevant cache block. An equivalent table with the same meaning accompanies the state transition diagram of Fig. 3.
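The TLC-side transitions of Fig. 2 that the text spells out can be collected into a lookup table, in the spirit of the command/action table at the lower right of the figure. This sketch covers only the (state, command) pairs described above; where the text does not say whether a TLC command is issued, the flag is None, and pairs that are not described raise KeyError instead of being guessed. The RDM-on-M entry follows the advantageous embodiment. All names are hypothetical.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class Cmd(Enum):
    PREFETCH = "PREFETCH"
    RDS = "READSHARED"
    RDE = "READEXCLUSIVE"
    RDM = "READMODIFIED"
    WR = "WRITE"


# (TLC state, command) -> (next TLC state, TLC command issued?)
# True: the text says a TLC command (query of the rest of the system) is needed.
# None: the text does not say either way.
FIG2_TLC: Dict[Tuple[str, Cmd], Tuple[str, Optional[bool]]] = {
    ("I", Cmd.RDS): ("S", True),
    ("I", Cmd.PREFETCH): ("S", True),
    ("S", Cmd.RDS): ("S", None),
    ("I", Cmd.RDE): ("E", True),
    ("I", Cmd.RDM): ("E", True),
    ("S", Cmd.RDE): ("E", True),
    ("S", Cmd.RDM): ("E", True),
    ("E", Cmd.WR): ("M", None),   # modified entry written back by the SLC
    ("M", Cmd.RDS): ("M", None),
    ("M", Cmd.RDE): ("M", None),
    ("M", Cmd.WR): ("M", None),
    ("M", Cmd.RDM): ("E", None),  # advantageous embodiment: the SLC will hold the newest data
}


def tlc_step(state: str, cmd: Cmd) -> Tuple[str, Optional[bool]]:
    """Look up a described transition; undescribed pairs raise KeyError."""
    return FIG2_TLC[(state, cmd)]


state = "I"
for cmd in (Cmd.RDS, Cmd.RDE, Cmd.WR, Cmd.RDM):
    state, tlc_cmd = tlc_step(state, cmd)
    print(cmd.name, "->", state, tlc_cmd)
```

Keeping unstated cases as None or KeyError makes the gaps visible rather than silently filling them with a guessed protocol.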

Fig. 3 shows a state transition diagram that extends the state diagram of Fig. 2. It adds the extended states SI, ES, MI and MS, while the states II, SS, EM and MM correspond, in that order, to the states I, S, E and M of Fig. 2. In this notation, the first letter denotes the state in the TLC and the second letter denotes the state in the SLC, or more generally in the lower cache level.

In this embodiment, the entry EM is used to describe the state of the cache block in place of an entry EE. The reason is that when an RDE command from a single processor EP arrives at the TLC, for example starting from state SI or SS, the TLC must assume that a processor requesting a block exclusively may also modify it. Because the TLC has to allow for more situations than actually occur in the system, it records this new situation as EM.
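The two-letter notation and the SI/SS → EM transition can be illustrated as follows. This is a minimal sketch: `ExtState` and `on_exclusive_request` are hypothetical names, and only the transition named in the text is modelled. All other combinations are deliberately rejected rather than guessed.

```python
from typing import NamedTuple


class ExtState(NamedTuple):
    """Extended state: first letter = TLC state, second = SLC / lower-level state."""
    tlc: str
    lower: str

    @classmethod
    def parse(cls, name: str) -> "ExtState":
        if len(name) != 2 or any(c not in "MESI" for c in name):
            raise ValueError(f"not a two-letter MESI combination: {name!r}")
        return cls(name[0], name[1])

    def __str__(self) -> str:
        return self.tlc + self.lower


def on_exclusive_request(state: ExtState, cmd: str) -> ExtState:
    """TLC reaction when a single processor EP requests a block exclusively.

    Starting from SI or SS, an RDE or RDM request leads to EM. For RDE the
    TLC cannot know whether the block will be modified, so it records the
    worst case (EM) instead of keeping a separate EE state.
    """
    if cmd in ("RDE", "RDM") and str(state) in ("SI", "SS"):
        return ExtState("E", "M")
    raise NotImplementedError(f"{state} + {cmd} is outside this sketch")
```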
If the cache block is instead requested by the single processor EP with an RDM command, the requested block is certain to be modified, so recording EM for it is the logical choice here as well. Overall, this approach optimizes the state transition diagram, because a separate EE state does not have to be represented; representing it would require an additional bit in the encoding. Eight states can be encoded with three bits, whereas nine or more states require more than three bits.

When a single processor EP makes an exclusive request (RDE command) for a cache block, it does not necessarily modify the block, as noted above. The requesting EP holds the block exclusively and can ultimately do anything with it. For example, it may pass the block on to another location or evict it. In the first case the TLC would have to keep state ES for the block, and in the second case state EI. The TLC covers all of these cases with state EM. This guarantees that every possible case is accounted for once the block has been handed to the requesting single processor EP. These cases are shown in the fifth row of the table at the lower left of Fig. 3.

Row 5 of the lower-left table of Fig. 3 corresponds to row 3 of the lower-left table of Fig. 2. Likewise, rows 1, 3 and 8 of the Fig. 3 table correspond, in that order, to rows 1, 2 and 4 of the Fig. 2 table. The principles described above apply correspondingly.
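The bit-count argument can be checked directly. This sketch assumes an arbitrary numbering of the eight extended states; the patent only states that three bits suffice, so the concrete codes and the helper names `bits_needed`, `encode` and `decode` are illustrative.

```python
# The eight extended states of Fig. 3. The numeric codes are an arbitrary
# illustration; the patent only states that three bits suffice.
EXTENDED_STATES = ("II", "SI", "SS", "ES", "EM", "MI", "MS", "MM")

CODE = {name: i for i, name in enumerate(EXTENDED_STATES)}


def bits_needed(n_states: int) -> int:
    """Smallest number of bits that can distinguish n_states different values."""
    return max(1, (n_states - 1).bit_length())


def encode(name: str) -> int:
    return CODE[name]


def decode(code: int) -> str:
    return EXTENDED_STATES[code]


if __name__ == "__main__":
    print(bits_needed(len(EXTENDED_STATES)))      # 3: eight states fit in three bits
    print(bits_needed(len(EXTENDED_STATES) + 1))  # 4: a ninth state such as EE needs a fourth bit
```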
Because the principles of the state transition diagram of Fig. 2 also apply to that of Fig. 3, the description of Fig. 3 is essentially limited to the differences. The individual state transitions, and the states entered when the corresponding commands occur, can be taken from Fig. 3.

According to the state transition diagram of Fig. 3, a TLC entry can be in two Shared states, two Exclusive states and three Modified states. For a prefetch request, an entry fetched into the TLC clearly exists only in that TLC. This is the situation described by state SI. If this special case is marked separately (other special cases may be defined as well), requests concerning such an entry do not need to query the other levels. This frees capacity on the side of the first bus BS1, which the single processors EP can use for other operations, increasing the efficiency of the processor system.

State SI is shown in row 2 of the lower-left table of Fig. 3. Among the Exclusive states, the TLC gains the additional state ES. State ES applies, for example, when an entry held exclusively in the TLC is read shared by the SLC. In this case the TLC keeps state E, and the cache block is assigned state S in the SLC. The TLC then knows that the state in the SLC can only be S or I. It cannot be M or E, because no request for these rights has been issued, so the entry is unmodified in every case. The advantage of this is described with reference to Fig. 4.
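The knowledge the TLC gains from the second letter can be expressed as a set of states the SLC might actually hold. This is a minimal sketch under one labelled assumption: the second letter is treated as an upper bound in the ranking M > E > S > I, because the SLC may lower its state without notifying the TLC. For example, ES means S or I. The names `possible_lower_states` and `needs_lower_level_query` are hypothetical.

```python
from collections import Counter

EXTENDED_STATES = ("II", "SI", "SS", "ES", "EM", "MI", "MS", "MM")
RANK = {"I": 0, "S": 1, "E": 2, "M": 3}  # M > E > S > I

# A prefetched entry exists only in the TLC, i.e. state SI.
PREFETCH_STATE = "SI"


def possible_lower_states(ext_state: str) -> set:
    """States the SLC may actually hold (assumption: second letter is an upper bound)."""
    bound = RANK[ext_state[1]]
    return {s for s, r in RANK.items() if r <= bound}


def needs_lower_level_query(ext_state: str) -> bool:
    """True if the lower cache level might still hold a copy of the block."""
    return possible_lower_states(ext_state) != {"I"}


# Two Shared, two Exclusive and three Modified states, plus II.
TLC_LETTER_COUNTS = Counter(s[0] for s in EXTENDED_STATES)
```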
The possible SLC-level states for this case are shown in row 4 of the lower-left table of Fig. 3. In addition to state MM, Fig. 3 shows the extended states MI and MS. If the SLC has obtained an entry exclusively from the TLC, modified it and written it back to the TLC with a WR command, the TLC assigns state MI to this cache block.


The SLC has evicted the entry, and it is no longer valid there. Likewise, when the TLC holds a modified entry (i.e. an entry in state MM) and the entry is written back to it again with a WR command, the TLC also moves to state MI. In this case the TLC knows that the entry is no longer present in the SLC. This case is shown in row 6 of the lower-left table of Fig. 3.

Finally, there is the extended state MS, which is assigned whether the SLC actually holds the entry in state S or in state I. State MS arises when the entry is read shared by the SLC. As mentioned several times above, however, the SLC may lower the state after corresponding actions, so under the state description MS the SLC may in the end hold either S or I. This case is shown in row 7 of the lower-left table of Fig. 3.

The advantage of the extended cache states is that the TLC has much more detailed knowledge of the states within the processor system. It can, for example, forward requests destined for the single processors EP only where appropriate, or intercept them.
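The two ways into the extended Modified states described above can be sketched as a transition function. This is a minimal sketch with two labelled assumptions. First, "the SLC obtained the entry exclusively" is taken to mean that the TLC was in EM. Second, the shared read that produces MS is assumed to start from MI, since the text does not name the start state. `on_slc_command` is a hypothetical name.

```python
def on_slc_command(state: str, cmd: str) -> str:
    """TLC transitions into the extended Modified states MI and MS (sketch).

    WR:  the SLC writes a modified block back and no longer holds it.
         From EM (assumed: SLC had it exclusively) or MM, the TLC moves to MI.
    RDS: the SLC reads a modified block shared, and the TLC records MS.
         Starting from MI is an assumption, not stated in the text.
    """
    if cmd == "WR" and state in ("EM", "MM"):
        return "MI"
    if cmd == "RDS" and state == "MI":
        return "MS"
    raise NotImplementedError(f"{state} + {cmd} is outside this sketch")


if __name__ == "__main__":
    state = "EM"
    for cmd in ("WR", "RDS"):
        state = on_slc_command(state, cmd)
        print(cmd, "->", state)  # prints "WR -> MI", then "RDS -> MS"
```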
Interception is possible because the TLC itself can answer such requests. The load on the processor side is thereby reduced, and the single processors EP can take on additional work, increasing the effective performance of the processor system.

Fig. 4 shows cases in which requests to the processor side can be omitted. Two external commands serve as examples. As described above, an external command is directed at the TLC from the system bus side; one example is an RDS command and the other an RDE command. In Fig. 4, the upper part shows the conventional four MESI states and the lower part shows the extended MESI states.

If the cache block in the TLC is in state I (or II in the extended scheme), the result is the same in both schemes: no action is taken. Both cases are a so-called miss, i.e. no data is supplied.

If the TLC entry addressed by the RDS or RDE command is in state S, an RDS command requires no action in either the four-state scheme or the extended scheme, because the entry in the TLC is already S. In the extended scheme it makes no difference whether the state is actually SI or SS.

For an RDE command in the four-state scheme, the TLC must ensure that the state of the cache block changes from S to I in both the TLC and the SLC. This requires a request to the side of the single processors EP, which loads the processor bus so that it is temporarily unavailable for other tasks. In the extended scheme, such a request to the processor side is usually not needed for an RDE command.
If, for example, the requested TLC entry is in state SI, the TLC does not need to make sure that the SLC changes its state for the block to I. The TLC already knows that the block is classified as I on the processor side. The processor side therefore incurs no additional load in this case, and more processor capacity remains available for other tasks.

If the TLC entry addressed by the RDS or RDE command is in state E, the four-state scheme requires an extra check for an RDS command. Before the block can be assigned state S in the TLC and in the lower cache level, the TLC must check whether newer data for the block exists anywhere in the processor system. The reason is that a block requested exclusively may since have been brought into any of the states M, E, S or I by the requesting unit through further actions. The processor side is therefore loaded unproductively in this case. The same applies to an RDE command, except that the SLC must also be made to mark the block as state I. Here too the processor side is loaded unproductively.

The extended scheme increases the performance of the processor system because this unproductive load is avoided. If, in the extended scheme, a cache block is in state ES and an external RDS command arrives for it, the TLC can omit any further action toward the lower cache level.
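The reasoning for the S and E cases above can be condensed into a rule that derives the required processor-side work from what the TLC knows about the lower level. This is a minimal sketch, not the patent's own procedure. It assumes that the second letter bounds the SLC state in the ranking M > E > S > I, and it uses three hypothetical action labels. It is applied here to II, SI, SS, ES and EM. Treating EM as needing a full check is an inference, since the SLC may hold modified data in that state.

```python
RANK = {"I": 0, "S": 1, "E": 2, "M": 3}  # M > E > S > I


def possible_lower_states(ext_state: str) -> set:
    """States the SLC may hold; assumes the second letter is an upper bound."""
    bound = RANK[ext_state[1]]
    return {s for s, r in RANK.items() if r <= bound}


def external_action(ext_state: str, cmd: str) -> str:
    """Processor-side work for an external RDS/RDE, derived from TLC knowledge.

    Returns one of three hypothetical labels:
      "none"       - no request to the lower cache level is needed
      "invalidate" - a simple request so that the SLC marks the block I
      "snoop"      - a full check for newer data plus the state change
    """
    if cmd not in ("RDS", "RDE"):
        raise ValueError(f"unsupported command {cmd!r}")
    lower = possible_lower_states(ext_state)
    if lower & {"E", "M"}:
        return "snoop"        # the SLC might hold newer (modified) data
    if cmd == "RDS" or lower == {"I"}:
        return "none"         # nothing newer to fetch and nothing to invalidate
    return "invalidate"       # RDE while the SLC may still hold an S copy


if __name__ == "__main__":
    for state in ("II", "SI", "SS", "ES", "EM"):
        print(state, external_action(state, "RDS"), external_action(state, "RDE"))
```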
The TLC knows that the block is in state S in the SLC. Because the block is in state S in the lower cache level, it cannot have been modified there, so no newer data can be obtained from the lower cache level. There is therefore no need to check whether newer data exists, nor to ensure that a state change takes place.

The same reasoning applies when the TLC is in state ES and an external RDE command arrives: no newer data can exist. It is only necessary to ensure that the block in the lower cache level is marked I, so a simple request to the lower cache level is sufficient. Overall, the unproductive load on the processor system is smaller than in the four-state scheme, which must additionally check beforehand whether newer data might still exist.

The savings are particularly significant when the cache block in the TLC is in state M. In the four-state scheme, where the Modified state is not subdivided further, all measures must always be taken for both RDS and RDE commands. Before the block is marked S or I, the TLC must check whether newer data still exists. The extended scheme distinguishes between variants of the Modified state; in this embodiment, the Modified state is divided into states MI, MS and MM. When an external RDS command is directed at the TLC in case MI or MS, any request to the lower cache level can be omitted, as described above.
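The Fig. 4 comparison can be approximated as two lookup tables and a cost comparison. This is a sketch under stated assumptions. The action labels and the cost order none < invalidate < snoop are hypothetical. EM and MM are assumed to need a full check. For RDE in MI or MS, the text only says that the check for newer data can be skipped, so the sketch conservatively keeps a simple invalidate request. The "!" marks are derived by this code and are not copied from Fig. 4.

```python
COST = {"none": 0, "invalidate": 1, "snoop": 2}

# Plain four-state MESI scheme, keyed by the TLC state.
FOUR_STATE = {
    ("I", "RDS"): "none", ("I", "RDE"): "none",  # miss: nothing to do
    ("S", "RDS"): "none", ("S", "RDE"): "invalidate",
    ("E", "RDS"): "snoop", ("E", "RDE"): "snoop",
    ("M", "RDS"): "snoop", ("M", "RDE"): "snoop",
}

# Extended scheme as described in the text (EM/MM assumed to need a full check).
EXTENDED = {
    ("II", "RDS"): "none", ("II", "RDE"): "none",
    ("SI", "RDS"): "none", ("SI", "RDE"): "none",
    ("SS", "RDS"): "none", ("SS", "RDE"): "invalidate",
    ("ES", "RDS"): "none", ("ES", "RDE"): "invalidate",
    ("EM", "RDS"): "snoop", ("EM", "RDE"): "snoop",
    ("MI", "RDS"): "none", ("MI", "RDE"): "invalidate",
    ("MS", "RDS"): "none", ("MS", "RDE"): "invalidate",
    ("MM", "RDS"): "snoop", ("MM", "RDE"): "snoop",
}


def savings_table():
    """Rows of (state, command, four-state action, extended action, mark)."""
    rows = []
    for (state, cmd), ext_action in EXTENDED.items():
        base_action = FOUR_STATE[(state[0], cmd)]  # first letter = plain MESI state
        mark = "!" if COST[ext_action] < COST[base_action] else ""
        rows.append((state, cmd, base_action, ext_action, mark))
    return rows


if __name__ == "__main__":
    for row in savings_table():
        print("{:<3} {:<4} {:<11} {:<11} {}".format(*row))
```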
For an RDE command, at least the check for newer data is unnecessary: a simple request toward the lower cache level is sufficient to ensure that the state of the cache block there is changed to I.

In Fig. 4, the cases in which these measures are not needed at all, or are needed only to a limited extent, are marked with an exclamation mark. In these cases, unproductive load on the processor system is avoided.

List of reference symbols
CS    cache memory
EP    single processor
BS1, BS2    bus
SLC    second-level cache
TLC    third-level cache
MEM    main memory module

Claims (1)

VI. Claims
Patent application No. 89103056, "Device to increase the efficiency of processor-systems" (amended December 2001, ROC year 90)

1. A device for increasing the efficiency of a processor system that comprises a plurality of single processors and a plurality of cache memories. The cache memories are arranged to maintain cache coherence with one another by executing a cache protocol that operates according to the MESI standard. The device is characterized in that this cache protocol is configured so that, at least in part, multiple indications are provided for representing a MESI state. At least the following can therefore be indicated separately: the MESI state of the cache memory (CS) concerned, and the MESI state of a single processor (EP) connected to this CS or of a further cache memory (CS) connected to this CS.

2. The device according to claim 1, wherein the processor system has a plurality of single processors (EP), at least some of which have an associated first cache memory or associated first and second cache memories. The processor system also has additional cache memories (CS), arranged in at least a first level. At that level, individual single processors (EP), groups of homogeneous or mixed single processors (EP), or further additional cache memories (CS) are connected to them.

3. The device according to claim 1 or 2, wherein the multiple indications represent combinations of MESI states such as II, SI, SS, EM, ES, MI, MS or MM.
TW089103056A 1999-02-26 2000-02-22 Device to increase the efficiency of processor-systems TW502163B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
DE19908415 1999-02-26

Publications (1)

Publication Number Publication Date
TW502163B true TW502163B (en) 2002-09-11

Family

ID=7899003

Family Applications (1)

Application Number Title Priority Date Filing Date
TW089103056A TW502163B (en) 1999-02-26 2000-02-22 Device to increase the efficiency of processor-systems

Country Status (3)

Country Link
EP (1) EP1163587A1 (en)
TW (1) TW502163B (en)
WO (1) WO2000052582A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7484044B2 (en) * 2003-09-12 2009-01-27 Intel Corporation Method and apparatus for joint cache coherency states in multi-interface caches
US20070150663A1 (en) 2005-12-27 2007-06-28 Abraham Mendelson Device, system and method of multi-state cache coherence scheme

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5715428A (en) * 1994-02-28 1998-02-03 Intel Corporation Apparatus for maintaining multilevel cache hierarchy coherency in a multiprocessor computer system
DE69705072T2 (en) * 1996-01-25 2002-03-21 Unisys Corp., Blue Bell METHOD FOR EXECUTING READ AND WRITE COMMANDS IN A MULTI-STAGE DISTRIBUTED DATA PROCESSING SYSTEM

Also Published As

Publication number Publication date
EP1163587A1 (en) 2001-12-19
WO2000052582A1 (en) 2000-09-08

Similar Documents

Publication Publication Date Title
JP5348429B2 (en) Cache coherence protocol for persistent memory
JP5440067B2 (en) Cache memory control device and cache memory control method
US7469321B2 (en) Software process migration between coherency regions without cache purges
CN100414494C (en) Apparatus and method for selecting instructions for execution based on bank prediction of a multi-bank cache
US7484043B2 (en) Multiprocessor system with dynamic cache coherency regions
CN100511185C (en) Cache filtering using core indicators
KR100273039B1 (en) Method and system of providing a cache-coherency protocol for maintaining cache coherency within a multiprocessor data-processing system
US20100325374A1 (en) Dynamically configuring memory interleaving for locality and performance isolation
JP5241838B2 (en) System and method for allocating cache sectors (cache sector allocation)
US20150002526A1 (en) Shared Virtual Memory Between A Host And Discrete Graphics Device In A Computing System
EP2472412B1 (en) Explicitly regioned memory organization in a network element
US10678702B2 (en) Using multiple memory elements in an input-output memory management unit for performing virtual address to physical address translations
CN101030170A (en) Device, system and method of multi-state cache coherence scheme
US20090193197A1 (en) Selective coherency control
JP5226010B2 (en) Shared cache control device, shared cache control method, and integrated circuit
US7882327B2 (en) Communicating between partitions in a statically partitioned multiprocessing system
US11487674B2 (en) Virtual memory pool within a network which is accessible from multiple platforms
TW200424850A (en) Logic and method for reading data from cache
EP3188028B1 (en) Buffer management method and apparatus
CN100514311C (en) Method and apparatus for implementing a combined data/coherency cache
US6952761B2 (en) Bus interface selection by page table attributes
TW502163B (en) Device to increase the efficiency of processor-systems
US10318428B2 (en) Power aware hash function for cache memory mapping
KR20070084441A (en) Coherent caching of local memory data
US4924379A (en) Multiprocessor system with several processors equipped with cache memories and with a common memory

Legal Events

Date Code Title Description
GD4A Issue of patent certificate for granted invention patent
MM4A Annulment or lapse of patent due to non-payment of fees