TWI537731B - Systems and methods for supporting a plurality of load accesses of a cache in a single cycle - Google Patents

Systems and methods for supporting a plurality of load accesses of a cache in a single cycle

Info

Publication number
TWI537731B
TWI537731B
Authority
TW
Taiwan
Prior art keywords
cache memory
data cache
memory
requests
tag
Prior art date
Application number
TW102127066A
Other languages
Chinese (zh)
Other versions
TW201428494A (en)
Inventor
Karthikeyan Avudaiyappan
Mohammad Abdallah
Original Assignee
Soft Machines, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US13/561,570 (US9229873B2)
Priority claimed from US13/561,441 (US9740612B2)
Priority claimed from US13/561,491 (US9710399B2)
Priority claimed from US13/561,528 (US9430410B2)
Application filed by Soft Machines, Inc.
Publication of TW201428494A
Application granted
Publication of TWI537731B


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0844Multiple simultaneous or quasi-simultaneous cache accessing
    • G06F12/0846Cache with multiple tag or data arrays being simultaneously accessible

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Description

Systems and methods for supporting a plurality of load accesses of a cache memory in a single cycle

A system and method for supporting a plurality of load accesses of a cache memory in a single cycle.

A cache in a central processing unit is a data storage structure that is used by the central processing unit of a computer to reduce the average time that it takes to access memory. It is a memory that stores copies of data that are located in the most frequently used main memory locations. Moreover, a cache is a memory that is smaller than main memory and that can be accessed more quickly. There are several different types of caches. These include physically indexed physically tagged (PIPT) caches, virtually indexed virtually tagged (VIVT) caches, and virtually indexed physically tagged (VIPT) caches.
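For illustration, the manner in which a cache derives the fields used for indexing and tagging can be sketched as follows. The line size and set count below are assumptions chosen for the sketch, not parameters of the embodiment.

```python
# Illustrative sketch: splitting an address into offset, index, and tag
# fields. The parameters (64-byte lines, 512 sets) are assumptions for
# illustration only.
LINE_BYTES = 64          # bytes per cache line (assumed)
NUM_SETS = 512           # number of sets (assumed)

OFFSET_BITS = LINE_BYTES.bit_length() - 1   # 6
INDEX_BITS = NUM_SETS.bit_length() - 1      # 9

def split_address(addr):
    """Return (tag, index, offset) for a given address."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

# In a VIPT cache the index bits come from the virtual address while the
# tag is compared against the translated physical address; in a PIPT
# cache both come from the physical address; in a VIVT cache both are
# taken from the virtual address.
tag, index, offset = split_address(0x1234ABCD)
```

The three cache types named above differ only in which address (virtual or physical) supplies the index and tag fields computed here.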

Caches that can accommodate multiple accesses in a single cycle provide performance advantages. In particular, such caches feature reduced access latencies. Conventional approaches to accommodating multiple accesses in a single cycle include the use of multi-ported caches and the provision of caches that include a plurality of tag and data banks.

A multi-ported cache is a cache that can serve more than one request at a time. In accessing some conventional caches a single memory address is requested, whereas in a multi-ported cache up to N memory addresses can be requested at the same time, where N is the number of ports that the multi-ported cache possesses. An advantage of multi-ported caches is that they can accommodate greater throughput (e.g., a greater number of load and store requests). However, the number of cache ports that is needed to accommodate increasingly high levels of throughput may not be practical.

Caches that include a plurality of tag and data banks can serve more than one request at a time, as each tag and data bank can serve at least one request. However, when more than one request attempts to access the same bank, it must be determined which of the requests is to be allowed to access that bank. In one conventional approach, arbitration is used to determine which request will be allowed to access a given tag and data bank. In this conventional approach, the time that it takes to execute the arbitration can delay access to the tag bank, and thus delay the triggering of the critical Load Hit signal that is typically furnished by the level 1 cache of a processor.
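A minimal sketch of the conventional arbitration just described: at most one request is granted per bank, and colliding requests lose. The request format and the fixed first-wins priority rule are illustrative assumptions, not the embodiment's arbiter.

```python
# Sketch of fixed-priority arbitration among load requests that target
# cache banks. Requests are (request_id, bank) pairs; for each bank the
# earliest-listed request wins and the rest must retry. Illustrative
# only; the priority rule is an assumption.
def arbitrate(requests):
    """Grant at most one request per bank; return (granted, losers)."""
    granted = {}      # bank -> winning request_id
    losers = []
    for req_id, bank in requests:
        if bank not in granted:
            granted[bank] = req_id
        else:
            losers.append(req_id)
    return granted, losers

# AR1 and AR2 collide on bank 0: AR1 wins, AR2 loses; AR3 gets bank 2.
granted, losers = arbitrate([("AR1", 0), ("AR2", 0), ("AR3", 2)])
```

In the conventional approach criticized above, this arbitration sits on the path to the tag banks, which is what delays the Load Hit signal.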

Conventional approaches to accommodating throughput that involves multiple loads can exact unsatisfactory delays in the receipt of the load fetch signal. A method for supporting a plurality of load accesses of a data cache (e.g., one formed from static random access memory (SRAM) or another type of memory) that addresses these shortcomings is disclosed. However, the claimed embodiments are not limited to implementations that address any or all of the aforementioned shortcomings. As part of the method, a plurality of requests to access the data cache is accessed, and, in response to the plurality of requests, a tag memory is accessed that maintains a plurality of copies of the tags for each entry of the data cache. Tags that correspond to the individual requests are identified. The data cache (e.g., formed from SRAM or another type of memory) is divided into a number of banks, or "blocks." The data cache is accessed based on the identified tags. A plurality of requests to access the same block of the plurality of blocks causes an access arbitration that involves that block. This block access arbitration is executed in parallel with the accessing of the tags that correspond to the individual access requests. Consequently, the penalty on the timing of the load fetch signal that is exacted by the arbitration for access to tags and banks in conventional approaches is avoided.

100‧‧‧exemplary computing environment

101‧‧‧system

103‧‧‧level 1 (L1) cache; L1 cache

103a‧‧‧level 1 (L1) data cache; L1 data cache; data cache

103b‧‧‧data cache tag memory

103c‧‧‧L1 cache controller; cache controller

105‧‧‧central processing unit (CPU)

107‧‧‧level 2 (L2) cache; L2 cache

109‧‧‧main memory

111‧‧‧system interface

201‧‧‧load request accessor

203‧‧‧tag memory accessor

205‧‧‧cache accessor

300‧‧‧flowchart

301, 303, 305, 307‧‧‧steps

1-N‧‧‧access requests

AR1‧‧‧first access request; access request

AR2‧‧‧second access request; access request

(AR1-ARN)‧‧‧requests

The invention, together with further advantages thereof, may best be understood by reference to the following description taken in conjunction with the accompanying drawings, in which: FIG. 1A shows an exemplary computing environment of a system for supporting a plurality of load accesses of a data cache in a single cycle according to one embodiment.

FIG. 1B illustrates the manner in which a plurality of data blocks facilitates the accessing of a data cache by the throughput of multiple load accesses in the same clock cycle according to one embodiment.

FIG. 1C shows a data cache tag memory that maintains a plurality of copies of the tags that correspond to the entries of a level one data cache according to one embodiment.

FIG. 1D illustrates arbitration operations on a first access request and a second access request of access requests 1-N that are executed in parallel with a search of the data cache tag memory according to one embodiment.

FIG. 1E illustrates operations performed by a system for supporting a plurality of load accesses of a data cache in a single cycle according to one embodiment.

FIG. 2 shows components of a system for supporting a plurality of load accesses of a data cache in a single cycle according to one embodiment.

FIG. 3 shows a flowchart of a method for supporting a plurality of load accesses of a data cache in a single cycle according to one embodiment.

It should be noted that like reference numbers refer to like elements in the figures.

Although the present invention has been described in connection with one embodiment, the invention is not intended to be limited to the specific forms set forth herein. On the contrary, it is intended to cover such alternatives, modifications, and equivalents as can be reasonably included within the scope of the invention as defined by the appended claims.

In the following detailed description, numerous specific details such as specific method orders, structures, elements, and connections have been set forth. It is to be understood, however, that these and other specific details need not be utilized to practice embodiments of the present invention. In other circumstances, well-known structures, elements, or connections have been omitted, or have not been described in particular detail, in order to avoid unnecessarily obscuring this description.

References within the specification to "one embodiment" are intended to indicate that a particular feature, structure, or characteristic described in connection with that embodiment is included in at least one embodiment of the present invention. The appearance of the phrase "in one embodiment" in various places within the specification does not necessarily refer to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Moreover, various features are described which may be exhibited by some embodiments and not by others. Similarly, various requirements are described which may be requirements for some embodiments but not for other embodiments.

Some portions of the detailed descriptions that follow are presented in terms of procedures, steps, logic blocks, processing, and other symbolic representations of operations on data bits within a computer memory. These descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. A procedure, computer-executed step, logic block, process, etc., is here, and generally, conceived to be a self-consistent sequence of steps or instructions leading to a desired result. The steps require physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals of a computer-readable storage medium that are capable of being stored, transferred, combined, compared, and otherwise manipulated in a computer system. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.

It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout the present invention, discussions utilizing terms such as "accessing" or "searching" or "identifying" or "providing" or the like refer to the actions and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories and other computer-readable media into other data similarly represented as physical quantities within computer system memories or registers or other such information storage, transmission, or display devices.

Exemplary Computing Environment of a System for Supporting a Plurality of Load Accesses of a Cache in a Single Cycle According to One Embodiment

FIG. 1A shows an exemplary computing environment 100 of a system 101 for supporting a plurality of load accesses of a data cache in a single clock cycle according to one embodiment. System 101 enables the tags that correspond to the data sought by a plurality of load requests to a level one data cache, which possesses a plurality of data blocks that accommodate the plurality of requests, to be obtained within a single clock cycle. Moreover, as part of the operation of system 101, block access arbitrations that involve the plurality of load requests to the level one data cache are executed within the same clock cycle. Accordingly, the throughput of a plurality of load accesses is accommodated, and the loss of timing of the load fetch signal that is exacted by arbitration in conventional approaches is avoided. FIG. 1A shows system 101, level 1 (L1) cache 103, level 1 (L1) data cache 103a, data cache tag memory 103b, L1 cache controller 103c, central processing unit (CPU) 105, level two (L2) cache 107, main memory 109, and system interface 111.

Referring to FIG. 1A, L1 cache 103 is a level 1 or "primary" cache and L2 cache 107 is a level 2 or "secondary" cache. In one embodiment, L1 cache 103 can be formed as a part of CPU 105. In one embodiment, as shown in FIG. 1A, L1 cache 103 can include L1 data cache 103a, data cache tag memory 103b, and L1 cache controller 103c. In one embodiment, L1 data cache 103a can be divided into a plurality of data blocks. In one embodiment, L1 data cache 103a can be divided into four 8-kilobyte data blocks. In other embodiments, L1 data cache 103a can be divided into other numbers of data blocks that have the capacity to store other amounts of data. In one embodiment, as shown in FIG. 1B, the plurality of data blocks facilitates the accessing of L1 data cache 103a by the throughput of multiple accesses in the same clock cycle. In one embodiment, conflicting requests that seek to access the same block of L1 data cache 103a can be resolved using arbitration (whose impact on timing, as discussed herein, is obviated). In one embodiment, the data blocks can include cache line entries that are accessed by loads.
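A minimal sketch of the banked organization just described: the four-block split is taken from the embodiment, and the bank-select bits 7:6 follow the example discussed with reference to FIG. 1E; the 64-byte line size implied by that bit position is an assumption for illustration.

```python
# Sketch of bank selection for an L1 data cache divided into four
# blocks, with virtual address bits 7:6 selecting the block (as in the
# FIG. 1E example). With 64-byte lines this maps consecutive cache
# lines to consecutive blocks; the line size is an assumption.
NUM_BANKS = 4

def bank_of(vaddr):
    """Return the data block (bank) targeted by a virtual address."""
    return (vaddr >> 6) & (NUM_BANKS - 1)

# Loads whose addresses differ in bits 7:6 fall in different blocks and
# can therefore proceed in the same clock cycle without arbitration.
```

Interleaving lines across blocks this way makes it likely that independent loads issued together target different blocks.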

Data cache tag memory 103b is configured to maintain tag entries for each of the cache line entries that is stored in L1 data cache 103a. Referring to FIG. 1C, in one embodiment, as part of this configuration, data cache tag memory 103b maintains a plurality of copies (e.g., 1-N) of the tags that correspond to the entries of L1 data cache 103a. In particular, each request to access L1 data cache 103a is provided with a dedicated copy of the tags that correspond to the entries of L1 data cache 103a. This manner of maintaining tag entries facilitates the identification, within a single clock cycle, of the tags that are associated with the cache line entries. In one embodiment, the identification of a tag can be completed within the same clock cycle in which arbitration involving requests (e.g., load requests) to access the data in L1 data cache 103a that is associated with that tag is executed. In one embodiment, a request to access L1 data cache 103a (e.g., a load request) triggers a search of data cache tag memory 103b for the tag that corresponds to the data sought by that load request.
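The replicated tag structure just described can be sketched as follows: one dedicated copy of the tag array per concurrent request, with every copy updated on a fill so the copies stay identical. The class shape and parameters are illustrative, not the embodiment's circuit.

```python
# Sketch of a replicated tag memory: one full copy of the tag array per
# supported concurrent load, so each request probes its own copy and
# tag lookup never needs arbitration. Parameters are illustrative.
class ReplicatedTagMemory:
    def __init__(self, num_copies, num_sets):
        self.copies = [[None] * num_sets for _ in range(num_copies)]

    def fill(self, index, tag):
        # A cache fill writes the tag into every copy, keeping the
        # copies identical.
        for copy in self.copies:
            copy[index] = tag

    def probe(self, port, index, tag):
        # Each request probes its own dedicated copy, selected by port.
        return self.copies[port][index] == tag

tags = ReplicatedTagMemory(num_copies=4, num_sets=8)
tags.fill(index=3, tag=0x9321)
# Four requests can probe index 3 in the same cycle, one per copy.
hits = [tags.probe(port, 3, 0x9321) for port in range(4)]
```

The cost of this design is area (N copies of the tag array) traded for the removal of tag-side arbitration from the critical path.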

Referring again to FIG. 1A, system 101, responsive to a receipt by L1 cache 103 of a plurality of requests to access L1 data cache 103a of L1 cache 103, executes a search of data cache tag memory 103b such that the tags that correspond to the data sought by the plurality of requests are identified in parallel with the execution of any arbitration operations that are associated with the requests. This is illustrated in FIG. 1D, where arbitration operations involving a first access request AR1 and a second access request AR2 of access requests 1-N are shown as being executed in parallel with the search of data cache tag memory 103b. In one embodiment, the aforementioned actions of system 101 operate to avoid any deleterious impact of the arbitration operations on the timing of the load fetch signal. In particular, system 101, supported by the replicated data cache tag memory 103b and the blocked L1 data cache 103a, facilitates the accessing of the cache by several load requests in one clock cycle without a loss of cache hit latency and throughput. In one embodiment, system 101 can be located in cache controller 103c. In other embodiments, system 101 can be separate from cache controller 103c but operate cooperatively therewith.

Referring again to FIG. 1A, main memory 109 includes physical addresses that store information that is copied into the cache. In one embodiment, when the information that is contained in the cached physical addresses of main memory is changed, the corresponding cached information is updated to reflect the changes made to the information stored in main memory. Also shown in FIG. 1A is system interface 111.

Operation

FIG. 1E illustrates operations performed by system 101 for supporting a plurality of load accesses of a data cache in a single cycle according to one embodiment. These operations, which relate to supporting a plurality of load accesses of a data cache, are illustrated for purposes of clarity and brevity. It should be appreciated that other operations not illustrated by FIG. 1E can be performed in accordance with one embodiment.

Referring to FIG. 1E, at A, a plurality of requests to access data cache 103a is received. In the FIG. 1E example, two of the plurality of requests, access request AR1 and access request AR2, seek to access the same data block of L1 data cache 103a (e.g., block 0, as identified by bits 6 and 7, e.g., virtual address bits 7:6, of the virtual addresses associated with AR1 and AR2).

At B, data cache tag memory 103b is searched, and the tags residing therein that are associated with the data sought by the plurality of requests (AR1-ARN) to access L1 data cache 103a are identified.

At C, during the same clock cycle in which the search of data cache tag memory 103b at B is executed, an arbitration process that determines which of the two requests (access request AR1 and access request AR2) will be allowed to access block 0 of L1 data cache 103a is initiated and completed. As part of the arbitration process, one of the two requests (access request AR1) is selected to proceed with the access of block 0.

At D, the plurality of access requests (except for arbitration losers such as AR2 in the FIG. 1E example) access data cache 103a using the tags identified at B.

At E, the data sought by the access requests (e.g., "X" corresponding to AR1) is identified and read (e.g., loaded) from L1 data cache 103a.
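The operations A through E above can be sketched as a behavioral model: every request probes its own tag-array copy while, in parallel, requests that collide on a block are arbitrated, and the arbitration winners with a tag hit then read their data. The data structures and the first-wins priority rule are illustrative assumptions, not the embodiment's hardware.

```python
# Behavioral sketch of one clock cycle of the operations A-E:
# parallel tag lookup (one tag-array copy per request) plus block
# arbitration, followed by data reads for winners that hit.
def single_cycle(requests, tag_copies, data_banks):
    """requests: list of (req_id, bank, index, tag). Returns loads done."""
    # B: tag lookup; each request uses its own dedicated copy of the
    # tag array, so no arbitration occurs on the tag side.
    hit = {req[0]: tag_copies[port][req[2]] == req[3]
           for port, req in enumerate(requests)}
    # C: in the same cycle, arbitrate block conflicts (first request
    # wins in this sketch).
    owner = {}
    for req_id, bank, index, tag in requests:
        owner.setdefault(bank, req_id)
    # D/E: arbitration winners whose tag matched read their data block.
    return {req_id: data_banks[bank][index]
            for req_id, bank, index, tag in requests
            if owner[bank] == req_id and hit[req_id]}

tag_copies = [[0xA, 0xB] for _ in range(3)]   # one tag-array copy per port
data_banks = {0: ["X", "Y"], 1: ["P", "Q"]}
result = single_cycle(
    [("AR1", 0, 0, 0xA), ("AR2", 0, 1, 0xB), ("AR3", 1, 1, 0xB)],
    tag_copies, data_banks)
# As in the FIG. 1E example: AR1 and AR2 collide on block 0, AR1 wins
# and loads "X"; AR3 targets block 1 and loads "Q"; AR2 must retry.
```

Note that the tag lookups (B) and the arbitration (C) are independent computations over the same inputs, which is what allows them to be performed in the same cycle.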

In one embodiment, system 101 is designed to operate in environments in which several load and store instructions are provided in a single cycle. In one embodiment, the methodology disclosed herein avoids reliance on the use of an excessive number of cache ports, which may not be practical. In exemplary embodiments, throughput is enabled without negatively impacting the timing of the Load Hit signal.

In one embodiment, as discussed herein, L1 data cache 103a can be organized into a plurality of blocks, and the tags that correspond to the data maintained in L1 data cache 103a can be replicated and stored in data cache tag memory 103b. Moreover, as discussed herein, organizing data cache 103a into blocks enables several loads to be supported in a single cycle, as long as they do not access the same data block. However, in one embodiment, a single data block can accommodate a plurality of loads as long as the plurality of loads is directed to the same address. In exemplary embodiments, the approach discussed herein does not perform any arbitration with respect to tags, and thus the latency penalties (increased latency) associated with the timing of the Load Hit signal that derive from such arbitration operations are avoided.
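The relaxed conflict rule noted above, under which loads to the same address within one block require no arbitration, can be expressed as a simple predicate. The load representation here is an illustrative assumption.

```python
# Sketch of the relaxed bank-conflict rule: two loads conflict only if
# they target the same block at *different* addresses; loads to the
# same address can share a single block read. Illustrative only.
def conflicts(load_a, load_b):
    """Each load is (bank, addr). True if block arbitration is needed."""
    bank_a, addr_a = load_a
    bank_b, addr_b = load_b
    return bank_a == bank_b and addr_a != addr_b

# Same block, same address: one read satisfies both loads.
# Same block, different addresses: one load must lose arbitration.
# Different blocks: both proceed in the same cycle.
```

This predicate is what the per-block arbitration has to evaluate; only the same-block, different-address case costs a retry.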

Components of a System for Supporting a Plurality of Load Accesses of a Cache in a Single Cycle According to One Embodiment

FIG. 2 shows components of a system 101 for supporting a plurality of load accesses of a cache in a single cycle according to one embodiment. In one embodiment, the components of system 101 implement an algorithm for supporting a plurality of load accesses. In the FIG. 2 embodiment, the components of system 101 include load request accessor 201, tag memory accessor 203, and cache accessor 205.

Load request accessor 201 accesses a plurality of load requests that seek to access data stored in an L1 data cache (e.g., 103a in FIG. 1A). In one embodiment, in some cases, more than one load request of the plurality of load requests can seek to access the same data block of the L1 data cache. In such cases, arbitration is executed to determine which load request will be allowed to access that block of the L1 data cache.

In response to the receipt of the plurality of load requests, tag memory accessor 203 searches, in parallel, individual copies (e.g., 1-N) of the tags of a data cache tag memory (e.g., 103b in FIG. 1A) that correspond to the entries of an L1 data cache (e.g., 103a of FIG. 1A). In one embodiment, each load request is provided with a dedicated copy of the tags that correspond to the entries of the L1 data cache. This manner of maintaining tag entries facilitates the identification, within a single clock cycle, of the tags that are associated with the cache line entries. In one embodiment, arbitration of requests (e.g., load requests) to access a block of the L1 data cache whose data is associated with a tag is executed within the same clock cycle in which the identification of that tag is completed.

Cache accessor 205 accesses a plurality of data blocks of the L1 data cache using the tags that are identified by tag memory accessor 203. In one embodiment, the plurality of data blocks facilitates the accessing of the L1 data cache (e.g., 103a of FIG. 1A) by multiple access requestors in the same clock cycle. In one embodiment, conflicting access requests that seek to simultaneously access the same block of the L1 data cache can be resolved using arbitration (whose impact on the timing of the Load Hit signal, as discussed herein, is obviated by the operation of system 101). In one embodiment, the accessing of a data block involves the loading of data.

It should be appreciated that the aforementioned components of system 101 can be implemented in hardware or software or in a combination of both. In one embodiment, components and operations of system 101 can be encompassed by components and operations of one or more computer components or programs (e.g., cache controller 103c in FIG. 1A). In another embodiment, components and operations of system 101 can be separate from the aforementioned one or more computer components or programs but can operate cooperatively with components and operations thereof.

Method for Supporting a Plurality of Load Accesses of a Cache in a Single Cycle According to One Embodiment

FIG. 3 shows a flowchart 300 of a method for supporting a plurality of load accesses of a data cache in a single cycle according to one embodiment. The flowchart includes processes that, in one embodiment, can be carried out by processors and electrical components under the control of computer-readable and computer-executable instructions. Although specific steps are disclosed in the flowchart, such steps are exemplary. That is, the present embodiment is well suited to performing various other steps or variations of the steps recited in the flowchart.

Referring to FIG. 3, at step 301, a plurality of load requests to access a data cache is accessed. In one embodiment, the data cache can include a plurality of blocks that can accommodate the plurality of load requests. In one embodiment, the plurality of load requests can include a plurality of requests that seek to access the same block of the aforementioned data cache.

At step 303, a tag memory is accessed that maintains a plurality of copies of the tags that correspond to the entries of the data cache.

At step 305, the tags that correspond to the individual load requests of the plurality of load requests received by the L1 cache are identified. In one embodiment, each load request is provided with a dedicated copy of the set of tags that correspond to the entries located in the data cache.

At step 307, the blocks of the data cache memory are accessed based on the tags that correspond to the individual requests. In one embodiment, the accessing of the plurality of blocks enables a throughput of multiple load accesses in the same clock cycle.
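The steps above can be sketched as a simple behavioral model: a data store split into banks, with a dedicated copy of the tag array per load port so that all ports probe tags in the same cycle, and bank arbitration resolved in that same cycle. Every name and parameter here (4 banks, 64-byte lines, 4 ports, oldest-first arbitration, replay of losers) is an assumption for illustration; in particular, this simplified model serializes same-bank conflicts, whereas the described design can also arbitrate same-block accesses to different addresses within one cycle.

```python
LINE_SIZE = 64          # bytes per cache line (assumed)
NUM_BANKS = 4           # data cache split into four blocks/banks (assumed)
LINES_PER_BANK = 128    # e.g. 8 KB per bank / 64-byte lines (assumed)

def split_address(addr):
    """Decompose a byte address into (tag, bank, line-within-bank)."""
    line = addr // LINE_SIZE
    bank = line % NUM_BANKS
    index = (line // NUM_BANKS) % LINES_PER_BANK
    tag = line // (NUM_BANKS * LINES_PER_BANK)
    return tag, bank, index

class MultiPortCache:
    def __init__(self, ports=4):
        self.ports = ports
        # One shared data store per bank ...
        self.banks = [dict() for _ in range(NUM_BANKS)]
        # ... but a dedicated copy of the tag array per load port, so all
        # ports can probe tags in the same cycle without tag-port conflicts.
        self.tag_copies = [[dict() for _ in range(NUM_BANKS)]
                           for _ in range(ports)]

    def fill(self, addr, data):
        tag, bank, index = split_address(addr)
        self.banks[bank][index] = (tag, data)
        for copy in self.tag_copies:        # keep every tag copy coherent
            copy[bank][index] = tag

    def load_cycle(self, addrs):
        """Service up to `ports` loads in one modeled clock cycle.

        Loads to distinct banks proceed in parallel. Loads that collide
        on a bank are arbitrated (oldest first) in the same cycle as the
        tag probe; losers are marked for replay in a later cycle.
        """
        assert len(addrs) <= self.ports
        granted_banks = set()
        results = []
        for port, addr in enumerate(addrs):
            tag, bank, index = split_address(addr)
            # Tag probe uses this port's private tag copy.
            hit = self.tag_copies[port][bank].get(index) == tag
            if bank in granted_banks:
                results.append(('replay', None))   # lost bank arbitration
            elif not hit:
                results.append(('miss', None))
            else:
                granted_banks.add(bank)
                results.append(('hit', self.banks[bank][index][1]))
        return results
```

A short usage example under the same assumptions:

```python
c = MultiPortCache()
c.fill(0x000, 'A')   # lands in bank 0
c.fill(0x040, 'B')   # lands in bank 1
print(c.load_cycle([0x000, 0x040]))  # → [('hit', 'A'), ('hit', 'B')]
```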

With regard to exemplary embodiments thereof, systems and methods for accessing a data cache memory are disclosed. A plurality of requests to access the data cache memory are accessed, and responsive to the plurality of requests, a tag memory is accessed that maintains a plurality of copies of the tags for each entry of the load cache memory. Tags are identified that correspond to individual requests. The data cache memory is accessed based on the tags that correspond to the individual requests. A plurality of requests to access the same block of the plurality of blocks causes an access arbitration that is executed in the same clock cycle as the tag memory access.

Although many of the components and processes are described above in the singular for convenience, it will be appreciated by one skilled in the art that multiple components and repeated processes can also be used to practice the techniques of the present invention. Further, while the invention has been particularly shown and described with reference to specific embodiments thereof, it will be understood by those skilled in the art that changes may be made in the form and details of the disclosed embodiments without departing from the spirit or scope of the invention. For example, embodiments of the present invention may be employed with a variety of components and should not be restricted to the ones mentioned above. It is therefore intended that the invention be interpreted to include all variations and equivalents that fall within the true spirit and scope of the present invention.

100‧‧‧exemplary computing environment

101‧‧‧system

103‧‧‧level one (L1) cache memory; L1 cache memory

103a‧‧‧level one (L1) data cache memory; L1 data cache memory; data cache memory

103b‧‧‧data cache memory tag memory

103c‧‧‧L1 cache memory controller; cache memory controller

105‧‧‧central processing unit

107‧‧‧level two (L2) cache memory; L2 cache memory

109‧‧‧main memory

111‧‧‧system interface; main memory

Claims (20)

1. A method for supporting a plurality of accesses of a data cache memory, comprising: accessing a plurality of requests to access the data cache memory, wherein the data cache memory comprises a plurality of blocks; responsive to the plurality of requests to access the data cache memory, accessing a tag memory that maintains a plurality of copies of tags for each entry in the data cache memory, and identifying tags that correspond to individual requests of the plurality of requests; and accessing the data cache memory based on the tags that correspond to the individual requests, wherein a plurality of requests to access a same block of the plurality of blocks causes an access arbitration that is executed in the same clock cycle as the accessing of the tag memory.

2. The method of claim 1, wherein the accessing of the data cache memory based on the tags comprises a plurality of loads that are executed in a single clock cycle.

3. The method of claim 1, wherein each request of the plurality of requests to access the data cache memory has a dedicated copy of the tags that correspond to entries located in the load cache memory.

4. The method of claim 1, wherein a plurality of the requests to access the data cache memory seek access to respective blocks.

5. The method of claim 1, wherein a plurality of the requests to access the data cache memory seek access to different addresses within a same block of the data cache memory in the same cycle.

6. The method of claim 1, wherein the plurality of blocks comprise respective portions of a level one data cache memory and contain cache line entries.

7. The method of claim 1, wherein the tag memory comprises a tag static random access memory (SRAM).

8. A cache memory system, comprising: a data cache memory that is divided into a plurality of data blocks; a tag memory configured to maintain a plurality of copies of tags that correspond to entries of the load cache memory; and a cache memory subsystem configured to access the tag memory and the data cache memory, wherein arbitration operations associated with a plurality of requests to access the data cache memory are executed in the same clock cycle as the accessing of the tag memory.

9. The cache memory system of claim 8, wherein the plurality of data blocks accommodate a plurality of loads that can be executed within a single clock cycle.

10. The cache memory system of claim 8, wherein a plurality of the requests to access the data cache memory seek access to respective data blocks.

11. The cache memory system of claim 8, wherein each request of the plurality of requests has a dedicated copy of the tags that correspond to cache line entries located in the data cache memory.

12. The cache memory system of claim 8, wherein the data cache memory is divided into four 8-kilobyte data blocks.

13. The cache memory system of claim 8, wherein a plurality of load requests access a same block at different addresses in the same cycle.

14. The cache memory system of claim 8, wherein the tag memory comprises a tag SRAM.

15. A computer system, comprising: a memory; a processor; and a cache memory system, comprising: a data cache memory configured to store units of data; a tag memory configured to store tags that correspond to the units of data; and a cache memory controller comprising a system for supporting a plurality of accesses of the data cache memory, the system comprising: a request accessing component for accessing a plurality of requests to access the data cache memory, wherein the data cache memory comprises a plurality of blocks; a tag memory accessing component for accessing a tag memory that maintains a plurality of copies of tags for each entry in the data cache memory, and for identifying tags that correspond to individual requests of the plurality of requests; and a data cache memory accessing component for accessing the data cache memory based on the tags that correspond to the individual requests, wherein a plurality of requests to access a same block of the plurality of blocks causes an access arbitration that is executed in the same clock cycle as the accessing of the tag memory.

16. The computer system of claim 15, wherein the accessing of the data cache memory comprises a plurality of loads that are executed within a single clock cycle.

17. The computer system of claim 15, wherein each request of the plurality of requests to access the data cache memory has a dedicated copy of the tags that correspond to entries located in the load cache memory.

18. The computer system of claim 15, wherein a plurality of the requests to access the data cache memory seek access to respective blocks.

19. The computer system of claim 15, wherein a plurality of the requests to access the data cache memory seek access to different addresses of a same block in the same cycle.

20. The computer system of claim 15, wherein the plurality of blocks comprise respective portions of a level one data cache memory and contain cache line entries.
TW102127066A 2012-07-30 2013-07-29 Systems and methods for supporting a plurality of load accesses of a cache in a single cycle TWI537731B (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US13/561,570 US9229873B2 (en) 2012-07-30 2012-07-30 Systems and methods for supporting a plurality of load and store accesses of a cache
US13/561,441 US9740612B2 (en) 2012-07-30 2012-07-30 Systems and methods for maintaining the coherency of a store coalescing cache and a load cache
US13/561,491 US9710399B2 (en) 2012-07-30 2012-07-30 Systems and methods for flushing a cache with modified data
US13/561,528 US9430410B2 (en) 2012-07-30 2012-07-30 Systems and methods for supporting a plurality of load accesses of a cache in a single cycle

Publications (2)

Publication Number Publication Date
TW201428494A TW201428494A (en) 2014-07-16
TWI537731B true TWI537731B (en) 2016-06-11

Family

ID=50028431

Family Applications (1)

Application Number Title Priority Date Filing Date
TW102127066A TWI537731B (en) 2012-07-30 2013-07-29 Systems and methods for supporting a plurality of load accesses of a cache in a single cycle

Country Status (2)

Country Link
TW (1) TWI537731B (en)
WO (1) WO2014022115A1 (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5513335A (en) * 1992-11-02 1996-04-30 Sgs-Thomson Microelectronics, Inc. Cache tag memory having first and second single-port arrays and a dual-port array
US5640534A (en) * 1994-10-05 1997-06-17 International Business Machines Corporation Method and system for concurrent access in a data cache array utilizing multiple match line selection paths
US5752260A (en) * 1996-04-29 1998-05-12 International Business Machines Corporation High-speed, multiple-port, interleaved cache with arbitration of multiple access addresses
US6704822B1 (en) * 1999-10-01 2004-03-09 Sun Microsystems, Inc. Arbitration protocol for a shared data cache
US7133950B2 (en) * 2003-08-19 2006-11-07 Sun Microsystems, Inc. Request arbitration in multi-core processor

Also Published As

Publication number Publication date
WO2014022115A1 (en) 2014-02-06
TW201428494A (en) 2014-07-16

Similar Documents

Publication Publication Date Title
US9430410B2 (en) Systems and methods for supporting a plurality of load accesses of a cache in a single cycle
US9720839B2 (en) Systems and methods for supporting a plurality of load and store accesses of a cache
US9235514B2 (en) Predicting outcomes for memory requests in a cache memory
CN102483704B (en) There is the transactional memory system that efficient high-speed cache is supported
US6427188B1 (en) Method and system for early tag accesses for lower-level caches in parallel with first-level cache
TWI603264B (en) Region based technique for accurately predicting memory accesses
US7386679B2 (en) System, method and storage medium for memory management
US10831675B2 (en) Adaptive tablewalk translation storage buffer predictor
US10482024B2 (en) Private caching for thread local storage data access
US20120173843A1 (en) Translation look-aside buffer including hazard state
JP2008529181A5 (en)
JP7160792B2 (en) Systems and methods for storing cache location information for cache entry transfers
TW200304594A (en) System and method of data replacement in cache ways
CN113515470A (en) Cache addressing
US20140013054A1 (en) Storing data structures in cache
US20150205721A1 (en) Handling Reads Following Transactional Writes during Transactions in a Computing Device
US9792213B2 (en) Mitigating busy time in a high performance cache
US8356141B2 (en) Identifying replacement memory pages from three page record lists
JP7264806B2 (en) Systems and methods for identifying the pendency of memory access requests in cache entries
US10380034B2 (en) Cache return order optimization
TWI537731B (en) Systems and methods for supporting a plurality of load accesses of a cache in a single cycle
US9251093B2 (en) Managing the translation look-aside buffer (TLB) of an emulated machine
CN111344684B (en) Multi-layer cache placement mechanism
US9785574B2 (en) Translation lookaside buffer that employs spacial locality
CN111344684A (en) Multi-level cache placement mechanism