WO2016105961A1 - Cache accessed using virtual addresses - Google Patents


Info

Publication number
WO2016105961A1
Authority
WO
WIPO (PCT)
Prior art keywords
address
cache
virtual address
memory
virtual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2015/064955
Other languages
English (en)
French (fr)
Inventor
Gurindar S. Sohi
Hongil Yoon
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wisconsin Alumni Research Foundation
Original Assignee
Wisconsin Alumni Research Foundation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wisconsin Alumni Research Foundation filed Critical Wisconsin Alumni Research Foundation
Priority to EP15874115.7A priority Critical patent/EP3238074B1/en
Priority to JP2017534307A priority patent/JP6696987B2/ja
Priority to KR1020177020817A priority patent/KR102448124B1/ko
Priority to CN201580070399.3A priority patent/CN107111455B/zh
Publication of WO2016105961A1 publication Critical patent/WO2016105961A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0893 Caches characterised by their organisation or structure
    • G06F12/1009 Address translation using page tables, e.g. page table structures
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/061 Improving I/O performance
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0664 Virtualisation aspects at device level, e.g. emulation of a storage device or system
    • G06F3/0673 Single storage device
    • G06F2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/1016 Performance improvement
    • G06F2212/50 Control mechanisms for virtual memory, cache or TLB
    • G06F2212/608 Details relating to cache mapping
    • G06F2212/68 Details of translation look-aside buffer [TLB]

Definitions

  • the present invention relates to computer architectures and in particular to an improved computer architecture providing for a memory cache that allows access of cache contents by virtual addresses rather than physical addresses.
  • Cache memories are used to minimize the time required for a processor to access memory data by providing a relatively compact, quickly accessed memory structure close to the processor. Portions of the main memory are loaded into the cache memory with the expectation that temporally proximate memory accesses will tend to cluster in the loaded portion (locality of reference), thus allowing the cache memory, once loaded, to serve multiple memory accesses by the processor before needing to be reloaded. Often multiple levels of cache (e.g., L1, L2, L3) may be used to optimize the trade-offs between rapid access and limited storage inherent in the cache structure.
  • a set of bits from a first set of address bits for the data is used to index into the cache and select a line at an indexed entry.
  • a set of bits from a second set of address bits is then compared against a set of tag bits corresponding to a selected entry, and a hit is declared if the set of bits from the second set of address bits matches the selected set of tag bits.
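  The index-and-tag lookup described in the two bullets above can be sketched in Python. This is an illustrative model only; the line size, set count, and dictionary-based cache are assumptions, not details from the patent:

```python
# Illustrative sketch: splitting an address into the offset, index,
# and tag fields used for a cache lookup (direct-mapped for clarity).
LINE_BYTES = 64      # assumed cache line size
NUM_SETS = 64        # assumed number of sets

OFFSET_BITS = LINE_BYTES.bit_length() - 1   # 6 bits of line offset
INDEX_BITS = NUM_SETS.bit_length() - 1      # 6 bits of set index

def split_address(addr):
    """Return (tag, index, offset) for an address."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

def lookup(cache, addr):
    """cache maps set index -> stored tag; a hit requires a tag match."""
    tag, index, _ = split_address(addr)
    return cache.get(index) == tag
```

  Whether the bits come from a physical or a virtual address is exactly the design choice the rest of the document discusses; the mechanics of the split are the same either way.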
  • a page table entry may also contain access permissions for data in the corresponding virtual page.
  • the page tables may be augmented by a translation lookaside buffer (TLB) that serves to cache recently accessed entries from the page table to speed up the process of translating the addresses and checking the requisite access permissions.
  • the TLB may be optimized for low access latency and for low miss rate by employing a fast, highly associative structure.
  • the cache memory is nevertheless normally accessed by physical memory addresses, that is, the first and second set of bits of the address (e.g., index and tag) used to access the data in the cache are both part of the same physical address.
  • while the latency of the address translation using the TLB and page tables is tolerable for the main memory, it is more burdensome when used with a cache memory, which is intended to provide frequent and rapid access that can be significantly slowed by address translation.
  • the translation with a highly associative TLB will also be energy-hungry.
  • the cache could instead be accessed directly using a virtual address from a program, that is, where the first and second set of bits (e.g., index and tag) used to access the data in the cache are both parts of the same virtual address.
  • the use of a virtual address to access cached data can obviate the latency and energy overhead resulting from TLB lookups; however, it is complicated by the possibility of synonyms, that is, a group of distinct virtual addresses mapped to the same physical address.
  • This aliasing is possible and desirable to efficiently manage data in (physical or main) memory, for example, information shared across different processes with distinct virtual address spaces. In such cases, the use of virtual addressing for the cache could permit multiple different cache entries mapping to the same physical address (synonym virtual addresses), that is, holding the same data under distinct virtual addresses. Allowing such duplicates reduces the effective cache capacity and also presents a consistency problem: if one process updates data associated with the common physical address using the first cache location, then a second process cannot read the up-to-date value for the common physical address using the second cache location.
  • One known compromise is the virtually-indexed physically-tagged (VIPT) cache, in which a first set of bits used to index into the cache to select an entry is part of a virtual address and a second set of bits used to compare against the tag bits of the selected entry is part of the physical address corresponding to the virtual address.
  • This solution exploits the observation that some low-order bits of a virtual address (the page offset bits) do not change as a result of the address translation.
  • these low-order bits of the virtual address (which are the same as those of the corresponding physical address) can be used to index into the cache and start the access of the data and tag bits residing in the corresponding entry; in parallel, the TLB is accessed to obtain the physical address.
  • the physical address bits obtained from the TLB access are compared with the physical address bits stored in the cache tags, and a hit is declared if they match.
  • This approach may decrease the latency of the cache access, since the TLB is accessed in parallel with, rather than prior to, the cache access.
  • the TLB access energy is still expended, and thus the energy benefits of a cache accessed with virtual addresses are not obtained.
  • the limits placed on the number of bits used to index the cache (these bits should not change as a result of the address translation) restrict the organization of the cache, potentially requiring a higher degree of associativity than may be desirable to achieve energy efficiency.
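  The organizational limit described above follows from requiring the index and line-offset bits to lie within the untranslated page-offset bits: a VIPT cache can hold at most page_size × associativity bytes, so growing capacity forces associativity up. A minimal sketch of that arithmetic (function names are hypothetical):

```python
# Illustrative sketch of the VIPT sizing constraint: index + offset
# bits must fit within the page offset, so per-way capacity is capped
# at one page and total capacity at page_size * ways.
def max_vipt_capacity(page_size, ways):
    """Largest VIPT cache (bytes) for a given page size and associativity."""
    return page_size * ways

def ways_needed(page_size, cache_size):
    """Minimum associativity for a VIPT cache of cache_size bytes."""
    return max(1, -(-cache_size // page_size))  # ceiling division
```

  For example, with 4 KB pages a 32 KB VIPT cache must be at least 8-way associative, which may cost more energy per access than a lower-associativity organization would.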
  • What is desirable is to have a cache that can be accessed with a virtual address so that a first address used to index into the cache and select an entry and a second address used to compare against the selected tag bits and to declare a hit are both parts of the same virtual address.
  • a data access that hits in the cache can be completed solely using a virtual address, without the need to access the TLB or perform a virtual to physical address translation.
  • Such a cache would have significant access latency and/or energy consumption advantages over designs that employ a physical address to complete the cache access.
  • the present inventors have recognized that in many important computational tasks, while synonyms are present over the duration of the program, within a short period of time comparable to the time data is resident in a cache: (1) there are very few synonyms mapped to the same physical address, and (2) very few of the accesses to the cache are to data with synonyms.
  • the present invention exploits these observations by accessing the cache directly with a given virtual address and then detecting synonym virtual addresses and efficiently converting the synonym virtual address to a selected key virtual address prior to cache access. It is practical to track synonyms with modest-size data structures because of the relative scarcity of synonyms during the life of cached data.
  • the invention provides an electronic processor architecture for use with a memory having storage at physical addresses.
  • the architecture includes a processor, a memory cache, a cache control circuit caching memory data of physical addresses in the memory cache by virtual addresses, and a translation circuit for translating between a virtual address from the processor and a physical address.
  • the architecture includes a synonym tracking circuit receiving a given virtual address from the processor for access to the cache and: (1) determining if the given virtual address is a synonym with another virtual address mapping to a same given physical memory address of data in the cache; and (2) when the given virtual address is a synonym, accessing the memory cache using the other virtual address as an accessing address for the memory cache.
  • Otherwise, the synonym tracking circuit may access the memory cache using the given virtual address as the accessing address for the memory cache. It is thus a feature of at least one embodiment of the invention to permit faster direct access to the cache using virtual memory addresses in the dominant situation where there are no synonyms.
  • the cache controller may prepare for access of the memory cache using the given virtual address in parallel with the synonym tracking circuit determining if the given virtual address is a synonym with another virtual address.
  • the synonym tracking circuit may include a first table of virtual addresses (termed the ART) that are synonyms, to determine if a given virtual address is a synonym with another virtual address mapped to the same physical memory address of data.
  • the synonym tracking circuit may include a compressed signature of the first table indicating whether a given virtual address is likely in the first table, and the synonym tracking circuit may first check the compressed signature and may check the first table only if the compressed signature indicates that the given virtual address is likely in the first table.
  • the synonym tracking circuit may respond to a cache miss of the memory cache by determining a physical address associated with the given virtual address and applying the determined physical address to a second table linking a physical address and a designated (key) virtual address, and, when the determined physical address links to a (key) virtual address in the second table, using the virtual address of the second table as an accessing address for the memory cache.
  • the synonym tracking circuit may use a translation lookaside buffer and page table to convert the accessing address to a physical address. It is thus a feature of at least one embodiment of the invention to leverage existing translation circuitry (e.g., the TLB) for the purpose of determining synonyms.
  • the first table, the second table, and a cache line may include a memory access permission linked to a virtual address.
  • the electronic processor may be an out-of-order processor using a load queue and a store queue, and the load queue and store queue may store data linked to a key virtual address.
  • FIG. 1 is a block diagram of a computer architecture employing a cache responding to virtual addresses per the present invention, showing a memory management unit employing a cache controller, a translation lookaside buffer, and synonym tracking circuitry;
  • Fig. 2 is a flowchart of the operation of the memory management unit responding to a memory access request having a virtual address;
  • Fig. 3 is a diagram of a synonym signature used in the synonym tracking circuitry;
  • Fig. 4 is a diagram of an address remapping table used in the synonym tracking circuitry;
  • FIG. 5 is a diagram of an active synonym detection table used in the synonym tracking circuitry;
  • Fig. 6 is a diagram of a single cache line entry in the cache; and
  • Figs. 7a-7d show multiple embodiments in which access to the synonym tracking circuitry and address remapping table are distributed in a pipeline processor.
  • a computer processing system 10 may provide for one or more processor systems 12 communicating with a memory system 16 composed of various elements to be described below and other devices 15 such as networks, terminals, or the like.
  • Each processor system 12 may include a processor 13 communicating through a memory management unit (MMU) 14 with the memory system 16 and specifically with one or more caches 18 such as an L1 cache 18a and an L2 cache 18b.
  • the processor system 12 may have a lower level cache (e.g., an L3 cache 18c) that in turn communicates with physical memory 20 including random access memory, disk drives, and the like.
  • the processor 13 may provide for the ability to execute standard computer instructions including arithmetic and logical operations, memory accessing operations, as well as flow control operations including conditional branches and the like.
  • the processor 13 may include a load queue 17 and a store queue 19, as will be described below, providing a function generally understood in the art.
  • the MMU 14 may include a cache controller 22 providing for the updating of the cache 18 from the physical memory 20, including evicting and loading lines of the cache 18 according to techniques generally understood in the art. As will be discussed in greater detail below, the MMU 14 stores data in the cache 18a so that it can be accessed by virtual addresses rather than physical addresses. Data may be stored in other caches 18b and 18c, for example, so that it may be accessed by virtual addresses or by physical addresses. In this regard, the cache controller 22 may communicate with a translation lookaside buffer (TLB) 26 capable of determining a mapping from the virtual address space of the virtual address Vb to the physical address space of the physical memory 20 to provide a physical address Pb.
  • This physical address Pb will ultimately be used to access the physical memory 20 over lines 30 if there is a cache miss at the cache 18.
  • the TLB 26 provides a cache with recent translations between virtual address space and physical address space. If a mapping for the particular virtual address Vb is not found in the TLB 26, the cache controller 22 may consult with one or more page tables 28 and, by doing a "page walk", may obtain the necessary translation. Access to the TLB 26 or the page tables 28 is time- and energy-consuming and desirably avoided.
  • the page table 28 may be stored in part in the caches 18 and in part in physical memory 20, indicated generally by the dotted box labeled 20, 18.
  • the MMU 14 also includes a synonym circuit 35 that in one embodiment may include an active synonym signature (SS) 32, a first table providing an Address Remapping Table (ART) 34, and a second table providing an Active Synonym Detection Table (ASDT) 36 as will be discussed in detail below.
  • the MMU 14 receives memory access requests from the processor 13 over address lines 24, the requests including a virtual address ( Vb) of the data to be accessed. These requests may be sent to the SS 32 and to the ART 34 of the synonym circuit 35 which determine if Vb is in fact an active synonym of another active virtual address (a key virtual address) which is being used to access the same data in the cache 18.
  • the term "active" refers to virtual addresses that map to data currently in the cache 18.
  • the virtual address Vb is used to directly access the cache 18 with the expectation that the virtual address Vb is a key virtual address.
  • the virtual address Vb is passed to an Address Remapping Table (ART) 34 which confirms this relationship and identifies a key virtual address (e.g. Va) for which Vb is a synonym.
  • Va is used instead to access the memory cache 18.
  • the cache controller 22 refers to the ASDT 36 to determine if there may be an unknown synonym to Vb not currently identified by SS 32 or ART 34, but detectable by comparing a given virtual address (Vb) against a known, active, key virtual address mapped to the same physical address Px. If such a hidden synonym is detected, and the necessary cache line 71 is available per the ASDT 36, the memory cache 18 is accessed again using the key virtual address Va. Otherwise the access is submitted to other levels of the cache or to physical memory 20, and when this access is complete, the acquired data is stored in the cache 18 indexed to the virtual address Va. In either case, SS 32, ART 34, and ASDT 36 are updated as needed.
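  The lookup sequence just described (SS check, ART remap, direct virtual-address cache access, then TLB and ASDT consultation on a miss) can be sketched in simplified Python. The dictionary-based tables and names here are illustrative stand-ins for the hardware structures, not the patent's implementation:

```python
def access(vb, ss, art, cache, tlb, asdt):
    """Sketch of a synonym-aware virtual-cache lookup.
    ss: set acting as the synonym signature; art: synonym VA -> key VA;
    cache: key-VA-tagged line store; tlb: VA -> PA; asdt: PA -> key VA."""
    va = vb
    if vb in ss and vb in art:          # known synonym: remap to key VA
        va = art[vb]
    if va in cache:                      # hit completed with a VA only
        return cache[va]
    pb = tlb[vb]                         # miss: translate once via TLB
    key = asdt.get(pb)
    if key is not None and key != vb:    # hidden synonym detected
        art[vb] = key                    # memorialize in ART and SS
        ss.add(vb)
        if key in cache:                 # line present under the key VA
            return cache[key]
    else:
        asdt[pb] = vb                    # Vb becomes the key VA
    data = f"data@{pb:#x}"               # fetch from lower levels (stub)
    cache[asdt[pb]] = data               # fill indexed by the key VA
    return data
```

  Note how the common case (no synonym) never touches the TLB on a hit, which is the latency and energy win the document claims.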
  • the MMU 14 may receive a given virtual address Vb for memory access as indicated by process block 40.
  • This given virtual address Vb is, in a preferred embodiment, checked against the SS 32 per decision block 52.
  • the SS 32 processes the given virtual address Vb, for example, by hashing the given virtual address Vb using a hash coder 44 to generate a pointer to a specific bit 46 of a bit vector 48 in the SS 32. That pointed-to bit 46 will be set if Vb is likely an active synonym virtual address. This indicates that there is likely to be an entry for that given virtual address Vb in the ART 34.
  • the hash indexing is extremely fast but only provides a likelihood of success with the ART 34 and does not indicate the synonym, which requires review of the ART 34 as will be discussed below.
  • each bit 46 of the bit vector 48 may be associated with a bit counter 50 indicating the number of associated entries in the ART 34 (or virtual addresses mapping to that bit 46 in the ART 34).
  • This counter value of counter 50 allows the bit vector 48 to be updated to reset bits 46 that are no longer the hash of valid synonym data in the ART 34.
  • the counter 50 also prevents the reset of bits 46, in the case where the hash coder 44 maps multiple synonyms to the same bit 46, before all synonyms are evicted from the ART 34.
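  The SS 32, with its bit vector 48 and per-bit counters 50, behaves like a one-hash counting Bloom filter: membership tests may give false positives but never false negatives, and a bit is cleared only when its counter reaches zero. A minimal sketch under that reading (signature size and hash function are illustrative assumptions):

```python
# Illustrative counting-signature sketch: the bit vector answers
# "possibly in the ART"; counters let a bit be cleared only after
# every synonym hashing to it has been evicted.
SIG_BITS = 64   # assumed signature size

def _h(va):
    return hash(va) % SIG_BITS   # stand-in for the hash coder 44

class SynonymSignature:
    def __init__(self):
        self.bits = [0] * SIG_BITS
        self.counts = [0] * SIG_BITS

    def insert(self, va):          # synonym VA enters the ART
        i = _h(va)
        self.bits[i] = 1
        self.counts[i] += 1

    def remove(self, va):          # synonym VA leaves the ART
        i = _h(va)
        self.counts[i] -= 1
        if self.counts[i] == 0:    # last synonym mapping here: clear bit
            self.bits[i] = 0

    def maybe_synonym(self, va):   # false positives possible,
        return self.bits[_h(va)] == 1   # false negatives are not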
  • the cache 18 is accessed directly at process block 42, as indicated by arrow 41, using Vb.
  • the MMU 14 proceeds to process block 54 and the ART 34 is used to identify a key virtual address with which the synonym virtual address Vb is associated.
  • the ART 34 may provide multiple entries represented as rows, each corresponding to a synonym virtual address 56 to which the received virtual address on address lines 24 (of Fig. 1) will be compared. Any "known" active synonym virtual addresses 56 (e.g., Vb) will be listed in the ART 34 and will map to a single key virtual address 58 (e.g., Va) also in the ART 34.
  • the ART 34 may also hold memory access "permissions" 59 operating to control whether particular memory addresses (e.g., Vb) may be read or written to, permissions being a standard feature of memory management known in the art.
  • the permissions 59 of the ART 34 are evaluated to confirm that the memory access is proper. If not, an error is generated, resulting in handling the request as a memory access violation exception as is understood in the art.
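  An ART lookup as just described, returning the key virtual address after a permission check, might be modeled as follows (a simplified sketch; the class and method names are hypothetical):

```python
# Illustrative ART model: each entry maps a synonym virtual address
# to its key virtual address plus the access permissions 59.
class ART:
    def __init__(self):
        self.entries = {}   # synonym VA -> (key VA, permission set)

    def install(self, synonym_va, key_va, perms):
        self.entries[synonym_va] = (key_va, perms)

    def remap(self, va, access):
        """Return the key VA for a synonym, checking permissions first."""
        key_va, perms = self.entries[va]
        if access not in perms:
            # corresponds to handling as a memory access violation
            raise PermissionError("memory access violation")
        return key_va
```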
  • each cache line 71 in the cache 18 may provide for a set of tag bits from a virtual address 63 to which data 65 of that cache line 71 is associated. That is, the cache 18 is both virtually indexed and virtually tagged, as opposed to cache structures that are virtually indexed but physically tagged and which require TLB access for correct functioning.
  • the cache line 71 may also provide for permission data 62 and an entry identification index value 69 for the ASDT 36 as will be discussed below.
  • the cache controller 22 proceeds to process block 67.
  • the given virtual address Vb is applied to the TLB 26 (and possibly to the page tables 28) to obtain a corresponding physical address (e.g., Pb) associated with that given virtual address Vb.
  • the ASDT 36 may provide for a logical table having multiple entries represented as rows corresponding to active physical addresses 68 (Px) known to be in the cache 18. Also in each row of the ASDT 36, and thereby linked to a physical address Px, is a single key virtual address 58 for that physical address 68.
  • the ASDT 36 may also include permissions 73 similar to those described above providing access permissions for the data associated with the key virtual address 58.
  • Each row of the ASDT 36 may also include a bit vector 72 indicating particular lines of the physical address in the cache 18 associated with each physical address Px.
  • the bit vector 72 thus provides for line-level resolution in the identification of the data in the cache 18. For example, when the physical address 68 is a page (e.g., four kilobytes), each bit 74 of the bit vector 72 will be set when a corresponding line (64 bytes) of the page is enrolled in the cache 18.
  • a counter 76 may provide the number of bits 74 set in the bit vector 72 so that the entry of the row of ASDT 36 may be invalidated when all lines in the cache 18 have been evicted and the counter has a value of zero for a particular physical address Px.
  • the counter 76 thus acts like a valid bit that may be used to evict rows of the ASDT 36 when they are no longer useful, in updating the ASDT 36 as described below.
  • the ASDT 36 may also include an active synonym detection bit 79 as will be discussed below.
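  An ASDT entry as described above, with its per-line bit vector 72 and counter 76, can be modeled as follows. The page and line sizes follow the 4 KB / 64-byte example in the text; the class and method names are hypothetical:

```python
# Illustrative ASDT-entry model: a physical page is linked to its key
# VA, with a bit per cached line and a population counter that acts
# as the entry's valid indication.
LINE_BYTES = 64
PAGE_BYTES = 4096
LINES_PER_PAGE = PAGE_BYTES // LINE_BYTES   # 64 bits in the vector

class ASDTEntry:
    def __init__(self, key_va):
        self.key_va = key_va
        self.bits = 0          # bit i set -> line i of page is cached
        self.count = 0         # number of set bits; 0 means evictable
        self.active_synonym = False   # active synonym detection bit 79

    @staticmethod
    def line_index(va):
        """Line number within the page, deduced from the VA's offset."""
        return (va % PAGE_BYTES) // LINE_BYTES

    def line_filled(self, va):     # a line of this page enters the cache
        i = self.line_index(va)
        if not (self.bits >> i) & 1:
            self.bits |= 1 << i
            self.count += 1

    def line_evicted(self, va):    # a line of this page leaves the cache
        i = self.line_index(va)
        if (self.bits >> i) & 1:
            self.bits &= ~(1 << i)
            self.count -= 1

    def is_valid(self):
        return self.count > 0
```

  When `is_valid()` becomes false the row can be reclaimed, mirroring the counter-as-valid-bit behavior the text describes.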
  • the given virtual address Vb is not in the ASDT 36.
  • the role of Vb as a synonym to the key virtual address that is in the ASDT 36 (e.g., Vc) is memorialized by updating the ART 34 and the SS 32 appropriately at process block 85.
  • the active synonym detection bit 79 in the ASDT 36 may be set.
  • the ASDT 36 is then checked to see if the necessary cache line 71 required by given virtual address Vb is in the memory cache 18 (e.g., by checking whether the corresponding bit 74 of the bit vector 72 is set in the ASDT 36). If so, as indicated by line 89, access to the cache 18 may be provided by process block 42 using a key virtual address Vc discovered in the ASDT 36.
  • the cache controller 22 is allowed to update the cache 18 and if that updating is successful the ASDT 36 is updated appropriately.
  • this updating adds to the ASDT 36 the physical address Pb in the first column and the given virtual address Vb in the second column associated with that updating (which will now be a key virtual address 58), with the appropriate bit 74 of the bit vector 72 set and counter 76 incremented.
  • the appropriate bit 74 may be deduced from an offset value derivable from the given virtual address Vb.
  • a victim entry of the ASDT 36 may be determined by looking at the ASDT 36 to select a physical address 68 associated with an invalid row of the ASDT 36 (i.e., with counter 76 equaling zero) or a row with a non-zero counter value.
  • the invalid row having a counter of zero indicates that there are no valid lines still relied upon in the cache 18.
  • the determination of the victim entry to be evicted can be carried out using any number of a variety of policies known in the art.
  • the lines tracked by that entry that are still resident in the cache (which may be indicated by a non-zero value in the corresponding bits 74 of bit vector 72) are first evicted from the cache 18, thereby bringing the counter value down to zero and thus allowing the ASDT entry to be evicted without problems.
  • the associated entries in the ART and the SS also need to be invalidated and/or updated as needed.
  • the cache 18 is substantially the same as a normal cache but is accessed by virtual addresses.
  • Virtual addresses may be combined with address space identifiers (ASID), as is generally understood in the art, to address the homonym issue.
  • the ASDT 36 entry corresponding to the evicted line must be identified so that the corresponding bit 74 in the bit vector 72 may be updated and the corresponding counter 76 decremented. Identifying the corresponding line in the ASDT 36 may be complicated by the fact that the line to be evicted is identified by a virtual address 58 while the ASDT is most simply accessible via a physical address 68. Accordingly, the cache line 71 may include an ASDT index value 69 allowing rapid identification of the necessary entry in the ASDT 36 to be updated.
  • an ASDT index value 69 may also be stored in a Miss-Status Handling Register (MSHR) used to handle cache miss requests in standard computer architectures so that the ASDT 36 entry corresponding to a cache line returned on a cache miss, to be placed in cache 18, can be updated.
  • Referring to Fig. 1, other processor systems 12 may write to data in their caches that invalidates a cache line 71 in the cache of the given processor system 12a. This will generate an invalidation message normally describing the data to be invalidated using a physical address.
  • One way to identify the necessary cache line 71 for invalidation is to apply the physical address of the invalidation message to the ASDT 36 to obtain the key virtual address 58 and to use that key virtual address 58 to identify the necessary cache line 71.
  • a similar procedure to consult the ASDT 36 and to identify the necessary cache line 71 could be employed.
  • the present invention will typically find greatest use on the L1 cache 18a, and coherence events between the L2 and L1 caches are rare, minimizing this overhead.
  • the load queue 17 and store queue 19 are used in out-of-order processors to manage memory reads and writes.
  • Using virtual addresses for the store queue 19 can create problems if a later load does not identify a matching store due to synonyms, and vice versa. In such cases, stale data could be returned from the cache 18.
  • using virtual addresses for the load queue can create problems when there is a coherence-based invalidation or eviction in cache 18 and a load has been carried out (speculatively) for that evicted data.
  • the present invention may address this issue by identifying data in the load queue 17 and store queue 19 by the key virtual address 58 and not by a synonym.
  • a TLB miss for a store in the store queue 19 may be resolved properly by holding younger stores in a separate queue until the TLB miss is resolved.
  • An alternative is to restart the program from the offending store to effectively delay the release of the store instruction from a store queue.
  • bit vector 72 of ASDT 36 may become unwieldy. In this case one may eliminate the bit vector 72 and instead "walk" through the lines of the cache 18 to search for lines from a given page related to a desired line to be evicted from the ASDT 36. This expensive operation can be avoided by preferably evicting from the ASDT 36 lines associated with small rather than large pages, especially if there is a need to evict an entry with a non-zero value of counter 76. Generally, the eviction of large pages will also be a less likely event.

Access of the SS and ART
  • Figs. 7a-7d: the access of the SS 32 and the ART 34 in a pipeline processor can be implemented in a variety of ways. As shown in Fig. 7a, access to SS 32 and ART 34 can occur after address generation but before disambiguation. This approach achieves most of the power-efficiency benefit of using a virtual cache but does not fully exploit the potential latency benefits.
  • the SS 32 and ART 34 can be accessed in parallel during the disambiguation stage.
  • the disambiguation may need to be done with the key virtual addresses 58 on hits in the ART 34, although such hits are rare. This may increase power consumption but decrease latency.
  • access to SS 32 can occur before address generation based on base (or segment) registers, and access to ART 34 can occur after address generation and before disambiguation.
  • This approach can be applied to both instruction and data caches and obtains most of the energy and latency benefits possible with virtual caches.
  • SS 32 can be accessed after intermediate steps of address generation but before address generation is complete.
  • ART 34: Accesses to ART 34 can be reduced by exploiting the fact that successive references are often to the same page (especially for instructions).
  • a last LVA (LLVA) register 100 (shown in Fig. 4) may be associated with the ART 34 which maintains the last key virtual address 58 that was accessed.
  • LLVA last LVA
  • access to the ART 34 need not occur and this value in the LLVA 100 can be used directly, skipping process block 54 of Fig. 2.
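A sketch of the LLVA short-circuit described in the bullets above, modeled at page granularity; the class and field names are hypothetical, and the ART is simplified to a dictionary from virtual page to key virtual page:

```python
# If the current reference falls on the same page as the last one, the
# cached key translation in the LLVA register (100) is reused and the
# ART access (process block 54) is skipped.

PAGE = 4096  # assumed page size

class ARTWithLLVA:
    def __init__(self, art):
        self.art = art             # ART 34: virtual page -> key virtual page
        self.last_vpage = None     # page of the last reference
        self.last_key_page = None  # LLVA register 100 (page granularity)
        self.lookups = 0           # count of real ART accesses, for illustration

    def key_va(self, va):
        vpage, offset = divmod(va, PAGE)
        if vpage == self.last_vpage:
            key_page = self.last_key_page      # reuse LLVA; skip the ART
        else:
            self.lookups += 1                  # a real ART access
            key_page = self.art.get(vpage, vpage)  # miss: VA is its own key
            self.last_vpage, self.last_key_page = vpage, key_page
        return key_page * PAGE + offset

art = ARTWithLLVA({0x10: 0x99})    # v-page 0x10 is a synonym of key page 0x99
```

Instruction fetch, with its strong page locality, benefits most: a run of sequential fetches costs a single ART access.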
  • Each process may be associated with a different virtual address memory space which consists of a user and a kernel space.
  • the kernel space is shared across different processes, and thus accesses to the kernel space can be considered as synonyms because a different process has a different ASID although the accessed virtual addresses in the kernel space are the same.
  • This can create multiple entries in the ART 34 when the kernel space is accessed in a temporally proximate manner by multiple processes.
  • one embodiment may use a run time remapping of an ASID to a single unique value only for accesses to the kernel space. The identification of such access is based on a priori knowledge that the access has an address range associated with a kernel access. For example, if the kernel space (or operating system) is located in the upper half of address space, this remapping can simply look at the highest order address bit to trigger the remapping process.
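The high-order-bit test and run-time ASID remapping can be sketched as follows; the address width, the reserved ASID value, and the constant names are assumptions for illustration:

```python
# Accesses whose high-order address bit marks the kernel half of the
# address space are remapped to one shared ASID, so kernel lines are not
# treated as synonyms across processes; user accesses keep their
# per-process ASID.

VA_BITS = 48                      # assumed virtual address width
KERNEL_BIT = 1 << (VA_BITS - 1)   # top bit set => kernel half of the space
KERNEL_ASID = 0                   # assumed single unique ASID for the kernel

def effective_asid(asid, va):
    # Run-time remapping triggered purely by the highest-order address bit.
    return KERNEL_ASID if (va & KERNEL_BIT) else asid
```

With this remapping, two processes touching the same kernel address present identical (ASID, virtual address) pairs, so no extra ART 34 entries are created.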
  • The invention contemplates that each data storage structure, including the ART 34, the ASDT 36 and the cache line 71, may store virtual addresses together with address space identifiers (ASIDs) uniquely identifying each virtual address space and thus effectively being a portion of the virtual address.
  • ASID address space identifiers
  • the SS 32 and the ART 34 may be eliminated in favor of direct access to the cache 18 with a synonym virtual address and the expectation that, if the synonym is not a key virtual address, a cache miss will result and the proper key synonym will be resolved by access to the ASDT 36 after the cache miss per process block 60.
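A sketch of this simplified design, with hypothetical names and dictionaries standing in for the cache, the translation, and the ASDT:

```python
# Probe the cache directly with whatever virtual address the program
# uses. If that address is not the key virtual address of the line, the
# probe misses; the miss path translates the address, consults the ASDT
# for the key synonym, and retries with it (per process block 60).

PAGE = 4096  # assumed page size

def load(cache, asdt, va_to_ppage, va):
    data = cache.get(va)
    if data is not None:
        return data                        # hit: va was the key VA
    vpage, offset = divmod(va, PAGE)       # miss path begins
    ppage = va_to_ppage[vpage]             # address translation
    key_vpage = asdt[ppage]                # ASDT: phys page -> key v-page
    return cache.get(key_vpage * PAGE + offset)

cache = {0x7F * PAGE + 8: "value"}         # line stored under its key VA
va_to_ppage = {0x10: 0x2, 0x7F: 0x2}       # two synonyms of one phys page
asdt = {0x2: 0x7F}                         # key v-page for that phys page
```

The trade-off relative to the SS/ART design is extra misses whenever a non-key synonym is used, in exchange for removing two lookup structures from the access path.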
  • the data storage functionality of the SS 32, the ART 34, and the ASDT 36, providing data employed by the synonym tracking circuitry 35, and the synonym tracking circuitry 35 itself may be distributed in a variety of ways among the circuitry of the computer processor system 10 or interprocessor circuitry.
  • the data of the SS 32, ART 34, and ASDT 36 may be freely stored in a variety of different locations including in the cache 18 itself. In that latter case, the physical cache should be understood to have a regular cache component and a synonym tracking component, each of which may be the subject of separate claim elements.
  • microprocessors that can communicate in a stand-alone and/or a distributed environment(s), and can thus be configured to communicate via wired or wireless communications with other processors, where such one or more processors can be configured to operate on one or more processor-controlled devices that can be similar or different devices.
  • a processor could be a general-purpose processor, a processing core, a context of a multithreaded processor, a graphics processing unit, a special-purpose processor, or any other form of processor that carries out operations that access a memory, as is understood in the art.
  • references to memory can include one or more processor-readable and accessible memory elements and/or components that can be internal to the processor-controlled device, external to the processor-controlled device, and can be accessed via a wired or wireless network.
  • index should be understood to refer generally to the process of using a value to locate and access information related to that value, in the manner of a book index, and is not intended to be limited to the technical meaning of index in the context of a cache memory.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
PCT/US2015/064955 2014-12-26 2015-12-10 Cache accessed using virtual addresses Ceased WO2016105961A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP15874115.7A EP3238074B1 (en) 2014-12-26 2015-12-10 Cache accessed using virtual addresses
JP2017534307A JP6696987B2 (ja) 2014-12-26 2015-12-10 仮想アドレスを使用してアクセスされるキャッシュ
KR1020177020817A KR102448124B1 (ko) 2014-12-26 2015-12-10 가상 주소들을 사용하여 액세스된 캐시
CN201580070399.3A CN107111455B (zh) 2014-12-26 2015-12-10 电子处理器架构以及缓存数据的方法

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US201462096962P 2014-12-26 2014-12-26
US62/096,962 2014-12-26
US201462097342P 2014-12-29 2014-12-29
US62/097,342 2014-12-29
US14/867,926 2015-09-28
US14/867,926 US10089240B2 (en) 2014-12-26 2015-09-28 Cache accessed using virtual addresses

Publications (1)

Publication Number Publication Date
WO2016105961A1 true WO2016105961A1 (en) 2016-06-30

Family

ID=56151409

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2015/064955 Ceased WO2016105961A1 (en) 2014-12-26 2015-12-10 Cache accessed using virtual addresses

Country Status (6)

Country Link
US (1) US10089240B2 (en)
EP (1) EP3238074B1 (en)
JP (1) JP6696987B2 (ja)
KR (1) KR102448124B1 (ko)
CN (1) CN107111455B (zh)
WO (1) WO2016105961A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019069255A1 (en) * 2017-10-06 2019-04-11 International Business Machines Corporation MANAGING EFFECTIVE ADDRESS SYNONYMS IN A LOADING-STORAGE UNIT OPERATING WITHOUT ADDRESS TRANSLATION
US10572257B2 (en) 2017-10-06 2020-02-25 International Business Machines Corporation Handling effective address synonyms in a load-store unit that operates without address translation
US10606590B2 (en) 2017-10-06 2020-03-31 International Business Machines Corporation Effective address based load store unit in out of order processors
US10606591B2 (en) 2017-10-06 2020-03-31 International Business Machines Corporation Handling effective address synonyms in a load-store unit that operates without address translation
US10628158B2 (en) 2017-10-06 2020-04-21 International Business Machines Corporation Executing load-store operations without address translation hardware per load-store unit port
US10977047B2 (en) 2017-10-06 2021-04-13 International Business Machines Corporation Hazard detection of out-of-order execution of load and store instructions in processors without using real addresses
US11175925B2 (en) 2017-10-06 2021-11-16 International Business Machines Corporation Load-store unit with partitioned reorder queues with single cam port

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10089240B2 (en) * 2014-12-26 2018-10-02 Wisconsin Alumni Research Foundation Cache accessed using virtual addresses
US10210088B2 (en) * 2015-12-28 2019-02-19 Nxp Usa, Inc. Computing system with a cache invalidation unit, a cache invalidation unit and a method of operating a cache invalidation unit in a computing system
US9772943B1 (en) * 2016-04-01 2017-09-26 Cavium, Inc. Managing synonyms in virtual-address caches
US20180173637A1 (en) * 2016-12-21 2018-06-21 Intel Corporation Efficient memory aware cache management
US10698836B2 (en) 2017-06-16 2020-06-30 International Business Machines Corporation Translation support for a virtual cache
US10606762B2 (en) 2017-06-16 2020-03-31 International Business Machines Corporation Sharing virtual and real translations in a virtual cache
US10831664B2 (en) 2017-06-16 2020-11-10 International Business Machines Corporation Cache structure using a logical directory
US10402337B2 (en) 2017-08-03 2019-09-03 Micron Technology, Inc. Cache filter
US10324846B2 (en) 2017-09-21 2019-06-18 International Business Machines Corporation Bits register for synonyms in a memory system
US10534616B2 (en) 2017-10-06 2020-01-14 International Business Machines Corporation Load-hit-load detection in an out-of-order processor
US11392508B2 (en) * 2017-11-29 2022-07-19 Advanced Micro Devices, Inc. Lightweight address translation for page migration and duplication
GB2570691B (en) * 2018-02-02 2020-09-09 Advanced Risc Mach Ltd Controlling guard tag checking in memory accesses
CN111124945B (zh) * 2018-10-30 2023-09-22 伊姆西Ip控股有限责任公司 用于提供高速缓存服务的方法、设备和计算机可读介质
US11010067B2 (en) * 2018-12-28 2021-05-18 Intel Corporation Defense against speculative side-channel analysis of a computer system
CN112579170B (zh) * 2020-12-10 2022-11-08 海光信息技术股份有限公司 一种用于减少虚拟地址计算的处理器及其方法
US11461247B1 (en) * 2021-07-19 2022-10-04 Arm Limited Granule protection information compression
US12105634B2 (en) 2021-09-27 2024-10-01 Ati Technologies Ulc Translation lookaside buffer entry allocation system and method
CN113934655B (zh) * 2021-12-17 2022-03-11 北京微核芯科技有限公司 解决高速缓冲存储器地址二义性问题的方法和装置
CN119719018A (zh) * 2024-11-15 2025-03-28 北京航空航天大学 面向多类神经网络协同场景的张量感知片上缓存系统

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030023814A1 (en) * 2000-06-10 2003-01-30 Barroso Luiz Andre Method and system for detecting and resolving virtual address synonyms in a two-level cache hierarchy
US20060101227A1 (en) * 2001-03-30 2006-05-11 Willis Thomas E Method and apparatus for sharing TLB entries
US20080082721A1 (en) * 2006-09-29 2008-04-03 Mips Technologies, Inc. Data cache virtual hint way prediction, and applications thereof
US20130013856A1 (en) * 2006-06-23 2013-01-10 Microsoft Corporation Flash management techniques
WO2013058745A1 (en) * 2011-10-18 2013-04-25 Soft Machines, Inc. Methods and systems for managing synonyms in virtually indexed physically tagged caches

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2839060B2 (ja) * 1992-03-02 1998-12-16 インターナショナル・ビジネス・マシーンズ・コーポレイション データ処理システムおよびデータ処理方法
JPH07287668A (ja) * 1994-04-19 1995-10-31 Hitachi Ltd データ処理装置
US6061774A (en) * 1997-05-23 2000-05-09 Compaq Computer Corporation Limited virtual address aliasing and fast context switching with multi-set virtual cache without backmaps
US20050055528A1 (en) * 2002-12-12 2005-03-10 International Business Machines Corporation Data processing system having a physically addressed cache of disk memory
US20040117587A1 (en) * 2002-12-12 2004-06-17 International Business Machines Corp. Hardware managed virtual-to-physical address translation mechanism
US7017024B2 (en) * 2002-12-12 2006-03-21 International Business Machines Corporation Data processing system having no system memory
US7213125B2 (en) * 2004-07-31 2007-05-01 Hewlett-Packard Development Company, L.P. Method for patching virtually aliased pages by a virtual-machine monitor
CN100414518C (zh) * 2004-11-24 2008-08-27 中国科学院计算技术研究所 改进的虚拟地址变换方法及其装置
US20070101044A1 (en) * 2005-10-27 2007-05-03 Kurichiyath Sudheer Virtually indexed cache system
US9110830B2 (en) * 2012-01-18 2015-08-18 Qualcomm Incorporated Determining cache hit/miss of aliased addresses in virtually-tagged cache(s), and related systems and methods
US8904068B2 (en) * 2012-05-09 2014-12-02 Nvidia Corporation Virtual memory structure for coprocessors having memory allocation limitations
US10089240B2 (en) * 2014-12-26 2018-10-02 Wisconsin Alumni Research Foundation Cache accessed using virtual addresses

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030023814A1 (en) * 2000-06-10 2003-01-30 Barroso Luiz Andre Method and system for detecting and resolving virtual address synonyms in a two-level cache hierarchy
US20060101227A1 (en) * 2001-03-30 2006-05-11 Willis Thomas E Method and apparatus for sharing TLB entries
US20130013856A1 (en) * 2006-06-23 2013-01-10 Microsoft Corporation Flash management techniques
US20080082721A1 (en) * 2006-09-29 2008-04-03 Mips Technologies, Inc. Data cache virtual hint way prediction, and applications thereof
WO2013058745A1 (en) * 2011-10-18 2013-04-25 Soft Machines, Inc. Methods and systems for managing synonyms in virtually indexed physically tagged caches

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3238074A4 *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019069255A1 (en) * 2017-10-06 2019-04-11 International Business Machines Corporation MANAGING EFFECTIVE ADDRESS SYNONYMS IN A LOADING-STORAGE UNIT OPERATING WITHOUT ADDRESS TRANSLATION
US10572257B2 (en) 2017-10-06 2020-02-25 International Business Machines Corporation Handling effective address synonyms in a load-store unit that operates without address translation
US10572256B2 (en) 2017-10-06 2020-02-25 International Business Machines Corporation Handling effective address synonyms in a load-store unit that operates without address translation
US10606590B2 (en) 2017-10-06 2020-03-31 International Business Machines Corporation Effective address based load store unit in out of order processors
US10606593B2 (en) 2017-10-06 2020-03-31 International Business Machines Corporation Effective address based load store unit in out of order processors
US10606591B2 (en) 2017-10-06 2020-03-31 International Business Machines Corporation Handling effective address synonyms in a load-store unit that operates without address translation
US10606592B2 (en) 2017-10-06 2020-03-31 International Business Machines Corporation Handling effective address synonyms in a load-store unit that operates without address translation
US10628158B2 (en) 2017-10-06 2020-04-21 International Business Machines Corporation Executing load-store operations without address translation hardware per load-store unit port
CN111133421A (zh) * 2017-10-06 2020-05-08 国际商业机器公司 在无地址转换的情况下操作的加载存储单元中处理有效地址同义词
GB2579757A (en) * 2017-10-06 2020-07-01 Ibm Handling effective address synonyms in a load-store unit that operates without address translation
US10776113B2 (en) 2017-10-06 2020-09-15 International Business Machines Corporation Executing load-store operations without address translation hardware per load-store unit port
GB2579757B (en) * 2017-10-06 2020-11-18 Ibm Handling effective address synonyms in a load-store unit that operates without address translation
DE112018004006B4 (de) * 2017-10-06 2021-03-25 International Business Machines Corporation Verarbeiten von synonymen von effektiven adressen in einer lade-speicher-einheit, die ohne adressumsetzung arbeitet
US10963248B2 (en) 2017-10-06 2021-03-30 International Business Machines Corporation Handling effective address synonyms in a load-store unit that operates without address translation
US10977047B2 (en) 2017-10-06 2021-04-13 International Business Machines Corporation Hazard detection of out-of-order execution of load and store instructions in processors without using real addresses
US11175925B2 (en) 2017-10-06 2021-11-16 International Business Machines Corporation Load-store unit with partitioned reorder queues with single cam port
US11175924B2 (en) 2017-10-06 2021-11-16 International Business Machines Corporation Load-store unit with partitioned reorder queues with single cam port
CN111133421B (zh) * 2017-10-06 2023-09-29 国际商业机器公司 在无地址转换的情况下操作的加载存储单元中处理有效地址同义词

Also Published As

Publication number Publication date
US10089240B2 (en) 2018-10-02
CN107111455A (zh) 2017-08-29
KR20170100003A (ko) 2017-09-01
JP2018504694A (ja) 2018-02-15
EP3238074A1 (en) 2017-11-01
JP6696987B2 (ja) 2020-05-20
EP3238074A4 (en) 2018-08-08
US20160188486A1 (en) 2016-06-30
EP3238074B1 (en) 2019-11-27
KR102448124B1 (ko) 2022-09-28
CN107111455B (zh) 2020-08-21

Similar Documents

Publication Publication Date Title
EP3238074B1 (en) Cache accessed using virtual addresses
US9251095B2 (en) Providing metadata in a translation lookaside buffer (TLB)
EP0945805B1 (en) A cache coherency mechanism
CN109240950B (zh) 处理器、区分系统管理模式条目的方法以及存储介质
US8782348B2 (en) Microprocessor cache line evict array
US20040117588A1 (en) Access request for a data processing system having no system memory
US9069690B2 (en) Concurrent page table walker control for TLB miss handling
CN100428198C (zh) 改进任务切换的系统和方法
JP2018504694A5 (ja)
US20040117587A1 (en) Hardware managed virtual-to-physical address translation mechanism
US5715427A (en) Semi-associative cache with MRU/LRU replacement
US11620220B2 (en) Cache system with a primary cache and an overflow cache that use different indexing schemes
US9058284B1 (en) Method and apparatus for performing table lookup
US6711653B1 (en) Flexible mechanism for enforcing coherency among caching structures
US11687466B1 (en) Translation lookaside buffer consistency directory for use with virtually-indexed virtually-tagged first level data cache that holds page table permissions
GB2542771A (en) Hazard Checking
US20050055528A1 (en) Data processing system having a physically addressed cache of disk memory
US7017024B2 (en) Data processing system having no system memory
US20040117590A1 (en) Aliasing support for a data processing system having no system memory
US20200210346A1 (en) Software translation prefetch instructions
US12061555B1 (en) Non-cacheable access handling in processor with virtually-tagged virtually-indexed data cache
US20040117583A1 (en) Apparatus for influencing process scheduling in a data processing system capable of utilizing a virtual memory processing scheme
US20040117589A1 (en) Interrupt mechanism for a data processing system having hardware managed paging of disk data
Rao et al. Implementation of Efficient Cache Architecture for Performance Improvement in Communication based Systems
KUMAR Performance improvement by Software controlled Cache Architecture

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15874115

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2017534307

Country of ref document: JP

Kind code of ref document: A

REEP Request for entry into the european phase

Ref document number: 2015874115

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 20177020817

Country of ref document: KR

Kind code of ref document: A