US20170329711A1 - Interleaved cache controllers with shared metadata and related devices and systems - Google Patents


Info

Publication number
US20170329711A1
US20170329711A1 (application US 15/154,812)
Authority
US
United States
Prior art keywords
metadata
memory
metadata store
cache
controllers
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US15/154,812
Other languages
English (en)
Inventor
Daniel Greenspan
Zvika Greenfield
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intel Corp
Original Assignee
Intel Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corp filed Critical Intel Corp
Priority to US 15/154,812
Assigned to Intel Corporation (assignors: Zvika Greenfield, Daniel Greenspan)
Priority to PCT/US2017/027499 (WO 2017/196495 A1)
Publication of US 2017/0329711 A1
Priority to US 16/019,426 (US 10,657,058 B2)
Current legal status: Abandoned


Classifications

    • G06F 12/0851: Cache with interleaved addressing
    • G06F 12/0837: Cache consistency protocols with software control, e.g. non-cacheable data
    • G06F 12/084: Multiuser, multiprocessor or multiprocessing cache systems with a shared cache
    • G06F 12/0895: Caches characterised by the organisation or structure of parts of caches, e.g. directory or tag array
    • G06F 3/0605: Improving or facilitating administration, e.g. storage management, by facilitating the interaction with a user or administrator
    • G06F 3/061: Improving I/O performance
    • G06F 3/0658: Controller construction arrangements
    • G06F 3/0671: In-line storage system
    • G06F 3/0683: Plurality of storage devices
    • G06F 2212/1016: Performance improvement
    • G06F 2212/221: Static RAM
    • G06F 2212/283: Plural cache memories

Definitions

  • Computer and electronic devices have become integral to the lives of many and include a wide range of uses from social media activity to intensive computational data analysis.
  • Such devices can include smart phones, tablets, laptops, desktop computers, network servers, and the like.
  • Memory systems and subsystems play an important role in the implementation of such devices, and are one of the key factors affecting performance. Accordingly, memory systems and subsystems are the subject of continual research and development.
  • FIG. 1 is a schematic view of an exemplary memory system
  • FIG. 2 is a schematic view of an exemplary memory system
  • FIG. 3 is a schematic view of an exemplary memory system
  • FIG. 4 is a schematic view of an exemplary memory system
  • FIG. 5 is a schematic view of an exemplary memory system.
  • FIG. 6 is a schematic view of an exemplary memory system
  • FIG. 7A is a schematic view of an exemplary memory system
  • FIG. 7B is a schematic view of an exemplary memory system
  • FIG. 7C is a schematic view of an exemplary memory system
  • FIG. 8A is a representation of an exemplary metadata entry
  • FIG. 8B is a representation of an exemplary shared metadata entry
  • FIG. 9 is a schematic view of an exemplary system.
  • FIG. 10 is a representation of steps of an exemplary method of a memory system with shared metadata.
  • As used herein, singular forms include plural referents unless the context clearly dictates otherwise; for example, reference to a “bit line” includes support for a plurality of such bit lines.
  • “Coupled” refers to a relationship of electrical or physical connection or attachment between one item and another item, and includes relationships of either direct or indirect connection or attachment. Any number of items can be coupled, such as materials, components, structures, layers, devices, objects, etc.
  • “Directly coupled” refers to a relationship of electrical or physical connection or attachment between one item and another item where the items have at least one point of direct physical contact or otherwise touch one another. For example, when one layer of material is deposited on or against another layer of material, the layers can be said to be directly coupled.
  • Objects or structures described herein as being “adjacent to” each other may be in physical contact with each other, in close proximity to each other, or in the same general region or area as each other, as appropriate for the context in which the phrase is used.
  • the term “substantially” refers to the complete or nearly complete extent or degree of an action, characteristic, property, state, structure, item, or result.
  • an object that is “substantially” enclosed would mean that the object is either completely enclosed or nearly completely enclosed.
  • the exact allowable degree of deviation from absolute completeness may in some cases depend on the specific context. However, generally speaking, the nearness of completion will be so as to have the same overall result as if absolute and total completion were obtained.
  • the use of “substantially” is equally applicable when used in a negative connotation to refer to the complete or near complete lack of an action, characteristic, property, state, structure, item, or result.
  • a composition that is “substantially free of” particles would either completely lack particles, or so nearly completely lack particles that the effect would be the same as if it completely lacked particles.
  • a composition that is “substantially free of” an ingredient or element may still actually contain such item as long as there is no measurable effect thereof.
  • the term “about” is used to provide flexibility to a numerical range endpoint by providing that a given value may be “a little above” or “a little below” the endpoint. However, it is to be understood that even when the term “about” is used in the present specification in connection with a specific numerical value, that support for the exact numerical value recited apart from the “about” terminology is also provided.
  • Interleaved memory is a design made to compensate for the relatively slow speed of dynamic random-access memory (DRAM) by spreading memory addresses evenly across memory channels. In this way, contiguous memory read and write operations use each memory channel in turn, resulting in higher memory throughput. This is achieved by allowing memory channels to perform the desired operations in parallel, yet not forcing individual non-contiguous memory transactions into issuing the excessively large transactions that would result if the data bus to memory were merely widened.
  • Memory systems, including one level memory (1LM) systems that implement high bandwidth using multiple memory controllers for memory such as DRAM, can interleave memory transactions between controllers.
  • An operating system allocates memory in chunks. For example, a program executing on the OS may request an allocation of memory for its data and the OS will provide this allocation as a non-sequential series of chunks of a specified size.
  • the use of fixed-size chunks when allocating memory allows large allocations of memory to be made even where, as a result of continuous software operations, memory has become highly fragmented.
  • a typical OS will allocate memory in chunks of 4K bytes (4 KByte).
  • a system may implement a plurality of memory controllers to increase efficiency.
  • It may be undesirable for the interleave granularity between memory controllers to be 4K, as this may result in a read of an entire 4K chunk being serviced by only a single memory controller, and a single memory channel. Therefore, requests can be interleaved at a size smaller than the size allocated by the OS. For example, requests for 256 bytes of data interleaved between controllers at 128-byte granularity can be serviced by more than one memory controller in parallel. Similarly, a request to read an entire 4-KByte OS page could be serviced by multiple controllers in parallel.
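  • The effect of interleaving below the OS allocation size can be sketched as follows; the 128-byte granularity and two controllers come from the example above, and the helper name is illustrative, not the patent's:

```python
# Sketch: which memory controllers service a request under a fixed
# interleave granularity (constants mirror the example in the text).
INTERLEAVE = 128       # bytes per interleave unit
NUM_CONTROLLERS = 2

def controllers_for(addr: int, length: int) -> set[int]:
    """Set of controller indices servicing the byte range [addr, addr + length)."""
    first = addr // INTERLEAVE
    last = (addr + length - 1) // INTERLEAVE
    return {unit % NUM_CONTROLLERS for unit in range(first, last + 1)}

print(controllers_for(512, 256))   # a 256-byte request engages both controllers
print(controllers_for(0, 4096))    # so does a full 4-KByte OS-page read
print(controllers_for(0, 128))     # a single 128-byte request uses just one
```

With a 4-KByte interleave the same page read would map every unit to one controller, which is the single-channel bottleneck the text describes.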
  • A memory system with two cache controllers connected to two memory controllers may maintain tags within each cache controller for half-OS-pages rather than OS-pages, causing a 100% size/cost impact for the large tag arrays.
  • a different memory system may limit the interleave between cache controllers to OS page size, causing a 50% loss in stream bandwidth.
  • a different memory system may, in addition to limiting the interleave between cache controllers to OS page size, add a memory fabric between cache controllers and memory controllers, causing a multi-cycle latency penalty.
  • One or more cache controllers may be implemented in memory systems to control local storage of cached data.
  • In a memory-side cache, such as in a two level memory (2LM) system, bandwidth requirements typically necessitate the use of multiple cache controllers.
  • The memory may store all the data but may be slow; therefore, a portion of the data stored in the memory will be stored locally in the cache and managed by the cache controllers.
  • the cache controllers are capable of holding entries that relate to 4 Kbyte of memory allocations, in line with the allocation granularity of an OS.
  • the cache controllers may store data locally and hold the metadata on-die in a static random-access memory (SRAM) array to allow quick identification of the data stored locally.
  • The cache controllers may store metadata that will typically include cache tags. Each cache controller has an upper limit on how many cache tags or pieces of metadata may be stored.
  • Various embodiments provide a metadata store fabric that provides a plurality of cache controllers with shared access to a plurality of metadata stores.
  • a metadata store fabric may be hardware that is a set of connections between metadata stores and cache controllers that allow an exchange of data between the metadata stores and the cache controllers.
  • Embodiments exemplified herein include memory devices, systems and methods that re-distribute storage and handling of memory-side cache metadata utilizing a mesh structure between multiple cache controllers and multiple metadata stores.
  • the mesh structure may be a hardware structure and may also be referred to as a “metadata store fabric” or simply “fabric”.
  • the metadata stores may store the metadata or cache tags as shared distributed metadata.
  • the shared distributed metadata allows a first cache controller to send information such as cache tags or metadata to a metadata store connected through the metadata store fabric.
  • the metadata store then converts the cache tag into, or stores it as, shared distributed metadata and provides shared access to it, allowing a second cache controller to access the shared distributed metadata that is based on the information from the first cache controller.
  • This allows the second cache controller to carry out an operation based on cache tags or metadata without the need to allocate an additional metadata entry.
  • The second cache controller, or all of the cache controllers in the memory system, may be able to operate more efficiently at a higher bandwidth without increasing the capacity or size of the local store of the cache controller. For example, 256-byte requests may be handled by two cache controllers in parallel and by two memory controllers in parallel.
  • the present disclosure utilizes tag and valid bits.
  • the tags and valid bits are part of the metadata or shared distributed metadata that allow operations on the memory to occur.
  • The shared distributed metadata also introduces lock bits that lock the shared distributed metadata until the lock bit is cleared by the associated cache controller. This ensures that the shared distributed metadata is not cleared from a metadata store while it is still needed for operations, and possible update, by a given cache controller.
  • the mesh structure allows for efficient operation with OS-page-granularity cache entries, and hence metadata entries, in terms of metadata usage.
  • the mesh also allows for efficient memory interleaving between cache controllers at sub-OS-page-size granularity in terms of optimized data path.
  • metadata stores may be combined with various techniques to achieve zero additional latency for all cache hit transactions even when sub-page interleaving is used.
  • FIG. 1 shows a system-on-chip (SOC) 102 with a basic 1 LM system.
  • the SOC 102 includes a central processing unit (CPU) 104 for processing data in a computer system. It should be appreciated that CPU 104 may comprise integrated caches, not pictured, which are integrated into subsystems of CPU 104 .
  • The SOC 102 also comprises an integrated display controller 106, which controls the output of display data to a user on a display such as a screen.
  • The SOC 102 additionally comprises an IO subsystem 108, an input/output system for inputting and outputting data for the SOC 102.
  • the SOC 102 also comprises a system fabric 110 , which can be a hardware fabric for connecting a memory controller 112 and other components of the SOC 102 together.
  • the memory controller 112 is a dedicated hardware incorporated into the SOC 102 for controlling the memory 114 .
  • the memory 114 is DRAM, but it should be appreciated that the memory 114 may be other types of memory as well.
  • FIG. 1 shows a 1LM system where the operating system employs a 4-KByte page and memory 114 has a 4-KByte page size. In one example, two adjacent OS-allocated pages of data, “A” and “B”, are shown stored in memory 114.
  • FIG. 2 shows a 1LM system with a SOC 200 that has multiple memory controllers.
  • the SOC 200 may include some of the components of SOC 102 .
  • the SOC 200 includes two memory controllers, specifically the memory controller 204 and the memory controller 206 that are connected to the system fabric 110 via a memory fabric 202 .
  • the memory fabric 202 is hardware configured to interleave across the two memory controllers as well as the memory 208 and the memory 210 . For example, the interleave may occur every 4K bytes.
  • Although the memory controller and memory bandwidth have theoretically been doubled, the peak stream bandwidth of the system of FIG. 2 will remain little changed when compared to the system of FIG. 1, since each 4-KByte OS page is still serviced by a single memory controller.
  • FIG. 3 shows a 1 LM system with a SOC 300 that has multiple memory controllers.
  • the SOC 300 may comprise some of the components of the SOC 102 and/or 200 and illustrates how the memory fabric 202 of SOC 200 may be configured differently in FIG. 3 .
  • the SOCs 102 , 200 , and 300 depict examples where a SOC may issue multiple read requests simultaneously.
  • the SOC 300 depicts embodiments that improve or optimize the performance ‘stream bandwidth’ where such multiple read requests exist.
  • the system may request to read 256 bytes, which may be one sixteenth of a memory page such as a DRAM page.
  • each OS page has been sliced, such that A becomes A0 and A1.
  • A0 contains data for bytes 0-127, 256-383, 512-639, 768-895, 1024-1151, 1280-1407, 1536-1663, 1792-1919, 2048-2175, 2304-2431, 2560-2687, 2816-2943, 3072-3199, 3328-3455, 3584-3711, and 3840-3967 within the page, and
  • A1 contains data for bytes 128-255, 384-511, 640-767, 896-1023, 1152-1279, 1408-1535, 1664-1791, 1920-2047, 2176-2303, 2432-2559, 2688-2815, 2944-3071, 3200-3327, 3456-3583, 3712-3839, and 3968-4095 within the page.
  • a request to read 256 sequential bytes, such as from address 512 to address 767, will be serviced by both the memory controller 204 and the memory 302 (bytes 512-639) and the memory controller 206 and the memory 304 (bytes 640-767), realizing a doubling of bandwidth compared to the SOC 102 of FIG. 1.
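  • The A0/A1 byte ranges listed above follow directly from a 128-byte interleave across two slices; a short sketch (illustrative names, not from the patent) reproduces them:

```python
# Sketch: enumerate the byte ranges of a 4-KByte OS page that land in each
# slice under a 128-byte interleave (A0 = slice 0, A1 = slice 1).
INTERLEAVE = 128
PAGE = 4096

def slice_ranges(slice_index: int) -> list[tuple[int, int]]:
    return [(start, start + INTERLEAVE - 1)
            for start in range(0, PAGE, INTERLEAVE)
            if (start // INTERLEAVE) % 2 == slice_index]

print(slice_ranges(0)[:3])  # [(0, 127), (256, 383), (512, 639)]
print(slice_ranges(1)[:3])  # [(128, 255), (384, 511), (640, 767)]
```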
  • FIG. 4 shows a 2LM system with a SOC 400 .
  • the SOC 400 may include some of the components of the SOCs 102 , 200 , and/or 300 .
  • the SOC 400 depicts embodiments which further include a cache controller 408 and a cache controller 410 disposed between the system fabric 110 and the memory controller 204 and the memory controller 206 respectively.
  • FIG. 4 depicts a memory 402 further comprising a memory 404 and SOC memory controller 406 .
  • the memory controller 406 is connected to the cache controller 408 and the cache controller 410 .
  • FIG. 4 also depicts a memory 418 comprising the memory 414 and memory 416 connected to SOC memory controller 204 and SOC memory controller 206 respectively.
  • Memory 414 and 416 provide relatively fast data storage for the cache controller 408 and the cache controller 410 , thereby allowing fast access to cached data of memory 404 .
  • the storage of pages A and B in the memory 414 and 416 may be similar to what is described in the system of FIG. 3 .
  • the position of the pages within each memory may be influenced by the organizational policies, such as the use of ways 0, 1, 2, 3, and 4, of the cache controller 408 and the cache controller 410 .
  • Cache tags are references indicating which portions of the main memory 404 are held in which pages of the cache, and are maintained by each cache controller.
  • For an OS page “in use”, such as A, there is a double overhead of assigning, storing, looking up, and maintaining tags, where the cache controller 408 maintains the tag for A0 and the cache controller 410 maintains the tag for A1.
  • One design approach to avoid this double overhead is to use a single cache controller.
  • FIG. 5 shows a 2LM system with a SOC 500 .
  • the SOC 500 may comprise some of the components of the SOCs 102 , 200 , 300 , and/or 400 .
  • the SOC 500 depicts a larger interleave between the two cache controllers (for example 4 KByte) as compared to the SOC 400 of FIG. 4 , such that an entire OS page is handled by a single cache controller.
  • this large interleave causes the bandwidth limitations similar to the SOC 200 of FIG. 2 , as only one memory controller handles each OS page.
  • FIG. 6 shows a 2LM system with a SOC 600 .
  • the SOC 600 may comprise some of the components of the SOCs 102 , 200 , 300 , 400 , and/or 500 .
  • the SOC 600 connects to the memory 604 and memory 608 .
  • the SOC 600 depicts an additional fabric, a memory fabric 602 , which is disposed between the cache controllers 408 and 410 and the memory controllers 204 and 206 .
  • Memory fabric 602 provides interleaving at the memory with sub-OS-page granularity (for example, 128 bytes or other values), while still allowing the cache controllers to be interleaved by memory fabric 202 at OS-page granularity (for example 4 KByte).
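  • The two granularities of FIG. 6 (OS-page interleave between cache controllers, 128-byte interleave between memory controllers) can be sketched as follows; the constants and helper names are assumptions for illustration:

```python
# Sketch: in the FIG. 6 arrangement a page belongs to one cache controller,
# while its data is still striped across both memory controllers.
CACHE_INTERLEAVE = 4096   # OS-page granularity between cache controllers
MEM_INTERLEAVE = 128      # sub-OS-page granularity between memory controllers

def cache_controller_for(addr: int) -> int:
    return (addr // CACHE_INTERLEAVE) % 2

def memory_controller_for(addr: int) -> int:
    return (addr // MEM_INTERLEAVE) % 2

page_a = range(0, 4096, 128)                      # fragment addresses of page A
print({cache_controller_for(a) for a in page_a})  # one cache controller owns A
print({memory_controller_for(a) for a in page_a}) # both memory controllers serve it
```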
  • FIG. 7A shows a 2LM system with a SOC 700 in accordance with various embodiments.
  • the SOC 700 may comprise some of the components of the SOCs 102 , 200 , 300 , 400 , 500 , and/or 600 .
  • the SOC 700 depicts embodiments of the present disclosure that may overcome at least some of the described limitations of the SOCs 400 , 500 , and/or 600 .
  • the SOC 700 depicts a metadata store fabric 702 , a metadata store 704 , and a metadata store 706 .
  • FIG. 7A further depicts SOC 700 connected to memories 708 and 710 .
  • the metadata stores 704 and 706 are on-die metadata storage blocks that service the cache controllers, but are separated from the cache controllers and each serve a multiplicity of cache controllers.
  • the metadata store is a static random-access memory (SRAM) array.
  • Each metadata store may serve a multiplicity of the cache controllers in the SOC 700 .
  • the metadata stores 704 and 706 are assembled as separate metadata storages but can be implemented in the same or different memory devices. It should be appreciated that the SOC 700 depicts two cache controllers and two metadata stores, but any number or combination of cache controllers and metadata stores can be used. In a given SOC for example, the number of cache controllers may be greater than the number of metadata stores, the number of metadata stores may be greater than the number of cache controllers, or the system may include only one metadata store for a plurality of cache controllers.
  • a logic block is added that is assigned responsibility for some of the tasks that would generally be assigned to a cache controller.
  • these tasks may include maintaining least recently used (LRU) indications, and re-allocating the clean entry with the highest LRU when a cache allocation to a new system memory address is required.
  • Various embodiments may achieve the same interleave as shown in the SOC 600 of FIG. 6 , but without the latency and wide data paths of the memory fabric. The additional latency of the metadata store fabric may be mitigated by the use of various techniques.
  • identically-offset fragments of the pages stored in multiple ways of a cache set are stored together in a single DRAM page, facilitating the ability of the memory controller to issue the DRAM page open requests on the assumption that the requested data will be found in the cache, but prior to knowing in which way it is to be found.
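  • One way to picture this co-location of identically-offset fragments is the following layout sketch; the fragment size, way count, DRAM row size, and address formula are all assumptions for illustration, not taken from the patent:

```python
# Sketch: store fragment f of every way of a set contiguously, so the DRAM
# row to open is known from (set, fragment) alone, before the hit way is known.
FRAG = 128             # fragment size in bytes (assumed)
WAYS = 4               # ways per cache set (assumed)
FRAGS = 4096 // FRAG   # fragments per 4-KByte page
DRAM_ROW = 2048        # DRAM row (page) size in bytes (assumed)

def cache_mem_addr(cache_set: int, frag: int, way: int) -> int:
    return ((cache_set * FRAGS + frag) * WAYS + way) * FRAG

def dram_row(addr: int) -> int:
    return addr // DRAM_ROW

# All four candidate ways of (set=5, fragment=9) share one DRAM row, so the
# row-open request can be issued before the tag lookup identifies the way:
rows = {dram_row(cache_mem_addr(5, 9, w)) for w in range(WAYS)}
print(len(rows))  # 1
```

The grouping works because a group of WAYS fragments (512 bytes here) evenly divides the assumed DRAM row size, so a group never straddles a row boundary.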
  • FIG. 7B shows a 2LM system with a SOC 701 upon which embodiments of the present disclosure are implemented.
  • the SOC 701 may be described as an alternate configuration of SOC 700 of FIG. 7A .
  • the metadata stores 704 and 706 in SOC 701 are each physically co-located with the cache controllers 408 and 410 respectively.
  • the presence of the metadata store fabric 702 allows the metadata stores 704 and 706 to logically operate in a similar manner to what was described for SOC 700 of FIG. 7A .
  • FIG. 7B depicts different locations of the metadata stores 704 and 706 relative to their locations in FIG. 7A; however, the physical location of each metadata store 704 and 706 need not affect the general connectivity of the metadata stores 704 and 706 to the cache controllers 408 and 410 for the described operations. It should be appreciated that the physical proximity of each metadata store to a cache controller may allow simplified construction of derivative designs, for example a ‘chopped’ derivative containing only one cache controller, one metadata store, and one memory controller, or a ‘high-end’ derivative containing four cache controllers, four metadata stores, and four memory controllers.
  • FIG. 7C shows a 2LM system with a SOC 703 upon which embodiments of the present disclosure are implemented.
  • SOC 703 is an alternate configuration of the SOC 700 of FIG. 7A or the SOC 701 of FIG. 7B.
  • the system 703 further comprises the common logic block 710 .
  • the common logic block 710 is a logic block connected with the metadata store fabric 702 .
  • the common logic block 710 is added to SOC 703 so that each of the metadata stores are not required to comprise their own logic block that is responsible for tasks. For example, these tasks may include scrubbing, maintaining least recently used (LRU) indications, and re-allocating the clean entry with the highest LRU when a cache allocation to a new system memory address is required.
  • FIGS. 7A-C of the present disclosure depict the development of a “shared metadata entry” that allows cache controllers to each access shared, distributed metadata without the risk of corrupting metadata used by the other cache controllers sharing that metadata entry.
  • FIG. 8A is a representation of one type of a standard metadata entry.
  • The metadata entry 802 is a standard or typical metadata entry and may be used in a set-associative sectored cache.
  • the metadata 802 employs fourteen tag bits for address matching.
  • eight valid bits each report on the validity of 512 bytes of data of that entry.
  • eight dirty bits indicate whether the data must be scrubbed to main memory before the entry is re-allocated.
  • Three LRU bits track order of use (in relation to other entries of the same cache set), and one pinned (“P”) bit captures that software has requested that the entry not be re-allocated.
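  • The bit budget just described (14 tag + 8 valid + 8 dirty + 3 LRU + 1 pinned = 34 bits) can be sketched as a pack/unpack pair; only the field widths come from the text, and the field ordering is an assumption:

```python
# Sketch: pack/unpack the FIG. 8A metadata entry fields into one integer.
TAG_BITS, VALID_BITS, DIRTY_BITS, LRU_BITS, P_BITS = 14, 8, 8, 3, 1

def pack(tag: int, valid: int, dirty: int, lru: int, pinned: int) -> int:
    assert tag < (1 << TAG_BITS) and valid < 256 and dirty < 256
    assert lru < 8 and pinned < 2
    word = tag
    word = (word << VALID_BITS) | valid
    word = (word << DIRTY_BITS) | dirty
    word = (word << LRU_BITS) | lru
    word = (word << P_BITS) | pinned
    return word                      # 34 bits in total

def unpack(word: int) -> tuple:
    pinned = word & 1; word >>= P_BITS
    lru = word & 0b111; word >>= LRU_BITS
    dirty = word & 0xFF; word >>= DIRTY_BITS
    valid = word & 0xFF; word >>= VALID_BITS
    return word, valid, dirty, lru, pinned   # remaining bits are the tag

entry = pack(tag=0x2A5, valid=0b00001111, dirty=0b00000001, lru=5, pinned=0)
print(unpack(entry))  # (677, 15, 1, 5, 0)
```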
  • FIG. 8B depicts shared metadata entry 804, which may be a metadata entry shared among cache controllers as employed by embodiments of the present disclosure.
  • a division of “valid” and “dirty” bits occurs according to each controller.
  • valid[3:0] may refer to bytes 0-127, 256-383, 512-639 and 768-895, all of which may be handled by cache controller 408 of FIG. 7A .
  • valid[7:4] may refer to bytes 128-255, 384-511, 640-767, and 896-1023, all of which may be handled by cache controller 410 of FIG. 7A .
  • “Lock” bits are included for each cache controller. It should be appreciated that “lock” bits relate to the valid and dirty bits of a given cache controller.
  • Lock 0 (depicted as L[0]) would relate to valid[3:0] and dirty[3:0] for cache controller 408 of FIG. 7A.
  • Lock 1 (depicted as L[1]) would relate to valid[7:4] and dirty[7:4] for cache controller 410 of FIG. 7A.
  • an assertion of a “lock” bit indicates that the respective controller has taken a local copy of its “dirty” and “valid” bits for that entry, and that these should not be changed except by that cache controller.
  • the shared metadata entry 804 may be further enhanced by the addition of a lock bit related to the common logic block 710 of FIG. 7C .
  • a lock bit is not strictly needed but may be optionally added.
  • An additional lock bit L[ 2 ] (not depicted) may be added to the metadata entry 804 .
  • the additional lock bit L[ 2 ] may cause the metadata store to request that the common logic block 710 complete its task and release this lock before the metadata from the metadata store is given to the requesting cache controller.
  • any entry not in use by any controller will have its “lock” bits clear.
  • the metadata store is then free to initiate a scrub of the dirty data for that entry and, for clean entries, to re-allocate at will. For example, a re-allocation may occur according to a least recently used (LRU) protocol or other algorithm.
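A minimal sketch of this re-allocation rule: the metadata store may only scrub or re-allocate entries whose lock bits are all clear, and among clean, unlocked entries it might choose a victim by LRU. The entry representation (a plain dictionary) is hypothetical.

```python
def pick_victim(entries):
    """Pick a re-allocation victim from a cache set.

    Each entry is a hypothetical dict with 'locks' (per-controller
    lock bits), 'dirty' (per-sector dirty bits) and 'lru' (higher
    means less recently used). Only entries with all lock bits clear
    and no dirty data are eligible for immediate re-allocation.
    """
    eligible = [e for e in entries
                if not any(e['locks']) and not any(e['dirty'])]
    if not eligible:
        return None  # must first scrub dirty entries or wait for locks
    return max(eligible, key=lambda e: e['lru'])
```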
  • When one of the cache controllers receives a transaction to a memory address, it sends a request to the appropriate metadata store to check the appropriate tags for a match (indicating that this memory address has been allocated in the cache); such tags are common to the cache controllers.
  • the copy of the contents delivered to the requesting cache controller need not include valid or dirty bits belonging to one of the other controllers.
  • the receiving cache controller serves that transaction as well as any further ones to the other parts of the same OS page that are assigned to it due to the chosen interleave.
  • the cache controller may update the values of its local copy of the “valid” and “dirty” bits for that entry to reflect the cache operations it has performed.
  • when the cache controller has completed handling all transactions relating to this entry, it will send an update to the metadata store of the appropriate “valid” and “dirty” bits for that cache controller. In one embodiment, the receipt of this update causes the lock bit for the requesting cache controller to be cleared in the entry at the metadata store.
  • By virtue of the assignment shown for shared metadata entry 804, regarding which parts of the “valid” and “dirty” fields may be updated by a given cache controller, the problem of stale metadata belonging to one cache controller being written as part of an update by one of the other cache controllers is avoided. Such a mechanism allows multiple cache controllers, independently and simultaneously and with no synchronization or communication between them, to access and update a single shared metadata entry without risk of corrupting the “valid” or “dirty” bits relating to data of the entry handled by one of the other cache controllers, because the shared metadata entry is locked.
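The checkout-and-update flow described above can be modeled in a few lines. Because each controller writes back only its own 4-bit slice of the valid and dirty fields, simultaneous updates by the two controllers cannot corrupt each other's bits. This is an illustrative model of shared metadata entry 804, not the hardware implementation.

```python
class SharedMetadataEntry:
    """Model of shared metadata entry 804: two controllers, each
    owning a 4-bit slice of the valid/dirty fields plus a lock bit."""

    def __init__(self):
        self.valid = 0               # valid[7:0]
        self.dirty = 0               # dirty[7:0]
        self.locks = [False, False]  # L[0], L[1]

    def checkout(self, ctrl):
        """Controller takes a local copy of its bits; its lock is set."""
        self.locks[ctrl] = True
        shift = 4 * ctrl
        return (self.valid >> shift) & 0xF, (self.dirty >> shift) & 0xF

    def update(self, ctrl, valid_nibble, dirty_nibble):
        """Controller writes back only its own nibble; its lock clears.
        The other controller's bits are never touched."""
        shift = 4 * ctrl
        mask = 0xF << shift
        self.valid = (self.valid & ~mask) | (valid_nibble << shift)
        self.dirty = (self.dirty & ~mask) | (dirty_nibble << shift)
        self.locks[ctrl] = False
```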
  • the metadata store will again be able to perform scrubbing and re-allocation of entries.
  • the metadata store may also have a mechanism or protocol to instruct a cache controller to send its update in order to release the lock bit.
  • scrubbing is the process of taking a ‘dirty’ cache data entry (i.e., one that contains newer data than the main memory) and making it ‘clean’ (i.e., containing the same data as main memory).
  • a ‘clean’ cache data entry may become ‘dirty’ as a result of a write command with new data being received from the CPU. Scrubbing is accomplished by copying the data from the cache to the main memory, which results in the data of both cache and main memory being once again identical, hence this cache data entry can now be considered ‘clean’.
  • scrubbing dirty cache data while a lock bit for that entry is set may be possible, provided that the cache controller that set the lock bit indicates to the metadata store whether additional writes were received to that data while the entry was “locked”. Such writes are possible because the cache controller has taken a local copy of its “dirty” and “valid” bits for that entry. It is sufficient for a cache controller to notify a metadata store whether additional writes (for example, from the CPU) were received to cache data that was already dirty. This allows the metadata store to decide whether an entry that was scrubbed while “locked” may remain clean (if no additional writes were received, so the cache data is the same as main memory) or should be marked dirty (if additional writes were received and written to the cache data, so the cache data cannot be expected to match main memory).
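The decision rule above might be modeled as follows; the data copy to main memory is not represented, only the metadata outcome, and the entry representation is hypothetical.

```python
def scrub_locked_entry(entry, ctrl, writes_while_locked):
    """Scrub a dirty entry whose lock bit is set for controller `ctrl`.

    Only the metadata decision is modeled. The controller's dirty
    nibble may be marked clean only if the locking controller reports
    that no additional writes arrived while it held the lock.
    """
    shift = 4 * ctrl
    if not writes_while_locked:
        entry['dirty'] &= ~(0xF << shift)  # cache now matches main memory
    # otherwise the nibble stays dirty: new writes landed after the copy
    return entry
```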
  • when serving transaction requests from an agent that may be expected to access a stream of data, the metadata store may choose to pro-actively send metadata also to cache controller(s) that did not request it, and to set the appropriate lock bit.
  • the stream of data may be a display controller streaming data to the display as advised to the metadata store by the cache controller.
  • the non-requesting cache controllers may then match incoming cache access requests against the metadata and know not to send a metadata request to the metadata store, because they already have the result of such a request. This allows those controllers to be prepared should they receive a request to the same OS page as was requested in the initial request.
  • logic of the metadata store could request that the cache controllers perform the scrubbing.
  • the logic of the metadata store could send a request to the cache controller to write the cache data for a particular entry to main memory and notify the metadata store when that was done.
  • the metadata store may read the data cached by the cache controllers from the memory accessed by the memory controllers, either directly or via request to the cache controllers, and write this to main memory.
  • FIG. 9 depicts an exemplary system upon which embodiments of the present disclosure may be implemented.
  • the system of FIG. 9 may be a computer system.
  • the system can include a memory controller 902 , a plurality of memory 904 , a processor 906 , and circuitry 908 .
  • the circuitry can be configured to implement the hardware described herein for system 700 , 701 , and/or 703 of FIGS. 7A-C .
  • Various embodiments of the system of FIG. 9 can include smart phones, laptop computers, handheld and tablet devices, CPU systems, SoC systems, server systems, networking systems, storage systems, high capacity memory systems, or any other computational system.
  • the system can also include an I/O (input/output) interface 910 for controlling the I/O functions of the system, as well as for I/O connectivity to devices outside of the system.
  • a network interface can also be included for network connectivity, either as a separate interface or as part of the I/O interface 910 .
  • the network interface can control network communications both within the system and outside of the system.
  • the network interface can include a wired interface, a wireless interface, a Bluetooth interface, optical interface, and the like, including appropriate combinations thereof.
  • the system can additionally include various user interfaces, display devices, as well as various other components that would be beneficial for such a system.
  • the system can also include memory in addition to memory 904 that can include any device, combination of devices, circuitry, and the like that is capable of storing, accessing, organizing and/or retrieving data.
  • memory 904 can include any device, combination of devices, circuitry, and the like that is capable of storing, accessing, organizing and/or retrieving data.
  • Non-limiting examples include SANs (Storage Area Network), cloud storage networks, volatile or non-volatile RAM, phase change memory, optical media, hard-drive type media, and the like, including combinations thereof.
  • the processor 906 can be a single or multiple processors, and the memory can be a single or multiple memories.
  • the local communication interface can be used as a pathway to facilitate communication between any of a single processor, multiple processors, a single memory, multiple memories, the various interfaces, and the like, in any useful combination.
  • any system can include and use a power supply such as, but not limited to, a battery, an AC-DC converter to receive alternating current and supply direct current, a renewable energy source (e.g., solar power or motion-based power), or the like.
  • the disclosed embodiments may be implemented, in some cases, in hardware, firmware, software, or any combination thereof. Portions of the disclosed embodiments may also be implemented as instructions carried by or stored on a transitory or non-transitory machine-readable (e.g., computer-readable) storage medium, which may be read and executed by one or more processors.
  • a machine-readable storage medium may be embodied as any storage device, mechanism, or other physical structure for storing or transmitting information in a form readable by a machine (e.g., a volatile or non-volatile memory, a media disc, or other media device).
  • Reference to storage, stores, memory, or memory devices can refer to memory whose state is indeterminate if power is interrupted to the device (e.g., DRAM) or to memory devices whose state is determinate even if power is interrupted to the device.
  • such an additional memory device can comprise a block addressable mode memory device, such as planar or multi-dimensional NAND or NOR technologies, or more specifically, multi-threshold level NAND flash memory, NOR flash memory, and the like.
  • a memory device can also include a byte-addressable three dimensional crosspoint memory device, or other byte addressable write-in-place nonvolatile memory devices, such as single or multi-level Phase Change Memory (PCM), memory devices that use chalcogenide phase change material (e.g., chalcogenide glass), resistive memory, nanowire memory, ferroelectric transistor random access memory (FeTRAM), magnetoresistive random access memory (MRAM) memory that incorporates memristor technology, or spin transfer torque (STT)-MRAM.
  • FIG. 10 depicts a flowchart of a method for sharing metadata and metadata stores.
  • the method can be executed as instructions on a machine, where the instructions are included on at least one computer readable medium or one non-transitory machine-readable storage medium.
  • the circuitry 908 of FIG. 9 is configured to carry out the steps of FIG. 10 .
  • the systems depicted in FIGS. 7A-C may be employed to carry out the steps of FIG. 10 .
  • the method can include the operation of: connect a metadata store with a plurality of cache controllers via a metadata store fabric, as in block 1002 .
  • the method can include the operation of: receive information at the metadata store from at least one of the plurality of cache controllers, as in block 1004 .
  • the method can include the operation of: store the information as shared distributed metadata in the metadata store, as in block 1006 .
  • the method can include the operation of: provide shared access of the shared distributed metadata to the plurality of cache controllers, as in block 1008.
  • the method can include the operation of: assign a task to a logic block wherein the task executed at the logic block operates on the shared distributed metadata, as in block 1010 .
  • the method can include the operation of: lock the valid bits and dirty bits of a given cache controller via a lock bit indicating that the valid bits and dirty bits of the given cache controller should not be changed except by the given cache controller, as in block 1012 .
  • the method can include the operation of: upon completion of relevant transactions at a given cache controller, update the appropriate metadata store of appropriate valid bits and dirty bits, which causes a lock bit to be cleared, as in block 1014. It should be appreciated that the method of FIG. 10 may not include all of the steps depicted, nor need the steps be performed in the order depicted.
  • a memory system comprising:
  • circuitry configured to:
  • a metadata store in communication with the at least one cache controller with circuitry configured to:
  • a metadata store fabric disposed between the plurality of cache controllers and the at least one metadata store to facilitate the shared access.
  • the information is related to a task assigned to one of the plurality of cache controllers.
  • the metadata store fabric further comprises a common logic block to manage the task assigned to one of the plurality of cache controllers.
  • the metadata store further comprises a logic block to manage the task assigned to one of the plurality of cache controllers.
  • the metadata store is one of a plurality of metadata stores.
  • the metadata store is one of a plurality of metadata stores and the number of the plurality of metadata stores corresponds to the number of the plurality of cache controllers.
  • the metadata store is one of a plurality of metadata stores and the number of the plurality of metadata stores is greater than the number of the plurality of cache controllers.
  • the metadata store is a static random-access memory (SRAM) array.
  • one of the tasks assigned to the metadata store comprises maintaining least recently used (LRU) indications.
  • one of the tasks assigned to the metadata store comprises re-allocating an entry based on the least recently used (LRU) indication when a new system memory address is to be cached.
  • the shared distributed metadata hosted by the metadata store comprises valid bits and dirty bits.
  • the shared distributed metadata hosted by the metadata store comprises lock bits pertaining to the plurality of cache controllers.
  • a lock bit is to assert that the valid bits and dirty bits of a given cache controller are locked and are not changed except by the given cache controller.
  • one of the plurality of cache controllers upon completion of all transactions relating to a metadata entry, is to update the metadata store of appropriate valid bits and dirty bits and cause a lock bit to be cleared.
  • a logic block is configured to identify dirty entries for a scrubbing operation wherein the logic block is associated with the metadata store fabric or the metadata store.
  • a system comprising:
  • processors configured to process data
  • an input output subsystem configured to receive input data and to output data
  • circuitry configured to:
  • a cache controller fabric disposed between the system fabric and the plurality of cache controllers
  • a metadata store in communication with the plurality of cache controllers with circuitry configured to:
  • a metadata store fabric disposed between the plurality of cache controllers and the plurality of metadata stores
  • a system fabric configured to connect the one or more processors and the input output subsystem to the plurality of memory controllers and the plurality of cache controllers.
  • the information is related to a task assigned to one of the plurality of cache controllers.
  • the metadata store fabric further comprises a common logic block to manage the task assigned to one of the plurality of cache controllers.
  • the metadata store further comprises a logic block to manage the task assigned to one of the plurality of cache controllers.
  • the metadata store is one of a plurality of metadata stores.
  • the metadata store is one of a plurality of metadata stores and the number of the plurality of metadata stores corresponds to the number of the plurality of cache controllers.
  • the metadata store is one of a plurality of metadata stores and the number of the plurality of metadata stores is greater than the number of the plurality of cache controllers.
  • the metadata store is a static random-access memory (SRAM) array.
  • one of the tasks assigned to the metadata store comprises maintaining least recently used (LRU) indications.
  • one of the tasks assigned to the metadata store comprises re-allocating an entry based on the least recently used (LRU) indication when a new system memory address is to be cached.
  • the shared distributed metadata hosted by the metadata store comprises valid bits and dirty bits.
  • the shared distributed metadata hosted by the metadata store comprises lock bits pertaining to the plurality of cache controllers.
  • a lock bit is to assert that the valid bits and dirty bits of a given cache controller are locked and are not changed except by the given cache controller.
  • one of the plurality of cache controllers upon completion of all transactions relating to a metadata entry, is to update the metadata store of appropriate valid bits and dirty bits and cause a lock bit to be cleared.
  • a logic block is configured to identify dirty entries for a scrubbing operation wherein the logic block is associated with the metadata store fabric or the metadata store.
  • a method comprising:
  • the metadata store is one of a plurality of metadata stores.
  • the plurality of cache controllers and the metadata store are interconnected via a metadata store fabric.
  • the metadata store fabric comprises a common logic block to manage the task assigned to one of the plurality of cache controllers.
  • the metadata store further comprises a logic block to manage the task assigned to one of the plurality of cache controllers.
  • the metadata store is a static random-access memory (SRAM) array.
  • the task assigned to the metadata store comprises maintaining least recently used (LRU) indications.
  • the task assigned to the metadata store comprises re-allocating a clean entry with a higher least recently used (LRU) indication when a new system memory address is to be cached.
  • the shared distributed metadata hosted by the metadata store comprises lock bits, valid bits, and dirty bits.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
US15/154,812 2016-05-13 2016-05-13 Interleaved cache controllers with shared metadata and related devices and systems Abandoned US20170329711A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US15/154,812 US20170329711A1 (en) 2016-05-13 2016-05-13 Interleaved cache controllers with shared metadata and related devices and systems
PCT/US2017/027499 WO2017196495A1 (fr) 2016-05-13 2017-04-13 Contrôleurs de cache entrelacés à métadonnées partagées et dispositifs et systèmes associés
US16/019,426 US10657058B2 (en) 2016-05-13 2018-06-26 Interleaved cache controllers with shared metadata and related devices and systems

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US15/154,812 US20170329711A1 (en) 2016-05-13 2016-05-13 Interleaved cache controllers with shared metadata and related devices and systems

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/019,426 Continuation US10657058B2 (en) 2016-05-13 2018-06-26 Interleaved cache controllers with shared metadata and related devices and systems

Publications (1)

Publication Number Publication Date
US20170329711A1 true US20170329711A1 (en) 2017-11-16

Family

ID=58606604

Family Applications (2)

Application Number Title Priority Date Filing Date
US15/154,812 Abandoned US20170329711A1 (en) 2016-05-13 2016-05-13 Interleaved cache controllers with shared metadata and related devices and systems
US16/019,426 Active 2036-10-11 US10657058B2 (en) 2016-05-13 2018-06-26 Interleaved cache controllers with shared metadata and related devices and systems

Family Applications After (1)

Application Number Title Priority Date Filing Date
US16/019,426 Active 2036-10-11 US10657058B2 (en) 2016-05-13 2018-06-26 Interleaved cache controllers with shared metadata and related devices and systems

Country Status (2)

Country Link
US (2) US20170329711A1 (fr)
WO (1) WO2017196495A1 (fr)


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6757790B2 (en) * 2002-02-19 2004-06-29 Emc Corporation Distributed, scalable data storage facility with cache memory
EP2579160A1 (fr) * 2010-05-27 2013-04-10 Fujitsu Limited Système de traitement d'informations et contrôleur de système
US9846648B2 (en) * 2015-05-11 2017-12-19 Intel Corporation Create page locality in cache controller cache allocation

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120210069A1 (en) * 2009-10-25 2012-08-16 Plurality Ltd. Shared cache for a tightly-coupled multiprocessor
US20110276762A1 (en) * 2010-05-07 2011-11-10 International Business Machines Corporation Coordinated writeback of dirty cachelines
US20130268728A1 (en) * 2011-09-30 2013-10-10 Raj K. Ramanujan Apparatus and method for implementing a multi-level memory hierarchy having different operating modes

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020012104A1 (fr) * 2018-07-10 2020-01-16 Commissariat A L'energie Atomique Et Aux Energies Alternatives Circuit de génération de facteurs de rotation pour processeur ntt
FR3083885A1 (fr) * 2018-07-10 2020-01-17 Commissariat A L'energie Atomique Et Aux Energies Alternatives Circuit de generation de facteurs de rotation pour processeur ntt
CN110765036A (zh) * 2018-07-27 2020-02-07 伊姆西Ip控股有限责任公司 在控制设备处管理元数据的方法、设备和计算机程序产品
US10936489B2 (en) * 2018-07-27 2021-03-02 EMC IP Holding Company LLC Method, device and computer program product for managing metadata at a control device
CN114301858A (zh) * 2021-02-05 2022-04-08 井芯微电子技术(天津)有限公司 共享缓存系统及方法、电子设备及存储介质

Also Published As

Publication number Publication date
US20190004953A1 (en) 2019-01-03
US10657058B2 (en) 2020-05-19
WO2017196495A1 (fr) 2017-11-16

Similar Documents

Publication Publication Date Title
US10564872B2 (en) System and method for dynamic allocation to a host of memory device controller memory resources
US9009397B1 (en) Storage processor managing solid state disk array
JP6496626B2 (ja) 異種統合メモリ部及びその拡張統合メモリスペース管理方法
CN102804152B (zh) 对存储器层次结构中的闪存的高速缓存一致性支持
US8417871B1 (en) System for increasing storage media performance
EP2992438B1 (fr) Réseau de mémoire
US11003385B2 (en) Memory system and method for controlling nonvolatile memory in which write data are stored in a shared device side write buffer shared by a plurality of write destination blocks
US8037251B2 (en) Memory compression implementation using non-volatile memory in a multi-node server system with directly attached processor memory
US11797436B2 (en) Memory system and method for controlling nonvolatile memory
TW202331530A (zh) 記憶體系統
US10657058B2 (en) Interleaved cache controllers with shared metadata and related devices and systems
US20220066693A1 (en) System and method of writing to nonvolatile memory using write buffers
US10503647B2 (en) Cache allocation based on quality-of-service information
WO2018024214A1 (fr) Procédé et dispositif de réglage de flux e/s
US7793051B1 (en) Global shared memory subsystem
CN109213425B (zh) 利用分布式缓存在固态存储设备中处理原子命令
CN109840048A (zh) 存储命令处理方法及其存储设备
US11768628B2 (en) Information processing apparatus
CN107766262B (zh) 调节并发写命令数量的方法与装置
WO2024088150A1 (fr) Procédé et appareil de stockage de données basés sur un disque statique à semiconducteurs à canaux ouverts, dispositif, support et produit
JP7167295B2 (ja) メモリシステムおよび制御方法
JP7204020B2 (ja) 制御方法
US20230359578A1 (en) Computing system including cxl switch, memory device and storage device and operating method thereof
EP4220414A1 (fr) Contrôleur de stockage gérant différents types de blocs, son procédé de fonctionnement et procédé de fonctionnement de dispositif de stockage le comprenant
JP7490714B2 (ja) メモリシステムおよび制御方法

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTEL CORPORATION, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:GREENSPAN, DANIEL;GREENFIELD, ZVIKA;SIGNING DATES FROM 20160330 TO 20160405;REEL/FRAME:038594/0644

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION