WO2013048467A1 - Generation of far memory access signals based on usage statistic tracking - Google Patents

Generation of far memory access signals based on usage statistic tracking

Info

Publication number: WO2013048467A1
Authority: WO (WIPO/PCT)
Prior art keywords: memory, address, NVRAM, storage, circuitry
Application number: PCT/US2011/054379
Other languages: French (fr)
Inventor: Robert Faber
Original Assignee: Intel Corporation
Priority date: (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Intel Corporation
Priority to EP11873232.0A (EP2761467B1)
Priority to PCT/US2011/054379 (WO2013048467A1)
Priority to CN201180075119.XA (CN103946813B)
Priority to US13/996,525 (US9600407B2)
Priority to TW101130980A (TWI518686B)
Publication of WO2013048467A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/0223 User address space allocation, e.g. contiguous or non-contiguous base addressing
    • G06F 12/023 Free address space management
    • G06F 12/0238 Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F 13/14 Handling requests for interconnection or transfer
    • G06F 13/16 Handling requests for interconnection or transfer for access to memory bus
    • G06F 13/1668 Details of memory controller
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0866 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
    • G06F 12/0868 Data transfer between cache memory and other subsystems, e.g. storage devices or host systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F 2212/20 Employing a main memory using a specific memory technology
    • G06F 2212/202 Non-volatile memory
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This invention relates generally to the field of computer systems. More particularly, the invention relates to an apparatus and method for implementing a multi-level memory hierarchy including a non-volatile memory tier.
  • system memory also known as main memory, primary memory, executable memory
  • DRAM dynamic random access memory
  • DRAM-based memory consumes power even when no memory reads or writes occur because it must constantly recharge internal capacitors.
  • DRAM-based memory is volatile, which means data stored in DRAM memory is lost once the power is removed.
  • Conventional computer systems also rely on multiple levels of caching to improve performance.
  • a cache is a high speed memory positioned between the processor and system memory to service memory access requests faster than they could be serviced from system memory. Such caches are typically implemented with static random access memory (SRAM).
  • SRAM static random access memory
  • Cache management protocols may be used to ensure that the most frequently accessed data and instructions are stored within one of the levels of cache, thereby reducing the number of memory access transactions and improving performance.
  • mass storage also known as secondary storage or disk storage
  • conventional mass storage devices typically include magnetic media (e.g., hard disk drives), optical media (e.g., compact disc (CD) drive, digital versatile disc (DVD), etc.), holographic media, and/or mass-storage flash memory (e.g., solid state drives (SSDs), removable flash drives, etc.).
  • these storage devices are considered Input/Output (I/O) devices because they are accessed by the processor through various I/O adapters that implement various I/O protocols.
  • I/O adapters and I/O protocols consume a significant amount of power and can have a significant impact on the die area and the form factor of the platform.
  • Portable or mobile devices (e.g., laptops, netbooks, tablet computers, personal digital assistants (PDAs), portable media players, portable gaming devices, digital cameras, mobile phones, smartphones, feature phones, etc.) may include removable mass storage devices (e.g., Embedded Multimedia Card (eMMC), Secure Digital (SD) card).
  • BIOS flash With respect to firmware memory (such as boot memory (also known as BIOS flash)), a conventional computer system typically uses flash memory devices to store persistent system information that is read often but seldom (or never) written to. For example, the initial instructions executed by a processor to initialize key system components during a boot process (Basic Input and Output System (BIOS) images) are typically stored in a flash memory device. Flash memory devices that are currently available in the market generally have limited speed (e.g., 50 MHz). This speed is further reduced by the overhead for read protocols (e.g., 2.5 MHz). In order to speed up the BIOS execution speed, conventional processors generally cache a portion of BIOS code during the Pre-Extensible Firmware Interface (PEI) phase of the boot process. The size of the processor cache places a restriction on the size of the BIOS code used in the PEI phase (also known as the "PEI BIOS code").
  • PEI Pre-Extensible Firmware Interface
  • PCM: Phase-Change Memory, also sometimes referred to as phase change random access memory (PRAM or PCRAM), PCME, Ovonic Unified Memory, or Chalcogenide RAM (C-RAM).
  • PCM provides higher performance than flash because the memory element of PCM can be switched more quickly, writing (changing individual bits to either 1 or 0) can be done without the need to first erase an entire block of cells, and degradation from writes is slower (a PCM device may survive approximately 100 million write cycles; PCM degradation is due to thermal expansion during programming, metal (and other material) migration, and other mechanisms).
  • FIG. 1 illustrates a cache and system memory arrangement according to embodiments of the invention
  • FIG. 2 illustrates a memory and storage hierarchy employed in embodiments of the invention
  • FIG. 3 illustrates a computer system on which embodiments of the invention may be implemented;
  • FIG. 4A illustrates a first system architecture which includes PCM according to embodiments of the invention;
  • FIG. 4B illustrates a second system architecture which includes PCM according to embodiments of the invention
  • FIG. 4C illustrates a third system architecture which includes PCM according to embodiments of the invention.
  • FIG. 4D illustrates a fourth system architecture which includes PCM according to embodiments of the invention.
  • FIG. 4E illustrates a fifth system architecture which includes PCM according to embodiments of the invention.
  • FIG. 4F illustrates a sixth system architecture which includes PCM according to embodiments of the invention.
  • FIG. 4G illustrates a seventh system architecture which includes PCM according to embodiments of the invention.
  • FIG. 4H illustrates an eighth system architecture which includes PCM according to embodiments of the invention.
  • FIG. 4I illustrates a ninth system architecture which includes PCM according to embodiments of the invention.
  • FIG. 4J illustrates a tenth system architecture which includes PCM according to embodiments of the invention.
  • FIG. 4K illustrates an eleventh system architecture which includes PCM according to embodiments of the invention
  • FIG. 4L illustrates a twelfth system architecture which includes PCM according to embodiments of the invention
  • FIG. 4M illustrates a thirteenth system architecture which includes PCM according to embodiments of the invention.
  • FIG. 5 illustrates aspects of an NVRAM controller for determining far memory signaling based on usage statistics tracking
  • FIG. 6 illustrates a method that can be performed by the NVRAM controller of FIG. 5;
  • FIGS. 7A-7D illustrate various approaches for integrating the NVRAM controller of FIG. 5 into a memory channel.
  • references in the specification to "one embodiment," "an embodiment," "an example embodiment," etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
  • Coupled is used to indicate that two or more elements, which may or may not be in direct physical or electrical contact with each other, co-operate or interact with each other.
  • Connected is used to indicate the establishment of communication between two or more elements that are coupled with each other.
  • Bracketed text and blocks with dashed borders are sometimes used herein to illustrate optional operations/components that add additional features to embodiments of the invention.
  • FIG. 1 illustrates a cache and system memory arrangement according to embodiments of the invention.
  • Figure 1 shows a memory hierarchy including a set of internal processor caches 120, "near memory” acting as a far memory cache 121 , which may include both internal cache(s) 106 and external caches 107-109, and "far memory” 122.
  • One particular type of memory which may be used for "far memory" in some embodiments of the invention is non-volatile random access memory (NVRAM).
  • PCMS: Phase Change Memory and Switch
  • BPRAM: byte-addressable persistent memory
  • SCM: storage class memory
  • PMC: programmable metallization cell
  • RRAM: resistive memory
  • RESET: amorphous cell state; SET: crystalline cell state
  • PCME: Ovshinsky memory
  • ferroelectric memory: also known as polymer memory and poly(N-vinylcarbazole) memory
  • ferromagnetic memory: also known as Spintronics, SPRAM (spin-transfer torque RAM), STRAM (spin tunneling RAM), magnetoresistive memory, magnetic memory, and magnetic random access memory (MRAM)
  • SONOS: Semiconductor-oxide-nitride-oxide-semiconductor memory
  • NVRAM has the following characteristics: it maintains its content even if power is removed, similar to FLASH memory used in solid state disks (SSDs), and different from SRAM and DRAM, which are volatile;
  • the bus may be a memory bus (e.g., a DDR bus such as DDR3, DDR4, etc.) over which is run a transactional protocol as opposed to the non-transactional protocol that is normally used.
  • a transactional protocol: a protocol that supports transaction identifiers (IDs) to distinguish different transactions so that those transactions can complete out-of-order
  • the bus may be one over which a transactional protocol is normally run (a native transactional protocol), such as a PCI Express (PCIe) bus, desktop management interface (DMI) bus, or any other type of bus utilizing a transactional protocol and a small enough transaction payload size (e.g., a cache line size such as 64 or 128 bytes); and
  • NVRAM may also have one or more of the following characteristics: a) faster write speed than non-volatile memory/storage technologies such as FLASH; b) very high read speed (faster than FLASH and near or equivalent to DRAM read speeds); and/or c) it is directly writable (rather than requiring erasing (overwriting with 1s) before writing data, like FLASH memory used in SSDs).
  • The level of granularity at which NVRAM is accessed in any given implementation may depend on the particular memory controller and the particular memory bus or other type of bus to which the NVRAM is coupled.
  • For example, in some embodiments the NVRAM may be accessed at the granularity of a cache line (e.g., a 64-byte or 128-byte cache line), notwithstanding an inherent ability to be accessed at the granularity of a byte, because the cache line is the level at which the memory subsystem accesses memory.
  • the level of granularity of access to the NVRAM by the memory controller and memory bus or other type of bus is smaller than that of the block size used by Flash and the access size of the I/O subsystem's controller and bus.
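  • As a rough, hypothetical illustration of the access-granularity difference discussed above, the short Python sketch below rounds the same byte address down to a cache-line boundary (as the memory subsystem would) and to FLASH block boundaries (as a block-based I/O subsystem would); the sizes are the examples given in this description, and the function name is illustrative only.

```python
# Illustrative sketch only: comparing cache-line-granularity access (NVRAM used as
# system memory) with block-granularity access (FLASH-style block storage).
CACHE_LINE = 64            # bytes, e.g., a 64-byte cache line
NAND_BLOCK = 16 * 1024     # minimal NAND FLASH block size mentioned above
NOR_BLOCK = 64 * 1024      # minimal NOR FLASH block size mentioned above

def align_down(address, granularity):
    """Round an address down to the start of its containing access unit."""
    return address - (address % granularity)

address = 0x0001_2345
print(hex(align_down(address, CACHE_LINE)))   # cache-line aligned access
print(hex(align_down(address, NAND_BLOCK)))   # NAND block aligned access
print(hex(align_down(address, NOR_BLOCK)))    # NOR block aligned access
```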
  • NVRAM may also incorporate wear leveling algorithms to account for the fact that the storage cells at the far memory level begin to wear out after a number of write accesses, especially where a significant number of writes may occur such as in a system memory implementation. Since high cycle count blocks are most likely to wear out in this manner, wear leveling spreads writes across the far memory cells by swapping addresses of high cycle count blocks with low cycle count blocks. Note that most address swapping is typically transparent to application programs because it is handled by hardware, lower-level software (e.g., a low level driver or operating system), or a combination of the two.
  • lower-level software e.g., a low level driver or operating system
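  • The following Python sketch illustrates, in very simplified form, the kind of address-swapping wear leveling described above: per-block write counts are tracked and a frequently written ("high cycle count") block is remapped to a rarely written ("low cycle count") block. The class name, threshold, and table sizes are illustrative assumptions, not the controller's actual algorithm.

```python
# Illustrative sketch only: a highly simplified address-swap wear-leveling scheme of
# the general kind described above. Block counts, the threshold, and all names are
# assumptions for illustration.

class WearLeveler:
    def __init__(self, num_blocks, swap_threshold=1000):
        self.write_counts = [0] * num_blocks       # per physical block write cycles
        self.remap = list(range(num_blocks))       # logical block -> physical block
        self.swap_threshold = swap_threshold

    def physical_block(self, logical_block):
        """Translate a logical (system-visible) block to its current physical block."""
        return self.remap[logical_block]

    def record_write(self, logical_block):
        phys = self.remap[logical_block]
        self.write_counts[phys] += 1
        if self.write_counts[phys] >= self.swap_threshold:
            # Swap the hot block's address with the coldest block so future writes to
            # the hot logical address land on low-cycle-count cells. A real controller
            # would also migrate the stored data between the two physical blocks.
            coldest = min(range(len(self.write_counts)), key=self.write_counts.__getitem__)
            if coldest != phys:
                hot_logical = self.remap.index(phys)
                cold_logical = self.remap.index(coldest)
                self.remap[hot_logical], self.remap[cold_logical] = coldest, phys
            self.write_counts[phys] = 0            # start a fresh counting window

# Example: repeated writes to logical block 0 eventually remap it to a colder block.
wl = WearLeveler(num_blocks=8, swap_threshold=4)
for _ in range(5):
    wl.record_write(0)
print(wl.physical_block(0))                        # no longer physical block 0
```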
  • the far memory 122 of some embodiments of the invention is implemented with NVRAM, but is not necessarily limited to any particular memory technology. Far memory 122 is distinguishable from other instruction and data memory/storage technologies in terms of its characteristics and/or its application in the memory/storage hierarchy. For example, far memory 122 is different from:
  • SRAM: static random access memory
  • LLC: lower level cache
  • DRAM: dynamic random access memory
  • FLASH memory or other read only memory (ROM) applied as firmware memory (which can refer to boot ROM, BIOS flash, and/or TPM flash) (not shown).
  • Far memory 122 may be used as instruction and data storage that is directly addressable by a processor 100 and is able to sufficiently keep pace with the processor 100 in contrast to FLASH/magnetic disk/optical disc applied as mass storage. Moreover, as discussed above and described in detail below, far memory 122 may be placed on a memory bus and may communicate directly with a memory controller that, in turn, communicates directly with the processor 100.
  • Far memory 122 may be combined with other instruction and data storage technologies (e.g., DRAM) to form hybrid memories (also known as co-locating PCM and DRAM; first level memory and second level memory; FLAM (FLASH and DRAM)). Note that at least some of the above technologies, including PCM/PCMS, may be used for mass storage instead of, or in addition to, system memory, and need not be random accessible, byte addressable or directly addressable by the processor when applied in this manner.
  • the terms NVRAM, PCM, PCMS, and far memory may be used interchangeably in the following discussion. However it should be realized, as discussed above, that different technologies may also be utilized for far memory. Also, NVRAM is not limited for use as far memory.
  • Near memory 121 is an intermediate level of memory configured in front of a far memory 122 that has lower read/write access latency relative to far memory and/or more symmetric read/write access latency (i.e., having read times which are roughly equivalent to write times).
  • the near memory 121 has significantly lower write latency than the far memory 122 but similar (e.g., slightly lower or equal) read latency; for instance the near memory 121 may be a volatile memory such as volatile random access memory (VRAM) and may comprise a DRAM or other high speed capacitor-based memory. Note, however, that the underlying principles of the invention are not limited to these specific memory types. Additionally, the near memory 121 may have a relatively lower density and/or may be more expensive to manufacture than the far memory 122.
  • VRAM volatile random access memory
  • near memory 121 is configured between the far memory 122 and the internal processor caches 120.
  • near memory 121 is configured as one or more memory-side caches (MSCs) 107-109 to mask the performance and/or usage limitations of the far memory including, for example, read/write latency limitations and memory degradation limitations.
  • MSCs memory-side caches
  • the combination of the MSC 107-109 and far memory 122 operates at a performance level which approximates, is equivalent or exceeds a system which uses only DRAM as system memory.
  • the near memory 121 may include modes in which it performs other roles, either in addition to, or in lieu of, performing the role of a cache.
  • Near memory 121 can be located on the processor die (as cache(s) 106) and/or located external to the processor die (as caches 107-109), e.g., on a separate die located on the CPU package, or located outside the CPU package with a high bandwidth link to the CPU package, for example, on a memory dual in-line memory module (DIMM).
  • the near memory 121 may be coupled in communication with the processor 100 using a single or multiple high bandwidth links, such as DDR or other transactional high bandwidth links (as described in detail below).
  • Figure 1 illustrates how various levels of caches 101-109 are configured with respect to a system physical address (SPA) space 116-119 in embodiments of the invention.
  • this embodiment comprises a processor 100 having one or more cores 101-104, with each core having its own dedicated upper level cache (L0) 101a-104a and mid-level cache (MLC) (L1) cache 101b-104b.
  • the processor 100 also includes a shared LLC 105. The operation of these various cache levels is well understood and will not be described in detail here.
  • the caches 107-109 illustrated in Figure 1 may be dedicated to a particular system memory address range or a set of non-contiguous address ranges.
  • cache 107 is dedicated to acting as an MSC for system memory address range #1 116 and caches 108 and 109 are dedicated to acting as MSCs for non-overlapping portions of system memory address ranges #2 117 and #3 118.
  • the latter implementation may be used for systems in which the SPA space used by the processor 100 is interleaved into an address space used by the caches 107-109 (e.g., when configured as MSCs). In some embodiments, this latter address space is referred to as a memory channel address (MCA) space.
  • MCA memory channel address
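  • A minimal sketch, under assumed parameters, of how a system physical address (SPA) might be interleaved across several memory-side caches to produce a memory channel address (MCA) is shown below; the cache-line interleave granularity and three-channel layout are illustrative assumptions, not the interleaving actually used.

```python
# Illustrative sketch only: interleaving a system physical address (SPA) across
# several MSC channels to form a (channel, memory channel address (MCA)) pair.
# The cache-line interleave granularity and channel count are assumptions.

CACHE_LINE = 64        # bytes per cache line (assumed interleave unit)
NUM_CHANNELS = 3       # e.g., one channel per memory-side cache 107-109

def spa_to_mca(spa):
    """Map a system physical address to (channel index, memory channel address)."""
    line = spa // CACHE_LINE               # which cache line the SPA falls in
    offset = spa % CACHE_LINE              # byte offset within that line
    channel = line % NUM_CHANNELS          # consecutive lines rotate across channels
    mca = (line // NUM_CHANNELS) * CACHE_LINE + offset
    return channel, mca

# Consecutive cache lines land on different channels, so each MSC serves a
# non-overlapping slice of the SPA space.
for spa in (0x0000, 0x0040, 0x0080, 0x00C0):
    print(hex(spa), spa_to_mca(spa))
```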
  • the internal caches 101 a-106 perform caching operations for the entire SPA space.
  • System memory as used herein is memory which is visible to and/or directly addressable by software executed on the processor 100;
  • cache memories 101 a-109 may operate transparently to the software in the sense that they do not form a directly-addressable portion of the system address space, but the cores may also support execution of instructions to allow software to provide some control (configuration, policies, hints, etc.) to some or all of the cache(s).
  • the subdivision of system memory into regions 116-119 may be performed manually as part of a system configuration process (e.g., by a system designer) and/or may be performed automatically by software.
  • system memory regions 116-119 are implemented using far memory (e.g., PCM) and, in some embodiments, near memory configured as system memory.
  • System memory address range #4 represents an address range which is implemented using a higher speed memory such as DRAM which may be a near memory configured in a system memory mode (as opposed to a caching mode).
  • Figure 2 illustrates a memory/storage hierarchy 140 and different configurable modes of operation for near memory 144 and NVRAM 142. The hierarchy includes: (1) a cache level 150 which may include processor caches 150A (e.g., caches 101A-105 in Figure 1) and optionally near memory as cache for far memory 150B (in certain modes of operation as described herein); (2) a system memory level 151 which may include far memory 151B (e.g., NVRAM such as PCM) when near memory is present (or just NVRAM as system memory 174 when near memory is not present), and optionally near memory operating as system memory 151A (in certain modes of operation as described herein); (3) a mass storage level 152 which may include a flash/magnetic/optical mass storage 152B and/or NVRAM mass storage 152A (e.g., a portion of the NVRAM 142); and (4) a firmware memory level 153 that may include BIOS flash 170 and/or BIOS NVRAM 172 and optionally trusted platform module (TPM) NVRAM 173.
  • TPM trusted platform module
  • near memory 144 may be implemented to operate in a variety of different modes including: a first mode in which it operates as a cache for far memory (near memory as cache for FM 150B); a second mode in which it operates as system memory 151 A and occupies a portion of the SPA space (sometimes referred to as near memory "direct access” mode); and one or more additional modes of operation such as a scratchpad memory 192 or as a write buffer 193.
  • the near memory is partitionable, where each partition may concurrently operate in a different one of the supported modes; and different embodiments may support configuration of the partitions (e.g., sizes, modes) by hardware (e.g., fuses, pins), firmware, and/or software (e.g., through a set of programmable range registers within the MSC controller 124 within which, for example, may be stored different binary codes to identify each mode and partition), as sketched below.
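  • The sketch below illustrates one way programmable range registers of the kind mentioned above could partition near memory and record an operating mode per partition; the mode codes, register fields, and addresses are illustrative assumptions rather than the MSC controller 124's actual encoding.

```python
# Illustrative sketch only: programmable range registers that partition near memory
# and assign each partition an operating mode. The mode codes and register fields
# are assumptions for illustration.

MODE_MSC_CACHE    = 0b00  # near memory as cache for far memory (150B)
MODE_SYSTEM_MEM   = 0b01  # near memory direct access / system memory (151A)
MODE_SCRATCHPAD   = 0b10  # scratchpad memory (192)
MODE_WRITE_BUFFER = 0b11  # write buffer (193)

# Each range register: (base_address, limit_address, mode_code)
range_registers = [
    (0x0000_0000, 0x3FFF_FFFF, MODE_MSC_CACHE),
    (0x4000_0000, 0x5FFF_FFFF, MODE_SYSTEM_MEM),
    (0x6000_0000, 0x6FFF_FFFF, MODE_WRITE_BUFFER),
]

def partition_mode(near_memory_address):
    """Return the operating mode of the partition covering the given address."""
    for base, limit, mode in range_registers:
        if base <= near_memory_address <= limit:
            return mode
    raise ValueError("address is not covered by any configured partition")

print(partition_mode(0x4000_1000))  # -> MODE_SYSTEM_MEM (1)
```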
  • System address space A 190 in Figure 2 is used to illustrate operation when near memory is configured as a MSC for far memory 150B.
  • system address space A 190 represents the entire system address space (and system address space B 191 does not exist).
  • system address space B 191 is used to show an implementation in which all or a portion of near memory is assigned a part of the system address space. In this case, system address space B 191 represents the range of the system address space assigned to the near memory 151A and system address space A 190 represents the range of the system address space assigned to NVRAM 174.
  • the near memory 144 may operate in various sub-modes under the control of the MSC controller 124. In each of these modes, the near memory address space (NMA) is transparent to software in the sense that the near memory does not form a directly-addressable portion of the system address space. These modes include but are not limited to the following:
  • (1) Write-Back Caching Mode: in this mode, all or portions of the near memory acting as a FM cache 150B are used to cache NVRAM FM 151B data, with writes held in the near memory and written back to the NVRAM FM 151B when the cached data must be replaced (as described in more detail below with respect to Figure 3).
  • (2) Near Memory Bypass Mode: in this mode all reads and writes bypass the NM acting as a FM cache 150B and go directly to the NVRAM FM 151B. Such a mode may be used, for example, when an application is not cache friendly or requires data to be committed to persistence at the granularity of a cache line.
  • the caching performed by the processor caches 150A and the NM acting as a FM cache 150B operate independently of one another. Consequently, data may be cached in the NM acting as a FM cache 150B which is not cached in the processor caches 150A (and which, in some cases, may not be permitted to be cached in the processor caches 150A) and vice versa. Thus, certain data which may be designated as "uncacheable" in the processor caches may be cached within the NM acting as a FM cache 150B.
  • a variation of the bypass mode may also be supported in which read caching of the persistent data from NVRAM FM 151B is allowed (i.e., the persistent data is cached in the near memory as cache for far memory 150B for read-only operations). This is useful when most of the persistent data is "Read-Only" and the application usage is cache-friendly.
  • the near memory direct access mode may be useful, for example, for certain high performance computing (HPC) and graphics applications which require very fast access to certain data structures.
  • the near memory direct access mode is implemented by "pinning" certain cache lines in near memory (i.e., cache lines which have data that is also concurrently stored in NVRAM 142). Such pinning may be done effectively in larger, multi-way, set-associative caches.
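  • A simplified sketch of "pinning" cache lines in a multi-way, set-associative near-memory cache, so that pinned lines are never chosen as eviction victims, is given below; the cache geometry, LRU victim policy, and class names are assumptions for illustration, not the actual MSC design.

```python
# Illustrative sketch only: "pinning" lines in a multi-way, set-associative near-memory
# cache so that pinned lines are never selected as eviction victims. The geometry and
# LRU policy below are assumptions for illustration.

NUM_SETS, NUM_WAYS, LINE_SIZE = 4, 4, 64

class PinnableCache:
    def __init__(self):
        # per set: list of entries {tag, pinned, age}
        self.sets = [[] for _ in range(NUM_SETS)]

    def _locate(self, addr):
        line = addr // LINE_SIZE
        return line % NUM_SETS, line // NUM_SETS

    def fill(self, addr, pinned=False):
        idx, tag = self._locate(addr)
        entries = self.sets[idx]
        for e in entries:
            e["age"] += 1
        for e in entries:                      # hit: refresh, possibly pin
            if e["tag"] == tag:
                e["age"] = 0
                e["pinned"] = e["pinned"] or pinned
                return
        if len(entries) < NUM_WAYS:            # free way available
            entries.append({"tag": tag, "pinned": pinned, "age": 0})
            return
        candidates = [e for e in entries if not e["pinned"]]
        if not candidates:
            raise RuntimeError("every way in this set is pinned")
        victim = max(candidates, key=lambda e: e["age"])   # evict LRU, skipping pins
        entries.remove(victim)
        entries.append({"tag": tag, "pinned": pinned, "age": 0})

cache = PinnableCache()
cache.fill(0x1000, pinned=True)                # pinned for direct-access use
for addr in range(0x2000, 0x2000 + 8 * NUM_SETS * LINE_SIZE, NUM_SETS * LINE_SIZE):
    cache.fill(addr)                           # conflicting fills never evict the pin
assert any(e["pinned"] for e in cache.sets[(0x1000 // LINE_SIZE) % NUM_SETS])
```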
  • Figure 2 also illustrates that a portion of the NVRAM 142 may be used as firmware memory.
  • the BIOS NVRAM 172 portion may be used to store BIOS images (instead of or in addition to storing the BIOS information in BIOS flash 170).
  • the BIOS NVRAM portion 172 may be a portion of the SPA space and is directly addressable by software executed on the processor cores 101 -104, whereas the BIOS flash 170 is addressable through the I/O subsystem 1 15.
  • a trusted platform module (TPM) NVRAM 173 portion may be used to protect sensitive system information (e.g., encryption keys).
  • TPM trusted platform module
  • the NVRAM 142 may be implemented to operate in a variety of different modes, including as far memory 151 B (e.g., when near memory 144 is present/operating, whether the near memory is acting as a cache for the FM via a MSC control 124 or not (accessed directly after cache(s) 101 A - 105 and without MSC control 124)); just NVRAM system memory 174 (not as far memory because there is no near memory present/operating; and accessed without MSC control 124); NVRAM mass storage 152A; BIOS NVRAM 172; and TPM NVRAM 173. While different embodiments may specify the NVRAM modes in different ways, Figure 3 describes the use of a decode table 333.
  • FIG. 3 illustrates an exemplary computer system 300 on which embodiments of the invention may be implemented.
  • the computer system 300 includes a processor 310 and memory/storage subsystem 380 with NVRAM 142 used for system memory, mass storage, and optionally firmware memory.
  • the NVRAM 142 comprises the entire system memory and storage hierarchy used by computer system 300 for storing data, instructions, states, and other persistent and non-persistent information.
  • NVRAM 142 can be configured to implement the roles in a typical memory and storage hierarchy of system memory, mass storage, and firmware memory, TPM memory, and the like.
  • NVRAM 142 is partitioned into FM 151B, NVRAM mass storage 152A, BIOS NVRAM 172, and TPM NVRAM 173. Storage hierarchies with different roles are also contemplated and the application of NVRAM 142 is not limited to the roles described above.
  • In the following example, operation when the near memory acting as a cache for FM 150B is in the write-back caching mode is described.
  • a read operation will first arrive at the MSC controller 124 which will perform a look-up to determine if the requested data is present in the near memory acting as a cache for FM 150B (e.g., utilizing a tag cache 342). If present, it will return the data to the requesting CPU, core 101-104 or I/O device through I/O subsystem 115. If the data is not present, the MSC controller 124 will send the request along with the system memory address to an NVRAM controller 332.
  • the NVRAM controller 332 will use the decode table 333 to translate the system memory address to an NVRAM physical device address (PDA) and direct the read operation to this region of the far memory 151 B.
  • the decode table 333 includes an address indirection table (AIT) component which the NVRAM controller 332 uses to translate between system memory addresses and NVRAM PDAs.
  • the AIT is updated as part of the wear leveling algorithm implemented to distribute memory access operations and thereby reduce wear on the NVRAM FM 151 B.
  • the AIT may be a separate table stored within the NVRAM controller 332.
  • Upon receiving the requested data from the NVRAM FM 151B, the NVRAM controller 332 will return the requested data to the MSC controller 124, which will store the data in the MSC near memory acting as an FM cache 150B and also send the data to the requesting processor core 101-104, or I/O device through I/O subsystem 115. Subsequent requests for this data may be serviced directly from the near memory acting as a FM cache 150B until it is replaced by some other NVRAM FM data.
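  • The read flow just described can be summarized by the following simplified sketch: the MSC controller checks the near memory acting as a FM cache, and on a miss the NVRAM controller translates the system memory address to a physical device address (PDA) through an address indirection table (AIT) and reads the far memory. All class and structure names are illustrative assumptions.

```python
# Illustrative sketch only of the read flow described above: MSC tag lookup first,
# then AIT translation and a far-memory read on a miss, then a fill of the MSC.
# All names and structures are assumptions for illustration.

class NvramController:
    def __init__(self, far_memory, ait):
        self.far_memory = far_memory   # dict: PDA -> data
        self.ait = ait                 # dict: system memory address -> PDA

    def read(self, system_addr):
        pda = self.ait[system_addr]            # decode table / AIT translation
        return self.far_memory[pda]

class MscController:
    def __init__(self, nvram_controller):
        self.near_memory_cache = {}            # system memory address -> cached data
        self.nvram_controller = nvram_controller

    def read(self, system_addr):
        if system_addr in self.near_memory_cache:      # tag-cache hit
            return self.near_memory_cache[system_addr]
        data = self.nvram_controller.read(system_addr) # miss: go to far memory
        self.near_memory_cache[system_addr] = data     # fill the MSC for future hits
        return data

# Example: the first read misses and is serviced from far memory; the second hits.
ait = {0x1000: 7}
far_memory = {7: b"cache line data"}
msc = MscController(NvramController(far_memory, ait))
print(msc.read(0x1000))
print(msc.read(0x1000))
```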
  • a memory write operation also first goes to the MSC controller 124 which writes it into the MSC near memory acting as a FM cache 150B.
  • the data may not be sent directly to the NVRAM FM 151 B when a write operation is received.
  • the data may be sent to the NVRAM FM 151 B only when the location in the MSC near memory acting as a FM cache 150B in which the data is stored must be re-used for storing data for a different system memory address.
  • the MSC controller 124 notices that the data is not current in NVRAM FM 151 B and will thus retrieve it from near memory acting as a FM cache 150B and send it to the NVRAM controller 332.
  • the NVRAM controller 332 looks up the PDA for the system memory address and then writes the data to the NVRAM FM 151 B.
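  • Similarly, the write-back behavior described above is sketched below: writes are absorbed by the near memory acting as a FM cache and only reach the NVRAM FM when a cached entry must be reused for a different system memory address. The tiny capacity and naive eviction order are simplifying assumptions for illustration.

```python
# Illustrative sketch only of the write-back behavior described above. The small
# "cache", eviction order, and all names are simplifying assumptions.

class SimpleNvramController:
    def __init__(self):
        self.ait = {}            # system memory address -> PDA (identity here)
        self.far_memory = {}

    def write(self, system_addr, data):
        pda = self.ait.get(system_addr, system_addr)
        self.far_memory[pda] = data

class WriteBackMsc:
    def __init__(self, nvram_controller, capacity=4):
        self.nvram_controller = nvram_controller
        self.capacity = capacity
        self.cache = {}          # system memory address -> (data, dirty flag)

    def write(self, system_addr, data):
        if system_addr not in self.cache and len(self.cache) >= self.capacity:
            self._evict_one()                       # make room: write back a dirty line
        self.cache[system_addr] = (data, True)      # write is absorbed by near memory

    def _evict_one(self):
        victim_addr, (data, dirty) = next(iter(self.cache.items()))
        del self.cache[victim_addr]
        if dirty:
            # Only now does the data reach far memory: the NVRAM controller looks up
            # the PDA for the system memory address and writes the NVRAM FM.
            self.nvram_controller.write(victim_addr, data)

nvram = SimpleNvramController()
msc = WriteBackMsc(nvram, capacity=2)
msc.write(0x00, b"A")
msc.write(0x40, b"B")
print(nvram.far_memory)          # {} -- nothing has reached far memory yet
msc.write(0x80, b"C")            # forces eviction of one dirty line
print(nvram.far_memory)          # one line has now been written back
```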
  • the NVRAM controller 332 is shown connected to the FM 151B, NVRAM mass storage 152A, and BIOS NVRAM 172 using three separate lines. This does not necessarily mean, however, that there are three separate physical buses or communication channels connecting the NVRAM controller 332 to these portions of the NVRAM 142. Rather, in some embodiments, a common memory bus or other type of bus (such as those described below with respect to Figures 4A-M) is used to communicatively couple the NVRAM controller 332 to these portions of the NVRAM 142.
  • the three lines in Figure 3 represent a bus, such as a memory bus (e.g., a DDR3, DDR4, etc, bus), over which the NVRAM controller 332 implements a transactional protocol to communicate with the NVRAM 142.
  • the NVRAM controller 332 may also communicate with the NVRAM 142 over a bus supporting a native transactional protocol such as a PCI express bus, desktop management interface (DMI) bus, or any other type of bus utilizing a transactional protocol and a small enough transaction payload size (e.g., cache line size such as 64 or 128 byte).
  • computer system 300 includes integrated memory controller (IMC) 331 which performs the central memory access control for processor 310, which is coupled to: 1) a memory-side cache (MSC) controller 124 to control access to near memory (NM) acting as a far memory cache 150B; and 2) an NVRAM controller 332 to control access to NVRAM 142.
  • IMC integrated memory controller
  • MSC memory-side cache
  • NM near memory
  • NVRAM controller 332 to control access to NVRAM 142.
  • the MSC controller 124 and NVRAM controller 332 may logically form part of the IMC 331 .
  • the MSC controller 124 includes a set of range registers 336 which specify the mode of operation in use for the NM acting as a far memory cache 150B (e.g., write-back caching mode, near memory bypass mode, etc., described above).
  • DRAM 144 is used as the memory technology for the NM acting as cache for far memory 150B.
  • the MSC controller 124 may determine (depending on the mode of operation specified in the range registers 336) whether the request can be serviced from the NM acting as cache for FM 150B or whether the request must be sent to the NVRAM controller 332, which may then service the request from the far memory (FM) portion 151 B of the NVRAM 142.
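  • The following sketch illustrates the kind of range-register-driven decision described above, with a write-back caching region and a bypass region; the mode names, address ranges, and data structures are illustrative assumptions, not the controller's actual logic.

```python
# Illustrative sketch only: the MSC controller consulting range registers to decide,
# per request, whether the near memory acting as FM cache is used (write-back caching
# mode) or bypassed entirely (near memory bypass mode). Names are assumptions.

WRITE_BACK_CACHING = "write-back"
NEAR_MEMORY_BYPASS = "bypass"

class FlatNvram:
    def __init__(self, contents):
        self.contents = contents

    def read(self, addr):
        return self.contents.get(addr, b"\x00" * 64)

class ModeAwareMsc:
    def __init__(self, nvram_controller, range_registers):
        self.nvram_controller = nvram_controller
        self.range_registers = range_registers    # list of (base, limit, mode)
        self.nm_cache = {}

    def _mode_for(self, addr):
        for base, limit, mode in self.range_registers:
            if base <= addr <= limit:
                return mode
        return NEAR_MEMORY_BYPASS

    def read(self, addr):
        if self._mode_for(addr) == NEAR_MEMORY_BYPASS:
            return self.nvram_controller.read(addr)        # straight to NVRAM FM 151B
        if addr in self.nm_cache:                          # write-back caching mode
            return self.nm_cache[addr]
        data = self.nvram_controller.read(addr)
        self.nm_cache[addr] = data
        return data

msc = ModeAwareMsc(
    FlatNvram({0x10: b"cached", 0x9000_0000: b"uncached"}),
    range_registers=[(0x0000_0000, 0x7FFF_FFFF, WRITE_BACK_CACHING),
                     (0x8000_0000, 0xFFFF_FFFF, NEAR_MEMORY_BYPASS)],
)
print(msc.read(0x10))          # fills the near-memory cache
print(msc.read(0x9000_0000))   # bypasses the cache entirely
```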
  • FM far memory
  • NVRAM controller 332 is a PCMS controller that performs access with protocols consistent with the PCMS technology. As previously discussed, the PCMS memory is inherently capable of being accessed at the granularity of a byte.
  • the NVRAM controller 332 may access a PCMS-based far memory 151B at a lower level of granularity such as a cache line (e.g., a 64-byte or 128-byte cache line) or any other level of granularity consistent with the memory subsystem.
  • the underlying principles of the invention are not limited to any particular level of granularity for accessing a PCMS-based far memory 151 B.
  • PCMS-based far memory 151 B when PCMS-based far memory 151 B is used to form part of the system address space, the level of granularity will be higher than that traditionally used for other non-volatile storage technologies such as FLASH, which can only perform rewrite and erase operations at the level of a "block" (minimally 64Kbyte in size for NOR FLASH and 16 Kbyte for NAND FLASH).
  • NVRAM controller 332 can read configuration data to establish the previously described modes, sizes, etc. for the NVRAM 142 from decode table 333, or alternatively, can rely on the decoding results passed from IMC 331 and I/O subsystem 315.
  • computer system 300 can program decode table 333 to mark different regions of NVRAM 142 as system memory, mass storage exposed via SATA interfaces, mass storage exposed via USB Bulk Only Transport (BOT) interfaces, encrypted storage that supports TPM storage, among others.
  • BOT Bulk Only Transport
  • the means by which access is steered to different partitions of NVRAM device 142 is via a decode logic.
  • the address range of each partition is defined in the decode table 333.
  • the target address of the request is decoded to reveal whether the request is directed toward memory, NVRAM mass storage, or I/O. If it is a memory request, IMC 331 and/or the MSC controller 124 further determines from the target address whether the request is directed to NM as cache for FM 150B or to FM 151 B. For FM 151 B access, the request is forwarded to NVRAM controller 332. IMC 331 passes the request to the I/O subsystem 1 15 if this request is directed to I/O (e.g., non-storage and storage I/O devices).
  • I/O e.g., non-storage and storage I/O devices
  • I/O subsystem 115 further decodes the address to determine whether the address points to NVRAM mass storage 152A, BIOS NVRAM 172, or other non-storage or storage I/O devices. If this address points to NVRAM mass storage 152A or BIOS NVRAM 172, I/O subsystem 115 forwards the request to NVRAM controller 332. If this address points to TPM NVRAM 173, I/O subsystem 115 passes the request to TPM 334 to perform secured access.
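  • The decode-and-steer behavior described above can be illustrated with the simplified sketch below, in which a request's target address is matched against a decode table to select a partition of NVRAM 142 (or other I/O) along with an access-type attribute; the address ranges and attribute strings are assumptions for illustration, not the values programmed into decode table 333.

```python
# Illustrative sketch only: decoding a request's target address against a decode table
# to steer it toward FM system memory, NVRAM mass storage, BIOS NVRAM, TPM NVRAM, or
# other I/O, and attaching an access-type attribute. Ranges and names are assumptions.

DECODE_TABLE = [
    # (base, limit, target partition, attribute / "transaction type")
    (0x0000_0000, 0x7FFF_FFFF, "FM_151B",            "system-memory"),
    (0x8000_0000, 0xBFFF_FFFF, "NVRAM_STORAGE_152A", "block-storage"),
    (0xC000_0000, 0xC00F_FFFF, "BIOS_NVRAM_172",     "firmware"),
    (0xC010_0000, 0xC01F_FFFF, "TPM_NVRAM_173",      "trusted"),
]

def decode(target_address):
    """Return (target partition, attribute) for a request's target address."""
    for base, limit, target, attribute in DECODE_TABLE:
        if base <= target_address <= limit:
            return target, attribute
    return "OTHER_IO", "io"          # not NVRAM: handled by the I/O subsystem

# Example: a system memory request vs. a request aimed at the TPM partition.
print(decode(0x1234_0000))   # ('FM_151B', 'system-memory')
print(decode(0xC010_0040))   # ('TPM_NVRAM_173', 'trusted')
```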
  • each request forwarded to NVRAM controller 332 is accompanied with an attribute (also known as a "transaction type") to indicate the type of access.
  • NVRAM controller 332 may emulate the access protocol for the requested access type, such that the rest of the platform remains unaware of the multiple roles performed by NVRAM 142 in the memory and storage hierarchy.
  • NVRAM controller 332 may perform memory access to NVRAM 142 regardless of which transaction type it is. It is understood that the decode path can be different from what is described above. For example, IMC 331 may decode the target address of an access request and determine whether it is directed to NVRAM 142. If it is directed to NVRAM 142, IMC 331 generates an attribute according to decode table 333. Based on the attribute, IMC 331 then forwards the request to the appropriate downstream logic.
  • NVRAM controller 332 may decode the target address if the corresponding attribute is not passed on from the upstream logic (e.g., IMC 331 and I/O subsystem 315). Other decode paths may also be implemented.
  • NVRAM 142 acts as a total replacement or supplement for traditional DRAM technology in system memory.
  • NVRAM 142 represents the introduction of a second-level system memory (e.g., the system memory may be viewed as having a first level system memory comprising near memory as cache 150B (part of the DRAM device 340) and a second level system memory comprising far memory (FM) 151B (part of the NVRAM 142)).
  • FM far memory
  • NVRAM 142 acts as a total replacement or supplement for the flash/magnetic/optical mass storage 152B.
  • NVRAM controller 332 may still access NVRAM mass storage 152A in blocks of multiple bytes, depending on the implementation (e.g., 64 Kbytes, 128 Kbytes, etc.).
  • the specific manner in which data is accessed from NVRAM mass storage 152A by NVRAM controller 332 may be transparent to software executed by the processor 310.
  • even though NVRAM mass storage 152A may be accessed differently from flash/magnetic/optical mass storage 152B, the operating system may still view NVRAM mass storage 152A as a standard mass storage device (e.g., a serial ATA hard drive or other standard form of mass storage device).
  • a standard mass storage device e.g., a serial ATA hard drive or other standard form of mass storage device.
  • In embodiments where NVRAM mass storage 152A acts as a total replacement for the flash/magnetic/optical mass storage 152B, it is not necessary to use storage drivers for block-addressable storage access. The removal of storage driver overhead from storage access can increase access speed and save power. In alternative embodiments where it is desired that NVRAM mass storage 152A appears to the OS and/or applications as block-accessible and indistinguishable from flash/magnetic/optical mass storage 152B, emulated storage drivers can be used to expose block-accessible interfaces (e.g., Universal Serial Bus (USB) Bulk-Only Transfer (BOT), 1.0; Serial Advanced Technology Attachment (SATA), 3.0; and the like) to the software for accessing NVRAM mass storage 152A.
  • USB: Universal Serial Bus
  • BOT: Bulk-Only Transfer
  • SATA: Serial Advanced Technology Attachment
  • NVRAM 142 acts as a total replacement or supplement for firmware memory such as BIOS flash 362 and TPM flash 372 (illustrated with dotted lines in Figure 3 to indicate that they are optional).
  • the NVRAM 142 may include a BIOS NVRAM 172 portion to supplement or replace the BIOS flash 362 and may include a TPM NVRAM 173 portion to supplement or replace the TPM flash 372.
  • Firmware memory can also store system persistent states used by a TPM 334 to protect sensitive system information (e.g., encryption keys).
  • the use of NVRAM 142 for firmware memory removes the need for third party flash parts to store code and data that are critical to the system operations.
  • processor 310 may be any type of data processor including a general purpose or special purpose central processing unit (CPU), an application-specific integrated circuit (ASIC) or a digital signal processor (DSP).
  • processor 310 may be a general-purpose processor, such as a Core™ i3, i5, i7, 2 Duo and Quad, Xeon™, or Itanium™ processor.
  • processor 310 may be from another company, such as ARM Holdings, Ltd, of Sunnyvale, CA, MIPS Technologies of Sunnyvale, CA, etc.
  • Processor 310 may be a special-purpose processor, such as, for example, a network or communication processor, compression engine, graphics processor, co-processor, embedded processor, or the like.
  • Processor 310 may be implemented on one or more chips included within one or more packages. Processor 310 may be a part of and/or may be implemented on one or more substrates using any of a number of process technologies, such as, for example, BiCMOS, CMOS, or NMOS. In the embodiment shown in Figure 3, processor 310 has a system-on-a-chip (SOC) configuration.
  • SOC system-on-a-chip
  • the processor 310 includes an integrated graphics unit 311 which includes logic for executing graphics commands such as 3D or 2D graphics commands. While the embodiments of the invention are not limited to any particular integrated graphics unit 311, in one embodiment, the graphics unit 311 is capable of executing industry standard graphics commands such as those specified by the OpenGL and/or DirectX application programming interfaces (APIs) (e.g., OpenGL 4.1 and DirectX 11).
  • the processor 310 may also include one or more cores 101-104, although a single core is illustrated in Figure 3, again, for the sake of clarity.
  • the core(s) 101-104 include internal functional blocks such as one or more execution units, retirement units, a set of general purpose and specific registers, etc. If the core(s) are multi-threaded or hyper-threaded, then each hardware thread may be considered as a "logical" core as well.
  • the cores 101-104 may be homogenous or heterogeneous in terms of architecture and/or instruction set.
  • the processor 310 may also include one or more caches, such as cache 313 which may be implemented as SRAM and/or DRAM. In many embodiments that are not shown, additional caches other than cache 313 are implemented so that multiple levels of cache exist between the execution units in the core(s) 101-104 and memory devices 150B and 151B.
  • the set of shared cache units may include an upper-level cache, such as a level 1 (L1) cache, mid-level caches, such as level 2 (L2), level 3 (L3), level 4 (L4), or other levels of cache, a last level cache (LLC), and/or different combinations thereof.
  • cache 313 may be apportioned in different ways and may be one of many different sizes in different embodiments.
  • cache 313 may be an 8 megabyte (MB) cache, a 16 MB cache, etc.
  • the cache may be a direct mapped cache, a fully associative cache, a multi-way set-associative cache, or a cache with another type of mapping.
  • cache 313 may include one large portion shared among all cores or may be divided into several separately functional slices (e.g., one slice for each core). Cache 313 may also include one portion shared among all cores and several other portions that are separate functional slices per core.
  • the processor 310 may also include a home agent 314 which includes those components coordinating and operating core(s) 101 -104.
  • the home agent unit 314 may include, for example, a power control unit (PCU) and a display unit.
  • PCU power control unit
  • the PCU may be or include logic and components needed for regulating the power state of the core(s) 101-104.
  • the display unit is for driving one or more externally connected displays.
  • processor 310 includes an integrated memory controller (IMC) 331 , near memory cache (MSC) controller, and NVRAM controller 332 all of which can be on the same chip as processor 310, or on a separate chip and/or package connected to processor 310.
  • IMC integrated memory controller
  • MSC near memory cache
  • NVRAM controller 332 all of which can be on the same chip as processor 310, or on a separate chip and/or package connected to processor 310.
  • DRAM device 144 may be on the same chip or a different chip as the IMC 331 and MSC controller 124; thus, one chip may have processor 310 and DRAM device 144; one chip may have the processor 310 and another the DRAM device 144 (these chips may be in the same or different packages); one chip may have the core(s) 101-104 and another the IMC 331, MSC controller 124 and DRAM 144 (these chips may be in the same or different packages); one chip may have the core(s) 101-104, another the IMC 331 and MSC controller 124, and another the DRAM 144 (these chips may be in the same or different packages); etc.
  • processor 310 includes an I/O subsystem 1 15 coupled to IMC 331 .
  • I/O subsystem 1 15 enables communication between processor 310 and the following serial or parallel I/O devices: one or more networks 336 (such as a Local Area Network, Wide Area Network or the Internet), storage I/O device (such as flash/magnetic/optical mass storage 152B, BIOS flash 362, TPM flash 372) and one or more non-storage I/O devices 337 (such as display, keyboard, speaker, and the like).
  • networks 336 such as a Local Area Network, Wide Area Network or the Internet
  • storage I/O device such as flash/magnetic/optical mass storage 152B, BIOS flash 362, TPM flash 372
  • non-storage I/O devices 337 such as display, keyboard, speaker, and the like.
  • I/O subsystem 1 15 may include a platform controller hub (PCH) (not shown) that further includes several I/O adapters 338 and other I/O circuitry to provide access to the storage and non-storage I/O devices and networks. To accomplish this, I/O subsystem 1 15 may have at least one integrated I/O adapter 338 for each I/O protocol utilized. I/O subsystem 1 15 can be on the same chip as processor 310, or on a separate chip and/or package connected to processor 310.
  • PCH platform controller hub
  • I/O adapters 338 translate a host communication protocol utilized within the processor 310 to a protocol compatible with particular I/O devices.
  • some of the protocols that I/O adapters 338 may translate include Peripheral Component Interconnect (PCI)-Express (PCI-E), 3.0; USB, 3.0; SATA, 3.0; Small Computer System Interface (SCSI), Ultra-640; Institute of Electrical and Electronics Engineers (IEEE) 1394 "Firewire"; Serial Peripheral Interface (SPI); and Microwire; among others.
  • there may additionally be one or more wireless protocol I/O adapters. Examples of wireless protocols, among others, are those used in personal area networks, such as IEEE 802.15 and Bluetooth, 4.0; wireless local area networks, such as IEEE 802.11-based wireless protocols; and cellular protocols.
  • the I/O subsystem 1 15 is coupled to a TPM control 334 to control access to system persistent states, such as secure data, encryption keys, platform configuration information and the like.
  • system persistent states are stored in a TPM NVRAM 173 and accessed via NVRAM controller 332.
  • TPM 334 is a secure micro-controller with cryptographic functionalities.
  • TPM 334 has a number of trust-related capabilities; e.g., a SEAL capability for ensuring that data protected by a TPM is only available for the same TPM.
  • TPM 334 can protect data and keys (e.g., secrets) using its encryption capabilities.
  • TPM 334 has a unique and secret RSA key, which allows it to authenticate hardware devices and platforms. For example, TPM 334 can verify that a system seeking access to data stored in computer system 300 is the expected system.
  • TPM 334 is also capable of reporting the integrity of the platform (e.g., computer system 300). This allows an external resource (e.g., a server on a network) to determine the trustworthiness of the platform but does not prevent access to the platform by the user.
  • an external resource e.g., a server on a network
  • I/O subsystem 315 also includes a Management Engine (ME) 335 through which computer system 300 can be managed.
  • a system administrator can remotely configure computer system 300 by editing the contents of the decode table 333 through ME 335 via networks 336.
  • For convenience of explanation, the remainder of the application sometimes refers to NVRAM 142 as a PCMS device.
  • a PCMS device includes multi-layered (vertically stacked) PCM cell arrays that are non-volatile, have low power consumption, and are modifiable at the bit level. As such, the terms NVRAM device and PCMS device may be used interchangeably in the following discussion.
  • it should be understood, however, that a computer system can use NVRAM 142 for system memory, mass storage, firmware memory and/or other memory and storage purposes even if the processor of that computer system does not have all of the above-described components of processor 310, or has more components than processor 310.
  • the MSC controller 124 and NVRAM controller 332 are located on the same die or package (referred to as the CPU package) as the processor 310.
  • the MSC controller 124 and/or NVRAM controller 332 may be located off-die or off-CPU package, coupled to the processor 310 or CPU package over a bus such as a memory bus (like a DDR bus (e.g., a DDR3, DDR4, etc)), a PCI express bus, a desktop management interface (DMI) bus, or any other type of bus.
  • a memory bus like a DDR bus (e.g., a DDR3, DDR4, etc)
  • PCI express bus e.g., a PCI express bus
  • DMI desktop management interface
  • Figures 4A-M illustrate a variety of different deployments in which the processor, near memory and far memory are configured and packaged in different ways.
  • the series of platform memory configurations illustrated in Figures 4A-M enable the use of new nonvolatile system memory such as PCM technologies or, more specifically, PCMS technologies.
  • a memory side cache (MSC) controller (e.g., located in the processor die or on a separate die in the CPU package) intercepts all system memory requests. There are two separate interfaces that "flow downstream" from that controller that exit the CPU package to couple to the Near Memory and Far Memory. Each interface is tailored for the specific type of memory and each memory can be scaled independently in terms of performance and capacity.
  • MSC memory side cache
  • This memory interface must be tailored to meet the memory performance requirements of the processor and must support a transactional, out-of-order protocol at least because PCMS devices may not process read requests in order.
  • PCMS devices may not process read requests in order.
  • the terms "bus" and "channel" are used synonymously herein.
  • the number of memory channels per DIMM socket will depend on the particular CPU package used in the computer system (with some CPU packages supporting, for example, three memory channels per socket).
  • the DRAM memory channels include, by way of example and not limitation, DDR channels (e.g., DDR3, DDR4, DDR5, etc.).
  • DDR channels e.g., DDR3, DDR4, DDR5, etc.
  • DDR is advantageous because of its wide acceptance in the industry, resulting price point, etc.
  • the underlying principles of the invention are not limited to any particular type of DRAM or volatile memory.
  • FIG. 4A illustrates one embodiment of a split architecture which includes one or more DRAM devices 403-406 operating as near memory acting as cache for FM (i.e., MSC) in the CPU package 401 (either on the processor die or on a separate die) and one or more NVRAM devices such as PCM memory residing on DIMMs 450-451 acting as far memory.
  • High bandwidth links 407 on the CPU package 401 interconnect a single or multiple DRAM devices 403-406 to the processor 310 which hosts the integrated memory controller (IMC) 331 and MSC controller 124.
  • IMC integrated memory controller
  • the MSC controller 124 may be integrated within the memory controller 331 in one embodiment.
  • the DIMMs 450-451 use DDR slots and electrical connections defining DDR channels 440 with DDR address, data and control lines and voltages (e.g., the DDR3 or DDR4 standard as defined by the Joint Electron Devices Engineering Council (JEDEC)).
  • the PCM devices on the DIMMs 450-451 provide the far memory capacity of this split architecture, with the DDR channels 440 to the CPU package 401 able to carry both DDR and transactional protocols.
  • the transactional protocol used to communicate with PCM devices allows the CPU 401 to issue a series of transactions, each identified by a unique transaction ID.
  • the commands are serviced by a PCM controller on the recipient one of the PCM DIMMs, which sends responses back to the CPU package 401 , potentially out of order.
  • the processor 310 or other logic within the CPU package 401 identifies each transaction response by its transaction ID, which is sent with the response.
  • the above configuration allows the system to support both standard DDR DRAM-based DIMMs (using DDR protocols over DDR electrical connections) and PCM-based DIMMs configurations (using transactional protocols over the same DDR electrical connections).
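  • A minimal sketch of the transaction-ID mechanism described above is given below: the host tags each command with a unique transaction ID, responses may come back out of order, and each response is matched to its request by that ID. The queueing model and names are illustrative assumptions, not the actual bus protocol.

```python
# Illustrative sketch only of transaction-ID matching: requests are tagged with unique
# IDs, the PCM DIMM controller may respond out of order, and responses are matched
# back to their requests by ID. Names and structures are assumptions.

import itertools
import random

class TransactionalChannel:
    def __init__(self):
        self._next_id = itertools.count()
        self.pending = {}                      # transaction ID -> original request

    def issue(self, command, address):
        tid = next(self._next_id)              # unique transaction ID per request
        self.pending[tid] = (command, address)
        return tid

    def complete(self, tid, data):
        """Match a (possibly out-of-order) response to its request by transaction ID."""
        command, address = self.pending.pop(tid)
        return command, address, data

channel = TransactionalChannel()
tids = [channel.issue("READ", addr) for addr in (0x00, 0x40, 0x80)]

# The PCM controller services the reads in whatever order its media allows.
random.shuffle(tids)
for tid in tids:
    print(channel.complete(tid, data=f"line@{tid}"))
```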
  • FIG. 4B illustrates a split architecture which uses DDR DRAM- based DIMMs 452 coupled over DDR channels 440 to form near memory which acts as an MSC.
  • the processor 310 hosts the memory controller 331 and MSC controller 124.
  • NVRAM devices such as PCM memory devices reside on PCM-based DIMMs 453 that use DDR slots and electrical connections on additional DDR channels 442 off the CPU package 401 .
  • the PCM-based DIMMs 453 provide the far memory capacity of this split architecture, with the DDR channels 442 to the CPU package 401 being based on DDR electrical connections and able to carry both DDR and transactional protocols. This allows the system to be configured with varying numbers of DDR DRAM DIMMs 452 (e.g., DDR4 DIMMS) and PCM DIMMs 453 to achieve the desired capacity and/or performance points.
  • FIG. 4C illustrates a split architecture which hosts the near memory 403-406 acting as a memory side cache (MSC) on the CPU package 401 (either on the processor die or on a separate die).
  • High bandwidth links 407 on the CPU package are used to interconnect a single or multiple DRAM devices 403-406 to the processor 310 which hosts the memory controller 331 and the MSC controller 124, as defined by the split architecture.
  • NVRAM such as PCM memory devices reside on PCI Express cards or risers 455 that use PCI Express electrical connections and PCI Express protocol or a different transactional protocol over the PCI Express bus 454.
  • the PCM devices on the PCI Express cards or risers 455 provide the far memory capacity of this split architecture.
  • Figure 4D illustrates a split architecture which uses DDR DRAM-based DIMMs 452 and DDR channels 440 to form the near memory which acts as an MSC.
  • the processor 310 hosts the memory controller 331 and MSC controller 124.
  • NVRAM such as PCM memory devices 455 reside on PCI Express cards or risers that use PCI Express electrical connections and PCI Express protocol or a different transactional protocol over the PCI Express link 454.
  • the PCM devices on the PCI Express cards or risers 455 provide the far memory capacity of this split architecture, with the memory channel interfaces off the CPU package 401 providing multiple DDR channels 440 for DDR DRAM DIMMs 452.
  • Figure 4E illustrates a unified architecture which hosts both near memory acting as an MSC and far memory NVRAM such as PCM on PCI Express cards or risers 456 that use PCI Express electrical connections and PCI Express protocol or a different transactional protocol over the PCI Express bus 454.
  • the processor 310 hosts the integrated memory controller 331 but, in this unified architecture case, the MSC controller 124 resides on the card or riser 456, along with the DRAM near memory and NVRAM far memory.
  • Figure 4F illustrates a unified architecture which hosts both the near memory acting as an MSC and the far memory NVRAM such as PCM, on DIMMs 458 using DDR channels 457.
  • the near memory in this unified architecture comprises DRAM on each DIMM 458, acting as the memory side cache to the PCM devices on that same DIMM 458, that form the far memory of that particular DIMM.
  • the MSC controller 124 resides on each DIMM 458, along with the near and far memory.
  • multiple memory channels of a DDR bus 457 are provided off the CPU package.
  • the DDR bus 457 of this embodiment implements a transactional protocol over DDR electrical connections.
  • FIG. 4G illustrates a hybrid split architecture, whereby the MSC controller 124 resides on the processor 310 and both near memory and far memory interfaces share the same DDR bus 410. This configuration uses DRAM-based DDR DIMMs 411a as the near memory acting as an MSC, with PCM-based DIMMs 411b serving as the far memory.
  • the memory channels of this embodiment carry both DDR and transactional protocols simultaneously to address the near memory and far memory DIMMs, 411a and 411b, respectively.
  • FIG. 4H illustrates a unified architecture in which the near memory 461 a acting as a memory side cache resides on a mezzanine or riser 461 , in the form of DRAM-based DDR DIMMs.
  • the memory side cache (MSC) controller 124 is located in the riser's DDR and PCM controller 460 which may have two or more memory channels connecting to DDR DIMM channels 470 on the mezzanine/riser 461 and interconnecting to the CPU over high performance interconnect(s) 462 such as a differential memory link.
  • the associated far memory 461b sits on the same mezzanine/riser 461 and is formed by DIMMs that use DDR channels 470 and are populated with NVRAM (such as PCM devices).
  • Figure 4I illustrates a unified architecture that can be used as a memory capacity expansion to a DDR memory subsystem, with DIMMs 464 connected to the CPU package 401 on its DDR memory subsystem over a DDR bus 471.
  • the near memory acting as an MSC resides on a mezzanine or riser 463, in the form of DRAM-based DDR DIMMs 463a.
  • the MSC controller 124 is located in the riser's DDR and PCM controller 460 which may have two or more memory channels connecting to DDR DIMM channels 470 on the mezzanine/riser and interconnecting to the CPU over high performance interconnect(s) 462 such as a differential memory link.
  • the associated far memory 463b sits on the same mezzanine/riser 463 and is formed by DIMMs 463b that use DDR channels 470 and are populated with NVRAM (such as PCM devices).
  • Figure 4J is a unified architecture in which a near memory acting as a memory side cache (MSC) resides on each and every DIMM 465, in the form of DRAM.
  • the DIMMs 465 are on a high performance interconnect/channel(s) 462, such as a differential memory link, connecting to the CPU package 401; the associated far memory sits on the same DIMM 465 and is formed by NVRAM such as PCM devices.
  • FIG. 4K illustrates a unified architecture in which the near memory acting as an MSC resides on every DIMM 466, in the form of DRAM.
  • the DIMMs are on high performance interconnect(s) 470 connecting to the CPU package 401 with the MSC controller 124 located on the DIMMs.
  • the associated far memory sits on the same DIMM 466 and is formed by NVRAM such as PCM devices.
  • Figure 4L illustrates a split architecture which uses DDR DRAM-based DIMMs 464 on a DDR bus 471 to form the necessary near memory which acts as an MSC.
  • the processor 310 hosts the integrated memory controller 331 and memory side cache controller 124.
  • NVRAM such as PCM memory forms the far memory which resides on cards or risers 467 that use high performance interconnects 468 communicating to the CPU package 401 using a transactional protocol.
  • the cards or risers 467 hosting the far memory host a single buffer/controller that can control multiple PCM-based memories or multiple PCM-based DIMMs connected on that riser.
  • Figure 4M illustrates a unified architecture which may use DRAM on a card or riser 469 to form the necessary near memory which acts as an MSC.
  • NVRAM such as PCM memory devices form the far memory which also resides on the cards or risers 469 that use high performance interconnects to communicate with the CPU package 401.
  • the cards or risers 469 hosting the far memory host a single buffer/controller that can control multiple PCM-based devices or multiple PCM-based DIMMs on that riser 469 and that also integrates the memory side cache controller 124.
  • the DRAM DIMMs 411a and PCM-based DIMMs 411b reside on the same memory channel. Consequently the same set of address/control and data lines are used to connect the CPU to both the DRAM and PCM memories.
  • a DDR DIMM on a common memory channel with a PCM-based DIMM is configured to act as the sole MSC for data stored in the PCM-based DIMM.
  • the far memory data stored in the PCM-based DIMM is only cached in the DDR DIMM near memory within the same memory channel, thereby localizing memory transactions to that particular memory channel.
  • the system address space may be logically subdivided between the different memory channels. For example, if there are four memory channels, then 1/4 of the system address space may be allocated to each memory channel. If each memory channel is provided with one PCMS-based DIMM and one DDR DIMM, the DDR DIMM may be configured to act as the MSC for that 1/4 portion of the system address space.
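As a small, hypothetical sketch of the channel-local caching arrangement just described, the function below maps a system address to one of four memory channels by splitting the address space into equal contiguous quarters; the quarter assigned to a channel is then the only region its DDR DIMM needs to cache. The contiguous split is an assumption for illustration, since interleaved partitionings are equally possible.

```c
/* Illustrative sketch: contiguous quarter-per-channel address subdivision. */
#include <stdint.h>

#define NUM_CHANNELS 4u

static inline unsigned channel_for_address(uint64_t addr, uint64_t total_space)
{
    uint64_t region = total_space / NUM_CHANNELS;   /* size of each quarter */
    unsigned ch = (unsigned)(addr / region);
    return ch < NUM_CHANNELS ? ch : NUM_CHANNELS - 1u;
}
```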
  • the configuration of system memory and mass storage devices may depend on the type of electronic platform on which embodiments of the invention are employed.
  • the mass storage may be implemented using NVRAM mass storage 152A alone, or using NVRAM mass storage 152A in combination with a flash/magnetic/optical mass storage 152B.
  • the mass storage may be implemented using magnetic storage (e.g., hard drives) or any combination of magnetic storage, optical storage, holographic storage, mass-storage flash memory, and NVRAM mass storage 152A.
  • system hardware and/or software responsible for storage may implement various intelligent persistent storage allocation techniques to allocate blocks of persistent program code and data between the FM 151B/NVRAM storage 152A and a flash/magnetic/optical mass storage 152B in an efficient or otherwise useful manner.
  • a high powered server is configured with a near memory (e.g., DRAM), a PCMS device, and a magnetic mass storage device for large amounts of persistent storage.
  • a notebook computer is configured with a near memory and a PCMS device which performs the role of both a far memory and a mass storage device (i.e., which is logically partitioned to perform these roles as shown in Figure 3).
  • a home or office desktop computer is configured similarly to a notebook computer, but may also include one or more magnetic storage devices to provide large amounts of persistent storage capabilities.
  • One embodiment of a tablet computer or cellular telephony device is configured with PCMS memory but potentially no near memory and no additional mass storage (for cost/power savings). However, the tablet/telephone may be configured with a removable mass storage device such as a flash or PCMS memory stick.
  • personal digital assistants (PDAs) may be configured in a similar manner to the tablet/telephone described above.
  • gaming consoles may be configured in a similar manner to desktops or laptops.
  • Other devices which may be similarly configured include digital cameras, routers, set-top boxes, digital video recorders, televisions, and automobiles.
  • the storage cells of various far memory technologies may have various reliability concerns that are a function of their usage.
  • the appropriate read and/or write low level access signals applied to a far memory storage cell (e.g., pulse width, voltage amplitude, current amplitude, etc.) may change as a function of the number of accesses the storage cell has received.
  • the appropriate read threshold voltage for a far memory storage cell may change as a function of the length of time that has elapsed since the storage cell was last written to.
  • wear leveling algorithms may be used to "spread out" accesses to the cells in an attempt to keep the low level signaling characteristics approximately the same across a PCMS storage device's storage cells. Wear leveling algorithms, however, may be costly to implement. For example, wear leveling algorithms may temporarily suspend far memory accesses during time periods in which the data of heavily utilized storage cells and minimally used storage cells are "swapped". This has the effect of reducing far memory performance. Moreover, the logic circuitry needed to implement the wear leveling function may consume scores of logic gates that, if implemented proximate to the far memory storage devices, can add appreciable cost.
  • one or more usage statistics of a specific set of far memory storage addresses are tracked, and the appropriate low level signaling properties applied to that set of addresses are determined as a function of the tracked accesses.
  • the usage statistics are tracked and utilized during normal system operation rather than only at system bring-up, during system test diagnostics, and/or in response to a system failure.
  • the appropriate low level signals are then applied. Notably, however, the specific characteristics of the appropriate low level signals (e.g., specific pulse widths, specific voltage amplitudes, specific current amplitudes, specific read threshold voltages, etc.) that correspond to the specific tracked value parameters (e.g., a specific number of writes, a specific amount of time since a last write) may vary from implementation to implementation.
  • Figures 5 and 6 provide representations of such a platform.
  • Figure 5 shows components of a hardware architecture for an NVRAM controller 532, and Figure 6 shows basic methodologies that may be performed by the hardware architecture.
  • NVRAM controller 532 may be used, for example, to access a computer system's main memory where the main memory has only PCMS technology or combined near/far memory technology (e.g., DRAM and PCMS).
  • NVRAM controller 532 may be coupled to or include a main memory channel into which DIMM cards are plugged.
  • NVRAM controller 532 may be used to access a computing system's mass storage.
  • NVRAM controller 532 may be coupled to, and/or be integrated within, an SSD package.
  • a first correlation is instantiated that tracks certain usage parameters 502_1 to 502_N for each of N sets of address space 501_1 to 501_N of a memory core 516.
  • Memory core 516 may be implemented, for example, with PCMS devices coupled to a same memory channel and the address space of the PCMS devices is broken down into N address sets 501_1 to 501_N. Said another way, the address space of the memory storage supported by the memory channel can be viewed as being arranged into N address sets 501_1 to 501_N.
  • assuming the memory channel supports 2^X unique memory addresses, each unique address set will therefore correspond to 2^X/N unique addresses. For example, with X = 24 there are 16,777,216 unique memory addresses supported by the memory channel, so each of the N address sets corresponds to 16,777,216/N of those addresses.
  • the sets may represent contiguous address space but they do not need to be organized in this manner. For example, some form of interleaving may be used so that consecutive addresses in a same set have a numerical offset of N or a value based on N. Further still, the strategy for determining which addresses belong in which set may be based on the structural and/or wiring characteristics of the memory core itself, as described further below.
  • address decoder 503 receives 601 the address of a read or write transaction targeted to the memory core 516 as an input, and, in response, produces 602 an identifier 506 of the specific set that the address belongs to as an output.
  • N may be programmable and may be an input term provided to the address decoder 503.
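The following C sketch models the role of address decoder 503 described in the bullets above: given a transaction address and a programmable N, it returns an address-set identifier 506. Two of the possible set-assignment strategies mentioned in the surrounding text, contiguous ranges and modulo-N interleaving, are shown; the enum, struct, and function names are assumptions for illustration, not the actual circuit.

```c
/* Sketch of address decoder 503: address in, set identifier 506 out. */
#include <stdint.h>

typedef enum { SETS_CONTIGUOUS, SETS_INTERLEAVED } set_policy_t;

typedef struct {
    set_policy_t policy;
    uint32_t     num_sets;    /* programmable N */
    uint64_t     addresses;   /* total addresses supported, e.g. 2^X */
} addr_decoder_t;

/* Return the set identifier for a read or write transaction address. */
uint32_t decode_address_set(const addr_decoder_t *d, uint64_t addr)
{
    if (d->policy == SETS_CONTIGUOUS) {
        uint64_t per_set = d->addresses / d->num_sets;   /* 2^X / N */
        uint64_t set = addr / per_set;
        return (uint32_t)(set < d->num_sets ? set : d->num_sets - 1);
    }
    /* Interleaved: consecutive addresses in a same set differ by N. */
    return (uint32_t)(addr % d->num_sets);
}
```

A real decoder might instead derive the set from row or column components of the address to mirror the memory core's structure, as discussed further below.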
  • the tracking statistics for that address set are looked-up 603 from a first level of look-up circuitry 504 (such as content addressable memory (CAM) circuitry).
  • two tracking statistics are kept for each set of addresses: 1) total number of write accesses 507; and 2) time of last write operation 508.
  • these statistics are updated for a write transaction targeted to the memory core 516 but are not updated 604 for a read transaction targeted to the memory core 516 (if updated, they are eventually written back to the first level storage circuitry 504). Specifically, if the incoming transaction is a write transaction, the number of write accesses 507 is incremented by 1 and the time of last write operation 508 is updated to be the current time.
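A minimal sketch of the first-level tracking just described, with a simple array-indexed table standing in for the CAM-based storage circuitry 504 (an assumption for illustration): writes increment the per-set write counter 507 and refresh the time-of-last-write field 508, while reads leave the entry unchanged.

```c
/* Sketch of the per-set usage statistics update (elements 504, 507, 508). */
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint64_t write_count;       /* element 507: total write accesses */
    uint64_t last_write_time;   /* element 508: time of last write   */
} set_stats_t;

/* Update (and return) the stats entry for the set hit by this transaction. */
set_stats_t *update_set_stats(set_stats_t *table, uint32_t set_id,
                              bool is_write, uint64_t now)
{
    set_stats_t *entry = &table[set_id];   /* stands in for the first-level look-up */
    if (is_write) {
        entry->write_count += 1;
        entry->last_write_time = now;      /* eventually written back to storage 504 */
    }
    return entry;
}
```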
  • a fetched (and possibly updated) usage statistic is then used as a look-up parameter to a second look-up level 505 to retrieve 605 a digital representation (e.g., a plurality of bits) of an appropriate low level signaling characteristic (or characteristics set or "signature") for the implicated address set 511.
  • the total number of writes statistic 507 that was retrieved for the implicated address set is used as a look up parameter to storage circuit 509 (which may also be implemented with CAM circuitry) to retrieve low level signature 511.
  • the low level signaling signature 511 is essentially a digital code or other representation from which the appropriate low level signaling (e.g., any one or more of waveform shape, voltage amplitude, current amplitude, etc.) for the memory core 516 for the particular transaction (read or write) and implicated address set can be determined.
  • the signature 511 is held within its storage circuit 509 (e.g., a CAM).
  • various types of PCMS devices may actually perform a "pre-read" prior to a write, hence, a write transaction may actually be implemented with both a read operation and write operation.
  • the storage circuit 509 has X entries, where X corresponds to the granularity at which the tracked statistic used as the look-up parameter (e.g., total number of writes) is designed to affect the specific low level signals applied to the memory core 516 for the transaction.
  • the second level look-up storage circuitry 509, 510 defines its search key column(s) entries with ranges. A hit is registered for an entry when the look-up parameter falls within that entry's range.
  • the entries of the search column for look-up table 509 may consist of different, consecutive ranges of total numbers of write operations (e.g., 0 to 1,000 for the first entry; 1,001 to 10,000 for the second entry, etc.).
  • when a total number of write operations for the applicable address set is fetched from the first look-up level 504, it will hit within one of the ranges of the search column of table 510, which, in turn, will identify the appropriate analog signal signature for the transaction.
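The range-keyed second-level look-up can be pictured as in the sketch below. Only the 0 to 1,000 and 1,001 to 10,000 ranges come from the example above; the remaining boundaries and the signature codes are invented for illustration. A "hit" is simply the entry whose range contains the fetched total-write count.

```c
/* Sketch of the second-level, range-keyed look-up (storage circuit 509). */
#include <stdint.h>

typedef struct {
    uint64_t min_writes;    /* inclusive lower bound of the range */
    uint64_t max_writes;    /* inclusive upper bound of the range */
    uint32_t signature;     /* digital code for pulse width/amplitude/etc. */
} signature_entry_t;

static const signature_entry_t write_signature_table[] = {
    {      0,       1000, 0x01 },   /* first example range from the text   */
    {   1001,      10000, 0x02 },   /* second example range from the text  */
    {  10001,     100000, 0x03 },   /* assumed additional range            */
    { 100001, UINT64_MAX, 0x04 },   /* assumed catch-all range             */
};

/* A hit occurs in whichever entry's range contains the tracked count. */
uint32_t lookup_signature(uint64_t total_writes)
{
    for (unsigned i = 0;
         i < sizeof(write_signature_table) / sizeof(write_signature_table[0]);
         i++) {
        if (total_writes >= write_signature_table[i].min_writes &&
            total_writes <= write_signature_table[i].max_writes)
            return write_signature_table[i].signature;
    }
    return 0;   /* unreachable with the table above */
}
```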
  • the address sets 501_1 through 501_N are composed of contiguous addresses (address ranges) and address decoder 503 contains binning logic that can determine which address range a particular address is associated with. For example, logic circuitry 503 may be informed of, or calculate, the appropriate address ranges for N contiguous address ranges and may further populate 2N registers with the minimum and maximum address for each set/range.
  • by comparing a received address against these minimum and maximum register values, logic circuitry can determine which set the address belongs to (e.g., the set for which both the greater-than and less-than comparison circuits signify a logical "true").
  • the address set identifier 506 may be the transaction address or a portion of the transaction address (e.g., a row component or a column component of the address, or portions thereof).
  • the individual address sets 501_1 to 501_N in the first level of look-up 504 may be defined by address (or address portion) ranges.
  • address decoder 503 may include division logic circuitry that divides the incoming address by a value based on N and examines the remainder to identify what set the address belongs to.
  • the approach for determining the address sets, as designed into address decoder 503, may also take into account the structure of the memory core 516 itself. For instance, storage cells coupled to a same row or a same column may be grouped into a same set because such cells are coupled to a common, critical node within the memory core (e.g., a same row node or a same column node) whose applicable pulse widths, voltage/current amplitudes, etc. stress the cells in like fashion. As such, tracking the usage of these cells as a group and determining the appropriate low level signals to apply to them as a group is largely consistent with a more ideal (but less practical) scheme that tracks usage and applies signals to the cells on an individual cell-by-cell basis.
  • addresses from different rows/columns of the core may be grouped into a same set if their wiring is deemed proximate to one another and/or there is some other structural relationship within the memory core that leads to a belief that they may receive same low level signaling as a function of the accesses made to the group as a whole.
  • Different hardware platform architectures than that depicted in Figure 5 may also exist.
  • the architecture of Figure 5 indicates that both low level signaling signatures 511, 512 are determined from the same address set definition.
  • the signatures 511, 512 may be driven by different address set definitions, which, in turn, correspond to the grouping of different parts of the memory core.
  • the low level signaling signature 511 for a write operation or a read operation may be determined from the total number of times the address's column component has been written to (or other first grouping of memory core wiring and/or structure).
  • the read threshold voltage 512 for a read operation may be determined from the time elapsed since the last write to the address's corresponding row component (or other, different, second grouping of core wiring and/or structure). This would correspond to different types of set identifiers 506 (one for read transactions and one for write transactions) and potentially two separate look-up circuits in the first level look-up 504 (a first CAM used for reads and a second CAM used for writes).
  • addresses associated with a same row (or other first address grouping) are identified in a first address set, and addresses associated with a same column (or other second address grouping) are identified in a second address set.
  • Total number of writes and time of last write are tracked for all the sets so that the system tracks the total number of writes and the time of last write for each row and each column in the system (or, more generally, the two different groupings).
  • for any single transaction address, two sets of tracked statistics (e.g., two sets of total number of write accesses) are therefore retrieved, one from each grouping.
  • the tracked statistics may be added or mathematically combined in some fashion (e.g., each weighted equally or one weighted more heavily than the other) to establish, for example, a total number of write accesses for the targeted cell based on the combined perspective of the two address groupings (e.g., a combined row and column perspective).
  • the total number may then be used as a look-up parameter into the second stage look-up 505 to produce an analog signaling signature based on this combined perspective.
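As a hedged sketch of that combination step, the helper below blends the write counts tracked for a row-based grouping and a column-based grouping into a single effective count. The equal 50/50 weighting and the integer fixed-point scale are assumptions; the text above leaves the weighting open.

```c
/* Sketch: combine row-grouping and column-grouping write counts. */
#include <stdint.h>

/* Weights expressed in 1/256ths so the arithmetic stays in integers. */
#define ROW_WEIGHT 128u   /* assumed 0.5 */
#define COL_WEIGHT 128u   /* assumed 0.5 */

uint64_t combined_write_count(uint64_t row_set_writes, uint64_t col_set_writes)
{
    return (row_set_writes * ROW_WEIGHT + col_set_writes * COL_WEIGHT) / 256u;
}
```

The combined count would then serve as the look-up parameter into the second-stage look-up 505, exactly as a single-grouping count does in the earlier sketch.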
  • the "rows" or “columns” described above may instead be larger, different groupings of memory core structure and/or wiring where same low level signaling is appropriate based on accesses to the corresponding groups as a whole.
  • information identifying the type of memory core, the address sets for the type of memory core, or the applicable function(s) for determining the address sets (e.g., contiguous ranges, interleaved, etc.) for the memory core is provided to the NVRAM controller 532.
  • this information is communicated to the NVRAM controller 532 by the memory core 516 (e.g., having the information pre-programmed therein).
  • this information is kept in system BIOS and provided to the NVRAM controller 532. In either approach the information may be provided to the NVRAM controller 532 at system bring-up. The information is then used by the NVRAM controller 532 to internally configure the address decoder 503 so that it can subsequently determine the correct address set for any given read or write transaction address.
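A minimal sketch of that bring-up flow, with assumed structure and field names: whether the set-definition information comes from the memory core itself or from system BIOS, it is captured once at bring-up and used to program the address decoder before normal read/write traffic begins.

```c
/* Sketch of bring-up configuration of the NVRAM controller's decoder. */
#include <stdint.h>

typedef struct {
    uint32_t core_type;     /* identifies the far memory technology       */
    uint32_t num_sets;      /* N                                          */
    uint32_t set_policy;    /* e.g., contiguous ranges vs. interleaved    */
} core_set_info_t;

typedef struct {
    core_set_info_t decoder_cfg;
    int             configured;
} nvram_controller_t;

/* Called once at system bring-up, before servicing transactions. */
void nvram_controller_configure(nvram_controller_t *ctrl,
                                const core_set_info_t *info)
{
    ctrl->decoder_cfg = *info;   /* internally configures decoder 503 */
    ctrl->configured  = 1;
}
```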
  • FIGS 7a-d show different possible ways in which the above described techniques may be integrated into a memory channel within a computing system.
  • a memory channel is understood to include a host side 701 and one or more platforms 702 (e.g., DIMM cards, SSD devices, etc.) that are coupled to the memory channel's interconnect structure (such as a bus) 703.
  • the one or more platforms 702 have storage devices including non volatile memory devices (such as PCMS devices) 716.
  • Interface circuitry 717 may also reside on a platform to specially address the memory devices 716.
  • the interface circuitry 717 may be viewed as a component of an NVRAM controller that is local to the storage core 716 (e.g., on a DIMM card or within an SSD package) whereas the host side, depicted in Figs. 7a-d as "memory controller 701", may be viewed as the host side component of an NVRAM controller.
  • the memory controller 701 sends a read or write command to the interface circuitry 717 with a corresponding memory address.
  • the interface circuitry 717, in response to the received command, performs the desired operation (read or write) to the memory storage devices.
  • the storage devices 716 of Figures 7a-d can be viewed as the memory core 516 referred to above with respect to Figure 5.
  • the D/A circuitry and/or waveform circuitry 714a that converts the received signatures into an actual low level signal are located in the interface circuit 717a (and/or the memory device(s) 716a). All other roles/responsibilities of the above described techniques may be implemented entirely on the memory controller 701a, entirely on the interface circuit 717a, or may be partially implemented on both.
  • All of the remaining roles/responsibilities are implemented entirely on the memory controller 701a.
  • each of the address decoder 703a, the first level look-up storage circuitry 704a and the second level look-up storage circuitry 705a and any logic in between reside on the memory controller 701a.
  • the memory controller 701a also sends the applicable low level signaling signature(s) 711a (e.g., which may further include a read threshold voltage signature for read operations) to the interface circuitry 717.
  • the memory controller 701c sends to the interface circuitry 717c information related to and used for the determination of the low level signaling signature, rather than the signatures themselves.
  • the memory controller 701c includes the address decoder 703c so that it can determine which address set or set(s) are implicated by the transaction address.
  • the memory controller then sends an identifier 717c of the implicated address set to the interface circuit 717c.
  • the interface circuit 717c, which includes the first and second level look-up circuitry 704c, 705c, then determines the applicable low level signaling signature from the address set information (e.g., by performing both the first and second stage look-ups).
  • the memory controller 701d includes the address decoder 703d and determines the appropriate address set for the transaction's address.
  • the memory controller 701d also includes the first level look-up circuitry 704d and looks up the information that is tracked for the address set.
  • the tracked information 707c (e.g., total number of writes and/or time of last write or time elapsed since last write) is then sent to the interface circuit 717d, which uses the information to determine the applicable low level signaling signature.
  • the memory controller 701d may include logic to determine time elapsed since the last write, or such logic may be located on the interface circuit 717.
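To summarize the partitioning options of Figures 7A-7D described above, the sketch below contrasts what might cross the memory channel in each case: a fully resolved signaling signature, only the implicated address-set identifier, or the raw tracked statistics. The message layout is purely illustrative; no channel format is defined by the text above.

```c
/* Sketch of the sideband information each partitioning sends to the DIMM/SSD side. */
#include <stdint.h>

typedef enum {
    SEND_SIGNATURE,      /* host resolves everything (e.g., the Figure 7A case)      */
    SEND_SET_ID,         /* device performs both look-ups (e.g., the Figure 7C case) */
    SEND_TRACKED_STATS   /* device resolves the signature (e.g., the Figure 7D case) */
} channel_payload_kind_t;

typedef struct {
    channel_payload_kind_t kind;
    union {
        uint32_t signature;      /* low level signaling signature */
        uint32_t set_id;         /* implicated address set        */
        struct {
            uint64_t write_count;
            uint64_t last_write_time;
        } stats;                 /* first-level tracked statistics */
    } u;
} channel_sideband_t;

/* Example: a Figure 7D style payload carrying the first-level statistics. */
channel_sideband_t make_stats_payload(uint64_t writes, uint64_t last_write)
{
    channel_sideband_t msg;
    msg.kind = SEND_TRACKED_STATS;
    msg.u.stats.write_count = writes;
    msg.u.stats.last_write_time = last_write;
    return msg;
}
```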

Abstract

A method is described that entails receiving an address for a read or write transaction to a non volatile system memory device. The method further involves determining a usage statistic of the memory device for a set of addresses of which the address is a member. The method further involves determining a characteristic of a signal to be applied to the memory device for the read or write transaction based on the usage statistic. The method further involves generating a signal having the characteristic to perform the read or write transaction.

Description

GENERATION OF FAR MEMORY ACCESS SIGNALS BASED ON USAGE STATISTIC TRACKING
BACKGROUND
Field of the Invention
[0001] This invention relates generally to the field of computer systems. More particularly, the invention relates to an apparatus and method for implementing a multi-level memory hierarchy including a non-volatile memory tier.
Description of the Related Art
A. Current Memory and Storage Configurations
[0002] One of the limiting factors for computer innovation today is memory and storage technology. In conventional computer systems, system memory (also known as main memory, primary memory, executable memory) is typically implemented by dynamic random access memory (DRAM). DRAM-based memory consumes power even when no memory reads or writes occur because it must constantly recharge internal capacitors. DRAM-based memory is volatile, which means data stored in DRAM memory is lost once the power is removed. Conventional computer systems also rely on multiple levels of caching to improve performance. A cache is a high speed memory positioned between the processor and system memory to service memory access requests faster than they could be serviced from system memory. Such caches are typically implemented with static random access memory (SRAM). Cache management protocols may be used to ensure that the most frequently accessed data and instructions are stored within one of the levels of cache, thereby reducing the number of memory access transactions and improving performance. [0003] With respect to mass storage (also known as secondary storage or disk storage), conventional mass storage devices typically include magnetic media (e.g., hard disk drives), optical media (e.g., compact disc (CD) drive, digital versatile disc (DVD), etc.), holographic media, and/or mass-storage flash memory (e.g., solid state drives (SSDs), removable flash drives, etc.). Generally, these storage devices are considered Input/Output (I/O) devices because they are accessed by the processor through various I/O adapters that implement various I/O protocols. These I/O adapters and I/O protocols consume a significant amount of power and can have a significant impact on the die area and the form factor of the platform.
Portable or mobile devices (e.g., laptops, netbooks, tablet computers, personal digital assistant (PDAs), portable media players, portable gaming devices, digital cameras, mobile phones, smartphones, feature phones, etc.) that have limited battery life when not connected to a permanent power supply may include removable mass storage devices (e.g., Embedded Multimedia Card (eMMC), Secure Digital (SD) card) that are typically coupled to the processor via low-power interconnects and I/O controllers in order to meet active and idle power budgets.
[0004] With respect to firmware memory (such as boot memory (also known as BIOS flash)), a conventional computer system typically uses flash memory devices to store persistent system information that is read often but seldom (or never) written to. For example, the initial instructions executed by a processor to initialize key system components during a boot process (Basic Input and Output System (BIOS) images) are typically stored in a flash memory device. Flash memory devices that are currently available in the market generally have limited speed (e.g., 50 MHz). This speed is further reduced by the overhead for read protocols (e.g., 2.5 MHz). In order to speed up the BIOS execution speed, conventional processors generally cache a portion of BIOS code during the Pre-Extensible Firmware Interface (PEI) phase of the boot process. The size of the processor cache places a restriction on the size of the BIOS code used in the PEI phase (also known as the "PEI BIOS code").
B. Phase-Change Memory (PCM) and Related Technologies
[0005] Phase-change memory (PCM), also sometimes referred to as phase change random access memory (PRAM or PCRAM), PCME, Ovonic Unified Memory, or Chalcogenide RAM (C-RAM), is a type of non-volatile computer memory which exploits the unique behavior of chalcogenide glass. As a result of heat produced by the passage of an electric current, chalcogenide glass can be switched between two states: crystalline and amorphous. Recent versions of PCM can achieve two additional distinct states.
[0006] PCM proivdes higher performance than flash because the memory element of PCM can be switched more quickly, writing (changing individual bits to either 1 or 0) can be done without the need to first erase an entire block of cells, and degradation from writes is slower (a PCM device may survive approximately 100 million write cycles; PCM degradation is due to thermal expansion during programming, metal (and other material) migration, and other mechanisms).
BRIEF DESCRIPTION OF THE DRAWINGS
[0007] The following description and accompanying drawings are used to illustrate embodiments of the invention. In the drawings:
[0008] FIG. 1 illustrates a cache and system memory arrangement according to embodiments of the invention;
[0009] FIG. 2 illustrates a memory and storage hierarchy employed in embodiments of the invention;
[0010] FIG. 3 illustrates a computer system on which embodiments of the invention may be implemented; [0011] FIG. 4A illustrates a first system architecture which includes PCM according to embodiments of the invention;
[0012] FIG. 4B illustrates a second system architecture which includes PCM according to embodiments of the invention;
[0013] FIG. 4C illustrates a third system architecture which includes PCM according to embodiments of the invention;
[0014] FIG. 4D illustrates a fourth system architecture which includes PCM according to embodiments of the invention;
[0015] FIG. 4E illustrate a fifth system architecture which includes PCM according to embodiments of the invention;
[0016] FIG. 4F illustrate a sixth system architecture which includes PCM according to embodiments of the invention;
[0017] FIG. 4G illustrates a seventh system architecture which includes PCM according to embodiments of the invention;
[0018] FIG. 4H illustrates an eight system architecture which includes PCM according to embodiments of the invention;
[0019] FIG. 4I illustrates a ninth system architecture which includes PCM according to embodiments of the invention;
[0020] FIG. 4J illustrates a tenth system architecture which includes PCM according to embodiments of the invention;
[0021] FIG. 4K illustrates an eleventh system architecture which includes PCM according to embodiments of the invention;
[0022] FIG. 4L illustrates a twelfth system architecture which includes PCM according to embodiments of the invention;
[0023] FIG. 4M illustrates a thirteenth system architecture which includes PCM according to embodiments of the invention;
[0024] FIG. 5 illustrates aspects of an NVRAM controller for determining far memory signaling based on usage statistics tracking; [0025] FIG. 6 illustrates a method that can be performed by the NVRAM controller of FIG. 5; and
[0026] FIGS. 7A-7D illustrate various approaches for integrating the NVRAM controller of FIG. 5 into a memory channel.
DETAILED DESCRIPTION
[0027] In the following description, numerous specific details such as logic implementations, opcodes, means to specify operands, resource partitioning/sharing/duplication implementations, types and interrelationships of system components, and logic partitioning/integration choices are set forth in order to provide a more thorough understanding of the present invention. It will be appreciated, however, by one skilled in the art that the invention may be practiced without such specific details. In other instances, control structures, gate level circuits and full software instruction sequences have not been shown in detail in order not to obscure the invention. Those of ordinary skill in the art, with the included descriptions, will be able to implement appropriate functionality without undue experimentation.
[0028] References in the specification to "one embodiment," "an embodiment," "an example embodiment," etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other
embodiments whether or not explicitly described.
[0029] In the following description and claims, the terms "coupled" and
"connected," along with their derivatives, may be used. It should be understood that these terms are not intended as synonyms for each other. "Coupled" is used to indicate that two or more elements, which may or may not be in direct physical or electrical contact with each other, co-operate or interact with each other. "Connected" is used to indicate the establishment of communication between two or more elements that are coupled with each other.
[0030] Bracketed text and blocks with dashed borders (e.g., large dashes, small dashes, dot-dash, dots) are sometimes used herein to illustrate optional operations/components that add additional features to
embodiments of the invention. However, such notation should not be taken to mean that these are the only options or optional operations/components, and/or that blocks with solid borders are not optional in certain embodiments of the invention.
I NTRODUCTION
[0031] Memory capacity and performance requirements continue to increase with an increasing number of processor cores and new usage models such as virtualization. In addition, memory power and cost have become a significant component of the overall power and cost, respectively, of electronic systems.
[0032] Some embodiments of the invention solve the above challenges by intelligently subdividing the performance requirement and the capacity requirement between memory technologies. The focus of this approach is on providing performance with a relatively small amount of a relatively higher-speed memory such as DRAM while implementing the bulk of the system memory using significantly cheaper and denser non-volatile random access memory (NVRAM). Embodiments of the invention described below define platform configurations that enable hierarchical memory subsystem organizations for the use of NVRAM. The use of NVRAM in the memory hierarchy also enables new usages such as expanded boot space and mass storage implementations, as described in detail below. [0033] FIG. 1 illustrates a cache and system memory arrangement according to embodiments of the invention. Specifically, Figure 1 shows a memory hierarchy including a set of internal processor caches 120, "near memory" acting as a far memory cache 121 , which may include both internal cache(s) 106 and external caches 107-109, and "far memory" 122. One particular type of memory which may be used for "far memory" in some embodiments of the invention is non-volatile random access memory
("NVRAM"). As such, an overview of NVRAM is provided below, followed by an overview of far memory and near memory.
A. Non-Volatile Random Access Memory ("NVRAM")
[0034] There are many possible technology choices for NVRAM, including PCM, Phase Change Memory and Switch (PCMS) (the latter being a more specific implementation of the former), byte-addressable persistent memory (BPRAM), storage class memory (SCM), universal memory, Ge2Sb2Te5, programmable metallization cell (PMC), resistive memory (RRAM), RESET (amorphous) cell, SET (crystalline) cell, PCME, Ovshinsky memory, ferroelectric memory (also known as polymer memory and poly(N- vinylcarbazole)), ferromagnetic memory (also known as Spintronics, SPRAM (spin-transfer torque RAM), STRAM (spin tunneling RAM), magnetoresistive memory, magnetic memory, magnetic random access memory (MRAM)), and Semiconductor-oxide-nitride-oxide-semiconductor (SONOS, also known as dielectric memory).
[0035] NVRAM has the following characteristics:
(1 ) It maintains its content even if power is removed, similar to
FLASH memory used in solid state disks (SSD), and different from SRAM and DRAM which are volatile;
(2) lower power consumption than volatile memories such as SRAM and DRAM;
(3) random access similar to SRAM and DRAM (also known as randomly addressable); (4) rewritable and erasable at a lower level of granularity (e.g., byte level) than FLASH found in SSDs (which can only be rewritten and erased a "block" at a time - minimally 64 Kbyte in size for NOR FLASH and 16 Kbyte for NAND FLASH);
(5) used as a system memory and allocated all or a portion of the system memory address space;
(6) capable of being coupled to the processor over a bus using a transactional protocol (a protocol that supports transaction identifiers (IDs) to distinguish different transactions so that those transactions can complete out-of-order) and allowing access at a level of granularity small enough to support operation of the NVRAM as system memory (e.g., cache line size such as 64 or 128 byte). For example, the bus may be a memory bus (e.g., a DDR bus such as DDR3, DDR4, etc.) over which is run a transactional protocol as opposed to the non-transactional protocol that is normally used. As another example, the bus may one over which is normally run a transactional protocol (a native transactional protocol), such as a PCI express (PCIE) bus, desktop management interface (DMI) bus, or any other type of bus utilizing a transactional protocol and a small enough transaction payload size (e.g., cache line size such as 64 or 128 byte); and
(7) one or more of the following: a) faster write speed than non-volatile memory/storage technologies such as FLASH; b) very high read speed (faster than FLASH and near or equivalent to DRAM read speeds); c) directly writable (rather than requiring erasing (overwriting with 1 s) before writing data like FLASH memory used in SSDs); d) a greater number of writes before failure (more than boot ROM and FLASH used in SSDs); and/or [0036] As mentioned above, in contrast to FLASH memory, which must be rewritten and erased a complete "block" at a time, the level of granularity at which NVRAM is accessed in any given implementation may depend on the particular memory controller and the particular memory bus or other type of bus to which the NVRAM is coupled. For example, in some
implementations where NVRAM is used as system memory, the NVRAM may be accessed at the granularity of a cache line (e.g., a 64-byte or 128- Byte cache line), notwithstanding an inherent ability to be accessed at the granularity of a byte, because cache line is the level at which the memory subsystem accesses memory. Thus, when NVRAM is deployed within a memory subsystem, it may be accessed at the same level of granularity as the DRAM (e.g., the "near memory") used in the same memory subsystem. Even so, the level of granularity of access to the NVRAM by the memory controller and memory bus or other type of bus is smaller than that of the block size used by Flash and the access size of the I/O subsystem's controller and bus.
[0037] NVRAM may also incorporate wear leveling algorithms to account for the fact that the storage cells at the far memory level begin to wear out after a number of write accesses, especially where a significant number of writes may occur such as in a system memory implementation. Since high cycle count blocks are most likely to wear out in this manner, wear leveling spreads writes across the far memory cells by swapping addresses of high cycle count blocks with low cycle count blocks. Note that most address swapping is typically transparent to application programs_because it is handled by hardware, lower-level software (e.g., a low level driver or operating system), or a combination of the two.
B. Far Memory
[0038] The far memory 122 of some embodiments of the invention is implemented with NVRAM, but is not necessarily limited to any particular memory technology. Far memory 122 is distinguishable from other instruction and data memory/storage technologies in terms of its characteristics and/or its application in the memory/storage hierarchy. For example, far memory 122 is different from:
1 ) static random access memory (SRAM) which may be used for level 0 and level 1 internal processor caches 101 a-b, 102a-b, 103a-b, 103a-b, and 104a-b dedicated to each of the processor cores 101 -104, respectively, and lower level cache (LLC) 105 shared by the processor cores;
2) dynamic random access memory (DRAM) configured as a cache 106 internal to the processor 100 (e.g., on the same die as the processor 100) and/or configured as one or more caches 107-109 external to the processor (e.g., in the same or a different package from the processor 100); and
3) FLASH memory/magnetic disk/optical disc applied as mass
storage (not shown); and
4) memory such as FLASH memory or other read only memory
(ROM) applied as firmware memory (which can refer to boot ROM, BIOS Flash, and/or TPM Flash). (not shown).
[0039] Far memory 122 may be used as instruction and data storage that is directly addressable by a processor 100 and is able to sufficiently keep pace with the processor 100 in contrast to FLASH/magnetic disk/optical disc applied as mass storage. Moreover, as discussed above and described in detail below, far memory 122 may be placed on a memory bus and may communicate directly with a memory controller that, in turn, communicates directly with the processor 100.
[0040] Far memory 122 may be combined with other instruction and data storage technologies (e.g., DRAM) to form hybrid memories (also known as Co-locating PCM and DRAM; first level memory and second level memory; FLAM (FLASH and DRAM)). Note that at least some of the above
technologies, including PCM/PCMS may be used for mass storage instead of, or in addition to, system memory, and need not be random accessible, byte addressable or directly addressable by the processor when applied in this manner.
[0041] For convenience of explanation, most of the remainder of the application will refer to "NVRAM" or, more specifically, "PCM," or "PCMS" as the technology selection for the far memory 122. As such, the terms
NVRAM, PCM, PCMS, and far memory may be used interchangeably in the following discussion. However it should be realized, as discussed above, that different technologies may also be utilized for far memory. Also, that NVRAM is not limited for use as far memory.
C. Near Memory
[0042] "Near memory" 121 is an intermediate level of memory configured in front of a far memory 122 that has lower read/write access latency relative to far memory and/or more symmetric read/write access latency (i.e., having read times which are roughly equivalent to write times). In some
embodiments, the near memory 121 has significantly lower write latency than the far memory 122 but similar (e.g., slightly lower or equal) read latency; for instance the near memory 121 may be a volatile memory such as volatile random access memory (VRAM) and may comprise a DRAM or other high speed capacitor-based memory. Note, however, that the underlying principles of the invention are not limited to these specific memory types. Additionally, the near memory 121 may have a relatively lower density and/or may be more expensive to manufacture than the far memory 122.
[0043] In one embodiment, near memory 121 is configured between the far memory 122 and the internal processor caches 120. In some of the embodiments described below, near memory 121 is configured as one or more memory-side caches (MSCs) 107-109 to mask the performance and/or usage limitations of the far memory including, for example, read/write latency limitations and memory degradation limitations. In these
implementations, the combination of the MSC 107-109 and far memory 122 operates at a performance level which approximates, is equivalent or exceeds a system which uses only DRAM as system memory. As
discussed in detail below, although shown as a "cache" in Figure 1 , the near memory 121 may include modes in which it performs other roles, either in addition to, or in lieu of, performing the role of a cache.
[0044] Near memory 121 can be located on the processor die (as cache(s) 106) and/or located external to the processor die (as caches 107- 109) (e.g., on a separate die located on the CPU package, located outside the CPU package with a high bandwidth link to the CPU package, for example, on a memory dual in-line memory module (DIMM), a
riser/mezzanine, or a computer motherboard). The near memory 121 may be coupled in communicate with the processor 100 using a single or multiple high bandwidth links, such as DDR or other transactional high bandwidth links (as described in detail below).
AN EXEMPLARY SYSTEM MEMORY ALLOCATION SCHEME
[0045] Figure 1 illustrates how various levels of caches 101 -109 are configured with respect to a system physical address (SPA) space 1 16-1 19 in embodiments of the invention. As mentioned, this embodiment comprises a processor 100 having one or more cores 101 -104, with each core having its own dedicated upper level cache (L0) 101 a-104a and mid-level cache (MLC) (L1 ) cache 101 b-104b. The processor 100 also includes a shared LLC 105. The operation of these various cache levels are well understood and will not be described in detail here.
[0046] The caches 107-109 illustrated in Figure 1 may be dedicated to a particular system memory address range or a set of non-contiguous address ranges. For example, cache 107 is dedicated to acting as an MSC for system memory address range # 1 1 16 and caches 108 and 109 are dedicated to acting as MSCs for non-overlapping portions of system memory address ranges # 2 1 17 and # 3 1 18. The latter implementation may be used for systems in which the SPA space used by the processor 100 is interleaved into an address space used by the caches 107-109 (e.g., when configured as MSCs). In some embodiments, this latter address space is referred to as a memory channel address (MCA) space. In one embodiment, the internal caches 101 a-106 perform caching operations for the entire SPA space.
[0047] System memory as used herein is memory which is visible to and/or directly addressable by software executed on the processor 100;
while the cache memories 101 a-109 may operate transparently to the software in the sense that they do not form a directly-addressable portion of the system address space, but the cores may also support execution of instructions to allow software to provide some control (configuration, policies, hints, etc.) to some or all of the cache(s). The subdivision of system memory into regions 1 16-1 19 may be performed manually as part of a system configuration process (e.g., by a system designer) and/or may be performed automatically by software.
[0048] In one embodiment, the system memory regions 1 16-1 19 are implemented using far memory (e.g., PCM) and, in some embodiments, near memory configured as system memory. System memory address range # 4 represents an address range which is implemented using a higher speed memory such as DRAM which may be a near memory configured in a system memory mode (as opposed to a caching mode).
[0049] Figure 2 illustrates a memory/storage hierarchy 140 and different configurable modes of operation for near memory 144 and NVRAM
according to embodiments of the invention. The memory/storage hierarchy
140 has multiple levels including (1 ) a cache level 150 which may include processor caches 150A (e.g., caches 101 A-105 in Figure 1 ) and optionally near memory as cache for far memory 150B (in certain modes of operation as described herein), (2) a system memory level 151 which may include far memory 151 B (e.g., NVRAM such as PCM) when near memory is present
(or just NVRAM as system memory 174 when near memory is not present), and optionally near memory operating as system memory 151 A (in certain modes of operation as described herein), (3) a mass storage level 152 which may include a flash/magnetic/optical mass storage 152B and/or NVRAM mass storage 152A (e.g., a portion of the NVRAM 142); and (4) a firmware memory level 153 that may include BIOS flash 170 and/or BIOS NVRAM 172 and optionally trusted platform module (TPM) NVRAM 173.
[0050] As indicated, near memory 144 may be implemented to operate in a variety of different modes including: a first mode in which it operates as a cache for far memory (near memory as cache for FM 150B); a second mode in which it operates as system memory 151 A and occupies a portion of the SPA space (sometimes referred to as near memory "direct access" mode); and one or more additional modes of operation such as a scratchpad memory 192 or as a write buffer 193. In some embodiments of the invention, the near memory is partitionable, where each partition may concurrently operate in a different one of the supported modes; and different
embodiments may support configuration of the partitions (e.g., sizes, modes) by hardware (e.g., fuses, pins), firmware, and/or software (e.g., through a set of programmable range registers within the MSC controller 124 within which, for example, may be stored different binary codes to identify each mode and partition).
[0051] System address space A 190 in Figure 2 is used to illustrate operation when near memory is configured as a MSC for far memory 150B. In this configuration, system address space A 190 represents the entire system address space (and system address space B 191 does not exist). Alternatively, system address space B 191 is used to show an
implementation when all or a portion of near memory is assigned a portion of the system address space. In this embodiment, system address space B 191 represents the range of the system address space assigned to the near memory 151 A and system address space A 190 represents the range of the system address space assigned to NVRAM 174. [0052] In addition, when acting as a cache for far memory 150B, the near memory 144 may operate in various sub-modes under the control of the MSC controller 124. In each of these modes, the near memory address space (NMA) is transparent to software in the sense that the near memory does not form a directly-addressable portion of the system address space. These modes include but are not limited to the following:
(1 ) Write-Back Caching Mode: In this mode, all or portions of the near memory acting as a FM cache 150B is used as a cache for the NVRAM far memory (FM) 151 B. While in write-back mode, every write operation is directed initially to the near memory as cache for FM 150B (assuming that the cache line to which the write is directed is present in the cache). A corresponding write operation is performed to update the NVRAM FM 151 B only when the cache line within the near memory as cache for FM 150B is to be replaced by another cache line (in contrast to write-through mode described below in which each write operation is immediately propagated to the NVRAM FM 151 B).
(2) Near Memory Bypass Mode: In this mode all reads and writes bypass the NM acting as a FM cache 150B and go directly to the NVRAM FM 151 B. Such a mode may be used, for example, when an application is not cache friendly or requires data to be committed to persistence at the granularity of a cache line. In one embodiment, the caching performed by the processor caches 150A and the NM acting as a FM cache 150B operate independently of one another. Consequently, data may be cached in the NM acting as a FM cache 150B which is not cached in the processor caches 150A (and which, in some cases, may not be permitted to be cached in the processor caches 150A) and vice versa. Thus, certain data which may be designated as "uncacheable" in the processor caches may be cached within the NM acting as a FM cache 150B.
(3) Near Memory Read-Cache Write Bypass Mode: This is a variation of the above mode where read caching of the persistent data from NVRAM
FM 151 B is allowed (i.e., the persistent data is cached in the near memory as cache for far memory 150B for read-only operations). This is useful when most of the persistent data is "Read-Only" and the application usage is cache-friendly.
(4) Near Memory Read-Cache Write-Through Mode: This is a variation of the near memory read-cache write bypass mode, where in addition to read caching, write-hits are also cached. Every write to the near memory as cache for FM 150B causes a write to the FM 151 B. Thus, due to the write-through nature of the cache, cache-line persistence is still guaranteed.
[0053] When acting in near memory direct access mode, all or portions of the near memory as system memory 151 A are directly visible to software and form part of the SPA space. Such memory may be completely under software control. Such a scheme may create a non-uniform memory address (NUMA) memory domain for software where it gets higher
performance from near memory 144 relative to NVRAM system memory 174. By way of example, and not limitation, such a usage may be employed for certain high performance computing (HPC) and graphics applications which require very fast access to certain data structures.
[0054] In an alternate embodiment, the near memory direct access mode is implemented by "pinning" certain cache lines in near memory (i.e., cache lines which have data that is also concurrently stored in NVRAM 142). Such pinning may be done effectively in larger, multi-way, set-associative caches.
[0055] Figure 2 also illustrates that a portion of the NVRAM 142 may be used as firmware memory. For example, the BIOS NVRAM 172 portion may be used to store BIOS images (instead of or in addition to storing the BIOS information in BIOS flash 170). The BIOS NVRAM portion 172 may be a portion of the SPA space and is directly addressable by software executed on the processor cores 101 -104, whereas the BIOS flash 170 is addressable through the I/O subsystem 1 15. As another example, a trusted platform module (TPM) NVRAM 173 portion may be used to protect sensitive system information (e.g., encryption keys).
[0056] Thus, as indicated, the NVRAM 142 may be implemented to operate in a variety of different modes, including as far memory 151 B (e.g., when near memory 144 is present/operating, whether the near memory is acting as a cache for the FM via a MSC control 124 or not (accessed directly after cache(s) 101 A - 105 and without MSC control 124)); just NVRAM system memory 174 (not as far memory because there is no near memory present/operating; and accessed without MSC control 124); NVRAM mass storage 152A; BIOS NVRAM 172; and TPM NVRAM 173. While different embodiments may specify the NVRAM modes in different ways, Figure 3 describes the use of a decode table 333.
[0057] Figure 3 illustrates an exemplary computer system 300 on which embodiments of the invention may be implemented. The computer system 300 includes a processor 310 and memory/storage subsystem 380 with a NVRAM 142 used for both system memory, mass storage, and optionally firmware memory. In one embodiment, the NVRAM 142 comprises the entire system memory and storage hierarchy used by computer system 300 for storing data, instructions, states, and other persistent and non-persistent information. As previously discussed, NVRAM 142 can be configured to implement the roles in a typical memory and storage hierarchy of system memory, mass storage, and firmware memory, TPM memory, and the like. In the embodiment of Figures 3, NVRAM 142 is partitioned into FM 151 B, NVRAM mass storage 152A, BIOS NVRAM 173, and TMP NVRAM 173. Storage hierarchies with different roles are also contemplated and the application of NVRAM 142 is not limited to the roles described above.
[0058] By way of example, operation while the near memory as cache for
FM 150B is in the write-back caching is described. In one embodiment, while the near memory as cache for FM 150B is in the write-back caching mode mentioned above, a read operation will first arrive at the MSC controller 124 which will perform a look-up to determine if the requested data is present in the near memory acting as a cache for FM 150B (e.g., utilizing a tag cache 342). If present, it will return the data to the requesting CPU, core 101 -104 or I/O device through I/O subsystem 1 15. If the data is not present, the MSC controller 124 will send the request along with the system memory address to an NVRAM controller 332. The NVRAM controller 332 will use the decode table 333 to translate the system memory address to an NVRAM physical device address (PDA) and direct the read operation to this region of the far memory 151 B. In one embodiment, the decode table 333 includes an address indirection table (AIT) component which the NVRAM controller 332 uses to translate between system memory addresses and NVRAM PDAs. In one embodiment, the AIT is updated as part of the wear leveling algorithm implemented to distribute memory access operations and thereby reduce wear on the NVRAM FM 151 B. Alternatively, the AIT may be a separate table stored within the NVRAM controller 332.
[0059] Upon receiving the requested data from the NVRAM FM 151 B, the NVRAM controller 332 will return the requested data to the MSC controller 124 which will store the data in the MSC near memory acting as an FM cache 150B and also send the data to the requesting processor core 101 -104, or I/O Device through I/O subsystem 1 15. Subsequent requests for this data may be serviced directly from the near memory acting as a FM cache 150B until it is replaced by some other NVRAM FM data.
[0060] As mentioned, in one embodiment, a memory write operation also first goes to the MSC controller 124 which writes it into the MSC near memory acting as a FM cache 150B. In write-back caching mode, the data may not be sent directly to the NVRAM FM 151 B when a write operation is received. For example, the data may be sent to the NVRAM FM 151 B only when the location in the MSC near memory acting as a FM cache 150B in which the data is stored must be re-used for storing data for a different system memory address. When this happens, the MSC controller 124 notices that the data is not current in NVRAM FM 151 B and will thus retrieve it from near memory acting as a FM cache 150B and send it to the NVRAM controller 332. The NVRAM controller 332 looks up the PDA for the system memory address and then writes the data to the NVRAM FM 151 B.
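The write-back behavior described above can be sketched as follows; the cache geometry, the direct-mapped indexing, and the nvram_write stand-in are assumptions for illustration only.

```c
#include <stdint.h>
#include <stdbool.h>
#include <string.h>

#define LINES 1024  /* assumed near-memory cache size, illustrative only */

typedef struct { uint64_t spa; bool valid, dirty; uint8_t data[64]; } nm_line_t;
static nm_line_t nm[LINES];

/* Stand-in for the NVRAM controller's PDA lookup and far-memory write. */
static void nvram_write(uint64_t spa, const uint8_t *data) { (void)spa; (void)data; }

/* Write-back caching: the write lands in near memory; far memory is only
 * updated when the line must be re-used for a different system address. */
void msc_write(uint64_t spa, const uint8_t *data)
{
    nm_line_t *l = &nm[spa % LINES];
    if (l->valid && l->spa != spa && l->dirty)
        nvram_write(l->spa, l->data);   /* evict: data not current in FM */
    l->spa = spa; l->valid = true; l->dirty = true;
    memcpy(l->data, data, 64);
}
```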
[0061] In Figure 3, the NVRAM controller 332 is shown connected to the FM 151 B, NVRAM mass storage 152A, and BIOS NVRAM 172 using three separate lines. This does not necessarily mean, however, that there are three separate physical buses or communication channels connecting the NVRAM controller 332 to these portions of the NVRAM 142. Rather, in some embodiments, a common memory bus or other type of bus (such as those described below with respect to Figures 4A-M) is used to
communicatively couple the NVRAM controller 332 to the FM 151 B, NVRAM mass storage 152A, and BIOS NVRAM 172. For example, in one embodiment, the three lines in Figure 3 represent a bus, such as a memory bus (e.g., a DDR3, DDR4, etc, bus), over which the NVRAM controller 332 implements a transactional protocol to communicate with the NVRAM 142. The NVRAM controller 332 may also communicate with the NVRAM 142 over a bus supporting a native transactional protocol such as a PCI express bus, desktop management interface (DMI) bus, or any other type of bus utilizing a transactional protocol and a small enough transaction payload size (e.g., cache line size such as 64 or 128 byte).
[0062] In one embodiment, computer system 300 includes integrated memory controller (IMC) 331 which performs the central memory access control for processor 310, which is coupled to: 1 ) a memory-side cache (MSC) controller 124 to control access to near memory (NM) acting as a far memory cache 150B; and 2) a NVRAM controller 332 to control access to NVRAM 142. Although illustrated as separate units in Figure 3, the MSC controller 124 and NVRAM controller 332 may logically form part of the IMC 331 .
[0063] In the illustrated embodiment, the MSC controller 124 includes a set of range registers 336 which specify the mode of operation in use for the
NM acting as a far memory cache 150B (e.g., write-back caching mode, near memory bypass mode, etc, described above). In the illustrated embodiment, DRAM 144 is used as the memory technology for the NM acting as cache for far memory 150B. In response to a memory access request, the MSC controller 124 may determine (depending on the mode of operation specified in the range registers 336) whether the request can be serviced from the NM acting as cache for FM 150B or whether the request must be sent to the NVRAM controller 332, which may then service the request from the far memory (FM) portion 151 B of the NVRAM 142.
[0064] In an embodiment where NVRAM 142 is implemented with PCMS, NVRAM controller 332 is a PCMS controller that performs access with protocols consistent with the PCMS technology. As previously discussed, the PCMS memory is inherently capable of being accessed at the
granularity of a byte. Nonetheless, the NVRAM controller 332 may access a PCMS-based far memory 151B at a lower level of granularity such as a cache line (e.g., a 64-byte or 128-byte cache line) or any other level of granularity consistent with the memory subsystem. The underlying principles of the invention are not limited to any particular level of granularity for accessing a PCMS-based far memory 151B. In general, however, when PCMS-based far memory 151B is used to form part of the system address space, the level of granularity will be finer than that traditionally used for other non-volatile storage technologies such as FLASH, which can only perform rewrite and erase operations at the level of a "block" (minimally 64 Kbytes in size for NOR FLASH and 16 Kbytes for NAND FLASH).
[0065] In the illustrated embodiment, NVRAM controller 332 can read configuration data to establish the previously described modes, sizes, etc. for the NVRAM 142 from decode table 333, or alternatively, can rely on the decoding results passed from IMC 331 and I/O subsystem 315. For example, at either manufacturing time or in the field, computer system 300 can program decode table 333 to mark different regions of NVRAM 142 as system memory, mass storage exposed via SATA interfaces, mass storage exposed via USB Bulk Only Transport (BOT) interfaces, encrypted storage that supports TPM storage, among others. Access is steered to the different partitions of NVRAM device 142 via decode logic. For example, in one embodiment, the address range of each partition is defined in the decode table 333. In one embodiment, when IMC 331 receives an access request, the target address of the request is decoded to reveal whether the request is directed toward memory, NVRAM mass storage, or I/O. If it is a memory request, IMC 331 and/or the MSC controller 124 further determines from the target address whether the request is directed to NM as cache for FM 150B or to FM 151B. For FM 151B access, the request is forwarded to NVRAM controller 332. IMC 331 passes the request to the I/O subsystem 115 if this request is directed to I/O (e.g., non-storage and storage I/O devices). I/O subsystem 115 further decodes the address to determine whether the address points to NVRAM mass storage 152A, BIOS NVRAM 172, or other non-storage or storage I/O devices. If this address points to NVRAM mass storage 152A or BIOS NVRAM 172, I/O subsystem 115 forwards the request to NVRAM controller 332. If this address points to TPM NVRAM 173, I/O subsystem 115 passes the request to TPM 334 to perform secured access.
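A hedged software model of such decode logic is shown below; the partition ranges and target names are invented for the example and do not correspond to any actual decode table 333 contents.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative decode table: each partition of the NVRAM is defined by an
 * address range and a target. Ranges and target names are assumptions. */
typedef enum { TGT_FM, TGT_NVRAM_STORAGE, TGT_BIOS_NVRAM,
               TGT_TPM_NVRAM, TGT_OTHER_IO } target_t;

typedef struct { uint64_t base, limit; target_t target; } decode_entry_t;

static const decode_entry_t decode_table[] = {
    { 0x0000000000ULL, 0x003FFFFFFFULL, TGT_FM            }, /* system memory (FM) */
    { 0x0040000000ULL, 0x00FFFFFFFFULL, TGT_NVRAM_STORAGE }, /* mass storage */
    { 0x0100000000ULL, 0x0100FFFFFFULL, TGT_BIOS_NVRAM    },
    { 0x0101000000ULL, 0x01010FFFFFULL, TGT_TPM_NVRAM     },
};

target_t decode(uint64_t addr)
{
    for (size_t i = 0; i < sizeof(decode_table)/sizeof(decode_table[0]); i++)
        if (addr >= decode_table[i].base && addr <= decode_table[i].limit)
            return decode_table[i].target;
    return TGT_OTHER_IO;   /* anything else is passed to the I/O subsystem */
}

int main(void)
{
    printf("target = %d\n", decode(0x0040001000ULL)); /* NVRAM mass storage */
    return 0;
}
```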
[0066] In one embodiment, each request forwarded to NVRAM controller 332 is accompanied with an attribute (also known as a "transaction type") to indicate the type of access. In one embodiment, NVRAM controller 332 may emulate the access protocol for the requested access type, such that the rest of the platform remains unaware of the multiple roles performed by NVRAM 142 in the memory and storage hierarchy. In alternative
embodiments, NVRAM controller 332 may perform memory access to NVRAM 142 regardless of which transaction type it is. It is understood that the decode path can be different from what is described above. For example, IMC 331 may decode the target address of an access request and determine whether it is directed to NVRAM 142. If it is directed to NVRAM 142, IMC 331 generates an attribute according to decode table 333. Based on the attribute, IMC 331 then forwards the request to appropriate
downstream logic (e.g., NVRAM controller 332 and I/O subsystem 315) to perform the requested data access. In yet another embodiment, NVRAM controller 332 may decode the target address if the corresponding attribute is not passed on from the upstream logic (e.g., IMC 331 and I/O subsystem 315). Other decode paths may also be implemented.
[0067] The presence of a new memory architecture such as described herein provides for a wealth of new possibilities. Although discussed at much greater length further below, some of these possibilities are quickly highlighted immediately below.
[0068] According to one possible implementation, NVRAM 142 acts as a total replacement or supplement for traditional DRAM technology in system memory. In one embodiment, NVRAM 142 represents the introduction of a second-level system memory (e.g., the system memory may be viewed as having a first level system memory comprising near memory as cache 150B (part of the DRAM device 340) and a second level system memory comprising far memory (FM) 151B (part of the NVRAM 142)).
[0069] According to some embodiments, NVRAM 142 acts as a total replacement or supplement for the flash/magnetic/optical mass storage 152B. As previously described, in some embodiments, even though the NVRAM mass storage 152A is capable of byte-level addressability, NVRAM controller 332 may still access NVRAM mass storage 152A in blocks of multiple bytes, depending on the implementation (e.g., 64 Kbytes, 128 Kbytes, etc.). The specific manner in which data is accessed from NVRAM mass storage 152A by NVRAM controller 332 may be transparent to software executed by the processor 310. For example, even though NVRAM mass storage 152A may be accessed differently from flash/magnetic/optical mass storage 152B, the operating system may still view NVRAM mass storage 152A as a standard mass storage device (e.g., a serial ATA hard drive or other standard form of mass storage device).
[0070] In an embodiment where NVRAM mass storage 152A acts as a total replacement for the flash/magnetic/optical mass storage 152B, it is not necessary to use storage drivers for block-addressable storage access. The removal of storage driver overhead from storage access can increase access speed and save power. In alternative embodiments where it is desired that NVRAM mass storage 152A appears to the OS and/or applications as block-accessible and indistinguishable from
flash/magnetic/optical mass storage 152B, emulated storage drivers can be used to expose block-accessible interfaces (e.g., Universal Serial Bus (USB) Bulk-Only Transfer (BOT), 1 .0; Serial Advanced Technology Attachment (SATA), 3.0; and the like) to the software for accessing NVRAM mass storage 152A.
[0071] In one embodiment, NVRAM 142 acts as a total replacement or supplement for firmware memory such as BIOS flash 362 and TPM flash 372 (illustrated with dotted lines in Figure 3 to indicate that they are optional). For example, the NVRAM 142 may include a BIOS NVRAM 172 portion to supplement or replace the BIOS flash 362 and may include a TPM NVRAM 173 portion to supplement or replace the TPM flash 372. Firmware memory can also store system persistent states used by a TPM 334 to protect sensitive system information (e.g., encryption keys). In one embodiment, the use of NVRAM 142 for firmware memory removes the need for third party flash parts to store code and data that are critical to the system operations.
[0072] Continuing then with a discussion of the system of Figure 3, in some embodiments, the architecture of computer system 100 may include multiple processors, although a single processor 31 0 is illustrated in Figure 3 for simplicity. Processor 310 may be any type of data processor including a general purpose or special purpose central processing unit (CPU), an application-specific integrated circuit (ASIC) or a digital signal processor (DSP). For example, processor 310 may be a general-purpose processor, such as a Core™ i3, i5, i7, 2 Duo and Quad, Xeon™, or Itanium™
processor, all of which are available from Intel Corporation, of Santa Clara, Calif. Alternatively, processor 310 may be from another company, such as ARM Holdings, Ltd, of Sunnyvale, CA, MIPS Technologies of Sunnyvale, CA, etc. Processor 310 may be a special-purpose processor, such as, for example, a network or communication processor, compression engine, graphics processor, co-processor, embedded processor, or the like.
Processor 310 may be implemented on one or more chips included within one or more packages. Processor 310 may be a part of and/or may be implemented on one or more substrates using any of a number of process technologies, such as, for example, BiCMOS, CMOS, or NMOS. In the embodiment shown in Figure 3, processor 310 has a system-on-a-chip (SOC) configuration.
[0073] In one embodiment, the processor 310 includes an integrated graphics unit 31 1 which includes logic for executing graphics commands such as 3D or 2D graphics commands. While the embodiments of the invention are not limited to any particular integrated graphics unit 31 1 , in one embodiment, the graphics unit 31 1 is capable of executing industry standard graphics commands such as those specified by the Open GL and/or Direct X application programming interfaces (APIs) (e.g., OpenGL 4.1 and Direct X 1 1 ).
[0074] The processor 310 may also include one or more cores 101 -104, although a single core is illustrated in Figure 3, again, for the sake of clarity. In many embodiments, the core(s) 101 -104 includes internal functional blocks such as one or more execution units, retirement units, a set of general purpose and specific registers, etc. If the core(s) are multi-threaded or hyper-threaded, then each hardware thread may be considered as a "logical" core as well. The cores 101 -104 may be homogenous or
heterogeneous in terms of architecture and/or instruction set. For example, some of the cores may be in order while others are out-of-order. As another example, two or more of the cores may be capable of executing the same instruction set, while others may be capable of executing only a subset of that instruction set or a different instruction set. [0075] The processor 310 may also include one or more caches, such as cache 313 which may be implemented as a SRAM and/or a DRAM. In many embodiments that are not shown, additional caches other than cache 313 are implemented so that multiple levels of cache exist between the execution units in the core(s) 101 -104 and memory devices 150B and 151 B. For example, the set of shared cache units may include an upper-level cache, such as a level 1 (L1 ) cache, mid-level caches, such as level 2 (L2), level 3 (L3), level 4 (L4), or other levels of cache, an (LLC), and/or different combinations thereof. In different embodiments, cache 313 may be apportioned in different ways and may be one of many different sizes in different embodiments. For example, cache 313 may be an 8 megabyte (MB) cache, a 16 MB cache, etc. Additionally, in different embodiments the cache may be a direct mapped cache, a fully associative cache, a multi-way set-associative cache, or a cache with another type of mapping. In other embodiments that include multiple cores, cache 313 may include one large portion shared among all cores or may be divided into several separately functional slices (e.g., one slice for each core). Cache 313 may also include one portion shared among all cores and several other portions that are separate functional slices per core.
[0076] The processor 310 may also include a home agent 314 which includes those components coordinating and operating core(s) 101 -104. The home agent unit 314 may include, for example, a power control unit (PCU) and a display unit. The PCU may be or include logic and
components needed for regulating the power state of the core(s) 101 -104 and the integrated graphics unit 31 1 . The display unit is for driving one or more externally connected displays.
[0077] As mentioned, in some embodiments, processor 310 includes an integrated memory controller (IMC) 331, near memory cache (MSC) controller 124, and NVRAM controller 332, all of which can be on the same chip as processor 310, or on a separate chip and/or package connected to processor 310. DRAM device 144 may be on the same chip as, or a different chip from, the IMC 331 and MSC controller 124; thus, one chip may have both processor 310 and DRAM device 144; one chip may have the processor 310 and another the DRAM device 144 (these chips may be in the same or different packages); one chip may have the core(s) 101-104 and another the IMC 331, MSC controller 124 and DRAM 144 (these chips may be in the same or different packages); one chip may have the core(s) 101-104, another the IMC 331 and MSC controller 124, and another the DRAM 144 (these chips may be in the same or different packages); etc.
[0078] In some embodiments, processor 310 includes an I/O subsystem 1 15 coupled to IMC 331 . I/O subsystem 1 15 enables communication between processor 310 and the following serial or parallel I/O devices: one or more networks 336 (such as a Local Area Network, Wide Area Network or the Internet), storage I/O device (such as flash/magnetic/optical mass storage 152B, BIOS flash 362, TPM flash 372) and one or more non-storage I/O devices 337 (such as display, keyboard, speaker, and the like). I/O subsystem 1 15 may include a platform controller hub (PCH) (not shown) that further includes several I/O adapters 338 and other I/O circuitry to provide access to the storage and non-storage I/O devices and networks. To accomplish this, I/O subsystem 1 15 may have at least one integrated I/O adapter 338 for each I/O protocol utilized. I/O subsystem 1 15 can be on the same chip as processor 310, or on a separate chip and/or package connected to processor 310.
[0079] I/O adapters 338 translate a host communication protocol utilized within the processor 310 to a protocol compatible with particular I/O devices. For flash/magnetic/optical mass storage 152B, some of the protocols that I/O adapters 338 may translate include Peripheral Component Interconnect (PCI)-Express (PCI-E), 3.0; USB, 3.0; SATA, 3.0; Small Computer System Interface (SCSI), Ultra-640; and Institute of Electrical and Electronics
Engineers (IEEE) 1394 "Firewire;" among others. For BIOS flash 362, some of the protocols that I/O adapters 338 may translate include Serial
Peripheral Interface (SPI) and Microwire, among others. Additionally, there may be one or more wireless protocol I/O adapters. Examples of wireless protocols, among others, include those used in personal area networks, such as IEEE 802.15 and Bluetooth 4.0; wireless local area networks, such as IEEE 802.11-based wireless protocols; and cellular protocols.
[0080] In some embodiments, the I/O subsystem 115 is coupled to a TPM control 334 to control access to system persistent states, such as secure data, encryption keys, platform configuration information and the like. In one embodiment, these system persistent states are stored in a TPM NVRAM 173 and accessed via NVRAM controller 332.
[0081] In one embodiment, TPM 334 is a secure micro-controller with cryptographic functionalities. TPM 334 has a number of trust-related capabilities; e.g., a SEAL capability for ensuring that data protected by a TPM is only available for the same TPM. TPM 334 can protect data and keys (e.g., secrets) using its encryption capabilities. In one embodiment, TPM 334 has a unique and secret RSA key, which allows it to authenticate hardware devices and platforms. For example, TPM 334 can verify that a system seeking access to data stored in computer system 300 is the expected system. TPM 334 is also capable of reporting the integrity of the platform (e.g., computer system 300). This allows an external resource (e.g., a server on a network) to determine the trustworthiness of the platform but does not prevent access to the platform by the user.
[0082] In some embodiments, I/O subsystem 315 also includes a
Management Engine (ME) 335, which is a microprocessor that allows a system administrator to monitor, maintain, update, upgrade, and repair computer system 300. In one embodiment, a system administrator can remotely configure computer system 300 by editing the contents of the decode table 333 through ME 335 via networks 336.
[0083] For convenience of explanation, the remainder of the application sometimes refers to NVRAM 142 as a PCMS device. A PCMS device includes multi-layered (vertically stacked) PCM cell arrays that are non- volatile, have low power consumption, and are modifiable at the bit level. As such, the terms NVRAM device and PCMS device may be used
interchangeably in the following discussion. However it should be realized, as discussed above, that different technologies besides PCMS may also be utilized for NVRAM 142.
[0084] It should be understood that a computer system can utilize
NVRAM 142 for system memory, mass storage, firmware memory and/or other memory and storage purposes even if the processor of that computer system does not have all of the above-described components of processor 310, or has more components than processor 310.
[0085] In the particular embodiment shown in Figure 3, the MSC controller 124 and NVRAM controller 332 are located on the same die or package (referred to as the CPU package) as the processor 310. In other embodiments, the MSC controller 124 and/or NVRAM controller 332 may be located off-die or off-CPU package, coupled to the processor 310 or CPU package over a bus such as a memory bus (like a DDR bus (e.g., a DDR3, DDR4, etc)), a PCI express bus, a desktop management interface (DMI) bus, or any other type of bus.
EXEMPLARY PCM Bus AND PACKAGING CONFIGURATIONS
[0086] Figures 4A-M illustrate a variety of different deployments in which the processor, near memory and far memory are configured and packaged in different ways. In particular, the series of platform memory configurations illustrated in Figures 4A-M enable the use of new nonvolatile system memory such as PCM technologies or, more specifically, PCMS technologies.
[0087] While some of the same numerical designations are used across multiple figures in Figures 4A-M, this does not necessarily mean that the structures identified by those numerical designations are always identical. For example, while the same numbers are used to identify an integrated memory controller (IMC) 331 and CPU 401 in several figures, these components may be implemented differently in different figures.
Some of these differences are not highlighted because they are not pertinent to understanding the underlying principles of the invention.
[0088] While several different system platform configuration approaches are described below, these approaches fall into two broad categories: split architecture, and unified architecture. Briefly, in the split architecture scheme, a memory side cache (MSC) controller (e.g., located in the processor die or on a separate die in the CPU package) intercepts all system memory requests. There are two separate interfaces that "flow downstream" from that controller that exit the CPU package to couple to the Near Memory and Far Memory. Each interface is tailored for the specific type of memory and each memory can be scaled independently in terms of performance and capacity.
[0089] In the unified architecture scheme a single memory interface exits the processor die or CPU package and all memory requests are sent to this interface. The MSC controller along with the Near and Far Memory
subsystems are consolidated on this single interface. This memory interface must be tailored to meet the memory performance requirements of the processor and must support a transactional, out-of-order protocol at least because PCMS devices may not process read requests in order. In accordance with the above general categories, the following specific platform configurations may be employed.
[0090] The embodiments described below include various types of buses/channels. The terms "bus" and "channel" are used synonymously herein. The number of memory channels per DIMM socket will depend on the particular CPU package used in the computer system (with some CPU packages supporting, for example, three memory channels per socket).
[0091] Additionally, in the embodiments described below which use DRAM, virtually any type of DRAM memory channels may be used including, by way of example and not limitation, DDR channels (e.g., DDR3, DDR4, DDR5, etc). Thus, while DDR is advantageous because of its wide acceptance in the industry, resulting price point, etc., the underlying principles of the invention are not limited to any particular type of DRAM or volatile memory.
[0092] Figure 4A illustrates one embodiment of a split architecture which includes one or more DRAM devices 403-406 operating as near memory acting as cache for FM (i.e., MSC) in the CPU package 401 (either on the processor die or on a separate die) and one or more NVRAM devices such as PCM memory residing on DIMMs 450-451 acting as far memory. High bandwidth links 407 on the CPU package 401 interconnect a single or multiple DRAM devices 403-406 to the processor 310 which hosts the integrated memory controller (IMC) 331 and MSC controller 124. Although illustrated as separate units in Figures 4A and other figures described below, the MSC controller 124 may be integrated within the memory controller 331 in one embodiment.
[0093] The DIMMs 450-451 use DDR slots and electrical connections defining a DDR channels 440 with DDR address, data and control lines and voltages (e.g., the DDR3 or DDR4 standard as defined by the Joint Electron Devices Engineering Council (JEDEC)). The PCM devices on the DIMMs 450-451 provide the far memory capacity of this split architecture, with the DDR channels 440 to the CPU package 401 able to carry both DDR and transactional protocols. In contrast to DDR protocols in which the processor 310 or other logic within the CPU package (e.g., the IMC 331 or MSC controller 124) transmits a command and receives an immediate response, the transactional protocol used to communicate with PCM devices allows the CPU 401 to issue a series of transactions, each identified by a unique transaction ID. The commands are serviced by a PCM controller on the recipient one of the PCM DIMMs, which sends responses back to the CPU package 401 , potentially out of order. The processor 310 or other logic within the CPU package 401 identifies each transaction response by its transaction ID, which is sent with the response. The above configuration allows the system to support both standard DDR DRAM-based DIMMs (using DDR protocols over DDR electrical connections) and PCM-based DIMMs configurations (using transactional protocols over the same DDR electrical connections).
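The following sketch models the host-side bookkeeping that such a transactional protocol implies, with responses matched by transaction ID rather than by arrival order; the credit count and structure fields are assumptions made for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

#define MAX_OUTSTANDING 32   /* assumed number of outstanding transactions */

typedef struct { uint16_t id; uint64_t addr; bool pending; } txn_t;
static txn_t outstanding[MAX_OUTSTANDING];
static uint16_t next_id;

/* Issue a read with a unique transaction ID placed on the channel. */
uint16_t issue_read(uint64_t addr)
{
    for (int i = 0; i < MAX_OUTSTANDING; i++) {
        if (!outstanding[i].pending) {
            outstanding[i] = (txn_t){ .id = next_id++, .addr = addr, .pending = true };
            /* ... place command + transaction ID on the DDR channel ... */
            return outstanding[i].id;
        }
    }
    return 0xFFFF;   /* no slot available */
}

/* A response carries its transaction ID, so it can arrive in any order. */
void on_response(uint16_t id, const uint8_t *data)
{
    (void)data;
    for (int i = 0; i < MAX_OUTSTANDING; i++)
        if (outstanding[i].pending && outstanding[i].id == id) {
            outstanding[i].pending = false;   /* match by ID, not by order */
            return;
        }
}

int main(void) { uint16_t id = issue_read(0x1000); on_response(id, 0); return 0; }
```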
[0094] Figure 4B illustrates a split architecture which uses DDR DRAM- based DIMMs 452 coupled over DDR channels 440 to form near memory which acts as an MSC. The processor 310 hosts the memory controller 331 and MSC controller 124. NVRAM devices such as PCM memory devices reside on PCM-based DIMMs 453 that use DDR slots and electrical connections on additional DDR channels 442 off the CPU package 401 . The PCM-based DIMMs 453 provide the far memory capacity of this split architecture, with the DDR channels 442 to the CPU package 401 being based on DDR electrical connections and able to carry both DDR and transactional protocols. This allows the system to be configured with varying numbers of DDR DRAM DIMMs 452 (e.g., DDR4 DIMMS) and PCM DIMMs 453 to achieve the desired capacity and/or performance points.
[0095] Figure 4C illustrates a split architecture which hosts the near memory 403-406 acting as a memory side cache (MSC) on the CPU package 401 (either on the processor die or on a separate die). High bandwidth links 407 on the CPU package are used to interconnect a single or multiple DRAM devices 403-406 to the processor 310 which hosts the memory controller 331 and the MSC controller 124, as defined by the split architecture. NVRAM such as PCM memory devices reside on PCI Express cards or risers 455 that use PCI Express electrical connections and PCI Express protocol or a different transactional protocol over the PCI Express bus 454. The PCM devices on the PCI Express cards or risers 455 provide the far memory capacity of this split architecture.
[0096] Figure 4D is a split architecture which uses DDR DRAM-based
DIMMs 452 and DDR channels 440 to form the near memory which acts as an MSC. The processor 310 hosts the memory controller 331 and MSC controller 124. NVRAM such as PCM memory devices 455 reside on PCI Express cards or risers that use PCI Express electrical connections and PCI Express protocol or a different transactional protocol over the PCI Express link 454. The PCM devices on the PCI Express cards or risers 455 provide the far memory capacity of this split architecture, with the memory channel interfaces off the CPU package 401 providing multiple DDR channels 440 for DDR DRAM DIMMs 452.
[0097] Figure 4E illustrates a unified architecture which hosts both near memory acting as an MSC and far memory NVRAM such as PCM on PCI Express cards or risers 456 that use PCI Express electrical connections and PCI Express protocol or a different transactional protocol over the PCI Express bus 454. The processor 310 hosts the integrated memory controller 331 but, in this unified architecture case, the MSC controller 124 resides on the card or riser 456, along with the DRAM near memory and NVRAM far memory.
[0098] Figure 4F illustrates a unified architecture which hosts both the near memory acting as an MSC and the far memory NVRAM such as PCM, on DIMMs 458 using DDR channels 457. The near memory in this unified architecture comprises DRAM on each DIMM 458, acting as the memory side cache to the PCM devices on that same DIMM 458, that form the far memory of that particular DIMM. The MSC controller 124 resides on each DIMM 458, along with the near and far memory. In this embodiment, multiple memory channels of a DDR bus 457 are provided off the CPU package. The DDR bus 457 of this embodiment implements a transactional protocol over DDR electrical connections.
[0099] Figure 4G illustrates a hybrid split architecture, whereby the MSC controller 124 resides on the processor 310 and both near memory and far memory interfaces share the same DDR bus 410. This configuration uses
DRAM-based DDR DIMMs 41 1 a as near memory acting as an MSC with the
PCM-Based DIMMs 41 1 b (i.e., far memory) residing on the same memory channel of the DDR bus 410, using DDR slots and NVRAM (such as PCM memory devices). The memory channels of this embodiment carry both DDR and transactional protocols simultaneously to address the near memory and far memory DIMMs, 41 1 a and 41 1 b, respectively.
[0100] Figure 4H illustrates a unified architecture in which the near memory 461 a acting as a memory side cache resides on a mezzanine or riser 461 , in the form of DRAM-based DDR DIMMs. The memory side cache (MSC) controller 124 is located in the riser's DDR and PCM controller 460 which may have two or more memory channels connecting to DDR DIMM channels 470 on the mezzanine/riser 461 and interconnecting to the CPU over high performance interconnect(s) 462 such as a differential memory link. The associated far memory 461 b sits on the same
mezzanine/riser 461 and is formed by DIMMs that use DDR channels 470 and are populated with NVRAM (such as PCM devices).
[0101] Figure 4I illustrates a unified architecture that can be used as memory capacity expansion to a DDR memory subsystem and DIMMs 464 connected to the CPU package 401 on its DDR memory subsystem, over a DDR bus 471 . For the additional NVM-based capacity in this configuration, the near memory acting as a MSC resides on a mezzanine or riser 463, in the form of DRAM based DDR DIMMs 463a. The MSC controller 124 is located in the riser's DDR and PCM controller 460 which may have two or more memory channels connecting to DDR DIMM channels 470 on the mezzanine/riser and interconnecting to the CPU over high performance interconnect(s) 462 such as a differential memory link. The associated far memory 463b sits on the same mezzanine/riser 463 and is formed by DIMMs 463b that use DDR channels 470 and are populated with NVRAM (such as PCM devices).
[0102] Figure 4J is a unified architecture in which a near memory acting as a memory side cache (MSC) resides on each and every DIMM 465, in the form of DRAM. The DIMMs 465 are on a high performance
interconnect/channel(s) 462, such as a differential memory link, coupling the
CPU package 401 with the MSC controller 124 located on the DIMMs. The associated far memory sits on the same DIMMs 465 and is formed by NVRAM (such as PCM devices).
[0103] Figure 4K illustrates a unified architecture in which the near memory acting as a MSC resides on every DIMM 466, in the form of DRAM. The DIMMs are on high performance interconnect(s) 470 connecting to the CPU package 401 with the MSC controller 124 located on the DIMMs. The associated far memory sits on the same DIMM 466 and is formed by
NVRAM (such as PCM devices).
[0104] Figure 4L illustrates a split architecture which uses DDR DRAM- based DIMMs 464 on a DDR bus 471 to form the necessary near memory which acts as a MSC. The processor 310 hosts the integrated memory controller 331 and memory side cache controller 124. NVRAM such as PCM memory forms the far memory which resides on cards or risers 467 that use high performance interconnects 468 communicating to the CPU package 401 using a transactional protocol. The cards or risers 467 hosting the far memory host a single buffer/controller that can control multiple PCM- based memories or multiple PCM-based DIMMs connected on that riser.
[0105] Figure 4M illustrates a unified architecture which may use DRAM on a card or riser 469 to form the necessary near memory which acts as a MSC. NVRAM such as PCM memory devices form the far memory which also resides on the cards or risers 469 that use high performance
interconnects 468 to the CPU package 401. The cards or risers 469 hosting the far memory host a single buffer/controller that can control multiple PCM-based devices or multiple PCM-based DIMMs on that riser 469 and also integrates the memory side cache controller 124.
[0106] In some of the embodiments described above, such as that illustrated in Figure 4G, the DRAM DIMMS 41 1 a and PCM-based DIMMS 41 1 b reside on the same memory channel. Consequently the same set of address/control and data lines are used to connect the CPU to both the DRAM and PCM memories. In order to reduce the amount of data traffic through the CPU mesh interconnect, in one embodiment, a DDR DIMM on a common memory channel with a PCM-based DIMM is configured to act as the sole MSC for data stored in the PCM-based DIMM. In such a
configuration, the far memory data stored in the PCM-based DIMM is only cached in the DDR DIMM near memory within the same memory channel, thereby localizing memory transactions to that particular memory channel.
[0107] Additionally, to implement the above embodiment, the system address space may be logically subdivided between the different memory channels. For example, if there are four memory channels, then ¼ of the system address space may be allocated to each memory channel. If each memory channel is provided with one PCMS-based DIMM and one DDR DIMM, the DDR DIMM may be configured to act as the MSC for that ¼ portion of the system address space.
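The channel partitioning described above reduces to simple integer arithmetic, as in this sketch; the total address space size is an assumption chosen only to make the example concrete.

```c
#include <stdint.h>
#include <stdio.h>

#define NUM_CHANNELS 4
#define SYSTEM_ADDR_SPACE (1ULL << 36)   /* assumed 64 GB address space */

/* Each channel (and the DDR DIMM acting as MSC on it) covers a 1/4 slice. */
static unsigned channel_for(uint64_t addr)
{
    return (unsigned)(addr / (SYSTEM_ADDR_SPACE / NUM_CHANNELS));
}

int main(void)
{
    uint64_t addr = 0x400000000ULL;   /* 16 GB -> second quarter */
    printf("addr 0x%llx -> channel %u\n",
           (unsigned long long)addr, channel_for(addr));
    return 0;
}
```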
[0108] The choice of system memory and mass storage devices may depend on the type of electronic platforms on which embodiments of the invention are employed. For example, in a personal computer, tablet computer, notebook computer, smartphone, mobile phone, feature phone, personal digital assistant (PDA), portable media player, portable gaming device, gaming console, digital camera, switch, hub, router, set-top box, digital video recorder, or other devices that have relatively small mass storage requirements, the mass storage may be implemented using NVRAM mass storage 152A alone, or using NVRAM mass storage 152A in
combination with a flash/magnetic/optical mass storage 152B. In other electronic platforms that have relatively large mass storage requirements (e.g., large-scale servers), the mass storage may be implemented using magnetic storage (e.g., hard drives) or any combination of magnetic storage, optical storage, holographic storage, mass-storage flash memory, and NVRAM mass storage 152A. In such a case, system hardware and/or software responsible for storage may implement various intelligent persistent storage allocation techniques to allocate blocks of persistent program code and data between the FM 151 B/NVRAM storage 152A and a flash/magnetic/optical mass storage 1 52B in an efficient or otherwise useful manner.
[0109] For example, in one embodiment a high powered server is configured with a near memory (e.g., DRAM), a PCMS device, and a magnetic mass storage device for large amounts of persistent storage. In one embodiment, a notebook computer is configured with a near memory and a PCMS device which performs the role of both a far memory and a mass storage device (i.e., which is logically partitioned to perform these roles as shown in Figure 3). One embodiment of a home or office desktop computer is configured similarly to a notebook computer, but may also include one or more magnetic storage devices to provide large amounts of persistent storage capabilities.
[0110] One embodiment of a tablet computer or cellular telephony device is configured with PCMS memory but potentially no near memory and no additional mass storage (for cost/power savings). However, the
tablet/telephone may be configured with a removable mass storage device such as a flash or PCMS memory stick.
[0111 ] Various other types of devices may be configured as described above. For example, portable media players and/or personal digital assistants (PDAs) may be configured in a manner similar to
tablets/telephones described above, while gaming consoles may be configured in a similar manner to desktops or laptops. Other devices which may be similarly configured include digital cameras, routers, set-top boxes, digital video recorders, televisions, and automobiles.
FAR MEMORY SIGNALING BASED ON USAGE STATISTIC(S) TRACKING
[0112] As alluded to above, the storage cells of various far memory technologies, such as PCMS, may have various reliability concerns that are a function of their usage. For example, the appropriate read and/or write low level access signals applied to a far memory storage cell (e.g., pulse width, voltage amplitude, current amplitude, etc.) may change as a function of the number of times it has been written to. Moreover, the appropriate read threshold voltage for a far memory storage cell (which also may be viewed as an analog access signal) may change as a function of the length of time that has elapsed since the storage cell was last written to.
[0113] As mentioned previously, in order to account for these reliability concerns, wear leveling algorithms may be used to "spread out" accesses to the cells in an attempt to keep the low level signaling characteristics approximately the same across a PCMS storage device's storage cells. Wear leveling algorithms, however, may be costly to implement. For example, wear leveling algorithms may temporarily suspend far memory accesses during time periods in which the data of heavily utilized storage cells and minimally used storage cells are "swapped". This has the effect of reducing far memory performance. Moreover, the logic circuitry needed to implement the wear leveling function may consume scores of logic gates that, if implemented proximate to the far memory storage devices
themselves (e.g., on a same DIMM card or within a same SSD package) may exceed or otherwise challenge the power and surface area constraints of a peripheral platform that the far memory devices are affixed on.
[0114] It therefore may be beneficial to de-emphasize, or avoid altogether, the use of wear leveling in a system having far memory technology.
[0115] According to one possible approach, one or more usage statistics of a specific set of far memory storage addresses is tracked, and, the appropriate low level signaling properties applied to that set of addresses is determined as a function of the tracked accesses. Here, the usage statistics are tracked and utilized during normal system operation rather than at only system bring up, system test diagnostics and/or in response to a system failure. The appropriate low level signals are then applied. Notably, however, the specific characteristics of the appropriate low level signals
(e.g., specific waveform shapes, specific analog parameters such as specific voltages and currents), and the particular values for the tracked parameters that the appropriate signals are determined from (e.g., the specific number of write accesses and/or specific amount of time that has elapsed since a last write), should be dependent on the specific far memory technology employed (e.g., type of PCMS, generation of PCMS, etc).
[0116] It therefore behooves system designers to implement a generic platform capable of applying appropriate signals as a function of tracked parameters irrespective of the storage device's particular technology. That is, a platform that essentially supports the ability to "program" into the system's NVRAM circuitry 532 specific low level access signal
characteristics (e.g., specific pulse widths, specific voltage amplitudes, specific current amplitudes, specific read threshold voltages, etc.) and the specific tracked value parameters (e.g., a specific number of writes, a specific amount of time since a last write) that such signal values are determined from, where, the specific signal characteristics and tracked values are a function of the specific type of far memory technology resident in the system.
[0117] Figures 5 and 6 provide representations of such a platform. Figure 5 shows components of a hardware architecture for an NVRAM controller 532 and Figure 6 shows basic methodologies that may be performed by the hardware architecture. Referring to Figure 5, NVRAM controller 532 may be used, for example, to access a computer system's main memory where the main memory has only PCMS technology or combined near/far memory technology (e.g., DRAM and PCMS). For example, NVRAM controller 532 may be coupled to or include a main memory channel into which DIMM cards are plugged. Alternatively, NVRAM controller 532 may be used to access a computing system's mass storage. For example, NVRAM controller 532 may be coupled to, and/or be integrated within, an SSD package.
[0118] A first correlation is instantiated that tracks certain usage parameters 502_1 to 502_N for each of N sets of address space 501_1 to 501_N of a memory core 516. Memory core 516 may be implemented, for example, with PCMS devices coupled to a same memory channel, with the address space of the PCMS devices broken down into N address sets 501_1 to 501_N. Said another way, the address space of the memory storage supported by the memory channel can be viewed as being arranged into N address sets 501_1 to 501_N.
[0119] Here, if X bits are used to specify an address to memory core 516, there are 2^X unique addresses. If there are N address sets, each unique address set will therefore correspond to 2^X/N unique addresses. For example, if a memory channel uses 24 bits of address to access the memory core 516, there are 2^24 = 16,777,216 unique memory addresses supported by the memory channel. If the memory address space of the memory channel is configured into N = 2^14 = 16,384 unique address sets 501_1 to 501_N, each address set will correspond to 2^24/2^14 = 2^10 = 1,024 unique addresses supported by the memory channel. The sets may represent contiguous address space but they do not need to be organized in this manner. For example, some form of interleaving may be used so that consecutive addresses in a same set have a numerical offset of N or a value based on N. Further still, the strategy for determining which addresses belong in which set may be based on the structural and/or wiring
architecture of the memory core 516. A more thorough discussion of possible address set definition schemes is provided further below.
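The set arithmetic from the example above can be checked with a few lines of C; the figures reproduce the 24-bit address, N = 2^14 case.

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    unsigned x = 24;                          /* address bits */
    uint64_t addresses = 1ULL << x;           /* 16,777,216 */
    uint64_t n_sets    = 1ULL << 14;          /* 16,384 */
    uint64_t per_set   = addresses / n_sets;  /* 1,024 */
    printf("%llu addresses, %llu sets, %llu addresses per set\n",
           (unsigned long long)addresses, (unsigned long long)n_sets,
           (unsigned long long)per_set);
    return 0;
}
```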
[0120] Whatever scheme is used to organize the specific addresses into distinct address sets is incorporated into address decoder 503. Here, address decoder 503 receives 601 the address of a read or write transaction targeted to the memory core 516 as an input, and, in response, produces 602 an identifier 506 of the specific set that the address belongs to as an output. Here, N may be programmable and may be an input term provided to the address decoder 503.
[0121] In response to the address decoder 503 identifying the particular address set that an incoming address belongs to, the tracking statistics for that address set are looked-up 603 from a first level of look-up circuitry 504 (such as content addressable memory (CAM) circuitry). In an embodiment, two tracking statistics are kept for each set of addresses: 1 ) total number of write accesses 507; and, 2) time of last write operation 508. In a further embodiment, these statistics are updated for a write transaction targeted to the memory core 516 but are not updated 604 for a read transaction targeted to the memory core 516 (if updated, they are eventually written back to the first level storage circuitry 504). Specifically, if the incoming transaction is a write transaction, the number of write accesses 507 is incremented by 1 and the time of last write operation 508 is updated to be the current time.
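A software analogue of this first-level tracking might look as follows; the table size and the use of wall-clock time for the last-write timestamp are assumptions, and a hardware implementation would use CAM circuitry rather than an indexed array.

```c
#include <stdint.h>
#include <stdbool.h>
#include <time.h>

#define N_SETS (1u << 14)   /* matches the example above; an assumption */

/* Per-address-set usage statistics kept in the first look-up level:
 * total number of write accesses and time of the last write operation. */
typedef struct { uint64_t write_count; time_t last_write; } set_stats_t;
static set_stats_t stats[N_SETS];

/* Called with the set identifier produced by the address decoder. Reads
 * only fetch the statistics; writes also update them before write-back. */
set_stats_t lookup_and_update(uint32_t set_id, bool is_write)
{
    set_stats_t *s = &stats[set_id % N_SETS];
    if (is_write) {
        s->write_count += 1;
        s->last_write = time(NULL);
    }
    return *s;   /* handed to the second look-up level */
}
```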
[0122] A fetched (and possibly updated) usage statistic is then used as a look-up parameter to a second look-up level 505 to retrieve 605 a digital representation (e.g., a plurality of bits) of an appropriate low level signaling characteristic (or characteristics set or "signature") for the implicated address set 51 1 . For example, as observed in the embodiment of Figure 5, the total number of writes statistic 507 that was retrieved for the implicated address set is used as a look up parameter to storage circuit 509 (which may also be implemented with CAM circuitry) to retrieve low level signature 51 1 . The low level signaling signature 51 1 is essentially a digital code or other representation from which the appropriate low level signaling (e.g., any one or more of waveform shape, voltage amplitude, current amplitude, etc.) for the memory core 516 for the particular transaction (read or write) and implicated address set can be determined. The signature 51 1 as contained within its storage circuit 509 (e.g., CAM) may have a read signature and write signature and any appropriate one of these is used depending on the type of transaction at hand. Here, it is worthwhile to note that various types of PCMS devices may actually perform a "pre-read" prior to a write, hence, a write transaction may actually be implemented with both a read operation and write operation. In this case, both a read signature and a write signature would be included in the total signature information used to implement the transaction. Notably, as observed in Figure 5, the storage circuit 509 has X entries which corresponds to the granularity at which the tracked statistic used as the look-up parameter (e.g., total number of writes) is designed to affect specific low level signals applied to the memory core 516 for the transaction.
[0123] Additionally, as observed in the embodiment of Figure 5, in the case of a read operation, another lookup is performed in the second look-up level 505 (e.g., in another CAM circuit 510). Here, the time of last write operation statistic 508 that was fetched in the first look-up is used by logic circuitry 511 to calculate an amount of time that has elapsed since the last write operation. The elapsed time since the last write operation 511 is then used as a look-up parameter into storage circuit 510 to fetch a signature of the appropriate read threshold voltage 512 to apply when reading the targeted storage cell. The applicable signatures 511, 512 are then provided to low level memory access circuitry 514 having digital-to-analog converters and/or wave shaping circuitry that assist in effecting the correct analog signals applied to the memory core 516.
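A sketch of the elapsed-time-to-threshold selection is given below; the time ranges and threshold codes are placeholders, since the appropriate values depend on the specific far memory technology as noted above.

```c
#include <stdint.h>
#include <stddef.h>
#include <time.h>

/* Elapsed time since the last write indexes a range table whose entries are
 * digital signatures handed to the D/A circuitry. All values are invented. */
typedef struct { uint64_t max_seconds; uint16_t vth_code; } vth_entry_t;

static const vth_entry_t vth_table[] = {
    {         60, 0x120 },   /* written within the last minute */
    {       3600, 0x118 },   /* within the last hour           */
    {      86400, 0x110 },   /* within the last day            */
    { UINT64_MAX, 0x100 },   /* anything older                 */
};

uint16_t read_threshold_signature(time_t last_write, time_t now)
{
    uint64_t elapsed = (uint64_t)(now - last_write);
    for (size_t i = 0; i < sizeof(vth_table)/sizeof(vth_table[0]); i++)
        if (elapsed <= vth_table[i].max_seconds)
            return vth_table[i].vth_code;
    return vth_table[0].vth_code;   /* not reached with the table above */
}
```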
[0124] In an embodiment, the second level look-up storage circuitry 509, 510 defines its search key column(s) entries with ranges. A hit is
recognized when an input term falls within one of the ranges. For example, the entries of the search column for look-up table 509 may consist of different, consecutive ranges of total numbers of write operations (e.g., 0 to 1,000 for the first entry; 1,001 to 10,000 for the second entry, etc.). When a total number of write operations for the applicable address set is fetched from the first look-up level 504, it will hit within one of the ranges of the search column of table 509, which, in turn, will identify the appropriate analog signal signature for the transaction.
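The range-keyed behavior of such a table can be emulated in software as follows; the write-count ranges mirror the example in the text, while the signature codes are placeholders.

```c
#include <stdint.h>
#include <stddef.h>

/* Software stand-in for a range-keyed CAM: each entry's search key is a
 * consecutive range of total write counts, and a hit returns the signaling
 * signature for that range. Signature codes are invented for illustration. */
typedef struct { uint64_t min_writes, max_writes; uint32_t signature; } sig_entry_t;

static const sig_entry_t sig_table[] = {
    {      0,       1000, 0xA0 },
    {   1001,      10000, 0xA1 },
    {  10001,     100000, 0xA2 },
    { 100001, UINT64_MAX, 0xA3 },
};

uint32_t signature_for_writes(uint64_t total_writes)
{
    for (size_t i = 0; i < sizeof(sig_table)/sizeof(sig_table[0]); i++)
        if (total_writes >= sig_table[i].min_writes &&
            total_writes <= sig_table[i].max_writes)
            return sig_table[i].signature;
    return 0;   /* not reached with the table above */
}
```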
[0125] According to various approaches, the individual address sets
501_1 through 501_N are composed of contiguous addresses (address ranges) and address decoder 503 contains binning logic that can determine which address range a particular address is associated with. For example, logic circuitry 503 may be informed of, or calculate, the appropriate address ranges for N contiguous address ranges and may further populate 2N registers with the minimum and maximum address for each set/range. With comparison logic circuitry coupled to the registers for a set/range (e.g., for a same set/range, "greater than" comparison circuitry coupled to the minimum address value register and "less than" comparison circuitry coupled to the maximum address value register), logic circuitry can determine which set a received address belongs to (e.g., both the greater than and less than comparison circuits signify a logical "true").
[0126] In a simpler approach, the address set identifier 506 may be the transaction address or a portion of the transaction address (e.g., a row component or a column component of the address, or portions thereof). Here, the individual address sets 501 _1 to 501 _N in the first level of look-up 504 may be defined by address (or address portion) ranges.
[0127] In other approaches, rather than have contiguous address ranges, the address sets are composed of interleaved addresses having a fixed offset with respect to one another (e.g., each address in a set has an offset of N with respect to its neighboring address in the same set). In this case, address decoder 503 may include division logic circuitry that divides the incoming address by a value based on N and examines the remainder to identify what set the address belongs to.
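Both set-definition strategies discussed above (contiguous ranges and fixed-offset interleaving) reduce to simple arithmetic on the transaction address, as sketched here with the parameters carried over from the earlier example.

```c
#include <stdint.h>

#define ADDR_BITS 24
#define N_SETS (1u << 14)
#define ADDRS_PER_SET ((1u << ADDR_BITS) / N_SETS)   /* 1,024 */

/* Contiguous ranges: bin the address by which range it falls into. */
uint32_t set_id_contiguous(uint32_t addr)
{
    return addr / ADDRS_PER_SET;
}

/* Interleaved sets: neighboring addresses in a set differ by N, so the
 * remainder after dividing by N identifies the set. */
uint32_t set_id_interleaved(uint32_t addr)
{
    return addr % N_SETS;
}
```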
[0128] The approach for determining the address sets, as designed into address decoder 503, may also take into account the structure of the memory core 516 itself. For instance, storage cells coupled to a same row or a same column may be grouped into a same set because such cells are coupled to a common, critical node within the memory core (e.g., a same row node or a same column node) whose applicable pulse widths, voltage/current amplitudes, etc. stress the cells in like fashion. As such, tracking the usage of these cells as a group and determining the appropriate low level signals to apply to them as a group is largely consistent with a more ideal (but less practical) scheme that tracks usage and applies signals to the cells on an individual cell-by-cell basis. To further reduce the amount of data that is tracked, addresses from different rows/columns of the core may be grouped into a same set if their wiring is deemed proximate to one another and/or there is some other structural relationship within the memory core that leads to a belief that they may receive same low level signaling as a function of the accesses made to the group as a whole.
[0129] Different hardware platform architectures than that depicted in Figure 5 may also exist. For example, the architecture of Figure 5 indicates that both low level signaling signatures 51 1 , 512 are determined from the same address set definition. In alternate approaches, the signatures 51 1 , 512 may be driven by different address set definitions, which, in turn, corresponds to the grouping of different parts of the memory core
architecture. For example, the low level signaling signature 51 1 for a write operation or a read operation (other than the read threshold voltage in the case of a read operation) may be determined from the total number of times the address's column component has been written to (or other first grouping of memory core wiring and/or structure).
[0130] By contrast, the read threshold voltage 512 for a read operation may be determined from the time elapsed since the last write to the address's corresponding row component (or other, different, second grouping of core wiring and/or structure). This would correspond to different types of set identifiers 506 (one for read transactions and one for write transactions) and potentially two separate look-up circuits in the first level look-up 504 (a first CAM used for reads and a second CAM used for writes). Again, those of ordinary skill can determine from the low level design details of the structure of the memory core 516 what groupings of addresses are appropriate to permit same application of signals as a function of accesses made to the group as a whole, as well as what tracked statistics are pertinent, whether the type of transaction is pertinent (read or write) and what the specific low level signaling should be. [0131] In another possible embodiment, for a single input transaction address, addresses associated with a same row (or other first address grouping) are identified in a first address set, and, addresses associated with a same column (or other second address grouping) are identified in a second address set. Total number of writes and time of last write are tracked for all the sets so that the system tracks the total number of writes and the time of last write for each row and each column in the system (or, more generally, the two different groupings). In this case, two sets of tracked statistics (e.g., two sets of total number of write accesses) are produced for a single transaction address input. The tracked statistics may be added or mathematically combined in some fashion (e.g., each weighted equally or one weighted more heavily than the other) to establish, for example, a total number of write accesses for the targeted cell based on the combined perspective of the two address groupings (e.g., a combined row and column perspective). The total number may then be used as a look-up parameter into the second stage look-up 505 to produce an analog signaling signature based on this combined perspective.
[0132] Again, in order to reduce the amount of information tracked, the "rows" or "columns" described above may instead be larger, different groupings of memory core structure and/or wiring where same low level signaling is appropriate based on accesses to the corresponding groups as a whole.
[0133] In order for the address decoder 503 to configure itself to properly identify the correct address set for any transaction address input,
information identifying the type of memory core, the address sets for the type of memory core, or the applicable function(s) for determining the address sets (e.g., contiguous ranges, interleaved, etc.) for the memory core are provided to the NVRAM controller 532. According to one approach, this information is communicated to the NVRAM controller 532 by the memory core 516 (e.g., having the information pre-programmed therein). According to another approach this information is kept in system BIOS and provided to the NVRAM controller 532. In either approach the information may be provided to the NVRAM controller 532 at system bring-up. The information is then used by the NVRAM controller 532 to internally configure the address decoder 503 so that it can subsequently determine the correct address set for any given read or write transaction address.
[0134] Figures 7a-d show different possible ways in which the above described techniques may be integrated into a memory channel within a computing system. Here, a memory channel is understood to include a host side 701 and one or more platforms 702 (e.g., DIMM cards, SDD devices, etc.) that are coupled to the memory channel's interconnect structure (such as a bus) 703. The one or more platforms 702 have storage devices including non volatile memory devices (such as PCMS devices) 716.
Interface circuitry 717 may also reside on a platform to specially address the memory devices 716. Here, the interface circuitry 717 may be viewed as a component of an NVRAM controller that is local to the storage core 716 (e.g., on a DIMM card or within an SSD package) whereas the host side, depicted in Figs. 7a - d as "memory controller 701 " may be viewed as the host side component of an NVRAM controller. In a standard approach, the memory controller 701 sends a read or write command to the interface circuitry 717 with a corresponding memory address. The interface circuitry 717 in response to the received command, performs the desired operation (read or write) to the memory storage devices.
[0135] Here, the storage devices 716 of Figures 7a-d can be viewed as the memory core 516 referred to above with respect to Figure 5. As observed in Figures 7a-d, the D/A circuitry and/or waveform circuitry 714a that converts the received signatures into an actual low level signal is located in the interface circuit 717a (and/or the memory device(s) 716a). All other roles/responsibilities of the above described techniques may be implemented entirely on the memory controller 701a, entirely on the interface circuit 717a, or partially on both.

[0136] At one extreme, as observed in Figure 7a, all of the remaining roles/responsibilities are implemented entirely on the memory controller 701a. That is, each of the address decoder 703a, the first level look-up storage circuitry 704a, the second level look-up storage circuitry 705a and any logic in between resides on the memory controller 701a. In this case, for a read or write command sent to the far interface circuit 717a, the memory controller 701a also sends the applicable low level signaling signature(s) 711a (e.g., which may further include a read threshold voltage signature for read operations) to the interface circuitry 717.
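For the Figure 7a partitioning, a hedged C sketch of the host side behavior might look as follows; the helper functions stand in for the address decoder 703a and the two look-up stages 704a, 705a, and all names, fields and return values are placeholders rather than the patent's interfaces:

```c
#include <stdint.h>

/* command format for the Figure 7a style split: the host attaches the
 * signature(s) it has already determined to the read/write command */
struct cmd_with_signature {
    int      is_write;
    uint64_t addr;
    uint32_t write_signature;    /* low level signaling signature            */
    uint32_t read_vth_signature; /* read threshold voltage signature (reads) */
};

/* host side stages, stubbed for illustration */
static unsigned decode_address_set(uint64_t addr)  { return (unsigned)((addr >> 12) & 0xff); }
static uint64_t lookup_usage_stat(unsigned set_id) { (void)set_id; return 0; }
static uint32_t lookup_signature(uint64_t stat)    { (void)stat; return 0x2a; }

static struct cmd_with_signature host_build_cmd(int is_write, uint64_t addr)
{
    struct cmd_with_signature c = { is_write, addr, 0, 0 };
    unsigned set_id   = decode_address_set(addr);  /* address decoder 703a       */
    uint64_t stat     = lookup_usage_stat(set_id); /* first level look-up 704a   */
    c.write_signature = lookup_signature(stat);    /* second level look-up 705a  */
    if (!is_write)
        c.read_vth_signature = lookup_signature(stat); /* e.g., keyed by time since last write */
    return c;
}
```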
[0137] At the other extreme, as observed in Figure 7b, all of these roles/responsibilities 703b, 704b, 705b are instead implemented on the interface circuitry 717b. As such, for both read and write transactions, the memory controller 701b sends a read or write command with the address, but any additional information used to determine the applicable low level signaling signature is not sent to the interface circuitry 717b because all determinations as to the appropriate low level signaling can be made locally on the platform where the storage device(s) 716 reside.
[0138] In cases where the roles/responsibilities are shared across the channel, as observed in Figures 7c and 7d, the memory controller 701c sends to the interface circuitry 717c information that is related to and used for the determination of the low level signaling signature, other than the signatures themselves.
[0139] For instance, according to the approach observed in Figure 7c, the memory controller 701c includes the address decoder 703c so that it can determine which address set or sets are implicated by the transaction address. The memory controller then sends an identifier of the implicated address set to the interface circuit 717c. The interface circuit 717c, which includes the first and second level look-up circuitry 704c, 705c, then determines the applicable low level signaling signature from the address set information (e.g., by performing both the first and second stage look-ups).

[0140] According to another approach, observed in Figure 7d, the memory controller 701d includes the address decoder 703d and determines the appropriate address set for the transaction's address. The memory controller 701d also includes the first level look-up circuitry 704d and looks up the information that is tracked for the address set. The tracked information 707c (e.g., total number of writes and/or time of last write or time elapsed since last write) is then sent to the interface circuit 717d, which uses the information to determine the applicable low level signaling signature. Depending on implementation, in the case of read operations the memory controller 701d may include logic to determine the time elapsed since the last write, or such logic may be located on the interface circuit 717d.
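Continuing the same illustrative conventions, the shared partitionings of Figures 7c and 7d differ only in what crosses the channel and in which look-up stages remain on the platform, as the following sketch suggests (again, names and message formats are assumptions, not the disclosed design):

```c
#include <stdint.h>

/* Figure 7c style: the host sends only the implicated address set identifier */
struct cmd_7c { int is_write; uint64_t addr; unsigned set_id; };

/* Figure 7d style: the host sends the tracked statistic it looked up itself */
struct cmd_7d { int is_write; uint64_t addr; uint64_t tracked_stat; };

/* look-up stages local to the platform interface circuitry (stubs) */
static uint64_t if_lookup_usage_stat(unsigned set_id) { (void)set_id; return 0; }
static uint32_t if_lookup_signature(uint64_t stat)    { (void)stat; return 0x2a; }

static uint32_t interface_signature_7c(const struct cmd_7c *c)
{
    /* both the first and second stage look-ups run on the platform */
    return if_lookup_signature(if_lookup_usage_stat(c->set_id));
}

static uint32_t interface_signature_7d(const struct cmd_7d *c)
{
    /* the first stage already ran on the host; only the second runs here */
    return if_lookup_signature(c->tracked_stat);
}
```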

Claims

1. A method, comprising:
receiving an address for a read or write transaction to a non volatile system memory device;
determining a usage statistic of said memory device for a set of addresses of which said address is a member;
determining a characteristic of a signal to be applied to said memory device for said read or write transaction based on said usage statistic; and, generating a signal having said characteristic to perform said read or write transaction.
2. The method of claim 1 wherein said usage statistic includes total writes.
3. The method of claim 1 wherein said usage statistic includes time elapsed since a last write.
4. The method of claim 3 wherein said signaling characteristic is a specific read threshold voltage.
5. The method of claim 1 wherein a memory controller determines the set of addresses based on the address.
6. The method of claim 5 wherein said determining of a usage statistic is determined by said memory controller.
7. The method of claim 1 wherein interface circuitry disposed on a same platform that said memory device is disposed on performs said determining of said usage statistic.
8. The method of claim 1 wherein interface circuitry disposed on a same platform that said memory device is disposed on performs said determining of said characteristic of said signal.
9. An apparatus, comprising:
storage circuitry to store a signal characteristic, a signal having said signal characteristic to be generated when performing a read or write transaction on a non volatile system memory device, said storage circuitry to provide said signal characteristic at an output, said signal characteristic provided at said output by said storage circuitry in response to receiving at an input of said storage circuitry a usage statistic of said memory device for a set of addresses of which said read or write transaction's address is a member.
10. The apparatus of claim 9 wherein said storage circuitry is CAM circuitry.
11. The apparatus of claim 9 further comprising second storage circuitry having a second output coupled to said input of said storage circuitry, said second storage circuitry to store said usage statistic, said usage statistic provided at said output by said second storage circuit in response to receiving at a second input of said second storage circuit an identifier of said set of addresses.
12. The apparatus of claim 1 1 further comprising an address decoder circuit having a third output coupled to said second input of said second storage circuitry, said address decoder to provide said identifier of said set of addresses from said read or write transaction's address.
13. The apparatus of claim 12 wherein said address decoder further comprises registers to store address ranges of said memory device's address space.
14. The apparatus of claim 9 further comprising digital-to-analog circuitry coupled downstream from said storage circuit output, said digital-to-analog circuitry to receive said signal characteristic and generate said signal.
15. The apparatus of claim 9 wherein said storage circuitry is implemented on a memory controller.
16. The apparatus of claim 9 wherein said storage circuitry is implemented on interface circuitry disposed on a same platform that said far memory is disposed on.
17. An apparatus, comprising:
a channel;
a memory controller coupled to said channel;
a platform coupled to said channel, said platform having a non volatile system memory device and interface circuitry disposed thereon; said memory controller or interface circuitry further comprising storage circuitry to store a signal characteristic, a signal having said signal characteristic to be generated when performing a read or write transaction on said memory device, said storage circuitry to provide said signal characteristic at an output, said signal characteristic provided at said output by said storage circuitry in response to receiving at an input of said storage circuitry a usage statistic of said far memory storage device for a set of addresses of which said read or write transaction's address is a member.
18. The apparatus of claim 17 further comprising second storage circuitry having a second output coupled to said input of said storage circuitry, said second storage circuitry to store said usage statistic, said usage statistic provided at said output by said second storage circuit in response to receiving at a second input of said second storage circuit an identifier of said set of addresses.
19. The apparatus of claim 17 further comprising an address decoder circuit having a third output coupled to said second input of said second storage circuitry, said address decoder to provide said identifier of said set of addresses from said read or write transaction's address.
20. The apparatus of claim 19 wherein said address decoder further comprises registers to store address ranges of said far memory's address space.

Priority Applications (5)

Application Number Priority Date Filing Date Title
EP11873232.0A EP2761467B1 (en) 2011-09-30 2011-09-30 Generation of far memory access signals based on usage statistic tracking
PCT/US2011/054379 WO2013048467A1 (en) 2011-09-30 2011-09-30 Generation of far memory access signals based on usage statistic tracking
CN201180075119.XA CN103946813B (en) 2011-09-30 2011-09-30 Generation based on the remote memory access signals followed the trail of using statistic
US13/996,525 US9600407B2 (en) 2011-09-30 2011-09-30 Generation of far memory access signals based on usage statistic tracking
TW101130980A TWI518686B (en) 2011-09-30 2012-08-27 Generation of far memory access signals based on usage statistic tracking

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2011/054379 WO2013048467A1 (en) 2011-09-30 2011-09-30 Generation of far memory access signals based on usage statistic tracking

Publications (1)

Publication Number Publication Date
WO2013048467A1 true WO2013048467A1 (en) 2013-04-04

Family

ID=47996199

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2011/054379 WO2013048467A1 (en) 2011-09-30 2011-09-30 Generation of far memory access signals based on usage statistic tracking

Country Status (5)

Country Link
US (1) US9600407B2 (en)
EP (1) EP2761467B1 (en)
CN (1) CN103946813B (en)
TW (1) TWI518686B (en)
WO (1) WO2013048467A1 (en)

Families Citing this family (41)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8874865B2 (en) * 2011-09-09 2014-10-28 International Business Machines Corporation Memory type-specific access control of a field of a record
EP2761476B1 (en) 2011-09-30 2017-10-25 Intel Corporation Apparatus, method and system that stores bios in non-volatile random access memory
EP2761472B1 (en) 2011-09-30 2020-04-01 Intel Corporation Memory channel that supports near memory and far memory access
US9829951B2 (en) 2011-12-13 2017-11-28 Intel Corporation Enhanced system sleep state support in servers using non-volatile random access memory
CN103999067A (en) 2011-12-21 2014-08-20 英特尔公司 High-performance storage structures and systems featuring multiple non-volatile memories
CN104115230B (en) 2011-12-22 2018-02-16 英特尔公司 Computing device, method and system based on High Efficiency PC MS flush mechanisms
KR101761044B1 (en) 2011-12-22 2017-07-24 인텔 코포레이션 Power conservation by way of memory channel shutdown
WO2013097105A1 (en) 2011-12-28 2013-07-04 Intel Corporation Efficient dynamic randomizing address remapping for pcm caching to improve endurance and anti-attack
US9335954B2 (en) * 2012-09-10 2016-05-10 Texas Instruments Incorporated Customizable backup and restore from nonvolatile logic array
US9697905B2 (en) 2013-05-31 2017-07-04 Sandisk Technologies Llc Updating read voltages using syndrome weight comparisons
KR102116258B1 (en) * 2013-12-24 2020-06-05 삼성전자주식회사 Memory system and user device including the same
KR102211865B1 (en) * 2014-05-20 2021-02-04 삼성전자주식회사 Nonvolatile memory system and operating method of memory controller
US10204047B2 (en) 2015-03-27 2019-02-12 Intel Corporation Memory controller for multi-level system memory with coherency unit
US10387259B2 (en) 2015-06-26 2019-08-20 Intel Corporation Instant restart in non volatile system memory computing systems with embedded programmable data checking
US10073659B2 (en) 2015-06-26 2018-09-11 Intel Corporation Power management circuit with per activity weighting and multiple throttle down thresholds
US10108549B2 (en) 2015-09-23 2018-10-23 Intel Corporation Method and apparatus for pre-fetching data in a system having a multi-level system memory
US10261901B2 (en) 2015-09-25 2019-04-16 Intel Corporation Method and apparatus for unneeded block prediction in a computing system having a last level cache and a multi-level system memory
US10185501B2 (en) 2015-09-25 2019-01-22 Intel Corporation Method and apparatus for pinning memory pages in a multi-level system memory
US9792224B2 (en) 2015-10-23 2017-10-17 Intel Corporation Reducing latency by persisting data relationships in relation to corresponding data in persistent memory
US10033411B2 (en) 2015-11-20 2018-07-24 Intel Corporation Adjustable error protection for stored data
US9824419B2 (en) * 2015-11-20 2017-11-21 International Business Machines Corporation Automatically enabling a read-only cache in a language in which two arrays in two different variables may alias each other
US10095618B2 (en) 2015-11-25 2018-10-09 Intel Corporation Memory card with volatile and non volatile memory space having multiple usage model configurations
US10303372B2 (en) 2015-12-01 2019-05-28 Samsung Electronics Co., Ltd. Nonvolatile memory device and operation method thereof
US20170177482A1 (en) * 2015-12-18 2017-06-22 Intel Corporation Computing system having multi-level system memory capable of operating in a single level system memory mode
US9747041B2 (en) 2015-12-23 2017-08-29 Intel Corporation Apparatus and method for a non-power-of-2 size cache in a first level memory device to cache data present in a second level memory device
US10007606B2 (en) 2016-03-30 2018-06-26 Intel Corporation Implementation of reserved cache slots in computing system having inclusive/non inclusive tracking and two level system memory
US10185619B2 (en) 2016-03-31 2019-01-22 Intel Corporation Handling of error prone cache line slots of memory side cache of multi-level system memory
US10120806B2 (en) 2016-06-27 2018-11-06 Intel Corporation Multi-level system memory with near memory scrubbing based on predicted far memory idle time
US10915453B2 (en) 2016-12-29 2021-02-09 Intel Corporation Multi level system memory having different caching structures and memory controller that supports concurrent look-up into the different caching structures
US10445261B2 (en) 2016-12-30 2019-10-15 Intel Corporation System memory having point-to-point link that transports compressed traffic
CN108733311B (en) * 2017-04-17 2021-09-10 伊姆西Ip控股有限责任公司 Method and apparatus for managing storage system
US10304814B2 (en) 2017-06-30 2019-05-28 Intel Corporation I/O layout footprint for multiple 1LM/2LM configurations
US11188467B2 (en) 2017-09-28 2021-11-30 Intel Corporation Multi-level system memory with near memory capable of storing compressed cache lines
US10418097B2 (en) 2017-11-27 2019-09-17 Western Digital Technologies, Inc. Non-volatile storage system with read calibration
US10860244B2 (en) 2017-12-26 2020-12-08 Intel Corporation Method and apparatus for multi-level memory early page demotion
US11099995B2 (en) 2018-03-28 2021-08-24 Intel Corporation Techniques for prefetching data to a first level of memory of a hierarchical arrangement of memory
US11055228B2 (en) 2019-01-31 2021-07-06 Intel Corporation Caching bypass mechanism for a multi-level memory
US11036642B2 (en) * 2019-04-26 2021-06-15 Intel Corporation Architectural enhancements for computing systems having artificial intelligence logic disposed locally to memory
US20220345414A1 (en) * 2019-08-30 2022-10-27 Unitex Corporation Interface Conversion Device
KR20220070951A (en) 2020-11-23 2022-05-31 삼성전자주식회사 Memory device, system including the same and operating method of memory device
US11860773B2 (en) * 2022-02-03 2024-01-02 Micron Technology, Inc. Memory access statistics monitoring

Family Cites Families (81)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3451099B2 (en) 1991-12-06 2003-09-29 株式会社日立製作所 External storage subsystem
US6161208A (en) 1994-05-06 2000-12-12 International Business Machines Corporation Storage subsystem including an error correcting cache and means for performing memory to memory transfers
US5517615A (en) 1994-08-15 1996-05-14 Unisys Corporation Multi-channel integrity checking data transfer system for controlling different size data block transfers with on-the-fly checkout of each word and data block transferred
US6470405B2 (en) * 1995-10-19 2002-10-22 Rambus Inc. Protocol for communication with dynamic memory
JP3210590B2 (en) 1996-11-29 2001-09-17 株式会社日立製作所 Multiprocessor system and cache coherency control method
US5822251A (en) 1997-08-25 1998-10-13 Bit Microsystems, Inc. Expandable flash-memory mass-storage using shared buddy lines and intermediate flash-bus between device-specific buffers and flash-intelligent DMA controllers
JP3098486B2 (en) * 1998-03-31 2000-10-16 山形日本電気株式会社 Nonvolatile semiconductor memory device
US6038166A (en) * 1998-04-01 2000-03-14 Invox Technology High resolution multi-bit-per-cell memory
US5912839A (en) 1998-06-23 1999-06-15 Energy Conversion Devices, Inc. Universal memory element and method of programming same
US7827348B2 (en) 2000-01-06 2010-11-02 Super Talent Electronics, Inc. High performance flash memory devices (FMD)
US6868472B1 (en) 1999-10-01 2005-03-15 Fujitsu Limited Method of Controlling and addressing a cache memory which acts as a random address memory to increase an access speed to a main memory
US8171204B2 (en) 2000-01-06 2012-05-01 Super Talent Electronics, Inc. Intelligent solid-state non-volatile memory device (NVMD) system with multi-level caching of multiple channels
US6259627B1 (en) 2000-01-27 2001-07-10 Multi Level Memory Technology Read and write operations using constant row line voltage and variable column line load
US6922350B2 (en) 2002-09-27 2005-07-26 Intel Corporation Reducing the effect of write disturbs in polymer memories
US7328304B2 (en) 2004-02-27 2008-02-05 Intel Corporation Interface for a block addressable mass storage system
US7475174B2 (en) 2004-03-17 2009-01-06 Super Talent Electronics, Inc. Flash / phase-change memory in multi-ring topology using serial-link packet interface
US7590918B2 (en) 2004-09-10 2009-09-15 Ovonyx, Inc. Using a phase change memory as a high volume memory
US7441081B2 (en) 2004-12-29 2008-10-21 Lsi Corporation Write-back caching for disk drives
US7681004B2 (en) 2005-06-13 2010-03-16 Addmm, Llc Advanced dynamic disk memory module
US7797479B2 (en) 2005-06-30 2010-09-14 Intel Corporation Technique to write to a non-volatile memory
US20070005922A1 (en) 2005-06-30 2007-01-04 Swaminathan Muthukumar P Fully buffered DIMM variable read latency
KR100609621B1 (en) 2005-07-19 2006-08-08 삼성전자주식회사 Synchronous semiconductor memory device having block-dedicated programmable cas latency
US7533215B2 (en) 2005-09-15 2009-05-12 Intel Corporation Distributed and packed metadata structure for disk cache
US7516267B2 (en) 2005-11-03 2009-04-07 Intel Corporation Recovering from a non-volatile memory failure
US7516349B2 (en) 2005-12-29 2009-04-07 Intel Corporation Synchronized memory channels with unidirectional links
US7600078B1 (en) 2006-03-29 2009-10-06 Intel Corporation Speculatively performing read transactions
US7913147B2 (en) 2006-05-08 2011-03-22 Intel Corporation Method and apparatus for scrubbing memory
CN101501779B (en) * 2006-05-12 2013-09-11 苹果公司 Memory device with adaptive capacity
US7756053B2 (en) 2006-06-30 2010-07-13 Intel Corporation Memory agent with error hardware
US7761657B2 (en) 2006-07-10 2010-07-20 Hitachi, Ltd. Storage control system, control method for storage control system, port selector, and controller
US7587559B2 (en) * 2006-08-10 2009-09-08 International Business Machines Corporation Systems and methods for memory module power management
US8051253B2 (en) 2006-09-28 2011-11-01 Virident Systems, Inc. Systems and apparatus with programmable memory control for heterogeneous main memory
US7555605B2 (en) 2006-09-28 2009-06-30 Freescale Semiconductor, Inc. Data processing system having cache memory debugging support and method therefor
WO2008055272A2 (en) * 2006-11-04 2008-05-08 Virident Systems, Inc. Integrating data from symmetric and asymmetric memory
US9153337B2 (en) * 2006-12-11 2015-10-06 Marvell World Trade Ltd. Fatigue management system and method for hybrid nonvolatile solid state memory system
US20080270811A1 (en) 2007-04-26 2008-10-30 Super Talent Electronics Inc. Fast Suspend-Resume of Computer Motherboard Using Phase-Change Memory
US8799620B2 (en) 2007-06-01 2014-08-05 Intel Corporation Linear to physical address translation with support for page attributes
KR101498673B1 (en) 2007-08-14 2015-03-09 삼성전자주식회사 Solid state drive, data storing method thereof, and computing system including the same
CN101237546A (en) 2007-11-13 2008-08-06 东南大学 High-speed audio and video magnitude storage method and device for vehicular environment
US7941692B2 (en) 2007-12-31 2011-05-10 Intel Corporation NAND power fail recovery
TWI373768B (en) 2008-02-05 2012-10-01 Phison Electronics Corp System, controller and method for data storage
TWI437429B (en) 2008-06-04 2014-05-11 A Data Technology Co Ltd Multi-channel hybrid density memory storage device and control method thereof
US20090313416A1 (en) 2008-06-16 2009-12-17 George Wayne Nation Computer main memory incorporating volatile and non-volatile memory
US20090327837A1 (en) 2008-06-30 2009-12-31 Robert Royer NAND error management
US9152569B2 (en) 2008-11-04 2015-10-06 International Business Machines Corporation Non-uniform cache architecture (NUCA)
US8375241B2 (en) 2009-04-02 2013-02-12 Intel Corporation Method and system to improve the operations of a registered memory module
US8331857B2 (en) 2009-05-13 2012-12-11 Micron Technology, Inc. Wireless interface to program phase-change memories
US8250282B2 (en) 2009-05-14 2012-08-21 Micron Technology, Inc. PCM memories for storage bus interfaces
US8504759B2 (en) 2009-05-26 2013-08-06 Micron Technology, Inc. Method and devices for controlling power loss
US20100306453A1 (en) 2009-06-02 2010-12-02 Edward Doller Method for operating a portion of an executable program in an executable non-volatile memory
US8159881B2 (en) * 2009-06-03 2012-04-17 Marvell World Trade Ltd. Reference voltage optimization for flash memory
US9123409B2 (en) 2009-06-11 2015-09-01 Micron Technology, Inc. Memory device for a hierarchical memory architecture
US8612666B2 (en) 2009-06-30 2013-12-17 Intel Corporation Method and system for managing a NAND flash memory by paging segments of a logical to physical address map to a non-volatile memory
US8626997B2 (en) 2009-07-16 2014-01-07 Micron Technology, Inc. Phase change memory in a dual inline memory module
WO2011007599A1 (en) 2009-07-17 2011-01-20 株式会社 東芝 Memory management device
US8077515B2 (en) 2009-08-25 2011-12-13 Micron Technology, Inc. Methods, devices, and systems for dealing with threshold voltage change in memory devices
US8249099B2 (en) 2009-08-27 2012-08-21 Texas Instruments Incorporated External memory data management with data regrouping and channel look ahead
US20110087824A1 (en) 2009-10-08 2011-04-14 Giga-Byte Technology Co.,Ltd. Flash memory accessing apparatus and method thereof
US8914568B2 (en) 2009-12-23 2014-12-16 Intel Corporation Hybrid memory architectures
US8612809B2 (en) 2009-12-31 2013-12-17 Intel Corporation Systems, methods, and apparatuses for stacked memory
US20110197031A1 (en) 2010-02-05 2011-08-11 Nokia Corporation Update Handler For Multi-Channel Cache
US20110208900A1 (en) 2010-02-23 2011-08-25 Ocz Technology Group, Inc. Methods and systems utilizing nonvolatile memory in a computer system main memory
US9189385B2 (en) * 2010-03-22 2015-11-17 Seagate Technology Llc Scalable data structures for control and management of non-volatile storage
KR20110131781A (en) 2010-05-31 2011-12-07 삼성전자주식회사 Method for presuming accuracy of location information and apparatus for the same
GB201011146D0 (en) * 2010-07-02 2010-08-18 Vodafone Ip Licensing Ltd Mobile computing device
US8649212B2 (en) 2010-09-24 2014-02-11 Intel Corporation Method, apparatus and system to determine access information for a phase change memory
US8838935B2 (en) 2010-09-24 2014-09-16 Intel Corporation Apparatus, method, and system for implementing micro page tables
CN101989183A (en) 2010-10-15 2011-03-23 浙江大学 Method for realizing energy-saving storing of hybrid main storage
US8806106B2 (en) * 2010-11-12 2014-08-12 Seagate Technology Llc Estimating wear of non-volatile, solid state memory
US8612676B2 (en) 2010-12-22 2013-12-17 Intel Corporation Two-level system main memory
US9779020B2 (en) 2011-02-08 2017-10-03 Diablo Technologies Inc. System and method for providing an address cache for memory map learning
US8595597B2 (en) 2011-03-03 2013-11-26 Intel Corporation Adjustable programming speed for NAND memory devices
US8462577B2 (en) 2011-03-18 2013-06-11 Intel Corporation Single transistor driver for address lines in a phase change memory and switch (PCMS) array
US8462537B2 (en) 2011-03-21 2013-06-11 Intel Corporation Method and apparatus to reset a phase change memory and switch (PCMS) memory cell
US8607089B2 (en) 2011-05-19 2013-12-10 Intel Corporation Interface for storage device access over memory bus
CN102209262B (en) * 2011-06-03 2017-03-22 中兴通讯股份有限公司 Method, device and system for scheduling contents
US20120324195A1 (en) 2011-06-14 2012-12-20 Alexander Rabinovitch Allocation of preset cache lines
US8605531B2 (en) 2011-06-20 2013-12-10 Intel Corporation Fast verify for phase change memory with switch
US8463948B1 (en) 2011-07-01 2013-06-11 Intel Corporation Method, apparatus and system for determining an identifier of a volume of memory
US8767482B2 (en) 2011-08-18 2014-07-01 Micron Technology, Inc. Apparatuses, devices and methods for sensing a snapback event in a circuit
CN103946819B (en) 2011-09-30 2017-05-17 英特尔公司 Statistical wear leveling for non-volatile system memory

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0806726A1 (en) * 1996-05-10 1997-11-12 Sun Microsystems, Inc. On-line memory monitoring system and methods
US20080034148A1 (en) * 2006-08-01 2008-02-07 International Business Machines Corporation Systems and methods for providing performance monitoring in a memory system
US20100131827 A1 * 2007-05-12 2010-05-27 Anobit Technologies Ltd Memory device with internal signal processing unit

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP2761467A4 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI623937B (en) * 2013-05-31 2018-05-11 桑迪士克科技有限責任公司 Updating read voltages
US10475523B2 (en) 2013-05-31 2019-11-12 Western Digital Technologies, Inc. Updating read voltages triggered by the rate of temperature change
US10811091B2 (en) 2018-10-12 2020-10-20 Western Digital Technologies, Inc. Adaptive processing for read threshold voltage calibration

Also Published As

Publication number Publication date
TWI518686B (en) 2016-01-21
EP2761467A1 (en) 2014-08-06
US20130290597A1 (en) 2013-10-31
US9600407B2 (en) 2017-03-21
CN103946813A (en) 2014-07-23
TW201322261A (en) 2013-06-01
EP2761467A4 (en) 2015-03-11
EP2761467B1 (en) 2019-10-23
CN103946813B (en) 2017-08-25

Similar Documents

Publication Publication Date Title
EP2761467B1 (en) Generation of far memory access signals based on usage statistic tracking
US10719443B2 (en) Apparatus and method for implementing a multi-level memory hierarchy
US10282323B2 (en) Memory channel that supports near memory and far memory access
US10102126B2 (en) Apparatus and method for implementing a multi-level memory hierarchy having different operating modes
US9317429B2 (en) Apparatus and method for implementing a multi-level memory hierarchy over common memory channels
US9958926B2 (en) Method and system for providing instant responses to sleep state transitions with non-volatile random access memory
US9202548B2 (en) Efficient PCMS refresh mechanism
US20140229659A1 (en) 2014-08-14 Thin translation for system access of non volatile semiconductor storage as random access memory

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11873232

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 13996525

Country of ref document: US

WWE Wipo information: entry into national phase

Ref document number: 2011873232

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE