CN108139978B - Memory system with cached memory module operation - Google Patents

Memory system with cached memory module operation

Info

Publication number
CN108139978B
CN108139978B (application CN201680057520.3A)
Authority
CN
China
Prior art keywords
memory, module, DRAM, SCM, data
Legal status
Active
Application number
CN201680057520.3A
Other languages
Chinese (zh)
Other versions
CN108139978A (en)
Inventor
F. A. Ware
K. L. Wright
J. E. Linstadt
C. Hampel
Current Assignee
Rambus Inc
Original Assignee
Rambus Inc
Application filed by Rambus Inc
Priority to CN202310262063.3A (published as CN116560563A)
Publication of CN108139978A
Application granted
Publication of CN108139978B

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F 13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
            • G06F 13/14 Handling requests for interconnection or transfer
              • G06F 13/16 Handling requests for interconnection or transfer for access to memory bus
                • G06F 13/1668 Details of memory controller
                  • G06F 13/1678 Details of memory controller using bus width
              • G06F 13/20 Handling requests for interconnection or transfer for access to input/output bus
                • G06F 13/28 Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
          • G06F 11/00 Error detection; Error correction; Monitoring
            • G06F 11/07 Responding to the occurrence of a fault, e.g. fault tolerance
              • G06F 11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
                • G06F 11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
                  • G06F 11/1004 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's to protect a block of data words, e.g. CRC or checksum
                  • G06F 11/1008 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices
                    • G06F 11/1068 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices in sector programmable memories, e.g. flash disk
          • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
            • G06F 12/02 Addressing or allocation; Relocation
              • G06F 12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
                • G06F 12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
                  • G06F 12/0866 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
                    • G06F 12/0868 Data transfer between cache memory and other subsystems, e.g. storage devices or host systems
                  • G06F 12/0888 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using selective caching, e.g. bypass
                  • G06F 12/0893 Caches characterised by their organisation or structure
                    • G06F 12/0895 Caches characterised by their organisation or structure of parts of caches, e.g. directory or tag array
          • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
            • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
              • G06F 3/0601 Interfaces specially adapted for storage systems
                • G06F 3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
                  • G06F 3/0604 Improving or facilitating administration, e.g. storage management
                  • G06F 3/061 Improving I/O performance
                    • G06F 3/0613 Improving I/O performance in relation to throughput
                  • G06F 3/0614 Improving the reliability of storage systems
                    • G06F 3/0619 Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
                • G06F 3/0628 Interfaces specially adapted for storage systems making use of a particular technique
                  • G06F 3/0629 Configuration or reconfiguration of storage systems
                    • G06F 3/0634 Configuration or reconfiguration of storage systems by changing the state or mode of one or more devices
                  • G06F 3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
                    • G06F 3/0656 Data buffering arrangements
                • G06F 3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
                  • G06F 3/0671 In-line storage system
                    • G06F 3/0673 Single storage device
          • G06F 2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
            • G06F 2212/10 Providing a specific technical effect
              • G06F 2212/1016 Performance improvement
              • G06F 2212/1032 Reliability improvement, data loss prevention, degraded operation etc
            • G06F 2212/40 Specific encoding of data in memory or cache
              • G06F 2212/403 Error protection encoding, e.g. using parity or ECC codes
      • G11 INFORMATION STORAGE
        • G11C STATIC STORES
          • G11C 29/00 Checking stores for correct operation; Subsequent repair; Testing stores during standby or offline operation
            • G11C 29/52 Protection of memory contents; Detection of errors in memory contents
          • G11C 7/00 Arrangements for writing information into, or reading information out from, a digital store
            • G11C 7/10 Input/output [I/O] data interface arrangements, e.g. I/O data control circuits, I/O data buffers

Abstract

Memory controllers, devices, modules, systems, and associated methods are disclosed. In one embodiment, a memory module includes a pin interface for coupling to a bus having a first width. The module includes at least one Storage Class Memory (SCM) component and at least one DRAM component. The memory module operates in a first mode utilizing all of the first width and in a second mode utilizing less than all of the first width.

Description

Memory system with cached memory module operation
Technical Field
The disclosure herein relates to memory systems, memory modules, memory controllers, memory devices, and associated methods.
Background
Successive generations of dynamic random access memory (DRAM) devices have come to market as lithographic feature sizes have continued to shrink, and device storage capacity has increased with each generation. However, it is increasingly difficult to scale DRAM devices and to obtain sufficient capacitance for charge storage. DRAM devices can also be costly to manufacture.
Various non-volatile memories, such as Resistive Random Access Memory (RRAM) and Phase Change Random Access Memory (PCRAM), to name a few, are relatively inexpensive to manufacture. However, many non-volatile memory technologies have yet to match the performance of their DRAM counterparts.
It would therefore be desirable to employ memory in a memory system that combines the cost advantages of many non-volatile technologies with the performance of DRAM.
Drawings
Embodiments of the disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which like reference numerals refer to similar elements, and in which:
FIG. 1 illustrates one embodiment of a memory system that uses a first memory module that mounts only DRAM components and a second module that mounts both DRAM components and Storage Class Memory (SCM) components.
FIGS. 2A-2I illustrate various examples of module interconnection schemes further described herein.
FIG. 3 illustrates steering circuitry within a data buffer component.
FIGS. 4A-4F illustrate various cache operations corresponding to two different memory module configurations.
FIG. 5 illustrates a tag data structure associated with an SCM memory space.
FIG. 6 illustrates a point-to-point memory architecture utilizing a first memory module having DRAM components and a second memory module having SCM components.
FIG. 7 illustrates a memory architecture similar to FIG. 6 utilizing four memory modules.
FIGS. 8A-8F illustrate various cache operations corresponding to various memory module configurations.
FIG. 9 illustrates further details of a memory system, in accordance with one embodiment.
FIGS. 10-13 illustrate various timing diagrams for cache operations in the memory system of FIG. 9.
FIG. 14 illustrates circuitry related to tag matching distributed across modules.
FIGS. 15A-15I illustrate various memory module configurations that utilize both SCM memory components and DRAM memory components.
FIGS. 16A-16F illustrate various cache operations for the memory module configuration shown in FIG. 15A.
FIGS. 17A-17F illustrate various cache operations for the memory module configuration shown in FIG. 15B.
FIGS. 18A-18F illustrate various cache operations for the memory module configuration shown in FIG. 15C.
FIGS. 19A-19F illustrate various cache operations for the memory module configuration shown in FIG. 15D.
FIGS. 20A-20F illustrate various cache operations for the memory module configuration shown in FIG. 15E.
FIGS. 21A-21F illustrate various cache operations for the memory module configuration shown in FIG. 15F.
FIGS. 22A-22F illustrate various cache operations for the memory module configuration shown in FIG. 15I.
FIGS. 23A-23F illustrate various cache operations for the memory module configuration shown in FIG. 15I.
FIGS. 24A-24F illustrate various cache operations for the memory module configuration shown in FIG. 15I.
FIG. 25 illustrates a performance comparison between various memory module configurations.
FIG. 26 illustrates a power comparison between various memory module configurations.
FIG. 27 illustrates further details regarding the memory system configuration of FIG. 15A.
FIG. 28 illustrates further details regarding the memory system configuration of FIG. 15C.
FIG. 29 illustrates further details regarding the memory system configuration of FIG. 15C.
FIGS. 30-36 illustrate further details regarding various cache operations relating to various system configurations.
FIGS. 37-42 illustrate timing diagrams showing the timing of various cache operations in various system embodiments.
FIGS. 43-45 illustrate various system configurations.
FIGS. 46-49 illustrate various timing diagrams.
FIG. 50 illustrates further details regarding a DRAM package structure.
FIG. 51 illustrates yet another system embodiment.
FIG. 52 illustrates a high-level system embodiment.
Detailed Description
Memory modules, memory controllers, devices, and associated methods are disclosed. In one embodiment, a memory module is disclosed that includes a pin interface for coupling to a data bus having a first width. The module is capable of operating in a first legacy mode that utilizes the full first width of the data bus. The module may also operate in a second mode in which it utilizes less than the full first width of the data bus. The module includes at least one Storage Class Memory (SCM) device and at least one DRAM memory device. This basic architecture enables the DRAM devices to provide a caching function for transactions involving the SCM devices, thereby providing high performance for memory systems that are primarily populated with lower-cost SCM devices.
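As a rough illustration of these two operating modes, the sketch below models the width selection in Python; the 64-bit bus width and the half-width second mode are assumptions chosen for illustration and are not taken from the figures.

```python
from enum import Enum

class ModuleMode(Enum):
    LEGACY = "legacy"   # first mode: module drives the full bus width
    CACHED = "cached"   # second mode: module drives only part of the bus,
                        # freeing the remainder for point-to-point routing

BUS_WIDTH = 64          # "first width" in bits; an assumed example value

def active_width(mode: ModuleMode) -> int:
    """Portion of the first width the module drives in each mode."""
    # The half-width split matches the two-module point-to-point examples
    # later in this description, but any fraction less than the full width
    # would satisfy the second mode.
    return BUS_WIDTH if mode is ModuleMode.LEGACY else BUS_WIDTH // 2

assert active_width(ModuleMode.LEGACY) == 64
assert active_width(ModuleMode.CACHED) == 32
```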
Referring now to FIG. 1, a memory system in accordance with a first embodiment is shown, generally designated 100. The system includes a first DRAM module 102 and a second module 104, the second module 104 incorporating both DRAM memory components or devices 106 and Storage Class Memory (SCM) devices 108. Storage class memory devices include, for example, memory components having characteristics often associated with non-volatile memory devices, such as Phase Change Memory (PCM), Resistive Random Access Memory (RRAM), and flash memory. The first module 102 and the second module 104 are interconnected to the memory controller 105 via a legacy point-to-two-point (multi-drop) architecture that includes primary data paths DQu and DQv and command/address (CA) paths CA, CSy, and CSx. For one embodiment, the DRAM components in the hybrid module 104 are configured to form a cache memory, thus providing a caching function for transactions involving the memory controller 105 and the SCM memory devices 108. However, utilizing a legacy multi-drop link architecture may involve a fixed ratio of DRAM cache memory to SCM memory on the hybrid module 104. Thus, expanding the cache region to the second module may involve additional transfers on the primary links, which may affect the bandwidth of those links.
With further reference to FIG. 1, each of the memory devices (DRAM and/or SCM) 106 and 108 may include one or more stacks of memory die. The structure of an SCM stack will be described with the understanding that a DRAM stack may be configured in a similar manner. As noted earlier, the memory die may be of a non-volatile memory technology other than DRAM, such as Resistive Random Access Memory (RRAM), Phase Change Random Access Memory (PCRAM), or flash memory, to name a few. Each device stack may, for example, contain eight SCM memory components. One example of a stacked set of devices is shown in enlarged view 1-1, which illustrates stacked SCM memory components 116 within a single package 118. For some configurations, the opposite side of the SCM module substrate 120 may also be equipped with memory components, such as at 122. The interfaces of the SCM memory components may be connected in parallel using through-silicon vias, wire bonding, or any other connection method. Other stack configurations (such as package-on-package stacks) are also possible.
FIGS. 2A-2C illustrate various high-level configuration diagrams. FIG. 2A illustrates the interconnection between the memory controller 202 and one hybrid DRAM/SCM module 204, while FIGS. 2B-2I illustrate interconnections between the memory controller 202 and two modules, including the hybrid module 204 and a second module 206 that includes either only DRAM components or both DRAM and SCM components. The configuration of FIG. 2A includes a memory controller 202 having tag memory compare circuitry 208, where the controller is connected to the hybrid memory module 204 via two independent 36b channels or one 72b lockstep channel. The hybrid module 204 includes both DRAM memory components 210 and SCM memory components 212. FIG. 2B shows a two-module configuration in which the memory controller 202 is coupled to the hybrid module 204 and a second module 206 that includes only DRAM components 210. The two modules 204 and 206 are coupled to the memory controller 202 in a multi-drop configuration, where a given link of the controller is shared by the corresponding links associated with each module. Finally, FIG. 2C illustrates yet another multi-drop, two-module configuration, where both modules 204 and 206 are hybrid modules that include both DRAM memory components 210 and SCM memory components 212.
FIGS. 2D-2F illustrate memory module configurations involving point-to-point connections between the memory controller 202 and two module sockets, each capable of holding one module. As shown in FIG. 2D, the memory controller 202 includes tag memory compare circuitry 208 and a first point-to-point link 214 that couples a portion of the tag memory compare circuitry to the hybrid memory module 204. A second portion of the tag memory compare circuitry, at 217, is connected to the hybrid memory module 204 via a point-to-point link 216 routed through a continuity module 218. The continuity module is configured to plug into a module socket to provide point-to-point communication between the memory controller 202 and at least one other module; it employs no memory devices and provides only connectivity. With this added connectivity, a single hybrid memory module 204 may interface with the memory controller 202 through a number of point-to-point links, with half of the links coupled directly to the memory controller and half routed to the controller through the continuity module. The link 216 may form a backchannel that is directly coupled between the hybrid memory module 204 and the continuity module 218. As described more fully below, for one embodiment, the backchannel link may provide a connection for balancing load between memory modules.
FIG. 2E illustrates a configuration similar to FIG. 2D, but instead of utilizing a continuity module in the second module socket, a second memory module 206 using DRAM memory components 210 is plugged into the socket and is thus connected to the memory controller 202 in a point-to-point fashion. For one embodiment, the DRAM memory components provide non-cached DRAM storage. FIG. 2F illustrates yet another alternative embodiment, where the second memory module 206 is a hybrid memory module that uses both DRAM memory components 210 and SCM memory components 212.
FIGS. 2G-2I illustrate possible load balancing examples for the system configuration of FIG. 2E. FIG. 2G illustrates a load balancing example where half of the bandwidth of the system is directed to the first module 204 and half is directed to the second module 206. For this case, the backchannel link 216 between the two modules provides no balancing function. FIG. 2H illustrates a load balancing example in which the full bandwidth of the system is allocated to the hybrid module 204 by utilizing the backchannel 216 to bypass data transfers from the memory controller 202 through the second memory module 206 to the hybrid memory module 204. FIG. 2I illustrates an example in which the full bandwidth of the system is allocated to the second, DRAM-only module 206 by activating the backchannel 216 to direct data transfers through the hybrid module 204 to the DRAM module 206.
FIG. 3 illustrates one embodiment of steering logic 300 that may be utilized within the data buffer (DB) component of each of the memory modules described above. The steering logic 300 includes a primary interface 302 having nibble-wide DQ/DQS I/O circuitry and a secondary interface 304 that also includes corresponding nibble-wide DQ/DQS I/O circuitry. Each nibble pair of the primary interface 302 is associated with a receive path 306 for write data and a transmit path 308 for read data. Each transmit path 308 is fed by the output of a multiplexer 310 that receives data from a selected receive path 312 associated with a secondary nibble pair and from the receive path 306 of the other primary nibble pair. The nibble pair circuitry for the secondary interface is similar to the primary structure, but its receive and transmit paths correspond to read data and write data, respectively. The steering logic 300 essentially allows data from any of the DQ/DQS I/O circuits to be steered to any of the other DQ/DQS I/O circuits, thus providing bypass functionality and enabling load balancing for systems with point-to-point topologies (such as those of FIGS. 2D-2I).
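A behavioral sketch of this steering function is given below, assuming a hypothetical model with two primary and two secondary nibble ports; the port names and the route-table representation are illustrative, not part of the disclosure.

```python
class SteeringLogic:
    """Routes any DQ/DQS nibble port to any other, providing bypass paths."""

    PORTS = ("primary0", "primary1", "secondary0", "secondary1")

    def __init__(self):
        # route[dst] = src, programmed like the multiplexer selects of FIG. 3
        self.route = {}

    def connect(self, dst, src):
        assert dst in self.PORTS and src in self.PORTS and dst != src
        self.route[dst] = src

    def transfer(self, inputs):
        # Present each routed destination with the data at its source port.
        return {dst: inputs[src] for dst, src in self.route.items()}

steer = SteeringLogic()
steer.connect("secondary0", "primary0")  # normal path into a DRAM nibble
steer.connect("primary1", "primary0")    # bypass: pass-through between links,
                                         # as in the load-balancing of FIG. 2H
print(steer.transfer({"primary0": 0xA5, "primary1": None,
                      "secondary0": None, "secondary1": None}))
# -> {'secondary0': 165, 'primary1': 165}
```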
FIGS. 4A-4F illustrate various cache operations associated with a hybrid DRAM module and an SCM module, where the tag comparison circuitry 402 is located on the memory controller 404. Generally, a portion of the total DRAM capacity defined by the DRAM components on each module is designated as a cache for the SCM memory. For a given memory operation, the address requested by the controller 404 is evaluated by the tag comparison circuitry 402. If the address is held in the DRAM cache, the data in the addressed cache space may be read and provided directly to the controller, as shown in FIG. 4A. As shown in FIG. 4B, in the case where the address is not in the cache (often referred to as a "miss"), the cache contents still need to be read for the tag comparison; this is performed as a first step at 406. At 408, an additional step is performed in which the data is read from the SCM memory. Referring now to FIG. 4C, if the miss involves "dirty" data, a write-back operation to the old location in the SCM memory is performed at 410, in addition to the read operations 406 and 408 from the DRAM module and SCM module described above with respect to FIG. 4B. The second step 408 and the third step 410 may be performed in either order. FIGS. 4D-4F illustrate corresponding cache operations for a write transaction.
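The read-side flow of FIGS. 4A-4C can be summarized in the hedged sketch below; the helper callables (dram_read, scm_read, scm_write) and the tag/index address split are assumed interfaces used only to make the steps concrete.

```python
from dataclasses import dataclass

@dataclass
class CacheLine:
    valid: bool
    dirty: bool
    tag: int
    data: bytes

def cached_read(addr, dram_read, scm_read, scm_write, index_of, tag_of):
    """Controller-side read flow with the tag compare on the controller."""
    line = dram_read(index_of(addr))          # step 1: always read DRAM line
    if line.valid and line.tag == tag_of(addr):
        return line.data                      # read hit (FIG. 4A): one step
    if line.valid and line.dirty:
        # Read miss dirty (FIG. 4C): write the old data back to its old SCM
        # location, reconstructed from the stored tag plus the cache index;
        # this step and the SCM read may be performed in either order.
        scm_write((line.tag, index_of(addr)), line.data)
    return scm_read(addr)                     # read miss (FIG. 4B): read SCM
```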
FIG. 5 illustrates one embodiment of a relationship, or mapping, between a given cache line in the DRAM space and a plurality of line addresses in the SCM memory space. This may be referred to as a DRAM-to-SCM mapping for a single-set (direct-mapped) cache organization. For the example shown, any of eight cache line locations in the SCM space, such as at 502, may be loaded into a single cache line in the DRAM space, such as at 504. One embodiment of a data structure 506 that may be stored in a DRAM cache line includes a TAG address field 508 holding a 3-bit tag address, a 72B DATA field 510, an error detection/correction code (EDC) field 512 protecting the entire data structure, and corresponding PARITY 514, VALID 516, and DIRTY 518 fields. For one embodiment, the EDC field 512 may protect the DATA field 510 and the TAG field 508 with an error correction/detection code. In some embodiments, the EDC field 512 may also protect the PARITY field 514, VALID field 516, and DIRTY field 518; in other embodiments these fields may not be protected. The PARITY field 514 may protect the TAG field 508 with an error detection code (this code would be redundant with the code used in the EDC field). The PARITY field 514 may also protect the VALID field 516 and the DIRTY field 518 in some embodiments.
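The following sketch makes the direct-mapped 8:1 mapping concrete; the DRAM cache size and the line-address arithmetic are assumptions chosen only to be consistent with the 3-bit TAG field and 72B DATA field of data structure 506.

```python
LINE_BYTES = 72          # DATA field 510
TAG_BITS = 3             # TAG field 508: 8 SCM lines alias to 1 DRAM line

def split_scm_address(scm_line_addr: int, num_dram_lines: int):
    """Split an SCM line address into (tag, dram_index)."""
    index = scm_line_addr % num_dram_lines
    tag = scm_line_addr // num_dram_lines
    assert tag < (1 << TAG_BITS)
    return tag, index

def scm_address_from(tag: int, index: int, num_dram_lines: int) -> int:
    """Reconstruct the SCM line address, e.g. for write-back of a dirty line."""
    return tag * num_dram_lines + index

# Example: with an assumed 1024 DRAM cache lines, SCM lines 100, 1124,
# 2148, ... all map to DRAM index 100, distinguished by the 3-bit tag.
tag, idx = split_scm_address(2148, 1024)
assert (tag, idx) == (2, 100)
assert scm_address_from(tag, idx, 1024) == 2148
```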
FIG. 6 illustrates one embodiment of a memory system 600 exhibiting a point-to-point architecture. The system 600 includes a first memory module 602 that uses DRAM memory components 604 buffered from a memory controller 606 by a plurality of data buffer components DB and a command/address (CA) buffer component RCD. The first memory module 602 is coupled to the memory controller 606 through a point-to-point data nibble or link DQv and a point-to-point CA link CS/CAx. The system 600 includes a second memory module 608 that uses SCM memory components 610, also buffered from the memory controller 606 by a plurality of data buffers DB and a CA buffer component RCD. The second module 608 includes a point-to-point connection to the memory controller via data nibble DQu. The CA link CS/CAy couples the CA signal lines of the second module 608 to the memory controller 606 in a point-to-point manner. The first module 602 and the second module 608 may communicate with each other via a backchannel signal path DQt. For the two-module configuration described, each module may be allocated half of the total data width of the controller. For one embodiment, at least a portion of the DRAM memory space of the first module 602 is allocated as a cache for operations involving the SCM memory of the second memory module 608. For one embodiment, the backchannel path DQt carries cache transfers between the two modules 602 and 608, such that no transfer bandwidth is needed on the primary links DQu and DQv for those cache transfers. The backchannel DQt may also support dynamic load balancing operations.
The memory system of FIG. 6 may be expanded as shown in FIG. 7 while still maintaining a point-to-point architecture. Four modules 702, 704, 706, and 708 are shown, with two of the modules, 704 and 708, using DRAM components to provide a cache for the two other modules, 702 and 706, which use SCM memory components. In such an embodiment, the total data width of the memory controller 710 may be apportioned in half, with a direct point-to-point data connection to the first DRAM module 708 made via nibble link DQu, and a second point-to-point connection to the second DRAM module 704 made via nibble link DQv. The point-to-point CA connections between the DRAM modules 708 and 704 and the memory controller 710 are made via links CS/CAx and CS/CAy. Backchannel links at 712 and 714 for CA signaling are provided between the DRAM modules 708, 704 and the SCM modules 706, 702 via links CS/CAx' and CS/CAy'. The SCM modules 706 and 702 interface with the DRAM modules 708 and 704 for data signal transfer via backchannel link connections along links DQu' and DQv'. As explained more fully below, the backchannel connections 714 and 712 allow cache operations between modules with little impact on the bandwidth of the primary point-to-point links DQu and DQv.
FIGS. 8A-8F illustrate cache operations between the DRAM memory module 602, the SCM memory module 608, and the memory controller 606 of FIG. 6. The operations assume that the cache tag compare circuit 802 resides on the DRAM module 602 and that, on a read miss, dirty data is not written back and no cache line in the DRAM is allocated. This means that the read miss dirty and write miss clean cases do not occur (they have been eliminated). This simplification gives the cached system the highest read and write bandwidth, at the expense of most reads accessing an SCM location.
FIG. 8A illustrates a read hit case, where the address at 800 in the DRAM cache matches the requested address (as determined by the tag compare circuit 802), resulting in the data being transferred from the DRAM module 602 directly to the memory controller 606 (in only one step, since the data has already been read from the DRAM space 800 and provided to the tag circuit 802 and the data buffer). For one embodiment, the read data is delayed by a time interval in an effort to match the latency associated with the read miss case. When read hit data is delayed to match the delay of read miss data, there will be no conflict on the data bus between a read hit to a different address and an earlier read miss.
For the read miss case shown in FIG. 8B, the DRAM 800 is read first so that the tag contents can be determined via the tag circuitry 802 and compared with the address of the incoming request. When the access is determined to be a "miss," the SCM memory module 608 is then accessed for a read operation, and the requested read data is transferred from the SCM module 608 at 804 along the backchannel link (between the DRAM module and the SCM module), and then from the DRAM module 602 via the bypass connection configured through the data buffer DB steering logic 300 described above with respect to FIG. 3. FIG. 8C merely confirms that the particular embodiment described herein does not perform any write-back for the dirty read miss case.
FIG. 8D illustrates a write hit case, where in a first step the tag contents are read from the DRAM 800 at 810, and in a second step the write data is written to the DRAM module 602 at 812. For one embodiment, write operations in the write hit case alternate between odd and even nibbles of the DRAM module 602. When alternating odd and even nibbles are used, a tag read and a data write can be performed in two different cycles in the odd nibble, while a second tag read and data write are performed in two different cycles in the even nibble, resulting in a throughput of one cached write operation per tCC interval (the same as an uncached DRAM system).
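A toy schedule of this odd/even interleave is sketched below; the two-cycle tag-read/data-write pattern per nibble follows the description above, and the cycle units stand in for tCC intervals as an assumption.

```python
def interleaved_schedule(n_writes: int):
    """Yield (cycle, nibble, op) triples; one new write starts every tCC."""
    for i in range(n_writes):
        nibble = "odd" if i % 2 else "even"
        yield (i,     nibble, f"RD tag  (write {i})")  # tag read, cycle i
        yield (i + 1, nibble, f"WR data (write {i})")  # data write, cycle i+1

# Each nibble is busy two cycles per write, but the two nibbles overlap, so
# the aggregate throughput is one cached write per tCC interval.
for entry in interleaved_schedule(4):
    print(entry)
```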
FIG. 8E shows that, for the described embodiment, no operation is needed for the write miss clean case (which does not occur under this policy). For a write miss dirty operation, after the tag contents are compared in a first step, the old data and tag contents are transferred to the SCM memory via the backchannel link 804 at 814, and the write data is written to the DRAM 800, as shown at 816 in FIG. 8F.
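The simplified policy of FIGS. 8A-8F can be summarized as the dispatch table below; the step strings paraphrase the figures, and the case names are illustrative.

```python
# Transaction policy sketch: tag compare on the DRAM module; no write-back
# and no allocation on a read miss (so two cases are eliminated entirely).
POLICY = {
    "read_hit":         ["read DRAM line; tag matches; return data"],
    "read_miss":        ["read DRAM line; tag mismatch",
                         "read SCM via backchannel; bypass through DB to controller",
                         "no write-back, no allocate (read miss dirty eliminated)"],
    "write_hit":        ["read tag from DRAM", "write data to DRAM line"],
    "write_miss_clean": [],   # cannot occur under this policy (FIG. 8E)
    "write_miss_dirty": ["read and compare tag",
                         "send old data and tag to SCM via backchannel",
                         "write new data to DRAM line"],
}

def steps_for(case: str):
    return POLICY[case]

assert steps_for("write_miss_clean") == []
```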
FIG. 9 illustrates further details of the memory systems of FIGS. 6, 7, and 8A-8F, with particular emphasis on the buffer circuits for the DRAM and SCM modules. To reiterate, this particular embodiment uses a point-to-point connection between the memory controller 902 and at least one DRAM module 904, and incorporates the tag compare circuit 906 on the DRAM module (for one embodiment, specifically in the CA buffer RCD_D). Data transfers between the memory controller 902 and the SCM module 908 pass through the DRAM module 904 via the backchannel connection DQu' between the DRAM module 904 and the SCM module 908. The memory controller 902 includes respective read and write data queues 910 and 912 coupled to the data nibble DQu, which serves as the primary data link for coupling to the DRAM module data buffer components DB_D. The memory controller 902 also includes respective read and write address queues 914 and 916 that are selectively coupled via a multiplexer 918 to the CA link CS/CAx, which serves as the primary CA link for coupling the memory controller 902 to the DRAM module CA buffer RCD_D. As described below, a STATUS circuit 920 interfaces with the DRAM CA buffer component RCD_D via a status link STx, which provides information relating to the tag compare operation.
With further reference to FIG. 9, each buffer component DB_D on the DRAM module 904 uses steering logic as described above with respect to FIG. 3, such that the primary data interface I/O circuits coupled to links DQu and DQu' (the backchannel link) may be selectively coupled to either of the secondary data I/O circuits (respective even and odd cache DRAM nibbles) via multiplexer circuits appropriately placed in the data transfer paths of the steering logic. For a read hit condition, a DATA DELAY MATCH circuit 922 in the data buffer DB_D may be used to match the delay of the read miss case, which may maximize the bandwidth of the primary data link DQu between the DRAM module 904 and the memory controller 902. Alternatively, the delay circuit 922 may be omitted and a delay circuit in the DRAM CA buffer RCD_D may be used to delay the address command, providing a similar delay. As described earlier, delaying the read hit data to match the read miss data allows maximum read data bandwidth at the expense of increased latency in the read hit case.
With continued reference to FIG. 9, the DRAM CA buffer RCD_D includes DRAM write address buffer circuitry 924 that provides buffered write addresses to the DRAM memory components 926. The tag compare circuit 906 also resides on the CA buffer component RCD_D, which receives new tag information for a requested address along with the old tag information provided by a DRAM data buffer via the tag communication path TAG_OLD. The result of the tag comparison is then fed back to the memory controller 902 via the status link STx so that the controller can dispatch any commands necessary for additional cache operations associated with a "miss" condition. CA information passed to the DRAM module 904 via the CA link CS/CAx is redriven by the DRAM module to the SCM module 908 via the CA backchannel link CS/CAx'.
The SCM memory module data buffer components DB_S and CA buffer component RCD_S are configured similarly to their DRAM module counterparts. However, since the connection between the memory controller 902 and the memory modules 904 and 908 is made through the DRAM module 904, the steering logic for the data buffers DB_D and DB_S generally provides steering capability between one primary data circuit (the backchannel link) and any of the secondary data I/O circuits. As for the SCM CA buffer component RCD_S, no tag comparison circuit is provided (or, if provided, it is not used). However, to control the backchannel link CS/CAx', bypass comparison logic 928 is used.
FIGS. 10-13 illustrate timing diagrams showing the relative timing of the various cache operations discussed above with respect to FIGS. 6, 7, 8A-8F, and 9. Referring now to FIG. 10, the relative timing for a series of operations relating to the cache "read hit" and cache "read miss clean" cases is shown. The upper group of labeled links in the diagram (CAx through STx) corresponds to signals associated with the DRAM module, such as 904 (FIG. 9), while the lower signals (CAx' through DQu') relate to signals associated with the SCM module, such as 908 (FIG. 9). To evaluate the contents of the tag memory, a read transaction corresponding to commands "ACT" and "RD" to read the tag address from DRAM is dispatched along the CA link CAx and redriven by the CA buffer along the "even" secondary CA link CAxs. The tag address data Q is accessed and, optionally, delayed a number of cycles. Delaying the tag data (addresses) to match the latency of a read miss data operation may help to maximize the bandwidth of the primary data link DQu. The tag compare circuitry 906 evaluates the requested address and the tag address and indicates a hit or miss "H/M" on the status link STx. In parallel with the DRAM module transaction to read the contents of the tag memory, a speculative SCM memory access operation is performed by dispatching a read command along the backchannel CA link CAx', which is redriven by the CA buffer RCD along the secondary CA interface path CAx's. Alternatively, the SCM read access may wait to begin until the tag is read in DRAM and a read miss is confirmed; however, this increases read miss latency. The resulting read data Q from the SCM module 908 may then be transferred along the primary link DQu to the memory controller 902 in the event of a "read miss clean" condition. For a read "hit" case, the data accessed from the DRAM is transferred to the memory controller 902.
FIG. 11 illustrates a timing diagram similar to that of FIG. 10 for the "read hit" and "read miss clean" cases, but instead of delaying the read data accessed from the DRAM module 904, the address information is delayed at the DRAM CA buffer RCD_D along the secondary CA link CAxs. As with the data delay option described in FIG. 10, the address delay may increase the bandwidth of the primary data link DQu.
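The delay-matching idea behind FIGS. 10 and 11 reduces to a simple calculation, sketched below with made-up cycle counts; whether the delay is applied to the read data (FIG. 10) or to the address (FIG. 11), the inserted amount is the hit/miss latency difference.

```python
T_HIT = 12    # cycles: DRAM tag + data read, hit path (assumed example)
T_MISS = 34   # cycles: DRAM tag read plus SCM read via backchannel (assumed)

def hit_delay(t_hit: int = T_HIT, t_miss: int = T_MISS) -> int:
    """Cycles of delay inserted on the read-hit path (data or address)."""
    return max(0, t_miss - t_hit)

# With matched latency, responses return strictly in issue order, so a later
# read hit can never collide with an earlier read miss on the shared DQu link.
print(hit_delay())   # -> 22
```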
FIG. 12 illustrates the timing for the "write hit" and "write miss dirty" cases. Write operations (commands "ACT" and "WR") are dispatched by the memory controller 902 along the primary CA link CAx. The CA buffer component RCD_D receives the write command, address information, and associated data, and retransmits the command as a read operation "RD" (implemented by the CA buffer RCD_D) for reading the contents of the tag memory. The new write data and new address tag information are then stored in the DRAM write buffer. For a write miss dirty case (determined after evaluating the tag information), a further write operation is performed with write command "WR" along link CAx' to place the old data and tag information in the SCM write buffer. Storing the old and new tag information in the buffers in this way helps to maximize the bandwidth of the primary data link DQu.
FIG. 13 illustrates the relative timing of various operations taken to implement odd/even nibble sequencing for the cache write hit and miss cases. This sequencing takes advantage of the temporary storage of tags and data in buffers during the write operation of FIG. 12. Turn-around latency is minimized, and channel bandwidth is maximized, by alternating read and write operations between the odd and even nibbles attached to the data buffer.
FIG. 14 shows a portion of a DRAM module, such as the one shown in FIG. 9, in which the tag matching circuitry is distributed across the DRAM module among a plurality of data buffer components DB rather than residing in a single CA buffer RCD_D, as on DRAM module 904 in FIG. 9. Also shown is a portion of one data buffer 1400 that includes a tag compare circuit 1404. The status link STx interfaces the memory controller 902 to the CA buffer component RCD_D. Routing the match link 1402 along the module between the various data buffers DB allows each data buffer to communicate its match result along the match link to the CA buffer RCD_D, so that the result can be sent to the memory controller 902 to control cache operations. The tag comparison circuit 1404 is duplicated in each data buffer DB and includes an XOR circuit 1406 that receives the old tag information TAG_OLD[i] and the new tag information TAG_NEW[i] (from the tag field) as its inputs. An OR gate 1410 then masks (ORs) the XOR output at 1408 with the output from a control register 1412 (set at initialization). The output from the OR gate 1410 may then undergo a timing adjustment at 1414 and be transmitted out along the match link 1402 to the CA buffer RCD_D. Distributing the tag status information in this way may reduce pin count. Alternatively, the TAG_OLD information read from DRAM may be passed from the DB components to the RCD component, where the tag comparison is completed; this may require more pins on the buffer components.
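A behavioral sketch of the distributed match is given below; the per-buffer tag slicing, the mask polarity, and the combining along the match link are assumptions that model the functions of XOR 1406, mask register 1412, and OR gate 1410 rather than their exact gate-level form.

```python
def db_match_slice(tag_old: int, tag_new: int, mask: int) -> bool:
    """One DB's contribution: True means no mismatch in its unmasked bits."""
    diff = tag_old ^ tag_new    # XOR circuit 1406: 1s where tag bits differ
    # Register 1412 masks don't-care bit positions; the figure's OR-based
    # masking is modeled here with inverted polarity for readability.
    return (diff & ~mask) == 0

def tag_hit(slices) -> bool:
    """Combine the per-buffer results along match link 1402."""
    return all(db_match_slice(old, new, m) for old, new, m in slices)

# Example: two DBs each carrying part of a tag; the second DB masks all of
# its bits (e.g., it holds no tag bits in this configuration).
assert tag_hit([(0b101, 0b101, 0b000), (0b111, 0b000, 0b111)])
assert not tag_hit([(0b101, 0b100, 0b000), (0b111, 0b000, 0b111)])
```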
FIGS. 15A-15I illustrate high-level system configuration diagrams for various embodiments that may be used as alternatives to the systems, modules, and memory devices described above. The embodiment of FIGS. 15E and 20 is similar to the DV4 system of FIGS. 8A-8F, except that the embodiment of FIGS. 8A-8F uses a cache policy with no allocation on a read, whereas the embodiment of FIGS. 15E and 20 allows allocation on a read operation (as do the other configurations). For example, FIG. 15A shows a high-level diagram of a first pair of DRAM and SCM modules 1502 and 1504 coupled together via a backchannel connection 1506, where the DRAM module 1502 is connected to a memory controller 1508. A second pair, DRAM module 1510 and SCM module 1512, is similarly configured. For this embodiment, the tag compare circuitry 1514 resides on the memory controller 1508. The associated cache operations for the embodiment of FIG. 15A are illustrated in FIGS. 16A-16F. A read hit operation, as shown in FIG. 16A, involves directly reading the DRAM module 1502 and providing the data directly to the memory controller 1508. For the read miss clean case shown in FIG. 16B, additional steps are performed that involve reading tag information and data from the SCM module 1504, writing them to the DRAM module 1502, and sending the read data to the memory controller 1508. For the read miss dirty case shown in FIG. 16C, the first cache operation is augmented by reading the tag and address information from the DRAM module 1502 at 1516 and writing it to the SCM module 1504 at 1518. FIGS. 16D-16F illustrate corresponding write cache operations.
FIG. 15B shows an alternative system configuration similar to that of FIG. 15A, but instead of interfacing the DRAM modules 1502 and 1510 to the memory controller 1508, the SCM modules 1504 and 1512 communicate directly with the memory controller via point-to-point links 1520 and 1522. The associated cache operations for reads and writes are illustrated in FIGS. 17A-17F. A read hit operation, as shown in FIG. 17A, involves reading the DRAM module 1502 via the backchannel link (with a bypass formed by the steering logic in the buffer circuitry of the SCM module 1504) and providing the data to the memory controller 1508. For the read miss clean case shown in FIG. 17B, additional steps are performed that involve reading tag information and data from the SCM module 1504, writing them to the DRAM module 1502, and sending the read data to the memory controller 1508. For the read miss dirty case shown in FIG. 17C, the first cache operation is augmented by reading the tag and address information from the DRAM module 1502 at 1702 and writing it to the SCM module 1504 at 1704. FIGS. 17D-17F illustrate corresponding write cache operations.
FIG. 15C illustrates yet another system configuration similar to FIG. 15A, in which the DRAM modules are directly coupled to the memory controller 1508. However, a tag compare circuit 1514 is provided on each SCM module 1504 and 1512 instead of on the memory controller 1508. The associated cache operations are shown in FIGS. 18A-18F. A read hit operation reads the DRAM module 1502 directly, provides the tag information to the tag compare circuit 1514 via the backchannel 1520, and provides the data directly to the memory controller 1508, as shown in FIG. 18A. For the read miss clean case shown in FIG. 18B, additional steps are performed that involve reading tag information and data from the SCM module 1504, writing them to the DRAM module 1502, and sending the read data to the memory controller 1508. For the read miss dirty case shown in FIG. 18C, the first cache operation is augmented by writing the tag and address information to the SCM module 1504 at 1802. FIGS. 18D-18F illustrate corresponding write cache operations.
FIG. 15D illustrates yet another system configuration similar to that of FIG. 15A, but incorporating a tag comparison circuit 1514 on each SCM module 1504 and 1512 and connecting the SCM modules (rather than the DRAM modules) directly to the memory controller 1508. The associated cache operations are shown in FIGS. 19A-19F. A read hit operation, as shown in FIG. 19A, involves reading the DRAM module 1502 via the backchannel link 1520, providing the tag information to the tag compare circuit 1514, and providing the data to the memory controller 1508. For the read miss clean case shown in FIG. 19B, additional steps are performed that involve reading the tag information and data from the SCM module 1504, writing them to the DRAM module 1502, and sending the read data directly to the memory controller 1508. For the read miss dirty case shown in FIG. 19C, the first cache operation is augmented by writing the tag and address information to the SCM module 1504 at 1902. FIGS. 19D-19F illustrate corresponding write cache operations.
FIG. 15E illustrates a four-module embodiment similar to the embodiment described above with respect to FIGS. 7-14, which utilizes tag compare circuitry 1514 on each DRAM module 1502 and 1510 and couples the DRAM modules directly to the memory controller 1508. The associated cache operations are shown in FIGS. 20A-20F. A read hit operation, as shown in FIG. 20A, involves directly reading the DRAM module 1502 and providing the data directly to the memory controller 1508. For the read miss clean case shown in FIG. 20B, additional steps are performed that involve reading tag information and data from the SCM module 1504 via the backchannel link 1520 and sending the read data directly to the memory controller 1508. For the read miss dirty case shown in FIG. 20C, the first cache operation is augmented by reading the tag and address information from the DRAM module 1502 at 2002 and writing it to the SCM module 1504 at 2004. FIGS. 20D-20F illustrate corresponding write cache operations.
FIG. 15F illustrates yet another system embodiment similar to FIG. 15A, but incorporating a tag compare circuit 1514 on each DRAM module 1502 and 1510 and connecting the SCM modules 1504 and 1512 (rather than the DRAM modules) directly to the memory controller 1508. The associated cache operations are shown in FIGS. 21A-21F. A read hit operation, as shown in FIG. 21A, involves reading the DRAM module 1502 via the backchannel link 1520, comparing the tag information with the tag compare circuit 1514, and providing the data to the memory controller 1508. For the read miss clean case shown in FIG. 21B, additional steps are performed that involve reading the tag information and data from the SCM module 1504, writing them to the DRAM module 1502 (via the backchannel link 1520), and sending the read data directly from the SCM module 1504 to the memory controller 1508. For the read miss dirty case shown in FIG. 21C, the first cache operation is augmented by reading the tag and address information from the DRAM module 1502 at 2102 and writing it to the SCM module 1504 at 2104. FIGS. 21D-21F illustrate corresponding write cache operations.
FIG. 15G illustrates a high-level diagram of a three-module system configuration using a first DRAM module 1524 interconnected to an SCM module 1526 via a first backchannel connection 1528. The SCM module interfaces with a second DRAM module 1530 via a second backchannel connection 1532, such that both DRAM modules 1524 and 1530 interface directly with the memory controller 1508.
FIG. 15H illustrates a high-level three-module configuration similar to FIG. 15G, but with a first SCM module 1532 interconnected to a DRAM module 1534 via a first backchannel connection 1536. The DRAM module interfaces with a second SCM module 1538 via a second backchannel connection 1540, such that the SCM modules 1532 and 1538 interface directly with the memory controller 1508.
FIG. 15I illustrates a high-level dual-module system configuration that uses an SCM module 1542 and a DRAM module 1544 interconnected by a backchannel connection 1546, with both modules coupled to the memory controller 1508 via point-to-point links. FIGS. 22A-22F illustrate cache operations associated with the system configuration of FIG. 15I, where the tag compare circuitry is provided on the memory controller. A read hit operation, as shown in FIG. 22A, involves directly reading the DRAM module 1502 and providing the data directly to the memory controller 1508. For the read miss clean case shown in FIG. 22B, additional steps are performed that involve reading tag information and data from the SCM module 1504, writing them to the DRAM module 1502 via the backchannel link 1514, and sending the read data directly to the memory controller 1508. For the read miss dirty case shown in FIG. 22C, the first cache operation is extended by reading the tag and address information from the DRAM module 1502 at 2202 and writing it to the SCM module 1504 at 2204. FIGS. 22D-22F illustrate corresponding write cache operations.
FIGS. 23A-23F illustrate cache operations associated with the system configuration of FIG. 15I, where both memory modules 1502 and 1504 are coupled directly to the memory controller 1508 and the tag compare circuitry 1514 is disposed on the SCM memory module 1504. A read hit operation, as shown in FIG. 23A, involves reading the DRAM module 1502 directly, providing the tag information to the tag compare circuit 1514 via the backchannel link 1520, and providing the data directly to the memory controller 1508. For the read miss clean case shown in FIG. 23B, additional steps are performed that involve reading tag information and data from the SCM module 1504, writing them to the DRAM module 1502 (via the backchannel link 1520), and sending the read data directly from the SCM module 1504 to the memory controller 1508. For the read miss dirty case shown in FIG. 23C, the first cache operation is extended by reading the tag and address information from the DRAM module 1502 at 2302 and writing it to the SCM module 1504 at 2304. FIGS. 23D-23F illustrate corresponding write cache operations.
FIGS. 24A-24F illustrate cache operations associated with the system configuration of FIG. 15I in which the tag comparison circuitry is provided on the DRAM memory module 1502. A read hit operation, as shown in FIG. 24A, involves directly reading the DRAM module 1502, comparing the tag information with the tag compare circuit 1514, and providing the data directly to the memory controller 1508. For the read miss clean case shown in FIG. 24B, additional steps are performed that involve reading tag information and data from the SCM module 1504, writing them to the DRAM module 1502 via the backchannel link 1520, and sending the read data directly from the SCM module 1504 to the memory controller 1508. For the read miss dirty case shown in FIG. 24C, the first cache operation is augmented by reading the tag and address information from the DRAM module 1502 at 2402 and writing it to the SCM module 1504 at 2404. FIGS. 24D-24F illustrate corresponding write cache operations.
FIG. 25 illustrates a graph comparing various performance characteristics among the various system configurations described above with respect to FIGS. 15A-15I. The comparison provides a sustained bandwidth summary (such as, for example, a fraction of 25.4GB/s, or 36b at 6.4GB/s with alternating accesses to odd and even nibbles) measured against a baseline. The first column indicates the particular module configuration nomenclature (corresponding to the symbolic representations in the bottom portion of the graph), while the top row lists the cache operations involved: read hit "RH", read miss clean "RMC", read miss dirty "RMD", write hit "WH", write miss clean "WMC", and write miss dirty "WMD".
FIG. 26 illustrates a graph comparing various power characteristics among the various system configurations described above with respect to FIGS. 15A-15I. The configuration and cache operation labels in the first column and first row correspond to those in the first column and first row of FIG. 25.
FIG. 27 illustrates further details of the memory system of FIG. 15A, similar to FIG. 19, with particular emphasis on the buffer circuits for the DRAM module and the SCM module. To reiterate, this particular embodiment uses point-to-point connections between the memory controller 2702, the at least one DRAM module 2704, and the SCM memory module 2706, with the tag comparison circuitry 2708 disposed on the memory controller 2702. Data transfers between the memory controller 2702 and the SCM module 2706 pass through the DRAM module 2704 via the backchannel connections DQu' and CS/CAx' between the DRAM module 2704 and the SCM module 2706. The memory controller 2702 includes respective read and write data queues 2710 and 2712 coupled to the data nibble DQu, which serves as the primary data link for coupling to the DRAM module data buffer components DB_D. The controller 2702 also includes respective read and write address queues 2714 and 2716 that are selectively coupled via a multiplexer 2718 to the CA link CS/CAx, which serves as the primary CA link for coupling the controller 2702 to the DRAM module CA buffer RCD_D. Since the tag compare circuitry 2708 resides on the memory controller 2702 for this embodiment, the memory modules 2704 and 2706 do not use status circuitry. The tag comparison circuit 2708 receives the old tag information read from the DRAM module 2704 and the new tag information provided as part of a new memory operation request.
With further reference to FIG. 27, each buffer component DBD on the DRAM module 2704 uses steering logic as described above with respect to FIG. 3, such that the primary data interface I/O circuits coupled to links DQu and DQu' (the backchannel link) may be selectively coupled to any of the secondary data I/O circuits (the respective even and odd cache DRAM nibbles) via multiplexers 2720, 2722, and 2724 placed appropriately in the data transfer path of the steering logic.
With continued reference to FIG. 27, the SCM data buffer DBS is configured similarly to its DRAM module counterpart, but includes SCM write data buffer circuitry 2726 that buffers write data for the SCM memory components 2728 and 2730. In a similar manner, the SCM module CA buffer RCDS includes SCM write address buffer circuitry 2732 that provides buffered write addresses to the SCM memory components 2728 and 2730. Since the connection between the memory controller 2702 and the memory modules 2704 and 2706 is made through the DRAM module 2704, the steering logic for the data buffers DBD and DBS generally provides steering capability between one primary data circuit (the backchannel link DQu') and any of the secondary data I/O circuits. However, to control the backchannel link DQu', bypass circuitry may be used.
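Abstractly, the steering logic in the DBD and DBS buffers behaves like a small crossbar between one primary port and several secondary ports, with the bypass path adding a primary-to-primary route. A minimal sketch, with port names chosen here purely for illustration:

    def steer(ports, src, dst):
        # Move one beat of data from port src to port dst; 'ports' maps a
        # port name to its current value. Names below are illustrative only.
        ports[dst] = ports[src]

    ports = {"DQu": 0x5A, "DQu'": None, "sec_even": None, "sec_odd": None}

    # Primary-to-secondary steering: controller data lands in a cache nibble.
    steer(ports, "DQu", "sec_odd")

    # Primary-to-primary bypass: data entering on DQu leaves on the
    # backchannel link DQu' without touching the secondary interface,
    # passing an SCM transaction through the DRAM module.
    steer(ports, "DQu", "DQu'")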
FIG. 28 illustrates a memory system configuration similar to that of FIG. 27, again with particular emphasis on the buffer component circuitry, but with the tag compare circuitry 2802 residing in the SCM module CA buffer component RCDS rather than in the memory controller 2804. With the tag compare circuit 2802 residing on the SCM CA buffer RCDS, a status link STx couples the SCM CA buffer RCDS to the memory controller 2804 to provide cache status information. Most of the remainder of the system construction is similar to the system of FIG. 27.
FIG. 29 illustrates a memory system configuration similar to that shown in FIG. 9, in which the DRAM data buffer DBD is provided without the optional data delay circuit, and the DRAM CA buffer component RCDD is provided without the optional address delay circuit. This embodiment uses point-to-point connections between the memory controller 2902 and at least one DRAM module 2904, with a tag compare circuit 2906 incorporated on the DRAM module. Data transfers between the memory controller 2902 and the SCM module 2908 pass through the DRAM module 2904 via the backchannel connection DQu' between the DRAM module 2904 and the SCM module 2908. The memory controller 2902 includes respective read and write data queues 2910 and 2912 coupled to data nibble DQu, which serves as the primary data link for coupling to the data buffer components DBD of the DRAM module. The memory controller 2902 also includes respective read and write address queues 2914 and 2916 that are selectively coupled, via a multiplexer 2918, to CA link CS/CAx, which serves as the primary CA link for coupling the memory controller 2902 to the DRAM module CA buffer RCDD. As described below, a STATUS circuit 2920 interfaces with the DRAM CA buffer component RCDD via a status link STx, which conveys information relating to the tag compare operation.
With further reference to FIG. 29, each buffer component DBD on the DRAM module 2904 uses steering logic as described above with respect to FIG. 3, such that the primary data interface I/O circuits coupled to links DQu and DQu' (the backchannel link) can be selectively coupled to any of the secondary data I/O circuits (the respective even and odd cache DRAM nibbles) via multiplexer circuitry placed appropriately in the data transfer path of the steering logic.
With continued reference to FIG. 29, the DRAM CA buffer RCDD includes a DRAM write address buffer circuit 2924 that provides buffered write addresses to a DRAM memory component 2926. The tag compare circuit 2906 resides on the CA buffer component RCDD, which receives the new tag information for a requested address together with the old tag information TAGOLD provided by the DRAM data buffer via a tag communication path. The result of the tag comparison is then fed back to the memory controller 2902 via the status link STx, so that the controller can dispatch any commands necessary for the additional cache operations associated with a "not hit" case. CA information passed to the DRAM module 2904 via the CA link CS/CAx is redriven by the DRAM module to the SCM module 2908 via the CA backchannel link CS/CAx'.
The SCM memory module data buffer component DBS and CA buffer component RCDS are configured similarly to their DRAM module counterparts. However, since the connection between the memory controller 2902 and the memory modules 2904 and 2908 is made through the DRAM module 2904, the steering logic for the data buffers DBD and DBS generally provides steering capability between one primary data circuit (the backchannel link) and any of the secondary data I/O circuits.
FIG. 30 illustrates the memory system configuration of FIG. 29, highlighting the various signal paths that operate during a cache operation involving a read hit. As shown, a first read operation occurs, where at 3002 a read command is dispatched along the primary DRAM CA path CS/CAx to the DRAM CA buffer component RCDD. The command propagates to the secondary CA path CS/CAxs and is fed to the first chunk of the DRAM cache memory (chunk 0) at 3004. The old cache line data and tag information are accessed, read out, and passed along the secondary data path DQus to the data buffer component DBD, then directed by the buffer steering logic to the primary data interface path DQu. In parallel, for one embodiment, the old cache line tag information TAGOLD is routed from the data buffer DBD, e.g., at 3006, along an extension of the BCOM bus that interconnects all of the data buffers DBD on the DRAM module and the CA buffer component RCDD. The tag compare logic at 3008 on the CA buffer receives the old tag information via the BCOM bus and the new tag information received from the primary CA path CS/CAx with the original DRAM read command. The output of the tag comparison is then driven at 3010 along the status line STx to the memory controller. For some embodiments, the BCOM bus may be further extended to allow propagation of tag comparison information or partial tag status information. The memory controller then uses the received tag status to determine whether the data read from the DRAM cache is the current data for the requested address (the read hit case).
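The read hit sequence can be restated as a short function. The names below are invented for illustration, with comments keyed to the reference numerals of FIG. 30; this is a sketch of the ordering, not an implementation from the specification:

    def rcd_tag_compare(tag_old, tag_new):
        # Compare performed on the DRAM CA buffer RCD_D; the result is what
        # would be driven to the controller on the status link STx (3010).
        return "hit" if tag_old == tag_new else "miss"

    def read_flow(dram_chunk0, new_tag):
        # 3002/3004: the read command reaches DRAM chunk 0 and the old
        # cache line (tag + data) is read out on the secondary data path.
        tag_old, data = dram_chunk0
        # 3006/3008: TAG_OLD travels over the BCOM bus extension to RCD_D,
        # where it is compared against the new tag from CS/CAx.
        status = rcd_tag_compare(tag_old, new_tag)
        # The controller keeps the returned data only on a hit.
        return status, (data if status == "hit" else None)

    read_flow((0x40, b"line"), 0x40)   # -> ("hit", b"line")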
FIG. 31 illustrates the memory system of FIG. 30 with the highlighting related to the read hit case, but also highlighting the signaling paths involved in the cache operation where the tag comparison results in a read miss clean condition. The operations involved in reading the DRAM cache and comparing the tag information remain the same as described above with respect to FIG. 30, but additional cache operations involving the SCM module are performed. To provide the correct data to the controller, a read command directed to the SCM memory module is dispatched along the primary CA link CS/CAx and forwarded through the DRAM CA buffer RCDD along the backchannel connection CS/CAx' at 3102 to the SCM CA buffer RCDS. The command is then fed at 3104 along the secondary CA link CS/CAx's to the first chunk of SCM memory (chunk 0) at 3106. The new cache line data and tag information are then accessed from the SCM memory, passed along the secondary data links DQu's at 3108, and through the SCM data buffer components DBS. The new data and tag information are then transmitted along the backchannel link DQu' at 3110 and received at the DRAM data buffer component DBD. The data are then directed by the buffer steering logic to the memory controller via the primary data path DQu at 3112 and to the DRAM write buffers at 3114. The buffer contents may then be written to the DRAM cache at an appropriate timing interval.
Referring now to FIG. 32, for the read miss dirty case, the cache operation is similar to that described above, with the addition of loading the old data and old tag information into the SCM write buffer at 3202 and 3204. That information may then be written to the SCM memory at appropriate intervals.
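Combining FIGS. 31 and 32, the dirty miss path differs from the clean miss path only in whether the evicted line is staged into the SCM write buffer. A compact sketch of both; the function signature and buffer objects are illustrative assumptions:

    def handle_read_miss(old_line, new_tag, scm_read, dram_write_buf, scm_write_buf):
        # old_line is (tag, data, dirty); all of these names are illustrative.
        tag_old, data_old, dirty = old_line
        if dirty:
            # FIG. 32: stage the evicted data and tag into the SCM write
            # buffers (3202/3204) for later retirement.
            scm_write_buf.append((tag_old, data_old))
        # FIG. 31: fetch the new cache line and tag from the SCM module
        # (3104-3110, via the backchannel).
        data_new = scm_read(new_tag)
        # 3112/3114: return data to the controller and stage the DRAM fill.
        dram_write_buf.append((new_tag, data_new))
        return data_new

    # Example: a dirty miss stages one SCM write-back and one DRAM fill.
    scm_wb, dram_wb = [], []
    handle_read_miss((7, b"old", True), 9, lambda tag: b"new", dram_wb, scm_wb)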
FIGS. 33-35 illustrate cache line operations similar to those described above in FIGS. 30-32, but for write operations rather than read operations.
FIG. 36 illustrates a cache line operation involving retiring a cache line held in a write buffer. The retirement operation will typically be initiated by a bit field in either a read column command or a write column command, possibly with a static delay. Generally, the operation involves transferring the old data and tag information from the SCM write buffers (data and address) at 3602 and 3604 to the SCM memory at 3606. The new data and tag information are transferred from the DRAM write buffers (data and address) at 3608 and 3610 to the DRAM memory at 3612.
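Continuing the staging sketch above, retirement simply drains the staged entries into the two memory spaces. Again, the container and function names are illustrative only:

    from collections import deque

    # Illustrative stand-ins for the write buffers and memories of FIG. 36.
    scm_write_buf, dram_write_buf = deque(), deque()
    scm_mem, dram_mem = {}, {}

    def retire_one():
        # FIG. 36: commit one staged entry from each write buffer.
        if scm_write_buf:
            tag, data = scm_write_buf.popleft()   # 3602/3604: old data and tag
            scm_mem[tag] = data                   # 3606: written to SCM memory
        if dram_write_buf:
            tag, data = dram_write_buf.popleft()  # 3608/3610: new data and tag
            dram_mem[tag] = data                  # 3612: written to DRAM memory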
FIGS. 37-42 illustrate various timing diagrams related to the cache line operations described above. FIG. 37 illustrates the operations and relative timing parameters associated with one embodiment of a cache line read miss dirty sequence. FIG. 38 illustrates the operations and relative timing parameters associated with one embodiment of a cache line write miss dirty sequence. FIG. 39 illustrates the timing parameters associated with a retirement operation involving retiring one cache entry from each of three write buffers. FIG. 40 illustrates the minimum latency associated with a read miss dirty cache operation. FIG. 41 illustrates a constant latency across read hit, read miss clean, and read miss dirty operations. FIG. 42 illustrates the timing associated with back-to-back read miss dirty, write miss dirty, and read miss dirty sequences.
FIG. 43 illustrates a high-level embodiment of a dual-module memory system similar to the embodiments described above, where a first portion 4304 of the DRAM memory module 4302 is utilized as cache memory, while a second portion 4306 of the DRAM memory is uncached. This system is similar to the SU4 system shown in FIG. 15D and FIG. 19.
FIG. 44 illustrates further details of circuitry used by respective data buffers and CA buffers on the DRAM and SCM modules of the memory system shown in FIG. 43 and associated cache operations for read operations.
FIG. 45 illustrates further details of the circuitry used by the respective data buffers and CA buffers on the DRAM module and SCM module shown in FIG. 43, similar to FIG. 44, and the associated cache operations for write operations.
FIGS. 46 and 47 illustrate timing diagrams with operations associated with cache read hit and miss conditions for the system shown in FIG. 43.
FIG. 48 illustrates a timing diagram with operations associated with uncached DRAM memory read operations and SCM memory read operations for the system of FIG. 43.
FIG. 49 illustrates a timing diagram with operations associated with both cached and uncached DRAM memory write operations and SCM memory write operations for the system of FIG. 43.
Fig. 50 illustrates further details regarding the DRAM package structure and various interconnections between the DRAM package and the data and CA buffer circuits.
FIG. 51 illustrates yet another system embodiment that uses a first DRAM memory module 5102 and a second memory module 5104 that includes both a DRAM memory component 5106 and an SCM memory component 5108. Similar to the various other memory system configurations described herein, the memory modules 5102 and 5104 are connected to the memory controller 5110 via point-to-point links, and to each other via backchannel link DQt, to provide cache transfers between the modules. This illustrates the FC2 system first described with respect to FIG. 2E.
FIG. 52 illustrates yet another high-level system embodiment similar to that shown in FIG. 43. This figure applies to all of the previously discussed systems that use a DRAM memory region and an SCM memory region. In this example, there is one DRAM module 5202 and one SCM module 5204. Each module may be used by the memory controller 5206 with both cached regions, such as at 5208 and 5210, and uncached regions, such as at 5212 and 5214. The region sizes may be set at initialization by control registers (not shown). Accesses to the uncached regions may primarily involve accesses to a single memory type, while accesses to the cached region of the SCM memory may primarily involve accesses to the cached DRAM region, as described in the previous examples.
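One way to picture this arrangement is as an address decode driven by the initialization-time control registers. The sketch below is a rough illustration only; the register fields and the decode policy are assumptions made here and do not come from the specification:

    def classify_access(addr, cfg):
        # Map a physical address to (module, region) for the FIG. 52 layout.
        # 'cfg' stands in for the control registers written at initialization;
        # the field names and boundaries are illustrative assumptions.
        if addr < cfg["dram_size"]:
            region = "cached" if addr < cfg["dram_cached_limit"] else "uncached"
            return ("DRAM", region)
        scm_offset = addr - cfg["dram_size"]
        region = "cached" if scm_offset < cfg["scm_cached_limit"] else "uncached"
        return ("SCM", region)

    cfg = {"dram_size": 1 << 34, "dram_cached_limit": 1 << 33,
           "scm_cached_limit": 1 << 36}
    print(classify_access(5, cfg))   # -> ('DRAM', 'cached')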
When received within a computer system via one or more computer-readable media, such data and/or instruction-based expressions of the circuitry described above may be processed by a processing entity (e.g., one or more processors) within the computer system, in conjunction with execution of one or more other computer programs including, without limitation, netlist generation programs, place and route programs, and the like, to generate a representation or image of a physical manifestation of such circuitry. Such a representation or image may thereafter be used in device fabrication, for example, by enabling generation of one or more masks that are used to form the various components of the circuits during device fabrication.
In the foregoing description and in the drawings, specific terminology and reference signs have been set forth to provide a thorough understanding of the invention. In some instances, the terminology and symbols may imply specific details that are not required to practice the invention. For example, any of the specific numbers of bits, signal path widths, signaling or operating frequencies, component circuits or devices, and the like may differ from those described above in alternative embodiments. Interconnections between circuit elements or circuit blocks shown or described as multi-conductor signal links may alternatively be single-conductor signal links, and single-conductor signal links may alternatively be multi-conductor signal links. Signals and signaling paths shown or described as being single-ended may also be differential, and vice versa. Similarly, signals described or depicted as having active-high or active-low logic levels may have opposite logic levels in alternative embodiments. Component circuitry within an integrated circuit device may be implemented using metal oxide semiconductor (MOS) technology, bipolar technology, or any other technology in which logic and analog circuits may be implemented. With respect to terminology, a signal is said to be "asserted" when the signal is driven to a low or high logic state (or charged to a high logic state or discharged to a low logic state) to indicate a particular condition. Conversely, a signal is said to be "deasserted" to indicate that the signal is driven (or charged or discharged) to a state other than the asserted state, including a high or low logic state, or the floating state that may occur when the signal driving circuit is transitioned to a high impedance condition, such as an open-drain or open-collector condition. A signal driving circuit is said to "output" a signal to a signal receiving circuit when the signal driving circuit asserts (or deasserts, if explicitly stated or indicated by context) the signal on a signal line coupled between the signal driving and signal receiving circuits. A signal line is said to be "activated" when a signal is asserted on the signal line, and "deactivated" when the signal is deasserted. Additionally, the prefix symbol "/" attached to a signal name indicates that the signal is an active-low signal (i.e., the asserted state is a logic low state); a line over a signal name is also used to indicate an active-low signal. The term "coupled" is used herein to express a direct connection as well as a connection through one or more intervening circuits or structures. Integrated circuit device "programming" may include, for example and without limitation, loading a control value into a register or other storage circuit within the device in response to a host instruction (and thus controlling an operational aspect of the device), establishing a device configuration or controlling an operational aspect of the device through a one-time programming operation (e.g., blowing fuses within a configuration circuit during device production), and/or connecting one or more selected pins or other contact structures of the device to reference voltage lines (also referred to as strapping) to establish a particular device configuration or operational aspect of the device. The term "exemplary" is used to express an example, not a preference or requirement.
Although the present invention has been described with reference to specific embodiments thereof, it will be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention. For example, features or aspects of any of the embodiments may be applied, at least where feasible, in combination with any other of the embodiments or in place of their corresponding features or aspects. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

Claims (3)

1. A memory module, comprising:
a pin interface for coupling to a bus, the bus having a first width;
an SCM memory space including at least one storage class memory SCM component;
a DRAM memory space comprising at least one DRAM component, wherein a first region of the DRAM memory space is operated as a cached region of the SCM memory space to provide caching functionality for transactions involving the SCM component; and
Wherein the memory module operates in a first mode utilizing all of the first width and in a second mode utilizing a portion of the first width;
further comprising:
a buffer circuit to buffer operations between the at least one SCM component and a memory controller, the buffer circuit including a primary interface and a secondary interface coupled to the at least one SCM component;
wherein the buffer circuit comprises steering logic to route data between any of a plurality of paths coupled to the primary interface and any of a plurality of paths coupled to the secondary interface,
wherein the steering logic further comprises bypass circuitry to route data from any of the plurality of paths coupled to the primary interface to any other of the plurality of paths coupled to the primary interface,
wherein the bypass circuit is also operable to pass transactions through the memory module between a port having half the first width for coupling to the primary interface and a second memory module.
2. The memory module of claim 1, wherein:
the DRAM memory space includes a second region operable as a directly accessible memory.
3. The memory module of claim 1 or 2, wherein the buffer circuit further comprises:
a tag compare circuit to compare an incoming tag address with a stored tag address associated with the DRAM cache,
further comprising:
match links distributed along the memory module to communicate match signals to each of the buffer circuits.
CN201680057520.3A 2015-10-01 2016-09-09 Memory system with cached memory module operation Active CN108139978B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310262063.3A CN116560563A (en) 2015-10-01 2016-09-09 Memory system with cached memory module operation

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US201562235660P 2015-10-01 2015-10-01
US62/235,660 2015-10-01
US201562271551P 2015-12-28 2015-12-28
US62/271,551 2015-12-28
PCT/US2016/051141 WO2017058494A1 (en) 2015-10-01 2016-09-09 Memory system with cached memory module operations

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202310262063.3A Division CN116560563A (en) 2015-10-01 2016-09-09 Memory system with cached memory module operation

Publications (2)

Publication Number Publication Date
CN108139978A CN108139978A (en) 2018-06-08
CN108139978B true CN108139978B (en) 2023-03-03

Family

ID=58424304

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201680057520.3A Active CN108139978B (en) 2015-10-01 2016-09-09 Memory system with cached memory module operation
CN202310262063.3A Pending CN116560563A (en) 2015-10-01 2016-09-09 Memory system with cached memory module operation

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202310262063.3A Pending CN116560563A (en) 2015-10-01 2016-09-09 Memory system with cached memory module operation

Country Status (4)

Country Link
US (3) US10678719B2 (en)
EP (1) EP3356943B1 (en)
CN (2) CN108139978B (en)
WO (1) WO2017058494A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102467357B1 (en) * 2018-01-31 2022-11-14 삼성전자주식회사 Memory system and method of determining error of the same
WO2020190841A1 (en) * 2019-03-18 2020-09-24 Rambus Inc. System application of dram component with cache mode
US11620055B2 (en) 2020-01-07 2023-04-04 International Business Machines Corporation Managing data structures in a plurality of memory devices that are indicated to demote after initialization of the data structures
US11907543B2 (en) 2020-01-07 2024-02-20 International Business Machines Corporation Managing swappable data structures in a plurality of memory devices based on access counts of the data structures
US11573709B2 (en) 2020-01-07 2023-02-07 International Business Machines Corporation Maintaining data structures in a memory subsystem comprised of a plurality of memory devices
US11182291B2 (en) 2020-02-03 2021-11-23 International Business Machines Corporation Using multi-tiered cache to satisfy input/output requests
US11157418B2 (en) 2020-02-09 2021-10-26 International Business Machines Corporation Prefetching data elements within a heterogeneous cache
US11775213B2 (en) 2020-05-27 2023-10-03 Rambus Inc. Stacked memory device with paired channels

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101689145A (en) * 2007-03-30 2010-03-31 拉姆伯斯公司 System including hierarchical memory modules having different types of integrated circuit memory devices
CN102508787A (en) * 2011-11-29 2012-06-20 清华大学 System and method for memory allocation of composite memory
CN103946811A (en) * 2011-09-30 2014-07-23 英特尔公司 Apparatus and method for implementing a multi-level memory hierarchy having different operating modes

Family Cites Families (40)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7356639B2 (en) * 2000-01-05 2008-04-08 Rambus Inc. Configurable width buffered module having a bypass circuit
US6502161B1 (en) * 2000-01-05 2002-12-31 Rambus Inc. Memory system including a point-to-point linked memory subsystem
US6889304B2 (en) 2001-02-28 2005-05-03 Rambus Inc. Memory device supporting a dynamically configurable core organization
JP4481588B2 (en) * 2003-04-28 2010-06-16 株式会社東芝 Semiconductor integrated circuit device
US7269708B2 (en) 2004-04-20 2007-09-11 Rambus Inc. Memory controller for non-homogenous memory system
US8397013B1 (en) * 2006-10-05 2013-03-12 Google Inc. Hybrid memory module
US8089795B2 (en) * 2006-02-09 2012-01-03 Google Inc. Memory module with memory stack and interface with enhanced capabilities
US9171585B2 (en) * 2005-06-24 2015-10-27 Google Inc. Configurable memory circuit system and method
KR101303518B1 (en) * 2005-09-02 2013-09-03 구글 인코포레이티드 Methods and apparatus of stacking drams
US8745315B2 (en) 2006-11-06 2014-06-03 Rambus Inc. Memory Systems and methods supporting volatile and wear-leveled nonvolatile physical memory
US8127199B2 (en) * 2007-04-13 2012-02-28 Rgb Networks, Inc. SDRAM convolutional interleaver with two paths
WO2008131058A2 (en) 2007-04-17 2008-10-30 Rambus Inc. Hybrid volatile and non-volatile memory device
US8874831B2 (en) 2007-06-01 2014-10-28 Netlist, Inc. Flash-DRAM hybrid memory module
US8572455B2 (en) * 2009-08-24 2013-10-29 International Business Machines Corporation Systems and methods to respond to error detection
US8914568B2 (en) 2009-12-23 2014-12-16 Intel Corporation Hybrid memory architectures
US8612809B2 (en) 2009-12-31 2013-12-17 Intel Corporation Systems, methods, and apparatuses for stacked memory
CN102812518B (en) * 2010-01-28 2015-10-21 惠普发展公司,有限责任合伙企业 Access method of storage and device
KR101616093B1 (en) * 2010-02-19 2016-04-27 삼성전자주식회사 Nonvolatile memory device conducting repair operation and memory system including the same
KR101713051B1 (en) 2010-11-29 2017-03-07 삼성전자주식회사 Hybrid Memory System and Management Method there-of
US8713379B2 (en) * 2011-02-08 2014-04-29 Diablo Technologies Inc. System and method of interfacing co-processors and input/output devices via a main memory system
WO2013009442A2 (en) * 2011-07-12 2013-01-17 Rambus Inc. Dynamically changing data access bandwidth by selectively enabling and disabling data links
US8874827B2 (en) 2011-08-09 2014-10-28 Samsung Electronics Co., Ltd. Page merging for buffer efficiency in hybrid memory systems
CN107391397B (en) * 2011-09-30 2021-07-27 英特尔公司 Memory channel supporting near memory and far memory access
US8804394B2 (en) * 2012-01-11 2014-08-12 Rambus Inc. Stacked memory with redundancy
CN104246732A (en) 2012-06-28 2014-12-24 惠普发展公司,有限责任合伙企业 Memory module with a dual-port buffer
US9569393B2 (en) * 2012-08-10 2017-02-14 Rambus Inc. Memory module threading with staggered data transfers
KR20140024669A (en) * 2012-08-20 2014-03-03 에스케이하이닉스 주식회사 Semiconductor memory device
US9158679B2 (en) * 2012-10-10 2015-10-13 Rambus Inc. Data buffer with a strobe-based primary interface and a strobe-less secondary interface
US9430324B2 (en) * 2013-05-24 2016-08-30 Rambus Inc. Memory repair method and apparatus based on error code tracking
US9921980B2 (en) 2013-08-12 2018-03-20 Micron Technology, Inc. Apparatuses and methods for configuring I/Os of memory for hybrid memory modules
US9430434B2 (en) 2013-09-20 2016-08-30 Qualcomm Incorporated System and method for conserving memory power using dynamic memory I/O resizing
CN108831512A (en) 2013-10-15 2018-11-16 拉姆伯斯公司 Load reduced memory module
CN105612580B (en) 2013-11-11 2019-06-21 拉姆伯斯公司 Use the mass-storage system of standard controller component
US20150201016A1 (en) * 2014-01-14 2015-07-16 Amit Golander Methods and system for incorporating a direct attached storage to a network attached storage
US9916196B2 (en) * 2014-02-28 2018-03-13 Rambus Inc. Memory module with dedicated repair devices
US9740646B2 (en) * 2014-12-20 2017-08-22 Intel Corporation Early identification in transactional buffered memory
US20170289850A1 (en) * 2016-04-01 2017-10-05 Intel Corporation Write delivery for memory subsystem with narrow bandwidth repeater channel
US10901840B2 (en) * 2018-06-28 2021-01-26 Western Digital Technologies, Inc. Error correction decoding with redundancy data
US11675716B2 (en) * 2019-12-10 2023-06-13 Intel Corporation Techniques for command bus training to a memory device
US11188264B2 (en) * 2020-02-03 2021-11-30 Intel Corporation Configurable write command delay in nonvolatile memory

Also Published As

Publication number Publication date
CN116560563A (en) 2023-08-08
EP3356943A4 (en) 2019-05-01
US11210242B2 (en) 2021-12-28
US20200364164A1 (en) 2020-11-19
US20220171721A1 (en) 2022-06-02
CN108139978A (en) 2018-06-08
US10678719B2 (en) 2020-06-09
EP3356943B1 (en) 2021-11-03
US11836099B2 (en) 2023-12-05
EP3356943A1 (en) 2018-08-08
WO2017058494A1 (en) 2017-04-06
US20180267911A1 (en) 2018-09-20

Similar Documents

Publication Publication Date Title
CN108139978B (en) Memory system with cached memory module operation
US9431063B2 (en) Stacked memory having same timing domain read data and redundancy
US7409491B2 (en) System memory board subsystem using DRAM with stacked dedicated high speed point to point links
US11500576B2 (en) Apparatus and architecture of non-volatile memory module in parallel configuration
US20220148643A1 (en) Memories and memory components with interconnected and redundant data interfaces
EP2458508A1 (en) Integrated circuit with graduated on-die termination
US20230307026A1 (en) High performance, non-volatile memory module
US10983933B2 (en) Memory module with reduced read/write turnaround overhead
US20240079079A1 (en) Buffer circuit with adaptive repair capability
US11853600B2 (en) Memory systems, modules, and methods for improved capacity
US20060294327A1 (en) Method, apparatus and system for optimizing interleaving between requests from the same stream
US9164914B1 (en) Multiple port routing circuitry for flash memory storage systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant