WO2015171461A1 - Interconnect systems and methods using hybrid memory cube links - Google Patents

Interconnect systems and methods using hybrid memory cube links

Info

Publication number
WO2015171461A1
Authority
WO
WIPO (PCT)
Prior art keywords
data handling
memory
handling device
data
packetized
Prior art date
Application number
PCT/US2015/028873
Other languages
French (fr)
Inventor
John D. LEIDEL
Original Assignee
Micron Technology, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Micron Technology, Inc. filed Critical Micron Technology, Inc.
Priority to KR1020167034480A priority Critical patent/KR101885452B1/en
Priority to CN201580030653.7A priority patent/CN106462524B/en
Priority to JP2016566810A priority patent/JP6522663B2/en
Priority to EP22155792.9A priority patent/EP4016317A1/en
Priority to CN202010026526.2A priority patent/CN111190553B/en
Priority to KR1020187021869A priority patent/KR101925266B1/en
Priority to EP15789012.0A priority patent/EP3140748B1/en
Publication of WO2015171461A1 publication Critical patent/WO2015171461A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F 3/0604 Improving or facilitating administration, e.g. storage management
    • G06F 3/0607 Improving or facilitating administration, e.g. storage management by facilitating the process of upgrading existing storage systems, e.g. for improving compatibility between host and storage device
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F 13/14 Handling requests for interconnection or transfer
    • G06F 13/16 Handling requests for interconnection or transfer for access to memory bus
    • G06F 13/1605 Handling requests for interconnection or transfer for access to memory bus based on arbitration
    • G06F 13/161 Handling requests for interconnection or transfer for access to memory bus based on arbitration with latency improvement
    • G06F 13/1621 Handling requests for interconnection or transfer for access to memory bus based on arbitration with latency improvement by maintaining request order
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F 13/14 Handling requests for interconnection or transfer
    • G06F 13/16 Handling requests for interconnection or transfer for access to memory bus
    • G06F 13/1605 Handling requests for interconnection or transfer for access to memory bus based on arbitration
    • G06F 13/1652 Handling requests for interconnection or transfer for access to memory bus based on arbitration in a multiprocessor architecture
    • G06F 13/1657 Access to multiple memories
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F 13/14 Handling requests for interconnection or transfer
    • G06F 13/16 Handling requests for interconnection or transfer for access to memory bus
    • G06F 13/1668 Details of memory controller
    • G06F 13/1673 Details of memory controller using buffers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F 13/14 Handling requests for interconnection or transfer
    • G06F 13/16 Handling requests for interconnection or transfer for access to memory bus
    • G06F 13/1668 Details of memory controller
    • G06F 13/1684 Details of memory controller using multiple buses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F 13/38 Information transfer, e.g. on bus
    • G06F 13/40 Bus structure
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F 13/38 Information transfer, e.g. on bus
    • G06F 13/40 Bus structure
    • G06F 13/4004 Coupling between buses
    • G06F 13/4022 Coupling between buses using switching circuits, e.g. switching matrix, connection or expansion network
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F 13/38 Information transfer, e.g. on bus
    • G06F 13/42 Bus transfer protocol, e.g. handshake; Synchronisation
    • G06F 13/4204 Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus
    • G06F 13/4234 Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus being a memory bus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F 3/0659 Command handling arrangements, e.g. command buffers, queues, command scheduling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F 3/0671 In-line storage system
    • G06F 3/0683 Plurality of storage devices
    • G06F 3/0685 Hybrid storage combining heterogeneous device types, e.g. hierarchical storage, hybrid arrays
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C 21/00 Digital stores in which the information circulates continuously
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C 7/00 Arrangements for writing information into, or reading information out from, a digital store
    • G11C 7/10 Input/output [I/O] data interface arrangements, e.g. I/O data control circuits, I/O data buffers
    • G11C 7/1075 Input/output [I/O] data interface arrangements, e.g. I/O data control circuits, I/O data buffers for multiport memories each having random access ports and serial ports, e.g. video RAM
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F 2212/20 Employing a main memory using a specific memory technology
    • G06F 2212/205 Hybrid memory, e.g. using both volatile and non-volatile memory

Definitions

  • the present disclosure relates generally to interconnection of devices and related methods, such as semiconductor memory devices, processing devices, memory systems, and processing systems. More particularly, the present disclosure relates to interconnection of such devices and systems using Hybrid Memory Cube links.
  • RAM Random Access Memory
  • ROM Read Only Memory
  • DRAM Dynamic RAM
  • SDRAM Synchronous DRAM
  • flash memory and resistance variable memory, among others.
  • CPU central processing unit
  • DIMM Dual In-line Memory Module
  • the memory system is in communication with a memory control subsystem or central processing unit (CPU) or microprocessor.
  • the memory controller is physically subsumed into the same physical chip as the processor.
  • the memory controller may be just one of many logical components comprising a memory controller hub.
  • a memory controller hub typically supports completely separate and distinct memory address spaces, often using different types of semiconductor memory or using the memory for different purposes.
  • a memory controller may support the use of video DRAM for graphics applications, flash memory for disk-drive acceleration, and commodity DRAM as the processor's main external memory.
  • MCHs Memory Control Hubs
  • MCHs are defined primarily as a memory subsystem for a single processor.
  • Many general purpose system architectures include multiple processors, each possibly with their own memory domain. Often these multiple processors need to communicate between themselves.
  • private processor communication busses have been proposed to enhance system interconnection.
  • FIG. 1 is a diagram of a data processing system including a hybrid memory cube as an example of a device for operation on a memory bus using an abstracted memory protocol.
  • FIG. 2 illustrates possible partitioning of DRAMs in a hybrid memory cube.
  • FIG. 3 illustrates a logical partitioning of DRAMs in a hybrid memory cube.
  • FIG. 4 illustrates a logic base for link interfaces and controlling the DRAMs in a hybrid memory cube.
  • FIG. 5 illustrates some elements that may be present in a data handling device according to some embodiments of the present disclosure.
  • FIG. 6 illustrates a diagram of a system using in-situ routing between various data handling devices and memory devices and showing sparse routing between the memory devices.
  • FIG. 7 illustrates a diagram of a system using in-situ routing between various data handling devices and memory devices and showing dense routing between the memory devices.
  • FIG. 8 illustrates a diagram of a system using dedicated routing between various data handling devices and memory devices.
  • FIG. 9 illustrates various example topologies that may be used in systems with the dedicated routing of FIG. 8.
  • the embodiments disclosed herein may be implemented or performed with a general purpose processor, a special purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein.
  • DSP Digital Signal Processor
  • ASIC Application Specific Integrated Circuit
  • FPGA Field Programmable Gate Array
  • a general-purpose processor may be any conventional processor, controller, microcontroller, or state machine.
  • a general-purpose processor should be considered a special-purpose processor configured for carrying out such processes.
  • a processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
  • a flowchart may describe operational acts as a sequential process, many of these acts can be performed in another sequence, in parallel, or substantially concurrently. In addition, the order of the acts may be re-arranged.
  • a process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc.
  • the methods disclosed herein may be implemented in hardware, software, or both. If implemented in software, the functions may be stored or transmitted as one or more instructions or code on a computer-readable medium.
  • Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another.
  • any reference to an element herein using a designation such as “first,” “second,” and so forth does not limit the quantity or order of those elements, unless such limitation is explicitly stated. Rather, these designations may be used herein as a convenient method of distinguishing between two or more elements or instances of an element. Thus, a reference to first and second elements does not mean that only two elements may be employed there or that the first element must precede the second element in some manner.
  • a set of elements may comprise one or more elements.
  • Elements described herein may include multiple instances of the same element. These elements may be generically indicated by a numerical designator (e.g., 110) and specifically indicated by the numerical indicator followed by an alphabetic designator (e.g., 110A) or a numeric indicator preceded by a "dash" (e.g., 110-1).
  • the term "substantially" in reference to a given parameter, property, or condition means and includes to a degree that one of ordinary skill in the art would understand that the given parameter, property, or condition is met with a small degree of variance, such as within acceptable manufacturing tolerances.
  • the parameter, property, or condition may be at least 90% met, at least 95% met, or even at least 99% met.
  • any relational term such as “over,” “under,” “on,” “underlying,” “upper,” “lower,” etc., is used for clarity and convenience in understanding the disclosure and accompanying drawings and does not connote or depend on any specific preference, orientation, or order, except where the context clearly indicates otherwise.
  • the present disclosure describes methods and apparatus for improving memory subsystems by providing more balanced system bandwidth and enabling reduced complexity of the design and use of such interconnect systems.
  • FIG. 1 is a diagram of a data processing system 100 including a hybrid memory cube device 200 as an example of a device for operation on a memory bus using an abstracted memory protocol 130 to communicate with a data handling device 500.
  • a hybrid memory cube device 200 as an example of a device for operation on a memory bus using an abstracted memory protocol 130 to communicate with a data handling device 500.
  • this disclosure focuses on HMC protocol busses.
  • embodiments of the present disclosure may be practiced with other high-speed data busses that include an abstraction between devices holding the data and the protocol on the data bus.
  • data handling device 500 is used herein to distinguish devices on a memory bus that are configured mostly as consumers and generators of data, rather than devices for storing data, such as a DRAM memory.
  • data handling devices 500 can be considered processors (also referred to herein as processing devices), such as, for example, general purpose processors, special purpose processors, graphics processors, and digital signal processors.
  • data handling devices 500 can be considered communication devices.
  • a communication type data handling device 500 may be configured to convey data between a memory bus and some other type of communication bus, such as, for example, an Input/Output (I/O) bus or a network bus.
  • data handling devices 500 may also include both processor elements and communication elements.
  • As a result, this disclosure may also describe a data handling device 500 as a System on a Chip (SoC) 500. Unless specifically stated otherwise, a SoC 500 as referred to herein should be considered equivalent to a data handling device 500.
  • While a SoC 500 may be considered to be focused on processing and moving data, it may also contain significant amounts of memory in the form of registers, buffers, caches, and other types of local memory on the SoC 500. Additional details of the SoC 500 are discussed below in combination with FIG. 5.
  • the hybrid memory cube device 200 includes a logic base 400, which defines the abstracted memory protocol 130 to create memory links 120 between the SoC 500 and the HMC 200.
  • a group of parallel busses 410 interface between the logic base 400 and a group of DRAMs 250 on the HMC 200. Additional details of the HMC 200 are discussed below in connection with FIGS. 2-4.
  • the memory links 120 are partitioned into upstream links headed toward the SoC 500 and downstream links headed away from the SoC 500. As part of the abstracted memory protocol 130 the memory links 120 are packetized as is explained more fully below. As a result, the memory links 120 are also referred to herein as packetized memory links 120 as well as hybrid memory cube links 120. Moreover, the packets conveyed on the memory links 120 are referred to as packet requests and packetized requests.
  • FIG. 2 illustrates a possible partitioning of DRAMs 250 in the HMC 200.
  • the HMC 200 may be considered as a 3-dimensional stack of DRAM die 250 coupled to the logic base 400.
  • the logic base 400 may be configured as a separate die and configured to interface with the DRAM die 250.
  • interconnect between the various die may be accomplished with through silicon vias. While these devices may be physically configured as a 3-dimensional stack, they do not need to be so configured, but can still be thought of as 3-dimensional from an interconnect perspective.
  • FIG. 3 illustrates a logical partitioning of DRAMs 250 in a HMC 200.
  • As illustrated in FIGS. 2 and 3, the interconnection of multiple die layers enables a memory device with a combination of memory storage layers and one or more logic layers. In this manner, the device provides the physical memory storage and logical memory transaction processing in a single die package configured as the HMC 200. The end result is a very compact, power efficient package with available bandwidth capacity of up to 320 GB/s per device.
  • the HMC 200 is capable of such bandwidth via a hierarchical and parallel approach to the design. For example, device hierarchy may occur vertically across the logic layers and the hardware parallelism may occur across a given die layer.
  • the logic base 400 includes multiple components that provide both external link access to the HMC 200 as well as internal routing and transaction logic.
  • the HMC 200 may be segmented into vertical slices 220 often referred to as "vaults."
  • Each vault 220 may include vault logic 450 incorporated into the logic base 400 to control segments of the DRAMs 250 associated with that vault 220.
  • the vault logic 450 manages memory reference operations to memory partitions within its vault 220.
  • Each vault controller 450 may determine its own timing requirements and refresh operations, which allows different timing for each vault 220 and also eliminates the need for these functions in a host memory controller.
  • a queue may be included with each vault controller 450 to buffer references for that vault's memory.
  • the vault controllers 450 may execute references within their queue based on need rather than order of arrival. Therefore, responses from vault operations back to the external memory links 120 (FIG. 1) may be out of order in some cases.
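  • As a rough illustration of this out-of-order selection, the following sketch picks the oldest queued reference whose target bank is idle. The queue depth, bank count, and the simple "skip busy banks" policy are assumptions made for the example; the HMC specification does not prescribe a particular scheduling algorithm.

        #include <stdbool.h>
        #include <stdint.h>
        #include <stddef.h>

        #define QUEUE_DEPTH      16
        #define BANKS_PER_VAULT  8     /* assumed value for illustration */

        typedef struct {
            uint8_t  bank;      /* target bank within the vault */
            uint32_t row;       /* DRAM row for the reference   */
            bool     is_write;
            bool     valid;
        } vault_request_t;

        typedef struct {
            vault_request_t queue[QUEUE_DEPTH];
            bool bank_busy[BANKS_PER_VAULT];   /* timing state tracked per bank */
        } vault_controller_t;

        /* Pick the oldest request whose target bank is currently idle.
         * Requests to busy banks are skipped, so completions can leave the
         * vault out of arrival order, as described above. */
        int vault_select_next(const vault_controller_t *vc)
        {
            for (size_t i = 0; i < QUEUE_DEPTH; i++) {
                if (vc->queue[i].valid && !vc->bank_busy[vc->queue[i].bank])
                    return (int)i;     /* index of the request to issue */
            }
            return -1;                 /* nothing ready this cycle */
        }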
  • the memory links 120 may be configured to provide four or eight logical links. Each link may be configured as a group of sixteen or eight serial and bidirectional I/O links.
  • Devices configured with four links have the ability to operate at 10, 12.5, and 15 Gbps.
  • Devices configured with eight links have the ability to operate at 10 Gbps.
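  • The headline bandwidth figure quoted above can be reproduced with simple arithmetic. The sketch below assumes sixteen lanes per link and counts both directions of each bidirectional link; sustained throughput in practice is lower because of packet header, tail, and flow control overhead.

        #include <stdio.h>

        /* Rough aggregate-bandwidth arithmetic for an HMC-style device. */
        int main(void)
        {
            const double lanes_per_link = 16.0;
            const double directions     = 2.0;   /* links are bidirectional */

            /* 8 links x 16 lanes x 10 Gb/s x 2 directions / 8 bits per byte */
            double eight_link = 8 * lanes_per_link * 10.0 * directions / 8.0;

            /* 4 links x 16 lanes x 15 Gb/s x 2 directions / 8 bits per byte */
            double four_link  = 4 * lanes_per_link * 15.0 * directions / 8.0;

            printf("8 links @ 10 Gb/s: %.0f GB/s aggregate\n", eight_link); /* 320 */
            printf("4 links @ 15 Gb/s: %.0f GB/s aggregate\n", four_link);  /* 240 */
            return 0;
        }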
  • the HMC specification defines a different physical addressing and interleave model than traditional banked DRAM devices.
  • Physical addresses for HMC devices 200 are encoded into a 34-bit field that contains the vault, bank, and address bits.
  • the specification permits the implementer and user to define an address mapping scheme that is most optimized for the target memory access characteristics. It also provides a series of default address map modes that join the physical vault and bank structure to the desired maximum block request size.
  • the default map schemas implement a low interleave model by mapping the less significant address bits to the vault address, followed immediately by the bank address bits.
  • This method forces sequential addresses to first interleave across vaults and then across banks within a vault in order to avoid bank conflicts.
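  • A minimal decode of this default low-interleave map is sketched below. The field widths (16-byte block offset, 32 vaults, 8 banks per vault) are assumptions chosen for illustration; the actual widths depend on the device configuration and the selected maximum block size.

        #include <stdint.h>

        #define OFFSET_BITS 4   /* 16-byte maximum block size (assumed) */
        #define VAULT_BITS  5   /* 32 vaults (assumed)                  */
        #define BANK_BITS   3   /* 8 banks per vault (assumed)          */

        typedef struct {
            uint32_t offset;
            uint32_t vault;
            uint32_t bank;
            uint32_t dram_addr;   /* remaining row/column bits */
        } hmc_addr_t;

        /* Decode a 34-bit physical address using the "low interleave" layout:
         * block offset in the least significant bits, then vault bits, then
         * bank bits, then the rest of the DRAM address. */
        hmc_addr_t decode_addr(uint64_t phys /* 34 valid bits */)
        {
            hmc_addr_t a;
            a.offset    = (uint32_t)( phys                                 & ((1u << OFFSET_BITS) - 1));
            a.vault     = (uint32_t)((phys >> OFFSET_BITS)                 & ((1u << VAULT_BITS)  - 1));
            a.bank      = (uint32_t)((phys >> (OFFSET_BITS + VAULT_BITS))  & ((1u << BANK_BITS)   - 1));
            a.dram_addr = (uint32_t)( phys >> (OFFSET_BITS + VAULT_BITS + BANK_BITS));
            return a;
        }
        /* Because the vault bits sit just above the offset, consecutive blocks
         * land in consecutive vaults first and then rotate through banks,
         * which is the interleave behaviour described above. */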
  • All in-band communication between host devices (e.g., SoCs 500) and HMC devices 200 is performed via a packetized format.
  • This format includes three major packet classifications: request packets, response packets, and flow control packets.
  • Packets may be configured as multiples of a single 16-byte flow unit (also referred to as a FLIT). Packet sizes may be as large as 9 FLITs (i.e., 144 bytes). A smallest packet may include only one 16-byte FLIT including a packet header and packet tail.
  • Memory read request packets for all memory payload sizes only require the packet header, packet tail, and the respective physical memory address.
  • read requests may be configured using a single FLIT.
  • Memory read responses are separate packets that include the data from the address requested in the corresponding memory read packet.
  • Write request and atomic request packets must also contain the associated input data for write and read-modify-write operations, respectively. As such, these request types may have packet widths of 2-9 FLITs.
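  • The packet-size rules above reduce to a small FLIT-count calculation, sketched here with the simplifying assumption that the header and tail together occupy exactly one FLIT and that payloads are limited to 128 bytes.

        #include <stdint.h>
        #include <assert.h>

        #define FLIT_BYTES        16u
        #define MAX_PACKET_FLITS  9u   /* 144 bytes */

        /* Number of FLITs needed for a request packet.  Header and tail share
         * one FLIT; write/atomic payloads add ceil(payload / 16) further FLITs.
         * A read request carries no payload, so it fits in a single FLIT. */
        uint32_t request_flits(uint32_t payload_bytes)
        {
            uint32_t flits = 1u + (payload_bytes + FLIT_BYTES - 1u) / FLIT_BYTES;
            assert(flits <= MAX_PACKET_FLITS);   /* max payload is 128 bytes */
            return flits;
        }

        /* request_flits(0)   == 1  -> memory read request        */
        /* request_flits(64)  == 5  -> 64-byte write request      */
        /* request_flits(128) == 9  -> largest (144-byte) request */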
  • the HMC specification defines a weak-ordering model between packets. As such, there may exist multiple packet reordering points present within a target implementation. Arriving packets that are destined for ancillary devices may pass those waiting for local vault access.
  • Local vaults may also reorder queued packets in order to make most efficient use of bandwidth to and from the respective vault banks.
  • reordering points present in a given HMC implementation may be defined to maintain the order of a stream of packets from a specific link to a specific bank within a vault. This ordering ensures that memory write requests followed by memory read requests deliver correct and deterministic behavior.
  • the link structure in the HMC 200 enables chaining of multiple HMCs 200 to enable the construction of memory subsystems that require capacities larger than a single HMC 200 device while maintaining the link structure and packetized transaction protocols. Additional details regarding the chaining are discussed below with reference to FIGS. 6-9.
  • FIG. 4 illustrates a logic base 400, which may be used for creating the link interfaces 120 and controlling the DRAMs 250 (FIGS. 1-3) in a HMC 200.
  • the memory links 120 which include upstream links and downstream links, may be controlled by a link interface controller 420 for each memory link 120. Packets passed through the link interface controllers 420 may be passed through a crossbar switch 430. If a packet is destined for a vault on the HMC 200, the crossbar switch 430 may pass the packet to memory control logic 440. If a packet is destined for another HMC 200, the crossbar switch 430 may pass the packet to an appropriate link interface controller 420 to be sent on to the appropriate HMC 200.
  • the memory control logic 440 and the vault logic 450 for the various vaults may combine to select the appropriate vault and appropriate timing for the selected vault.
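  • A simplified software model of that routing decision is shown below. The route table stands in for the host-loaded routing configuration and is an assumed data structure, not the logic base's actual implementation.

        #include <stdint.h>

        typedef struct {
            uint8_t cube_id;   /* 3-bit CUB / device ID from the request header */
            uint8_t vault;     /* decoded vault for local requests              */
            /* ... header, tail, payload FLITs ... */
        } hmc_packet_t;

        typedef struct {
            uint8_t my_cube_id;
            /* Which link leads toward each of the (up to eight) device IDs;
             * illustrative layout only. */
            uint8_t link_for_cube[8];
        } crossbar_t;

        /* Returns -1 when the packet is for this cube (hand it to the memory
         * control logic / vault logic); otherwise the link to forward it on. */
        int crossbar_route(const crossbar_t *xbar, const hmc_packet_t *pkt)
        {
            if (pkt->cube_id == xbar->my_cube_id)
                return -1;                                /* local vault access  */
            return xbar->link_for_cube[pkt->cube_id & 7]; /* pass-through link   */
        }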
  • FIG. 5 illustrates some elements that may be present in a data handling device 500 according to some embodiments of the present disclosure.
  • systems and methods may use HMC 200 devices and the memory interconnect protocols defined for HMC 200 as the basis for a more global system interconnect.
  • Devices and systems constructed using the present disclosure may define system characteristics that are improvements over conventional multiprocessor system architectures. Some of these characteristics include high bandwidth memory and system interconnect links, balanced bandwidth and latency characteristics between locally connected memories and other system-level memories, latency minimization by reducing and/or eliminating protocol translations between local memory requests and system-level requests, and latency minimization by utilizing the efficient HMC 200 packet protocol for both local memory requests and system-level requests.
  • the characteristics may also include maintaining atomicity between local memories and system-level memories over the HMC 200 system interconnect using the same protocol, support for a wide spectrum of system-level memory models (e.g., weak versus strong ordering), and support for cache coherency.
  • System configurations may generally be considered as including in-situ routing as shown in FIGS. 6 and 7 and dedicated routing as shown in FIGS. 8 and 9.
  • the two potential implementations represent two different scalability models.
  • the in situ routing model provides efficient system-level scalability for multi-socket workstations, data center servers, and other basic infrastructure devices.
  • the dedicated routing model provides efficient scalability beyond a small number of sockets. This scalability is analogous to building large enterprise server or mainframe platforms. Both methodologies provide the ability to construct system architectures that are SoC-centric and support architectures that are Non-Uniform Memory Access (NUMA) in nature.
  • NUMA Non-Uniform Memory Access
  • the SoC 500 presents an HMC "source" link to the HMC devices in the system.
  • the source link may also be referred to herein as a second packetized memory link.
  • the SoC 500 inherits the ability to send and receive system link traffic.
  • This extra link enables support of direct messaging from SoC to SoC.
  • One example of such functionality is cache coherency traffic.
  • a system vendor may encode cache coherency requests (e.g., coherency lookups or invalidations) into HMC atomic request packets.
  • System vendors may also encode SoC 500 to SoC 500 messaging packets using the read, write, posted read, and posted write requests in the HMC base specification. This ability for system vendors to encode protocols in the HMC packet specification allows them to retain their respective intellectual property and provide high bandwidth, low latency system interconnect support.
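  • As a purely hypothetical illustration of such vendor-defined encoding, the sketch below wraps a coherency invalidation in an ordinary write request aimed at an assumed mailbox address window that the receiving SoC's data handling endpoint would decode. The mailbox address, opcode values, and payload layout are invented for this example.

        #include <stdint.h>
        #include <string.h>

        #define MSG_MAILBOX_BASE 0x2F0000000ull  /* assumed window decoded by the
                                                    remote data handling endpoint */
        enum msg_opcode { MSG_COHERENCY_INVALIDATE = 1, MSG_COHERENCY_LOOKUP = 2 };

        typedef struct {
            uint8_t  dest_cube_id;   /* CUB field: remote SoC's device ID      */
            uint64_t address;        /* falls inside the remote mailbox window */
            uint8_t  payload[16];    /* one FLIT of message data               */
        } write_request_t;

        /* Build a write request that the remote SoC interprets as a
         * coherency invalidation for one cache line. */
        write_request_t make_invalidate(uint8_t dest_soc, uint64_t cache_line)
        {
            write_request_t req;
            uint8_t op = MSG_COHERENCY_INVALIDATE;
            req.dest_cube_id = dest_soc;
            req.address      = MSG_MAILBOX_BASE;      /* routed like memory */
            memset(req.payload, 0, sizeof req.payload);
            memcpy(req.payload, &op, sizeof op);
            memcpy(req.payload + 8, &cache_line, sizeof cache_line);
            return req;
        }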
  • a conventional SoC 500 configured for an HMC interface may be as simple as one or more processor(s) 540 and a data requestor endpoint 510 coupled to a packetized memory link 120 (e.g., an HMC link 120) through a first hybrid memory cube interface 122.
  • the data requestor endpoint 510 may also be referred to herein as a host requestor endpoint 510.
  • a host only needs to make packet requests on an HMC 200 interface to perform functions such as, for example, memory reads, memory writes, and configuration definition packets.
  • Embodiments of the present disclosure include a data handling endpoint 520 coupled to a second packetized memory link 620 through a second hybrid memory cube interface 622.
  • Physically and logically, the second packetized memory link 620 is similar to a memory link on an HMC 200 device.
  • the data handling endpoint 520 behaves similar to a memory endpoint.
  • the data handling endpoint 520 interprets packet requests that look like memory reads, memory writes, or other configuration type packets, consumes data on memory writes and generates response packets of data for memory reads.
  • systems can be created wherein the second packetized memory link 620 can be used as a system interconnection to other devices.
  • Thus, while the second packetized memory link 620 is physically and logically the same as the hybrid memory cube link 120, from an architectural perspective it can be treated as a link for conveying packetized system requests, creating flexible and efficient system interconnections.
  • a SoC 500 may be considered a processing device wherein the processors 540 could be implemented as a general purpose processor, a DSP, a special purpose processor, a graphics processor, or a combination thereof.
  • the SoC 500 may also be implemented primarily as a communication device.
  • one or more communication elements 550 may be included to translate packets from the data handling endpoint 520 to another bus 560.
  • This other bus 560 may be, for example, a bus to an I/O hub, another communication device, storage devices, a network, or combinations thereof.
  • the SoC 500 may include both processors 540 and communication elements 550.
  • processors 540 and communication elements 550 may be referred to generically as data handling elements (540, 550).
  • Because the data handling endpoint 520 behaves similar to a memory endpoint, packets handled by the data handling endpoint 520 have addresses associated with them and data may be conveyed in large bursts.
  • the processors 540 and/or communication elements 550 may have memory associated with them with their own addresses such that data can be conveyed directly between the data handling endpoint 520 and the appropriate data handling elements (540, 550).
  • Other embodiments may include a data buffer 530 for defining an address space for link requests to the data handling device 500.
  • the data buffer 530 may be configured as a Direct Memory Access (DMA) buffer or a First In First Out (FIFO) buffer that permits SoCs 500 to send traffic asynchronously to one another.
  • DMA Direct Memory Access
  • FIFO First In First Out
  • the respective size of the data buffer 530 may be determined by the number and frequency of the associated HMC links 620.
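  • A minimal sketch of such a FIFO-style data buffer follows; the depth, 16-byte entry size, and back-pressure behaviour are placeholder choices for the example, not values taken from the disclosure.

        #include <stdint.h>
        #include <stdbool.h>
        #include <string.h>

        #define FIFO_DEPTH  64
        #define ENTRY_BYTES 16            /* one FLIT of payload */

        typedef struct {
            uint8_t  data[FIFO_DEPTH][ENTRY_BYTES];
            uint32_t head, tail;          /* head == tail means empty */
        } link_fifo_t;

        /* Queue one incoming write payload; returns false when full so the
         * link can apply back-pressure (one slot is left unused to tell
         * "full" apart from "empty"). */
        bool fifo_push(link_fifo_t *f, const uint8_t *flit)
        {
            uint32_t next = (f->tail + 1) % FIFO_DEPTH;
            if (next == f->head)
                return false;
            memcpy(f->data[f->tail], flit, ENTRY_BYTES);
            f->tail = next;
            return true;
        }

        /* Drain one entry toward the local data handling elements. */
        bool fifo_pop(link_fifo_t *f, uint8_t *flit_out)
        {
            if (f->head == f->tail)
                return false;             /* empty */
            memcpy(flit_out, f->data[f->head], ENTRY_BYTES);
            f->head = (f->head + 1) % FIFO_DEPTH;
            return true;
        }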
  • the SoC 500 may be configured such that the data requestor endpoint 510 can handle requests to that endpoint in a manner similar to the data handling endpoint 520.
  • the data handling endpoint 520 can be configured to originate requests from the data handling elements (540, 550) in a manner similar to the data requestor endpoint 510.
  • the data requestor endpoint is configured for originating first packet requests on a first packetized memory link.
  • the data handling endpoint is configured for interpreting second packet requests to the data handling endpoint on a second packetized memory link and conveying data bidirectionally across the second packetized memory link in response to the second packet requests.
  • the first packetized memory link and the second packetized memory link are separate but include a same type of link protocol and a same type of physical interface.
  • a first hybrid memory cube link is operably coupled to a host requestor endpoint on the data handling device, the host requestor endpoint is for originating packetized memory requests to a local memory domain including one or more hybrid memory cube devices.
  • a second hybrid memory cube link is operably coupled to a data handling endpoint on the data handling device, the data handling endpoint is for interpreting packetized system requests from an additional data handling device operably coupled to at least one of the one or more hybrid memory cube devices.
  • a method of conveying data with a data handling device includes using the data handling device to originate packetized memory requests on a first hybrid memory cube link to a hybrid memory cube device in a first memory domain associated with the data handling device. The method also includes using the data handling device to receive packetized system requests on a second hybrid memory cube link, wherein the packetized system request originates from a second data handling device (not shown in FIG. 5). The method also includes responding to the packetized system requests.
  • FIG. 6 illustrates a diagram of a system 600 using in-situ routing between various data handling devices 500 and memory devices 200 and showing sparse routing between the memory devices 200.
  • multiple HMC devices 200 may be chained together to increase the total memory capacity available to a SoC 500.
  • each HMC 200 is identified through the value in a 3-bit chip ID field in the request packet header.
  • the 3-bit chip ID field may also be referred to herein as a CUB field or a device ID.
  • a network of up to eight HMC devices 200 may be supported for the processor.
  • Various topologies for interconnection of HMCs 200 are supported and the routing to different HMCs 200 can be complex and include multiple paths.
  • a host processor is usually in control of the routing topologies and loads routing configuration information into each HMC 200 to determine how packets that are not for that HMC 200 should be routed to other links on the HMC 200. This routing information enables each HMC 200 to use the appropriate links for forwarding packets that are not destined for it.
  • the in situ routing configuration provides system interconnect routing capabilities for a small number of system devices. More specifically, the total number of system devices is gated by the total number of HMC devices 200 present in the system architecture. This limitation follows the base HMC specification's notion that the CUB field is limited to three bits of address field space, which maps to eight total HMC endpoints. In the case of in situ routing, the CUB field is used to denote one or more SoC endpoints. Thus, each SoC 500 and all HMC devices 200 receive a unique CUB identifier for the purpose of routing request traffic between SoC 500 and HMC 200, HMC 200 and HMC 200 or SoC 500 and SoC 500.
  • each of the HMC devices (200-0 through 200-5) is defined with a corresponding device ID 0-5.
  • a first SoC 500-0 in socket 0 is defined with a device ID 6.
  • a second SoC 500-1 in socket 1 is defined with device ID 7.
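  • The device ID assignment just described can be summarized as a table, shown below in C purely for illustration; in an actual system the host loads equivalent routing configuration into each HMC rather than a literal array.

        #include <stdint.h>

        /* Device ID (CUB) assignment for the fully populated in-situ system of
         * FIG. 6: a 3-bit field gives eight endpoints, here six HMC devices
         * and two SoCs. */
        typedef enum { DEV_HMC, DEV_SOC } dev_kind_t;

        typedef struct {
            uint8_t     cube_id;   /* 3-bit CUB value */
            dev_kind_t  kind;
            const char *label;
        } device_t;

        static const device_t fig6_devices[8] = {
            { 0, DEV_HMC, "HMC 200-0" }, { 1, DEV_HMC, "HMC 200-1" },
            { 2, DEV_HMC, "HMC 200-2" }, { 3, DEV_HMC, "HMC 200-3" },
            { 4, DEV_HMC, "HMC 200-4" }, { 5, DEV_HMC, "HMC 200-5" },
            { 6, DEV_SOC, "SoC 500-0 (socket 0)" },
            { 7, DEV_SOC, "SoC 500-1 (socket 1)" },
        };
        /* A request packet therefore addresses an SoC the same way it
         * addresses a memory device: by its 3-bit CUB value. */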
  • the in-situ routing configuration can be thought of as having three different types of links.
  • the first link type may be identified as SoC source links 620-0 and 620-1.
  • SoC source links (620-0, 620-1) may also be referred to as second packetized memory links 620 and second hybrid memory cube links 620, as described above with reference to FIG. 5.
  • SoC source links (620-0, 620-1) serve to receive request traffic on the SoC (500-0, 500-1) at its data handling endpoint 520.
  • the SoC source links (620-0, 620-1) permit SoCs (500-0, 500-1) to communicate directly without intermediate double buffering in a main memory space. In this manner, the SoCs (500-0, 500-1) will appear as both an HMC source through the data handling endpoint 520 and a HMC requestor through the data requestor endpoint 510.
  • the second and third link types map to traditional HMC configurations.
  • the second link type is an inter-domain memory link 650-0.
  • the inter-domain memory link 650-0 provides the ability to route traffic across HMC links to neighboring memory domains such as a first memory domain 630 and a second memory domain 640.
  • the inter-domain memory link 650-0 serves as a bridge between memory domains.
  • system architects can choose the number of links that bridge the gap between the respective NUMA domains using these system links.
  • FIG. 6 illustrates a sparse routing because there is only one inter-domain memory link 650-0.
  • FIG. 7 illustrates a diagram of a system 700 using in-situ routing between various data handling devices 500 and memory devices 200 and showing dense routing between the memory devices.
  • the system is densely routed because there are three inter-domain memory links 650-0, 650-1, and 650-2.
  • the densely connected system architecture provides the ability to configure the memory to memory domain topology to create multiple routing paths in order to reduce link hot spotting.
  • FIG. 7 is similar to FIG. 6 and the elements need not be described again.
  • the third link type is the local request links 120, which route memory traffic for each of the local memory domains, respectively. These links are denoted as 120-0 through 120-5. These links provide traditional HMC 200 memory traffic within a memory domain.
  • FIGS. 6 and 7 illustrate fully populated systems 600, 700, respectively.
  • every device ID for the current version of the HMC specification is used.
  • Other systems may be used that expand on the device ID.
  • the addition of a single bit to the device ID could expand the number of devices from 8 to 16 and could include any combination of SoCs 500 and HMCs 200.
  • a system could include the socket 0 SoC 500-0, the socket 1 SoC 500-1 and a single HMC 200 (e.g., HMC 200-0).
  • the SoC source link 620-1 on the SoC 500-1 may be connected directly to a link on the HMC 200-0 and the local memory link 120-1 on the SoC 500-1 may be connected directly to another link on the HMC 200-0.
  • packets can still be passed between SoC 500-0 and SoC 500-1 and the two SoCs 500-0 and 500-1 can share access to the memory in HMC 200-0.
  • the data processing system includes two or more data handling devices and a hybrid memory cube device.
  • Each data handling device includes a host requestor endpoint configured for originating first packet requests on a first packetized memory link.
  • Each data handling device also includes a data handling endpoint configured for receiving and responding to second packet requests to the data handling endpoint on a second packetized memory link.
  • the hybrid memory cube device is associated with a first memory domain corresponding to one of the two or more data handling devices.
  • the hybrid memory cube device is configured to chain and pass the second packet requests between two of the two or more data handling devices.
  • a method of conveying data in a system includes originating memory requests from a host requestor endpoint on a first data handling device. The method also includes sending the memory requests on a first packetized memory link coupled to the first data handling device to a first hybrid memory cube in a first memory domain associated with the first data handling device. The method also includes receiving system requests at the first hybrid memory cube, wherein the system requests are from a second data handling device. The method also includes passing the system requests from the first hybrid memory cube to a data handling endpoint on the first data handling device via a second packetized memory link coupled to the first data handling device.
  • the method may further include originating the system requests from the host requestor endpoint on the second data handling device and, before receiving the system requests at the first hybrid memory cube, receiving the system requests at the second hybrid memory cube and passing the system requests from the second hybrid memory cube to the first hybrid memory cube.
  • FIG. 8 illustrates a diagram of a system 800 using dedicated routing between various data handling devices 500 and memory devices 200.
  • the dedicated routing configuration permits larger, more scalable system architectures to be constructed.
  • dedicated routing includes SoCs 500 that can serve both as an HMC requestor through the data requestor endpoint 510 and appear as a target endpoint through the data handling endpoint 520.
  • the HMC request traffic is split into two domains from the perspective of any given SoC 500.
  • Each SoC 500 contains both a local domain and a system domain. Each domain has the ability to support up to eight endpoints (based upon the aforementioned CUB field limitations).
  • each SoC 500 has the ability to support up to eight HMC devices that are locally connected in its local domain. Endpoints in the local domain are generally HMC memory devices 200.
  • FIG. 8 illustrates local domain links as 120-0 through 120-3. Thus, in FIG. 8 there is only one HMC (200-0 through 200-3) associated with each SoC (500-0 through 500-3). However, dedicated routing systems can be configured with up to 8 HMC devices 200 in the local domain of each SoC (500-0 through 500-3).
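  • One way to picture the local/system split is the sketch below: each SoC keeps two independent 3-bit endpoint spaces, one per packetized link, and chooses a link before a request leaves the SoC. The address-window test used to make that choice is an assumption for the sketch, not a mechanism the disclosure specifies.

        #include <stdint.h>

        typedef struct {
            uint8_t  local_ids[8];      /* CUB values of locally attached HMCs */
            uint8_t  system_ids[8];     /* CUB values of hubs / remote SoCs    */
            uint64_t local_window_end;  /* addresses below this are local DRAM */
        } soc_domains_t;

        typedef enum {
            LINK_LOCAL,    /* first packetized link, 120  */
            LINK_SYSTEM    /* second packetized link, 620 */
        } link_sel_t;

        /* Decide which domain (and therefore which link) a request uses.
         * Here the decision is a simple address-window check. */
        link_sel_t select_domain(const soc_domains_t *d, uint64_t address)
        {
            return (address < d->local_window_end) ? LINK_LOCAL : LINK_SYSTEM;
        }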
  • the system domain provides functionality for system level traffic routing.
  • Each SoC (500-0 through 500-3) provides the ability to route system request traffic over the system domain.
  • Endpoints in the system domain can be SoCs 500, HMC devices 200 used as hubs and HMC devices 200 used as memory storage.
  • the scalability of the system is determined by the ratio of HMC router devices to SoC endpoints.
  • FIG. 8 illustrates a dedicated routing system with two HMC hubs (810-0 and 810-1).
  • the HMC hubs (810-0 and 810-1) include links coupled to the second packetized memory links (620-0 through 620-3) of each SoC (500-0 through 500-3).
  • FIG. 8 illustrates inter-hub links (820-0 through 820-2) for coupling the HMC hubs (810-0 and 810-1) together and to adjacent hub devices.
  • FIG. 8 illustrates a system that is not fully populated in the system domain.
  • the HMC hubs (810-0 and 810-1) use device IDs 0 and 1 respectively and the SoCs (500-0 through 500-3) use device IDs 2-5 respectively.
  • another SoC 500 may be coupled to inter-hub link 820-0 and given a device ID of 6 and another SoC 500 may be coupled to inter-hub link 820-1 and given a device ID of 7.
  • another HMC hub 810 may be coupled to inter-hub link 820-1 and given a device ID of 6 and another SoC 500 may be coupled to that other HMC hub 810 and given a device ID of 7.
  • the system interconnect in the dedicated routing architecture may be expanded in other ways.
  • additional bits could be added to the device ID field.
  • the addition of a single bit to the device ID could expand the number of devices from 8 to 16 and could include any combination of SoCs 500 and HMC hubs 810.
  • additional packetized link busses similar to the first packetized link 120 and the second packetized link 620 could be added to open up another completely new domain.
  • the local memory domains for each SoC 500 could be more complex than just including HMC 200 memory devices.
  • the local domain could be configured with an in situ routing architecture as discussed above with reference to FIGS. 5-7.
  • a data processing system in a dedicated routing configuration includes two or more data handling devices.
  • Each data handling device includes a host requestor endpoint configured for originating local memory packet requests on a first packetized memory link and a data handling endpoint configured for receiving and responding to second packet requests to the data handling endpoint on a second packetized memory link.
  • the data processing system also includes one or more hybrid memory cube hubs.
  • Each of the hybrid memory cube hubs includes a first packetized memory link operably coupled to the data handling endpoint of one of the two or more data handling devices and a second packetized memory link operably coupled to the data handling endpoint of another of the two or more data handling devices.
  • a method of conveying data in a system includes originating memory requests from a host requestor endpoint on a first data handling device and sending the memory requests on a first packetized memory link coupled to the first data handling device to a first hybrid memory cube in a first memory domain associated with the first data handling device.
  • the method also includes originating system requests from a data handling endpoint on the first data handling device and sending the system requests on a second packetized memory link coupled to the first data handling device to a hybrid memory cube hub.
  • the method also includes passing some of the system requests from the hybrid memory cube hub 810-0 to a second data handling device.
  • FIG. 9 illustrates various example topologies that may be used in systems with the dedicated routing of FIG. 8.
  • the dedicated routing methodology also provides the ability to construct much more complex system architectures with different topological advantages.
  • topologies of system domains can be constructed using rings 910, modified rings 920, meshes 930 and crossbars (not shown). The eventual topological determination may be made based upon required bandwidth and latency characteristics weighed against the target system cost.
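  • For the ring topologies 910, a hub's forwarding choice can be modeled as the shortest-direction computation below. This is only a topology illustration, not the HMC routing tables, and it does not cover the modified ring or mesh variants.

        #include <stdint.h>

        /* Shortest-direction next hop on a simple ring of n_hubs devices.
         * Ring diameter (worst-case hop count) grows with hub count while
         * the per-hub link cost stays constant. */
        static uint32_t ring_next_hop(uint32_t self, uint32_t dest, uint32_t n_hubs)
        {
            if (self == dest)
                return self;
            uint32_t clockwise = (dest + n_hubs - self) % n_hubs;  /* hops going "up" */
            if (clockwise <= n_hubs / 2)
                return (self + 1) % n_hubs;            /* forward clockwise         */
            return (self + n_hubs - 1) % n_hubs;       /* forward counter-clockwise */
        }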
  • Embodiments of the disclosure may be further characterized, without limitation, as set forth below.
  • Embodiment 1. A data handling device, comprising:
  • a data requestor endpoint configured for originating first packet requests on a first packetized memory link; and
  • a data handling endpoint configured for: interpreting second packet requests to the data handling endpoint on a second packetized memory link; and conveying data bidirectionally across the second packetized memory link in response to the second packet requests;
  • wherein the first packetized memory link and the second packetized memory link are separate but include a same type of link protocol and a same type of physical interface.
  • Embodiment 2 The data handling device of Embodiment 1, further comprising one or more data handling elements operably coupled to one or more of the data requestor endpoint and the data handling endpoint, each of the one or more data handling elements comprising one or more processors and one or more communication elements.
  • Embodiment 3 The data handling device of Embodiment 2, further comprising a data buffer operably coupled between the data requestor endpoint and the one or more data handling elements, the data buffer for defining an address space for the data handling endpoint.
  • Embodiment 4. The data handling device according to any of Embodiments 1 through 3, wherein the first packetized memory link and the second packetized memory link are both hybrid memory cube links.
  • Embodiment 5. The data handling device according to any of Embodiments 1 through 3, wherein the data handling endpoint is further configured for originating third packet requests on the second packetized memory link.
  • Embodiment 6. The data handling device according to any of Embodiments 1 through 3, wherein the data requestor endpoint is further configured for: interpreting third packet requests to the data requestor endpoint on the first packetized memory link; and conveying data bidirectionally across the first packetized memory link in response to the third packet requests.
  • Embodiment 7 A data handling device, comprising:
  • a first hybrid memory cube interface operably coupled to a host requestor endpoint on the data handling device, the host requestor endpoint for originating packetized memory requests to a local memory domain comprising one or more hybrid memory cube devices;
  • a second hybrid memory cube interface operably coupled to a data handling endpoint on the data handling device, the data handling endpoint for interpreting packetized system requests from an additional data handling device operably coupled to at least one of the one or more hybrid memory cube devices.
  • Embodiment 8 The data handling device of Embodiment 7, wherein the data handling endpoint is further for conveying data in response to the packetized system requests from the additional data handling device.
  • Embodiment 9. The data handling device of Embodiment 7, wherein at least one of the host requestor endpoint and the data handling endpoint is further for originating additional packetized system requests to the additional data handling device.
  • Embodiment 10. The data handling device of Embodiment 7, wherein at least one of the host requestor endpoint and the data handling endpoint is further for originating additional packetized memory requests to one or more additional hybrid memory cube devices in a remote memory domain correlated with the additional data handling device.
  • Embodiment 11.
  • Embodiment 12 The data handling device of Embodiment 7, further comprising a data buffer operably coupled to one or more of the host requestor endpoint and the data handling endpoint, the data buffer for defining an address space for link requests to the data handling device.
  • Embodiment 13. A data processing system, comprising:
  • two or more data handling devices, each data handling device comprising:
  • a host requestor endpoint configured for originating first packet requests on a first packetized memory link; and
  • a data handling endpoint configured for receiving and responding to second packet requests to the data handling endpoint on a second packetized memory link; and
  • a first hybrid memory cube device associated with a first memory domain of a corresponding one of the two or more data handling devices, wherein the first hybrid memory cube device is configured to chain and pass the second packet requests between two of the two or more data handling devices.
  • Embodiment 14 The data processing system of Embodiment 13, further comprising a second hybrid memory cube device associated with a second memory domain of a corresponding one of the two or more data handling devices, wherein the second hybrid memory cube device is configured to chain and pass the second packet requests between the data handling device associated with the second memory domain and the first hybrid memory cube device.
  • Embodiment 15 The data processing system of Embodiment 14, wherein the originated first packet requests from the host requestor endpoint of one of the two or more data handling devices is chained and passed to the data handling endpoint of another of the two or more data handling devices.
  • Embodiment 16 The data processing system according to either of Embodiment 14 and Embodiment 15, wherein each of the first memory domain and the second memory domain includes at least one additional hybrid memory cube device.
  • Embodiment 17 The data processing system of Embodiment 16, further comprising at least one inter-domain link between an additional hybrid memory cube in the first memory domain and an additional hybrid memory cube in the second memory domain.
  • Embodiment 18 The data processing system according to any of Embodiments 14 through 17, wherein each of the two or more data handling devices further comprise a data buffer operably coupled to one or more of the host requestor endpoint and the data handling endpoint, the data buffer for defining an address space for link requests to the data handling device.
  • Embodiment 19. A data processing system, comprising:
  • two or more data handling devices, each data handling device comprising:
  • a host requestor endpoint configured for originating local memory packet requests on a first packetized memory link
  • a data handling endpoint configured for receiving and responding to second packet requests to the data handling endpoint on a second packetized memory link
  • one or more hybrid memory cube hubs comprising:
  • a first packetized memory link operably coupled to the data handling endpoint of one of the two or more data handling devices
  • a second packetized memory link operably coupled to the data handling endpoint of another of the two or more data handling devices.
  • Embodiment 20 The data processing system of Embodiment 19, wherein the data handling endpoint for each of the two or more data handling devices is further configured for originating second packet requests on the second packetized memory link to another of the two or more data handling devices.
  • Embodiment 21 The data processing system of Embodiment 19, further comprising two or more hybrid memory cube devices, each hybrid memory cube device operably coupled to the host requestor endpoint of a corresponding one of the two or more data handling devices.
  • Embodiment 22 The data processing system of Embodiment 19, wherein at least one of the one or more hybrid memory cube hubs includes at least one additional packetized memory link operably coupled to another of the one or more hybrid memory cube hubs.
  • Embodiment 23 The data processing system of Embodiment 19, wherein each of the two or more data handling devices further comprise a data buffer operably coupled to one or more of the host requestor endpoint and the data handling endpoint, the data buffer for defining an address space for link requests to the data handling device.
  • Embodiment 24 The data processing system of Embodiment 19, wherein the one or more hybrid memory cube hubs comprise at least two hybrid memory cube hubs arranged in a ring topology.
  • Embodiment 25 The data processing system of Embodiment 19, wherein the one or more hybrid memory cube hubs comprise at least two hybrid memory cube hubs arranged in a hybrid ring topology.
  • Embodiment 26 The data processing system according to any of Embodiments 19 through 25, wherein the one or more hybrid memory cube hubs comprise at least two hybrid memory cube hubs arranged in a mesh topology.
  • Embodiment 27. A method of conveying data with a data handling device, comprising:
  • originating packetized memory requests on a first hybrid memory cube link of a first data handling device to a hybrid memory cube device in a first memory domain associated with the first data handling device;
  • receiving packetized system requests on a second hybrid memory cube link of the first data handling device, wherein the packetized system requests originate from a second data handling device; and
  • responding to the packetized system requests.
  • Embodiment 28 The method of Embodiment 27, further comprising buffering data received with the packetized system requests on the first data handling device to define an address space for the packetized system requests to the first data handling device.
  • Embodiment 29. The method according to either of Embodiment 27 and Embodiment 28, further comprising buffering read data to be sent when responding to the packetized system requests to define an address space on the first data handling device.
  • Embodiment 30 The method according to any of Embodiments 27 through 29, further comprising originating packetized system requests on the first hybrid memory cube link of the first data handling device to the second data handling device.
  • Embodiment 31 The method according to any of Embodiments 27 through 29, further comprising originating packetized system requests on the second hybrid memory cube link of the first data handling device to the second data handling device.
  • Embodiment 32 The method according to any of Embodiments 27 through 29, further comprising originating packetized memory requests on the first hybrid memory cube link of the first data handling device to a hybrid memory cube device in a second memory domain associated with the second data handling device.
  • Embodiment 33 The method according to any of Embodiments 27 through 29, further comprising originating packetized memory requests on the first hybrid memory cube link of the first data handling device to a hybrid memory cube device in a second memory domain associated with the second data handling device.
  • Embodiment 34 A method of conveying data in a system, comprising:
  • originating memory requests from a host requestor endpoint on a first data handling device;
  • sending the memory requests on a first packetized memory link coupled to the first data handling device to a first hybrid memory cube in a first memory domain associated with the first data handling device;
  • receiving system requests at the first hybrid memory cube, wherein the system requests are from a second data handling device; and
  • passing the system requests from the first hybrid memory cube to a data handling endpoint on the first data handling device via a second packetized memory link coupled to the first data handling device.
  • Embodiment 35. The method of Embodiment 34, further comprising: originating the system requests from the host requestor endpoint on the second data handling device; and before receiving the system requests at the first hybrid memory cube, receiving the system requests at the second hybrid memory cube and passing the system requests from the second hybrid memory cube to the first hybrid memory cube.
  • Embodiment 36. The method according to either of Embodiment 34 and Embodiment 35, further comprising passing some of the memory requests from the first hybrid memory cube in the first memory domain to the second data handling device.
  • Embodiment 37 The method according to any of Embodiments 34 through 36, further comprising passing some of the memory requests from the first hybrid memory cube in the first memory domain to another hybrid memory cube in the first memory domain.
  • Embodiment 38 The method according to any of Embodiments 34 through 37, further comprising passing some of the memory requests from the first hybrid memory cube in the first memory domain to a second hybrid memory cube in a second memory domain associated with the second data handling device.
  • Embodiment 39 The method of Embodiment 38, further comprising passing some of the memory requests from the second hybrid memory cube to the second data handling device.
  • Embodiment 40 The method of Embodiment 38, further comprising passing some of the memory requests from the second hybrid memory cube to a third hybrid memory cube in the second memory domain.
  • Embodiment 41 A method of conveying data in a system, comprising:
  • originating memory requests from a host requestor endpoint on a first data handling device; sending the memory requests on a first packetized memory link coupled to the first data handling device to a first hybrid memory cube in a first memory domain associated with the first data handling device; originating system requests from a data handling endpoint on the first data handling device; sending the system requests on a second packetized memory link coupled to the first data handling device to a hybrid memory cube hub; and passing some of the system requests from the hybrid memory cube hub to a second data handling device.
  • Embodiment 42 The method of Embodiment 41, further comprising:
  • Embodiment 43 The method according to either of Embodiment 41 and Embodiment 42, further comprising passing some of the system requests from the hybrid memory cube hub to one or more additional memory cube hubs.
  • Embodiment 44 The method of Embodiment 43, further comprising passing some of the system requests from the one or more additional memory cube hubs to one or more additional data handling devices.
  • Embodiment 45 The method of Embodiment 43, wherein passing some of the system requests between the hybrid memory cube hub and the one or more additional memory cube hubs comprises passing the system requests in an interconnect topology selected from the group consisting of a ring topology, a modified ring topology, and a mesh topology.
  • Embodiment 46 The method of Embodiment 43, further comprising passing some of the memory requests from the first hybrid memory cube in the first memory domain to another hybrid memory cube in the first memory domain.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Human Computer Interaction (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • Multi Processors (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

System on a Chip (SoC) devices include two packetized memory busses for conveying local memory packets and system interconnect packets. In an in situ configuration of a data processing system, two or more SoCs are coupled with one or more hybrid memory cubes (HMCs). The memory packets enable communication with local HMCs in a given SoC's memory domain. The system interconnect packets enable communication between SoCs and between memory domains. In a dedicated routing configuration, each SoC in a system has its own memory domain to address local HMCs and a separate system interconnect domain to address HMC hubs, HMC memory devices, or other SoC devices connected in the system interconnect domain.

Description

INTERCONNECT SYSTEMS AND METHODS USING
HYBRID MEMORY CUBE LINKS
PRIORITY CLAIM
This application claims the benefit of the filing date of United States Patent Application Serial Number 14/273,867, filed May 9, 2014, for "INTERCONNECT SYSTEMS AND METHODS USING HYBRID MEMORY CUBE LINKS."
TECHNICAL FIELD
The present disclosure relates generally to interconnection of devices and related methods, such as semiconductor memory devices, processing devices, memory systems, and processing systems. More particularly, the present disclosure relates to interconnection of such devices and systems using Hybrid Memory Cube links.
BACKGROUND
Memory devices are typically provided in many data processing systems as semiconductor integrated circuits and/or external removable devices in computers or other electronic devices. There are many different types of memory including Random Access Memory (RAM), Read Only Memory (ROM), Dynamic Random Access Memory
(DRAM), Synchronous DRAM (SDRAM), flash memory, and resistance variable memory, among others.
Conventional memory systems typically consist of one or more memory devices, such as DRAMs, mounted on a Printed Circuit Board (PCB) called a Dual In-line Memory Module (DIMM). The memory system is in communication with a memory control subsystem or central processing unit (CPU) or microprocessor. In some configurations, the memory controller is physically subsumed into the same physical chip as the processor. In other configurations, the memory controller may be just one of many logical components comprising a memory controller hub. A memory controller hub typically supports completely separate and distinct memory address spaces, often using different types of semiconductor memory or for different purposes. For example, a memory controller may support the use of video DRAM for graphics applications, flash memory for disk-drive acceleration, and commodity DRAM as the processor's main external memory.
The limitations imposed by memory protocols, traditional memory subsystem architectures, standards, processor-specific memory access models, end-user configurability requirements, power constraints, or combinations of those limitations tend to interact in such a manner as to reduce performance and result in non-optimal memory subsystems. Recently, Memory Control Hubs (MCHs) have been proposed to enhance memory performance between processors and memory subsystems. However, MCHs are defined primarily as a memory subsystem for a single processor. Many general purpose system architectures include multiple processors, each possibly with its own memory domain. Often these multiple processors need to communicate between themselves. As a result, private processor communication busses have been proposed to enhance system interconnection.
However, the current generation of general purpose system interconnect specifications does not provide the functionality, flexibility, and performance necessary to maintain appropriate balance in systems whose main memory is based upon high bandwidth devices such as those proposed with the HMC specification. It is common to find system architectures that maintain many hundreds of gigabytes per second of access to local memory bandwidth, but provide only a small fraction (on the order of 1/10th) of this bandwidth to the system interconnect. The result is a highly imbalanced system.
This phenomenon is especially evident in applications with multiple threads (e.g., tasks) of execution distributed among multiple processing sockets/devices. If the core processor supports functional data caching, the cache coherency mechanism required between the processor sockets must support a local memory bandwidth that may be an order of magnitude larger than the bandwidth of the system interconnect. The result, again, is a highly imbalanced system.
There is a need for interconnect systems and methodologies that provide more balanced system bandwidth and can also reduce the complexity needed to design such interconnect systems.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a diagram of a data processing system including a hybrid memory cube as an example of a device for operation on a memory bus using an abstracted memory protocol.
FIG. 2 illustrates a possible partitioning of DRAMs in a hybrid memory cube.
FIG. 3 illustrates a logical partitioning of DRAMs in a hybrid memory cube.
FIG. 4 illustrates a logic base for the link interfaces and for controlling the DRAMs in a hybrid memory cube.
FIG. 5 illustrates some elements that may be present in a data handling device according to some embodiments of the present disclosure.
FIG. 6 illustrates a diagram of a system using in-situ routing between various data handling devices and memory devices and showing sparse routing between the memory devices.
FIG. 7 illustrates a diagram of a system using in-situ routing between various data handling devices and memory devices and showing dense routing between the memory devices.
FIG. 8 illustrates a diagram of a system using dedicated routing between various data handling devices and memory devices.
FIG. 9 illustrates various example topologies that may be used in systems with the dedicated routing of FIG. 8.
MODE(S) FOR CARRYING OUT THE INVENTION
In the following detailed description, reference is made to the accompanying drawings, which form a part hereof, and in which are shown, by way of illustration, specific example embodiments in which the present disclosure may be practiced. These embodiments are described in sufficient detail to enable a person of ordinary skill in the art to practice the present disclosure. However, other embodiments may be utilized, and structural, material, and process changes may be made without departing from the scope of the disclosure. The illustrations presented herein are not meant to be actual views of any particular method, system, device, or structure, but are merely idealized representations that are employed to describe the embodiments of the present disclosure. The drawings presented herein are not necessarily drawn to scale. Similar structures or components in the various drawings may retain the same or similar numbering for the convenience of the reader; however, the similarity in numbering does not mean that the structures or components are necessarily identical in size, composition, configuration, or any other property.
Elements, circuits, modules, and functions may be shown in block diagram form in order not to obscure the present disclosure in unnecessary detail. Moreover, specific implementations shown and described are exemplary only and should not be construed as the only way to implement the present disclosure unless specified otherwise herein. Additionally, block definitions and partitioning of logic between various blocks is exemplary of a specific implementation. It will be readily apparent to one of ordinary skill in the art that the present disclosure may be practiced by numerous other partitioning solutions. For the most part, details concerning timing considerations and the like have been omitted where such details are not necessary to obtain a complete understanding of the present disclosure and are within the abilities of persons of ordinary skill in the relevant art.
Those of ordinary skill would appreciate that the various illustrative logical blocks, modules, circuits, and algorithm acts described in connection with embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and acts are described generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments described herein.
When implemented with hardware, the embodiments disclosed herein may be implemented or performed with a general purpose processor, a special purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a
microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. When executing software for carrying out processes for embodiments described herein, a general-purpose processor should be considered a special-purpose processor configured for carrying out such processes. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
In addition, it is noted that the embodiments may be described in terms of a process that is depicted as a flowchart, a flow diagram, a structure diagram, or a block diagram.
Although a flowchart may describe operational acts as a sequential process, many of these acts can be performed in another sequence, in parallel, or substantially concurrently. In addition, the order of the acts may be re-arranged. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. Furthermore, the methods disclosed herein may be implemented in hardware, software, or both. If implemented in software, the functions may be stored or transmitted as one or more instructions or code on a
computer-readable medium. Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another.
Those of ordinary skill in the art will understand that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout this description may be represented by voltages, currents,
electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof. Some drawings may illustrate signals as a single signal for clarity of presentation and description. It will be understood by a person of ordinary skill in the art that the signal may represent a bus for carrying the signals, wherein the bus may have a variety of bit widths.
It should be understood that any reference to an element herein using a designation such as "first," "second," and so forth does not limit the quantity or order of those elements, unless such limitation is explicitly stated. Rather, these designations may be used herein as a convenient method of distinguishing between two or more elements or instances of an element. Thus, a reference to first and second elements does not mean that only two elements may be employed there or that the first element must precede the second element in some manner. In addition, unless stated otherwise, a set of elements may comprise one or more elements.
Elements described herein may include multiple instances of the same element. These elements may be generically indicated by a numerical designator (e.g., 110) and specifically indicated by the numerical indicator followed by an alphabetic designator (e.g., 110A) or a numeric indicator preceded by a "dash" (e.g., 110-1). For ease of following the description, for the most part element number indicators begin with the number of the drawing on which the elements are introduced or most fully discussed. Thus, for example, element identifiers in FIG. 1 will be mostly in the numerical format 1xx and elements in FIG. 4 will be mostly in the numerical format 4xx.
As used herein, the term "substantially" in reference to a given parameter, property, or condition means and includes to a degree that one of ordinary skill in the art would understand that the given parameter, property, or condition is met with a small degree of variance, such as within acceptable manufacturing tolerances. By way of example, depending on the particular parameter, property, or condition that is substantially met, the parameter, property, or condition may be at least 90% met, at least 95% met, or even at least 99% met.
As used herein, any relational term, such as "over," "under," "on," "underlying," "upper," "lower," etc., is used for clarity and convenience in understanding the disclosure and accompanying drawings and does not connote or depend on any specific preference, orientation, or order, except where the context clearly indicates otherwise.
It will be understood that when an element is referred to as being "on," "connected to," "coupled to," or "coupled with" another element, it can be directly on, connected, or coupled with the other element or intervening elements may be present. In contrast, when an element is referred to as being "directly on," "directly connected to" or "directly coupled with" another element, there are no intervening elements or layers present. As used herein, the term "and/or" includes any and all combinations of a number of the associated listed items.
The present disclosure describes methods and apparatus for improving memory subsystems by providing more balanced system bandwidth and enabling reduced complexity of the design and use of such interconnect systems.
FIG. 1 is a diagram of a data processing system 100 including a hybrid memory cube device 200 as an example of a device for operation on a memory bus using an abstracted memory protocol 130 to communicate with a data handling device 500. For ease of description, this disclosure focuses on HMC protocol busses. However, as those in the art having the benefit of this disclosure will appreciate, embodiments of the present disclosure may be practiced with other high-speed data busses that include an abstraction between devices holding the data and the protocol on the data bus.
The term "data handling device" 500 is used herein to distinguish devices on a memory bus that are configured mostly as consumers and generators of data, rather than devices for storing data, such as a DRAM memory. As non-limiting examples, data handling devices 500 can be considered processors (also referred to herein as processing devices), such as, for example, general purpose processors, special purpose processors, graphics processors, and digital signal processors. As another non-limiting example, data handling devices 500 can be considered communication devices. For example, a communication type data handling device 500 may be configured to convey data between a memory bus and some other type of communication bus, such as, for example, an Input/Output (10) bus or a network bus. Of course, data handling devices 500 may also include both processor elements and
communication elements. As such, the description herein may also describe a data handling device 500 as a System on a Chip (SoC) 500. Unless specifically stated otherwise, a SoC 500 as referred to herein should be considered equivalent to a data handling device 500. Finally, while data handling devices 500 may be considered to be focused on processing and moving data, they may also contain significant amounts of memory in the form of registers, buffers, caches, and other types of local memory on the SoC 500. Additional details of the SoC 500 are discussed below in combination with FIG. 5.
The hybrid memory cube device 200 (HMC 200) includes a logic base 400, which defines the abstracted memory protocol 130 to create memory links 120 between the SoC 500 and the HMC 200. A group of parallel busses 410 interface between the logic base 400 and a group of DRAMs 250 on the HMC 200. Additional details of the HMC 200 are discussed below in connection with FIGS. 2-4.
The memory links 120 are partitioned into upstream links headed toward the SoC 500 and downstream links headed away from the SoC 500. As part of the abstracted memory protocol 130 the memory links 120 are packetized as is explained more fully below. As a result, the memory links 120 are also referred to herein as packetized memory links 120 as well as hybrid memory cube links 120. Moreover, the packets conveyed on the memory links 120 are referred to as packet requests and packetized requests.
FIG. 2 illustrates a possible partitioning of DRAMs 250 in the HMC 200. The HMC 200 may be considered as a 3-dimensional stack of DRAM die 250 coupled to the logic base 400. The logic base 400 may be configured as a separate die and configured to interface with the DRAM die 250. When stacked, interconnect between the various die may be accomplished with through-silicon vias. While these devices may be physically configured as a 3-dimensional stack, they do not need to be so configured, but can still be thought of as 3-dimensional from an interconnect perspective.
FIG. 3 illustrates a logical partitioning of DRAMs 250 in a HMC 200. Referring to
FIGS. 2 and 3, the interconnection of multiple die layers enables a memory device with a combination of memory storage layers and one or more logic layers. In this manner, the device provides the physical memory storage and logical memory transaction processing in a single die package configured as the HMC 200. The end result is a very compact, power-efficient package with an available bandwidth capacity of up to 320 GB/s per device.
The HMC 200 is capable of such bandwidth via a hierarchical and parallel approach to the design. For example, device hierarchy may occur vertically across the logic layers and the hardware parallelism may occur across a given die layer. The logic base 400 includes multiple components that provide both external link access to the HMC 200 as well as internal routing and transaction logic.
The HMC 200 may be segmented into vertical slices 220 often referred to as
"vaults 220." Each vault 220 may include vault logic 450 incorporated into the logic base 400 to control segments of the DRAMs 250 associated with that vault 220. The vault logic 450 manages memory reference operations to memory partitions within its vault 220. Each vault controller 450 may determine its own timing requirements and refresh operations, which allows different timing for each vault 220 and also eliminates the need for these functions in a host memory controller. In addition, a queue may be included with each vault controller 450 to buffer references for that vault's memory. The vault controllers 450 may execute references within their queue based on need rather than order of arrival. Therefore, responses from vault operations back to the external memory links 120 (FIG. 1) may be out of order in some cases.
The memory links 120 may be configured to provide four or eight logical links. Each link may be configured as a group of sixteen or eight serial and bidirectional I/O links.
Devices configured with four links have the ability to operate at 10, 12.5, and 15 Gbps. Devices configured with eight links have the ability to operate at 10 Gbps.
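As a rough cross-check of the 320 GB/s aggregate figure quoted earlier, the sketch below multiplies out one plausible configuration (eight links, sixteen lanes per direction, 10 Gbps per lane, with both directions counted). The lane and link counts are assumptions drawn from the link description above, not a normative calculation from the specification.

```c
#include <stdio.h>

int main(void)
{
    /* Back-of-the-envelope check of the quoted 320 GB/s aggregate figure:
     * 8 links x 16 lanes per direction x 10 Gb/s per lane, both directions. */
    const double links = 8.0, lanes_per_direction = 16.0, gbps_per_lane = 10.0;
    double per_direction_GBps = links * lanes_per_direction * gbps_per_lane / 8.0;
    printf("per direction: %.0f GB/s, bidirectional: %.0f GB/s\n",
           per_direction_GBps, 2.0 * per_direction_GBps); /* 160 and 320 GB/s */
    return 0;
}
```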
Considering the hierarchical nature of the physical memory storage, the HMC specification defines a different physical addressing and interleave model than traditional banked DRAM devices. Physical addresses for HMC devices 200 are encoded into a 34-bit field that contains the vault, bank, and address bits. Rather than relying on a single addressing structure, the specification permits the implementer and user to define an address mapping scheme that is most optimized for the target memory access characteristics. It also provides a series of default address map modes that join the physical vault and bank structure to the desired maximum block request size. The default map schemas implement a low interleave model by mapping the less significant address bits to the vault address, followed immediately by the bank address bits. This method forces sequential addresses to first interleave across vaults and then across banks within a vault in order to avoid bank conflicts.
All in-band communication between host devices (e.g., SoCs 500) and HMC devices 200 is performed via a packetized format. This format includes three major packet classifications: request packets, response packets, and flow control packets. Packets may be configured as multiples of a single 16-byte flow unit (also referred to as a FLIT). Packet sizes may be as large as 9 FLITs (i.e., 144 bytes). The smallest packet may include only a single 16-byte FLIT containing a packet header and packet tail.
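The default low-interleave map described above can be illustrated with a small decode routine. The field widths and block size below are placeholders chosen for illustration (actual widths depend on the device configuration and the selected address map mode); only the ordering, with vault bits in the least significant positions and bank bits immediately above them, reflects the text.

```c
#include <stdint.h>

/* Illustrative decode of the default low-interleave map: vault bits sit in
 * the least significant positions of the block-aligned address, followed
 * immediately by the bank bits.  The block size and field widths below are
 * placeholders, not values taken from the HMC specification.              */
#define BLOCK_OFFSET_BITS 5u   /* assumed 32-byte maximum block request size */
#define VAULT_BITS        5u   /* assumed vault field width                  */
#define BANK_BITS         4u   /* assumed bank field width                   */

struct hmc_decoded_addr {
    uint32_t vault;
    uint32_t bank;
    uint64_t upper_bits;       /* remaining DRAM row/column address bits     */
};

static struct hmc_decoded_addr decode_default_map(uint64_t phys_addr_34bit)
{
    uint64_t a = phys_addr_34bit >> BLOCK_OFFSET_BITS;
    struct hmc_decoded_addr d;
    d.vault      = (uint32_t)(a & ((1u << VAULT_BITS) - 1u));
    d.bank       = (uint32_t)((a >> VAULT_BITS) & ((1u << BANK_BITS) - 1u));
    d.upper_bits = a >> (VAULT_BITS + BANK_BITS);
    return d;
}
/* Consecutive block addresses therefore land on consecutive vaults first,
 * then on consecutive banks within a vault, which avoids bank conflicts.   */
```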
Memory read request packets for all memory payload sizes only require the packet header, packet tail, and the respective physical memory address. As such, read requests may be configured using a single FLIT. Memory read responses are separate packets that include the data from the address requested in the corresponding memory read packet. Write request and atomic request packets, however, must also contain the associated input data for write and read-modify-write operations, respectively. As such, these request types may have packet widths of 2-9 FLITs. The HMC specification defines a weak-ordering model between packets. As such, there may be multiple packet reordering points within a target implementation. Arriving packets that are destined for ancillary devices may pass those waiting for local vault access. Local vaults may also reorder queued packets in order to make the most efficient use of bandwidth to and from the respective vault banks. However, reordering points present in a given HMC implementation may be defined to maintain the order of a stream of packets from a specific link to a specific bank within a vault. This ordering ensures that memory write requests followed by memory read requests deliver correct and deterministic behavior.
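A quick way to see how the FLIT sizes above compose is to compute a packet's length from its payload: one FLIT of header-plus-tail overhead plus one FLIT per 16 bytes of payload, capped at 9 FLITs. The helper below is a sketch under those assumptions and does not model the actual header fields.

```c
#include <stdint.h>

/* Sketch of packet sizing from the figures above: a 16-byte FLIT, one FLIT
 * of header-plus-tail overhead, and a 9-FLIT (144-byte) ceiling.  This is
 * not a field-accurate encoding of the HMC packet format.                 */
enum { FLIT_BYTES = 16, MAX_PACKET_FLITS = 9 };

static int request_packet_flits(uint32_t payload_bytes)
{
    uint32_t flits = 1u + (payload_bytes + FLIT_BYTES - 1u) / FLIT_BYTES;
    if (flits > MAX_PACKET_FLITS)
        return -1;             /* payload too large for a single packet     */
    return (int)flits;
}

/* request_packet_flits(0)   -> 1 FLIT  (e.g., a memory read request)
 * request_packet_flits(64)  -> 5 FLITs (e.g., a 64-byte write request)
 * request_packet_flits(128) -> 9 FLITs (largest single-packet payload)     */
```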
The link structure in the HMC 200 enables chaining of multiple HMCs 200 to enable the construction of memory subsystems that require capacities larger than a single HMC 200 device while maintaining the link structure and packetized transaction protocols. Additional details regarding the chaining are discussed below with reference to FIGS. 6-9.
FIG. 4 illustrates a logic base 400, which may be used for creating the link interfaces 120 and controlling the DRAMs 250 (FIGS. 1-3) in a HMC 200. The memory links 120, which include upstream links and downstream links, may be controlled by a link interface controller 420 for each memory link 120. Packets passed through the link interface controllers 420 may be passed through a crossbar switch 430. If a packet is destined for a vault on the HMC 200, the crossbar switch 430 may pass the packet to memory control logic 440. If a packet is destined for another HMC 200, the crossbar switch 430 may pass the packet to an appropriate link interface controller 420 to be sent on to the appropriate HMC 200. The memory control logic 440 and the vault logic 450 for the various vaults may combine to select the appropriate vault and appropriate timing for the selected vault.
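The forwarding decision made by the crossbar switch 430 can be sketched as a simple comparison of the packet's destination device ID (the CUB field discussed below) with the cube's own ID, with a host-loaded table selecting the exit link for pass-through traffic. The names and table layout below are assumptions made for illustration only.

```c
/* Illustrative forwarding decision in the logic base 400: compare the
 * packet's destination device ID (the CUB field) with this cube's own ID
 * and either hand the packet to the memory control logic or chain it out
 * on a link chosen from a host-loaded routing table.                      */
enum route_target { ROUTE_LOCAL_VAULTS, ROUTE_LINK };

struct route_decision {
    enum route_target target;
    unsigned          link;    /* valid when target == ROUTE_LINK           */
};

struct hmc_routing_cfg {
    unsigned own_cub;          /* this cube's 3-bit device ID               */
    unsigned link_for_cub[8];  /* host-configured exit link per destination */
};

static struct route_decision crossbar_route(const struct hmc_routing_cfg *cfg,
                                            unsigned packet_cub)
{
    struct route_decision d;
    if (packet_cub == cfg->own_cub) {
        d.target = ROUTE_LOCAL_VAULTS;   /* memory control logic picks a vault */
        d.link   = 0;
    } else {
        d.target = ROUTE_LINK;           /* chain toward the destination cube  */
        d.link   = cfg->link_for_cub[packet_cub & 0x7u];
    }
    return d;
}
```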
FIG. 5 illustrates some elements that may be present in a data handling device 500 according to some embodiments of the present disclosure. According to embodiments of the present disclosure, systems and methods may use HMC 200 devices and the memory interconnect protocols defined for HMC 200 as the basis for a more global system interconnect between multiple SoCs 500. Providing basic system interconnection capabilities using the HMC 200 device and link specification thus consolidates the number, density, and heterogeneity of the outgoing links from a host system (e.g., a SoC 500).
Devices and systems constructed using the present disclosure may define system characteristics that are improvements over conventional multiprocessor system architectures. Some of these characteristics include high bandwidth memory and system interconnect links, balanced bandwidth and latency characteristics between locally connected memories and other system-level memories, latency minimization by reducing and/or eliminating protocol translations between local memory requests and system-level requests, and latency minimization by utilizing the efficient HMC 200 packet protocol for both local memory requests and system-level requests. The characteristics may also include maintaining atomicity between local memories and system-level memories over the HMC 200 system interconnect using the same protocol, support for a wide spectrum of system-level memory models (e.g., weak versus strong ordering), and support for cache coherency.
System configurations may generally be considered as including in situ routing as shown in FIGS. 6 and 7 and dedicated routing as shown in FIGS. 8 and 9. The two potential implementations represent two different scalability models. The in situ routing model provides efficient system-level scalability for multi-socket workstations, data center servers, and other basic infrastructure devices. The dedicated routing model provides efficient scalability beyond a small number of sockets. This scalability is analogous to building large enterprise server or mainframe platforms. Both methodologies provide the ability to construct system architectures that are SoC-centric and support architectures that are Non-Uniform Memory Access (NUMA) in nature.
In both models, the SoC 500 presents an HMC "source" link to the HMC
infrastructure. The source link may also be referred to herein as a second packetized memory link. With this second link, the SoC 500 inherits the ability to send and receive system link traffic. This extra link enables support of direct messaging from SoC to SoC. One example of such functionality is cache coherency traffic. For example, a system vendor may encode cache coherency requests (e.g., coherency lookups or invalidations) into HMC atomic request packets. One could also encode SoC 500 to SoC 500 messaging packets in the HMC base specification for read, write, posted read and posted write requests. This ability for system vendors to encode protocols in the HMC packet specification allows them to retain their respective intellectual property and provide high bandwidth, low latency system interconnect support.
Returning to FIG. 5, a conventional SoC 500 configured for an HMC interface may be as simple as one or more processor(s) 540 and a data requestor endpoint 510 coupled to a packetized memory link 120 (e.g., an HMC link 120) through a first hybrid memory cube interface 122. The data requestor endpoint 510 may also be referred to herein as a host requestor endpoint 510. Conventionally a host only needs to make packet requests on an HMC 200 interface to perform functions such as, for example, memory reads, memory writes, and configuration definition packets.
Embodiments of the present disclosure, however, include a data handling
endpoint 520 coupled to a second packetized memory link 620 through a second hybrid memory cube interface 622. Physically and logically, the second packetized memory link 620 is similar to a memory link on an HMC 200 device. In other words, the data handling endpoint 520 behaves similarly to a memory endpoint. Thus, the data handling endpoint 520 interprets packet requests that look like memory reads, memory writes, or other configuration-type packets, consumes data on memory writes, and generates response packets of data for memory reads. With a data handling endpoint 520, systems can be created wherein the second packetized memory link 620 can be used as a system interconnection to other SoCs 500 and memory domains associated with the other SoCs 500. Thus, while the second packetized memory link 620 is physically and logically the same as the hybrid memory cube link 120, from an architectural perspective it can be treated as a link for conveying packetized system requests, creating flexible and efficient system interconnections.
Moreover, since the data requestor endpoint 510 and the data handling endpoint 520 are similar, much of the logic design for the two endpoints can be reused rather than creating two separate busses with separate protocols as in conventional multi-processor systems.
As stated previously, a SoC 500 may be considered a processing device wherein the processors 540 could be implemented as a general purpose processor, a DSP, a special purpose processor, a graphics processor, or a combination thereof. However, the SoC 500 may also be implemented primarily as a communication device. In such an implementation, one or more communication elements 550 may be included to translate packets from the data handling endpoint 520 to another bus 560. This other bus 560 may be, for example, a bus to an I/O hub, another communication device, storage devices, a network, or combinations thereof. Of course, the SoC 500 may include both processors 540 and communication elements 550. Thus, processors 540 and communication elements 550 may be referred to generically as data handling elements (540, 550).
Since the data handling endpoint 520 behaves similar to a memory endpoint, packets handled by the data handling endpoint 520 have addresses associated with them and data may be conveyed in large bursts. In some embodiments, the processors 540 and/or communication elements 550 may have memory associated with them with their own addresses such that data can be conveyed directly between the data handling endpoint 520 and the appropriate data handling elements (540, 550).
Other embodiments may include a data buffer 530 for defining an address space for link requests to the data handling device 500. With the data buffer 530, a separate dedicated address space can be defined, and the data buffer 530 can collect data before passing it on to the appropriate data handling elements (540, 550). The data buffer 530 may be configured as a Direct Memory Access (DMA) buffer or a First In First Out (FIFO) buffer that permits SoCs 500 to send traffic asynchronously to one another. The respective size of the data buffer 530 may be determined by the number and frequency of the associated HMC links 620.
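A minimal sketch of a FIFO-style data buffer 530 is shown below: incoming payloads from the system link are queued until a data handling element drains them, which is what allows SoCs to exchange traffic asynchronously. The depth and entry size are placeholders, not values from the disclosure.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Minimal FIFO sketch for the data buffer 530: payloads arriving on the
 * system link are queued until a data handling element drains them.
 * Depth and entry size are placeholders.                                  */
#define BUF_ENTRIES  64
#define ENTRY_BYTES  128            /* e.g., room for an 8-FLIT payload     */

struct link_fifo {
    uint8_t data[BUF_ENTRIES][ENTRY_BYTES];
    size_t  len[BUF_ENTRIES];
    size_t  head, tail, count;
};

static bool fifo_push(struct link_fifo *f, const void *payload, size_t n)
{
    if (f->count == BUF_ENTRIES || n > ENTRY_BYTES)
        return false;               /* full: back-pressure the link instead */
    memcpy(f->data[f->tail], payload, n);
    f->len[f->tail] = n;
    f->tail = (f->tail + 1) % BUF_ENTRIES;
    f->count++;
    return true;
}

static size_t fifo_pop(struct link_fifo *f, void *out, size_t max)
{
    if (f->count == 0)
        return 0;
    size_t n = f->len[f->head] < max ? f->len[f->head] : max;
    memcpy(out, f->data[f->head], n);
    f->head = (f->head + 1) % BUF_ENTRIES;
    f->count--;
    return n;
}
```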
In addition, the SoC 500 may be configured such that the data requestor endpoint 510 can handle requests to that endpoint in a manner similar to the data handling endpoint 520. Similarly, the data handling endpoint 520 can be configured to originate requests from the data handling elements (540, 550) in a manner similar to the data requestor endpoint 510.
Thus, in a data handling device the data requestor endpoint is configured for originating first packet requests on a first packetized memory link. The data handling endpoint is configured for interpreting second packet requests to the data handling endpoint on a second packetized memory link and conveying data bidirectionally across the second packetized memory link in response to the second packet requests. In addition, the first packetized memory link and the second packetized memory link are separate but include a same type of link protocol and a same type of physical interface.
In another embodiment of a data handling device, a first hybrid memory cube link is operably coupled to a host requestor endpoint on the data handling device, the host requestor endpoint for originating packetized memory requests to a local memory domain including one or more hybrid memory cube devices. A second hybrid memory cube link is operably coupled to a data handling endpoint on the data handling device, the data handling endpoint for interpreting packetized system requests from an additional data handling device operably coupled to at least one of the one or more hybrid memory cube devices.
In another embodiment, a method of conveying data with a data handling device includes using the data handling device to originate packetized memory requests on a first hybrid memory cube link to a hybrid memory cube device in a first memory domain associated with the data handling device. The method also includes using the data handling device to receive packetized system requests on a second hybrid memory cube link, wherein the packetized system requests originate from a second data handling device (not shown in FIG. 5). The method also includes responding to the packetized system requests.
FIG. 6 illustrates a diagram of a system 600 using in-situ routing between various data handling devices 500 and memory devices 200 and showing sparse routing between the memory devices. As stated earlier, multiple HMC devices 200 may be chained together to increase the total memory capacity available to a SoC 500. In a conventional single processor/HMC system, each HMC 200 is identified through the value in a 3-bit chip ID field in the request packet header. The 3-bit chip ID field may also be referred to herein as a CUB field or a device ID. Thus, a network of up to eight HMC devices 200 may be supported for the processor.
Various topologies for interconnection of HMCs 200 are supported and the routing to different HMCs 200 can be complex and include multiple paths. Thus, a host processor is usually in control of the routing topologies and loads routing configuration information into each HMC 200 to determine how packets that are not for that HMC 200 should be routed to other links on the HMC 200. This routing information enables each HMC 200 to use the
CUB field to route request packets to the proper destination. As a result, when an HMC 200 processes a packet that is not destined for itself, the HMC 200 chains and passes the packet through to another link on the HMC 200 to be sent to another HMC 200.
The in situ routing configuration provides system interconnect routing capabilities for a small number of system devices. More specifically, the total number of system devices is gated by the total number of HMC devices 200 present in the system architecture. This limitation follows the base HMC specification's notion that the CUB field is limited to three bits of address field space, which maps to eight total HMC endpoints. In the case of in situ routing, the CUB field is used to denote one or more SoC endpoints. Thus, each SoC 500 and all HMC devices 200 receive a unique CUB identifier for the purpose of routing request traffic between SoC 500 and HMC 200, HMC 200 and HMC 200 or SoC 500 and SoC 500.
In FIG. 6, each of the HMC devices (200-0 through 200-5) is defined with a corresponding device ID 0-5. In addition, a first SoC 500-0 in a socket 0 is defined with a device ID 6 and a second SoC 500-1 in a socket 1 is defined with device ID 7.
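Written out as a table, the FIG. 6 assignment uses the full 3-bit CUB space; the sketch below records that assignment and checks that it fits within the eight available IDs. The type and field names are illustrative; only the numeric IDs come from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* The FIG. 6 in situ assignment written out as a table: one 3-bit CUB value
 * per endpoint, shared by HMC devices and SoCs alike.                      */
enum endpoint_kind { EP_HMC, EP_SOC };

struct endpoint { enum endpoint_kind kind; unsigned cub; };

static const struct endpoint fig6_endpoints[] = {
    { EP_HMC, 0 }, { EP_HMC, 1 }, { EP_HMC, 2 },   /* HMC 200-0 .. 200-2  */
    { EP_HMC, 3 }, { EP_HMC, 4 }, { EP_HMC, 5 },   /* HMC 200-3 .. 200-5  */
    { EP_SOC, 6 },                                 /* SoC 500-0, socket 0 */
    { EP_SOC, 7 },                                 /* SoC 500-1, socket 1 */
};

static void check_cub_space(void)
{
    /* A 3-bit CUB field can name at most eight endpoints in total. */
    assert(sizeof fig6_endpoints / sizeof fig6_endpoints[0] <= 8);
    for (size_t i = 0; i < sizeof fig6_endpoints / sizeof fig6_endpoints[0]; i++)
        assert(fig6_endpoints[i].cub < 8);
}
```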
The in-situ routing configuration can be thought of as having three different types of links. The first link type may be identified as SoC source links 620-0 and 620-1. These SoC source links (620-0, 620-1) may also be referred to as second packetized memory links 620 and second hybrid memory cube links 620, as described above with reference to FIG. 5.
These SoC source links (620-0, 620-1) serve to receive request traffic on the SoC (500-0, 500-1) at its data handling endpoint 520. The SoC source links (620-0, 620-1) permit SoCs (500-0, 500-1) to communicate directly without intermediate double buffering in a main memory space. In this manner, the SoCs (500-0, 500-1) will appear as both an HMC source through the data handling endpoint 520 and a HMC requestor through the data requestor endpoint 510.
The second and third link types map to traditional HMC configurations. The second link type (i.e., an inter-domain memory link 650-0) provides the ability to route traffic across HMC links to neighboring memory domains such as a first memory domain 630 and a second memory domain 640. In other words, the inter-domain memory link 650-0 serves as a bridge between memory domains. Depending upon the target system cost model, system architects can choose the number of links that bridge the gap between the respective NUMA domains using these system links. FIG. 6 illustrates a sparse routing because there is only one inter-domain memory link 650-0.
FIG. 7 illustrates a diagram of a system 700 using in-situ routing between various data handling devices 500 and memory devices 200 and showing dense routing between the memory devices. In FIG. 7, the system is densely routed because there are three inter-domain memory links 650-0, 650-1, and 650-2. The densely connected system architecture provides the ability to configure the memory-to-memory domain topology to create multiple routing paths in order to reduce link hot spotting. Other than the inter-domain memory links 650-0, 650-1, and 650-2, FIG. 7 is similar to FIG. 6 and the elements need not be described again. The third link type is the local request links 120 that route memory traffic for each of the local memory domains, respectively. These links are denoted as 120-0 through 120-5. These links provide traditional HMC 200 memory traffic within a memory domain.
FIGS. 6 and 7 illustrate fully populated systems 600, 700, respectively. In other words every device ID for the current version of the HMC specification is used. Other systems may be used that expand on the device ID. For example, the addition of a single bit to the device ID could expand the number of devices from 8 to 16 and could include any combination of SoCs 500 and HMCs 200.
In addition, systems may be defined that are sparsely populated. For example, while not illustrated, a system could include the socket 0 SoC 500-0, the socket 1 SoC 500-1 and a single HMC 200 (e.g., HMC 200-0). In such a system, the SoC source link 620-1 on the SoC 500-1 may be connected directly to a link on the HMC 200-0 and the local memory link 120-1 on the SoC 500-1 may be connected directly to another link on the HMC 200-0. As a result, packets can still be passed between SoC 500-0 and SoC 500-1 and the two SoCs 500-0 and 500-1 can share access to the memory in HMC 200-0.
Thus, the data processing system includes two or more data handling devices and a hybrid memory cube device. Each data handling device includes a host requestor endpoint configured for originating first packet requests on a first packetized memory link. Each data handling device also includes a data handling endpoint configured for receiving and responding to second packet requests to the data handling endpoint on a second packetized memory link. The hybrid memory cube device is associated with a first memory domain corresponding to one of the two or more data handling devices. The hybrid memory cube device is configured to chain and pass the second packet requests between two of the two or more data handling devices.
In another embodiment, a method of conveying data in a system includes originating memory requests from a host requestor endpoint on a first data handling device. The method also includes sending the memory requests on a first packetized memory link coupled to the first data handling device to a first hybrid memory cube in a first memory domain associated with the first data handling device. The method also includes receiving system requests at the first hybrid memory cube, wherein the system requests are from a second data handling device. The method also includes passing the system requests from the first hybrid memory cube to a data handling endpoint on the first data handling device via a second packetized memory link coupled to the first data handling device. In some embodiments with a second hybrid memory cube, the method may further include originating the system requests from the host requestor endpoint on the second data handling device and, before receiving the system requests at the first hybrid memory cube, receiving the system requests at the second hybrid memory cube and passing the system requests from the second hybrid memory cube to the first hybrid memory cube.
FIG. 8 illustrates a diagram of a system 800 using dedicated routing between various data handling devices 500 and memory devices 200. The dedicated routing configuration permits larger, more scalable system architectures to be constructed. As with the in situ routing configuration, dedicated routing includes SoCs 500 that can serve both as an HMC requestor through the data requestor endpoint 510 and appear as a target endpoint through the data handling endpoint 520. However, in the dedicated routing configuration, the HMC request traffic is split into two domains from the perspective of any given SoC 500. Each SoC 500 contains both a local domain and a system domain. Each domain has the ability to support up to eight endpoints (based upon the aforementioned CUB field limitations). In this manner, each SoC 500 has the ability to support up to eight HMC devices that are locally connected in its local domain. Endpoints in the local domain are generally HMC memory devices 200. FIG. 8 illustrates local domain links as 120-0 through 120-3. Thus, in FIG. 8 there is only one HMC (200-0 through 200-3) associated with each SoC (500-0 through 500-3). However, dedicated routing systems can be configured with up to 8 HMC devices 200 in the local domain of each SoC (500-0 through 500-3).
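Because each SoC in the dedicated routing configuration sees two independent 3-bit endpoint spaces, steering a request reduces to choosing between the local-domain link and the system-domain link. The sketch below shows that selection; the type and function names are assumptions, not part of the disclosure.

```c
#include <stdbool.h>
#include <stdint.h>

/* In the dedicated routing configuration each SoC sees two independent
 * 3-bit endpoint spaces: a local domain of up to eight HMC devices and a
 * system domain of up to eight hubs, SoCs, or HMC devices.                */
enum soc_link {
    LINK_LOCAL_MEMORY,   /* first packetized link, e.g., link 120  */
    LINK_SYSTEM          /* second packetized link, e.g., link 620 */
};

struct soc_request {
    bool     is_system_request;  /* true: targets the system domain      */
    unsigned dest_id;            /* 3-bit endpoint ID within that domain */
    uint64_t addr;
};

static enum soc_link select_outgoing_link(const struct soc_request *r)
{
    /* Both domains speak the same packet protocol, so the only decision at
     * this level is which link (and which ID space) the request travels on. */
    return r->is_system_request ? LINK_SYSTEM : LINK_LOCAL_MEMORY;
}
```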
The system domain provides functionality for system level traffic routing. Each SoC (500-0 through 500-3) provides the ability to route system request traffic over the system domain. Endpoints in the system domain can be SoCs 500, HMC devices 200 used as hubs and HMC devices 200 used as memory storage. The scalability of the system is determined by the ratio of HMC router devices to SoC endpoints.
As one example, FIG. 8 illustrates a dedicated routing system with two HMC hubs (810-0 and 810-1). The HMC hubs (810-0 and 810-1) include links coupled to the second packetized memory links (620-0 through 620-3) of each SoC (500-0 through 500-3). In addition FIG. 8 illustrates inter-hub links (820-0 through 820-2) for coupling the HMC hubs (810-0 and 810-1) together and to adjacent hub devices.
FIG. 8 illustrates a system that is not fully populated in the system domain. The HMC hubs (810-0 and 810-1) use device IDs 0 and 1 respectively and the SoCs (500-0 through 500-3) use device IDs 2-5 respectively. Thus, as one example, another SoC 500 may be coupled to inter-hub link 820-0 and given a device ID of 6 and another SoC 500 may be coupled to inter-hub link 820-1 and given a device ID of 7. As another example, another HMC hub 810 may be coupled to inter-hub link 820-1 and given a device ID of 6 and another SoC 500 may be coupled to that other HMC hub 810 and given a device ID of 7.
The system interconnect in the dedicated routing architecture may be expanded in other ways. For example, as with the in-situ routing, additional bits could be added to the device ID field. The addition of a single bit to the device ID could expand the number of devices from 8 to 16 and could include any combination of SoCs 500 and HMC hubs 810. As another example, additional packetized link busses similar to the first packetized link 120 and the second packetized link 620 could be added to open up another completely new domain.
Also, the local memory domains for each SoC 500 could be more complex than just including HMC 200 memory devices. The local domain could be configured with an in situ routing architecture as discussed above with reference to FIGS. 5-7.
Thus, in a dedicated routing configuration, a data processing system includes two or more data handling devices. Each data handling device includes a host requestor endpoint configured for originating local memory packet requests on a first packetized memory link and a data handling endpoint configured for receiving and responding to second packet requests to the data handling endpoint on a second packetized memory link. The data processing system also includes one or more hybrid memory cube hubs. Each of the hybrid memory cube hubs includes a first packetized memory link operably coupled to the data handling endpoint of one of the two or more data handling devices and a second packetized memory link operably coupled to the data handling endpoint of another of the two or more data handling devices.
In another embodiment of a dedicated routing configuration, a method of conveying data in a system includes originating memory requests from a host requestor endpoint on a first data handling device and sending the memory requests on a first packetized memory link coupled to the first data handling device to a first hybrid memory cube in a first memory domain associated with the first data handling device. The method also includes originating system requests from a data handling endpoint on the first data handling device and sending the system requests on a second packetized memory link coupled to the first data handling device to a hybrid memory cube hub. The method also includes passing some of the system requests from the hybrid memory cube hub 810-0 to a second data handling device.
FIG. 9 illustrates various example topologies that may be used in systems with the dedicated routing of FIG. 8. The dedicated routing methodology also provides the ability to construct much more complex system architectures with different topological advantages. As non-limiting examples, topologies of system domains can be constructed using rings 910, modified rings 920, meshes 930, and crossbars (not shown). The eventual topological determination may be made based upon required bandwidth and latency characteristics weighed against the target system cost.
Embodiments of the disclosure may be further characterized, without limitation, as set forth below.
Embodiment 1. A data handling device, comprising:
a data requestor endpoint configured for originating first packet requests on a first
packetized memory link; and
a data handling endpoint configured for:
interpreting second packet requests to the data handling endpoint on a second
packetized memory link; and
conveying data bidirectionally across the second packetized memory link in
response to the second packet requests;
wherein the first packetized memory link and the second packetized memory link are
separate but include a same type of link protocol and a same type of physical interface.
Embodiment 2. The data handling device of Embodiment 1, further comprising one or more data handling elements operably coupled to one or more of the data requestor endpoint and the data handling endpoint, each of the one or more data handling elements comprising one or more processors and one or more communication elements.
Embodiment 3. The data handling device of Embodiment 2, further comprising a data buffer operably coupled between the data requestor endpoint and the one or more data handling elements, the data buffer for defining an address space for the data handling endpoint.
Embodiment 4. The data handling device according to any of Embodiments 1 through 3, wherein the first packetized memory link and the second packetized memory link are both hybrid memory cube links.
Embodiment 5. The data handling device according to any of Embodiments 1 through 3, wherein the data handling endpoint is further configured for originating third packet requests on the second packetized memory link.
Embodiment 6. The data handling device according to any of Embodiments 1 through 3, wherein the data requestor endpoint is further configured for:
interpreting third packet requests to the data requestor endpoint on the first packetized
memory link; and
conveying data bidirectionally across the first packetized memory link in response to the third packet requests.
Embodiment 7. A data handling device, comprising:
a first hybrid memory cube interface operably coupled to a host requestor endpoint on the data handling device, the host requestor endpoint for originating packetized memory requests to a local memory domain comprising one or more hybrid memory cube devices; and
a second hybrid memory cube interface operably coupled to a data handling endpoint on the data handling device, the data handling endpoint for interpreting packetized system requests from an additional data handling device operably coupled to at least one of the one or more hybrid memory cube devices.
Embodiment 8. The data handling device of Embodiment 7, wherein the data handling endpoint is further for conveying data in response to the packetized system requests from the additional data handling device.
Embodiment 9. The data handling device of Embodiment 7, wherein at least one of the host requestor endpoint and the data handling endpoint is further for originating additional packetized system requests to the additional data handling device.
Embodiment 10. The data handling device of Embodiment 7, wherein at least one of the host requestor endpoint and the data handling endpoint is further for originating additional packetized memory requests to one or more additional hybrid memory cube devices in a remote memory domain correlated with the additional data handling device.
Embodiment 11. The data handling device of Embodiment 7, wherein the host requestor is further configured for:
interpreting third packet requests to the host requestor endpoint on the first hybrid memory cube interface; and
conveying data bidirectionally across the first hybrid memory cube interface in response to the third packet requests.
Embodiment 12. The data handling device of Embodiment 7, further comprising a data buffer operably coupled to one or more of the host requestor endpoint and the data handling endpoint, the data buffer for defining an address space for link requests to the data handling device.
Embodiment 13. A data processing system, comprising:
two or more data handling devices, each data handling device comprising:
a host requestor endpoint configured for originating first packet requests on a first packetized memory link; and
a data handling endpoint configured for receiving and responding to second packet requests to the data handling endpoint on a second packetized memory link; and
a first hybrid memory cube device associated with a first memory domain of a
corresponding one of the two or more data handling devices and wherein the hybrid memory cube device is configured to chain and pass the second packet requests between two of the two or more data handling devices.
Embodiment 14. The data processing system of Embodiment 13, further comprising a second hybrid memory cube device associated with a second memory domain of a corresponding one of the two or more data handling devices, wherein the second hybrid memory cube device is configured to chain and pass the second packet requests between the data handling device associated with the second memory domain and the first hybrid memory cube device.
Embodiment 15. The data processing system of Embodiment 14, wherein the originated first packet requests from the host requestor endpoint of one of the two or more data handling devices are chained and passed to the data handling endpoint of another of the two or more data handling devices.
Embodiment 16. The data processing system according to either of Embodiment 14 and Embodiment 15, wherein each of the first memory domain and the second memory domain includes at least one additional hybrid memory cube device.
Embodiment 17. The data processing system of Embodiment 16, further comprising at least one inter-domain link between an additional hybrid memory cube in the first memory domain and an additional hybrid memory cube in the second memory domain.
Embodiment 18. The data processing system according to any of Embodiments 14 through 17, wherein each of the two or more data handling devices further comprises a data buffer operably coupled to one or more of the host requestor endpoint and the data handling endpoint, the data buffer for defining an address space for link requests to the data handling device.
Embodiment 19. A data processing system, comprising:
two or more data handling devices, each data handling device comprising:
a host requestor endpoint configured for originating local memory packet requests on a first packetized memory link; and
a data handling endpoint configured for receiving and responding to second packet requests to the data handling endpoint on a second packetized memory link; and
one or more hybrid memory cube hubs comprising:
a first packetized memory link operably coupled to the data handling endpoint of one of the two or more data handling devices; and
a second packetized memory link operably coupled to the data handling endpoint of another of the two or more data handling devices.
Embodiment 20. The data processing system of Embodiment 19, wherein the data handling endpoint for each of the two or more data handling devices is further configured for originating second packet requests on the second packetized memory link to another of the two or more data handling devices.
Embodiment 21. The data processing system of Embodiment 19, further comprising two or more hybrid memory cube devices, each hybrid memory cube device operably coupled to the host requestor endpoint of a corresponding one of the two or more data handling devices.
Embodiment 22. The data processing system of Embodiment 19, wherein at least one of the one or more hybrid memory cube hubs includes at least one additional packetized memory link operably coupled to another of the one or more hybrid memory cube hubs.
Embodiment 23. The data processing system of Embodiment 19, wherein each of the two or more data handling devices further comprises a data buffer operably coupled to one or more of the host requestor endpoint and the data handling endpoint, the data buffer for defining an address space for link requests to the data handling device.
Embodiment 24. The data processing system of Embodiment 19, wherein the one or more hybrid memory cube hubs comprise at least two hybrid memory cube hubs arranged in a ring topology.
Embodiment 25. The data processing system of Embodiment 19, wherein the one or more hybrid memory cube hubs comprise at least two hybrid memory cube hubs arranged in a hybrid ring topology.
Embodiment 26. The data processing system according to any of Embodiments 19 through 25, wherein the one or more hybrid memory cube hubs comprise at least two hybrid memory cube hubs arranged in a mesh topology.
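Embodiments 19 through 26 arrange hybrid memory cube hubs in ring, hybrid ring, or mesh topologies. As a rough illustration of how such topologies shape the path a system request takes between hubs, the following Python sketch builds ring and mesh adjacency maps and finds a shortest hop path with a breadth-first search; the helper names and the graph representation are assumptions made here, not part of the disclosure.

# Minimal sketch; ring/mesh construction helpers are illustrative only.
from collections import deque

def ring(n):
    """Adjacency for n hubs arranged in a ring topology."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}

def mesh(rows, cols):
    """Adjacency for hubs arranged in a 2-D mesh topology."""
    adj = {}
    for r in range(rows):
        for c in range(cols):
            node = r * cols + c
            nbrs = set()
            if r > 0: nbrs.add(node - cols)
            if r < rows - 1: nbrs.add(node + cols)
            if c > 0: nbrs.add(node - 1)
            if c < cols - 1: nbrs.add(node + 1)
            adj[node] = nbrs
    return adj

def route(adj, src, dst):
    """Breadth-first path between two hubs (fewest hub-to-hub hops)."""
    prev, frontier = {src: None}, deque([src])
    while frontier:
        hub = frontier.popleft()
        if hub == dst:
            path = []
            while hub is not None:
                path.append(hub)
                hub = prev[hub]
            return path[::-1]
        for nbr in adj[hub]:
            if nbr not in prev:
                prev[nbr] = hub
                frontier.append(nbr)
    return None

# A system request from a device on hub 0 to a device on hub 3 of a 6-hub ring.
print(route(ring(6), 0, 3))   # one of the two shortest paths, e.g. [0, 1, 2, 3]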
Embodiment 27. A method of conveying data with a data handling device, comprising:
on a first data handling device:
originating packetized memory requests on a first hybrid memory cube link to a hybrid memory cube device in a first memory domain associated with the first data handling device;
receiving packetized system requests on a second hybrid memory cube link,
wherein the packetized system requests originate from a second data handling device; and
responding to the packetized system requests.
Embodiment 28. The method of Embodiment 27, further comprising buffering data received with the packetized system requests on the first data handling device to define an address space for the packetized system requests to the first data handling device.
Embodiment 29. The method according to either of Embodiment 27 and Embodiment 28, further comprising buffering read data to be sent when responding to the packetized system requests to define an address space on the first data handling device.
Embodiment 30. The method according to any of Embodiments 27 through 29, further comprising originating packetized system requests on the first hybrid memory cube link of the first data handling device to the second data handling device.
Embodiment 31. The method according to any of Embodiments 27 through 29, further comprising originating packetized system requests on the second hybrid memory cube link of the first data handling device to the second data handling device.
Embodiment 32. The method according to any of Embodiments 27 through 29, further comprising originating packetized memory requests on the first hybrid memory cube link of the first data handling device to a hybrid memory cube device in a second memory domain associated with the second data handling device.
Embodiment 33. The method according to any of Embodiments 27 through 29, further comprising originating packetized memory requests on the first hybrid memory cube link of the first data handling device to a hybrid memory cube device in a second memory domain associated with the second data handling device.
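Embodiments 27 through 33 describe a device that originates packetized memory requests on one hybrid memory cube link while receiving, buffering, and answering packetized system requests on another. The sketch below is a simplified software analogy of that flow; the queue-based links, the dictionary used as the buffered address space, and the request tuple format are all assumptions introduced for illustration.

# Simplified analogy only; queues stand in for the packetized HMC links.
import queue

class DataHandlingDevice:
    def __init__(self, name, first_link, second_link):
        self.name = name
        self.first_link = first_link    # host requestor endpoint: local memory domain
        self.second_link = second_link  # data handling endpoint: incoming system requests
        self.buffer = {}                # defines an address space for link requests

    def originate_memory_request(self, address):
        # Packetized memory request onto the first link (to the local memory domain).
        self.first_link.put(("READ", self.name, address))

    def service_system_requests(self):
        # Receive and respond to packetized system requests on the second link.
        while not self.second_link.empty():
            op, requester, address, data = self.second_link.get()
            if op == "WRITE":
                self.buffer[address] = data
                yield (requester, "WRITE_ACK", address)
            else:
                yield (requester, "READ_RESP", self.buffer.get(address))

first_link, second_link = queue.Queue(), queue.Queue()
dev_a = DataHandlingDevice("device_A", first_link, second_link)
dev_a.originate_memory_request(0x200)                # to device_A's own memory domain
second_link.put(("WRITE", "device_B", 0x100, b"x"))  # system requests from device_B
second_link.put(("READ", "device_B", 0x100, None))
print(first_link.get())                              # ('READ', 'device_A', 512)
print(list(dev_a.service_system_requests()))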
Embodiment 34. A method of conveying data in a system, comprising:
originating memory requests from a host requestor endpoint on a first data handling device;
sending the memory requests on a first packetized memory link coupled to the first data handling device to a first hybrid memory cube in a first memory domain associated with the first data handling device;
receiving system requests at the first hybrid memory cube, the system requests from a second data handling device; and
passing the system requests from the first hybrid memory cube to a data handling endpoint on the first data handling device via a second packetized memory link coupled to the first data handling device.
Embodiment 35. The method of Embodiment 34, further comprising:
originating the system requests from a host requestor endpoint on the second data handling device; and
before receiving the system requests at the first hybrid memory cube:
receiving the system requests at a second hybrid memory cube; and
passing the system requests from the second hybrid memory cube to the first hybrid memory cube.
Embodiment 36. The method according to either of Embodiment 34 and Embodiment 35, further comprising passing some of the memory requests from the first hybrid memory cube in the first memory domain to the second data handling device.
Embodiment 37. The method according to any of Embodiments 34 through 36, further comprising passing some of the memory requests from the first hybrid memory cube in the first memory domain to another hybrid memory cube in the first memory domain.
Embodiment 38. The method according to any of Embodiments 34 through 37, further comprising passing some of the memory requests from the first hybrid memory cube in the first memory domain to a second hybrid memory cube in a second memory domain associated with the second data handling device.
Embodiment 39. The method of Embodiment 38, further comprising passing some of the memory requests from the second hybrid memory cube to the second data handling device.
Embodiment 40. The method of Embodiment 38, further comprising passing some of the memory requests from the second hybrid memory cube to a third hybrid memory cube in the second memory domain.
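Embodiments 34 through 40 have memory requests hop from cube to cube, both within a memory domain and across an inter-domain link to a second domain. The short sketch below models that passing with per-cube routing tables; the table contents and function name are hypothetical and only meant to make the hop sequence concrete.

# Hypothetical routing tables; only the hop-by-hop passing is illustrated.
def forward(routes, start_cube, request):
    """Pass a request cube-to-cube until a cube reports a local destination."""
    hops, cube = [start_cube], start_cube
    while True:
        nxt = routes[cube].get(request["dest"])
        if nxt is None:              # destination is local to this cube's domain
            return hops
        hops.append(nxt)
        cube = nxt

# First memory domain: HMC0 and HMC1; second domain: HMC2 (serves device_B).
routes = {
    "HMC0": {"device_B": "HMC1"},    # pass within the first memory domain
    "HMC1": {"device_B": "HMC2"},    # inter-domain link to the second domain
    "HMC2": {},                      # device_B hangs off this cube
}
print(forward(routes, "HMC0", {"dest": "device_B"}))   # ['HMC0', 'HMC1', 'HMC2']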
Embodiment 41. A method of conveying data in a system, comprising:
originating memory requests from a host requestor endpoint on a first data handling device;
sending the memory requests on a first packetized memory link coupled to the first data handling device to a first hybrid memory cube in a first memory domain associated with the first data handling device;
originating system requests from a data handling endpoint on the first data handling device;
sending the system requests on a second packetized memory link coupled to the first data handling device to a hybrid memory cube hub; and
passing some of the system requests from the hybrid memory cube hub to a second data handling device.
Embodiment 42. The method of Embodiment 41, further comprising:
passing second system requests from the second data handling device to the hybrid memory cube hub; and
receiving the second system requests at the data handling endpoint on the first data handling device.
Embodiment 43. The method according to either of Embodiment 41 and
Embodiment 42, further comprising passing some of the system requests from the hybrid memory cube hub to one or more additional memory cube hubs.
Embodiment 44. The method of Embodiment 43, further comprising passing some of the system requests from the one or more additional memory cube hubs to one or more additional data handling devices.
Embodiment 45. The method of Embodiment 43, wherein passing some of the system requests between the hybrid memory cube hub and the one or more additional memory cube hubs comprises passing the system requests in an interconnect topology selected from the group consisting of a ring topology, a modified ring topology, and a mesh topology.
Embodiment 46. The method of Embodiment 43, further comprising passing some of the memory requests from the first hybrid memory cube in the first memory domain to another hybrid memory cube in the first memory domain.
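Embodiments 41 through 46 route system requests through one or more hybrid memory cube hubs sitting between data handling endpoints. The following sketch models a hub as a simple port-based forwarder that can also hand a request to a further hub; the port numbering and the request dictionary are assumptions, and a real hub would route on packet fields rather than an explicit next_port hint.

# Port-based forwarding sketch; ports and the next_port hint are assumptions.
class HybridMemoryCubeHub:
    def __init__(self):
        self.ports = {}                # port id -> attached device name or another hub

    def attach(self, port, endpoint):
        self.ports[port] = endpoint

    def pass_request(self, dst_port, request):
        endpoint = self.ports[dst_port]
        if isinstance(endpoint, HybridMemoryCubeHub):
            # Request continues through an additional hub (ring, modified ring, or mesh).
            return endpoint.pass_request(request["next_port"], request)
        return f"{endpoint} received {request['op']}"

hub0, hub1 = HybridMemoryCubeHub(), HybridMemoryCubeHub()
hub0.attach(0, "device_A"); hub0.attach(1, hub1)
hub1.attach(0, "device_B")
print(hub0.pass_request(1, {"op": "system_read", "next_port": 0}))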
The embodiments of the disclosure described above and illustrated in the
accompanying drawing figures do not limit the scope of the invention, since these
embodiments are merely examples of embodiments of the disclosure. The invention is defined by the appended claims and their legal equivalents. Any equivalent embodiments lie within the scope of this disclosure. Indeed, various modifications of the present disclosure, in addition to those shown and described herein, such as alternative useful combinations of the elements described, will become apparent to those of ordinary skill in the art from the description. Such modifications and embodiments also fall within the scope of the appended claims and their legal equivalents.

Claims

What is claimed is:
1. A data handling device, comprising:
a data requestor endpoint configured for originating first packet requests on a first
packetized memory link; and
a data handling endpoint configured for:
interpreting second packet requests to the data handling endpoint on a second
packetized memory link; and
conveying data bidirectionally across the second packetized memory link in
response to the second packet requests;
wherein the first packetized memory link and the second packetized memory link are separate but include a same type of link protocol and a same type of physical interface.
2. The data handling device of claim 1, further comprising one or more data handling elements operably coupled to one or more of the data requestor endpoint and the data handling endpoint, each of the one or more data handling elements comprising one or more processors and one or more communication elements.
3. The data handling device of claim 2, further comprising a data buffer operably coupled between the data requestor endpoint and the one or more data handling elements, the data buffer for defining an address space for the data handling endpoint.
4. The data handling device according to any of claims 1 through 3, wherein the first packetized memory link and the second packetized memory link are both hybrid memory cube links.
5. The data handling device according to any of claims 1 through 3, wherein the data handling endpoint is further configured for originating third packet requests on the second packetized memory link.
6. The data handling device according to any of claims 1 through 3, wherein the data requestor endpoint is further configured for:
interpreting third packet requests to the data requestor endpoint on the first packetized memory link; and
conveying data bidirectionally across the first packetized memory link in response to the third packet requests.
7. The data handling device according to any of claims 1 through 3, further comprising:
a first hybrid memory cube interface operably coupled to the data requestor endpoint, wherein the data requestor endpoint is configured for originating the first packet requests to a local memory domain comprising one or more hybrid memory cube devices; and
a second hybrid memory cube interface operably coupled to the data handling endpoint, wherein the data handling endpoint is configured for interpreting the second packet requests from an additional data handling device operably coupled to at least one of the one or more hybrid memory cube devices.
8. The data handling device of claim 7, wherein the data handling endpoint is further configured for conveying data in response to the second packet requests from the additional data handling device.
9. The data handling device of claim 7, wherein at least one of the data requestor endpoint and the data handling endpoint is further configured for originating additional packet requests to an additional data handling device.
10. The data handling device of claim 7, wherein at least one of the data requestor endpoint and the data handling endpoint is further configured for originating additional packet requests to one or more additional hybrid memory cube devices in a remote memory domain correlated with the additional data handling device.
11. The data handling device of any of claims 1 through 3, wherein a first hybrid memory cube device associated with a first memory domain of the data handling device is configured to chain and pass the second packet requests between the data handling device and at least another data handling device.
12. The data handling device of any of claims 1 through 3, wherein the data handling endpoint is operably coupled to a first packetized memory link of a hybrid memory cube hub, and wherein a second packetized memory link of the hybrid memory cube hub is operably coupled to a data handling endpoint of another data handling device.
13. A method of conveying data with a data handling device, comprising:
on a first data handling device:
originating packetized memory requests on a first hybrid memory cube link to a hybrid memory cube device in a first memory domain associated with the first data handling device;
receiving packetized system requests on a second hybrid memory cube link,
wherein the packetized system requests originate from a second data handling device; and
responding to the packetized system requests.
14. The method of claim 13, further comprising buffering data received with the packetized system requests on the first data handling device to define an address space for the packetized system requests to the first data handling device.
15. The method of claim 13, further comprising buffering read data to be sent when responding to the packetized system requests to define an address space on the first data handling device.
16. The method of claim 13, further comprising originating packetized system requests on the first hybrid memory cube link of the first data handling device to the second data handling device.
17. The method of claim 13, further comprising originating packetized system requests on the second hybrid memory cube link of the first data handling device to the second data handling device.
18. The method of claim 13, further comprising originating packetized memory requests on the first hybrid memory cube link of the first data handling device to a hybrid memory cube device in a second memory domain associated with the second data handling device.
19. The method of claim 13, further comprising originating system requests from the first data handling device, sending the system requests from the first data handling device on the second hybrid memory cube link to a hybrid memory cube hub, and passing some of the system requests from the first data handling device with the hybrid memory cube hub to a second data handling device.
20. The method of claim 19, further comprising passing some of the system requests from the hybrid memory cube hub to one or more additional memory cube hubs.
PCT/US2015/028873 2014-05-09 2015-05-01 Interconnect systems and methods using hybrid memory cube links WO2015171461A1 (en)

Priority Applications (7)

Application Number Priority Date Filing Date Title
KR1020167034480A KR101885452B1 (en) 2014-05-09 2015-05-01 Interconnect systems and methods using hybrid memory cube links
CN201580030653.7A CN106462524B (en) 2014-05-09 2015-05-01 Interconnect system and method using hybrid memory cube links
JP2016566810A JP6522663B2 (en) 2014-05-09 2015-05-01 Interconnection system and method using hybrid memory cube link
EP22155792.9A EP4016317A1 (en) 2014-05-09 2015-05-01 Interconnect systems and methods using hybrid memory cube links
CN202010026526.2A CN111190553B (en) 2014-05-09 2015-05-01 Interconnect system and method using hybrid memory cube links
KR1020187021869A KR101925266B1 (en) 2014-05-09 2015-05-01 Interconnect systems and methods using hybrid memory cube links
EP15789012.0A EP3140748B1 (en) 2014-05-09 2015-05-01 Interconnect systems and methods using hybrid memory cube links

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US14/273,867 2014-05-09
US14/273,867 US9558143B2 (en) 2014-05-09 2014-05-09 Interconnect systems and methods using hybrid memory cube links to send packetized data over different endpoints of a data handling device

Publications (1)

Publication Number Publication Date
WO2015171461A1 true WO2015171461A1 (en) 2015-11-12

Family

ID=54367966

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2015/028873 WO2015171461A1 (en) 2014-05-09 2015-05-01 Interconnect systems and methods using hybrid memory cube links

Country Status (7)

Country Link
US (5) US9558143B2 (en)
EP (2) EP3140748B1 (en)
JP (1) JP6522663B2 (en)
KR (2) KR101885452B1 (en)
CN (2) CN111190553B (en)
TW (1) TWI584116B (en)
WO (1) WO2015171461A1 (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11768689B2 (en) * 2013-08-08 2023-09-26 Movidius Limited Apparatus, systems, and methods for low power computational imaging
US9558143B2 (en) 2014-05-09 2017-01-31 Micron Technology, Inc. Interconnect systems and methods using hybrid memory cube links to send packetized data over different endpoints of a data handling device
KR102338266B1 (en) * 2015-09-15 2021-12-16 에스케이하이닉스 주식회사 Memory network and system including the same
US10866854B2 (en) 2015-12-29 2020-12-15 Arteris, Inc. System and method for reducing ECC overhead and memory access bandwidth
US10528421B2 (en) * 2015-12-29 2020-01-07 Arteris, Inc. Protection scheme conversion
CN106445849B (en) * 2016-10-21 2019-05-28 郑州云海信息技术有限公司 The method orderly ordered is handled in a kind of multi-controller
US11397687B2 (en) 2017-01-25 2022-07-26 Samsung Electronics Co., Ltd. Flash-integrated high bandwidth memory appliance
WO2019009585A1 (en) * 2017-07-03 2019-01-10 한양대학교 산학협력단 Hmc control device and method of cpu side and hmc side for low power mode, and power management method of hmc control device
US10877839B2 (en) 2017-09-22 2020-12-29 Arteris, Inc. Recovery of a coherent system in the presence of an uncorrectable error
JP6991446B2 (en) * 2018-05-18 2022-01-12 日本電信電話株式会社 Packet processing device and its memory access control method
US20190042511A1 (en) * 2018-06-29 2019-02-07 Intel Corporation Non volatile memory module for rack implementations
US10846250B2 (en) * 2018-11-12 2020-11-24 Arm Limited Apparatus and method for handling address decoding in a system-on-chip
US11030144B2 (en) * 2018-12-14 2021-06-08 Texas Instruments Incorporated Peripheral component interconnect (PCI) backplane connectivity system on chip (SoC)
CN112131174A (en) * 2019-06-25 2020-12-25 北京百度网讯科技有限公司 Method, apparatus, electronic device, and computer storage medium supporting communication between multiple chips
KR20210063496A (en) 2019-11-22 2021-06-02 삼성전자주식회사 Memory device including processing circuit, and electronic device including system on chip and memory device
US11184245B2 (en) 2020-03-06 2021-11-23 International Business Machines Corporation Configuring computing nodes in a three-dimensional mesh topology
TWI802275B (en) * 2022-02-16 2023-05-11 昱文 李 System on chip
TWI817834B (en) * 2022-11-18 2023-10-01 鯨鏈科技股份有限公司 Memory architecture and data processing method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6970968B1 (en) * 1998-02-13 2005-11-29 Intel Corporation Memory module controller for providing an interface between a system memory controller and a plurality of memory devices on a memory module
US7584335B2 (en) * 2006-11-02 2009-09-01 International Business Machines Corporation Methods and arrangements for hybrid data storage
US20100238693A1 (en) * 2009-03-23 2010-09-23 Micron Technology, Inc. Configurable bandwidth memory devices and methods
US20110219197A1 (en) * 2007-04-12 2011-09-08 Rambus Inc. Memory Controllers, Systems, and Methods Supporting Multiple Request Modes
WO2014014711A1 (en) * 2012-07-18 2014-01-23 Micron Technology, Inc Memory management for a hierarchical memory system

Family Cites Families (45)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5226125A (en) * 1989-11-17 1993-07-06 Keith Balmer Switch matrix having integrated crosspoint logic and method of operation
US5157663A (en) * 1990-09-24 1992-10-20 Novell, Inc. Fault tolerant computer system
JP2931490B2 (en) * 1992-12-18 1999-08-09 富士通株式会社 Parallel processing method
JPH06314264A (en) * 1993-05-06 1994-11-08 Nec Corp Self-routing cross bar switch
US5561622A (en) 1993-09-13 1996-10-01 International Business Machines Corporation Integrated memory cube structure
US5541914A (en) 1994-01-19 1996-07-30 Krishnamoorthy; Ashok V. Packet-switched self-routing multistage interconnection network having contention-free fanout, low-loss routing, and fanin buffering to efficiently realize arbitrarily low packet loss
JP2790034B2 (en) * 1994-03-28 1998-08-27 日本電気株式会社 Non-operational memory update method
US5448511A (en) 1994-06-01 1995-09-05 Storage Technology Corporation Memory stack with an integrated interconnect and mounting structure
US5692155A (en) * 1995-04-19 1997-11-25 International Business Machines Corporation Method and apparatus for suspending multiple duplex pairs during back up processing to insure storage devices remain synchronized in a sequence consistent order
JP4123621B2 (en) * 1999-02-16 2008-07-23 株式会社日立製作所 Main memory shared multiprocessor system and shared area setting method thereof
US7012895B1 (en) * 2000-11-17 2006-03-14 University Of Kentucky Research Foundation Packet-switching network with symmetrical topology and method of routing packets
US6947981B2 (en) * 2002-03-26 2005-09-20 Hewlett-Packard Development Company, L.P. Flexible data replication mechanism
JP2003076670A (en) * 2002-05-27 2003-03-14 Mitsumasa Koyanagi Network device, and processing system
US6807600B2 (en) * 2002-07-24 2004-10-19 Intel Corporation Method, system, and program for memory based data transfer
US8185602B2 (en) * 2002-11-05 2012-05-22 Newisys, Inc. Transaction processing using multiple protocol engines in systems having multiple multi-processor clusters
US7421525B2 (en) * 2003-05-13 2008-09-02 Advanced Micro Devices, Inc. System including a host connected to a plurality of memory modules via a serial memory interconnect
US7136958B2 (en) * 2003-08-28 2006-11-14 Micron Technology, Inc. Multiple processor system and method including multiple memory hub modules
US7716409B2 (en) * 2004-04-27 2010-05-11 Intel Corporation Globally unique transaction identifiers
US7600023B2 (en) * 2004-11-05 2009-10-06 Hewlett-Packard Development Company, L.P. Systems and methods of balancing crossbar bandwidth
US7471623B2 (en) * 2004-11-23 2008-12-30 Hewlett-Packard Development Company, L.P. Systems and methods for a unified computer system fabric
US8397013B1 (en) 2006-10-05 2013-03-12 Google Inc. Hybrid memory module
US7936772B2 (en) * 2007-07-13 2011-05-03 International Business Machines Corporation Enhancement of end-to-end network QoS
US8787060B2 (en) 2010-11-03 2014-07-22 Netlist, Inc. Method and apparatus for optimizing driver load in a memory package
US8656082B2 (en) * 2008-08-05 2014-02-18 Micron Technology, Inc. Flexible and expandable memory architectures
US8051467B2 (en) * 2008-08-26 2011-11-01 Atmel Corporation Secure information processing
US7929368B2 (en) * 2008-12-30 2011-04-19 Micron Technology, Inc. Variable memory refresh devices and methods
US8549092B2 (en) * 2009-02-19 2013-10-01 Micron Technology, Inc. Memory network methods, apparatus, and systems
US8199759B2 (en) * 2009-05-29 2012-06-12 Intel Corporation Method and apparatus for enabling ID based streams over PCI express
WO2011036727A1 (en) * 2009-09-25 2011-03-31 富士通株式会社 Memory system and memory system control method
US9055076B1 (en) * 2011-06-23 2015-06-09 Amazon Technologies, Inc. System and method for distributed load balancing with load balancer clients for hosts
US8943313B2 (en) 2011-07-19 2015-01-27 Elwha Llc Fine-grained security in federated data sets
US9098209B2 (en) 2011-08-24 2015-08-04 Rambus Inc. Communication via a memory interface
US9092305B1 (en) * 2012-04-16 2015-07-28 Xilinx, Inc. Memory interface circuit
TW201411482A (en) * 2012-05-29 2014-03-16 Mosaid Technologies Inc Ring topology status indication
US9348385B2 (en) 2012-07-09 2016-05-24 L. Pierre deRochement Hybrid computing module
US20140015679A1 (en) 2012-07-15 2014-01-16 Green Badge LLC Specialty Plant Moisture Sensing
US8885510B2 (en) * 2012-10-09 2014-11-11 Netspeed Systems Heterogeneous channel capacities in an interconnect
US9047417B2 (en) * 2012-10-29 2015-06-02 Intel Corporation NUMA aware network interface
US9065722B2 (en) * 2012-12-23 2015-06-23 Advanced Micro Devices, Inc. Die-stacked device with partitioned multi-hop network
US9331958B2 (en) * 2012-12-31 2016-05-03 Advanced Micro Devices, Inc. Distributed packet switching in a source routed cluster server
US9405688B2 (en) * 2013-03-05 2016-08-02 Intel Corporation Method, apparatus, system for handling address conflicts in a distributed memory fabric architecture
US11074169B2 (en) * 2013-07-03 2021-07-27 Micron Technology, Inc. Programmed memory controlled data movement and timing within a main memory device
US9652388B2 (en) * 2013-07-31 2017-05-16 Intel Corporation Method, apparatus and system for performing management component transport protocol (MCTP) communications with a universal serial bus (USB) device
US9558143B2 (en) * 2014-05-09 2017-01-31 Micron Technology, Inc. Interconnect systems and methods using hybrid memory cube links to send packetized data over different endpoints of a data handling device
US9501222B2 (en) * 2014-05-09 2016-11-22 Micron Technology, Inc. Protection zones in virtualized physical addresses for reconfigurable memory systems using a memory abstraction

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6970968B1 (en) * 1998-02-13 2005-11-29 Intel Corporation Memory module controller for providing an interface between a system memory controller and a plurality of memory devices on a memory module
US7584335B2 (en) * 2006-11-02 2009-09-01 International Business Machines Corporation Methods and arrangements for hybrid data storage
US20110219197A1 (en) * 2007-04-12 2011-09-08 Rambus Inc. Memory Controllers, Systems, and Methods Supporting Multiple Request Modes
US20100238693A1 (en) * 2009-03-23 2010-09-23 Micron Technology, Inc. Configurable bandwidth memory devices and methods
WO2014014711A1 (en) * 2012-07-18 2014-01-23 Micron Technology, Inc Memory management for a hierarchical memory system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3140748A4 *

Also Published As

Publication number Publication date
TWI584116B (en) 2017-05-21
KR20180088526A (en) 2018-08-03
JP6522663B2 (en) 2019-05-29
US20190012089A1 (en) 2019-01-10
TW201606502A (en) 2016-02-16
US20170131915A1 (en) 2017-05-11
US11947798B2 (en) 2024-04-02
EP4016317A1 (en) 2022-06-22
US20220011940A1 (en) 2022-01-13
KR101925266B1 (en) 2018-12-04
EP3140748B1 (en) 2022-03-16
US10126947B2 (en) 2018-11-13
CN111190553B (en) 2023-05-30
US20240241641A1 (en) 2024-07-18
KR101885452B1 (en) 2018-08-03
EP3140748A1 (en) 2017-03-15
CN106462524A (en) 2017-02-22
US20150324319A1 (en) 2015-11-12
KR20170002604A (en) 2017-01-06
US9558143B2 (en) 2017-01-31
CN106462524B (en) 2020-02-07
US11132127B2 (en) 2021-09-28
JP2017517807A (en) 2017-06-29
CN111190553A (en) 2020-05-22
EP3140748A4 (en) 2018-01-03

Similar Documents

Publication Publication Date Title
US11947798B2 (en) Packet routing between memory devices and related apparatuses, methods, and memory systems
US11237880B1 (en) Dataflow all-reduce for reconfigurable processor systems
Zhang et al. Boosting the performance of FPGA-based graph processor using hybrid memory cube: A case for breadth first search
US11847395B2 (en) Executing a neural network graph using a non-homogenous set of reconfigurable processors
US9501222B2 (en) Protection zones in virtualized physical addresses for reconfigurable memory systems using a memory abstraction
US7155525B2 (en) Transaction management in systems having multiple multi-processor clusters
US10394747B1 (en) Implementing hierarchical PCI express switch topology over coherent mesh interconnect
US7251698B2 (en) Address space management in systems having multiple multi-processor clusters
US20030225938A1 (en) Routing mechanisms in systems having multiple multi-processor clusters
CN111630487B (en) Centralized-distributed hybrid organization of shared memory for neural network processing
CN102866980B (en) Network communication cell used for multi-core microprocessor on-chip interconnected network
WO2020122988A1 (en) Memory request chaining on bus
US10741226B2 (en) Multi-processor computer architecture incorporating distributed multi-ported common memory modules
US12072756B2 (en) Scalable machine check architecture
Engelhardt et al. Towards flexible automatic generation of graph processing gateware
Tseng et al. Scalable mutli-layer barrier synchronization on NoC
Daneshtalab et al. Pipeline-based interlayer bus structure for 3D networks-on-chip

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 15789012
    Country of ref document: EP
    Kind code of ref document: A1
ENP Entry into the national phase
    Ref document number: 2016566810
    Country of ref document: JP
    Kind code of ref document: A
NENP Non-entry into the national phase
    Ref country code: DE
REEP Request for entry into the european phase
    Ref document number: 2015789012
    Country of ref document: EP
WWE Wipo information: entry into national phase
    Ref document number: 2015789012
    Country of ref document: EP
ENP Entry into the national phase
    Ref document number: 20167034480
    Country of ref document: KR
    Kind code of ref document: A