US9977750B2 - Coherent memory interleaving with uniform latency - Google Patents

Coherent memory interleaving with uniform latency

Info

Publication number
US9977750B2
Authority
US
United States
Prior art keywords
read
target memory
network
switch
edge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US14/568,433
Other versions
US20160170916A1 (en)
Inventor
Sanjay R. Deshpande
John E. Larson
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NXP USA Inc
Original Assignee
NXP USA Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Assigned to FREESCALE SEMICONDUCTOR, INC. reassignment FREESCALE SEMICONDUCTOR, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: DESHPANDE, SANJAY R., LARSON, JOHN E.
Priority to US14/568,433
Application filed by NXP USA Inc filed Critical NXP USA Inc
Assigned to CITIBANK, N.A., AS NOTES COLLATERAL AGENT reassignment CITIBANK, N.A., AS NOTES COLLATERAL AGENT SUPPLEMENT TO IP SECURITY AGREEMENT Assignors: FREESCALE SEMICONDUCTOR, INC.
Assigned to FREESCALE SEMICONDUCTOR, INC. reassignment FREESCALE SEMICONDUCTOR, INC. PATENT RELEASE Assignors: CITIBANK, N.A., AS COLLATERAL AGENT
Assigned to MORGAN STANLEY SENIOR FUNDING, INC. reassignment MORGAN STANLEY SENIOR FUNDING, INC. ASSIGNMENT AND ASSUMPTION OF SECURITY INTEREST IN PATENTS Assignors: CITIBANK, N.A.
Assigned to MORGAN STANLEY SENIOR FUNDING, INC. reassignment MORGAN STANLEY SENIOR FUNDING, INC. SUPPLEMENT TO THE SECURITY AGREEMENT Assignors: FREESCALE SEMICONDUCTOR, INC.
Publication of US20160170916A1
Assigned to NXP USA, INC. reassignment NXP USA, INC. CHANGE OF NAME (SEE DOCUMENT FOR DETAILS). Assignors: FREESCALE SEMICONDUCTOR INC.
Assigned to NXP USA, INC. reassignment NXP USA, INC. CORRECTIVE ASSIGNMENT TO CORRECT THE NATURE OF CONVEYANCE PREVIOUSLY RECORDED AT REEL: 040626 FRAME: 0683. ASSIGNOR(S) HEREBY CONFIRMS THE MERGER AND CHANGE OF NAME EFFECTIVE NOVEMBER 7, 2016. Assignors: NXP SEMICONDUCTORS USA, INC. (MERGED INTO), FREESCALE SEMICONDUCTOR, INC. (UNDER)
Publication of US9977750B2
Application granted
Assigned to NXP B.V. reassignment NXP B.V. RELEASE BY SECURED PARTY (SEE DOCUMENT FOR DETAILS). Assignors: MORGAN STANLEY SENIOR FUNDING, INC.
Legal status: Active; expiration date adjusted

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00: Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14: Handling requests for interconnection or transfer
    • G06F13/16: Handling requests for interconnection or transfer for access to memory bus
    • G06F13/1605: Handling requests for interconnection or transfer for access to memory bus based on arbitration
    • G06F13/161: Handling requests for interconnection or transfer for access to memory bus based on arbitration with latency improvement
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00: Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02: Addressing or allocation; Relocation
    • G06F12/08: Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802: Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806: Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0815: Cache consistency protocols
    • G06F12/0831: Cache consistency protocols using a bus scheme, e.g. with bus monitoring or watching means
    • G06F12/0833: Cache consistency protocols using a bus scheme, e.g. with bus monitoring or watching means in combination with broadcast means (e.g. for invalidation or updating)
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00: Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14: Handling requests for interconnection or transfer
    • G06F13/16: Handling requests for interconnection or transfer for access to memory bus
    • G06F13/1668: Details of memory controller
    • G06F13/1684: Details of memory controller using multiple buses
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00: Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38: Information transfer, e.g. on bus
    • G06F13/40: Bus structure
    • G06F13/4004: Coupling between buses
    • G06F13/4022: Coupling between buses using switching circuits, e.g. switching matrix, connection or expansion network
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00: Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/10: Providing a specific technical effect
    • G06F2212/1016: Performance improvement
    • G06F2212/1024: Latency reduction
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00: Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/62: Details of cache specific to multiprocessor cache arrangements
    • G06F2212/621: Coherency control relating to peripheral accessing, e.g. from DMA or I/O device
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00: Packet switching elements
    • H04L49/10: Packet switching elements characterised by the switching fabric construction
    • H04L49/101: Packet switching elements characterised by the switching fabric construction using crossbar or matrix

Definitions

  • Switch points 110 - 116 at the edge of the network 101 are each coupled to a respective network interface controller (NIC) 134 - 140 .
  • NICs 134 - 140 translate the message packets from a data layer to a coherency layer using a coherency protocol, and vice versa, depending on whether the message is incoming or outgoing. Routing information can also be added to the message packets in NICs 134 - 140 .
  • Memory devices 168 - 174 can be any suitable memory devices such as, for example, Random Access Memory (RAM), Static RAM (SRAM), Magnetoresistive RAM (MRAM), Nonvolatile RAM (NVRAM, such as “FLASH” memory, etc.), Dynamic RAM (DRAM) such as Synchronous DRAM (SDRAM) or Double Data Rate RAM, Erasable Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), etc., among others.
  • End points 142 - 148 are coupled to communicate with NICs 134 - 140 and interleaving crossbar 150 .
  • End points 142 - 148 perform a hash process on the memory address of each request to determine which memory device 168 - 174 to use to access the data. Any suitable hash process can be used in end points 142 - 148 .
  • Processors 118 - 132 issue requests to interleaved memory devices 168 - 174 through one or more switch points 102 - 116 along the column corresponding to the switch point 102 - 116 to which the requesting processor 118 - 132 is connected, regardless of the particular address within the interleaved range. This way, all such accesses within the range of addresses from processors 118 - 132 experience the same path length inside interleaving crossbar 150 .
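The equal-path-length property asserted above can be sketched with a simple hop-count model. The grid dimensions, unit hop cost, and single-hop crossbar traversal below are illustrative assumptions, not figures from the patent:

```python
# Sketch of why column-wise routing plus a crossbar yields uniform
# latency per requester: every request from a given switch point takes
# the same column path down to its edge switch point, and the crossbar
# adds the same cost to reach ANY target controller.

ROWS, COLS = 2, 4          # switch points in a 2x4 grid; the last row is the edge
CROSSBAR_HOPS = 1          # crossbar traversal: same cost to every target

def hops_to_target(row: int, col: int, target: int) -> int:
    """Hop count from switch point (row, col) to any target controller.

    Note that `target` does not appear in the computation: the crossbar
    cost is target-independent, which is the whole point.
    """
    to_edge = (ROWS - 1) - row      # column-wise hops down to the edge row
    return to_edge + CROSSBAR_HOPS  # plus the uniform crossbar traversal

# From one switch point, every interleaved target is equidistant:
costs = {t: hops_to_target(0, 2, t) for t in range(4)}
assert len(set(costs.values())) == 1
```

Whatever the requester's position in the grid, changing the target controller changes nothing about the path, so interleaved accesses from a single requester experience uniform latency.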
  • Interleaving crossbar 150 receives read and write requests from end points 142 - 148 and routes the request to the specified global ordering point (GOP) and memory target modules 152 - 158 .
  • Crossbar 150 also transmits data from memory devices 168 - 174 to end points 142 - 148 via a respective GOP and memory target modules 152 - 158 .
  • Crossbar 150 can use hashed addresses to route requests and is capable of interleaving the requests among multiple memory devices 168 - 174 .
  • routing requests through crossbar 150 results in approximately the same latency among all interleaved read/write requests, which helps realize the benefits of interleaving as requests may be evenly distributed among memory devices 168 - 174 .
  • GOP and memory target modules 152 - 158 are coupled to communicate bi-directionally with a respective one of memory controllers 160 - 166 .
  • Memory controllers 160 - 166 are coupled to communicate bi-directionally with a respective one of memory devices 168 - 174 .
  • the memory target portion of each GOP and memory target module 152 - 158 communicates requests and receives responses directly from a respective one of memory controllers 160 - 166 .
  • the combination of each GOP and memory target module 152 - 158 and the corresponding memory controller 160 - 166 is referred to herein as a controller or control logic unit 176 - 182 .
  • Processors 118 - 132 include one or more levels of cache memory that store a subset of the information stored in memory devices 168 - 174 . Using a cache may reduce the number of times processors 118 - 132 must communicate with memory devices 168 - 174 via interleaving crossbar 150 . However, the presence of various caches (or other memory devices) in system 100 can readily permit a data operand with the same identifier or address to be present at various locations in system 100 . When the data operand is modified in one part of system 100 , there is a risk that a stale version of the data operand will be stored or used.
  • Memory coherency refers to the need of each processor 118 - 132 in system 100 to have access to the most recently modified data corresponding to a particular address in the memory devices 168 - 174 .
  • the presence of differing data values for a same address value in system 100 may lead to erroneous results being output by processors 118 - 132 .
  • reads and writes of information to memory devices 168 - 174 are monitored or “snooped”.
  • this address of the transaction is used as a snoop address.
  • a snoop request is initiated and directed to all caches in system 100 such that snoop lookups can be performed to search for any address in the caches that match the snoop address.
  • a snoop hit occurs for every match, and any needed corrective action is taken to maintain coherency of the data at the address in the cache where the snoop hit occurs.
  • Coherency of information utilized by the processors 118 - 132 can be maintained in accordance with, for example, the MESI (Modified-Exclusive-Shared-Invalid) protocol.
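The MESI protocol named above can be illustrated with a minimal snoop-response sketch for a single cache line. This is the generic textbook protocol, simplified, not the patent's specific coherency logic:

```python
# Minimal sketch of MESI snoop handling for one cache line. Only the
# transitions needed to keep a single line coherent are modeled; a real
# protocol table is larger. States follow the MESI acronym:
# Modified, Exclusive, Shared, Invalid.

def snoop(state: str, request: str) -> tuple[str, bool]:
    """Return (next_state, must_supply_data) for a snooped request.

    `request` is 'read' (another cache wants to share the line) or
    'write' (another cache wants exclusive ownership).
    """
    if state == 'M':                       # modified: only valid copy anywhere
        if request == 'read':
            return 'S', True               # intervene: supply data, then share
        return 'I', True                   # supply data, then invalidate
    if state in ('E', 'S'):
        if request == 'read':
            return 'S', False              # memory can supply the data
        return 'I', False                  # remote write invalidates our copy
    return 'I', False                      # already invalid: no action

assert snoop('M', 'read') == ('S', True)   # snoop hit on dirty data: intervene
assert snoop('E', 'write') == ('I', False) # remote write invalidates our copy
```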
  • memory controllers 160 - 166 coupled to GOP and memory target modules 152 - 158 may implement a credit system.
  • GOP and memory target modules 152 - 158 include logic to buffer transactions sent by requesters, and to buffer snoop credits issued by a target memory device 168 - 174 .
  • Each request type may be allocated a certain number of credits (where each credit represents one address buffer entry and corresponding data buffer entry, if applicable). The total number of credits may not exceed the total number of entries.
  • the credits can include request credits corresponding to a request queue and snoop credits for snoop requests. If a snoop credit is available from a target memory device 168 - 174 , GOP and memory target modules 152 - 158 arbitrate between the transactions and select one at a time to snoop. Each transaction so selected consumes a snoop credit.
  • GOP and memory target modules 152 - 158 can also perform a snoop broadcast for the selected transaction to the snoopers in network 100 , and attach a write data buffer credit to a snoop, if one is outstanding, and indicate its dispatch to the target.
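The snoop-credit flow control described above can be sketched as follows. The class name, queue discipline, and single-credit configuration are illustrative assumptions; the mechanism shown is just the rule that a snoop may be dispatched only while the target has issued unconsumed credits, one credit per buffer entry:

```python
# Sketch of credit-based snoop dispatch: the global ordering point (GOP)
# buffers transactions sent by requesters and dispatches a snoop only
# while the target memory device has an available snoop credit. Each
# dispatched snoop consumes a credit; the target returns the credit
# when it frees the corresponding buffer entry.

from collections import deque

class GlobalOrderingPoint:
    def __init__(self, snoop_credits: int):
        self.snoop_credits = snoop_credits   # credits issued by the target
        self.pending = deque()               # transactions awaiting snoop

    def receive(self, transaction: str) -> None:
        self.pending.append(transaction)     # buffer the incoming transaction

    def dispatch_snoop(self):
        """Select one pending transaction, if a snoop credit is available."""
        if not self.pending or self.snoop_credits == 0:
            return None                      # stall: nothing queued or no credit
        self.snoop_credits -= 1              # each selected transaction consumes one
        return self.pending.popleft()

    def return_credit(self) -> None:
        self.snoop_credits += 1              # target freed a buffer entry

gop = GlobalOrderingPoint(snoop_credits=1)
gop.receive('read A')
gop.receive('read B')
assert gop.dispatch_snoop() == 'read A'
assert gop.dispatch_snoop() is None          # out of credits: 'read B' must wait
gop.return_credit()
assert gop.dispatch_snoop() == 'read B'
```

The credit count can never exceed the number of buffer entries at the target, so the target is never offered more snoops than it can hold.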
  • Signals used to maintain coherency in system 100 may be communicated between each GOP and memory target module 152 - 158 and a respective one of switch points 110 - 116 on the edge of mesh network 101 outside of interleaving crossbar 150 .
  • a snoop bus 184 can be included to transmit accesses to be snooped to a selected set of cache memories used by processors 118 - 132 .
  • a coherency response connection 186 can be coupled between each GOP and memory target module 152 - 158 and a respective switch point 110 - 116 .
  • each processor 118 - 132 can report compliance with the coherency requirements of the memory transaction request to a respective GOP and memory target module 152 - 158 via a respective coherency response connection 186 .
  • clean data paths 188 can be coupled between each GOP and memory target module 152 - 158 and a respective switch point 110 - 116 to move coherent or “clean” data to memory in processors 118 - 132 in compliance with the coherency protocol being used.
  • the coherency action is carried out via the switch point 102 - 116 that received the snoop request and can be routed to all cache memories using data at a specified address via the receiving switch point 102 - 116 . If there is intervention, a snooper responds to a request for data and sources the data rather than allowing the data to be sourced from one of memory devices 168 - 174 to which the request was addressed. The requesting processor 118 - 132 receives the data directly from the snooper supplying the data.
  • GOP and memory target module 152 - 158 sends the response via the interleaving crossbar 150 back to the switch point 102 - 116 from where the original request arrived. Again, the response path has the same length regardless of the specific GOP and memory target module 152 - 158 that returns the data.
  • a data processing system ( 100 ) can comprise a network of interconnected switch points ( 102 - 116 ) having a plurality of edge switch points ( 110 - 116 ) located at an edge of the network and a plurality of network interface controllers ( 134 - 140 ). Each edge switch point of the plurality of edge switch points can be coupled to a corresponding network interface controller of the plurality of network interface controllers.
  • a crossbar switch ( 150 ) can be coupled between the plurality of network interface controllers and the plurality of target controllers ( 152 / 160 , 154 / 162 , 156 / 164 , 158 / 166 ).
  • the crossbar switch can be configured to communicate read/write signals (R/W) between any one of the plurality of network interface controllers and any one of the plurality of target controllers.
  • the read/write signals communicated by the crossbar switch can comprise read/write access requests and corresponding read/write data.
  • all read/write access requests directed to any target controller of the plurality of target controllers generated by a processing unit at a first switch point of the network are transmitted through a same path of switch points from the first switch point to a first edge switch point of the plurality of edge switch points.
  • a first read/write access request generated by the processing unit at the first switch point and transmitted to the first edge switch point is communicated to a first target controller of the plurality of target controllers through the crossbar switch
  • a second read/write access request generated by the processing unit at the first switch point and transmitted to the first edge switch point is communicated to a second target controller of the plurality of target controllers through the crossbar switch.
  • the data processing system can further comprise a snoop interconnect between each network interface controller and a corresponding target controller of the plurality of target controllers, wherein the snoop interconnect is configured to communicate snoop messages outside the crossbar switch.
  • the snoop messages are communicated between each network interface controller and a snoop network.
  • the data processing system can further comprise a coherency response interconnect between each network interface controller and the corresponding target controller, wherein the coherency response interconnect is configured to communicate coherency responses outside the crossbar switch.
  • the coherency responses are communicated between each network interface and a coherency response network.
  • each target controller comprises a global ordering point ( 152 - 158 ) and a memory controller ( 160 - 166 ), wherein the global ordering point is configured to determine an order of memory accesses performed by the memory controller.
  • the data processing system can further comprise a memory ( 168 , 170 , 172 , or 174 ) coupled to a memory controller of a target controller of the plurality of target controllers.
  • the data processing system can further comprise a memory ( 168 - 174 ) coupled to each memory controller.
  • an embodiment of a method can comprise generating, by a first processing unit coupled to a first switch point of the network, a first read/write access request; transmitting the first read/write access request through a first path of the network to a first edge switch point; determining a first destination target controller of the first read/write access request; and transmitting the first read/write access request through a crossbar switch ( 150 ) from the first edge switch point to the first destination target controller.
  • the first path of the network comprises one or more switch points of the network.
  • the method can further comprise generating, by the first processing unit coupled to the first switch point of the network, a second read/write access request having a different destination than the first read/write access request; transmitting the second read/write access request through the first path of the network to the first edge switch point; determining a second destination target controller of the second read/write access request, wherein the second destination target controller is different from the first target controller; and transmitting the second read/write access request through the crossbar switch from the first edge switch point to the second destination target controller.
  • the method can further comprise providing a snoop message from the first destination target controller to the first edge switch point without transmitting the snoop message through the crossbar switch; and providing the snoop message from the first edge switch point to a snoop network of the data processing system.
  • the method can further comprise receiving, at the first edge switch point, a coherency response to the first read/write access from a coherency response network of the data processing system; and providing the coherency response to the first destination target controller without transmitting the coherency response through the crossbar switch.
  • the method can further comprise transmitting read/write data corresponding to the first read/write access request between the first switch point and the first edge switch point, wherein the read/write data is transmitted along the first path; and transmitting the read/write data through the crossbar switch between the first edge switch point and the destination target controller.
  • the method can further comprise, when the first read/write access request is a write request, providing the read/write data to a first memory coupled to the first destination target controller; and when the first read/write access request is a read request, receiving the read/write data from the first memory.
  • a data processing system can comprise a network of interconnected switch points having a plurality of edge switch points located at an edge of the network, and a plurality of network interface controllers. Each edge switch point of the plurality of edge switch points can be coupled to a corresponding network interface controller of the plurality of network interface controllers.
  • a crossbar switch can be coupled between the plurality of network interface controllers and the plurality of target controllers. The crossbar switch can be configured to communicate read/write signals between any one of the plurality of network interface controllers and any one of the plurality of target controllers.
  • a snoop bus can be coupled between each network interface controller and a corresponding target controller of the plurality of target controllers.
  • the data processing system can further comprise a coherency response bus coupled between each network interface controller and the corresponding target controller.
  • Processing system or network 100 can be implemented in one or more Systems-on-Chip (SoCs), Application-Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Field-Programmable Gate Arrays (FPGAs), processors, microprocessors, controllers, microcontrollers (MCUs), Graphics Processing Units (GPUs), or the like. Additionally or alternatively, processing network 100 may include one or more mixed-signal or analog circuits, such as, for example, Analog-to-Digital Converters (ADCs), Digital-to-Analog Converters (DACs), Phase-Locked Loops (PLLs), oscillators, filters, amplifiers, etc. Additionally or alternatively, processing network 100 may include one or more Micro-ElectroMechanical Systems (MEMS), Nano-ElectroMechanical Systems (NEMS), or the like.
  • processors 118 - 132 may be part of computer systems designed to give independent computing power to one or more users.
  • Computer systems may be found in many forms including but not limited to mainframes, minicomputers, servers, workstations, personal computers, notepads, tablets, personal digital assistants, electronic games, automotive and other embedded systems, cell phones and various other wireless devices.
  • a typical computer system includes at least one processing unit, associated memory and a number of input/output (I/O) devices.
  • a computer system processes information according to a program and produces resultant output information via I/O devices.
  • a computer process typically includes an executing (running) program or portion of a program, current program values and state information, and the resources used by the operating system to manage the execution of the process.
  • any two components herein combined to achieve a particular functionality can be seen as “associated with” each other such that the desired functionality is achieved, irrespective of architectures or intermedial components.
  • any two components so associated can also be viewed as being “operably connected,” or “operably coupled,” to each other to achieve the desired functionality.
  • the illustrated elements of systems disclosed herein are circuitry located on a single integrated circuit or within a same device.
  • the systems may include any number of separate integrated circuits or separate devices interconnected with each other.
  • a system or portions thereof may be soft or code representations of physical circuitry or of logical representations convertible into physical circuitry.
  • a system may be embodied in a hardware description language of any appropriate type.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computer Hardware Design (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Multi Processors (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)

Abstract

A data processing system includes a network of interconnected switch points having a plurality of edge switch points located at an edge of the network; a plurality of network interface controllers, wherein each edge switch point of the plurality of edge switch points is coupled to a corresponding network interface controller of the plurality of network interface controllers; a plurality of target controllers; and a crossbar switch coupled between the plurality of network interface controllers and the plurality of target controllers. The crossbar switch is configured to communicate read/write signals between any one of the plurality of network interface controllers and any one of the plurality of target controllers.

Description

BACKGROUND
Field of the Disclosure
The present disclosure relates in general to semiconductor memories, and more specifically to coherent memory interleaving with uniform latency.
Description of the Related Art
Memory interleaving refers to distributing accesses made by programs uniformly across multiple available memory devices in order to achieve higher memory throughput while still maintaining lower latencies by hiding delays between consecutive accesses to a single device. Ultimately, memory interleaving improves the performance of applications running in the system and relieves programs of the burden of optimizing data placement in memory.
The effectiveness of interleaving depends on whether accesses to different memory devices have the same or similar latency. If there is significant inequality between their access times, then the intended benefit is lost; in that case one might just as well access only the device with the lowest native latency.
Many connectivity topologies for large computer systems inherently present large inequality of path lengths to the different memory devices connected within them. The differences in latency make access times non-uniform, which undermines the usefulness of interleaving.
BRIEF DESCRIPTION OF THE DRAWINGS
The benefits, features, and advantages of the present disclosure will become better understood with regard to the following description, and accompanying drawings where:
FIG. 1 is a schematic diagram of a processing network according to an embodiment.
DETAILED DESCRIPTION
Embodiments disclosed herein provide equal access times to different memory devices without significantly impacting latencies. Interleaving among multiple memory controllers or channels is achieved while maintaining coherency and uniform latencies to the interleaved memory devices using system topologies that nominally do not offer uniform memory access pathways.
FIG. 1 is a schematic diagram of a processing network 100 according to an embodiment of the present invention that includes a network 101 (such as a mesh network or other suitable network) of switch points 102-116 arranged in a two-dimensional grid or array pattern. Each switch point 102-116 is coupled to communicate with the other switch points 102-116, either directly or indirectly, using a communication protocol such as the Transmission Control Protocol (TCP) or another suitable protocol that specifies how data should be packetized, addressed, transmitted, routed and received. As an example, switch point 102 can be configured to use TCP to transfer packets directly to switch points 104 and 110, and indirectly to switch points 106, 108, 112, 114 and 116 via switch points 104 and 110.
Switch points 102-116 are further coupled to transmit and receive packets with a respective one of processing devices 118-132 using a suitable communication protocol, such as TCP. Processing devices 118-132 can issue requests for data from interleaved memory devices 168-174 through a respective one of switch points 102-116. The requests and responses may be routed column-wise through the network of switch points 102-116. For example, a request from processor 118 can be routed through switch point 102 to switch point 110, a request from processor 120 can be routed through switch point 104 to switch point 112, and so on. Each request typically specifies an address of the data within memory devices 168-174. Data can be distributed or hashed among memory devices 168-174 by address; for example, the low-order bits of each address can be used to determine which memory device 168-174 stores the data. When data is interleaved, consecutive memory addresses can be assigned to memory devices 168-174 in turn, i.e., a first memory address can be in memory device 168, the second address in memory device 170, the third address in memory device 172, etc. When multiple sequential addresses are accessed, each memory device 168-174 can gather a portion of the requested data simultaneously, reducing the time it would otherwise take to gather all of the data from one of memory devices 168-174 sequentially. To allow simultaneous memory accesses within a cacheline, all addresses of a single cacheline can reside within a single memory controller 160-166. Requests for multiple cachelines can be interleaved across memory controllers 160-166, hashed by address, and each memory controller 160-166 can then independently fetch a cacheline's worth of data for its own hashed memory addresses.
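The cacheline-granular interleaving described above can be sketched as follows. This is an illustrative model only, not the patent's actual hash; the cacheline size, controller count, and function names are assumptions.

```python
# Sketch: select one of four memory controllers by hashing the low-order
# cacheline-address bits, so consecutive cachelines interleave across
# controllers while all bytes of a single cacheline map to one controller.

CACHELINE_BYTES = 64      # assumed cacheline size
NUM_CONTROLLERS = 4       # e.g., the four controllers 160-166 in FIG. 1

def controller_for(address: int) -> int:
    """Map a byte address to a memory controller index."""
    cacheline = address // CACHELINE_BYTES   # all bytes in a line hash alike
    return cacheline % NUM_CONTROLLERS       # interleave consecutive lines

# Consecutive cachelines land on different controllers...
assert [controller_for(i * 64) for i in range(4)] == [0, 1, 2, 3]
# ...but every byte within one cacheline maps to the same controller.
assert {controller_for(a) for a in range(128, 192)} == {2}
```

Because the mapping is a pure function of the address, every requester computes the same controller for a given cacheline, which is what lets each controller operate independently on its own hashed addresses.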
Switch points 110-116 at the edge of the network 101 are each coupled to a respective network interface controller (NIC) 134-140. NICs 134-140 translate the message packets from a data layer to a coherency layer using a coherency protocol, and vice versa, depending on whether the message is incoming or outgoing. Routing information can also be added to the message packets in NICs 134-140.
Memory devices 168-174 can be any suitable memory devices such as, for example, Random Access Memory (RAM), Static RAM (SRAM), Magnetoresistive RAM (MRAM), Nonvolatile RAM (NVRAM, such as “FLASH” memory, etc.), and/or Dynamic RAM (DRAM) such as Synchronous DRAM (SDRAM), Double Data Rate RAM, Erasable Programmable ROM (EPROM), an Electrically Erasable Programmable ROM (EEPROM), etc., among others.
End points 142-148 are coupled to communicate with NICs 134-140 and interleaving crossbar 150. End points 142-148 perform a hash process on the memory address of each request to determine which memory device 168-174 to use to access the data. Any suitable hash process can be used in end points 142-148. Processors 118-132 issue requests to interleaved memory devices 168-174 through one or more switch points 102-116 along the column corresponding to the switch point 102-116 to which the processor 118-132 is connected, regardless of the particular address within the interleaved range. In this way, all accesses within the range of addresses from a processor 118-132 experience the same path length inside interleaving crossbar 150.
Interleaving crossbar 150 receives read and write requests from end points 142-148 and routes each request to the specified global ordering point (GOP) and memory target module 152-158. Crossbar 150 also transmits data from memory devices 168-174 to end points 142-148 via a respective GOP and memory target module 152-158. Crossbar 150 can use hashed addresses to route requests and is capable of interleaving the requests among multiple memory devices 168-174. Thus, routing requests through crossbar 150 results in approximately the same latency for all interleaved read/write requests, which helps realize the benefits of interleaving because requests may be evenly distributed among memory devices 168-174.
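The uniform-latency property of the crossbar can be illustrated with a minimal model (an assumption for exposition, not the patent's hardware): any endpoint reaches any target in the same single traversal, so path length does not depend on which memory device an address hashes to.

```python
# Minimal crossbar model: N endpoints, M targets, one "hop" for every
# endpoint/target pair, so nominal latency is uniform across all pairs.

class InterleavingCrossbar:
    def __init__(self, num_targets: int):
        self.num_targets = num_targets

    def route(self, endpoint: int, hashed_target: int) -> tuple:
        """Return (target, hop_count) for a request routed by hashed address."""
        assert 0 <= hashed_target < self.num_targets
        return hashed_target, 1          # one crossbar traversal, always

xbar = InterleavingCrossbar(num_targets=4)
# Every endpoint/target combination costs the same single traversal.
hops = {xbar.route(e, t)[1] for e in range(4) for t in range(4)}
assert hops == {1}
```

Contrast this with routing requests through a mesh, where hop count grows with the grid distance between requester and target, making interleaved accesses unequal in latency.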
GOP and memory target modules 152-158 are coupled to communicate bi-directionally with a respective one of memory controllers 160-166. Memory controllers 160-166 are coupled to communicate bi-directionally with a respective one of memory devices 168-174. The memory target portion of each GOP and memory target module 152-158 communicates requests and receives responses directly from a respective one of memory controllers 160-166. The combination of each GOP and memory target module 152-158 and the corresponding memory controller 160-166 is referred to herein as a controller or control logic unit 176-182.
Processors 118-132 include one or more levels of cache memory that store a subset of duplicative information that is stored in memory devices 168-174. Using a cache may reduce how often processors 118-132 must communicate with memory devices 168-174 via interleaving crossbar 150. However, the presence of various caches (or other memory devices) in system 100 can readily permit a data operand which has the same identifier or address to be present at various locations in system 100. When the data operand is modified in one part of system 100, a risk exists that a stale version of the data operand will be stored or used. Memory coherency refers to the need of each processor 118-132 in system 100 to have access to the most recently modified data corresponding to a particular address in memory devices 168-174. The presence of differing data values for the same address in system 100 may lead to erroneous results being output by processors 118-132.
To maintain memory coherency, reads and writes of information to memory devices 168-174 are monitored or "snooped". When either a memory read or a memory write of data at an address is detected, the address of the transaction is used as a snoop address. A snoop request is initiated and directed to all caches in system 100 such that snoop lookups can be performed to search for any address in the caches that matches the snoop address. A snoop hit occurs for every match, and any needed corrective action is taken to maintain coherency of the data at the address in the cache where the snoop hit occurs. Coherency of information utilized by the processors 118-132 can be maintained in accordance with, for example, the MESI (Modified-Exclusive-Shared-Invalid) protocol.
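The corrective actions taken on a snoop hit under MESI can be sketched as a simple state function. This is a highly simplified illustration (real coherence controllers also handle writebacks, ownership transfers, and transient states), and the function name is an assumption for exposition.

```python
# Simplified MESI snoop sketch: a snooped remote read downgrades M/E/S
# copies to Shared; a snooped remote write invalidates every other copy.

def snoop(state: str, snoop_kind: str) -> str:
    """Next MESI state of a cache line observing a remote access."""
    if snoop_kind == "read":
        # M or E lines are downgraded so the remote reader may share the line
        # (an M line would also write its dirty data back or intervene).
        return "S" if state in ("M", "E", "S") else "I"
    if snoop_kind == "write":
        return "I"            # a remote write invalidates all other copies
    raise ValueError(snoop_kind)

assert snoop("M", "read") == "S"
assert snoop("E", "write") == "I"
assert snoop("I", "read") == "I"
```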
To manage the allocation of address buffer entries and data buffer entries among different requests, memory controllers 160-166 coupled to GOP and memory target modules 152-158 may implement a credit system. GOP and memory target modules 152-158 include logic to buffer transactions sent by requesters and to buffer snoop credits issued by a target memory device 168-174.
Each request type may be allocated a certain number of credits (where each credit represents one address buffer entry and corresponding data buffer entry, if applicable). The total number of credits may not exceed the total number of entries. When a buffer entry is allocated to a request, a credit is consumed and when the buffer entry is freed via completion of a request, the credit is returned. The credits can include request credits corresponding to a request queue and snoop credits for snoop requests. If a snoop credit is available from a target memory device 168-174, GOP and memory target modules 152-158 arbitrate between the transactions and select one at a time to snoop. Each transaction so selected consumes a snoop credit. GOP and memory target modules 152-158 can also perform a snoop broadcast for the selected transaction to the snoopers in network 100, and attach a write data buffer credit to a snoop, if one is outstanding, and indicate its dispatch to the target.
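The credit discipline described above (one credit per buffer entry, consumed on allocation and returned on completion) can be sketched as follows; the class and variable names are illustrative assumptions, not the patent's terminology.

```python
# Sketch of the credit scheme: each credit stands for one buffer entry.
# A transaction consumes a credit when an entry is allocated and returns
# it when the request completes, so credits never exceed total entries.

class CreditPool:
    def __init__(self, total_entries: int):
        self.available = total_entries   # initial credits == buffer entries

    def acquire(self) -> bool:
        """Consume a credit if a buffer entry is free; else the caller waits."""
        if self.available > 0:
            self.available -= 1
            return True
        return False

    def release(self) -> None:
        """Return the credit when the buffer entry is freed."""
        self.available += 1

snoop_credits = CreditPool(total_entries=2)
assert snoop_credits.acquire() and snoop_credits.acquire()
assert not snoop_credits.acquire()       # pool exhausted: transaction stalls
snoop_credits.release()
assert snoop_credits.acquire()           # freed entry permits a new snoop
```

Separate pools can be kept per request type (request credits versus snoop credits), which is how the arbiter knows whether a selected transaction may be dispatched to the target.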
Signals used to maintain coherency in system 100 may be communicated between each GOP and memory target module 152-158 and a respective one of switch points 110-116 on the edge of mesh network 101 outside of interleaving crossbar 150. For example, a snoop bus 184 can be included to transmit accesses to be snooped to a selected set of cache memories used by processors 118-132. As another example, a coherency response connection 186 can be coupled between each GOP and memory target module 152-158 and a respective switch point 110-116. In response to processing a memory transaction request for a memory location, each processor 118-132 can report compliance with the coherency requirements of the memory transaction request to a respective GOP and memory target module 152-158 via a respective coherency response connection 186. As a further example of a signal path that can be used to maintain memory coherency, clean data paths 188 can be coupled between each GOP and memory target module 152-158 and a respective switch point 110-116 to move coherent or “clean” data to memory in processors 118-132 in compliance with the coherency protocol being used.
The coherency action is carried out via the switch point 102-116 that received the snoop request and can be routed to all cache memories using data at a specified address via the receiving switch point 102-116. If there is intervention, a snooper responds to a request for data and sources the data rather than allowing the data to be sourced from the one of memory devices 168-174 to which the request was addressed. The requesting processor 118-132 receives the data directly from the snooper supplying the data. If the data is to be provided by memory devices 168-174, the GOP and memory target module 152-158 sends the response via interleaving crossbar 150 back to the switch point 102-116 from which the original request arrived. Again, the response path has the same length regardless of the specific GOP and memory target module 152-158 that returns the data.
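The intervention decision can be sketched as a small selector: if any snooper holds the line Modified, it sources the data cache-to-cache; otherwise the hashed target memory device responds through the crossbar. This is illustrative only, and the function and cache names are assumptions.

```python
# Sketch of intervention: a snooper with a dirty (Modified) copy supplies
# the requested line directly; otherwise memory is the data source.

def data_source(snooper_states: dict) -> str:
    """Pick which agent supplies the requested cacheline."""
    for cache, state in snooper_states.items():
        if state == "M":
            return cache          # intervention: cache-to-cache transfer
    return "memory"               # no dirty copy exists: memory responds

assert data_source({"cache0": "S", "cache1": "M"}) == "cache1"
assert data_source({"cache0": "I"}) == "memory"
```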
By now it should be appreciated that in some embodiments, a data processing system (100) can comprise a network of interconnected switch points (102-116) having a plurality of edge switch points (110-116) located at an edge of the network and a plurality of network interface controllers (134-140). Each edge switch point of the plurality of edge switch points can be coupled to a corresponding network interface controller of the plurality of network interface controllers. A crossbar switch (150) can be coupled between the plurality of network interface controllers and a plurality of target controllers (152/160, 154/162, 156/164, 158/166). The crossbar switch can be configured to communicate read/write signals (R/W) between any one of the plurality of network interface controllers and any one of the plurality of target controllers.
In another aspect, the read/write signals communicated by the crossbar switch can comprise read/write access requests and corresponding read/write data.
In another aspect, all read/write access requests directed to any target controller of the plurality of target controllers generated by a processing unit at a first switch point of the network are transmitted through a same path of switch points from the first switch point to a first edge switch point of the plurality of edge switch points.
In another aspect, a first read/write access request generated by the processing unit at the first switch point and transmitted to the first edge switch point is communicated to a first target controller of the plurality of target controllers through the crossbar switch, and a second read/write access request generated by the processing unit at the first switch point and transmitted to the first edge switch point is communicated to a second target controller of the plurality of target controllers through the crossbar switch.
In another aspect, the data processing system can further comprise a snoop interconnect between each network interface controller and a corresponding target controller of the plurality of target controllers, wherein the snoop interconnect is configured to communicate snoop messages outside the crossbar switch.
In another aspect, the snoop messages are communicated between each network interface controller and a snoop network.
In another aspect, the data processing system can further comprise a coherency response interconnect between each network interface controller and the corresponding target controller, wherein the coherency response interconnect is configured to communicate coherency responses outside the crossbar switch.
In another aspect, the coherency responses are communicated between each network interface and a coherency response network.
In another aspect, each target controller comprises a global ordering point (152-158) and a memory controller (160-166), wherein the global ordering point is configured to determine an order of memory accesses performed by the memory controller.
In another aspect, the data processing system can further comprise a memory (160, 162, 164, or 166) coupled to a memory controller of a target controller of the plurality of target controllers.
In another aspect, the data processing system can further comprise a memory (160-166) coupled to each memory controller.
In a data processing system having a network of interconnected switch points (102-116), wherein the network includes a plurality of edge switch points (110-116) located at an edge of the network, an embodiment of a method can comprise generating, by a first processing unit coupled to a first switch point of the network, a first read/write access request; transmitting the first read/write access request through a first path of the network to a first edge switch point; determining a first destination target controller of the first read/write access request; and transmitting the first read/write access request through a crossbar switch (150) from the first edge switch point to the first destination target controller.
In another aspect, the first path of the network comprises one or more switch points of the network.
In another aspect, the method can further comprise generating, by the first processing unit coupled to the first switch point of the network, a second read/write access request having a different destination than the first read/write access request; transmitting the second read/write access request through the first path of the network to the first edge switch point; determining a second destination target controller of the second read/write access request, wherein the second destination target controller is different from the first target controller; and transmitting the second read/write access request through the crossbar switch from the first edge switch point to the second destination target controller.
In another aspect, the method can further comprise providing a snoop message from the first destination target controller to the first edge switch point without transmitting the snoop message through the crossbar switch; and providing the snoop message from the first edge switch point to a snoop network of the data processing system.
In another aspect, the method can further comprise receiving, at the first edge switch point, a coherency response to the first read/write access from a coherency response network of the data processing system; and providing the coherency response to the first destination target controller without transmitting the coherency response through the crossbar switch.
In another aspect, the method can further comprise transmitting read/write data corresponding to the first read/write access request between the first switch point and the first edge switch point, wherein the read/write data is transmitted along the first path; and transmitting the read/write data through the crossbar switch between the first edge switch point and the destination target controller.
In another aspect, the method can further comprise, when the first read/write access request is a write request, providing the read/write data to a first memory coupled to the first destination target controller; and when the first read/write access request is a read request, receiving the read/write data from the first memory.
In another embodiment, a data processing system can comprise a network of interconnected switch points having a plurality of edge switch points located at an edge of the network, and a plurality of network interface controllers. Each edge switch point of the plurality of edge points can be coupled to a corresponding network interface controller of the plurality of network interface controllers. A crossbar switch can be coupled between the plurality of network interface controllers and the plurality of target controllers. The crossbar switch can be configured to communicate read/write signals between any one of the plurality of network interface controllers and any one of the plurality of target controllers. A snoop bus can be coupled between each network interface controller and a corresponding target controller of the plurality of target controllers.
In another aspect, the data processing system can further comprise a coherency response bus coupled between each network interface controller and the corresponding target controller.
This disclosure is presented to enable one of ordinary skill in the art to make and use the present disclosure as provided within the context of a particular application and its requirements. Various modifications to the preferred embodiments will, however, be apparent to one skilled in the art, and the general principles defined herein may be applied to other embodiments. Therefore, the present disclosure is not intended to be limited to the particular embodiments shown and described herein, but is to be accorded the widest scope consistent with the principles and novel features herein disclosed.
Processing system or network 100 can be implemented in one or more Systems-on-Chip (SoCs), Application-Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Field-Programmable Gate Arrays (FPGAs), processors, microprocessors, controllers, microcontrollers (MCUs), Graphics Processing Units (GPUs), or the like. Additionally or alternatively, processing network 100 may include one or more mixed-signal or analog circuits, such as, for example, Analog-to-Digital Converters (ADCs), Digital-to-Analog Converters (DACs), Phase-Locked Loops (PLLs), oscillators, filters, amplifiers, etc. Additionally or alternatively, processing network 100 may include one or more Micro-ElectroMechanical Systems (MEMS), Nano-ElectroMechanical Systems (NEMS), or the like.
In one embodiment, processors 118-132 are computer systems which may be designed to give independent computing power to one or more users. Computer systems may be found in many forms including but not limited to mainframes, minicomputers, servers, workstations, personal computers, notepads, tablets, personal digital assistants, electronic games, automotive and other embedded systems, cell phones and various other wireless devices. A typical computer system includes at least one processing unit, associated memory and a number of input/output (I/O) devices.
A computer system processes information according to a program and produces resultant output information via I/O devices. A computer process typically includes an executing (running) program or portion of a program, current program values and state information, and the resources used by the operating system to manage the execution of the process.
Thus, it is to be understood that the architectures depicted herein are merely exemplary, and that in fact many other architectures can be implemented which achieve the same functionality. In an abstract, but still definite sense, any arrangement of components to achieve the same functionality is effectively “associated” such that the desired functionality is achieved. Hence, any two components herein combined to achieve a particular functionality can be seen as “associated with” each other such that the desired functionality is achieved, irrespective of architectures or intermedial components. Likewise, any two components so associated can also be viewed as being “operably connected,” or “operably coupled,” to each other to achieve the desired functionality.
Also for example, in one embodiment, the illustrated elements of systems disclosed herein are circuitry located on a single integrated circuit or within a same device. Alternatively, the systems may include any number of separate integrated circuits or separate devices interconnected with each other. Also for example, a system or portions thereof may be soft or code representations of physical circuitry or of logical representations convertible into physical circuitry. As such, a system may be embodied in a hardware description language of any appropriate type.
Furthermore, those skilled in the art will recognize that boundaries between the functionality of the above-described operations are merely illustrative. The functionality of multiple operations may be combined into a single operation, and/or the functionality of a single operation may be distributed in additional operations. Moreover, alternative embodiments may include multiple instances of a particular operation, and the order of operations may be altered in various other embodiments.
Although the invention(s) is/are described herein with reference to specific embodiments, various modifications and changes can be made without departing from the scope of the present invention(s), as set forth in the claims below. Accordingly, the specification and FIGURES are to be regarded in an illustrative rather than a restrictive sense, and all such modifications are intended to be included within the scope of the present invention(s). Any benefits, advantages, or solutions to problems that are described herein with regard to specific embodiments are not intended to be construed as a critical, required, or essential feature or element of any or all the claims.
Unless stated otherwise, terms such as “first” and “second” are used to arbitrarily distinguish between the elements such terms describe. Thus, these terms are not necessarily intended to indicate temporal or other prioritization of such elements. The term “coupled” is defined as connected, although not necessarily directly, and not necessarily mechanically. The terms “a” and “an” are defined as one or more unless stated otherwise. The terms “comprise” (and any form of comprise, such as “comprises” and “comprising”), “have” (and any form of have, such as “has” and “having”), “include” (and any form of include, such as “includes” and “including”) and “contain” (and any form of contain, such as “contains” and “containing”) are open-ended linking verbs. As a result, a system, device, or apparatus that “comprises,” “has,” “includes” or “contains” one or more elements possesses those one or more elements but is not limited to possessing only those one or more elements. Similarly, a method or process that “comprises,” “has,” “includes” or “contains” one or more operations possesses those one or more operations but is not limited to possessing only those one or more operations.

Claims (20)

What is claimed is:
1. A data processing system, comprising:
a network of interconnected switch points having a plurality of edge switch points located at an edge of the network;
a plurality of network interface controllers, wherein each edge switch point of the plurality of edge switch points is coupled to a corresponding network interface controller of the plurality of network interface controllers;
a plurality of target memory controllers; and
a crossbar switch coupled between the plurality of network interface controllers and the plurality of target memory controllers, wherein the crossbar switch is configured to communicate read/write signals between any one of the plurality of network interface controllers and any one of the plurality of target memory controllers, wherein the plurality of network interface controllers are coupled between the plurality of edge switch points and the crossbar switch.
2. The data processing system of claim 1, wherein the read/write signals communicated by the crossbar switch comprise read/write access requests and corresponding read/write data.
3. The data processing system of claim 2, wherein all read/write access requests directed to any target memory controller of the plurality of target memory controllers generated by a processing unit at a first switch point of the network are transmitted through a same path of switch points from the first switch point to a first edge switch point of the plurality of edge switch points.
4. The data processing system of claim 3, wherein a first read/write access request generated by the processing unit at the first switch point and transmitted to the first edge switch point is communicated to a first target memory controller of the plurality of target memory controllers through the crossbar switch, and a second read/write access request generated by the processing unit at the first switch point and transmitted to the first edge switch point is communicated to a second target memory controller of the plurality of target memory controllers through the crossbar switch.
5. The data processing system of claim 1, further comprising a snoop interconnect between each network interface controller and a corresponding target memory controller of the plurality of target memory controllers, wherein the snoop interconnect is configured to communicate snoop messages outside the crossbar switch.
6. The data processing system of claim 5, wherein the snoop messages are communicated between each network interface controller and a snoop network.
7. The data processing system of claim 1, further comprising a coherency response interconnect between each network interface controller and the corresponding target memory controller, wherein the coherency response interconnect is configured to communicate coherency responses outside the crossbar switch.
8. The data processing system of claim 7, wherein the coherency responses are communicated between each network interface and a coherency response network.
9. The data processing system of claim 1, wherein each target memory controller of the plurality of target memory controllers comprises a global ordering point and a memory controller, wherein the global ordering point is configured to determine an order of memory accesses performed by the memory controller.
10. The data processing system of claim 9, further comprising a memory coupled to a memory controller of a target memory controller of the plurality of target memory controllers.
11. The data processing system of claim 9, further comprising a memory coupled to each memory controller.
12. A method comprising:
generating a first read/write access request by a first processing unit coupled to a first switch point of a network of interconnected switch points in a data processing system, wherein the network includes a plurality of edge switch points located at an edge of the network;
transmitting the first read/write access request through a first path of the network to a first edge switch point via a second switch point of the network of interconnected switch points, wherein the second switch point is not at the edge of the network;
determining a first destination target memory controller of the first read/write access request; and
transmitting the first read/write access request through a crossbar switch from the first edge switch point to the first destination target memory controller.
13. The method of claim 12, wherein the first path of the network comprises one or more switch points of the network.
14. The method of claim 12, further comprising:
generating, by the first processing unit coupled to the first switch point of the network, a second read/write access request having a different destination than the first read/write access request;
transmitting the second read/write access request through the first path of the network to the first edge switch point;
determining a second destination target memory controller of the second read/write access request, wherein the second destination target memory controller is different from the first target memory controller; and
transmitting the second read/write access request through the crossbar switch from the first edge switch point to the second destination target memory controller.
15. The method of claim 12, further comprising:
providing a snoop message from the first destination target memory controller to the first edge switch point without transmitting the snoop message through the crossbar switch; and
providing the snoop message from the first edge switch point to a snoop network of the data processing system.
16. The method of claim 15, further comprising:
receiving, at the first edge switch point, a coherency response to the first read/write access from a coherency response network of the data processing system; and
providing the coherency response to the first destination target memory controller without transmitting the coherency response through the crossbar switch.
17. The method of claim 12, further comprising:
transmitting read/write data corresponding to the first read/write access request between the first switch point and the first edge switch point, wherein the read/write data is transmitted along the first path; and
transmitting the read/write data through the crossbar switch between the first edge switch point and the destination target memory controller.
18. The method of claim 17, further comprising:
when the first read/write access request is a write request, providing the read/write data to a first memory coupled to the first destination target memory controller; and
when the first read/write access request is a read request, receiving the read/write data from the first memory.
19. A data processing system, comprising:
a network of interconnected switch points having a plurality of edge switch points located at an edge of the network;
a plurality of network interface controllers, wherein each edge switch point of the plurality of edge switch points is coupled to a corresponding network interface controller of the plurality of network interface controllers;
a plurality of target memory controllers;
a crossbar switch coupled between the plurality of network interface controllers and the plurality of target memory controllers, wherein the crossbar switch is configured to communicate read/write signals between any one of the plurality of network interface controllers and any one of the plurality of target memory controllers, wherein the plurality of network interface controllers are coupled between the plurality of edge switch points and the crossbar switch; and
a snoop bus coupled between each network interface controller and a corresponding target memory controller of the plurality of target memory controllers.
20. The data processing system of claim 19, further comprising:
a coherency response bus coupled between each network interface controller and the corresponding target memory controller.
US14/568,433 2014-12-12 2014-12-12 Coherent memory interleaving with uniform latency Active 2036-05-04 US9977750B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14/568,433 US9977750B2 (en) 2014-12-12 2014-12-12 Coherent memory interleaving with uniform latency

Publications (2)

Publication Number Publication Date
US20160170916A1 US20160170916A1 (en) 2016-06-16
US9977750B2 true US9977750B2 (en) 2018-05-22

Family

ID=56111305

Family Applications (1)

Application Number Title Priority Date Filing Date
US14/568,433 Active 2036-05-04 US9977750B2 (en) 2014-12-12 2014-12-12 Coherent memory interleaving with uniform latency

Country Status (1)

Country Link
US (1) US9977750B2 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11995448B1 (en) 2018-02-08 2024-05-28 Marvell Asia Pte Ltd Method and apparatus for performing machine learning operations in parallel on machine learning hardware
CN110324265B (en) * 2018-03-29 2021-09-07 阿里巴巴集团控股有限公司 Traffic distribution method, routing method, equipment and network system
US10929779B1 (en) 2018-05-22 2021-02-23 Marvell Asia Pte, Ltd. Architecture to support synchronization between core and inference engine for machine learning
US10929778B1 (en) * 2018-05-22 2021-02-23 Marvell Asia Pte, Ltd. Address interleaving for machine learning
US11016801B1 (en) 2018-05-22 2021-05-25 Marvell Asia Pte, Ltd. Architecture to support color scheme-based synchronization for machine learning
US10997510B1 (en) 2018-05-22 2021-05-04 Marvell Asia Pte, Ltd. Architecture to support tanh and sigmoid operations for inference acceleration in machine learning
US10929760B1 (en) 2018-05-22 2021-02-23 Marvell Asia Pte, Ltd. Architecture for table-based mathematical operations for inference acceleration in machine learning
CN114880254A (en) * 2022-04-02 2022-08-09 锐捷网络股份有限公司 Table entry reading method and device and network equipment
CN115344515B (en) * 2022-10-17 2022-12-27 中科声龙科技发展(北京)有限公司 Method for realizing access control, on-chip computing system and chip

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5577204A (en) * 1993-12-15 1996-11-19 Convex Computer Corporation Parallel processing computer system interconnections utilizing unidirectional communication links with separate request and response lines for direct communication or using a crossbar switching device
US20010034815A1 (en) * 2000-04-25 2001-10-25 Dugan Michael K. Apparatus and method for performing speculative cache directory tag updates
US20020181395A1 (en) * 2001-04-27 2002-12-05 Foster Michael S. Communicating data through a network so as to ensure quality of service
US20030163649A1 (en) * 2002-02-25 2003-08-28 Kapur Suvansh K. Shared bypass bus structure
US20030167348A1 (en) * 2001-07-02 2003-09-04 Globespanvirata, Inc. Communications system using rings architecture
US6636906B1 (en) * 2000-04-28 2003-10-21 Hewlett-Packard Development Company, L.P. Apparatus and method for ensuring forward progress in coherent I/O systems
US20070217436A1 (en) * 2006-03-16 2007-09-20 Markley Jeffrey P Methods and apparatus for centralized content and data delivery
US20100180318A1 (en) * 2009-01-14 2010-07-15 Cisco Technology, Inc. Flexible supplicant access control
US20120170585A1 (en) * 2010-12-29 2012-07-05 Juniper Networks, Inc. Methods and apparatus for standard protocol validation mechanisms deployed over a switch fabric system
US20130304841A1 (en) * 2012-05-14 2013-11-14 Advanced Micro Devices, Inc. Server node interconnect devices and methods
US20160092362A1 (en) * 2013-04-25 2016-03-31 Hewlett-Packard Development Company, L.P. Memory network to route memory traffic and i/o traffic

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Chao, A., "Using Multichannel DRAM Subsystems to Create Scalable Architecture for Video SOCs", 4th Annual Multicore Expo, Sonics Inc., Mar. 18, 2009, pp. 1-20.

Also Published As

Publication number Publication date
US20160170916A1 (en) 2016-06-16

Similar Documents

Publication Publication Date Title
US9977750B2 (en) Coherent memory interleaving with uniform latency
US9465767B2 (en) Multi-processor, multi-domain, multi-protocol cache coherent speculation aware shared memory controller and interconnect
US9298643B2 (en) Performance and power improvement on DMA writes to level two combined cache/SRAM that is cached in level one data cache and line is valid and dirty
US10394747B1 (en) Implementing hierarchical PCI express switch topology over coherent mesh interconnect
US8799588B2 (en) Forward progress mechanism for stores in the presence of load contention in a system favoring loads by state alteration
US8799589B2 (en) Forward progress mechanism for stores in the presence of load contention in a system favoring loads
US7987322B2 (en) Snoop request management in a data processing system
US20150178211A1 (en) Information processing apparatus, parallel computer system, and control method for controlling information processing apparatus
US20020009095A1 (en) Multicast decomposition mechanism in a hierarchically order distributed shared memory multiprocessor computer system
US8868844B2 (en) System and method for a software managed cache in a multiprocessing environment
JP2009527861A (en) Data processing system and method having address translation bypass
US10528519B2 (en) Computing in parallel processing environments
US20120331187A1 (en) Bandwidth control for a direct memory access unit within a data processing system
US8977821B2 (en) Parallel processing of multiple block coherence operations
US20180074958A1 (en) Light-weight cache coherence for data processors with limited data sharing
US8117392B2 (en) Method and apparatus for efficient ordered stores over an interconnection network
US10437725B2 (en) Master requesting missing segments of a cache line for which the master has coherence ownership
US20190129884A1 (en) Node controller direct socket group memory access
US10339059B1 (en) Global socket to socket cache coherence architecture
US7809004B2 (en) Data processing system and processing unit having an address-based launch governor
US10445267B2 (en) Direct memory access (DMA) unit with address alignment
US8447934B2 (en) Reducing cache probe traffic resulting from false data sharing
Zhao et al. LA-LLC: Inter-core locality-aware last-level cache to exploit many-to-many traffic in GPGPUs
US9274955B2 (en) Reduced scalable cache directory
KR100841130B1 (en) Method and apparatus for efficient ordered stores over an interconnection network

Legal Events

Date Code Title Description
AS Assignment

Owner name: FREESCALE SEMICONDUCTOR, INC., TEXAS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:DESHPANDE, SANJAY R.;LARSON, JOHN E.;REEL/FRAME:034496/0126

Effective date: 20141202

AS Assignment

Owner name: CITIBANK, N.A., AS NOTES COLLATERAL AGENT, NEW YORK

Free format text: SUPPLEMENT TO IP SECURITY AGREEMENT;ASSIGNOR:FREESCALE SEMICONDUCTOR, INC.;REEL/FRAME:035033/0923

Effective date: 20150213

Owner name: CITIBANK, N.A., AS NOTES COLLATERAL AGENT, NEW YORK

Free format text: SUPPLEMENT TO IP SECURITY AGREEMENT;ASSIGNOR:FREESCALE SEMICONDUCTOR, INC.;REEL/FRAME:035033/0001

Effective date: 20150213

Owner name: CITIBANK, N.A., AS NOTES COLLATERAL AGENT, NEW YORK

Free format text: SUPPLEMENT TO IP SECURITY AGREEMENT;ASSIGNOR:FREESCALE SEMICONDUCTOR, INC.;REEL/FRAME:035034/0019

Effective date: 20150213

AS Assignment

Owner name: FREESCALE SEMICONDUCTOR, INC., TEXAS

Free format text: PATENT RELEASE;ASSIGNOR:CITIBANK, N.A., AS COLLATERAL AGENT;REEL/FRAME:037358/0001

Effective date: 20151207

AS Assignment

Owner name: MORGAN STANLEY SENIOR FUNDING, INC., MARYLAND

Free format text: ASSIGNMENT AND ASSUMPTION OF SECURITY INTEREST IN PATENTS;ASSIGNOR:CITIBANK, N.A.;REEL/FRAME:037444/0535

Effective date: 20151207

Owner name: MORGAN STANLEY SENIOR FUNDING, INC., MARYLAND

Free format text: ASSIGNMENT AND ASSUMPTION OF SECURITY INTEREST IN PATENTS;ASSIGNOR:CITIBANK, N.A.;REEL/FRAME:037444/0444

Effective date: 20151207

AS Assignment

Owner name: MORGAN STANLEY SENIOR FUNDING, INC., MARYLAND

Free format text: SUPPLEMENT TO THE SECURITY AGREEMENT;ASSIGNOR:FREESCALE SEMICONDUCTOR, INC.;REEL/FRAME:039138/0001

Effective date: 20160525

AS Assignment

Owner name: NXP USA, INC., TEXAS

Free format text: CHANGE OF NAME;ASSIGNOR:FREESCALE SEMICONDUCTOR INC.;REEL/FRAME:040626/0683

Effective date: 20161107

AS Assignment

Owner name: NXP USA, INC., TEXAS

Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE NATURE OF CONVEYANCE PREVIOUSLY RECORDED AT REEL: 040626 FRAME: 0683. ASSIGNOR(S) HEREBY CONFIRMS THE MERGER AND CHANGE OF NAME EFFECTIVE NOVEMBER 7, 2016;ASSIGNORS:NXP SEMICONDUCTORS USA, INC. (MERGED INTO);FREESCALE SEMICONDUCTOR, INC. (UNDER);SIGNING DATES FROM 20161104 TO 20161107;REEL/FRAME:041414/0883

Owner name: NXP USA, INC., TEXAS

Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE NATURE OF CONVEYANCE PREVIOUSLY RECORDED AT REEL: 040626 FRAME: 0683. ASSIGNOR(S) HEREBY CONFIRMS THE MERGER AND CHANGE OF NAME;ASSIGNOR:FREESCALE SEMICONDUCTOR INC.;REEL/FRAME:041414/0883

Effective date: 20161107

STCF Information on status: patent grant

Free format text: PATENTED CASE

AS Assignment

Owner name: NXP B.V., NETHERLANDS

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC.;REEL/FRAME:050744/0097

Effective date: 20190903

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4