US10095629B2 - Local and remote dual address decoding using caching agent and switch - Google Patents
- Publication number
- US10095629B2 (application US 15/279,319)
- Authority
- US
- United States
- Prior art keywords
- node
- address
- memory request
- memory
- local
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/0813—Multiuser, multiprocessor or multiprocessing cache systems with a network or matrix configuration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/10—Address translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/14—Protection against unauthorised use of memory or access to memory
- G06F12/1416—Protection against unauthorised use of memory or access to memory by checking the object accessibility, e.g. type of access defined by the memory independently of subject rights
- G06F12/1425—Protection against unauthorised use of memory or access to memory by checking the object accessibility, e.g. type of access defined by the memory independently of subject rights the protection being physical, e.g. cell, word, block
- G06F12/1441—Protection against unauthorised use of memory or access to memory by checking the object accessibility, e.g. type of access defined by the memory independently of subject rights the protection being physical, e.g. cell, word, block for a range
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/10—Providing a specific technical effect
- G06F2212/1041—Resource optimization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/65—Details of virtual memory and virtual address translation
- G06F2212/657—Virtual address space management
Definitions
- Embodiments generally relate to computing systems and, more particularly, to systems, devices, and methods for multi-level address decoding.
- Computer processing nodes include system address decoders to determine to which memory a request is directed. Keeping the addresses of all the memories universally consistent can be challenging. Memories can be decommissioned, fail, or otherwise become inoperable, thus altering the accessible address space. In some current distributed shared memory (DSM) systems, every system address decoder of the DSM system needs to be updated to reflect changes in the memory structure so that memory access requests are routed properly and faults are reduced. This system address decoder update is cumbersome, tedious, and can cause unwanted downtime and address decoding errors in the system.
- FIG. 1 illustrates, by way of example, a logical block diagram of an embodiment of a DSM system.
- FIG. 2 illustrates, by way of example, an exploded view diagram of a portion of the system.
- FIG. 3 illustrates, by way of example, a flow diagram of an embodiment of a technique of address decoding.
- FIG. 4 illustrates, by way of example, a logical block diagram of an embodiment of a system with multiple layers of address decoding.
- FIG. 5 illustrates, by way of example, a logical block diagram of an embodiment of a portion of a system that includes security features.
- FIG. 6 illustrates, by way of example, a flow diagram of an embodiment of operations performed in a technique for performing a memory request.
- FIG. 7 illustrates, by way of example, a logical block diagram of an embodiment of the switch.
- FIG. 8 illustrates, by way of example, a flow diagram of an embodiment of communications to implement a multi-level address decoding scheme.
- FIG. 9 illustrates, by way of example, a logical block diagram of an embodiment of a system.
- Examples in this disclosure relate to devices and systems that include multiple levels of address decoding.
- a first level of decoding can be performed locally by a local system address decoder and a second level of decoding can be performed by a system address decoder of a switch between a local node and a remote node.
- a distributed shared memory is a memory architecture where physically separate memories are addressed as one shared address space. Shared means that the address space is shared such that a same physical address from two physically separate processors refers to a same location in the DSM.
- a Home Agent is the node (e.g., node cluster) that is responsible for processing a memory request from a caching agent and acting as a home for part of the memory address space (note that one die (e.g., processor) can have multiple homes in a distributed address space mapping).
- a request can go to the same node's local memory.
- a memory request can go to an interface (e.g., a universal peripheral interface (UPI)) to route the request to the other processors within the same coherent domain, or to processors outside the coherent domain, through the NIC.
- a NIC is referred to as a host-fabric interface. All the processors connected on one side of the interface belong to the same coherent domain.
- One system can include one or more coherent domains connected through a fabric interconnect (e.g., one or more of a fabric link, a fabric memory tunnel, and a switch).
- data centers can include N clusters or servers that can communicate with each other using the fabric.
- each coherent domain can expose some address regions to the other coherent domains.
- accesses between different coherent domains are not coherent.
- Embodiments herein allow mapping addresses of memory ranges between different coherent domains.
- system address decoders that map the entire address space. Each address is homed to a corresponding node.
- the system address decoders can determine where a memory address is homed, can modify a request accordingly, and forward the (modified) memory request to a switch that forwards the memory request to the proper destination (e.g., memory controller).
- a memory request under current embodiments can proceed as follows: (1) a node generates a memory request (e.g., a memory read or a memory write request); (2) the memory request is provided to a caching agent (CA); (3) the CA uses a system address decoder to decode that the memory request is homed to a memory location in a specific memory; (4) the system address decoder returns the address of the memory to which the request is homed; and (5) the memory request is forwarded to a memory controller of the memory to which the request is homed so that the memory request can be fulfilled.
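The five-step flow above can be sketched in code. This is an illustrative sketch only; the class names, the range-table layout, and the controller interface are assumptions, not structures prescribed by the patent.

```python
# Illustrative sketch of the single-level decode flow (steps 1-5).
# All class and field names here are hypothetical.

class SystemAddressDecoder:
    """Maps address ranges to the memory controller that homes them."""
    def __init__(self, ranges):
        # ranges: list of (start, end, controller_id) tuples
        self.ranges = ranges

    def decode(self, address):
        # Step 3-4: find the memory to which the address is homed.
        for start, end, controller_id in self.ranges:
            if start <= address < end:
                return controller_id
        raise LookupError("address not homed anywhere: %#x" % address)

class CachingAgent:
    def __init__(self, decoder, controllers):
        self.decoder = decoder
        self.controllers = controllers  # controller_id -> backing memory

    def read(self, address):
        # Step 5: forward the request to the homing memory controller.
        controller_id = self.decoder.decode(address)
        return self.controllers[controller_id].get(address)

decoder = SystemAddressDecoder([(0x0000, 0x1000, "mc0"), (0x1000, 0x2000, "mc1")])
agent = CachingAgent(decoder, {"mc0": {0x10: "a"}, "mc1": {0x1800: "b"}})
```

The key property the later sections change is that this one decoder must map the entire address space.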
- Every system address decoder includes a map to the entire address space of the DSM. If one address is changed somewhere in the DSM, all system address decoders need to be updated to reflect the change, such as to keep the address space coherent. Such a restriction reduces the flexibility and/or scalability of the DSM system. Removing or adding a memory to the system requires updating every system address decoder of the system to retain coherency. In some DSM systems, each node of many nodes can include many system address decoders. To retain coherency, each of these system address decoders need to be updated to reflect the same memory address space, in the event of a change to the address space.
- Embodiments discussed herein provide a DSM architecture that provides an ability to add or remove a memory without the burden of having to update every system address decoder of the system.
- Embodiments discussed herein can help provide flexibility in scaling or otherwise altering a DSM, such as by adding a level of address decoding at a network switch and/or a network interface controller.
- FIG. 1 illustrates, by way of example, a logical block diagram of an embodiment of a DSM system 100 .
- the DSM system 100 as illustrated includes a plurality of nodes 102 A and 102 B, a switch 104 , and a plurality of client servers 106 A and 106 B respectively coupled to a plurality of remote nodes 108 A, 108 B, and 108 C.
- Each of the nodes 102 A-B is illustrated as including a plurality of hardware processors 110 A and 110 B communicatively connected via a link 112 and a network interface controller (NIC) 114 A or 114 B.
- Each of the client servers 106 A-B includes a corresponding NIC 114 C and 114 D, respectively.
- Each of the NICs 114 A-D is communicatively coupled through the switch 104 .
- the DSM system 100 , sometimes called a scale-out cluster, includes compute nodes (e.g., the nodes 102 A-B) and pooled-resource nodes (e.g., the sub-nodes 108 A-C accessible through the client servers 106 A-B).
- the sub-nodes 108 A-C provide the nodes 102 A-B with additional memory.
- the memory of the sub-nodes 108 A-C is exposed to the nodes 102 A-B locally, such as by a software protocol (e.g., a distributed file system, object map, or the like).
- FIG. 2 illustrates, by way of example, an exploded view diagram of a portion 200 of the system 100 .
- the exploded view is of the processor 110 A and corresponding contents thereof.
- the processor 110 A as illustrated includes a caching agent 111 A with a plurality of system address decoders 216 A, 216 B, 216 C, and 216 D.
- Each of the system address decoders 216 A-D decodes addresses homed to a specific node(s) of the system 100 .
- there are four system address decoders per processor, one for each of the nodes 102 A-B and the client servers 106 A-B.
- the processor 110 B includes a replica of the system address decoders 216 A-D.
- the caching agent 111 A can forward an address request to a corresponding memory controller 219 (e.g., via one or more NICs, switches, and/or servers shown in FIG. 1 ).
- the memory controller 219 retrieves data corresponding to the memory request from a memory 218 or performs a memory write operation.
- the memory 218 as illustrated includes DRAM (dynamic random access memory), memory-mapped I/O (Input/Output), and legacy memory. Note that the layout of the memory is implementation specific.
- the DRAM can include the memory of nodes connected to the server 106 A-B and the local nodes 102 A-B ( FIG. 1 ), for example.
- the system address decoders 216 A-D combine to form a global decoder.
- Each processor 110 A-B includes such a global decoder.
- any access to remotely situated memory is decoded by the local system address decoders 216 A-D, and vectored to the appropriate node. Keeping the many global system address decoders updated within each node, and consistent across nodes, such as when a memory is removed or added, is thus a major undertaking. Such a configuration can inhibit flexibility of the system 100 and dynamic DSM operation.
- One or more embodiments discussed herein can help provide one or more advantages, such as can include (1) elasticity and fault-resilience, (2) cost efficiency in implementing a change, and (3) segregating inter-node and intra-node request call decoding, thus segregating which decoders need to be updated in response to a memory change.
- Regarding elasticity and fault-resilience, a scale-out-friendly DSM can benefit from supporting an increase or reduction in memory exposed by a given pooled memory server, redirection from a failed node to a stand-by node, and/or redistribution of loads, or the like, all of which affect elasticity and/or fault resilience.
- Changing current system address decoders to be re-configurable is demanding, particularly if backward compatibility (support for legacy devices) is needed.
- a change in a given node requires changing only the local system address decoders (local to the node that is changed) and possibly the system address decoders of each of the switches. This is generally many fewer changes as compared to changing all of the system address decoders of the system.
- Consider a system with sixteen nodes, each node having sixteen processors, assuming a system address decoder per processor, per node.
- two hundred fifty-six system address decoders will need to be updated in the case of a change (if all system address decoders need to be changed to retain coherence).
- this updating burden could be isolated to just seventeen (or fewer) system address decoders.
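The decoder-count arithmetic above can be checked directly (sixteen nodes, sixteen processors per node, one decoder per processor; versus the sixteen decoders of the changed node plus the switch's decoder):

```python
nodes = 16
processors_per_node = 16

# Flat scheme: every processor's decoder maps the whole address space,
# so every decoder must be updated when any memory changes.
flat_updates = nodes * processors_per_node  # 256

# Two-level scheme: the update is isolated to the decoders local to the
# changed node plus the system address decoder of the switch.
two_level_updates = processors_per_node + 1  # 17 (or fewer)

print(flat_updates, two_level_updates)
```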
- the system address decoders of the switches only require changes for intra-node changes and not inter-node changes. For example, consider that previously a change in a system address decoder was configured at boot time and relatively static. Previous designs may not support distinguishing between nodes that are highly available (e.g., multi-homed) and those that are not, such as to allow for transparent synchronous replication operations via network switch logic. In one or more embodiments, a memory space can be replicated in multiple memory locations, such that if a node fails, the system address decoder inside the switch is able to select another node in which the data is replicated. Thus there is higher availability as compared to other implementations. By isolating configuration to the inter-node and intra-node levels, run-time reconfigurability can be realized.
- embodiments discussed herein move address decoding for requests that are homed to remote nodes to one or more switches.
- the local system address decoder only needs to know that a given range of addresses is homed locally and/or remotely (which can be configured at boot time).
- the actual remote node to which an address request is homed need not be known locally and can be determined using a global mapping as programmed into system address decoders of the switch(es).
- the global mapping can be updated during run time.
- FIG. 3 illustrates, by way of example, a flow diagram of an embodiment of a technique 300 of multi-level address decoding.
- a memory request is provided to a local system address decoder, at operation 302 .
- the memory request can include a get or put request, for example.
- a get request is a common command in programming languages that allows for retrieval of data from a destination (e.g., get(destination)).
- a put request is a common command in programming languages that allows for attempting to change a memory location to specific data (e.g., put (memory location, specific data)).
- a memory request can include an indication of a node that hosts the memory address that is a target of the request (e.g., a node identification) and/or a physical or virtual address of the memory space at which data is to be read or written.
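One way to picture such a memory request is a small record carrying the operation, the target address, an optional node identification, and an optional payload. The field names below are illustrative, not taken from the patent:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MemoryRequest:
    op: str                        # "get" or "put"
    address: int                   # physical or virtual target address
    node_id: Optional[int] = None  # home node, if already known to the requester
    data: Optional[bytes] = None   # payload, present only for a put request

# A get leaves node_id unset; a switch-level decoder can fill it in later.
get_req = MemoryRequest(op="get", address=0x4000)
put_req = MemoryRequest(op="put", address=0x4000, data=b"\x01")
```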
- the local system address decoder determines that the request is homed to an address that is not local (the request is homed to a remote node). This can be by determining that the address is not local (is not present in local memory and therefore is homed to a remote address) or determining that a characteristic of the request indicates that the memory request is a memory request for a remote address, such as a target identification in the request being blank or a specified value (of a specified range of values) (e.g., a maximum, minimum, or other value).
- the local system address decoder can forward the memory request to a NIC that forwards the request to a switch using a switch decode request, at operation 306 .
- the NIC can modify the request before forwarding the request to the switch.
- the switch determines a node identification corresponding to the memory address that is the subject of the request (e.g., in response to detecting that no target identification or a specific target identification is specified), at operation 308 .
- the switch then generates another request (with the proper node identification) and forwards the request to the node that includes the corresponding address.
- the node responds with the requested data (for a get request) or an acknowledgement (ACK) (for a put request).
- a not-acknowledge (NACK) or error packet can be generated in the case of a failed get or put request.
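Technique 300 can be sketched as follows. The first-level decoder answers only "local or remote"; the switch's table resolves the actual home node. The range values and node names are assumptions for illustration:

```python
# Sketch of technique 300 (two-level address decoding). The patent does
# not prescribe these data structures; they are illustrative only.

LOCAL_RANGE = (0x0000, 0x8000)   # configured at boot time

def local_decode(address):
    """First level: the local decoder only knows local vs. remote."""
    start, end = LOCAL_RANGE
    return "local" if start <= address < end else "remote"

SWITCH_MAP = [                    # second level, held in the switch
    (0x8000, 0xC000, "node_412"),
    (0xC000, 0x10000, "node_404A"),
]

def switch_decode(address):
    """Second level: the switch maps remote addresses to node ids."""
    for start, end, node_id in SWITCH_MAP:
        if start <= address < end:
            return node_id
    return None  # would produce a NACK/error packet back to the requester

def route(address):
    # Operations 302-308: local check first, then switch decode.
    if local_decode(address) == "local":
        return "local_memory_controller"
    return switch_decode(address)
```

Note that adding or removing remote memory only requires editing SWITCH_MAP; the boot-time LOCAL_RANGE of every node is untouched.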
- FIG. 4 illustrates, by way of example, a logical block diagram of an embodiment of a system 400 with multiple layers of address decoding.
- the system 400 as illustrated includes one or more local nodes 402 communicatively coupled to one or more remote nodes 404 A, 404 B, and 404 C through NIC 406 , switch 408 , NIC 410 , and client server 412 (node 2 ).
- the local node 402 as illustrated includes a plurality of processors 414 A and 414 B communicatively coupled through a communication link 416 .
- Each of the processors 414 A-B includes a caching agent 415 A that includes a plurality of system address decoders 418 A, 418 B, 418 C, and 418 D.
- Each of the system address decoders 418 A-D can be for a specific memory of the local memory space.
- the system address decoders 418 A-D decode addresses homed to a respective local address space 420 .
- An unmapped address space 422 is optional and provides for flexibility in altering the local address space 420 , such as by expanding data stored in a local memory or adding another memory to the local address space 420 .
- the caching agent 415 A can determine whether a memory request from the processor 414 A-B is homed to a local address space 420 . If the request is homed to the local address space 420 , the caching agent 415 A can forward the request to the local memory controller (not shown in FIG. 4 ), such as to retrieve the contents of that address space or overwrite the contents of that address space with the data in the request. If the request is not homed to the local address space 420 (it is homed to a remote node 404 A-C), the caching agent 415 A can forward the request to the network interface controller 406 .
- a caching agent is a hardware, software, and/or firmware component that can initiate transactions with memory.
- a caching agent can retain one or more copies in its own cache structure.
- a caching agent can provide one or more copies of the coherent memory contents to other caching agents or other components, such as NICs, switches, routers, or the like.
- the system address decoders 418 A-D provide coherency within the node 402 .
- the system address decoders 418 A-D process memory requests from the processors 414 A-B within the same node.
- the NIC 406 is a hardware component that connects a node to a network (e.g., the node 402 to the network(s) connected to the switch 408 ).
- the NIC 406 hosts circuitry to communicate using a specific standard (e.g., Ethernet, Wi-Fi, Internet Protocol (IP), cellular (e.g., Long Term Evolution (LTE)), or the like).
- the NIC 406 allows nodes to communicate over wired or wireless connections therebetween.
- the NIC 406 can provide access to a physical layer and/or a data link layer, such as by providing physical access to a network medium and providing addressing, such as through media access control (MAC) addresses in the case of an Institute of Electrical and Electronics Engineers (IEEE) 802.11 network.
- the NIC 406 receives memory requests that are determined, by the system address decoders 418 A-D, to be homed remotely.
- the NIC 406 provides such memory requests to the switch 408 (e.g., to a system address decoder 424 of the switch 408 ), such as with or without modification.
- the NIC 406 can modify the request, such as by including data from the request in a get or a put request, for example.
- the get or put request from the NIC 406 can then be provided to the switch 408 .
- the switch 408 as illustrated includes a system address decoder 424 .
- the switch 408 filters and forwards packets between networks (e.g., local area network (LAN) segments, LANs, and/or WANs).
- the switch 408 can operate at the data layer and/or the network layer.
- the switch 408 keeps a record of the addresses of devices connected to it. With this information, the switch can identify which system is sitting on which port. Therefore, when a memory request is received, the switch can determine to which of its ports to forward the request.
- a switch will allocate full bandwidth to each of its ports. So regardless of the number of nodes transmitting, users will always have access to the maximum amount of bandwidth.
- a hub however, allocates its bandwidth amongst all currently transmitting nodes so that when a single node is transmitting it gets the full bandwidth, but when multiple nodes are transmitting, each node only gets a portion of the full bandwidth.
- a switch transmits frames, whereas a router, as its name implies, routes a request to other networks until that request ultimately reaches its destination.
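The address-to-port record described above can be pictured as a simple lookup table; when a frame arrives, the switch looks up the destination and forwards on that one port. This is a generic sketch of switch forwarding, not the patent's implementation, and the addresses are made up:

```python
# Hypothetical forwarding table: device address -> port number.
forwarding_table = {
    "aa:bb:cc:00:00:01": 1,
    "aa:bb:cc:00:00:02": 2,
}

def forward_port(dest_address):
    """Return the port to send on, or None if unknown (flood all ports)."""
    return forwarding_table.get(dest_address)
```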
- the switch 408 can track which nodes have copies of at least part of the memory of other nodes. For example, the switch 408 can track which nodes are active and operational and which are non-operational. If a node fails (becomes non-operational), the switch 408 can detect such an event, such as by having a memory request to that node fail one or more times. The switch 408 can then notify one or more nodes that include the copies of at least part of the memory of the node that failed and can route future memory requests homed to the failed node to those nodes.
- the system address decoder 424 maps to the remote memory space 426 and an optional unmapped address space 428 .
- the system address decoder 424 decodes the address of the memory request from the node 402 to determine the node to which the request is homed.
- the switch 408 then forwards the request to the proper NIC 410 .
- the NIC 410 is similar to the NIC 406 , with the NIC 410 connecting the remote node 412 to other networks.
- the unmapped address space 428 is optional and can provide flexibility in a number of nodes that are connected to the switch 408 , such as to allow a node to be added to the system 400 .
- the NIC 410 provides a request to the client server 412 , which serves the request to the proper sub-node 404 A-C.
- the server 412 provides resources to the sub-nodes 404 A-C, which request services of the server 412 .
- a response to the request from the sub-node 404 A-C is provided back to the NIC 410 .
- the NIC 410 provides the response to the switch 408 , which decodes the address to which the response is homed.
- the switch 408 then provides the response to the NIC 406 , which provides the response to the corresponding processor 414 A-B.
- embodiments discussed can implement two levels of system address decoding.
- the first level can be used to determine whether the requested memory address(es) are hosted by memory in the local node or by remote memory.
- the second level (once it has been determined that the memory address is remote and the request has been sent to the NIC 406 and/or switch 408 ) takes place at the switch 408 and determines what remote node or nodes of the DSM (the fabric) should be targeted by the given request.
- the system address decoders 418 A-D in the local nodes 402 are configured to specify that all the remote memory is homed by the local NIC 406 . All requests targeting non-local address space (e.g., the address space 422 ) can be sent to the NIC 406 . Some address space can be left open, such as to support increase or decrease in the size of the exposed memory.
- the NIC 406 can be configured to generate requests to the switch 408 , such as for requests in which the destination node is not specified or is set to a specified value or range of values.
- the request from the processor 414 A-B specifies a target node (e.g., using a node identification).
- remote memory requests coming from the system address decoder 418 A-D can generate requests without a target node id or including a target id with a specified value or range of values. This field can be generated or overwritten using the switch 408 and/or the NIC 406 .
- the switch 408 includes logic that contains the system address decoder 424 .
- the system address decoder 424 maps all the different memory exposed by remote nodes to corresponding node ids.
- the switch 408 includes one or more interfaces that can be used to set up or change the system address decoder 424 . How the system address space is mapped to the nodes can be managed by a data center orchestrator 534 (see FIG. 5 ).
- the system address decoder 424 can specify that one address space is mapped to one or more nodes. This can be used for replication, fault-resilience, or other advantages.
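Mapping one address space to one or more nodes can be sketched as a range-to-list table inside the switch's decoder; if the preferred home is down, the decoder selects the next replica. The node names, range values, and failure-detection policy here are assumptions:

```python
# Hypothetical replicated mapping held by the switch's system address
# decoder: each range lists its home nodes in preference order.
REPLICA_MAP = [
    (0x8000, 0xC000, ["node_A", "node_B", "node_C"]),
]

# Nodes the switch has observed to be non-operational (e.g., after
# repeated failed memory requests).
failed_nodes = {"node_A"}

def decode_with_replicas(address):
    for start, end, homes in REPLICA_MAP:
        if start <= address < end:
            for node in homes:
                if node not in failed_nodes:
                    return node   # first operational replica
    return None                   # address not mapped anywhere
```

Because the replica list lives only in the switch, failing over from node_A to node_B requires no update to any node-local decoder.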
- FIG. 5 illustrates, by way of example, a logical block diagram of an embodiment of a portion of a system 500 that includes security features.
- the system 500 is similar to the system 400 , with the system 500 illustrated as including security features.
- the security features include virtual address (VA) to physical address (PA) security check module (VATPASC) 530 A and 530 B, an operating system (OS) 532 A and 532 B on each local node 402 A and 402 B, respectively, and a data center orchestrator (DCO) 534 .
- the DCO 534 configures a physical global address space, such as by assigning a PA to each of the nodes that expose memory space to other nodes of the DSM.
- the OS 532 A-B communicates with the DCO 534 to allocate physical memory to the local processes requesting such memory.
- the PA range is mapped to a VA range of an application (or vice versa) and future accesses to the VA range can proceed with the corresponding page table entry (PTE) checks, such as by the VA to PA security check module.
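A toy page-table walk with a permission check, in the spirit of the VA-to-PA security check module: the page size, the table contents, and the permission flags below are all illustrative assumptions.

```python
PAGE_SIZE = 4096

# Hypothetical page table: virtual page number -> (physical frame, perms).
page_table = {
    0x10: (0x200, {"r", "w"}),
    0x11: (0x201, {"r"}),       # read-only mapping
}

def translate(va, access):
    """Translate a VA to a PA, enforcing the PTE's permission bits."""
    vpn, offset = divmod(va, PAGE_SIZE)
    entry = page_table.get(vpn)
    if entry is None:
        raise MemoryError("page fault at %#x" % va)
    frame, perms = entry
    if access not in perms:
        raise PermissionError("%s access denied at %#x" % (access, va))
    return frame * PAGE_SIZE + offset
```

The resulting PA is what the node would then hand to its system address decoder (operation 606 in FIG. 6).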
- FIG. 6 illustrates, by way of example, a flow diagram of an embodiment of operations 600 performed in a technique for performing a memory request.
- Operations 600 can be performed by one or more components illustrated in FIGS. 4 and/or 5 .
- the operations 600 as illustrated include a local node (e.g., the node 402 A-B) performing an access, such as a read or a write performed as a function of a VA and by a processor, at operation 602 ; the local node translating the VA to a PA, such as by using a PTE of a VATPASC module 530 A-B, at operation 604 ; the node providing the PA to a system address decoder 418 A-D of the node, at operation 606 ; the system address decoder 418 A-D indicating the PA is homed to a remote node (e.g., 404 A-C), at operation 608 ; and the NIC 406 vectoring the request to a switch 408 , at operation 610 .
- Another security implementation can include only using a resilient highly privileged micro-service to configure a system address decoder. Such an implementation helps protect the system address decoders from undesired changes in the address mappings that can be used by an attacker.
- FIG. 7 illustrates, by way of example, a logical block diagram of an embodiment of the switch 408 .
- the switch 408 as illustrated includes the system address decoder 424 , egress logic 702 , and ingress logic 704 .
- the egress logic 702 includes one or more queues that can be used to store one or more messages that are to be routed to a node.
- the messages in the egress logic 702 can be from a memory responding to a request.
- the ingress logic 704 includes one or more queues that can be used to store requests from nodes.
- the messages in the ingress queue can be provided to the address decoder of the switch to determine a node to which the request is homed.
- one or more embodiments discussed herein can help increase functionality, flexibility, scalability and dynamism in a scale-out DSM architecture using pooled memory that is exposed via a fabric.
- Advantages can include one or more of: (1) using two levels of system address decoding, such as to determine where a given memory address is homed, provides for flexibility in adding and/or removing address space from the DSM; (2) scalability is easier with the added flexibility; (3) with a second level of decoding being done in the switch, anytime that a re-configuration is required, fewer system address decoders need to be updated as compared to previous solutions.
- 4 switches connect a total of 16 dual socket nodes with 36 system address decoders each.
- a switch may be configured to achieve reliability for a particular range of what it maps, by mapping memory lines in that range to, for example, three home nodes. This is just one example of how embodiments discussed herein can easily be configured to include beneficial features that are much more challenging to implement in previous solutions.
- While embodiments discussed herein can introduce a cross-cutting feature spanning core (e.g., processor and/or node) and fabric elements, embodiments may not introduce new dependencies.
- Fabric capabilities and node capabilities can evolve orthogonally, as the local system address decoders only need to know if they can delegate further decoding elsewhere (e.g., to a system address decoder of a switch).
- FIG. 8 illustrates, by way of example, a flow diagram of an embodiment of communications 800 to implement a multi-level address decoding scheme.
- the communications 800 as illustrated include a get(address) request 802 from the node 402 A to the switch 408 .
- the get(address) request 802 in one or more embodiments can be from the NIC 406 of the node 402 A.
- the get(address) request 802 is one or more packets from the node 402 A that specifies an address from which to retrieve data.
- the packet can include a destination node id that is blank or set to a specified value (of a range of specified values).
- the get(address) request can be a modified version of a get(address) request from a processor 414 A-B of the node 402 A.
- the NIC 406 can modify the request by removing the node id or replacing the node id with a specified value (of a range of specified values).
- the get(address) request 802 can be provided in response to a system address decoder of the node 402 A determining that the request is homed to a remote node.
- the communications 800 further include a decode(address) request 804 from an interface 801 of the switch 408 to the system address decoder 424 .
- the interface 801 exposes the switch logic (the system address decoder 424 ) to discover the final home.
- the interface can be accessed with a “get” command, for example.
- the decode(address) request 804 can be forwarded to the system address decoder 424 . If the address is not in the remote address space of the system address decoder 424 , an error message can be created and provided to the node 402 A, such as through the NIC 406 . The NIC 406 can create a software interrupt, such as to notify the node 402 A of the error.
- the communications 800 further include a node ID 806 from the system address decoder 424 to the interface 801 .
- the node ID 806 is a unique identifier that points to a node that includes the address used in operations 802 and 804 .
- the interface 801 can add the node ID 806 to a memory request to the node (node 412 in the example of FIG. 8 ).
- the interface 801 provides a get(address, node ID) request 808 to the corresponding node.
- the node 412 receives the request and provides a response(data) 810 .
- the response(data) can include an acknowledge, an error indicator (e.g., not acknowledged), and/or data.
- the response(data) can be provided to the node 402 A at operation 812 .
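The communications 800 as a whole — get(address) in, decode at the switch, get(address, node ID) out to the home node, and the error path back to the requesting node — can be sketched as follows. All names, the address map, and the response format are illustrative assumptions, not the patent's implementation.

```python
# Illustrative end-to-end sketch of communications 800.
REMOTE_MAP = {(0x2000, 0x3000): 412}  # address range -> home-node ID (assumed)

def decode(address):
    """Remote system address decoder 424: find the home node, or None."""
    for (start, end), node_id in REMOTE_MAP.items():
        if start <= address < end:
            return node_id
    return None

def node_get(node_id, address):
    """Stand-in for the home node servicing get(address, node ID)."""
    return {"status": "ack", "data": f"data@{address:#x}", "node": node_id}

def switch_interface(request):
    """Interface 801: decode, then forward or report an error."""
    node_id = decode(request["address"])
    if node_id is None:
        # Address not in the remote address space: error back to the
        # requesting node's NIC, which can raise a software interrupt.
        return {"status": "error", "reason": "address not mapped"}
    return node_get(node_id, request["address"])  # get(address, node ID)

assert switch_interface({"op": "get", "address": 0x2A00})["node"] == 412
assert switch_interface({"op": "get", "address": 0x9000})["status"] == "error"
```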
- the invention thus provides unique differentiation on distributed shared memory, fabric-connected systems without global memory coherence requirements.
- FIG. 9 illustrates, by way of example, a logical block diagram of an embodiment of a system 900 .
- the system 900 includes one or more components that can be included in the node 402 , 402 A, 402 B, processor 414 A-B, system address decoder 418 A-D, switch 408 , system address decoder 424 , NIC 406 and/or 410 , server 412 , sub-node 404 A-C, VA to PA security check module 530 A-B, OS 532 A-B, data center orchestrator 534 , egress logic 702 , ingress logic 704 , and/or interface 801 .
- processor 910 has one or more processing cores 912 and 912 N, where 912 N represents the Nth processing core inside processor 910 where N is a positive integer.
- system 900 includes multiple processors including 910 and 905 , where processor 905 has logic similar or identical to the logic of processor 910 .
- processing core 912 includes, but is not limited to, pre-fetch logic to fetch instructions, decode logic to decode the instructions, execution logic to execute instructions and the like.
- processor 910 has a cache memory 916 to cache instructions and/or data for system 900 . Cache memory 916 may be organized into a hierarchical structure including one or more levels of cache memory.
- processor 910 includes a memory controller 914 , which is operable to perform functions that enable the processor 910 to access and communicate with memory 930 that includes a volatile memory 932 and/or a non-volatile memory 934 .
- processor 910 is coupled with memory 930 and chipset 920 .
- Processor 910 may also be coupled to a wireless antenna 978 to communicate with any device configured to transmit and/or receive wireless signals.
- the wireless antenna 978 operates in accordance with, but is not limited to, the IEEE 802.11 standard and its related family, Home Plug AV (HPAV), Ultra Wide Band (UWB), Bluetooth, WiMax, or any form of wireless communication protocol.
- volatile memory 932 includes, but is not limited to, Synchronous Dynamic Random Access Memory (SDRAM), Dynamic Random Access Memory (DRAM), RAMBUS Dynamic Random Access Memory (RDRAM), and/or any other type of random access memory device.
- Non-volatile memory 934 includes, but is not limited to, flash memory, phase change memory (PCM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), or any other type of non-volatile memory device.
- Memory 930 stores information and instructions to be executed by processor 910 .
- memory 930 may also store temporary variables or other intermediate information while processor 910 is executing instructions.
- the memory 930 is an example of a machine-readable medium. While a machine-readable medium may include a single medium, the term “machine-readable medium” may include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers).
- machine-readable medium may include any medium that is capable of storing, encoding, or carrying instructions for execution by a machine (e.g., the control device 102 or any other module) and that cause the machine to perform any one or more of the techniques of the present disclosure, or that is capable of storing, encoding or carrying data structures used by or associated with such instructions.
- the processing circuitry 204 can include instructions and can therefore be termed a machine-readable medium in the context of various embodiments.
- Other non-limiting machine-readable medium examples may include solid-state memories, and optical and magnetic media.
- machine-readable media may include: non-volatile memory, such as semiconductor memory devices (e.g., Electrically Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM)) and flash memory devices; magnetic disks, such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.
- chipset 920 connects with processor 910 via Point-to-Point (PtP or P-P) interfaces 917 and 922 .
- Chipset 920 enables processor 910 to connect to other elements in system 900 .
- interfaces 917 and 922 operate in accordance with a PtP communication protocol such as the Intel® QuickPath Interconnect (QPI) or the like. In other embodiments, a different interconnect may be used.
- chipset 920 is operable to communicate with processors 910 and 905 , display device 940 , and other devices.
- Chipset 920 may also be coupled to a wireless antenna 978 to communicate with any device configured to transmit and/or receive wireless signals.
- Chipset 920 connects to display device 940 via interface 926 .
- Display device 940 may be, for example, a liquid crystal display (LCD), a plasma display, cathode ray tube (CRT) display, or any other form of visual display device.
- processor 910 and chipset 920 are merged into a single SOC.
- chipset 920 connects to one or more buses 950 and 955 that interconnect various elements 974 , 960 , 962 , 964 , and 966 .
- Buses 950 and 955 may be interconnected together via a bus bridge 972 .
- chipset 920 couples with a non-volatile memory 960 , a mass storage device(s) 962 , a keyboard/mouse 964 , and a network interface 966 via interface 924 and/or 904 , etc.
- mass storage device 962 includes, but is not limited to, a solid state drive, a hard disk drive, a universal serial bus flash memory drive, or any other form of computer data storage medium.
- network interface 966 is implemented by any type of well-known network interface standard including, but not limited to, an Ethernet interface, a universal serial bus (USB) interface, a Peripheral Component Interconnect (PCI) Express interface, a wireless interface and/or any other suitable type of interface.
- the wireless interface operates in accordance with, but is not limited to, the IEEE 802.11 standard and its related family, Home Plug AV (HPAV), Ultra Wide Band (UWB), Bluetooth, WiMax, or any form of wireless communication protocol.
- cache memory 916 is depicted as a separate block within processor 910 , cache memory 916 (or selected aspects of 916 ) can be incorporated into processor core 912 .
- Example 1 can include a local node comprising one or more processors to generate a first memory request, the first memory request including a first address and a node identification, a caching agent coupled to the one or more processors, the caching agent to determine that the first address is homed to a remote node remote to the local node, a network interface controller (NIC) coupled to the caching agent, the NIC to produce a second memory request based on the first memory request, and the one or more processors further to receive a response to the second memory request, the response generated by a switch coupled to the NIC, wherein the switch includes a remote system address decoder to determine a node identification to which the second memory request is homed.
- In Example 2, Example 1 can further include, wherein the first address is a virtual address and the node further comprises a virtual address to physical address security check (VATPASC) module coupled to the one or more processors, the VATPASC to, before determining that the first address is homed to a node remote to the local node, convert the virtual address to a physical address and replace the first address of the first memory request with the physical address.
- In Example 3, at least one of Examples 1-2 can further include, wherein the NIC is further to replace the node identification of the first memory request with a specified value to create the second memory request.
- In Example 4, Example 3 can further include, wherein the caching agent to determine that the first address is homed to a node remote to the local node includes the caching agent to determine that the node identification of the memory request includes the specified value.
- In Example 5, at least one of Examples 1-4 can include, wherein the one or more processors are to leave the node identification of the first memory request blank and the caching agent to determine that the first address is homed to a node remote to the local node includes the caching agent to determine that the node identification of the second memory request is blank.
- Example 6 includes a non-transitory machine-readable storage device comprising instructions stored thereon that, when executed by a local node, configure the local node to generate a first memory request, the first memory request including a first address and a node identification, determine that the first address is homed to a remote node remote to the local node, produce a second memory request based on the first memory request, and receive, from a switch that includes a remote system address decoder to determine a node identification to which the second memory request is homed, a response to the second memory request.
- In Example 7, Example 6 can further include, wherein the first address is a virtual address and the storage device further comprises instructions stored thereon that, when executed by the local node, configure the local node to, before determining that the first address is homed to a node remote to the local node, convert the virtual address to a physical address and replace the first address of the first memory request with the physical address.
- In Example 8, at least one of Examples 6-7 further includes instructions stored thereon that, when executed by the local node, configure the local node to replace a node identification of the first memory request with a specified value to create the second memory request.
- In Example 9, Example 8 further includes, wherein the instructions for determining that the first address is homed to a node remote to the local node include instructions for determining that the node identification of the second memory request includes the specified value.
- In Example 10, at least one of Examples 6-9 further includes instructions stored thereon that, when executed by the local node, configure the local node to remove the node identification of the first memory request to create the second memory request and wherein the instructions for determining that the first address is homed to a node remote to the local node include instructions for determining that the node identification of the second memory request is blank.
- Example 11 includes a method performed by a local node, the method comprising generating a first memory request, the first memory request including a first address and a node identification, determining that the first address is homed to a remote node remote to the local node, producing a second memory request based on the first memory request, and receiving, from a switch that includes a remote system address decoder to determine a node identification to which the second memory request is homed, a response to the second memory request.
- In Example 12, Example 11 can further include, wherein the first address is a virtual address and the method further includes, before determining that the first address is homed to a node remote to the local node, converting the virtual address to a physical address and replacing the first address of the first memory request with the physical address.
- In Example 13, at least one of Examples 11-12 further includes replacing a node identification of the first memory request with a specified value to create the second memory request.
- In Example 14, Example 13 further includes, wherein determining that the first address is homed to a node remote to the local node includes determining that the node identification of the second memory request includes the specified value.
- In Example 15, at least one of Examples 11-14 further includes removing the node identification of the first memory request to create the second memory request and wherein determining that the first address is homed to a node remote to the local node includes determining that the node identification of the second memory request is blank.
- Example 16 includes a distributed shared memory (DSM) system comprising a plurality of local nodes respectively comprising a first plurality of hardware processors, a local system address decoder coupled to the first plurality of hardware processors, a local memory coupled to the local system address decoder, and a first network interface controller, the local system address decoder to determine whether a first memory request from a hardware processor of the first plurality of hardware processors is homed to an address of the local memory or homed to a memory remote to the respective local node, a plurality of client servers respectively comprising a second network interface controller and a plurality of client nodes accessible therethrough, each of the plurality of client nodes including a remote memory, and a switch communicatively coupled between the first and second network interface controllers, the switch including a remote system address decoder to determine a node identification to which the first memory request is homed if the local system address decoder determines the address is homed to the remote memory, the switch to provide a second memory request to a second network interface controller.
- In Example 17, Example 16 further includes, wherein the network interface controller of each of the plurality of local nodes is to perform one of (1) replace a second node identification in the first memory request with a specified node identification, and (2) remove the second node identification from the first memory request before providing the memory request to the switch.
- In Example 18, at least one of Examples 16-17 includes, wherein the switch is to provide the first memory request from the network interface controller to the remote system address decoder in response to determining the second node identification is one of (1) the specified node identification and (2) blank.
- In Example 19, at least one of Examples 16-18 includes, wherein the remote system address decoder is to determine a third node identification corresponding to a remote node of the plurality of remote nodes to which the memory request is homed.
- In Example 20, Example 19 further includes, wherein the switch is to provide a second memory request to the remote node, the second memory request including the third node identification.
- In Example 21, at least one of Examples 16-20 includes, wherein each of the local nodes comprises a virtual address to physical address security check (VATPASC) module executable by one or more of the first plurality of hardware processors, the VATPASC module to convert a virtual address of the first memory request to a physical address including a node identification and an address of a memory in a node corresponding to the node identification and produce a second memory request, the second memory request including the physical address and the address of the memory.
- In Example 22, Example 21 further includes, wherein the VATPASC module is to provide the second memory request to the local system address decoder.
- In Example 23, at least one of Examples 16-22 includes, wherein the switch further comprises egress logic to queue responses to requests from the local nodes.
- In Example 24, at least one of Examples 16-23 includes, wherein the switch further comprises ingress logic to queue memory requests from the local nodes.
- Example 25 includes a method performed by a DSM system, the method including generating a first memory request from a local node, the first memory request including a first address, determining, at a local system address decoder of the local node, that the first address is homed to a node remote to the local node, producing, using a network interface controller coupled to the local node, a second memory request based on the first memory request, determining, using a remote system address decoder of a switch coupled to the network interface controller, a node identification of the node remote to the local node based on the first address in the second memory request, generating, using the switch, a third memory request including the determined node identification; and providing, from the switch and to the network interface controller of the local node, a communication including data responding to the third memory request.
- In Example 26, Example 25 further includes, wherein the first address is a virtual address and the method further includes, before determining that the first address is homed to a node remote to the local node, converting the virtual address to a physical address and replacing the first address of the first memory request with the physical address.
- In Example 27, at least one of Examples 25-26 further includes replacing, at the network interface controller, a node identification of the first memory request with a specified value to create the second memory request.
- In Example 28, Example 27 further includes, wherein determining that the first address is homed to a node remote to the local node includes determining that the node identification of the second memory request includes the specified value.
- In Example 29, at least one of Examples 25-28 further includes removing, at the network interface controller, the node identification of the first memory request to create the second memory request.
- In Example 30, Example 29 further includes, wherein determining that the first address is homed to a node remote to the local node includes determining that the node identification of the second memory request is blank.
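The VA-to-PA security check (VATPASC) behavior recited in the examples above — translating a virtual address into a node identification plus a physical address, and rejecting addresses the requester may not touch — can be sketched as follows. The page-table layout, page size, and permission set are illustrative assumptions, not the patent's implementation.

```python
# Illustrative VATPASC sketch: VA -> (node ID, PA) with a permission check.
PAGE = {0x4000: (9, 0x2A00)}  # virtual page -> (home-node ID, physical base)
ALLOWED = {0x4000}            # pages the requesting process may access

def vatpasc(request):
    """Rewrite a virtual-address memory request into a physical one."""
    va = request["address"]
    page = va & ~0xFFF  # assumed 4 KiB pages
    if page not in ALLOWED:
        raise PermissionError("VATPASC security check failed")
    node_id, pa_base = PAGE[page]
    out = dict(request)
    out["address"] = pa_base + (va & 0xFFF)  # keep the in-page offset
    out["node_id"] = node_id
    return out

req = vatpasc({"op": "get", "address": 0x4010})
assert req["node_id"] == 9 and req["address"] == 0x2A10
```

The rewritten request can then be handed to the local system address decoder, as in Example 22.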
- the terms “a” or “an” are used, as is common in patent documents, to include one or more than one, independent of any other instances or usages of “at least one” or “one or more.”
- the term “or” is used to refer to a nonexclusive or, such that “A or B” includes “A but not B,” “B but not A,” and “A and B,” unless otherwise indicated.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/279,319 US10095629B2 (en) | 2016-09-28 | 2016-09-28 | Local and remote dual address decoding using caching agent and switch |
Publications (2)
Publication Number | Publication Date |
---|---|
US20180089098A1 US20180089098A1 (en) | 2018-03-29 |
US10095629B2 true US10095629B2 (en) | 2018-10-09 |
Family
ID=61686152
Legal Events

- AS (Assignment): Owner INTEL CORPORATION, CALIFORNIA. Assignment of assignors interest; assignors SCHMISSEUR, MARK A; LARSEN, STEEN; DOSHI, KSHITIJ ARUN; and others; signing dates from 20160916 to 20180205; reel/frame 046735/0430. Also: employee agreement; assignor RAMANUJAN, RAJ K; reel/frame 046970/0468; effective date 19971020.
- AS (Assignment): Owner INTEL CORPORATION, CALIFORNIA. Assignment of assignors interest; assignor RAMANUJAN, RAJ K; reel/frame 046779/0064; effective date 20180831.
- STCF (Information on status: patent grant): Patented case.
- MAFP (Maintenance fee payment): Payment of maintenance fee, 4th year, large entity (original event code M1551); entity status of patent owner: large entity; year of fee payment: 4.
- CC: Certificate of correction.