US12229069B2 - Accelerator controller hub - Google Patents
Accelerator controller hub
- Publication number
- US12229069B2 (U.S. application Ser. No. 17/083,200)
- Authority
- US
- United States
- Prior art keywords
- memory
- gpu
- coupled
- accelerator
- interfaces
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/38—Information transfer, e.g. on bus
- G06F13/40—Bus structure
- G06F13/4004—Coupling between buses
- G06F13/4027—Coupling between buses using bus bridges
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/38—Information transfer, e.g. on bus
- G06F13/42—Bus transfer protocol, e.g. handshake; Synchronisation
- G06F13/4204—Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus
- G06F13/4221—Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus being an input/output bus, e.g. ISA bus, EISA bus, PCI bus, SCSI bus
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/38—Information transfer, e.g. on bus
- G06F13/42—Bus transfer protocol, e.g. handshake; Synchronisation
- G06F13/4282—Bus transfer protocol, e.g. handshake; Synchronisation on a serial bus, e.g. I2C bus, SPI bus
- G06F13/4295—Bus transfer protocol, e.g. handshake; Synchronisation on a serial bus, e.g. I2C bus, SPI bus using an embedded synchronisation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/445—Program loading or initiating
- G06F9/44505—Configuring for program initiating, e.g. using registry, configuration files
- G06F9/4451—User profiles; Roaming
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/14—Handling requests for interconnection or transfer
- G06F13/20—Handling requests for interconnection or transfer for access to input/output bus
- G06F13/28—Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2213/00—Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F2213/0026—PCI express
Definitions
- GPUs Graphics Processing Units
- ML machine learning
- AI artificial intelligence
- XPUs GPUs, GPGPUs, and other parallel programmable accelerator devices are termed XPUs.
- HDL Host-Device Link
- Network data ingestion Growing compute needs driven by larger AI models or HPC (high-performance computing) workloads sometimes require splitting the problem across XPUs in two or more server nodes, connected using a high-speed network (like InfiniBand or Ethernet).
- the network interface cards (NICs) are connected to the host CPU (central processing unit) over a Host-Device Fabric (HDF). As a result, all network dataflows to or from an XPU are limited by the XPU HDL bandwidth.
- Storage data ingestion Storage drives are connected to the host CPU over an HDF. While advances like direct RDMA (Remote Direct Memory Access) from SSD (solid-state drive) to XPU memory avoid an extra data copy in host memory, the effective bandwidth is still limited by the XPU HDL bandwidth.
- direct RDMA Remote Direct Memory Access
- SSD solid-state drive
- Model sizes are growing faster than device memory capacity (like high-bandwidth memory (HBM) capacity).
- Advanced XPU memory virtualization methods (such as Unified Shared Memory schemes) enable application-transparent device memory oversubscription and auto-migration.
- host memory like DRAM (dynamic random-access memory), or non-volatile memory
- This is a wasteful use of HDL bandwidth and of the CPU's memory controller bandwidth, and it can degrade the performance of CPU threads competing for memory bandwidth.
- FIG. 1 is a schematic diagram of a multi-socket platform employing a current design including a pair of CPUs and multiple XPUs and IO devices, wherein the multiple XPUs and IO devices are coupled to the CPUs via a host device fabric (HDF) and the XPUs are coupled to one another via a high-performance accelerator fabric (HPAF);
- HDF host device fabric
- HPAF high-performance accelerator fabric
- FIG. 2 is a schematic diagram of a multi-socket platform representing an augmented version of the multi-socket platform of FIG. 1 employing an accelerator controller hub (ACH) coupled between the HDF and the HPAF and to which multiple IO devices and memory are coupled, according to one embodiment;
- ACH accelerator controller hub
- FIG. 3 is a schematic diagram of an ACH, according to one embodiment
- FIG. 4 a is a schematic diagram of a system including a CPU coupled to multiple XPUs with integrated ACHs to which IO devices are coupled;
- FIG. 4 b is a schematic diagram of a system including a CPU coupled to multiple GPUs with integrated GPU-IIOs to which IO devices are coupled;
- FIG. 5 is a schematic diagram of a system including a CPU coupled to a pair of GPUs with integrated GPU IIOs interconnected via a CAFE link, according to one embodiment
- FIG. 6 is a schematic diagram of a platform including a GPU with an on-die or on package GPU IIO that is coupled to a CPU and an IO device, according to one embodiment;
- FIG. 7 a is a schematic diagram of a system including an initiator and a target, further illustrating a remote direct memory access (RDMA) Send flow under which data is sent from the initiator and written to GPU memory on the target, according to one embodiment;
- RDMA remote direct memory access
- FIG. 7 b is a schematic diagram of the system of FIG. 7 a where the target is a passive target, further illustrating an RDMA Read flow under which data is read from the GPU memory of the passive target and written to the GPU memory of the initiator, according to one embodiment;
- FIG. 8 is a diagram of a system that may be implemented with aspects of the embodiments described and illustrated herein.
- an accelerator controller hub (ACH) is provided.
- the ACH represents a platform design rethinking based on the observation that moving storage, memory and networking closer to XPUs by connecting them to a high-performance accelerator fabric may yield a better platform balance and enable direct data movement to/from the data consumer/producer (either CPU or XPU).
- FIG. 1 shows a platform 100 illustrating a current platform design.
- Platform 100 is a multi-socket platform including two CPUs 102 and 104 that are connected via an ultra-path socket-to-socket interconnect 106 .
- CPU 102 is connected to host memory 107 comprising one or more memory devices, such as but not limited to DRAM DIMMs (dual inline memory modules) via one or more memory channels.
- CPU 104 is connected to host memory 109 comprising one or more memory devices via one or more memory channels.
- Each of CPUs 102 and 104 is connected to a host-device fabric 108 via respective HDLs 110 and 112 .
- HDF 108 is coupled to XPUs 114 , 116 , 118 , and 120 via respective HDLs 122 , 124 , 126 , and 128 .
- HDF 108 is also connected to one or more SSDs 130 via one or more HDLs 132 and is connected to one or more NICs 134 via one or more HDLs 136 .
- XPUs 114 , 116 , 118 , and 120 are connected to a high-performance accelerator fabric (HPAF) 138 via respective high-performance accelerator links (HPALs) 140 , 142 , 144 , and 146 .
- HPAFs include NVLink and CCIX (Cache Coherent Interconnect for Accelerators).
- input-output (IO) devices e.g., SSDs 130 and NICs 134
- IO input-output
- XPU to IO flows traverse the HDF, either via a switch or through the CPU as discussed below.
- FIG. 2 shows a platform 200 illustrating an example of a platform with an accelerator controller hub, according to one embodiment.
- Components in platforms 100 and 200 in FIGS. 1 and 2 with like-numbered references have similar configurations in both platforms. Accordingly, the following focuses on the differences between platforms 100 and 200 .
- an ACH 202 is coupled to HDF 108 via an HDL 204 and to HPAF 138 via an HPAL 206 .
- Memory 208 comprising one or more memory devices is coupled to ACH 202 via one or more memory channels 210 .
- memory 208 may comprise storage-class memory, such as a hybrid memory, that is connected to ACH 202 via an HDL such as a PCIe (Peripheral Component Interconnect Express) link.
- PCIe Peripheral Component Interconnect Express
- NICs 212 are connected to ACH 202 via one or more HDLs 214 .
- one or more SSDs 216 are connected to ACH 202 via one or more HDLs 218 .
- NICs 212 and SSDs 216 are illustrative of IO devices that may be coupled to an ACH. Such IO devices further include but are not limited to network interfaces, InfiniBand HCAs, offload accelerators, encryption and security devices, and FPGAs.
- FIG. 3 shows further details of an ACH 300 , according to one embodiment.
- the interfaces for ACH 300 include an HDL interface (I/F) 302 , a memory interface 304 , one or more (n) PCIe interfaces 306 - 1 . . . 306 - n , and one or more (m) HPAL interfaces 308 - 1 . . . 308 - m .
- ACH 300 further includes provisions for routing and protocol bridging, including a router 310 , a PCIe to HDL bridge 312 , and a PCIe to HPAL bridge 314 .
- HDL interface 302 is used for device discovery, enumeration, and host communication. An HDL interface is also used to maintain software compatibility.
- the one or more PCIe interfaces are used to connect to PCIe IO devices like NICs and SSDs via respective PCIe links.
- the one or more HPAF interfaces provide direct data paths from an IO device or memory to HPAF-attached accelerators, such as the XPUs shown in FIG. 2 .
- Memory interface 304 is used to connect to various types of memory devices such as DRAM DIMMs, non-volatile DIMMs (NVDIMMs), and hybrid DIMMs that combine both volatile and non-volatile memory.
- PCIe to HDL bridge 312 provides bridging functionality between the PCIe interfaces 306 - 1 . . . 306 - n and HDL interface 302 to enable the host to enumerate and communicate with the PCIe IO devices coupled to the PCIe interfaces. If an HDL is a PCIe link, then this implies that the ACH should further implement PCIe switch functionality (not shown).
- ACH 300 uses PCIe to HPAL bridge 314 to bridge between the PCIe and HPAL protocols. This may involve remapping the opcodes, reformatting the packets, breaking down the payload, etc.
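As an illustration of this bridging step, the sketch below remaps a PCIe-style opcode and splits one payload into link-sized HPAL packets. The opcode values, packet fields, and 64-byte segment size are assumptions for illustration, not values from the patent:

```python
# Hypothetical PCIe-to-HPAL bridging: remap the opcode, then break the
# payload into HPAL-sized segments with adjusted addresses.
PCIE_TO_HPAL_OPCODE = {"MemWr": 0x1, "MemRd": 0x2}  # assumed mapping
HPAL_MAX_PAYLOAD = 64  # bytes per HPAL packet (assumed)

def bridge_pcie_to_hpal(opcode: str, addr: int, payload: bytes):
    """Translate one PCIe transaction into a list of HPAL packets."""
    hpal_op = PCIE_TO_HPAL_OPCODE[opcode]
    packets = []
    for off in range(0, len(payload), HPAL_MAX_PAYLOAD):
        chunk = payload[off:off + HPAL_MAX_PAYLOAD]
        packets.append({"op": hpal_op, "addr": addr + off, "data": chunk})
    return packets

# A 150-byte write splits into three segments at 64 bytes per packet.
pkts = bridge_pcie_to_hpal("MemWr", 0x1000, bytes(150))
```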
- Router 310 is configured to steer memory requests targeting CPU host memory over HDL, while flows targeting XPU memory are directed over HPAL.
- the routing decision may be based on one or more of the following:
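One plausible criterion is the target physical address range. The sketch below steers by an address map; the window boundaries and link names are assumptions for illustration:

```python
# Hypothetical address map: requests falling in the host-DRAM window go
# out over HDL; requests in the XPU/HBM window go out over HPAL.
HOST_MEM = range(0x0000_0000, 0x4000_0000)   # host DRAM window (assumed)
XPU_MEM  = range(0x4000_0000, 0x8000_0000)   # XPU memory window (assumed)

def route(addr: int) -> str:
    """Return the link a memory request should be steered over."""
    if addr in HOST_MEM:
        return "HDL"    # toward CPU host memory
    if addr in XPU_MEM:
        return "HPAL"   # toward accelerator memory
    raise ValueError("address not mapped")
```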
- An ACH may also have to bridge the memory ordering model.
- PCIe interfaces 306 - 1 . . . 306 - n and HPAL interfaces 308 - 1 . . . 308 - m include a memory ordering block 322 .
- memory ordering block 322 implements a fence unit 324 to drain prior writes targeted to an XPU upon a trigger. The following are some examples of a trigger:
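One example of a trigger (an assumption here, following common producer-consumer ordering practice) is a flag or doorbell write issued after a sequence of data writes. The toy model below drains buffered writes before making the flag visible; all names and behavior are illustrative:

```python
# Toy fence unit: posted writes toward an XPU are buffered, and a flag
# write (the trigger) forces prior writes to drain first, so a consumer
# that observes the flag also observes the data.
class FenceUnit:
    def __init__(self):
        self.pending = []   # posted writes not yet globally visible
        self.memory = {}    # models target XPU memory

    def post_write(self, addr, value):
        self.pending.append((addr, value))

    def drain(self):
        # Make all prior posted writes visible, in order.
        for addr, value in self.pending:
            self.memory[addr] = value
        self.pending.clear()

    def write_flag(self, addr, value):
        self.drain()            # prior writes must land before the flag
        self.memory[addr] = value
```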
- When ACH 300 is integrated in an accelerator die or is on package with an accelerator, it further includes an internal interconnect or fabric interface 326 . Various types of interconnects or fabrics may be used, depending on the accelerator architecture and associated internal interface on the accelerator.
- FIGS. 4 a and 4 b respectively show platforms 400 a and 400 b with accelerators that include on-die or on-package ACHs.
- platform 400 a includes a CPU 400 coupled to multiple (j) XPUs 402 - 1 . . . 402 - j via respective Compute Express Link (CXL) or PCIe links 404 - 1 . . . 404 - j .
- CXL Compute Express Link
- Each of XPUs 402 - 1 . . . 402 - j includes a respective on-die or on package ACH 406 - 1 . . . 406 - j .
- XPU 402 - 1 is coupled to one or more NICs 408 via one or more PCIe links 410 connected to PCIe interfaces on ACH 406 - 1 .
- XPU 402 - j is coupled to one or more SSDs 412 via one or more PCIe links 414 connected to PCIe interfaces on ACH 406 - j.
- an accelerator may include embedded memory or may include a memory interface coupled to external memory, observing that some implementations may not include either of these memories.
- the memory is referred to as accelerator memory.
- each XPU is coupled to accelerator memory, as depicted by accelerator memory 416 - 1 . . . 416 - j .
- the accelerator memory may be embedded on the XPU.
- platform 400 b includes a CPU 400 coupled to multiple (j) GPUs 403 - 1 . . . 403 - j via respective CXL or PCIe links 405 - 1 . . . 405 - j .
- GPUs 403 - 1 . . . 403 - j include respective on-die or on package GPU IIOs 407 - 1 . . . 407 - j .
- GPU 403 - 1 is coupled to one or more NICs 409 via one or more PCIe links 411 connected to PCIe interfaces on GPU IIO 407 - 1 .
- GPU 403 - j is coupled to one or more SSDs 413 via one or more PCIe links 415 connected to PCIe interfaces on GPU IIO 407 - j .
- GPUs 403 - 1 . . . 403 - j are further shown as coupled to GPU memory 417 - 1 . . . 417 - j .
- the GPU memory may be embedded on a GPU rather than external to the GPU.
- a GPU may include an embedded GPU memory and also be coupled to external GPU memory.
- FIG. 5 shows a platform 500 implementing a new CAFE inter-accelerator link, which is based on CXL, is being designed for a next-generation GPU, and comprises an HPAF.
- platform 500 includes a CPU 502 coupled to a GPU 504 including a GPU IIO 506 via a CXL or PCIe link 507 and CPU 502 is coupled to a GPU 508 including a GPU IIO 510 via a CXL or PCIe link 511 .
- CPU 502 is further coupled to a NIC 512 via a PCIe link 514 .
- GPU IIO 506 on GPU 504 is coupled to a NIC 516 via a PCIe link 518 .
- GPU IIO 510 on GPU 508 is coupled to a NIC 520 via a PCIe link 522 .
- GPUs 504 and 508 are connected via a CAFE inter-accelerator link 524 .
- CPU 502 is further coupled to memory 526
- GPU 504 is coupled to memory 528
- GPU 508 is coupled to memory 530 .
- NICs 516 and 520 are direct-attached to GPUs 504 and 508 .
- storage devices such as SSDs and storage class memory may be direct-attached to GPUs.
- the direct attachment enables low-latency and high-bandwidth communication and access to local large ML training sets without the involvement of the host CPU. With 15 TB+ SSDs available now, and more on the roadmap, caching large training sets close to the GPU will unlock massive AI training performance potential.
- FIG. 6 shows platform 600 including a GPU with an on-die or on package GPU IIO that is coupled to a CPU and an IO device, according to one embodiment.
- Platform 600 includes a GPU 602 including a GPU Core 604 internally coupled to an on-die or on package GPU-IIO 606 via an interconnect 608 .
- the GPU core represents the parallel processing circuitry implemented by a GPU to perform graphics processing operations and/or accelerator operations (e.g., matrix operations used in ML and AI).
- a CPU 610 is connected to GPU core 604 via a CXL or PCIe link 612 and is connected to GPU-IIO 606 via a PCIe link 614 .
- GPU core 604 is also connected to GPU memory comprising high-bandwidth memory (HBM) 616 via link 618 , while GPU-IIO is connected to an IO device 620 via a PCIe link 622 .
- CPU 610 is also connected to one or more memory devices 624 via one or more memory channels 610 .
- IO device 620 is more generally representative of any PCIe-compliant device that may be attached to the ACH, enabling tremendous flexibility in the NICs, SSDs, or other IO devices used, and in attaching nearby data coprocessors, for instance.
- CPU 610 includes M cores 628 , a CXL or PCIe interface 630 , an input-output memory management unit (IOMMU) 632 , a memory controller (MC) 634 , and a PCIe root port (RP) 636 .
- IOMMU input-output memory management unit
- Cores 628 are used to execute software that has been loaded into memory 624 , as well as platform firmware (not shown).
- CXL or PCIe link 612 is coupled to CXL or PCIe interface 630 .
- CXL or PCIe interface 630 may be a PCIe RP.
- PCIe link 614 is coupled to PCIe RP 636 , which is embedded in or coupled to a PCIe root complex (not shown).
- an IOMMU is used to support DMA transfers by (among other functions) mapping memory addresses in IO devices and host memory. A DMA transfer is performed without involvement of any of cores 628 .
- Other DMA transfers described and illustrated herein may include additional IOMMUs that are not shown and/or other components to facilitate the DMA transfers, such as a translation look-aside buffer (TLB).
- TLB translation look-aside buffer
- GPU 602 may include an IOMMU and/or a TLB to support DMA data transfers between HBM 616 and IO device 620 .
- one or more TLBs are implemented in an IOMMU.
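To illustrate the translation role the IOMMU plays in these DMA transfers, here is a toy model of IO-virtual-to-physical translation with a TLB in front of the page-table walk. The page size, table contents, and class layout are assumptions for illustration:

```python
# Toy IOMMU: translate an IO-virtual address (iova) used by a device into
# a physical address, caching page translations in a TLB so repeated
# accesses to the same page skip the page-table walk.
PAGE = 4096

class IOMMU:
    def __init__(self, page_table):
        self.page_table = page_table      # IO-virtual page -> physical page
        self.tlb = {}                     # cached translations

    def translate(self, iova: int) -> int:
        vpn, offset = divmod(iova, PAGE)
        if vpn not in self.tlb:           # TLB miss: walk the page table
            self.tlb[vpn] = self.page_table[vpn]
        return self.tlb[vpn] * PAGE + offset

# Two IO-virtual pages mapped to physical pages 7 and 3 (assumed).
iommu = IOMMU({0: 7, 1: 3})
```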
- NICs may be attached 1:1 or in a similar or larger ratio
- GPUs the host (CPU) may still access, use, virtualize and share the downstream PCIe device. This means that an investment in high-performance SSDs or NICs may be shared by both the host and hosted VMs on a server-class CPU, which will provide a cost benefit to Cloud service providers and the like.
- RDMA-based NICs are key for low latency and tend to be optimized for high throughput.
- a specific example of RDMA flows is shown below, encompassing both send and receive details.
- RNIC is used to refer to an RDMA-enabled NIC, and this can be abstracted to use Verbs/UCX/OFI semantics.
- FIGS. 7 a and 7 b show an embodiment of a system comprising a pair of platforms 700 and 702 that are configured to communicate using RDMA flows.
- Platforms 700 and 702 have similar configurations to platform 600 of FIG. 6 discussed above, where like-numbered components (used for platform 600 ) for platform 700 include an appended ‘a’ and for platform 702 include an appended ‘b’.
- platform 700 includes a GPU 602 a while platform 702 includes a GPU 602 b , wherein both GPU 602 a and 602 b have a similar configuration to GPU 602 in platform 600 .
- Platform 700 includes an RNIC 704 coupled to GPU-IIO 606 a via a PCIe link 622 a .
- platform 702 includes an RNIC 706 coupled to GPU-IIO 606 b via a PCIe link 622 b .
- RNIC 704 is connected to RNIC 706 via a network 708 .
- network 708 may be any network using a protocol for which RNICs are available, including but not limited to Ethernet and InfiniBand networks.
- RDMA over Converged Ethernet (RoCE) protocols may be used (e.g., RoCE V1 or RoCE V2).
- Platforms 700 and 702 respectively include send queues (SQs) 710 a and 710 b , receive queues (RQs) 712 a and 712 b , and completion queues (CQs) 714 a and 714 b , which are implemented in memory 624 a and memory 624 b .
- HBM 616 a on platform 700 includes a data buffer 716 a
- HBM 616 b on platform 702 includes a data buffer 716 b.
- FIG. 7 a illustrates an example of an RDMA Send/Receive flow
- FIG. 7 b illustrates and example of a RDMA Read flow
- platform 700 is the initiator
- platform 702 is the target
- platform 702 is a passive target.
- an RDMA Send operation allows a local host (i.e., initiator) to send data to an RQ in a remote host (the target).
- the receiver will have previously posted a receive buffer to the RQ to receive the data.
- the sender does not have control over where the data will reside in the remote host. This is called a two-sided operation, because the remote host participates in the operation, by posting the Work Queue Entry (WQE) in the RQ.
- WQE Work Queue Entry
- RNIC 704 at the initiator (sender) fetches the descriptor (or WQE) from SQ 710 a .
- RNIC 704 then uses the descriptor or WQE to read the data from the local GPU memory (data buffer 716 a in HBM 616 a ) during operation 2 a and sends the read data over network 708 to the target RNIC 706 during operation 2 b.
- While operation 3 is ordered with respect to operations 2 a and 2 b , there is no specific ordering for operations 3 a and 3 b .
- RNIC 704 may post a completion to CQ 714 a .
- RNIC 706 at the target fetches a descriptor from RQ 712 b .
- RNIC 706 performs an access permission check, and writes received data to the address specified by the RQ descriptor in data buffer 716 b of HBM 616 b .
- RNIC 706 posts a completion to CQ 714 b , as depicted by operation 5 .
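The Send/Receive steps above can be compressed into a toy simulation: the initiator's RNIC consumes an SQ descriptor and reads local GPU memory, and the target's RNIC consumes an RQ descriptor to place the data before posting a completion. The queue layouts, buffer names, and dict-based memory model are illustrative assumptions:

```python
# Toy RDMA Send/Receive flow. GPU memory is modeled as a dict keyed by
# buffer name; SQ/RQ/CQ are plain lists.
initiator = {"sq": [{"src": "buf_a"}], "cq": [],
             "gpu_mem": {"buf_a": b"data"}}
target = {"rq": [{"dst": "buf_b"}], "cq": [], "gpu_mem": {}}

def rdma_send(ini, tgt):
    wqe = ini["sq"].pop(0)                    # 1. fetch descriptor (WQE) from SQ
    data = ini["gpu_mem"][wqe["src"]]         # 2a. read local GPU memory
    ini["cq"].append("send done")             # 3. completion at initiator
    rq_wqe = tgt["rq"].pop(0)                 # 4. target fetches RQ descriptor
    tgt["gpu_mem"][rq_wqe["dst"]] = data      #    ... and writes received data
    tgt["cq"].append("recv done")             # 5. completion at target

rdma_send(initiator, target)
```

Note that, per the flow description, the initiator-side completion and the target-side steps have no required mutual ordering; the sequential order here is just one legal interleaving.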
- FIG. 7 b shows an RDMA Read flow, under which data is read from the remote host (depicted as the passive target).
- the initiator specifies the remote virtual address as well as local memory address to be copied to.
- the remote target is passive because the remote host does not participate in the operation (i.e., CPU 610 b is not involved). Rather, remote RNIC 706 performs a DMA read from the specified remote virtual address.
- RNIC 704 at the initiator fetches the descriptor (or WQE) from SQ 710 a and sends the request over to the RNIC 706 at the target.
- RNIC 706 performs access permission checks for the remote address, fetches the data from GPU memory (data buffer 716 b in HBM 616 b ) and returns it back to the initiator RNIC 704 .
- RNIC 704 then writes the data to the GPU memory (data buffer 716 a in HBM 616 a ), as depicted by operation 3 . After the full buffer is read, RNIC 704 posts a completion to CQ 714 a , as depicted by operation 4 .
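The Read flow above can likewise be sketched as a toy simulation in which the passive target's CPU never runs; its RNIC only checks permissions and fetches the data from GPU memory. The descriptor fields, permission set, and memory model are illustrative assumptions:

```python
# Toy RDMA Read flow against a passive target. The WQE carries the remote
# virtual address to read and the local address to copy into.
def rdma_read(ini, tgt):
    wqe = ini["sq"].pop(0)                    # 1. fetch descriptor (WQE) from SQ
    assert wqe["remote"] in tgt["allowed"]    # 2. target-side permission check
    data = tgt["gpu_mem"][wqe["remote"]]      #    fetch from target GPU memory
    ini["gpu_mem"][wqe["local"]] = data       # 3. write into local GPU memory
    ini["cq"].append("read done")             # 4. completion at initiator

ini = {"sq": [{"remote": "rbuf", "local": "lbuf"}], "cq": [], "gpu_mem": {}}
tgt = {"allowed": {"rbuf"}, "gpu_mem": {"rbuf": b"payload"}}
rdma_read(ini, tgt)
```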
- Similar flows are possible with SSDs and other PCIe devices. Common to these flows is the ability of the GPU-IIO (ACH) to route and manage traffic from the downstream PCIe device (an RNIC in this example) and to determine which flows go to/from host memory on the host processor versus which flows are destined for a GPU. For instance, in the RDMA Send and RDMA Read flow examples the flows are destined for the GPU core, and often onward to GPU high-bandwidth memory. In this fashion the ACH may be thought of as an integral component enabling this system architecture.
- RNIC PCIe device
- the RDMA flow is a host-mastered flow (where the descriptor submission is from the CPU).
- the ACH can also allow an XPU-mastered flow where the descriptor is submitted from a kernel running on the XPU itself.
- FIG. 8 depicts a system 800 in which aspects of some embodiments disclosed above may be implemented.
- System 800 includes one or more processors 810 , which provides processing, operation management, and execution of instructions for system 800 .
- Processor 810 can include any type of microprocessor, central processing unit (CPU), graphics processing unit (GPU), processing core, multi-core processor or other processing hardware to provide processing for system 800 , or a combination of processors.
- Processor 810 controls the overall operation of system 800 , and can be or include, one or more programmable general-purpose or special-purpose microprocessors, digital signal processors (DSPs), programmable controllers, application specific integrated circuits (ASICs), programmable logic devices (PLDs), or the like, or a combination of such devices.
- DSPs digital signal processors
- ASICs application specific integrated circuits
- PLDs programmable logic devices
- system 800 includes interface 812 coupled to processor 810 , which can represent a higher speed interface or a high throughput interface for system components that need higher bandwidth connections, such as memory subsystem 820 or optional graphics interface components 840 , or optional accelerators 842 .
- Interface 812 represents an interface circuit, which can be a standalone component or integrated onto a processor die.
- graphics interface 840 interfaces to graphics components for providing a visual display to a user of system 800 .
- graphics interface 840 can drive a high definition (HD) display that provides an output to a user.
- HD high definition
- High definition can refer to a display having a pixel density of approximately 100 PPI (pixels per inch) or greater and can include formats such as full HD (e.g., 1080p), retina displays, 4K (ultra-high definition or UHD), or others.
- the display can include a touchscreen display.
- graphics interface 840 generates a display based on data stored in memory 830 or based on operations executed by processor 810 or both.
- accelerators 842 can be a fixed function offload engine that can be accessed or used by processor 810 .
- an accelerator among accelerators 842 can provide data compression capability, cryptography services such as public key encryption (PKE), cipher, hash/authentication capabilities, decryption, or other capabilities or services.
- PKE public key encryption
- an accelerator among accelerators 842 provides field select controller capabilities as described herein.
- accelerators 842 can be integrated into a CPU socket (e.g., a connector to a motherboard or circuit board that includes a CPU and provides an electrical interface with the CPU).
- accelerators 842 can include a single- or multi-core processor, graphics processing unit, logical execution units, single- or multi-level cache, functional units usable to independently execute programs or threads, application specific integrated circuits (ASICs), neural network processors (NNPs), programmable control logic, and programmable processing elements such as field programmable gate arrays (FPGAs). Accelerators 842 can provide multiple neural networks, CPUs, processor cores, general purpose graphics processing units, or graphics processing units that can be made available for use by AI or ML models.
- ASICs application specific integrated circuits
- NNPs neural network processors
- FPGAs field programmable gate arrays
- the AI model can use or include any or a combination of: a reinforcement learning scheme, Q-learning scheme, deep-Q learning, or Asynchronous Advantage Actor-Critic (A3C), combinatorial neural network, recurrent combinatorial neural network, or other AI or ML model.
- A3C Asynchronous Advantage Actor-Critic
- Multiple neural networks, processor cores, or graphics processing units can be made available for use by AI or ML models.
- Memory subsystem 820 represents the main memory of system 800 and provides storage for code to be executed by processor 810 , or data values to be used in executing a routine.
- Memory subsystem 820 can include one or more memory devices 830 such as read-only memory (ROM), flash memory, one or more varieties of random access memory (RAM) such as DRAM, or other memory devices, or a combination of such devices.
- Memory 830 stores and hosts, among other things, operating system (OS) 832 to provide a software platform for execution of instructions in system 800 .
- applications 834 can execute on the software platform of OS 832 from memory 830 .
- Applications 834 represent programs that have their own operational logic to perform execution of one or more functions.
- Processes 836 represent agents or routines that provide auxiliary functions to OS 832 or one or more applications 834 or a combination.
- OS 832 , applications 834 , and processes 836 provide software logic to provide functions for system 800 .
- memory subsystem 820 includes memory controller 822 , which is a memory controller to generate and issue commands to memory 830 . It will be understood that memory controller 822 could be a physical part of processor 810 or a physical part of interface 812 .
- memory controller 822 can be an integrated memory controller, integrated onto a circuit with processor 810 .
- system 800 can include one or more buses or bus systems between devices, such as a memory bus, a graphics bus, interface buses, or others.
- Buses or other signal lines can communicatively or electrically couple components together, or both communicatively and electrically couple the components.
- Buses can include physical communication lines, point-to-point connections, bridges, adapters, controllers, or other circuitry or a combination.
- Buses can include, for example, one or more of a system bus, a Peripheral Component Interconnect (PCI) bus, a Hyper Transport or industry standard architecture (ISA) bus, a small computer system interface (SCSI) bus, a universal serial bus (USB), or an Institute of Electrical and Electronics Engineers (IEEE) standard 1394 bus (Firewire).
- PCI Peripheral Component Interconnect
- ISA industry standard architecture
- SCSI small computer system interface
- USB universal serial bus
- IEEE Institute of Electrical and Electronics Engineers
- system 800 includes interface 814 , which can be coupled to interface 812 .
- interface 814 represents an interface circuit, which can include standalone components and integrated circuitry.
- Network interface 850 provides system 800 the ability to communicate with remote devices (e.g., servers or other computing devices) over one or more networks.
- Network interface 850 can include an Ethernet adapter, wireless interconnection components, cellular network interconnection components, USB (universal serial bus), or other wired or wireless standards-based or proprietary interfaces.
- Network interface 850 can transmit data to a device that is in the same data center or rack or a remote device, which can include sending data stored in memory.
- Network interface 850 can receive data from a remote device, which can include storing received data into memory.
- Various embodiments can be used in connection with network interface 850 , processor 810 , and memory subsystem 820 .
- system 800 includes one or more IO interface(s) 860 .
- IO interface 860 can include one or more interface components through which a user interacts with system 800 (e.g., audio, alphanumeric, tactile/touch, or other interfacing).
- Peripheral interface 870 can include any hardware interface not specifically mentioned above. Peripherals refer generally to devices that connect dependently to system 800 . A dependent connection is one where system 800 provides the software platform or hardware platform or both on which operation executes, and with which a user interacts.
- system 800 includes storage subsystem 880 to store data in a nonvolatile manner.
- storage subsystem 880 includes storage device(s) 884 , which can be or include any conventional medium for storing large amounts of data in a nonvolatile manner, such as one or more magnetic, solid state, or optical based disks, or a combination.
- Storage 884 holds code or instructions and data 886 in a persistent state (i.e., the value is retained despite interruption of power to system 800 ).
- Storage 884 can be generically considered to be a “memory,” although memory 830 is typically the executing or operating memory to provide instructions to processor 810 .
- storage 884 is nonvolatile
- memory 830 can include volatile memory (i.e., memory whose value or state is indeterminate if power to system 800 is interrupted).
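The storage/memory distinction drawn above (values retained versus indeterminate across a power interruption) can be sketched as follows; the file path and variable names are illustrative, not from the patent.

```python
import os
import tempfile

memory = {"state": 42}          # volatile: lost when power (the process) goes away

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as f:   # storage-884-style persistence: survives restart
    f.write(str(memory["state"]))

del memory                      # simulate power interruption: RAM contents gone

with open(path) as f:           # the persisted value is still determinate
    restored = int(f.read())
os.remove(path)
assert restored == 42
```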
- storage subsystem 880 includes controller 882 to interface with storage 884 .
- controller 882 is a physical part of interface 814 or processor 810 or can include circuits or logic in both processor 810 and interface 814 .
- a volatile memory is memory whose state (and therefore the data stored in it) is indeterminate if power is interrupted to the device. Dynamic volatile memory requires refreshing the data stored in the device to maintain state.
- DRAM dynamic random access memory
- SDRAM Synchronous DRAM
- a memory subsystem as described herein may be compatible with a number of memory technologies, such as DDR3 (Double Data Rate version 3, original release by JEDEC (Joint Electronic Device Engineering Council) on Jun. 27, 2007).
- DDR4 (DDR version 4, initial specification published in September 2012 by JEDEC), DDR4E (DDR version 4), LPDDR3 (Low Power DDR version 3, JESD209-3B, August 2013 by JEDEC), LPDDR4 (LPDDR version 4, JESD209-4, originally published by JEDEC in August 2014), WIO2 (Wide Input/Output version 2, JESD229-2, originally published by JEDEC in August 2014), HBM (High Bandwidth Memory, JESD325, originally published by JEDEC in October 2013), LPDDR5 (currently in discussion by JEDEC), HBM2 (HBM version 2, currently in discussion by JEDEC), or others or combinations of memory technologies, and technologies based on derivatives or extensions of such specifications.
- the JEDEC standards are available at www.jedec.org.
- a non-volatile memory (NVM) device is a memory whose state is determinate even if power is interrupted to the device.
- the NVM device can comprise a block addressable memory device, such as NAND technologies, or more specifically, multi-threshold level NAND flash memory (for example, Single-Level Cell (“SLC”), Multi-Level Cell (“MLC”), Quad-Level Cell (“QLC”), Tri-Level Cell (“TLC”), or some other NAND).
- SLC Single-Level Cell
- MLC Multi-Level Cell
- QLC Quad-Level Cell
- TLC Tri-Level Cell
- An NVM device can also comprise a byte-addressable write-in-place three dimensional cross point memory device, or other byte-addressable write-in-place NVM device (also referred to as persistent memory), such as single or multi-level Phase Change Memory (PCM) or phase change memory with a switch (PCMS), NVM devices that use chalcogenide phase change material (for example, chalcogenide glass), resistive memory including metal oxide base, oxygen vacancy base and Conductive Bridge Random Access Memory (CB-RAM), nanowire memory, ferroelectric random access memory (FeRAM, FRAM), magnetoresistive random access memory (MRAM) that incorporates memristor technology, spin transfer torque (STT)-MRAM, a spintronic magnetic junction memory based device, a magnetic tunneling junction (MTJ) based device, a DW (Domain Wall) and SOT (Spin Orbit Transfer) based device, a thyristor based memory device, or a combination of any of the above, or other memory.
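The NAND cell types listed above differ in bits stored per cell, which determines how many threshold voltage levels a cell must distinguish. A short illustrative sketch (the capacity helper is our own, not from the patent):

```python
# Bits per cell for the NAND variants named above
bits_per_cell = {"SLC": 1, "MLC": 2, "TLC": 3, "QLC": 4}

def levels(cell_type: str) -> int:
    """A cell storing n bits must distinguish 2**n threshold levels."""
    return 2 ** bits_per_cell[cell_type]

def raw_capacity_bits(cell_type: str, n_cells: int) -> int:
    """Raw capacity of an array of n_cells identical cells."""
    return bits_per_cell[cell_type] * n_cells

assert levels("QLC") == 16                    # 4 bits -> 16 levels
assert raw_capacity_bits("TLC", 1000) == 3000
```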
- a power source (not depicted) provides power to the components of system 800 . More specifically, the power source typically interfaces with one or multiple power supplies in system 800 to provide power to the components of system 800 .
- the power supply includes an AC to DC (alternating current to direct current) adapter to plug into a wall outlet.
- The AC power can come from a renewable energy source (e.g., solar power).
- power source includes a DC power source, such as an external AC to DC converter.
- power source or power supply includes wireless charging hardware to charge via proximity to a charging field.
- power source can include an internal battery, alternating current supply, motion-based power supply, solar power supply, or fuel cell source.
- system 800 can be implemented using interconnected compute sleds of processors, memories, storages, network interfaces, and other components.
- High speed interconnects can be used such as: Ethernet (IEEE 802.3), remote direct memory access (RDMA), InfiniBand, Internet Wide Area RDMA Protocol (iWARP), Quick UDP Internet Connections (QUIC), RDMA over Converged Ethernet (RoCE), Peripheral Component Interconnect express (PCIe), Intel® QuickPath Interconnect (QPI), Intel® Ultra Path Interconnect (UPI), Intel® On-Chip System Fabric (IOSF), Omnipath, Compute Express Link (CXL), HyperTransport, Infinity Fabric, high-speed fabric, NVLink, Advanced Microcontroller Bus Architecture (AMBA) interconnect, OpenCAPI, Gen-Z, Cache Coherent Interconnect for Accelerators (CCIX), 3GPP Long Term Evolution (LTE) (4G), 3GPP 5G, and variations thereof. Data can be copied or stored to virtualized storage nodes using a protocol such as NVMe over Fabrics (NVMe-oF) or NVMe.
- the elements in some cases may each have a same reference number or a different reference number to suggest that the elements represented could be different and/or similar.
- an element may be flexible enough to have different implementations and work with some or all of the systems shown or described herein.
- the various elements shown in the figures may be the same or different. Which one is referred to as a first element and which is called a second element is arbitrary.
- Coupled may mean that two or more elements are in direct physical or electrical contact. However, “coupled” may also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other.
- communicatively coupled means that two or more elements that may or may not be in direct contact with each other, are enabled to communicate with each other. For example, if component A is connected to component B, which in turn is connected to component C, component A may be communicatively coupled to component C using component B as an intermediary component.
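The A-to-C example above amounts to reachability through intermediary components. A minimal sketch, with an invented `links` graph mirroring the example:

```python
from collections import deque

# A connects to B, B connects to C, as in the example above
links = {"A": ["B"], "B": ["C"], "C": []}

def communicatively_coupled(src: str, dst: str) -> bool:
    """Breadth-first search: coupled if some chain of connections reaches dst."""
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in links.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

assert communicatively_coupled("A", "C")      # via intermediary component B
assert not communicatively_coupled("C", "A")  # links in this sketch are directional
```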
- An embodiment is an implementation or example of the inventions.
- Reference in the specification to “an embodiment,” “one embodiment,” “some embodiments,” or “other embodiments” means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least some embodiments, but not necessarily all embodiments, of the inventions.
- the various appearances “an embodiment,” “one embodiment,” or “some embodiments” are not necessarily all referring to the same embodiments.
- embodiments of this invention may be used as or to support a software program, software modules, firmware, and/or distributed software executed upon some form of processor, processing core or embedded logic, a virtual machine running on a processor or core, or otherwise implemented or realized upon or within a non-transitory computer-readable or machine-readable storage medium.
- a non-transitory computer-readable or machine-readable storage medium includes any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer).
- a non-transitory computer-readable or machine-readable storage medium includes any mechanism that provides (i.e., stores and/or transmits) information in a form accessible by a computer or computing machine (e.g., computing device, electronic system, etc.), such as recordable/non-recordable media (e.g., read only memory (ROM), random access memory (RAM), magnetic disk storage media, optical storage media, flash memory devices, etc.).
- the content may be directly executable (“object” or “executable” form), source code, or difference code (“delta” or “patch” code).
- a non-transitory computer-readable or machine-readable storage medium may also include a storage or database from which content can be downloaded.
- the non-transitory computer-readable or machine-readable storage medium may also include a device or product having content stored thereon at a time of sale or delivery.
- delivering a device with stored content, or offering content for download over a communication medium may be understood as providing an article of manufacture comprising a non-transitory computer-readable or machine-readable storage medium with such content described herein.
- the operations and functions performed by various components described herein may be implemented by software running on a processing element, via embedded hardware or the like, or any combination of hardware and software.
- Such components may be implemented as software modules, hardware modules, special-purpose hardware (e.g., application specific hardware, ASICs, DSPs, etc.), embedded controllers, hardwired circuitry, hardware logic, etc.
- Software content (e.g., data, instructions, configuration information, etc.)
- a list of items joined by the term “at least one of” can mean any combination of the listed terms.
- the phrase “at least one of A, B or C” can mean A; B; C; A and B; A and C; B and C; or A, B and C.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computer Hardware Design (AREA)
- Advance Control (AREA)
- User Interface Of Digital Computer (AREA)
- Materials For Photolithography (AREA)
Abstract
Description
- a. Address decode—A simple physical address decode (base/limit registers like PCIe) may be sufficient for implementations employing physical addresses. The logic for performing this is depicted by address decode logic 316.
- b. A bit in command descriptor—This enables SW to specify the target in a command descriptor, as depicted by a command descriptor bit 318.
- c. Process Address Space Identifier (PASID)—For future scalable IOV (input-output virtualization) devices, one could use a separate IO device queue per memory target. PASID logic 320 is used to decode the queue id to route the request.
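The three target-selection mechanisms above can be sketched as follows. This is illustrative only: the base/limit values, the descriptor bit position, and the PASID-to-queue table are invented for the example; the patent names only address decode logic 316, command descriptor bit 318, and PASID logic 320.

```python
# (a) physical address decode against a PCIe-style base/limit register pair
BASE, LIMIT = 0x1000_0000, 0x2000_0000

def route_by_address(addr: int) -> str:
    return "accelerator_memory" if BASE <= addr < LIMIT else "host_memory"

# (b) a single target-select bit carried in the command descriptor
def route_by_descriptor(descriptor: int) -> str:
    return "accelerator_memory" if descriptor & 0x1 else "host_memory"

# (c) scalable-IOV style: a per-target IO device queue, keyed by PASID
PASID_QUEUES = {7: "accelerator_memory", 9: "host_memory"}

def route_by_pasid(pasid: int) -> str:
    return PASID_QUEUES[pasid]

assert route_by_address(0x1800_0000) == "accelerator_memory"
assert route_by_descriptor(0b0) == "host_memory"
assert route_by_pasid(7) == "accelerator_memory"
```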
- a. Zero Length Read operation.
- b. RO=0 write—a write with the Relaxed Ordering attribute cleared will flush prior writes targeted to HPAL.
- c. Software-triggered fence—an ACH-aware application could use an explicit trigger to ensure data generated by an IO device is observable before launching a dependent XPU kernel.
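A software-triggered fence of the kind described in (c) can be sketched with an ordinary synchronization primitive. The names (`io_done`, `launch_kernel`) and the use of a thread event are illustrative assumptions, not the patented mechanism.

```python
import threading

io_done = threading.Event()
results = []

def io_device():
    results.append("io_data_written")   # device finishes writing its data
    io_done.set()                       # explicit trigger: data now observable

def launch_kernel():
    io_done.wait()                      # fence: block until the trigger fires
    results.append("kernel_launched")   # dependent kernel runs only after IO data

t = threading.Thread(target=launch_kernel)
t.start()
io_device()
t.join()
assert results == ["io_data_written", "kernel_launched"]
```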
Claims (15)
Priority Applications (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/083,200 US12229069B2 (en) | 2020-10-28 | 2020-10-28 | Accelerator controller hub |
| DE102021122233.8A DE102021122233A1 (en) | 2020-10-28 | 2021-08-27 | ACCELERATOR CONTROLLER HUB |
| NL2029100A NL2029100B1 (en) | 2020-10-28 | 2021-09-01 | Accelerator controller hub |
| CN202111120599.9A CN114493978A (en) | 2020-10-28 | 2021-09-24 | Accelerator controller center |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/083,200 US12229069B2 (en) | 2020-10-28 | 2020-10-28 | Accelerator controller hub |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20210042254A1 US20210042254A1 (en) | 2021-02-11 |
| US12229069B2 true US12229069B2 (en) | 2025-02-18 |
Family
ID=74499669
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US17/083,200 Active 2043-05-26 US12229069B2 (en) | 2020-10-28 | 2020-10-28 | Accelerator controller hub |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US12229069B2 (en) |
| CN (1) | CN114493978A (en) |
| DE (1) | DE102021122233A1 (en) |
| NL (1) | NL2029100B1 (en) |
Families Citing this family (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US12271322B2 (en) | 2019-06-24 | 2025-04-08 | Samsung Electronics Co., Ltd. | Multi-function flexible computational storage device |
| US11914903B2 (en) | 2020-10-12 | 2024-02-27 | Samsung Electronics Co., Ltd. | Systems, methods, and devices for accelerators with virtualization and tiered memory |
| CN115242563B (en) * | 2021-06-25 | 2023-11-14 | 统信软件技术有限公司 | Network communication method, computing device and readable storage medium |
| US12386772B2 (en) * | 2021-07-20 | 2025-08-12 | Intel Corporation | Technologies for increasing link efficiency |
| US20230185760A1 (en) * | 2021-12-13 | 2023-06-15 | Intel Corporation | Technologies for hardware microservices accelerated in xpu |
| CN115904226B (en) | 2022-10-10 | 2025-10-28 | 阿里巴巴(中国)有限公司 | Solid-state drive, device, and method for operating a solid-state drive |
| US20240205312A1 (en) * | 2022-12-18 | 2024-06-20 | Nvidia Corporation | Application programming interface to load synchronization information |
| US20240241843A1 (en) * | 2024-03-29 | 2024-07-18 | Intel Corporation | Network controller low latency data path |
Citations (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20040114589A1 (en) * | 2002-12-13 | 2004-06-17 | Alfieri Robert A. | Method and apparatus for performing network processing functions |
| US7362772B1 (en) * | 2002-12-13 | 2008-04-22 | Nvidia Corporation | Network processing pipeline chipset for routing and host packet processing |
| US9658981B2 (en) | 2012-03-14 | 2017-05-23 | Istituto Nazionale Di Fisica Nucleare | Network interface card for a computing node of a parallel computer accelerated by general purpose graphics processing units, and related inter-node communication method |
| US20190297015A1 (en) | 2019-06-07 | 2019-09-26 | Intel Corporation | Network interface for data transport in heterogeneous computing environments |
| US10579557B2 (en) * | 2018-01-16 | 2020-03-03 | Advanced Micro Devices, Inc. | Near-memory hardened compute blocks for configurable computing substrates |
| US20200210365A1 (en) * | 2018-12-27 | 2020-07-02 | Graphcore Limited | Exchange of data between processor modules |
| US20200210233A1 (en) * | 2018-12-29 | 2020-07-02 | Cambricon Technologies Corporation Limited | Operation method, device and related products |
| US20210021619A1 (en) * | 2020-09-26 | 2021-01-21 | Ned M. Smith | Trust-based orchestration of an edge node |
| US20210026687A1 (en) * | 2019-07-26 | 2021-01-28 | Castalune LLC | Computer-implemented system and methods for computing valuation |
| US20210194793A1 (en) * | 2019-12-23 | 2021-06-24 | Graphcore Limited | Sync Network |
| US20220012621A1 (en) * | 2018-11-19 | 2022-01-13 | QMware AG | Systems and methods involving hybrid quantum machines, aspects of quantum information technology and/or other features |
2020
- 2020-10-28 US US17/083,200 patent/US12229069B2/en active Active

2021
- 2021-08-27 DE DE102021122233.8A patent/DE102021122233A1/en active Pending
- 2021-09-01 NL NL2029100A patent/NL2029100B1/en active
- 2021-09-24 CN CN202111120599.9A patent/CN114493978A/en active Pending
Non-Patent Citations (1)
| Title |
|---|
| Dutch Examination Report for Patent Application No. 2029100, Mailed Jun. 21, 2022, 10 pages. |
Also Published As
| Publication number | Publication date |
|---|---|
| NL2029100B1 (en) | 2022-09-16 |
| CN114493978A (en) | 2022-05-13 |
| NL2029100A (en) | 2022-06-16 |
| DE102021122233A1 (en) | 2022-04-28 |
| US20210042254A1 (en) | 2021-02-11 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US12229069B2 (en) | | Accelerator controller hub |
| US12086082B2 (en) | | PASID based routing extension for scalable IOV systems |
| US12204758B2 (en) | | Near-memory compute module |
| US12182455B2 (en) | | Data processing near data storage |
| US10216419B2 (en) | | Direct interface between graphics processing unit and data storage unit |
| EP3706394B1 (en) | | Writes to multiple memory destinations |
| US20200104275A1 (en) | | Shared memory space among devices |
| US12130754B2 (en) | | Adaptive routing for pooled and tiered data architectures |
| US11709774B2 (en) | | Data consistency and durability over distributed persistent memory systems |
| CN112054963A (en) | | Network interface for data transmission in heterogeneous computing environments |
| US11681625B2 (en) | | Receive buffer management |
| US12170625B2 (en) | | Buffer allocation for parallel processing of data by message passing interface (MPI) |
| US20190102287A1 (en) | | Remote persistent memory access device |
| US12231339B2 (en) | | Extension of openvswitch megaflow offloads to hardware to address hardware pipeline limitations |
| EP4268089A1 (en) | | Memory accesses using a memory hub |
| CN114461544A (en) | | Software defined coherency caching for pooled memory |
| US20210149821A1 (en) | | Address translation technologies |
| US11966330B2 (en) | | Link affinitization to reduce transfer latency |
| CN116010331A (en) | | Access to Multiple Timing Domains |
| US20200341776A1 (en) | | Apparatus for initializing memory using a hardware engine for minimizing boot time |
| US12487757B2 (en) | | Using a persistent byte-addressable memory in a compute express link (CXL) memory device for efficient power loss recovery |
| Choi et al. | | Performance evaluation of a remote block device with high-speed cluster interconnects |
| US12341709B2 (en) | | Configurable receive buffer size |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | FEPP | Fee payment procedure | ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | STPP | Information on status: patent application and granting procedure in general | DOCKETED NEW CASE - READY FOR EXAMINATION |
| | STCT | Information on status: administrative procedure adjustment | PROSECUTION SUSPENDED |
| | STCT | Information on status: administrative procedure adjustment | PROSECUTION SUSPENDED |
| | AS | Assignment | Owner name: INTEL CORPORATION, CALIFORNIA. ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:MAROLIA, PRATIK;HERDRICH, ANDREW;SANKARAN, RAJESH;AND OTHERS;SIGNING DATES FROM 20201023 TO 20210721;REEL/FRAME:056984/0288 |
| | STPP | Information on status: patent application and granting procedure in general | DOCKETED NEW CASE - READY FOR EXAMINATION |
| | STPP | Information on status: patent application and granting procedure in general | NON FINAL ACTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| | STPP | Information on status: patent application and granting procedure in general | FINAL REJECTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER |
| | STPP | Information on status: patent application and granting procedure in general | ADVISORY ACTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
| | ZAAB | Notice of allowance mailed | ORIGINAL CODE: MN/=. |
| | STPP | Information on status: patent application and granting procedure in general | PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
| | STCF | Information on status: patent grant | PATENTED CASE |