US20220300437A1 - Memory chip connecting a system on a chip and an accelerator chip - Google Patents
- Publication number
- US20220300437A1 (application US 17/837,565)
- Authority
- US
- United States
- Prior art keywords
- memory
- chip
- soc
- accelerator
- pins
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F13/4027—Coupling between buses using bus bridges
- G06F13/1668—Details of memory controller
- G06F15/7807—System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
- G06F15/7839—Architectures of general purpose stored program computers comprising a single central processing unit with memory
- G06F15/7867—Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
- G06F15/8053—Vector processors
- G06F9/30036—Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
- G11C7/10—Input/output [I/O] data interface arrangements, e.g. I/O data control circuits, I/O data buffers
- G11C7/1075—Input/output [I/O] data interface arrangements for multiport memories each having random access ports and serial ports, e.g. video RAM
Definitions
- At least some embodiments disclosed herein relate to a memory chip connecting a SoC and an accelerator chip (e.g., an AI accelerator chip). At least some embodiments disclosed herein relate to using memory hierarchy and a string of memory chips to form a memory.
- Memory is computer hardware that stores information for immediate use in a computer or computing device.
- Memory in general operates at a higher speed than computer storage.
- Computer storage provides slower speeds for accessing information, but also can provide higher capacities and better data reliability.
- Random-access memory (RAM), which is a type of memory, can have high operation speeds.
- memory is made up of addressable semiconductor memory units or cells.
- a memory IC and its memory units can be at least partially implemented by silicon-based metal-oxide-semiconductor field-effect transistors (MOSFETs).
- Non-volatile memory can include flash memory (which can also be used as storage) as well as ROM, PROM, EPROM and EEPROM (which can be used for storing firmware).
- Volatile memory can include main memory technologies such as dynamic random-access memory (DRAM), and cache memory which is usually implemented using static random-access memory (SRAM).
- An AI accelerator is a type of microprocessor or computer system configured to accelerate computations for AI applications, including AI applications such as artificial neural networks, machine vision, and machine learning.
- AI accelerators can be hardwired to improve data processing for data-intensive or sensor-driven tasks.
- AI accelerators can include one or more cores and can be wired for low-precision arithmetic and in-memory computing.
- AI accelerators can be found in many devices such as smartphones, tablets, and any type of computer (especially computers with sensors and data-intensive tasks such as graphics and optics processing). Also, AI accelerators can include vector processors or array processors to improve performance on numerical simulations and other types of tasks used in AI applications.
- a SoC is an integrated circuit (IC) that integrates computer components in a single chip.
- Computer components common in a SoC include a central processing unit (CPU), memory, input/output ports and secondary storage.
- a SoC can have all its components on a single substrate or microchip, and some chips can be smaller than a quarter.
- a SoC can include various signal processing functions and can include specialty processors or co-processors such as graphics processing unit (GPU).
- a SoC can consume much less power than conventional multichip systems of equivalent functionality. This makes a SoC beneficial for integration of mobile computing devices (such as in smartphones and tablets). Also, a SoC can be useful for embedded systems and the Internet of Things (especially when the smart device is small).
- memory of a computing system can be hierarchical. Often referred to as memory hierarchy in computer architecture, memory hierarchy can separate computer memory into a hierarchy based on certain factors such as response time, complexity, capacity, persistence and memory bandwidth. Such factors can be related and can often be tradeoffs which further emphasizes the usefulness of a memory hierarchy.
- memory hierarchy affects performance in a computer system. Prioritizing memory bandwidth and speed over other factors can require considering the restrictions of a memory hierarchy, such as response time, complexity, capacity, and persistence. To manage such prioritization, different types of memory chips can be combined to balance chips that are faster with chips that are more reliable or cost effective, etc. Each of the various chips can be viewed as part of a memory hierarchy. And, for example, to reduce latency on faster chips, other chips in a memory chip combination can respond by filling a buffer and then signaling for activating the transfer of data between chips.
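The buffer-then-signal scheme described above can be sketched as a toy model in Python. The class names (`SlowChip`, `FastChip`) and the structure are illustrative assumptions, not terms from the patent; real chips signal at the pin level rather than through a flag.

```python
# Toy model of a two-tier memory hierarchy: a slower chip fills a
# buffer and then signals that a block is ready, so the faster chip
# can transfer a whole block at once instead of waiting per word.

class SlowChip:
    def __init__(self, data):
        self.data = data          # backing store (e.g., an NVRAM/flash tier)
        self.buffer = []          # staging buffer
        self.ready = False        # "signal" that activates the transfer

    def fill_buffer(self, start, count):
        self.buffer = self.data[start:start + count]
        self.ready = True         # signal the faster chip

class FastChip:
    def __init__(self):
        self.cache = []           # faster tier (e.g., DRAM)

    def transfer_from(self, slow):
        if not slow.ready:
            raise RuntimeError("buffer not ready")
        self.cache = list(slow.buffer)
        slow.ready = False

slow = SlowChip(list(range(100)))
fast = FastChip()
slow.fill_buffer(10, 4)           # slow tier stages a block
fast.transfer_from(slow)          # fast tier pulls the whole block
print(fast.cache)                 # [10, 11, 12, 13]
```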
- Memory hierarchy can be made up of chips with different types of memory units or cells.
- memory cells can be DRAM units.
- DRAM is a type of random access semiconductor memory that stores each bit of data in a memory cell, which usually includes a capacitor and a MOSFET. The capacitor can either be charged or discharged which represents two values of a bit, such as “0” and “1”.
- the electric charge on a capacitor leaks off, so DRAM requires an external memory refresh circuit which periodically rewrites the data in the capacitors by restoring the original charge per capacitor.
- DRAM is considered volatile memory since it loses its data rapidly when power is removed. This is different from flash memory and other types of non-volatile memory, such as NVRAM, in which data storage is more persistent.
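The refresh behavior described above can be illustrated with a toy model. The threshold and leak constants are invented for illustration; real refresh circuits rewrite whole rows of capacitors in hardware on a fixed refresh interval.

```python
# Toy model of DRAM volatility: each cell's capacitor charge decays
# over time, so a refresh must rewrite the stored value before the
# charge drops below the threshold that distinguishes "1" from "0".

THRESHOLD = 0.5   # below this, a stored "1" is misread as "0"
LEAK = 0.8        # fraction of charge remaining after one time step

class DramCell:
    def __init__(self, bit):
        self.bit = bit
        self.charge = 1.0 if bit else 0.0

    def tick(self):
        self.charge *= LEAK       # capacitor leakage

    def refresh(self):
        # rewrite the data by restoring the original (full) charge
        self.charge = 1.0 if self.read() else 0.0

    def read(self):
        return 1 if self.charge > THRESHOLD else 0

cell = DramCell(1)
cell.tick(); cell.tick()
cell.refresh()                    # refreshed in time: bit preserved
print(cell.read())                # 1

cell2 = DramCell(1)
for _ in range(5):                # no refresh (e.g., power removed)
    cell2.tick()
print(cell2.read())               # 0: data lost without refresh
```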
- A type of NVRAM is 3D XPoint memory.
- 3D XPoint memory units store bits based on a change of bulk resistance, in conjunction with a stackable cross-gridded data access array.
- 3D XPoint memory can be more cost effective than DRAM but less cost effective than flash memory.
- 3D XPoint is non-volatile memory and random-access memory.
- Flash memory is another type of non-volatile memory.
- An advantage of flash memory is that it can be electrically erased and reprogrammed. Flash memory is considered to have two main types, NAND-type flash memory and NOR-type flash memory, which are named after the NAND and NOR logic gates that can implement the memory units of flash memory. The flash memory units or cells exhibit internal characteristics similar to those of the corresponding gates.
- a NAND-type flash memory includes NAND gates.
- a NOR-type flash memory includes NOR gates. NAND-type flash memory may be written and read in blocks which can be smaller than the entire device. NOR-type flash permits a single byte to be written to an erased location or read independently.
- Because of the advantages of NAND-type flash memory, such memory has often been utilized for memory cards, USB flash drives, and solid-state drives. However, a primary tradeoff of using flash memory in general is that it is only capable of a relatively small number of write cycles in a specific block compared to other types of memory such as DRAM and NVRAM.
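The access-granularity difference between the two flash types can be sketched as follows. This is a simplification under stated assumptions (a toy 4-byte page size, no erase-before-write modeling); the class names are illustrative only.

```python
# Simplified contrast of flash access granularity: NOR-type flash
# permits a single byte to be written or read independently, while
# NAND-type flash is written and read in multi-byte pages/blocks.

PAGE_SIZE = 4  # toy page size in bytes

class NorFlash:
    def __init__(self, size):
        self.mem = bytearray(size)

    def write_byte(self, addr, value):
        self.mem[addr] = value            # byte-granular access

    def read_byte(self, addr):
        return self.mem[addr]

class NandFlash:
    def __init__(self, size):
        self.mem = bytearray(size)

    def write_page(self, page, data):
        assert len(data) == PAGE_SIZE     # whole pages only
        start = page * PAGE_SIZE
        self.mem[start:start + PAGE_SIZE] = data

    def read_page(self, page):
        start = page * PAGE_SIZE
        return bytes(self.mem[start:start + PAGE_SIZE])

nor = NorFlash(16)
nor.write_byte(3, 0xAB)                   # one byte, independently
nand = NandFlash(16)
nand.write_page(1, b"\x01\x02\x03\x04")   # a whole page at once
print(hex(nor.read_byte(3)), nand.read_page(1))
```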
- FIG. 1 illustrates an example related system including an accelerator chip (e.g., an AI accelerator chip) connecting a SoC and a memory chip.
- FIGS. 2-3 illustrate example related systems including the accelerator chip depicted in FIG. 1 as well as separate memory.
- FIG. 4 illustrates an example system, in accordance with some embodiments of the present disclosure, including a memory chip connecting a SoC and an accelerator chip (e.g., an AI accelerator chip).
- FIGS. 5-7 illustrate example systems including the memory chip depicted in FIG. 4 as well as separate memory.
- FIG. 8 illustrates an example arrangement of parts of an example computing device, in accordance with some embodiments of the present disclosure.
- FIG. 9 illustrates another example arrangement of parts of an example computing device, in accordance with some embodiments of the present disclosure.
- FIGS. 10 and 11 illustrate example strings of memory chips that can be used in the separate memory depicted in FIGS. 2-3 and 5-7 .
- At least some embodiments disclosed herein relate to a memory chip (e.g., DRAM) connecting a SoC and an accelerator chip (e.g., an AI accelerator chip). At least some embodiments disclosed herein relate to connecting an accelerator chip (e.g., an AI accelerator chip) to a SoC via a memory chip.
- the accelerator chip communicates with the SoC indirectly via the memory chip.
- the data placed in the memory chip connecting the SoC and the accelerator chip is interpreted as requests to the accelerator chip.
- the SoC may optionally use the memory chip connecting the SoC and the accelerator chip for its operations that do not involve the accelerator chip.
- the memory chip connecting the SoC and the accelerator chip can have two general purposes: to be used for the SoC and to be used for the accelerator chip.
- see first memory chip 402, accelerator chip 404, and SoC 406 depicted in FIGS. 4-7. Also, see SoC 806 and application-specific components 807 shown in FIGS. 8-9.
- the application-specific components 807 can include the first memory chip 402 and accelerator chip 404 in some embodiments of devices 800 and 900 .
- the memory chip connecting the SoC and the accelerator chip can be logically (and sometimes physically) intermediate to the SoC and the accelerator chip.
- a memory chip for the accelerator that is intermediate to the SoC and the accelerator chip may not require having two sets of pins.
- the accelerator chip and the memory chip can be physically on the same bus.
- the memory chip connecting the SoC and the accelerator chip is at least logically between the accelerator chip and the SoC.
- the connection, provided by the memory chip, of the SoC and the accelerator chip may only be a logical connection.
- the memory chip connecting the SoC and the accelerator chip can have two separate sets of pins; one set for connecting to the accelerator chip directly via wiring (e.g., see set of pins 414 and wiring 424 shown in FIGS. 4, 5, and 7 ) and the other set for connecting to the SoC directly via wiring (e.g., see set of pins 416 and wiring 426 shown in FIGS. 4-5 ).
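One way to picture the dual-ported arrangement of the memory chip is a shared cell array with two independent ports, one per set of pins. This is a logical sketch only; pin-level electrical signaling and arbitration are not modeled, and the method names are invented for illustration.

```python
# Logical sketch of a memory chip with two pin sets: one port faces
# the SoC, the other faces the accelerator. The SoC writes a request
# into the shared cells; the accelerator reads it from the other side.

class DualPortMemoryChip:
    def __init__(self, size):
        self.cells = [0] * size

    # port A: the set of pins wired to the SoC (cf. pins 416)
    def soc_write(self, addr, value):
        self.cells[addr] = value

    def soc_read(self, addr):
        return self.cells[addr]

    # port B: the set of pins wired to the accelerator (cf. pins 414)
    def accel_write(self, addr, value):
        self.cells[addr] = value

    def accel_read(self, addr):
        return self.cells[addr]

chip = DualPortMemoryChip(8)
chip.soc_write(0, 42)             # SoC places data/request in the chip
print(chip.accel_read(0))         # accelerator sees it: 42
chip.accel_write(1, 99)           # accelerator writes a result back
print(chip.soc_read(1))           # SoC reads the result: 99
```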
- the accelerator chip being connected to the SoC via the memory chip can provide acceleration of application-specific computations (such as AI computations) for the SoC in general or more specifically, in some embodiments, for a GPU included in the SoC (e.g., see GPU 408 shown in FIGS. 4-7 ).
- a GPU in the SoC and the memory chip connecting the SoC and the accelerator chip can be connected directly.
- the memory chip connecting the GPU and the accelerator chip can include a set of pins and can be connected to the accelerator chip directly via the set of pins and wiring (e.g., see set of pins 414 and wiring 424 ).
- the accelerator chip can have a corresponding set of pins too (e.g., see set of pins 415 ).
- the memory chip connecting the SoC and the accelerator chip can include a second set of pins and can be connected to the GPU directly via the second set of pins and wiring (e.g., see set of pins 416 and wiring 426 ).
- the GPU in the SOC can include a set of pins and can be connected to the memory chip directly via the set of pins and wiring (e.g., see set of pins 417 and wiring 426 ).
- any one of the accelerator chips described herein can be or include a part of a special purpose accelerator chip.
- a special purpose accelerator chip can include an artificial intelligence (AI) accelerator chip, a virtual reality accelerator chip, an augmented reality accelerator chip, a graphics accelerator chip, a machine learning accelerator chip, or any other type of ASIC or FPGA that can provide low latency or high bandwidth memory access.
- any one of the accelerator chips described herein can be or include a part of an AI accelerator chip.
- the accelerator chip can be a microprocessor chip or a SoC itself designed for hardware acceleration of AI applications, including artificial neural networks, machine vision, and machine learning.
- the accelerator chip is configured to perform numerical calculations on vectors and matrices (e.g., see vector processor 412 shown in FIG. 4 , which can be configured to perform the numerical calculations on vectors and matrices).
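The kind of numerical calculation on vectors and matrices described above can be illustrated with a plain-Python matrix-vector product. This is a functional sketch only, not the hardware datapath; a vector processor would perform the multiply-adds of each row in parallel lanes rather than in a loop.

```python
# Functional sketch of the accelerator's vector/matrix work:
# y = A @ x, computed as one dot product per matrix row.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(matrix, vector):
    return [dot(row, vector) for row in matrix]

A = [[1, 2],
     [3, 4]]
x = [5, 6]
print(matvec(A, x))               # [17, 39]
```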
- the accelerator chip can be or include an ASIC or FPGA. With ASIC embodiments of the accelerator chip, the accelerator chip can be specifically hardwired for acceleration of application-specific computations (such as AI computations).
- the accelerator chip can be a modified FPGA or GPU modified for acceleration of application-specific computations beyond an unmodified FPGA or GPU. In some other embodiments, the accelerator chip can be an unmodified FPGA or GPU.
- the memory chips connected directly to the accelerator chip are also referred to herein as application-specific memory chips for the sake of clarity when describing multiple memory chips of the overall system.
- the application-specific memory chips are not necessarily hardwired specifically for application-specific computations (e.g., AI computations).
- Each of the application-specific memory chips can be a DRAM chip or a NVRAM chip.
- each of the application-specific memory chips can be connected directly to the accelerator chip and can have memory units specifically for the acceleration of application-specific computations by the accelerator after the application-specific memory chip is configured by the SoC or the accelerator chip.
- the SoC can include a main processor (e.g., a CPU; see main processor 110 shown in FIGS. 4-7).
- the GPU in the SoC can run instructions for application-specific tasks and computations (e.g., AI tasks and computations), and the main processor can run instructions for non-application-specific tasks and computations (e.g., non-AI tasks and computations).
- the accelerator can provide acceleration of application-specific tasks and computations for the GPU specifically.
- the SoC can also include its own bus for connecting components of the SoC to each other (such as connecting the main processor and the GPU). Also, the bus of the SoC can be configured to connect the SoC to a bus external to the SoC so that the components of the SoC can couple with chips and devices external to the SoC such as a separate memory chip.
- the non-application-specific computations and tasks (e.g., non-AI computations and tasks) of the GPU, or such computations and tasks not using the accelerator chip and not conventionally performed by the main processor, can use separate memory such as a separate memory chip (which can be application-specific memory).
- the memory can be implemented by DRAM, NVRAM, flash memory, or any combination thereof.
- a separate memory or memory chip can be connected to the SoC and the main processor via a bus external to the SoC (e.g., see memory 204 and bus 202 depicted in FIG. 5 ).
- the separate memory or memory chip can have memory units specifically for the main processor.
- a separate memory or memory chip can be connected to the SoC and the GPU via the bus external to the SoC (e.g., see second memory chip 204 and bus 202 depicted in FIGS. 5-7 ).
- the separate memory or memory chip can have memory units for the main processor or the GPU.
- the application-specific memory chip and the separate memory chip can each be substituted by a group of memory chips such as a string of memory chips (e.g., see the strings of memory chips shown in FIGS. 10 and 11 ).
- the separate memory chip can be substituted by a string of memory chips that includes at least a NVRAM chip and a flash memory chip downstream of the NVRAM chip.
- the separate memory chip can be substituted by at least two memory chips where one of the chips is for the main processor (e.g., CPU) and the other chip is for the GPU for use as memory for non-AI computations and/or tasks.
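The substitution of a single memory chip by a string of chips can be pictured as a lookup that falls through from the upstream (faster) chip to the downstream (denser) chip, as in the NVRAM-over-flash arrangement above. A toy sketch; the promotion-on-read policy and class names are illustrative assumptions, not details from the patent.

```python
# Toy model of a string of memory chips: a read first tries the
# upstream NVRAM chip, then falls through to the downstream flash
# chip. Data read from downstream is promoted upstream for reuse.

class MemoryChipLink:
    def __init__(self, name, downstream=None):
        self.name = name
        self.store = {}
        self.downstream = downstream   # next chip in the string

    def read(self, addr):
        if addr in self.store:
            return self.store[addr], self.name
        if self.downstream is not None:
            value, where = self.downstream.read(addr)
            self.store[addr] = value   # promote into the faster chip
            return value, where
        raise KeyError(addr)

flash = MemoryChipLink("flash")            # downstream: dense, cheap
nvram = MemoryChipLink("nvram", flash)     # upstream: faster
flash.store[0x10] = b"weights"
print(nvram.read(0x10))                    # first read served by flash
print(nvram.read(0x10))                    # now served by nvram
```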
- At least some embodiments disclosed herein relate to an accelerator chip (e.g., an AI accelerator chip) having a vector processor (e.g., see vector processor 412 shown in FIGS. 4-7 ). And, at least some embodiments disclosed herein relate to using memory hierarchy and a string of memory chips to form a memory (e.g., see FIGS. 10 and 11 ).
- any one of the accelerator chips described herein can be or include a part of a special purpose accelerator chip.
- a special purpose accelerator chip can include an AI accelerator chip, a virtual reality accelerator chip, an augmented reality accelerator chip, a graphics accelerator chip, a machine learning accelerator chip, or any other type of ASIC or FPGA that can provide low latency or high bandwidth memory access.
- FIG. 1 illustrates an example related system including an accelerator chip (e.g., an AI accelerator chip) connecting a SoC and a memory chip.
- FIG. 1 illustrates an example system 100, which is to some extent related to system 400.
- System 100 includes an accelerator chip 102 (e.g., an AI accelerator chip) connecting a first memory chip 104 and a SoC 106 .
- the SoC 106 includes a GPU 108 as well as a main processor 110 .
- the main processor 110 can be or include a CPU.
- the accelerator chip 102 includes a vector processor 112 .
- the accelerator chip 102 includes a first set of pins 114 and a second set of pins 116 .
- the first set of pins 114 is configured to connect to the first memory chip 104 via wiring 124 .
- the second set of pins 116 is configured to connect to the SoC 106 via wiring 126 .
- the first memory chip 104 includes a corresponding set of pins 115 that connects the memory chip to the accelerator chip 102 via wiring 124 .
- the GPU 108 of the SoC 106 includes a corresponding set of pins 117 that connects the SoC to the accelerator chip 102 via wiring 126 .
- the accelerator chip 102 is configured to perform and accelerate application-specific computations (e.g., AI computations) for the SoC 106 .
- the accelerator chip 102 is also configured to use the first memory chip 104 as memory for the application-specific computations.
- the acceleration of application-specific computations can be performed by the vector processor 112 .
- the vector processor 112 in the accelerator chip 102 can be configured to perform numerical calculations on vectors and matrices for the SoC 106 .
- the accelerator chip 102 can include an ASIC that includes the vector processor 112 and is specifically hardwired to accelerate application-specific computations (e.g., AI computations) through the vector processor 112 .
- the accelerator chip 102 can include an FPGA that includes the vector processor 112 and is specifically hardwired to accelerate application-specific computations through the vector processor 112.
- the accelerator chip 102 can include a GPU that includes the vector processor 112 and is specifically hardwired to accelerate application-specific computations through the vector processor 112 .
- the GPU can be specifically modified to accelerate application-specific computations through the vector processor 112 .
- the SoC 106 includes a GPU 108 .
- the accelerator chip 102 can be configured to perform and accelerate application-specific computations (e.g., AI computations) for the GPU 108 .
- the vector processor 112 can be configured to perform numerical calculations on vectors and matrices for the GPU 108 .
- the GPU 108 can be configured to perform application-specific tasks and computations (e.g., AI tasks and computations).
- the SoC 106 includes a main processor 110 that is configured to perform non-AI tasks and computations.
- the memory chip 104 is a DRAM chip.
- the first set of pins 114 can be configured to connect to the DRAM chip via wiring 124 .
- the accelerator chip 102 can be configured to use DRAM cells in the DRAM chip as memory for the application-specific computations (e.g., AI computations).
- the memory chip 104 is a NVRAM chip.
- the first set of pins 114 can be configured to connect to the NVRAM chip via wiring 124 .
- the accelerator chip 102 can be configured to use NVRAM cells in the NVRAM chip as memory for the application-specific computations.
- the NVRAM chip can be or include a 3D XPoint memory chip.
- the first set of pins 114 can be configured to connect to the 3D XPoint memory chip via wiring 124 and the accelerator chip 102 can be configured to use 3D XPoint memory cells in the 3D XPoint memory chip as memory for the application-specific computations.
- the system 100 includes the accelerator chip 102 that is connected, via wiring, to the first memory chip 104 , and the first memory chip 104 can be an application-specific memory chip.
- the system 100 also includes SoC 106 that includes GPU 108 (which can be configured to perform AI tasks) and main processor 110 (which can be configured to perform non-AI tasks and delegate the AI tasks to the GPU 108 ).
- GPU 108 includes set of pins 117 configured to connect to accelerator chip 102 via wiring 126
- the accelerator chip 102 is configured to perform and accelerate AI computations of the AI tasks for the GPU 108 .
- the accelerator chip 102 can include vector processor 112 that is configured to perform numerical calculations on vectors and matrices for the GPU 108 .
- the accelerator chip 102 includes an ASIC that includes the vector processor 112 and is specifically hardwired to accelerate AI computations through the vector processor 112 .
- the accelerator chip 102 includes an FPGA that includes the vector processor 112 and is specifically hardwired to accelerate AI computations through the vector processor 112.
- the accelerator chip 102 includes a GPU that includes the vector processor 112 and is specifically hardwired to accelerate AI computations through the vector processor 112 .
- the system 100 also includes memory chip 104 , and the accelerator chip 102 can be connected, via wiring 124 , to the memory chip 104 and be configured to perform and accelerate AI computations of AI tasks.
- the memory chip 104 can be or include a DRAM chip having DRAM cells, and the DRAM cells can be configured, by the accelerator chip 102 , to store data for acceleration of AI computations.
- the memory chip 104 can be or include a NVRAM chip having NVRAM cells, and the NVRAM cells can be configured, by the accelerator chip 102 , to store data for acceleration of AI computations.
- the NVRAM chip can include 3D XPoint memory cells, and the 3D XPoint memory cells can be configured, by the accelerator chip 102 , to store data for acceleration of AI computations.
- FIGS. 2-3 illustrate example systems 200 and 300 respectively, each system including the accelerator chip 102 depicted in FIG. 1 as well as separate memory (e.g., NVRAM).
- a bus 202 connects the system 100 (including the accelerator chip 102 ) with memory 204 .
- the memory 204, which can be NVRAM in some embodiments, is separate memory from the memory of first memory chip 104 of system 100.
- memory 204 can be main memory in some embodiments.
- the SoC 106 of the system 100 is connected with the memory 204 via the bus 202 .
- the system 100 as part of system 200 includes the accelerator chip 102 , the first memory chip 104 , and the SoC 106 . These parts of system 100 are connected to the memory 204 via bus 202 .
- a memory controller 206 included in the SoC 106 controls data access of the memory 204 by the SoC 106 of system 100 .
- the memory controller 206 controls data access of the memory 204 by the GPU 108 and/or the main processor 110 .
- the memory controller 206 can control data access of all memory in the system 200 (such as data access of the first memory chip 104 and the memory 204 ).
- the memory controller 206 can be communicatively coupled to the first memory chip 104 and/or the memory 204 .
- the memory 204 is separate memory from the memory provided by the first memory chip 104 of system 100 , and it can be used as memory for the GPU 108 and the main processor 110 of the SoC 106 via the memory controller 206 and the bus 202 . Also, memory 204 can be used as memory for non-application-specific tasks or application-specific tasks (such as non-AI tasks or AI tasks) not performed by the accelerator chip 102 , for the GPU 108 and the main processor 110 . Data for such tasks can be accessed and communicated to and from memory 204 via memory controller 206 and bus 202 .
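The routing role of the memory controller 206 can be pictured with a small sketch (the class, the address split, and the sizes below are illustrative assumptions, not the patent's design): accesses from the SoC's processors are steered either to the first memory chip or to the separate memory 204 reached over the bus, based on the address.

```python
# Toy model (illustrative only): a memory controller that routes
# accesses from SoC processors to one of two memory regions, akin to
# the memory controller 206 steering traffic to the first memory chip
# or to the separate memory 204 over the bus 202.

class MemoryController:
    def __init__(self, chip_size, separate_size):
        self.first_chip = bytearray(chip_size)     # stands in for first memory chip 104
        self.separate = bytearray(separate_size)   # stands in for memory 204 (e.g., NVRAM)
        self.chip_size = chip_size

    def _route(self, addr):
        # Addresses below chip_size map to the first memory chip;
        # the rest map to the separate memory reached via the bus.
        if addr < self.chip_size:
            return self.first_chip, addr
        return self.separate, addr - self.chip_size

    def write(self, addr, value):
        mem, off = self._route(addr)
        mem[off] = value

    def read(self, addr):
        mem, off = self._route(addr)
        return mem[off]

mc = MemoryController(chip_size=16, separate_size=16)
mc.write(3, 0xAB)    # lands in the first memory chip
mc.write(20, 0xCD)   # lands in the separate memory
```

The single-controller arrangement mirrors the description above, in which one memory controller can control data access of all memory in the system.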
- memory 204 is main memory of a device, such as a device that hosts system 200 .
- memory 204 can be the main memory 808 shown in FIG. 8 .
- the bus 202 connects the system 100 (including the accelerator chip 102 ) with the memory 204 . Also, in system 300 , the bus 202 connects the accelerator chip 102 to the SoC 106 as well as the accelerator chip 102 to the memory 204 . Also shown, in system 300 , the bus 202 has replaced the second set of pins 116 of the accelerator chip as well as the wiring 126 and the set of pins 117 of the SoC 106 and GPU 108 .
- the accelerator chip 102 in system 300 , similar to system 200 , connects the first memory chip 104 and the SoC 106 of system 100 ; however, the connection is through the first set of pins 114 and the bus 202 .
- the memory 204 is separate memory from the memory of first memory chip 104 of system 100 .
- the SoC 106 of the system 100 is connected with the memory 204 via the bus 202 .
- the system 100 as part of system 300 includes the accelerator chip 102 , the first memory chip 104 , and the SoC 106 . These parts of system 100 are connected to the memory 204 via bus 202 in system 300 .
- a memory controller 206 included in the SoC 106 controls data access of the memory 204 by the SoC 106 of system 100 .
- the memory controller 206 can control data access of all memory in the system 300 (such as data access of the first memory chip 104 and the memory 204 ). And, the memory controller 206 can be connected and communicatively coupled to the first memory chip 104 and/or the memory 204 .
- the memory 204 (which can be NVRAM in some embodiments) is separate memory from the memory provided by the first memory chip 104 of system 100 , and it can be used as memory for the GPU 108 and the main processor 110 of the SoC 106 via the memory controller 206 and the bus 202 .
- the accelerator chip 102 can use the memory 204 via the bus 202 , in some embodiments and situations.
- memory 204 can be used as memory for non-application-specific tasks or application-specific tasks (such as non-AI tasks or AI tasks) not performed by the accelerator chip 102 for the GPU 108 and the main processor 110 . Data for such tasks can be accessed and communicated to and from memory 204 via memory controller 206 and/or bus 202 .
- memory 204 is main memory of a device, such as a device that hosts system 300 .
- memory 204 can be the main memory 808 shown in FIG. 9 .
- FIG. 4 illustrates an example system 400 including a first memory chip 402 connecting an accelerator chip 404 (e.g., an AI accelerator chip) and a SoC 406 , in accordance with some embodiments of the present disclosure.
- the SoC 406 includes a GPU 408 as well as main processor 110 .
- the main processor 110 can be or include a CPU in system 400 .
- the accelerator chip 404 includes a vector processor 412 .
- the memory chip 402 includes a first set of pins 414 and a second set of pins 416 .
- the first set of pins 414 is configured to connect to the accelerator chip 404 via wiring 424 .
- the second set of pins 416 is configured to connect to the SoC 406 via wiring 426 .
- the accelerator chip 404 includes a corresponding set of pins 415 that connects the first memory chip 402 to the accelerator chip via wiring 424 .
- the GPU 408 of the SoC 406 includes a corresponding set of pins 417 that connects the SoC to the first memory chip 402 via wiring 426 .
- the first memory chip 402 includes a first plurality of memory cells configured to store and provide computational input data (e.g., AI computation input data) received from the SoC 406 , via the second set of pins 416 , to be used by the accelerator chip 404 as computation input (e.g., AI computation input).
- the computation input data is accessed from the first plurality of memory cells and transmitted from the first memory chip 402 , via the first set of pins 414 , to be received and used by the accelerator chip 404 .
- the first plurality of memory cells can include DRAM cells and/or NVRAM cells. In examples having NVRAM cells, the NVRAM cells can be or include 3D XPoint memory cells.
- the first memory chip 402 also includes a second plurality of memory cells configured to store and provide computation output data (e.g., AI computation output data) received from the accelerator chip 404 , via the first set of pins 414 , to be retrieved by the SoC 406 or reused by the accelerator chip 404 as computation input (e.g., AI computation input).
- the computation output data can be accessed from the second plurality of memory cells and transmitted from the first memory chip 402 , via the first set of pins 414 , to be received and used by the accelerator chip 404 .
- the computation output data can be accessed from the second plurality of memory cells and transmitted from the SoC 406 or the GPU 408 in the SoC, via the second set of pins 416 , to be received and used by the SoC or the GPU in the SoC.
- the second plurality of memory cells can include DRAM cells and/or NVRAM cells.
- the NVRAM cells can be or include 3D XPoint memory cells.
- the first memory chip 402 also includes a third plurality of memory cells configured to store non-AI data related to non-AI tasks received from the SoC 406 , via the set of pins 416 , to be retrieved by the SoC 406 for non-AI tasks.
- the non-AI data can be accessed from the third plurality of memory cells and transmitted from the first memory chip 402 , via the second set of pins 416 , to be received and used by the SoC 406 , the GPU 408 in the SoC, or the main processor 110 in the SoC.
- the third plurality of memory cells can include DRAM cells and/or NVRAM cells. In examples having NVRAM cells, the NVRAM cells can be or include 3D XPoint memory cells.
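The data flow through the three pluralities of memory cells can be sketched as a toy model (the class, function names, and keys below are illustrative assumptions, not the patent's interface): the SoC deposits AI computation input into the first region, the accelerator consumes it and writes output into the second region, and the third region holds non-AI data for the SoC.

```python
# Toy model (illustrative only) of the first memory chip 402 with three
# regions of cells: AI input written by the SoC, AI output written by
# the accelerator, and non-AI data for the SoC's own tasks.

class FirstMemoryChip:
    def __init__(self):
        self.ai_input = {}    # first plurality of memory cells
        self.ai_output = {}   # second plurality of memory cells
        self.non_ai = {}      # third plurality of memory cells

# SoC side: stores AI computation input (via the second set of pins).
def soc_store_input(chip, key, data):
    chip.ai_input[key] = data

# Accelerator side: reads input (via the first set of pins), computes,
# and stores output back into the second plurality of cells.
def accelerator_compute(chip, key, fn):
    chip.ai_output[key] = fn(chip.ai_input[key])

# SoC side: retrieves the AI computation output.
def soc_fetch_output(chip, key):
    return chip.ai_output[key]

chip = FirstMemoryChip()
soc_store_input(chip, "vec0", [1, 2, 3])
accelerator_compute(chip, "vec0", lambda v: [2 * x for x in v])
result = soc_fetch_output(chip, "vec0")   # [2, 4, 6]
```

The output left in the second region could equally be read back by the accelerator and reused as computation input, matching the reuse path described above.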
- the accelerator chip 404 is configured to perform and accelerate application-specific computations (e.g., AI computations) for the SoC 406 .
- the accelerator chip 404 is also configured to use the first memory chip 402 as memory for the application-specific computations.
- the acceleration of application-specific computations can be performed by the vector processor 412 .
- the vector processor 412 in the accelerator chip 404 can be configured to perform numerical calculations on vectors and matrices for the SoC 406 .
- the vector processor 412 can be configured to perform numerical calculations on vectors and matrices for the SoC 406 using the first and second pluralities of memory cells as memory.
- the accelerator chip 404 can include an ASIC that includes the vector processor 412 and is specifically hardwired to accelerate application-specific computations (e.g., AI computations) through the vector processor 412 .
- the accelerator chip 404 can include an FPGA that includes the vector processor 412 and is specifically hardwired to accelerate application-specific computations through the vector processor 412 .
- the accelerator chip 404 can include a GPU that includes the vector processor 412 and is specifically hardwired to accelerate application-specific computations through the vector processor 412 . In such embodiments, the GPU can be specifically modified to accelerate application-specific computations through the vector processor 412 .
- the SoC 406 includes a GPU 408 .
- the accelerator chip 404 can be configured to perform and accelerate application-specific computations for the GPU 408 .
- the vector processor 412 can be configured to perform numerical calculations on vectors and matrices for the GPU 408 .
- the GPU 408 can be configured to perform application-specific tasks and computations.
- the SoC 406 includes a main processor 110 that is configured to perform non-AI tasks and computations.
- the system 400 includes memory chip 402 , accelerator chip 404 , and SoC 406 , and the memory chip 402 includes at least the first set of pins 414 configured to connect to the accelerator chip 404 via wiring 424 and the second set of pins 416 configured to connect to the SoC 406 via wiring 426 .
- the memory chip 402 can include the first plurality of memory cells configured to store and provide AI computation input data received from the SoC 406 , via the set of pins 416 , to be used by the accelerator chip 404 as AI computation input, as well as the second plurality of memory cells configured to store and provide AI computation output data received from the accelerator chip 404 , via the other set of pins 414 , to be retrieved by the SoC 406 or reused by the accelerator chip 404 as AI computation input.
- the memory chip 402 can include the third plurality of cells used for memory for non-AI computations.
- the SoC 406 includes GPU 408 .
- the accelerator chip 404 can be configured to perform and accelerate AI computations for the GPU 408 using the first and second pluralities of memory cells as memory.
- the accelerator chip 404 includes a vector processor 412 that can be configured to perform numerical calculations on vectors and matrices for the SoC 406 using the first and second pluralities of memory cells as memory.
- the first plurality of memory cells in the memory chip 402 can be configured to store and provide AI computation input data received from the SoC 406 , via the set of pins 416 , to be used by an accelerator chip 404 (e.g., an AI accelerator chip) as AI computation input.
- the second plurality of memory cells in the memory chip 402 can be configured to store and provide AI computation output data received from the accelerator chip 404 , via the other set of pins 414 , to be retrieved by the SoC 406 or reused by the accelerator chip 404 as AI computation input.
- the third plurality of memory cells in the memory chip 402 can be configured to store non-AI data related to non-AI tasks received from the SoC 406 , via the set of pins 416 , to be retrieved by the SoC 406 for non-AI tasks.
- the first, second, and third pluralities of memory cells in the memory chip 402 each can include DRAM cells and/or NVRAM cells and the NVRAM cells can include 3D XPoint memory cells.
- FIGS. 5-7 illustrate example systems 500 , 600 , and 700 respectively, each system including the memory chip 402 depicted in FIG. 4 as well as separate memory (e.g., NVRAM).
- bus 202 connects the system 400 (including the memory chip 402 and accelerator chip 404 ) with memory 204 .
- the memory 204 (e.g., NVRAM) is separate memory from the memory of the first memory chip 402 of system 400 .
- memory 204 can be main memory.
- the SoC 406 of the system 400 is connected with the memory 204 via the bus 202 .
- the system 400 as part of system 500 includes the first memory chip 402 , the accelerator chip 404 , and the SoC 406 . These parts of system 400 are connected to the memory 204 via bus 202 .
- a memory controller 206 included in the SoC 406 controls data access of the memory 204 by the SoC 406 of system 400 .
- the memory controller 206 controls data access of the memory 204 by the GPU 408 and/or the main processor 110 .
- the memory controller 206 can control data access of all memory in the system 500 (such as data access of the first memory chip 402 and the memory 204 ).
- the memory controller 206 can be communicatively coupled to the first memory chip 402 and/or the memory 204 .
- the memory 204 is separate memory from the memory provided by the first memory chip 402 of system 400 , and it can be used as memory for the GPU 408 and the main processor 110 of the SoC 406 via the memory controller 206 and the bus 202 . Also, memory 204 can be used as memory for non-application-specific tasks or application-specific tasks (such as non-AI tasks or AI tasks) not performed by the accelerator chip 404 , for the GPU 408 and the main processor 110 . Data for such tasks can be accessed and communicated to and from memory 204 via memory controller 206 and bus 202 .
- memory 204 is main memory of a device, such as a device that hosts system 500 .
- memory 204 can be the main memory 808 shown in FIG. 8 .
- bus 202 connects the system 400 (including the memory chip 402 and accelerator chip 404 ) with memory 204 .
- the first memory chip 402 includes a single set of pins 602 that connects the first memory chip 402 to both the accelerator chip 404 and the SoC 406 directly via wiring 614 and 616 respectively.
- the accelerator chip 404 includes a single set of pins 604 that connects the accelerator chip 404 to the first memory chip 402 directly via wiring 614 .
- the GPU 408 of the SoC 406 includes a set of pins 606 that connects the SoC 406 to the first memory chip 402 directly via wiring 616 .
- the SoC 406 of the system 400 is connected with the memory 204 via the bus 202 .
- the system 400 as part of system 600 includes the first memory chip 402 , the accelerator chip 404 , and the SoC 406 .
- These parts of system 400 are connected to the memory 204 via bus 202 (e.g., the accelerator chip 404 and the first memory chip 402 having indirect connections to the memory 204 via the SoC 406 and the bus 202 , and the SoC 406 having a direct connection to the memory 204 via the bus 202 ).
- a memory controller 206 included in the SoC 406 controls data access of the memory 204 by the SoC 406 of system 400 .
- the memory controller 206 controls data access of the memory 204 by the GPU 408 and/or the main processor 110 .
- the memory controller 206 can control data access of all memory in the system 600 (such as data access of the first memory chip 402 and the memory 204 ).
- the memory controller 206 can be communicatively coupled to the first memory chip 402 and/or the memory 204 .
- the memory 204 is separate memory (e.g., NVRAM) from the memory provided by the first memory chip 402 of system 400 , and it can be used as memory for the GPU 408 and the main processor 110 of the SoC 406 via the memory controller 206 and the bus 202 . Also, memory 204 can be used as memory for non-application-specific tasks or application-specific tasks (such as non-AI tasks or AI tasks) not performed by the accelerator chip 404 , for the GPU 408 and the main processor 110 . Data for such tasks can be accessed and communicated to and from memory 204 via memory controller 206 and bus 202 .
- memory 204 is main memory of a device, such as a device that hosts system 600 .
- memory 204 can be the main memory 808 shown in FIG. 8 .
- bus 202 connects the system 400 (including the memory chip 402 and accelerator chip 404 ) with memory 204 . Also, in system 700 , the bus 202 connects the first memory chip 402 to the SoC 406 as well as the first memory chip 402 to the memory 204 . Also shown, in system 700 , the bus 202 has replaced the second set of pins 416 of the first memory chip 402 as well as the wiring 426 and the set of pins 417 of the SoC 406 and GPU 408 .
- the first memory chip 402 in system 700 , similar to systems 500 and 600 , connects the accelerator chip 404 and the SoC 406 of system 400 ; however, the connection is through the first set of pins 414 and the bus 202 .
- the memory 204 is separate memory from the memory of first memory chip 402 of system 400 .
- the SoC 406 of the system 400 is connected with the memory 204 via the bus 202 .
- the system 400 as part of system 700 includes the first memory chip 402 , the accelerator chip 404 , and the SoC 406 . These parts of system 400 are connected to the memory 204 via bus 202 in system 700 .
- a memory controller 206 included in the SoC 406 controls data access of the memory 204 by the SoC 406 of system 400 .
- the memory controller 206 can control data access of all memory in the system 700 (such as data access of the first memory chip 402 and the memory 204 ). And, the memory controller 206 can be communicatively coupled to the first memory chip 402 and/or the memory 204 .
- the memory 204 is separate memory (e.g., NVRAM) from the memory provided by the first memory chip 402 of system 400 , and it can be used as memory for the GPU 408 and the main processor 110 of the SoC 406 via the memory controller 206 and the bus 202 .
- the accelerator chip 404 can use the memory 204 in some embodiments and situations via the first memory chip 402 and the bus 202 .
- the first memory chip 402 can include a cache for the accelerator chip 404 and the memory 204 .
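One way to picture the first memory chip acting as a cache between the accelerator chip 404 and the memory 204 is the following sketch (a minimal write-through policy; the policy and class are illustrative assumptions, as the text does not specify a caching scheme):

```python
# Toy write-through cache (illustrative only): the first memory chip
# holds recently used data for the accelerator, falling back to the
# separate memory 204 on a miss.

class ChipCache:
    def __init__(self, backing):
        self.backing = backing   # stands in for memory 204
        self.cache = {}          # stands in for first memory chip 402
        self.hits = 0
        self.misses = 0

    def read(self, addr):
        if addr in self.cache:
            self.hits += 1
        else:
            self.misses += 1
            self.cache[addr] = self.backing[addr]   # fill from memory 204
        return self.cache[addr]

    def write(self, addr, value):
        self.cache[addr] = value
        self.backing[addr] = value   # write-through to memory 204

backing = {0: 10, 1: 20, 2: 30}
cc = ChipCache(backing)
cc.read(1)   # miss: fetched from the backing memory over the bus
cc.read(1)   # hit: served from the chip without a bus transfer
```

The point of the arrangement is the second read: once data is resident in the first memory chip, the accelerator avoids a round trip over the bus 202.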
- memory 204 can be used as memory for non-application-specific tasks or application-specific tasks (such as non-AI tasks or AI tasks) not performed by the accelerator chip 404 for the GPU 408 and the main processor 110 . Data for such tasks can be accessed and communicated to and from memory 204 via memory controller 206 and/or bus 202 .
- memory 204 is main memory of a device, such as a device that hosts system 700 .
- memory 204 can be the main memory 808 shown in FIG. 9 .
- Embodiments of accelerator chips disclosed herein can be microprocessor chips or SoCs or the like.
- the embodiments of the accelerator chips can be designed for hardware acceleration of AI applications, including artificial neural networks, machine vision, and machine learning.
- the accelerator chip can include a vector processor to perform numerical calculations on vectors and matrices (e.g., see vector processors 112 and 412 shown in FIGS. 1-3 and 4-7 respectively, which can be configured to perform the numerical calculations on vectors and matrices).
- Embodiments of accelerator chips disclosed herein can be or include an ASIC or FPGA.
- the accelerator chip is specifically hardwired for acceleration of application-specific computations (such as AI computations).
- the accelerator chip can be a modified FPGA or GPU modified for acceleration of application-specific computations (such as AI computations) beyond an unmodified FPGA or GPU.
- the accelerator chip can be an unmodified FPGA or GPU.
- An ASIC described herein can include an IC customized for a particular use or application such as acceleration of application-specific computations (such as AI computations). This is different from general-purpose use which is usually implemented by a CPU or another type of general-purpose processor such as a GPU which is generally for processing graphics.
- An FPGA described herein can be included in an IC designed and/or configured after manufacturing of the IC and FPGA; thus, the IC and FPGA are field-programmable.
- An FPGA configuration can be specified using a hardware description language (HDL).
- ASIC configuration can be specified using a HDL.
- a GPU described herein can include an IC configured to rapidly manipulate and alter memory to accelerate the generation and updating of images in a frame buffer to be outputted to a display device.
- systems described herein can include a display device connected to the GPU and a frame buffer connected to the display device and GPU.
- GPUs described herein can be a part of an embedded system, mobile device, personal computer, workstation, or game console, or any device connected to and using a display device.
- Embodiments of microprocessor chips described herein are each one or more integrated circuits that incorporate at least the functionality of a central processing unit.
- Each microprocessor chip can be multipurpose and include at least a clock and registers; the chip accepts binary data as input and processes the data using the registers and clock according to instructions stored in memory connected to the microprocessor chip. Upon processing the data, the microprocessor chip can provide the results as output, and the output can be provided to the memory connected to the microprocessor chip.
- Embodiments of SoCs described herein are each one or more integrated circuits that integrate components of a computer or other electronic system.
- the SoC is a single IC.
- the SoC can include separated and connected integrated circuits.
- the SoC can include its own CPU, memory, input/output ports, secondary storage, or any combination thereof.
- Such one or more parts can be on a single substrate or microprocessor chip in a SoC described herein.
- the SoC is smaller than a quarter, a nickel, or a dime.
- Some embodiments of the SoCs can be a part of a mobile device (such as a smartphone or tablet computer), an embedded system, or a device in the Internet of Things.
- SoCs are different from systems having a motherboard-based architecture that separates components based on function and connects them through a central interfacing circuit board.
- Embodiments of memory chips described herein that are connected directly to an accelerator chip are also referred to herein as application-specific memory chips for the sake of clarity when describing multiple memory chips of the overall system.
- the application-specific memory chips described herein are not necessarily hardwired specifically for application-specific computations (such as AI computations).
- Each of the application-specific memory chips can be a DRAM chip or a NVRAM chip, or a memory device with similar functionality to either a DRAM chip or a NVRAM chip.
- each of the application-specific memory chips can be connected directly to an accelerator chip (e.g., an AI accelerator chip), e.g., see accelerator chip 102 shown in FIGS. 1-3 and accelerator chip 404 shown in FIGS. 4-7 , and can have memory units or cells specifically for the acceleration of application-specific computations (such as AI computations) by the accelerator chip after the application-specific memory chip is configured by the accelerator chip or a separate SoC or processor (e.g., see SoCs 106 and 406 shown in FIGS. 1-3 and 4-7 respectively).
- DRAM chips described herein can include random access memory that stores each bit of data in a memory cell or unit having a capacitor and a transistor (such as a MOSFET).
- DRAM chips described herein can take the form of an IC chip and include billions of DRAM memory units or cells. In each unit or cell, the capacitor can either be charged or discharged. This can provide two states used to represent two values of a bit. The electric charge on the capacitor can slowly leak from the capacitor, so an external memory refresh circuit which periodically rewrites the data in the capacitor is needed to maintain state of the capacitor and the memory unit.
- DRAM is also volatile memory and not non-volatile memory, such as flash memory or NVRAM, in that it loses its data quickly when power is removed.
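The refresh requirement described above can be illustrated with a toy charge model (the leak rate, threshold, and tick counts are arbitrary assumptions for illustration): without periodic rewrites, the capacitor's charge decays below the sensing threshold and the stored bit is lost.

```python
# Toy model (illustrative only) of a DRAM cell: the capacitor charge
# leaks each tick, and a refresh circuit must rewrite the cell before
# the charge falls below the sensing threshold.

LEAK_PER_TICK = 0.1   # arbitrary fraction of charge lost per tick
THRESHOLD = 0.5       # charge above this reads as 1, below as 0

def tick(charge):
    return charge * (1.0 - LEAK_PER_TICK)

def read_bit(charge):
    return 1 if charge > THRESHOLD else 0

def run(ticks, refresh_every=None):
    charge = 1.0   # cell written with a 1 (fully charged)
    for t in range(1, ticks + 1):
        charge = tick(charge)
        if refresh_every and t % refresh_every == 0 and read_bit(charge):
            charge = 1.0   # refresh circuit rewrites the data
    return read_bit(charge)

without_refresh = run(ticks=20)                # charge decays, bit is lost
with_refresh = run(ticks=20, refresh_every=4)  # periodic rewrite preserves it
```

Real DRAM refresh is a timed rewrite of whole rows rather than per-cell, but the decay-then-rewrite cycle is the same idea.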
- a benefit of a DRAM chip is that it can be used in digital electronics requiring low-cost and high-capacity computer memory. DRAM is also beneficial to use as main memory or memory for a GPU specifically.
- NVRAM chips described herein can include random-access memory that is non-volatile, which is a main differentiating feature from DRAM.
- An example of NVRAM units or cells that can be used in embodiments described herein can include 3D XPoint units or cells. In a 3D XPoint unit or cell, bit storage is based on a change of bulk resistance, in conjunction with a stackable cross-gridded data access array.
- Embodiments of SoCs described herein can include a main processor (such as a CPU or a main processor including a CPU).
- the SoC can also include a GPU (e.g., see GPU 108 shown in FIGS. 1-3 and GPU 408 shown in FIGS. 4-7 ).
- the main processor can run instructions for non-application-specific tasks and computations (such as non-AI tasks and computations).
- each one of the embodiments of SoCs described herein can include its own bus for connecting components of the SoC to each other (such as connecting the main processor and the GPU).
- a bus of a SoC can be configured to connect the SoC to a bus external to the SoC so that the components of the SoC can couple with chips and devices external to the SoC such as a separate memory or memory chip (e.g., see memory 204 depicted in FIGS. 2-3 and 5-7 as well as main memory 808 depicted in FIGS. 8-9 ).
- the non-application-specific computations and tasks (e.g., non-AI computations and tasks) of the GPU or application-specific computations and tasks (e.g., AI computations and tasks) not using the accelerator chip, which may not be conventional tasks performed by the main processor, can use separate memory such as a separate memory chip (which can be application-specific memory) and the memory can be implemented by DRAM, NVRAM, flash memory, or any combination thereof.
- a separate memory or memory chip can be connected to the SoC and the main processor (e.g., CPU) via a bus external to the SoC (e.g., see memory 204 depicted in FIGS. 2-3 and 5-7 as well as main memory 808 depicted in FIGS. 8-9 ; and see bus 202 depicted in FIGS. 2-3 and 5-7 as well as buses 804 depicted in FIGS. 8-9 ).
- the separate memory or memory chip can have memory units specifically for the main processor.
- the separate memory or memory chip can be connected to the SoC and the GPU via the bus external to the SoC.
- the separate memory or memory chip can have memory units or cells for the main processor or the GPU.
- an application-specific memory or memory chip described herein (e.g., see first memory chip 104 shown in FIGS. 1-3 or first memory chip 402 shown in FIGS. 4-7 ) or a separate memory or memory chip described herein (e.g., see memory 204 depicted in FIGS. 2-3 and 5-7 as well as main memory 808 depicted in FIGS. 8-9 ) can be implemented by a group of memory chips, such as a string of memory chips (e.g., see the strings of memory chips shown in FIGS. 10 and 11 ).
- the separate memory or memory chip can be substituted by a string of memory chips that includes at least a NVRAM chip and a flash memory chip downstream of the NVRAM chip.
- the separate memory chip can be substituted by at least two memory chips where one of the chips is for the main processor (e.g., CPU) and the other chip is for the GPU for use as memory for non-AI computations and/or tasks.
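The string arrangement, with an NVRAM chip in front and a flash memory chip downstream, can be pictured as a simple lookup chain (a hypothetical sketch; the function and data below are illustrative assumptions, and the actual behavior of the strings in FIGS. 10-11 is described elsewhere):

```python
# Toy model (illustrative only) of a string of memory chips: an NVRAM
# chip upstream with a flash memory chip downstream; a read checks the
# faster chip first and falls through to the next chip in the string.

def string_read(addr, chips):
    """chips is an ordered list of dicts, upstream (NVRAM) chip first."""
    for chip in chips:
        if addr in chip:
            return chip[addr]
    raise KeyError(addr)

nvram = {0x10: "weights"}                      # upstream NVRAM chip
flash = {0x10: "stale", 0x20: "activations"}   # downstream flash chip

hit_upstream = string_read(0x10, [nvram, flash])     # served by NVRAM
hit_downstream = string_read(0x20, [nvram, flash])   # falls through to flash
```

The ordering matters: data present in the upstream NVRAM chip shadows any copy held downstream, which is what makes a faster chip in front of a larger, slower one useful.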
- Embodiments of memory chips described herein can be part of main memory and/or can be computer hardware that stores information for immediate use in a computer or for immediate use by any one of the processors described herein (e.g., any SoC or accelerator chip described herein).
- the memory chips described herein can operate at a higher speed than computer storage. Computer storage provides slower speeds for accessing information, but also can provide higher capacities and better data reliability.
- the memory chips described herein can include RAM, which is a type of memory, that can have high operation speeds.
- the memory can be made up of addressable semiconductor memory units or cells, and its units or cells can be at least partially implemented by MOSFETs.
- At least some embodiments disclosed herein relate to an accelerator chip (e.g., an AI accelerator chip) having a vector processor (e.g., see vector processors 112 and 412 shown in FIGS. 1-3 and 4-7 respectively). And, at least some embodiments disclosed herein relate to using memory hierarchy and a string of memory chips to form a memory (e.g., see FIGS. 10 and 11 ).
- Embodiments of vector processors described herein are each an IC that can implement an instruction set containing instructions that operate on one-dimensional arrays of data called vectors or multidimensional arrays of data called matrices.
- Vector processors are different from scalar processors, whose instructions operate on single data items.
- a vector processor can go beyond merely pipelining instructions and pipeline the data itself. Pipelining can include a process where instructions, or in the case of a vector processor, data itself, passes through multiple sub-units in turn.
- the vector processor is fed instructions that instruct an arithmetic operation on a vector or matrix of numbers simultaneously.
- the vector processor reads a single instruction from memory, and it is simply implied in the definition of the instruction itself that the instruction will operate again on another item of data, at an address one increment larger than the last. This allows for significant savings in decoding time.
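The contrast between scalar and vector execution described above can be sketched in a few lines (plain Python standing in for hardware instructions; purely illustrative): the scalar path performs one decode step per element, while the vector path decodes a single operation whose per-element stepping is implied.

```python
# Illustrative contrast (not hardware code): a scalar processor decodes
# an instruction per element, while a vector processor decodes one
# instruction that implicitly steps through consecutive addresses.

def scalar_add(a, b):
    out = []
    for i in range(len(a)):       # one "instruction decode" per element
        out.append(a[i] + b[i])
    return out

def vector_add(a, b):
    # One decoded instruction operating on entire vectors; the stepping
    # to the next element is implied by the instruction itself.
    return [x + y for x, y in zip(a, b)]

a, b = [1, 2, 3, 4], [10, 20, 30, 40]
assert scalar_add(a, b) == vector_add(a, b) == [11, 22, 33, 44]
```

Both paths produce the same result; the savings the text describes come from decoding one instruction instead of one per element, plus pipelining the data through the arithmetic units.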
- FIG. 8 illustrates an example arrangement of parts of an example computing device 800 , in accordance with some embodiments of the present disclosure.
- the example arrangement of parts of the computing device 800 can include system 100 shown in FIG. 1 , system 200 shown in FIG. 2 , system 400 shown in FIG. 4 , system 500 shown in FIG. 5 , and system 600 shown in FIG. 6 .
- the computing device 800 includes application-specific components (e.g., see application-specific components 807 in FIG. 8 ).
- wiring directly connects components of the application-specific components to each other (e.g., see wiring 124 and 424 as well as wiring 614 shown in FIGS. 1-2 and 4-6 respectively). And, in computing device 800 , wiring directly connects the application-specific components to the SoC (e.g., see wiring 817 that directly connects the application-specific components to SoC 806 ).
- the wiring that directly connects the application-specific components to the SoC can include wiring 126 as shown in FIGS. 1 and 2 or wiring 426 as shown in FIGS. 4 and 5 . Also, the wiring that directly connects the application-specific components to the SoC can include wiring 616 as shown in FIG. 6 .
- the computing device 800 can be communicatively coupled to other computing devices via the computer network 802 as shown in FIG. 8 .
- the computing device 800 includes at least buses 804 (which can be one or more buses—such as a combination of a memory bus and a peripheral bus), a SoC 806 (which can be or include SoC 106 or 406 ), application-specific components 807 (which can be accelerator chip 102 and first memory chip 104 or first memory chip 402 and accelerator chip 404 ) and a main memory 808 (which can be or include memory 204 ), as well as a network interface 810 , and a data storage system 812 .
- the buses 804 communicatively couple the SoC 806 , the main memory 808 , the network interface 810 , and the data storage system 812 . And, the buses 804 can include bus 202 and/or a point-to-point memory connection such as wiring 126 , 426 , or 616 .
- the computing device 800 includes a computer system that includes at least one or more processors in the SoC 806 , main memory 808 (e.g., read-only memory (ROM), flash memory, DRAM such as synchronous DRAM (SDRAM) or Rambus DRAM (RDRAM), NVRAM, SRAM, etc.), and data storage system 812 , which communicate with each other via buses 804 (which can include one or more buses and wirings).
- the main memory 808 (which can be, include, or be included in the memory 204 ) can include the memory string 1000 depicted in FIG. 10 . Also, the main memory 808 can include the memory string 1100 depicted in FIG. 11 . In some embodiments, the data storage system 812 can include the memory string 1000 or the memory string 1100 .
- SoC 806 can include one or more general-purpose processing devices such as a microprocessor, a CPU, or the like. Also, the SoC 806 can include one or more special-purpose processing devices such as a GPU, an ASIC, an FPGA, a digital signal processor (DSP), a network processor, a processor in memory (PIM), or the like.
- the SoC 806 can include one or more processors such as a complex instruction set computing (CISC) microprocessor, a reduced instruction set computing (RISC) microprocessor, a very long instruction word (VLIW) microprocessor, a processor implementing other instruction sets, or processors implementing a combination of instruction sets.
- the processors of the SoC 806 can be configured to execute instructions for performing the operations and steps discussed herein. The SoC 806 can further include a network interface device, such as network interface 810 , to communicate over one or more communications networks, such as network 802 .
- the data storage system 812 can include a machine-readable storage medium (also known as a computer-readable medium) on which is stored one or more sets of instructions or software embodying any one or more of the methodologies or functions described herein.
- the instructions can also reside, completely or at least partially, within the main memory 808 and/or within one or more of the processors of the SoC 806 during execution thereof by the computer system, the main memory 808 and the one or more processors of the SoC 806 also constituting machine-readable storage media.
- machine-readable storage medium shall also be taken to include any medium that is capable of storing or encoding a set of instructions for execution by the machine and that causes the machine to perform any one or more of the methodologies of the present disclosure.
- machine-readable storage medium shall accordingly be taken to include, but not be limited to, solid-state memories, optical media, and magnetic media.
- FIG. 9 illustrates another example arrangement of parts of an example computing device 900 , in accordance with some embodiments of the present disclosure.
- the example arrangement of parts of the computing device 900 can include system 300 shown in FIG. 3 as well as system 700 shown in FIG. 7 .
- wiring directly connects components of the application-specific components to each other (e.g., see wiring 124 and 424 shown in FIGS. 3 and 7 respectively). However, in computing device 900 , wiring does not directly connect the application-specific components to the SoC. Instead, in computing device 900 , one or more buses connect the application-specific components to the SoC (e.g., see buses 804 as configured and shown in FIG. 9 as well as bus 202 as configured and shown in FIGS. 3 and 7 ).
- computing device 900 can be communicatively coupled to other computing devices via the computer network 802 as shown in FIG. 9 .
- computing device 900 includes at least buses 804 (which can be one or more buses—such as a combination of a memory bus and a peripheral bus), SoC 806 (which can be or include SoC 106 or 406 ), application-specific components 807 (which can be accelerator chip 102 and first memory chip 104 or first memory chip 402 and accelerator chip 404 ) and main memory 808 (which can be or include memory 204 ), as well as network interface 810 , and data storage system 812 .
- the buses 804 communicatively couple the SoC 806 , the main memory 808 , the network interface 810 , and the data storage system 812 .
- the buses 804 can include bus 202 and/or a point-to-point memory connection such as wiring 126 , 426 , or 616 .
- At least some embodiments disclosed herein relate to using memory hierarchy and a string of memory chips to form a memory.
- FIGS. 10 and 11 illustrate example strings of memory chips 1000 and 1100 respectively, which can be used in the separate memory depicted in FIGS. 2-3 and 5-7 (i.e., memory 204 ).
- the memory chip string 1000 includes a first memory chip 1002 and a second memory chip 1004 .
- the first memory chip 1002 is directly wired to the second memory chip 1004 (e.g., see wiring 1022 ) and is configured to interact directly with the second memory chip.
- Each chip in the memory chip string 1000 can include one or more sets of pins for connecting to an upstream chip and/or downstream chip in the string (e.g., see sets of pins 1012 and 1014 ).
- each chip in the memory chip string 1000 can include a single IC enclosed within an IC package.
- set of pins 1012 is part of first memory chip 1002 and connects first memory chip 1002 to second memory chip 1004 via wiring 1022 and set of pins 1014 that is part of second memory chip 1004 .
- the wiring 1022 connects the two sets of pins 1012 and 1014 .
- the second memory chip 1004 can have a lowest memory bandwidth of the chips in the string 1000 . In such embodiments and others, the first memory chip 1002 can have a highest memory bandwidth of the chips in the string 1000 . In some embodiments, the first memory chip 1002 is or includes a DRAM chip. In some embodiments, the first memory chip 1002 is or includes a NVRAM chip. In some embodiments, the second memory chip 1004 is or includes a DRAM chip. In some embodiments, the second memory chip 1004 is or includes a NVRAM chip. And, in some embodiments, the second memory chip 1004 is or includes a flash memory chip.
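The fall-through behavior of a two-chip string like string 1000 can be sketched in Python. This is an illustrative toy model only: the class, the capacities, and the bandwidth figures below are invented for the example; the point is that the SoC-facing first chip services the addresses it can hold and forwards the rest over the chip-to-chip wiring (e.g., wiring 1022) to the downstream chip.

```python
# Toy model of a two-chip memory string: a fast first chip backed by a
# slower, larger downstream chip. All names and sizes are illustrative.
class MemoryChip:
    def __init__(self, name, capacity, bandwidth_gbs, downstream=None):
        self.name = name
        self.capacity = capacity          # addressable words in this chip
        self.bandwidth_gbs = bandwidth_gbs
        self.downstream = downstream      # next chip in the string, if any
        self.cells = {}

    def write(self, addr, value):
        if addr < self.capacity:
            self.cells[addr] = value
        elif self.downstream is not None:
            # forward over the pin-to-pin wiring (e.g., wiring 1022)
            self.downstream.write(addr - self.capacity, value)
        else:
            raise IndexError("address beyond end of string")

    def read(self, addr):
        if addr < self.capacity:
            return self.cells.get(addr, 0)
        if self.downstream is not None:
            return self.downstream.read(addr - self.capacity)
        raise IndexError("address beyond end of string")

# First chip has the highest bandwidth, second the lowest, as in the text.
chip2 = MemoryChip("NVRAM", capacity=1024, bandwidth_gbs=8)
chip1 = MemoryChip("DRAM", capacity=256, bandwidth_gbs=25, downstream=chip2)

chip1.write(10, "hot data")    # lands in the fast first chip
chip1.write(300, "cold data")  # falls through to the downstream chip
```

A reader of the string (e.g., the SoC) only ever addresses the first chip; the string itself routes accesses downstream.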
- the memory chip string 1100 includes a first memory chip 1102 , a second memory chip 1104 , and a third memory chip 1106 .
- the first memory chip 1102 is directly wired to the second memory chip 1104 (e.g., see wiring 1122 ) and is configured to interact directly with the second memory chip.
- the second memory chip 1104 is directly wired to the third memory chip 1106 (e.g., see wiring 1124 ) and is configured to interact directly with the third memory chip.
- the first and third memory chips 1102 and 1106 interact with each other indirectly via the second memory chip 1104 .
- Each chip in the memory chip string 1100 can include one or more sets of pins for connecting to an upstream chip and/or downstream chip in the string (e.g., see sets of pins 1112 , 1114 , 1116 , and 1118 ).
- each chip in the memory chip string 1100 can include a single IC enclosed within an IC package.
- set of pins 1112 is part of first memory chip 1102 and connects first memory chip 1102 to second memory chip 1104 via wiring 1122 and set of pins 1114 that is part of second memory chip 1104 .
- the wiring 1122 connects the two sets of pins 1112 and 1114 .
- set of pins 1116 is part of second memory chip 1104 and connects second memory chip 1104 to third memory chip 1106 via wiring 1124 and set of pins 1118 that is part of third memory chip 1106 .
- the wiring 1124 connects the two sets of pins 1116 and 1118 .
- the third memory chip 1106 can have a lowest memory bandwidth of the chips in the string 1100 .
- the first memory chip 1102 can have a highest memory bandwidth of the chips in the string 1100 .
- the second memory chip 1104 can have the next highest memory bandwidth of the chips in the string 1100 .
- the first memory chip 1102 is or includes a DRAM chip.
- the first memory chip 1102 is or includes a NVRAM chip.
- the second memory chip 1104 is or includes a DRAM chip.
- the second memory chip 1104 is or includes a NVRAM chip.
- the second memory chip 1104 is or includes a flash memory chip.
- the third memory chip 1106 is or includes a NVRAM chip. And, in some embodiments, the third memory chip 1106 is or includes a flash memory chip.
- a DRAM chip can include a logic circuit for command and address decoding as well as arrays of memory units of DRAM.
- a DRAM chip described herein can include a cache or buffer memory for incoming and/or outgoing data.
- the memory units that implement the cache or buffer memory can be different from the DRAM units on the chip hosting the cache or buffer memory.
- the memory units that implement the cache or buffer memory on the DRAM chip can be memory units of SRAM.
- a NVRAM chip can include a logic circuit for command and address decoding as well as arrays of memory units of NVRAM such as units of 3D XPoint memory.
- a NVRAM chip described herein can include a cache or buffer memory for incoming and/or outgoing data.
- the memory units that implement the cache or buffer memory can be different from the NVRAM units on the chip hosting the cache or buffer memory.
- the memory units that implement the cache or buffer memory on the NVRAM chip can be memory units of SRAM.
- NVRAM chips can include a cross-point array of non-volatile memory cells.
- a cross-point array of non-volatile memory can perform bit storage based on a change of bulk resistance, in conjunction with a stackable cross-gridded data access array.
- cross-point non-volatile memory can perform a write in-place operation, where a non-volatile memory cell can be programmed without the non-volatile memory cell being previously erased.
- NVRAM chips can be or include cross point storage and memory devices (e.g., 3D XPoint memory).
- a cross point memory device uses transistor-less memory elements, each of which has a memory cell and a selector that are stacked together as a column. Memory element columns are connected via two perpendicular layers of wires, where one layer is above the memory element columns and the other layer is below the memory element columns. Each memory element can be individually selected at a cross point of one wire on each of the two layers.
- Cross point memory devices are fast and non-volatile and can be used as a unified memory pool for processing and storage.
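The selection scheme described above can be illustrated with a small model. All names and resistance values below are invented for the sketch; the point is only that a (top-wire, bottom-wire) pair addresses exactly one element, whose bulk resistance encodes the bit, and that a cell can be overwritten in place without a prior erase.

```python
# Illustrative model of cross-point selection: one element sits at each
# crossing of a top-layer wire and a bottom-layer wire.
LOW_R, HIGH_R = 1_000, 1_000_000  # low resistance encodes 1, high encodes 0

class CrossPointArray:
    def __init__(self, top_wires, bottom_wires):
        self.r = [[HIGH_R] * bottom_wires for _ in range(top_wires)]

    def write(self, top, bottom, bit):
        # write in place: no erase of the cell before programming it
        self.r[top][bottom] = LOW_R if bit else HIGH_R

    def read(self, top, bottom):
        return 1 if self.r[top][bottom] == LOW_R else 0

xp = CrossPointArray(top_wires=4, bottom_wires=4)
xp.write(2, 3, 1)
xp.write(2, 3, 0)  # overwritten directly, without an erase step
```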
- a flash memory chip can include a logic circuit for command and address decoding as well as arrays of memory units of flash memory such as units of NAND-type flash memory.
- a flash memory chip described herein can include a cache or buffer memory for incoming and/or outgoing data.
- the memory units that implement the cache or buffer memory can be different from the flash memory units on the chip hosting the cache or buffer memory.
- the memory units that implement the cache or buffer memory on the flash memory chip can be memory units of SRAM.
- an embodiment of the string of memory chips can include DRAM to DRAM to NVRAM, or DRAM to NVRAM to NVRAM, or DRAM to flash memory to flash memory; however, DRAM to NVRAM to flash memory may provide a more effective solution for a string of memory chips being flexibly provisioned as multi-tier memory.
- DRAM, NVRAM, 3D XPoint memory, and flash memory are techniques for individual memory units, and a memory chip for any one of the memory chips described herein can include a logic circuit for command and address decoding as well as arrays of memory units of DRAM, NVRAM, 3D XPoint memory, or flash memory.
- a DRAM chip described herein includes a logic circuit for command and address decoding as well as an array of memory units of DRAM.
- NVRAM chip described herein includes a logic circuit for command and address decoding as well as an array of memory units of NVRAM.
- a flash memory chip described herein includes a logic circuit for command and address decoding as well as an array of memory units of flash memory.
- a memory chip for any one of the memory chips described herein can include a cache or buffer memory for incoming and/or outgoing data.
- the memory units that implement the cache or buffer memory may be different from the units on the chip hosting the cache or buffer memory.
- the memory units that implement the cache or buffer memory can be memory units of SRAM.
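As a rough illustration of the buffer arrangement described above, the following sketch puts a small write buffer (standing in for the SRAM units) in front of a larger main array (standing in for the DRAM, NVRAM, or flash units on the same chip). The buffer size and the flush policy are invented for the example.

```python
# Sketch of an on-chip SRAM write buffer in front of a slower main array.
class BufferedMemoryChip:
    BUFFER_SLOTS = 4  # illustrative SRAM buffer capacity

    def __init__(self, capacity):
        self.array = [0] * capacity  # main array (DRAM/NVRAM/flash units)
        self.buffer = {}             # SRAM buffer: addr -> pending value

    def write(self, addr, value):
        self.buffer[addr] = value
        if len(self.buffer) >= self.BUFFER_SLOTS:
            self.flush()

    def read(self, addr):
        # the buffer holds the freshest copy of any pending write
        return self.buffer.get(addr, self.array[addr])

    def flush(self):
        for addr, value in self.buffer.items():
            self.array[addr] = value
        self.buffer.clear()

chip = BufferedMemoryChip(capacity=16)
chip.write(0, 7)        # held in the SRAM buffer, main array untouched
pending = chip.read(0)  # served from the buffer
```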
Description
- The present application is a continuation application of U.S. patent application Ser. No. 16/573,805, filed Sep. 17, 2019, the entire disclosure of which application is hereby incorporated herein by reference.
- At least some embodiments disclosed herein relate to a memory chip connecting a SoC and an accelerator chip (e.g., an AI accelerator chip). At least some embodiments disclosed herein relate to using memory hierarchy and a string of memory chips to form a memory.
- Memory, such as main memory, is computer hardware that stores information for immediate use in a computer or computing device. Memory in general operates at a higher speed than computer storage. Computer storage provides slower speeds for accessing information, but also can provide higher capacities and better data reliability. Random-access memory (RAM), which is a type of memory, can have high operation speeds.
- Typically, memory is made up of addressable semiconductor memory units or cells. A memory IC and its memory units can be at least partially implemented by silicon-based metal-oxide-semiconductor field-effect transistors (MOSFETs).
- There are two main types of memory, volatile and non-volatile. Non-volatile memory can include flash memory (which can also be used as storage) as well as ROM, PROM, EPROM and EEPROM (which can be used for storing firmware). Another type of non-volatile memory is non-volatile random-access memory (NVRAM). Volatile memory can include main memory technologies such as dynamic random-access memory (DRAM), and cache memory which is usually implemented using static random-access memory (SRAM).
- An AI accelerator is a type of microprocessor or computer system configured to accelerate computations for AI applications, including AI applications such as artificial neural networks, machine vision, and machine learning. AI accelerators can be hardwired to improve data processing for data-intensive or sensor-driven tasks. AI accelerators can include one or more cores and can be wired for low-precision arithmetic and in-memory computing. AI accelerators can be found in many devices such as smartphones, tablets, and any type of computer (especially computers with sensors and data-intensive tasks such as graphics and optics processing). Also, AI accelerators can include vector processors or array processors to improve performance on numerical simulations and other types of tasks used in AI applications.
- A SoC is an integrated circuit (IC) that integrates computer components in a single chip. Computer components common in a SoC include a central processing unit (CPU), memory, input/output ports and secondary storage. A SoC can have all its components on a single substrate or microchip, and some chips can be smaller than a quarter. A SoC can include various signal processing functions and can include specialty processors or co-processors such as a graphics processing unit (GPU). By being tightly integrated, a SoC can consume much less power than conventional multichip systems of equivalent functionality. This makes a SoC beneficial for integration into mobile computing devices (such as smartphones and tablets). Also, a SoC can be useful for embedded systems and the Internet of Things (especially when the smart device is small).
- Referring back to memory, memory of a computing system can be hierarchical. Often referred to as a memory hierarchy in computer architecture, such a hierarchy separates computer memory into levels based on certain factors such as response time, complexity, capacity, persistence and memory bandwidth. Such factors can be related and can often be tradeoffs, which further emphasizes the usefulness of a memory hierarchy.
- In general, memory hierarchy affects performance in a computer system. Prioritizing memory bandwidth and speed over other factors can require considering the restrictions of a memory hierarchy, such as response time, complexity, capacity, and persistence. To manage such prioritization, different types of memory chips can be combined to balance chips that are faster with chips that are more reliable or cost effective, etc. Each of the various chips can be viewed as part of a memory hierarchy. And, for example, to reduce latency on faster chips, other chips in a memory chip combination can respond by filling a buffer and then signaling for activating the transfer of data between chips.
- A memory hierarchy can be made up of chips with different types of memory units or cells. For example, memory cells can be DRAM units. DRAM is a type of random access semiconductor memory that stores each bit of data in a memory cell, which usually includes a capacitor and a MOSFET. The capacitor can either be charged or discharged, which represents the two values of a bit, such as “0” and “1”. In DRAM, the electric charge on a capacitor leaks off, so DRAM requires an external memory refresh circuit which periodically rewrites the data in the capacitors by restoring the original charge per capacitor. DRAM is considered volatile memory since it loses its data rapidly when power is removed. This is different from flash memory and other types of non-volatile memory, such as NVRAM, in which data storage is more persistent.
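The need for refresh can be illustrated with a toy model of a single DRAM cell; the leak rate and sense threshold below are made-up numbers, not device parameters.

```python
# Toy DRAM cell: the stored bit is a capacitor charge that decays over
# time, so a refresh must rewrite the cell before the charge drops below
# the sense threshold. Values are illustrative only.
SENSE_THRESHOLD = 0.5
LEAK_PER_TICK = 0.1

class DramCell:
    def __init__(self):
        self.charge = 0.0

    def write(self, bit):
        self.charge = 1.0 if bit else 0.0

    def leak(self, ticks):
        # capacitor charge leaks off as time passes
        self.charge = max(0.0, self.charge - LEAK_PER_TICK * ticks)

    def read(self):
        # sense amplifier: charge above the threshold reads as 1
        return 1 if self.charge > SENSE_THRESHOLD else 0

    def refresh(self):
        # rewrite the sensed value, restoring full charge
        self.write(self.read())

refreshed = DramCell()
refreshed.write(1)
refreshed.leak(4)    # charge decayed, but still above the threshold
refreshed.refresh()  # restored to full charge in time
refreshed.leak(4)

neglected = DramCell()
neglected.write(1)
neglected.leak(8)    # never refreshed: charge falls below the threshold
```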
- A type of NVRAM is 3D XPoint memory. With 3D XPoint memory, memory units store bits based on a change of bulk resistance, in conjunction with a stackable cross-gridded data access array. 3D XPoint memory can be more cost effective than DRAM but less cost effective than flash memory. Also, 3D XPoint memory is both non-volatile and random-access.
- Flash memory is another type of non-volatile memory. An advantage of flash memory is that it can be electrically erased and reprogrammed. Flash memory is considered to have two main types, NAND-type flash memory and NOR-type flash memory, which are named after the NAND and NOR logic gates that can implement the memory units of flash memory. The flash memory units or cells exhibit internal characteristics similar to those of the corresponding gates. A NAND-type flash memory includes NAND gates. A NOR-type flash memory includes NOR gates. NAND-type flash memory may be written and read in blocks which can be smaller than the entire device. NOR-type flash permits a single byte to be written to an erased location or read independently. Because of the advantages of NAND-type flash memory, such memory has often been utilized for memory cards, USB flash drives, and solid-state drives. However, a primary tradeoff of using flash memory in general is that it is only capable of a relatively small number of write cycles in a specific block compared to other types of memory such as DRAM and NVRAM.
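The block-oriented write constraint of NAND-type flash can be sketched as follows; the page count and the (deliberately tiny) erase-cycle limit are illustrative only, and real parts allow thousands of cycles.

```python
# Sketch of a NAND flash block: pages are programmed individually, but a
# page can only be re-programmed after the whole block is erased, and each
# erase consumes one of the block's limited write/erase cycles.
class NandBlock:
    PAGES = 4
    MAX_ERASE_CYCLES = 3  # tiny on purpose; real blocks endure far more

    def __init__(self):
        self.pages = [None] * self.PAGES  # None means erased
        self.erase_cycles = 0

    def program(self, page, data):
        if self.pages[page] is not None:
            raise RuntimeError("page must be erased before re-programming")
        self.pages[page] = data

    def erase(self):
        if self.erase_cycles >= self.MAX_ERASE_CYCLES:
            raise RuntimeError("block worn out")
        self.pages = [None] * self.PAGES
        self.erase_cycles += 1

block = NandBlock()
block.program(0, b"page0")
block.erase()                # whole-block erase before any rewrite
block.program(0, b"page0v2")
```

This is why flash tolerates fewer write cycles per block than DRAM or NVRAM: every in-place update costs an erase of the entire block.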
- The present disclosure will be understood more fully from the detailed description given below and from the accompanying drawings of various embodiments of the disclosure.
FIG. 1 illustrates an example related system including an accelerator chip (e.g., an AI accelerator chip) connecting a SoC and a memory chip. -
FIGS. 2-3 illustrate example related systems including the accelerator chip depicted in FIG. 1 as well as separate memory. -
FIG. 4 illustrates an example system, in accordance with some embodiments of the present disclosure, including a memory chip connecting a SoC and an accelerator chip (e.g., an AI accelerator chip). -
FIGS. 5-7 illustrate example systems including the memory chip depicted in FIG. 4 as well as separate memory. -
FIG. 8 illustrates an example arrangement of parts of an example computing device, in accordance with some embodiments of the present disclosure. -
FIG. 9 illustrates another example arrangement of parts of an example computing device, in accordance with some embodiments of the present disclosure. -
FIGS. 10 and 11 illustrate example strings of memory chips that can be used in the separate memory depicted in FIGS. 2-3 and 5-7 . - At least some embodiments disclosed herein relate to a memory chip (e.g., DRAM) connecting a SoC and an accelerator chip (e.g., an AI accelerator chip). At least some embodiments disclosed herein relate to connecting an accelerator chip (e.g., an AI accelerator chip) to a SoC via a memory chip. The accelerator chip communicates with the SoC indirectly via the memory chip. Data placed by the SoC in the memory chip connecting the SoC and the accelerator chip is interpreted as requests to the accelerator chip. Also, the SoC may optionally use the memory chip connecting the SoC and the accelerator chip for its operations that do not involve the accelerator chip. Thus, the memory chip connecting the SoC and the accelerator chip can have two general purposes: to be used for the SoC and to be used for the accelerator chip. For some examples of such embodiments, see
first memory chip 402 , accelerator chip 404 , and SoC 406 depicted in FIGS. 4-7 . Also, see SoC 806 and application-specific components 807 shown in FIGS. 8-9 . The application-specific components 807 can include the first memory chip 402 and accelerator chip 404 in some embodiments of such devices. - As shown in
FIGS. 4-7 , the memory chip connecting the SoC and the accelerator chip can be logically (and sometimes physically) intermediate to the SoC and the accelerator chip. And, a memory chip for the accelerator that is intermediate to the SoC and the accelerator chip may not require having two sets of pins. In some embodiments, the accelerator chip and the memory chip can be physically on the same bus. However, when the intermediate memory chip is used, in no circumstance does the SoC communicate with the accelerator chip directly via a bus or wiring. Thus, the memory chip connecting the SoC and the accelerator chip is at least logically between the accelerator chip and the SoC. Also, the connection, provided by the memory chip, of the SoC and the accelerator chip may only be a logical connection. - The memory chip connecting the SoC and the accelerator chip can have two separate sets of pins; one set for connecting to the accelerator chip directly via wiring (e.g., see set of
pins 414 and wiring 424 shown in FIGS. 4, 5, and 7 ) and the other set for connecting to the SoC directly via wiring (e.g., see set of pins 416 and wiring 426 shown in FIGS. 4-5 ). - The accelerator chip being connected to the SoC via the memory chip can provide acceleration of application-specific computations (such as AI computations) for the SoC in general or, more specifically, in some embodiments, for a GPU included in the SoC (e.g., see
GPU 408 shown in FIGS. 4-7 ). In some embodiments, a GPU in the SoC and the memory chip connecting the SoC and the accelerator chip can be connected directly. In some embodiments, the memory chip connecting the GPU and the accelerator chip can include a set of pins and can be connected to the accelerator chip directly via the set of pins and wiring (e.g., see set of pins 414 and wiring 424 ). The accelerator chip can have a corresponding set of pins too (e.g., see set of pins 415 ). And, the memory chip connecting the SoC and the accelerator chip can include a second set of pins and can be connected to the GPU directly via the second set of pins and wiring (e.g., see set of pins 416 and wiring 426 ). Also, the GPU in the SoC can include a set of pins and can be connected to the memory chip directly via the set of pins and wiring (e.g., see set of pins 417 and wiring 426 ). - For the purposes of this disclosure, it is to be understood that any one of the accelerator chips described herein can be or include a part of a special purpose accelerator chip. Examples of a special purpose accelerator chip can include an artificial intelligence (AI) accelerator chip, a virtual reality accelerator chip, an augmented reality accelerator chip, a graphics accelerator chip, a machine learning accelerator chip, or any other type of ASIC or FPGA that can provide low latency or high bandwidth memory access. For example, any one of the accelerator chips described herein can be or include a part of an AI accelerator chip.
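The indirection described above, in which the SoC and the accelerator communicate only through data placed in the shared memory chip, can be sketched as a mailbox pattern. This is a hypothetical illustration: the region addresses, the request format, and the polling accelerator are invented for the example and are not part of the disclosed embodiments.

```python
# Mailbox sketch: the SoC writes a request into one region of the shared
# memory chip; the accelerator interprets data placed there as a request
# and writes the result into another region. Addresses are illustrative.
REQUEST_ADDR = 0x00
RESULT_ADDR = 0x40

class SharedMemoryChip:
    """Reachable from both sets of pins (SoC side and accelerator side)."""
    def __init__(self):
        self.cells = {}

    def write(self, addr, value):
        self.cells[addr] = value

    def read(self, addr):
        return self.cells.get(addr)

class Accelerator:
    def __init__(self, mem):
        self.mem = mem

    def poll(self):
        req = self.mem.read(REQUEST_ADDR)
        if req is not None:
            op, a, b = req
            if op == "dot":  # vector work, per the vector processor
                self.mem.write(RESULT_ADDR, sum(x * y for x, y in zip(a, b)))
            self.mem.write(REQUEST_ADDR, None)  # mark request consumed

mem = SharedMemoryChip()
acc = Accelerator(mem)

# SoC side: place a request in the shared chip, then read the result back.
mem.write(REQUEST_ADDR, ("dot", [1, 2, 3], [4, 5, 6]))
acc.poll()
result = mem.read(RESULT_ADDR)
```

At no point do the two sides exchange data except through the memory chip, mirroring the logical-only connection described above.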
- The accelerator chip can be a microprocessor chip or a SoC itself designed for hardware acceleration of AI applications, including artificial neural networks, machine vision, and machine learning. In some embodiments, the accelerator chip is configured to perform numerical calculations on vectors and matrices (e.g., see
vector processor 412 shown in FIG. 4 , which can be configured to perform the numerical calculations on vectors and matrices). The accelerator chip can be or include an ASIC or FPGA. With ASIC embodiments of the accelerator chip, the accelerator chip can be specifically hardwired for acceleration of application-specific computations (such as AI computations). In some other embodiments, the accelerator chip can be an FPGA or GPU modified for acceleration of application-specific computations beyond an unmodified FPGA or GPU. In some other embodiments, the accelerator chip can be an unmodified FPGA or GPU. - The memory chips connected directly to the accelerator chip, e.g., see
first memory chip 402, are also referred to herein as application-specific memory chips for the sake of clarity when describing multiple memory chips of the overall system. The application-specific memory chips are not necessarily hardwired specifically for application-specific computations (e.g., AI computations). Each of the application-specific memory chips can be a DRAM chip or a NVRAM chip. And, each of the application-specific memory chips can be connected directly to the accelerator chip and can have memory units specifically for the acceleration of application-specific computations by the accelerator after the application-specific memory chip is configured by the SoC or the accelerator chip. - In some embodiments, the SoC can include a main processor (e.g., CPU). For example, see
main processor 110 shown in FIGS. 4-7 . In such embodiments, the GPU in the SoC can run instructions for application-specific tasks and computations (e.g., AI tasks and computations), and the main processor can run instructions for non-application-specific tasks and computations (e.g., non-AI tasks and computations). And, in such embodiments, the accelerator can provide acceleration of application-specific tasks and computations for the GPU specifically. The SoC can also include its own bus for connecting components of the SoC to each other (such as connecting the main processor and the GPU). Also, the bus of the SoC can be configured to connect the SoC to a bus external to the SoC so that the components of the SoC can couple with chips and devices external to the SoC such as a separate memory chip. - The non-application-specific computations and tasks (e.g., non-AI computations and tasks) of the GPU or such computations and tasks not using the accelerator chip, which may not be conventional tasks performed by the main processor, can use separate memory such as a separate memory chip (which can be application-specific memory). And, the memory can be implemented by DRAM, NVRAM, flash memory, or any combination thereof. For example, a separate memory or memory chip can be connected to the SoC and the main processor via a bus external to the SoC (e.g., see
memory 204 and bus 202 depicted in FIG. 5 ). In such embodiments, the separate memory or memory chip can have memory units specifically for the main processor. Also, a separate memory or memory chip can be connected to the SoC and the GPU via the bus external to the SoC (e.g., see second memory chip 204 and bus 202 depicted in FIGS. 5-7 ). In such embodiments, the separate memory or memory chip can have memory units for the main processor or the GPU. - It is to be understood for the purposes of this disclosure that the application-specific memory chip and the separate memory chip can each be substituted by a group of memory chips such as a string of memory chips (e.g., see the strings of memory chips shown in
FIGS. 10 and 11 ). For example, the separate memory chip can be substituted by a string of memory chips that includes at least a NVRAM chip and a flash memory chip downstream of the NVRAM chip. Also, the separate memory chip can be substituted by at least two memory chips where one of the chips is for the main processor (e.g., CPU) and the other chip is for the GPU for use as memory for non-AI computations and/or tasks. - Additionally, at least some embodiments disclosed herein relate to an accelerator chip (e.g., an AI accelerator chip) having a vector processor (e.g., see
vector processor 412 shown in FIGS. 4-7 ). And, at least some embodiments disclosed herein relate to using memory hierarchy and a string of memory chips to form a memory (e.g., see FIGS. 10 and 11 ). -
-
FIG. 1 illustrates an example related system including an accelerator chip (e.g., an AI accelerator chip) connecting a SoC and a memory chip. -
FIG. 1 illustrates an example system 100 , which is to some extent related to system 400 . System 100 includes an accelerator chip 102 (e.g., an AI accelerator chip) connecting a first memory chip 104 and a SoC 106 . As shown, the SoC 106 includes a GPU 108 as well as a main processor 110 . The main processor 110 can be or include a CPU. And, the accelerator chip 102 includes a vector processor 112 . - In
system 100 , the accelerator chip 102 includes a first set of pins 114 and a second set of pins 116 . The first set of pins 114 is configured to connect to the first memory chip 104 via wiring 124 . The second set of pins 116 is configured to connect to the SoC 106 via wiring 126 . As shown, the first memory chip 104 includes a corresponding set of pins 115 that connects the memory chip to the accelerator chip 102 via wiring 124 . The GPU 108 of the SoC 106 includes a corresponding set of pins 117 that connects the SoC to the accelerator chip 102 via wiring 126 . - The
accelerator chip 102 is configured to perform and accelerate application-specific computations (e.g., AI computations) for the SoC 106 . The accelerator chip 102 is also configured to use the first memory chip 104 as memory for the application-specific computations. The acceleration of application-specific computations can be performed by the vector processor 112 . The vector processor 112 in the accelerator chip 102 can be configured to perform numerical calculations on vectors and matrices for the SoC 106 . The accelerator chip 102 can include an ASIC that includes the vector processor 112 and is specifically hardwired to accelerate application-specific computations (e.g., AI computations) through the vector processor 112 . Alternatively, the accelerator chip 102 can include an FPGA that includes the vector processor 112 and is specifically hardwired to accelerate application-specific computations through the vector processor 112 . In some embodiments, the accelerator chip 102 can include a GPU that includes the vector processor 112 and is specifically hardwired to accelerate application-specific computations through the vector processor 112 . In such embodiments, the GPU can be specifically modified to accelerate application-specific computations through the vector processor 112 . - As shown, the
SoC 106 includes a GPU 108. And, the accelerator chip 102 can be configured to perform and accelerate application-specific computations (e.g., AI computations) for the GPU 108. For example, the vector processor 112 can be configured to perform numerical calculations on vectors and matrices for the GPU 108. Also, the GPU 108 can be configured to perform application-specific tasks and computations (e.g., AI tasks and computations). - Also, as shown, the
SoC 106 includes a main processor 110 that is configured to perform non-AI tasks and computations. - In some embodiments, the
memory chip 104 is a DRAM chip. In such examples, the first set of pins 114 can be configured to connect to the DRAM chip via wiring 124. Also, the accelerator chip 102 can be configured to use DRAM cells in the DRAM chip as memory for the application-specific computations (e.g., AI computations). In some other embodiments, the memory chip 104 is a NVRAM chip. In such embodiments, the first set of pins 114 can be configured to connect to the NVRAM chip via wiring 124. Also, the accelerator chip 102 can be configured to use NVRAM cells in the NVRAM chip as memory for the application-specific computations. Further, the NVRAM chip can be or include a 3D XPoint memory chip. In such examples, the first set of pins 114 can be configured to connect to the 3D XPoint memory chip via wiring 124, and the accelerator chip 102 can be configured to use 3D XPoint memory cells in the 3D XPoint memory chip as memory for the application-specific computations. - In some embodiments, the
system 100 includes the accelerator chip 102 that is connected, via wiring, to the first memory chip 104, and the first memory chip 104 can be an application-specific memory chip. The system 100 also includes SoC 106 that includes GPU 108 (which can be configured to perform AI tasks) and main processor 110 (which can be configured to perform non-AI tasks and delegate the AI tasks to the GPU 108). In such embodiments, GPU 108 includes a set of pins 117 configured to connect to accelerator chip 102 via wiring 126, and the accelerator chip 102 is configured to perform and accelerate AI computations of the AI tasks for the GPU 108. - In such embodiments, the
accelerator chip 102 can include vector processor 112 that is configured to perform numerical calculations on vectors and matrices for the GPU 108. And, the accelerator chip 102 can include an ASIC that includes the vector processor 112 and is specifically hardwired to accelerate AI computations through the vector processor 112. Or, the accelerator chip 102 can include an FPGA that includes vector processor 112 and is specifically hardwired to accelerate AI computations through the vector processor 112. Or, the accelerator chip 102 can include a GPU that includes the vector processor 112 and is specifically hardwired to accelerate AI computations through the vector processor 112. - The
system 100 also includes memory chip 104, and the accelerator chip 102 can be connected, via wiring 124, to the memory chip 104 and be configured to perform and accelerate AI computations of AI tasks. The memory chip 104 can be or include a DRAM chip having DRAM cells, and the DRAM cells can be configured, by the accelerator chip 102, to store data for acceleration of AI computations. Or, the memory chip 104 can be or include a NVRAM chip having NVRAM cells, and the NVRAM cells can be configured, by the accelerator chip 102, to store data for acceleration of AI computations. The NVRAM chip can include 3D XPoint memory cells, and the 3D XPoint memory cells can be configured, by the accelerator chip 102, to store data for acceleration of AI computations.
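The numerical calculations on vectors and matrices described above can be illustrated with a short sketch. This plain-Python stand-in (the function names and values are invented for illustration) shows the kind of arithmetic an accelerator such as accelerator chip 102 offloads from the SoC; a real vector processor would execute these as vector instructions in hardware:

```python
# Sketch of the vector/matrix arithmetic an AI accelerator performs.
# Pure-Python illustration; not the instruction set of any particular chip.

def vec_add(a, b):
    """Element-wise addition of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]

def mat_vec_mul(m, v):
    """Multiply matrix m (a list of rows) by column vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

# Example: a 2x3 weight matrix applied to a 3-element input vector,
# the core operation in neural-network inference (an AI computation).
weights = [[1, 2, 3],
           [4, 5, 6]]
inputs = [1, 0, 2]
print(mat_vec_mul(weights, inputs))  # [7, 16]
print(vec_add([1, 2], [3, 4]))       # [4, 6]
```

In the systems above, the operands for such calculations would be staged in the first memory chip 104 rather than in host main memory.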
FIGS. 2-3 illustrate example systems 200 and 300, respectively, each including the accelerator chip 102 depicted in FIG. 1 as well as separate memory (e.g., NVRAM). - In
FIG. 2, a bus 202 connects the system 100 (including the accelerator chip 102) with memory 204. The memory 204, which can be NVRAM in some embodiments, is separate memory from the memory of first memory chip 104 of system 100. And, memory 204 can be main memory in some embodiments. - In the
system 200, the SoC 106 of the system 100 is connected with the memory 204 via the bus 202. And, the system 100 as part of system 200 includes the accelerator chip 102, the first memory chip 104, and the SoC 106. These parts of system 100 are connected to the memory 204 via bus 202. Also, as shown in FIG. 2, a memory controller 206 included in the SoC 106 controls data access of the memory 204 by the SoC 106 of system 100. For example, the memory controller 206 controls data access of the memory 204 by the GPU 108 and/or the main processor 110. In some embodiments, the memory controller 206 can control data access of all memory in the system 200 (such as data access of the first memory chip 104 and the memory 204). And, the memory controller 206 can be communicatively coupled to the first memory chip 104 and/or the memory 204. - The
memory 204 is separate memory from the memory provided by the first memory chip 104 of system 100, and it can be used as memory for the GPU 108 and the main processor 110 of the SoC 106 via the memory controller 206 and the bus 202. Also, memory 204 can be used as memory for non-application-specific tasks or application-specific tasks (such as non-AI tasks or AI tasks) not performed by the accelerator chip 102, for the GPU 108 and the main processor 110. Data for such tasks can be accessed and communicated to and from memory 204 via memory controller 206 and bus 202. - In some embodiments,
memory 204 is main memory of a device, such as a device that hosts system 200. For example, with the system 200, memory 204 can be the main memory 808 shown in FIG. 8. - In
FIG. 3, the bus 202 connects the system 100 (including the accelerator chip 102) with the memory 204. Also, in system 300, the bus 202 connects the accelerator chip 102 to the SoC 106 as well as the accelerator chip 102 to the memory 204. Also shown, in system 300, the bus 202 has replaced the second set of pins 116 of the accelerator chip as well as the wiring 126 and the set of pins 117 of the SoC 106 and GPU 108. The accelerator chip 102 in system 300, similar to system 200, connects the first memory chip 104 and the SoC 106 of system 100; however, the connection is through the first set of pins 114 and the bus 202. - Also, similar to
system 200, in system 300, the memory 204 is separate memory from the memory of first memory chip 104 of system 100. In the system 300, the SoC 106 of the system 100 is connected with the memory 204 via the bus 202. And, in system 300, the system 100 as part of system 300 includes the accelerator chip 102, the first memory chip 104, and the SoC 106. These parts of system 100 are connected to the memory 204 via bus 202 in system 300. Also, similarly, as shown in FIG. 3, a memory controller 206 included in the SoC 106 controls data access of the memory 204 by the SoC 106 of system 100. In some embodiments, the memory controller 206 can control data access of all memory in the system 300 (such as data access of the first memory chip 104 and the memory 204). And, the memory controller 206 can be connected and communicatively coupled to the first memory chip 104 and/or the memory 204. - Also, in
system 300, the memory 204 (which can be NVRAM in some embodiments) is separate memory from the memory provided by the first memory chip 104 of system 100, and it can be used as memory for the GPU 108 and the main processor 110 of the SoC 106 via the memory controller 206 and the bus 202. Further, the accelerator chip 102 can use the memory 204 via the bus 202, in some embodiments and situations. And, memory 204 can be used as memory for non-application-specific tasks or application-specific tasks (such as non-AI tasks or AI tasks) not performed by the accelerator chip 102 for the GPU 108 and the main processor 110. Data for such tasks can be accessed and communicated to and from memory 204 via memory controller 206 and/or bus 202. - In some embodiments,
memory 204 is main memory of a device, such as a device that hosts system 300. For example, with the system 300, memory 204 can be the main memory 808 shown in FIG. 9.
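The role of memory controller 206 described above can be sketched as address routing: accesses to one address range are served by the accelerator's memory chip, and the rest go to the separate memory over the bus. This is an illustrative model only; the flat address map, sizes, and byte-granular interface are assumptions, not details from the disclosure:

```python
# Hypothetical sketch of a memory controller routing SoC accesses between
# a small attached memory chip (like first memory chip 104) and a larger
# separate memory (like memory 204 reached over bus 202).

class MemoryController:
    def __init__(self, chip_size, main_size):
        self.chip = bytearray(chip_size)   # attached memory chip (e.g., DRAM)
        self.main = bytearray(main_size)   # separate memory (e.g., NVRAM)
        self.boundary = chip_size          # flat map: chip addresses first

    def _route(self, addr):
        """Pick the backing memory and local offset for a flat address."""
        if addr < self.boundary:
            return self.chip, addr
        return self.main, addr - self.boundary

    def write(self, addr, value):
        mem, off = self._route(addr)
        mem[off] = value

    def read(self, addr):
        mem, off = self._route(addr)
        return mem[off]

mc = MemoryController(chip_size=1024, main_size=4096)
mc.write(100, 42)    # lands in the attached memory chip
mc.write(2000, 7)    # lands in the separate memory at offset 976
print(mc.read(100), mc.read(2000))  # 42 7
```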
FIG. 4 illustrates an example system 400 including a first memory chip 402 connecting an accelerator chip 404 (e.g., an AI accelerator chip) and a SoC 406, in accordance with some embodiments of the present disclosure. As shown, the SoC 406 includes a GPU 408 as well as main processor 110. The main processor 110 can be or include a CPU in system 400. And, the accelerator chip 404 includes a vector processor 412. - In
system 400, the memory chip 402 includes a first set of pins 414 and a second set of pins 416. The first set of pins 414 is configured to connect to the accelerator chip 404 via wiring 424. The second set of pins 416 is configured to connect to the SoC 406 via wiring 426. As shown, the accelerator chip 404 includes a corresponding set of pins 415 that connects the first memory chip 402 to the accelerator chip via wiring 424. The GPU 408 of the SoC 406 includes a corresponding set of pins 417 that connects the SoC to the first memory chip 402 via wiring 426. - The
first memory chip 402 includes a first plurality of memory cells configured to store and provide computation input data (e.g., AI computation input data) received from the SoC 406, via the second set of pins 416, to be used by the accelerator chip 404 as computation input (e.g., AI computation input). The computation input data is accessed from the first plurality of memory cells and transmitted from the first memory chip 402, via the first set of pins 414, to be received and used by the accelerator chip 404. The first plurality of memory cells can include DRAM cells and/or NVRAM cells. In examples having NVRAM cells, the NVRAM cells can be or include 3D XPoint memory cells. - The
first memory chip 402 also includes a second plurality of memory cells configured to store and provide computation output data (e.g., AI computation output data) received from the accelerator chip 404, via the first set of pins 414, to be retrieved by the SoC 406 or reused by the accelerator chip 404 as computation input (e.g., AI computation input). The computation output data can be accessed from the second plurality of memory cells and transmitted from the first memory chip 402, via the first set of pins 414, to be received and reused by the accelerator chip 404. Also, the computation output data can be accessed from the second plurality of memory cells and transmitted from the first memory chip 402, via the second set of pins 416, to be received and used by the SoC 406 or the GPU 408 in the SoC. The second plurality of memory cells can include DRAM cells and/or NVRAM cells. In examples having NVRAM cells, the NVRAM cells can be or include 3D XPoint memory cells. - The
first memory chip 402 also includes a third plurality of memory cells configured to store non-AI data related to non-AI tasks received from the SoC 406, via the second set of pins 416, to be retrieved by the SoC 406 for non-AI tasks. The non-AI data can be accessed from the third plurality of memory cells and transmitted from the first memory chip 402, via the second set of pins 416, to be received and used by the SoC 406, the GPU 408 in the SoC, or the main processor 110 in the SoC. The third plurality of memory cells can include DRAM cells and/or NVRAM cells. In examples having NVRAM cells, the NVRAM cells can be or include 3D XPoint memory cells. - The
accelerator chip 404 is configured to perform and accelerate application-specific computations (e.g., AI computations) for the SoC 406. The accelerator chip 404 is also configured to use the first memory chip 402 as memory for the application-specific computations. The acceleration of application-specific computations can be performed by the vector processor 412. The vector processor 412 in the accelerator chip 404 can be configured to perform numerical calculations on vectors and matrices for the SoC 406. For example, the vector processor 412 can be configured to perform numerical calculations on vectors and matrices for the SoC 406 using the first and second pluralities of memory cells as memory. - The
accelerator chip 404 can include an ASIC that includes the vector processor 412 and is specifically hardwired to accelerate application-specific computations (e.g., AI computations) through the vector processor 412. Alternatively, the accelerator chip 404 can include an FPGA that includes the vector processor 412 and is specifically hardwired to accelerate application-specific computations through the vector processor 412. In some embodiments, the accelerator chip 404 can include a GPU that includes the vector processor 412 and is specifically hardwired to accelerate application-specific computations through the vector processor 412. In such embodiments, the GPU can be specifically modified to accelerate application-specific computations through the vector processor 412. - As shown, the
SoC 406 includes a GPU 408. And, the accelerator chip 404 can be configured to perform and accelerate application-specific computations for the GPU 408. For example, the vector processor 412 can be configured to perform numerical calculations on vectors and matrices for the GPU 408. Also, the GPU 408 can be configured to perform application-specific tasks and computations. Also, as shown, the SoC 406 includes a main processor 110 that is configured to perform non-AI tasks and computations. - In some embodiments, the
system 400 includes memory chip 402, accelerator chip 404, and SoC 406, and the memory chip 402 includes at least the first set of pins 414 configured to connect to the accelerator chip 404 via wiring 424 and the second set of pins 416 configured to connect to the SoC 406 via wiring 426. And, the memory chip 402 can include the first plurality of memory cells configured to store and provide AI computation input data received from the SoC 406, via the set of pins 416, to be used by the accelerator chip 404 as AI computation input, as well as the second plurality of memory cells configured to store and provide AI computation output data received from the accelerator chip 404, via the other set of pins 414, to be retrieved by the SoC 406 or reused by the accelerator chip 404 as AI computation input. And the memory chip 402 can include the third plurality of cells used as memory for non-AI computations. - Also, the
SoC 406 includes GPU 408, and the accelerator chip 404 can be configured to perform and accelerate AI computations for the GPU 408 using the first and second pluralities of memory cells as memory. And, the accelerator chip 404 includes a vector processor 412 that can be configured to perform numerical calculations on vectors and matrices for the SoC 406 using the first and second pluralities of memory cells as memory. - Also, in the
system 400, the first plurality of memory cells in the memory chip 402 can be configured to store and provide AI computation input data received from the SoC 406, via the set of pins 416, to be used by an accelerator chip 404 (e.g., an AI accelerator chip) as AI computation input. And, the second plurality of memory cells in the memory chip 402 can be configured to store and provide AI computation output data received from the accelerator chip 404, via the other set of pins 414, to be retrieved by the SoC 406 or reused by the accelerator chip 404 as AI computation input. And, the third plurality of memory cells in the memory chip 402 can be configured to store non-AI data related to non-AI tasks received from the SoC 406, via the set of pins 416, to be retrieved by the SoC 406 for non-AI tasks. - The first, second, and third pluralities of memory cells in the
memory chip 402 each can include DRAM cells and/or NVRAM cells, and the NVRAM cells can include 3D XPoint memory cells.
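The three pluralities of memory cells described for memory chip 402 can be modeled as three regions of one chip: an AI-input region written by the SoC and read by the accelerator, an AI-output region written by the accelerator and read back by the SoC (or reused by the accelerator), and a non-AI region for the SoC. The dictionary-backed representation and region sizes below are assumptions made only for the sketch:

```python
# Illustrative model of the three memory-cell regions of memory chip 402.
# Not a hardware description; regions and addressing are invented here.

class PartitionedMemoryChip:
    REGIONS = ("ai_input", "ai_output", "non_ai")

    def __init__(self):
        self.cells = {region: {} for region in self.REGIONS}

    def store(self, region, addr, data):
        assert region in self.REGIONS
        self.cells[region][addr] = data

    def load(self, region, addr):
        return self.cells[region][addr]

chip = PartitionedMemoryChip()
chip.store("ai_input", 0, [1.0, 2.0])    # SoC writes AI computation input
result = sum(chip.load("ai_input", 0))   # accelerator consumes the input
chip.store("ai_output", 0, result)       # accelerator writes AI output
chip.store("non_ai", 0, "display data")  # SoC stores non-AI data
print(chip.load("ai_output", 0))         # 3.0
```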
FIGS. 5-7 illustrate example systems 500, 600, and 700, respectively, each including the memory chip 402 depicted in FIG. 4 as well as separate memory. - In
FIG. 5, bus 202 connects the system 400 (including the memory chip 402 and accelerator chip 404) with memory 204. The memory 204 (e.g., NVRAM) is separate memory from the memory of first memory chip 402 of system 400. And, memory 204 can be main memory. - In the
system 500, the SoC 406 of the system 400 is connected with the memory 204 via the bus 202. And, the system 400 as part of system 500 includes the first memory chip 402, the accelerator chip 404, and the SoC 406. These parts of system 400 are connected to the memory 204 via bus 202. Also, as shown in FIG. 5, a memory controller 206 included in the SoC 406 controls data access of the memory 204 by the SoC 406 of system 400. For example, the memory controller 206 controls data access of the memory 204 by the GPU 408 and/or the main processor 110. In some embodiments, the memory controller 206 can control data access of all memory in the system 500 (such as data access of the first memory chip 402 and the memory 204). And, the memory controller 206 can be communicatively coupled to the first memory chip 402 and/or the memory 204. - The
memory 204 is separate memory from the memory provided by the first memory chip 402 of system 400, and it can be used as memory for the GPU 408 and the main processor 110 of the SoC 406 via the memory controller 206 and the bus 202. Also, memory 204 can be used as memory for non-application-specific tasks or application-specific tasks (such as non-AI tasks or AI tasks) not performed by the accelerator chip 404, for the GPU 408 and the main processor 110. Data for such tasks can be accessed and communicated to and from memory 204 via memory controller 206 and bus 202. - In some embodiments,
memory 204 is main memory of a device, such as a device that hosts system 500. For example, with the system 500, memory 204 can be the main memory 808 shown in FIG. 8. - In
FIG. 6, similar to FIG. 5, bus 202 connects the system 400 (including the memory chip 402 and accelerator chip 404) with memory 204. Unique to the system 600 with respect to the other systems described herein, the first memory chip 402 includes a single set of pins 602 that connects the first memory chip 402 to both the accelerator chip 404 and the SoC 406 directly via wiring 614 and 616. Also, in system 600, the accelerator chip 404 includes a single set of pins 604 that connects the accelerator chip 404 to the first memory chip 402 directly via wiring 614. Further, in system 600, the GPU of the SoC includes a set of pins 606 that connects the SoC 406 to the first memory chip 402 directly via wiring 616. - In the
system 600, the SoC 406 of the system 400 is connected with the memory 204 via the bus 202. And, the system 400 as part of system 600 includes the first memory chip 402, the accelerator chip 404, and the SoC 406. These parts of system 400 are connected to the memory 204 via bus 202 (e.g., the accelerator chip 404 and the first memory chip 402 having indirect connections to the memory 204 via the SoC 406 and the bus 202, and the SoC 406 having a direct connection to the memory 204 via the bus 202). Also, as shown in FIG. 6, a memory controller 206 included in the SoC 406 controls data access of the memory 204 by the SoC 406 of system 400. For example, the memory controller 206 controls data access of the memory 204 by the GPU 408 and/or the main processor 110. In some embodiments, the memory controller 206 can control data access of all memory in the system 600 (such as data access of the first memory chip 402 and the memory 204). And, the memory controller 206 can be communicatively coupled to the first memory chip 402 and/or the memory 204. - The
memory 204 is separate memory (e.g., NVRAM) from the memory provided by the first memory chip 402 of system 400, and it can be used as memory for the GPU 408 and the main processor 110 of the SoC 406 via the memory controller 206 and the bus 202. Also, memory 204 can be used as memory for non-application-specific tasks or application-specific tasks (such as non-AI tasks or AI tasks) not performed by the accelerator chip 404, for the GPU 408 and the main processor 110. Data for such tasks can be accessed and communicated to and from memory 204 via memory controller 206 and bus 202. - In some embodiments,
memory 204 is main memory of a device, such as a device that hosts system 600. For example, with the system 600, memory 204 can be the main memory 808 shown in FIG. 8. - In
FIG. 7, bus 202 connects the system 400 (including the memory chip 402 and accelerator chip 404) with memory 204. Also, in system 700, the bus 202 connects the first memory chip 402 to the SoC 406 as well as the first memory chip 402 to the memory 204. Also shown, in system 700, the bus 202 has replaced the second set of pins 416 of the first memory chip 402 as well as the wiring 426 and the set of pins 417 of the SoC 406 and GPU 408. The first memory chip 402 in system 700, similar to systems 500 and 600, connects the accelerator chip 404 and the SoC 406 of system 400; however, the connection is through the first set of pins 414 and the bus 202. - Also, similar to
systems 500 and 600, in system 700, the memory 204 is separate memory from the memory of first memory chip 402 of system 400. In the system 700, the SoC 406 of the system 400 is connected with the memory 204 via the bus 202. And, in system 700, the system 400 as part of system 700 includes the first memory chip 402, the accelerator chip 404, and the SoC 406. These parts of system 400 are connected to the memory 204 via bus 202 in system 700. Also, similarly, as shown in FIG. 7, a memory controller 206 included in the SoC 406 controls data access of the memory 204 by the SoC 406 of system 400. In some embodiments, the memory controller 206 can control data access of all memory in the system 700 (such as data access of the first memory chip 402 and the memory 204). And, the memory controller 206 can be communicatively coupled to the first memory chip 402 and/or the memory 204. - Also, in
system 700, the memory 204 is separate memory (e.g., NVRAM) from the memory provided by the first memory chip 402 of system 400, and it can be used as memory for the GPU 408 and the main processor 110 of the SoC 406 via the memory controller 206 and the bus 202. Further, the accelerator chip 404 can use the memory 204 in some embodiments and situations via the first memory chip 402 and the bus 202. In such examples, the first memory chip 402 can include a cache for the accelerator chip 404 and the memory 204. And, memory 204 can be used as memory for non-application-specific tasks or application-specific tasks (such as non-AI tasks or AI tasks) not performed by the accelerator chip 404 for the GPU 408 and the main processor 110. Data for such tasks can be accessed and communicated to and from memory 204 via memory controller 206 and/or bus 202. - In some embodiments,
memory 204 is main memory of a device, such as a device that hosts system 700. For example, with the system 700, memory 204 can be the main memory 808 shown in FIG. 9. - Embodiments of accelerator chips disclosed herein (e.g., see
accelerator chip 102 and accelerator chip 404 shown in FIGS. 1-3 and 4-7 respectively) can be microprocessor chips or SoCs or the like. The embodiments of the accelerator chips can be designed for hardware acceleration of AI applications, including artificial neural networks, machine vision, and machine learning. In some embodiments, an accelerator chip (e.g., an AI accelerator chip) can be configured to perform numerical calculations on vectors and matrices. In such embodiments, the accelerator chip can include a vector processor to perform numerical calculations on vectors and matrices (e.g., see vector processors 112 and 412 shown in FIGS. 1-3 and 4-7 respectively, which can be configured to perform the numerical calculations on vectors and matrices). - Embodiments of accelerator chips disclosed herein can be or include an ASIC or FPGA. With ASIC embodiments of the accelerator chip, the accelerator chip is specifically hardwired for acceleration of application-specific computations (such as AI computations). In some other embodiments, the accelerator chip can be a modified FPGA or GPU, modified for acceleration of application-specific computations (such as AI computations) beyond an unmodified FPGA or GPU. In some other embodiments, the accelerator chip can be an unmodified FPGA or GPU.
- An ASIC described herein can include an IC customized for a particular use or application, such as acceleration of application-specific computations (such as AI computations). This is different from general-purpose use, which is usually implemented by a CPU or another type of general-purpose processor, such as a GPU, which is generally for processing graphics.
- An FPGA described herein can be included in an IC designed and/or configured after manufacturing of the IC and FPGA; thus, the IC and FPGA are field-programmable. An FPGA configuration can be specified using a hardware description language (HDL). Likewise, an ASIC configuration can be specified using an HDL.
- A GPU described herein can include an IC configured to rapidly manipulate and alter memory to accelerate the generation and updating of images in a frame buffer to be outputted to a display device. And, systems described herein can include a display device connected to the GPU and a frame buffer connected to the display device and GPU. GPUs described herein can be a part of an embedded system, mobile device, personal computer, workstation, or game console, or any device connected to and using a display device.
- Embodiments of microprocessor chips described herein are each one or more integrated circuits that incorporate at least the functionality of a central processing unit. Each microprocessor chip can be multipurpose and include at least a clock and registers, accepting binary data as input and processing the data using the registers and clock according to instructions stored in memory connected to the microprocessor chip. Upon processing the data, the microprocessor chip can provide results of the input and instructions as output. And, the output can be provided to the memory connected to the microprocessor chip.
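The fetch-process-write-back behavior described for a microprocessor can be sketched as a toy instruction loop. The three-operation instruction set below is invented purely for illustration; it is not any instruction set from the disclosure:

```python
# Toy fetch-decode-execute loop: instructions are read from attached
# memory, processed using a register, and results written back to memory.

def run(program, memory):
    """Execute (op, operand) instructions against one register and memory."""
    acc = 0  # accumulator register
    for op, arg in program:
        if op == "LOAD":     # acc <- memory[arg]
            acc = memory[arg]
        elif op == "ADD":    # acc <- acc + memory[arg]
            acc += memory[arg]
        elif op == "STORE":  # memory[arg] <- acc
            memory[arg] = acc
    return memory

mem = {0: 5, 1: 7, 2: 0}
program = [("LOAD", 0), ("ADD", 1), ("STORE", 2)]
print(run(program, mem)[2])  # 12
```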
- Embodiments of SoCs described herein are each one or more integrated circuits that integrate components of a computer or other electronic system. In some embodiments, the SoC is a single IC. In other embodiments, the SoC can include separated and connected integrated circuits. In some embodiments, the SoC can include its own CPU, memory, input/output ports, secondary storage, or any combination thereof. Such parts can be on a single substrate or microprocessor chip in a SoC described herein. In some embodiments, the SoC is smaller than a quarter, a nickel, or a dime. Some embodiments of the SoCs can be a part of a mobile device (such as a smartphone or tablet computer), an embedded system, or a device in the Internet of Things. In general, SoCs are different from systems having a motherboard-based architecture, which separates components based on function and connects them through a central interfacing circuit board.
- Embodiments of memory chips described herein that are connected directly to an accelerator chip (e.g., an AI accelerator chip), e.g., see
first memory chip 104 shown in FIGS. 1-3 or first memory chip 402 shown in FIGS. 4-7, are also referred to herein as application-specific memory chips for the sake of clarity when describing multiple memory chips of the overall system. The application-specific memory chips described herein are not necessarily hardwired specifically for application-specific computations (such as AI computations). Each of the application-specific memory chips can be a DRAM chip or a NVRAM chip, or a memory device with similar functionality to either a DRAM chip or a NVRAM chip. And, each of the application-specific memory chips can be connected directly to an accelerator chip (e.g., an AI accelerator chip), e.g., see accelerator chip 102 shown in FIGS. 1-3 and accelerator chip 404 shown in FIGS. 4-7, and can have memory units or cells specifically for the acceleration of application-specific computations (such as AI computations) by the accelerator chip after the application-specific memory chip is configured by the accelerator chip or a separate SoC or processor (e.g., see SoCs 106 and 406 shown in FIGS. 1-3 and 4-7 respectively). - DRAM chips described herein can include random access memory that stores each bit of data in a memory cell or unit having a capacitor and a transistor (such as a MOSFET). DRAM chips described herein can take the form of an IC chip and include billions of DRAM memory units or cells. In each unit or cell, the capacitor can either be charged or discharged, providing two states used to represent the two values of a bit. The electric charge on the capacitor slowly leaks from the capacitor, so an external memory refresh circuit that periodically rewrites the data in the capacitor is needed to maintain the state of the capacitor and the memory unit. DRAM is volatile memory, not non-volatile memory such as flash memory or NVRAM, in that it loses its data quickly when power is removed.
A benefit of a DRAM chip is that it can be used in digital electronics requiring low-cost and high-capacity computer memory. DRAM is also beneficial to use as main memory or memory for a GPU specifically.
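The leak-and-refresh behavior of a DRAM cell described above can be illustrated with a toy simulation. The decay rate and sensing threshold below are arbitrary illustrative values, not characteristics of any actual DRAM device:

```python
# Toy simulation of a DRAM cell: capacitor charge decays over time, so a
# refresh circuit must periodically rewrite the value before the charge
# falls below the sensing threshold. Numbers here are illustrative only.

THRESHOLD = 0.5  # below this, a stored 1 can no longer be sensed

class DramCell:
    def __init__(self, bit):
        self.bit = bit
        self.charge = 1.0 if bit else 0.0

    def leak(self, steps=1):
        self.charge *= 0.8 ** steps  # capacitor charge leaks away

    def read(self):
        return 1 if self.charge > THRESHOLD else 0

    def refresh(self):
        # Rewrite the sensed value at full charge, as a refresh circuit does.
        self.bit = self.read()
        self.charge = 1.0 if self.bit else 0.0

cell = DramCell(1)
cell.leak(2)        # charge 0.64, still readable as 1
cell.refresh()      # restored to full charge
cell.leak(5)        # no refresh: 0.8**5 is ~0.33, so the stored 1 is lost
print(cell.read())  # 0
```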
- NVRAM chips described herein can include random-access memory that is non-volatile, which is a main differentiating feature from DRAM. An example of NVRAM units or cells that can be used in embodiments described herein can include 3D XPoint units or cells. In a 3D XPoint unit or cell, bit storage is based on a change of bulk resistance, in conjunction with a stackable cross-gridded data access array.
- Embodiments of SoCs described herein can include a main processor (such as a CPU or a main processor including a CPU). For example, see
SoC 106 depicted in FIGS. 1-3 and SoC 406 depicted in FIGS. 4-7, as well as main processor 110 shown in FIGS. 1-7. In such embodiments, a GPU in the SoC (e.g., see GPU 108 shown in FIGS. 1-3 and GPU 408 shown in FIGS. 4-7) can run instructions for application-specific tasks and computations (such as AI tasks and computations) and the main processor can run instructions for non-application-specific tasks and computations (such as non-AI tasks and computations). And, in such embodiments, the accelerator chip connected to the SoC (e.g., see any one of the accelerator chips shown in FIGS. 1-7) can provide acceleration of application-specific tasks and computations (such as AI tasks and computations) for the GPU specifically. Each one of the embodiments of SoCs described herein can include its own bus for connecting components of the SoC to each other (such as connecting the main processor and the GPU). Also, a bus of a SoC can be configured to connect the SoC to a bus external to the SoC so that the components of the SoC can couple with chips and devices external to the SoC, such as a separate memory or memory chip (e.g., see memory 204 depicted in FIGS. 2-3 and 5-7 as well as main memory 808 depicted in FIGS. 8-9). - The non-application-specific computations and tasks (e.g., non-AI computations and tasks) of the GPU, or application-specific computations and tasks (e.g., AI computations and tasks) not using the accelerator chip, which may not be conventional tasks performed by the main processor, can use separate memory such as a separate memory chip (which can be application-specific memory), and the memory can be implemented by DRAM, NVRAM, flash memory, or any combination thereof. For example, see
memory 204 depicted in FIGS. 2-3 and 5-7 as well as main memory 808 depicted in FIGS. 8-9. A separate memory or memory chip can be connected to the SoC and the main processor (e.g., CPU) via a bus external to the SoC (e.g., see memory 204 depicted in FIGS. 2-3 and 5-7 as well as main memory 808 depicted in FIGS. 8-9; and see bus 202 depicted in FIGS. 2-3 and 5-7 as well as buses 804 depicted in FIGS. 8-9). In such embodiments, the separate memory or memory chip can have memory units specifically for the main processor. Also, the separate memory or memory chip can be connected to the SoC and the GPU via the bus external to the SoC. In such embodiments, the separate memory or memory chip can have memory units or cells for the main processor or the GPU. - It is to be understood for the purposes of this disclosure that an application-specific memory or memory chip described herein (e.g., see
first memory chip 104 shown in FIGS. 1-3 or first memory chip 402 shown in FIGS. 4-7) and a separate memory or memory chip described herein (e.g., see memory 204 depicted in FIGS. 2-3 and 5-7 as well as main memory 808 depicted in FIGS. 8-9) can each be substituted with a group of memory chips, such as a string of memory chips (e.g., see the strings of memory chips shown in FIGS. 10 and 11). For example, the separate memory or memory chip can be substituted by a string of memory chips that includes at least a NVRAM chip and a flash memory chip downstream of the NVRAM chip. Also, the separate memory chip can be substituted by at least two memory chips, where one of the chips is for the main processor (e.g., CPU) and the other chip is for the GPU for use as memory for non-AI computations and/or tasks.
- Embodiments of memory chips described herein can be part of main memory and/or can be computer hardware that stores information for immediate use in a computer or for immediate use by any one of the processors described herein (e.g., any SoC or accelerator chip described herein). The memory chips described herein can operate at a higher speed than computer storage. Computer storage provides slower speeds for accessing information, but can also provide higher capacities and better data reliability. The memory chips described herein can include RAM, a type of memory that can have high operation speeds. The memory can be made up of addressable semiconductor memory units or cells, and its units or cells can be at least partially implemented by MOSFETs.
- Additionally, at least some embodiments disclosed herein relate to an accelerator chip (e.g., an AI accelerator chip) having a vector processor (e.g., see
vector processors depicted in FIGS. 1-3 and 4-7, respectively). And, at least some embodiments disclosed herein relate to using memory hierarchy and a string of memory chips to form a memory (e.g., see FIGS. 10 and 11).
- Embodiments of vector processors described herein are each an IC that can implement an instruction set containing instructions that operate on one-dimensional arrays of data called vectors or on multidimensional arrays of data called matrices. Vector processors are different from scalar processors, whose instructions operate on single data items. In some embodiments, a vector processor can go beyond merely pipelining instructions and pipeline the data itself. Pipelining can include a process where instructions, or in the case of a vector processor, the data itself, pass through multiple sub-units in turn. In some embodiments, the vector processor is fed instructions that apply an arithmetic operation to a vector or matrix of numbers simultaneously. Instead of continually having to decode instructions and then fetch the data needed to complete them, the vector processor reads a single instruction from memory, and it is simply implied in the definition of the instruction itself that the instruction will operate again on another item of data, at an address one increment larger than the last. This allows for significant savings in decoding time.
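The decoding savings described above can be illustrated with a small cost model. This sketch is not part of the disclosure; the cost constants and function names are hypothetical, and Python stands in for hardware behavior:

```python
# Hypothetical cost model contrasting scalar and vector execution.
# Assumed (illustrative) cycle counts, not measurements of any device.
DECODE_COST = 4   # cycles to decode one instruction
ADD_COST = 1      # cycles per element-wise addition

def scalar_cost(n):
    # Scalar model: one instruction decode for every single data item.
    return n * (DECODE_COST + ADD_COST)

def vector_cost(n):
    # Vector model: one decode, then the operand address is implicitly
    # incremented for each of the n elements of the vector.
    return DECODE_COST + n * ADD_COST

def vector_add(a, b):
    # The operation itself: one instruction over whole vectors.
    return [x + y for x, y in zip(a, b)]

print(vector_add([1, 2, 3], [4, 5, 6]))   # [5, 7, 9]
print(scalar_cost(1024), vector_cost(1024))
```

Amortizing one decode over the whole vector is what makes `vector_cost` grow far more slowly than `scalar_cost` as the vector length increases.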
-
FIG. 8 illustrates an example arrangement of parts of an example computing device 800, in accordance with some embodiments of the present disclosure. The example arrangement of parts of the computing device 800 can include system 100 shown in FIG. 1, system 200 shown in FIG. 2, system 400 shown in FIG. 4, system 500 shown in FIG. 5, and system 600 shown in FIG. 6. In the computing device 800, application-specific components (e.g., see application-specific components 807 in FIG. 8), which can be AI components, can include the first memory chip and the accelerator chip shown in FIGS. 1, 2, 4, 5, and 6, respectively, as well as the SoC shown in FIGS. 1, 2, 4, 5, and 6, respectively. In the computing device 800, wiring directly connects components of the application-specific components to each other (e.g., see wiring 124 and 424 as well as wiring 614 shown in FIGS. 1-2 and 4-6, respectively). And, in computing device 800, wiring directly connects the application-specific components to the SoC (e.g., see wiring 817 that directly connects the application-specific components to SoC 806). The wiring that directly connects the application-specific components to the SoC can include wiring 126 as shown in FIGS. 1 and 2 or wiring 426 as shown in FIGS. 4 and 5. Also, the wiring that directly connects the application-specific components to the SoC can include wiring 616 as shown in FIG. 6.
- The
computing device 800 can be communicatively coupled to other computing devices via the computer network 802 as shown in FIG. 8. The computing device 800 includes at least buses 804 (which can be one or more buses, such as a combination of a memory bus and a peripheral bus), a SoC 806 (which can be or include SoC 106 or 406), application-specific components 807 (which can be accelerator chip 102 and first memory chip 104, or first memory chip 402 and accelerator chip 404), and a main memory 808 (which can be or include memory 204), as well as a network interface 810 and a data storage system 812. The buses 804 communicatively couple the SoC 806, the main memory 808, the network interface 810, and the data storage system 812. And, the buses 804 can include bus 202 and/or a point-to-point memory connection such as the wiring described herein. The computing device 800 includes a computer system that includes at least one or more processors in the SoC 806, main memory 808 (e.g., read-only memory (ROM), flash memory, DRAM such as synchronous DRAM (SDRAM) or Rambus DRAM (RDRAM), NVRAM, SRAM, etc.), and data storage system 812, which communicate with each other via buses 804 (which can include one or more buses and wirings).
- The main memory 808 (which can be, include, or be included in the memory 204) can include the
memory string 1000 depicted in FIG. 10. Also, the main memory 808 can include the memory string 1100 depicted in FIG. 11. In some embodiments, the data storage system 812 can include the memory string 1000 or the memory string 1100.
-
SoC 806 can include one or more general-purpose processing devices such as a microprocessor, a CPU, or the like. Also, the SoC 806 can include one or more special-purpose processing devices such as a GPU, an ASIC, a FPGA, a digital signal processor (DSP), a network processor, a processor in memory (PIM), or the like. The SoC 806 can include one or more processors with a complex instruction set computing (CISC) microprocessor, a reduced instruction set computing (RISC) microprocessor, a very long instruction word (VLIW) microprocessor, a processor implementing other instruction sets, or processors implementing a combination of instruction sets. The processors of the SoC 806 can be configured to execute instructions for performing the operations and steps discussed herein. SoC 806 can further include a network interface device such as network interface 810 to communicate over one or more communications networks such as network 802.
- The
data storage system 812 can include a machine-readable storage medium (also known as a computer-readable medium) on which is stored one or more sets of instructions or software embodying any one or more of the methodologies or functions described herein. The instructions can also reside, completely or at least partially, within the main memory 808 and/or within one or more of the processors of the SoC 806 during execution thereof by the computer system, with the main memory 808 and the one or more processors of the SoC 806 also constituting machine-readable storage media.
- While the memory, processor, and data storage parts are shown in the example embodiment to each be a single part, each part should be taken to include a single part or multiple parts that can store the instructions and perform their respective operations. The term "machine-readable storage medium" shall also be taken to include any medium that is capable of storing or encoding a set of instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the present disclosure. The term "machine-readable storage medium" shall accordingly be taken to include, but not be limited to, solid-state memories, optical media, and magnetic media.
-
FIG. 9 illustrates another example arrangement of parts of an example computing device 900, in accordance with some embodiments of the present disclosure. The example arrangement of parts of the computing device 900 can include system 300 shown in FIG. 3 as well as system 700 shown in FIG. 7. In the computing device 900, application-specific components (e.g., see application-specific components 807 in FIG. 9), which can be AI components, can include the first memory chip and the accelerator chip shown in FIGS. 3 and 7, respectively, as well as the SoC shown in FIGS. 3 and 7, respectively. In the computing device 900, wiring directly connects components of the application-specific components to each other (e.g., see wiring 124 and 424 shown in FIGS. 3 and 7, respectively). However, in computing device 900, wiring does not directly connect the application-specific components to the SoC. Instead, in computing device 900, one or more buses connect the application-specific components to the SoC (e.g., see buses 804 as configured and shown in FIG. 9 as well as bus 202 as configured and shown in FIGS. 3 and 7).
- As shown by
FIGS. 8 and 9, the computing device 900 can be communicatively coupled to other computing devices via the computer network 802 as shown in FIG. 9. Similarly, as shown in FIG. 9, computing device 900 includes at least buses 804 (which can be one or more buses, such as a combination of a memory bus and a peripheral bus), SoC 806 (which can be or include SoC 106 or 406), application-specific components 807 (which can be accelerator chip 102 and first memory chip 104, or first memory chip 402 and accelerator chip 404), and main memory 808 (which can be or include memory 204), as well as network interface 810 and data storage system 812. Similarly, the buses 804 communicatively couple the SoC 806, the main memory 808, the network interface 810, and the data storage system 812. And, the buses 804 can include bus 202 and/or a point-to-point memory connection such as the wiring described herein.
- As mentioned, at least some embodiments disclosed herein relate to using memory hierarchy and a string of memory chips to form a memory.
-
FIGS. 10 and 11 illustrate example strings of memory chips that can implement the memory depicted in FIGS. 2-3 and 5-7 (i.e., memory 204).
- In
FIG. 10, the memory chip string 1000 includes a first memory chip 1002 and a second memory chip 1004. The first memory chip 1002 is directly wired to the second memory chip 1004 (e.g., see wiring 1022) and is configured to interact directly with the second memory chip. Each chip in the memory chip string 1000 can include one or more sets of pins for connecting to an upstream chip and/or downstream chip in the string (e.g., see sets of pins 1012 and 1014). In some embodiments, each chip in the memory chip string 1000 can include a single IC enclosed within an IC package.
- As shown in
FIG. 10, set of pins 1012 is part of first memory chip 1002 and connects first memory chip 1002 to second memory chip 1004 via wiring 1022 and set of pins 1014 that is part of second memory chip 1004. The wiring 1022 connects the two sets of pins 1012 and 1014.
- In some embodiments, the
second memory chip 1004 can have the lowest memory bandwidth of the chips in the string 1000. In such embodiments and others, the first memory chip 1002 can have the highest memory bandwidth of the chips in the string 1000. In some embodiments, the first memory chip 1002 is or includes a DRAM chip. In some embodiments, the first memory chip 1002 is or includes a NVRAM chip. In some embodiments, the second memory chip 1004 is or includes a DRAM chip. In some embodiments, the second memory chip 1004 is or includes a NVRAM chip. And, in some embodiments, the second memory chip 1004 is or includes a flash memory chip.
- In
FIG. 11, the memory chip string 1100 includes a first memory chip 1102, a second memory chip 1104, and a third memory chip 1106. The first memory chip 1102 is directly wired to the second memory chip 1104 (e.g., see wiring 1122) and is configured to interact directly with the second memory chip. The second memory chip 1104 is directly wired to the third memory chip 1106 (e.g., see wiring 1124) and is configured to interact directly with the third memory chip. In such ways, the first and third memory chips 1102 and 1106 are connected to each other via the second memory chip 1104.
- Each chip in the
memory chip string 1100 can include one or more sets of pins for connecting to an upstream chip and/or downstream chip in the string (e.g., see sets of pins 1112, 1114, 1116, and 1118). In some embodiments, each chip in the memory chip string 1100 can include a single IC enclosed within an IC package.
- As shown in
FIG. 11, set of pins 1112 is part of first memory chip 1102 and connects first memory chip 1102 to second memory chip 1104 via wiring 1122 and set of pins 1114 that is part of second memory chip 1104. The wiring 1122 connects the two sets of pins 1112 and 1114. Also, set of pins 1116 is part of second memory chip 1104 and connects second memory chip 1104 to third memory chip 1106 via wiring 1124 and set of pins 1118 that is part of third memory chip 1106. The wiring 1124 connects the two sets of pins 1116 and 1118.
- In some embodiments, the
third memory chip 1106 can have the lowest memory bandwidth of the chips in the string 1100. In such embodiments and others, the first memory chip 1102 can have the highest memory bandwidth of the chips in the string 1100. Also, in such embodiments and others, the second memory chip 1104 can have the next highest memory bandwidth of the chips in the string 1100. In some embodiments, the first memory chip 1102 is or includes a DRAM chip. In some embodiments, the first memory chip 1102 is or includes a NVRAM chip. In some embodiments, the second memory chip 1104 is or includes a DRAM chip. In some embodiments, the second memory chip 1104 is or includes a NVRAM chip. In some embodiments, the second memory chip 1104 is or includes a flash memory chip. In some embodiments, the third memory chip 1106 is or includes a NVRAM chip. And, in some embodiments, the third memory chip 1106 is or includes a flash memory chip.
- In embodiments having one or more DRAM chips, a DRAM chip can include a logic circuit for command and address decoding as well as arrays of memory units of DRAM. Also, a DRAM chip described herein can include a cache or buffer memory for incoming and/or outgoing data. In some embodiments, the memory units that implement the cache or buffer memory can be different from the DRAM units on the chip hosting the cache or buffer memory. For example, the memory units that implement the cache or buffer memory on the DRAM chip can be memory units of SRAM.
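As a purely illustrative sketch (not part of the disclosure), a string such as the three-chip string described above can be modeled as chips wired in a chain, where a request enters the first (highest-bandwidth) chip and misses are forwarded downstream; the class and field names are hypothetical:

```python
# Hypothetical model of a string of memory chips: each chip serves the
# addresses it holds and forwards misses over its downstream wiring.
class ChipInString:
    def __init__(self, name, data, downstream=None):
        self.name = name
        self.data = data              # address -> value stored on this chip
        self.downstream = downstream  # next chip in the string, if any

    def read(self, address):
        if address in self.data:
            return self.name, self.data[address]
        if self.downstream is not None:
            # Miss: pass the request to the next chip in the string.
            return self.downstream.read(address)
        raise KeyError(address)

# Three-chip string: highest-bandwidth chip first, lowest last.
third = ChipInString("third", {0x30: "coldest"})
second = ChipInString("second", {0x20: "colder"}, downstream=third)
first = ChipInString("first", {0x10: "hot"}, downstream=second)

print(first.read(0x10))  # ('first', 'hot')
print(first.read(0x30))  # ('third', 'coldest')
```

The point of the chain shape is that the first and third chips never talk to each other directly; a request for the third chip's data travels through the second chip's wiring, as in the string of FIG. 11.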
- In embodiments having one or more NVRAM chips, a NVRAM chip can include a logic circuit for command and address decoding as well as arrays of memory units of NVRAM such as units of 3D XPoint memory. Also, a NVRAM chip described herein can include a cache or buffer memory for incoming and/or outgoing data. In some embodiments, the memory units that implement the cache or buffer memory can be different from the NVRAM units on the chip hosting the cache or buffer memory. For example, the memory units that implement the cache or buffer memory on the NVRAM chip can be memory units of SRAM.
- In some embodiments, NVRAM chips can include a cross-point array of non-volatile memory cells. A cross-point array of non-volatile memory can perform bit storage based on a change of bulk resistance, in conjunction with a stackable cross-gridded data access array. Additionally, in contrast to many flash-based memories, cross-point non-volatile memory can perform a write in-place operation, where a non-volatile memory cell can be programmed without the non-volatile memory cell being previously erased.
- As mentioned herein, NVRAM chips can be or include cross point storage and memory devices (e.g., 3D XPoint memory). A cross point memory device uses transistor-less memory elements, each of which has a memory cell and a selector that are stacked together as a column. Memory element columns are connected via two perpendicular layers of wires, where one layer is above the memory element columns and the other layer is below the memory element columns. Each memory element can be individually selected at a cross point of one wire on each of the two layers. Cross point memory devices are fast and non-volatile and can be used as a unified memory pool for processing and storage.
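The selection and write-in-place behavior described above can be sketched as follows. This is an illustrative model only (not part of the disclosure), and the class and method names are hypothetical:

```python
# Hypothetical sketch of cross-point addressing: each memory element sits
# at the intersection of one top-layer wire and one bottom-layer wire.
class CrossPointArray:
    def __init__(self, top_wires, bottom_wires):
        # One storage location per (top, bottom) wire pair.
        self.cells = {(t, b): 0
                      for t in range(top_wires)
                      for b in range(bottom_wires)}

    def write(self, top, bottom, value):
        # Write in place: the selected cell is programmed directly,
        # with no prior block-erase step (unlike many flash memories).
        self.cells[(top, bottom)] = value

    def read(self, top, bottom):
        # Selecting one wire on each layer addresses exactly one cell.
        return self.cells[(top, bottom)]

array = CrossPointArray(top_wires=4, bottom_wires=4)
array.write(1, 2, 7)
array.write(1, 2, 9)     # overwrite in place, no erase needed
print(array.read(1, 2))  # 9
```

The two consecutive writes to the same cell illustrate the contrast with erase-before-write flash behavior: the second write simply replaces the first.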
- In embodiments having one or more flash memory chips, a flash memory chip can include a logic circuit for command and address decoding as well as arrays of memory units of flash memory such as units of NAND-type flash memory. Also, a flash memory chip described herein can include a cache or buffer memory for incoming and/or outgoing data. In some embodiments, the memory units that implement the cache or buffer memory can be different from the flash memory units on the chip hosting the cache or buffer memory. For example, the memory units that implement the cache or buffer memory on the flash memory chip can be memory units of SRAM.
- Also, for example, an embodiment of the string of memory chips can include DRAM to DRAM to NVRAM, or DRAM to NVRAM to NVRAM, or DRAM to flash memory to flash memory; however, DRAM to NVRAM to flash memory may provide a more effective solution for a string of memory chips being flexibly provisioned as multi-tier memory.
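As an illustrative sketch of flexible multi-tier provisioning (not part of the disclosure, with a hypothetical placement policy and hypothetical names), hotter data can be kept on the faster chips of a DRAM to NVRAM to flash string:

```python
# Hypothetical greedy placement over a DRAM -> NVRAM -> flash string;
# the tier order models highest to lowest memory bandwidth.
TIERS = ["DRAM", "NVRAM", "flash"]

def place(access_counts, capacity_per_tier=2):
    # Rank addresses from most to least frequently accessed, then fill
    # each tier in order, spilling colder data toward the flash chip.
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    placement = {}
    for i, addr in enumerate(ranked):
        tier = min(i // capacity_per_tier, len(TIERS) - 1)
        placement[addr] = TIERS[tier]
    return placement

counts = {"a": 90, "b": 70, "c": 20, "d": 10, "e": 3, "f": 1}
print(place(counts))
# {'a': 'DRAM', 'b': 'DRAM', 'c': 'NVRAM', 'd': 'NVRAM', 'e': 'flash', 'f': 'flash'}
```

A real memory controller would make this decision continuously and in hardware; the sketch only shows why a DRAM to NVRAM to flash ordering gives a natural hot-to-cold gradient for multi-tier provisioning.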
- Also, for the purposes of this disclosure, it is to be understood that DRAM, NVRAM, 3D XPoint memory, and flash memory are techniques for implementing individual memory units, and that a memory chip for any one of the memory chips described herein can include a logic circuit for command and address decoding as well as arrays of memory units of DRAM, NVRAM, 3D XPoint memory, or flash memory. For example, a DRAM chip described herein includes a logic circuit for command and address decoding as well as an array of memory units of DRAM. For example, a NVRAM chip described herein includes a logic circuit for command and address decoding as well as an array of memory units of NVRAM. For example, a flash memory chip described herein includes a logic circuit for command and address decoding as well as an array of memory units of flash memory.
- Also, a memory chip for any one of the memory chips described herein can include a cache or buffer memory for incoming and/or outgoing data. In some embodiments, the memory units that implement the cache or buffer memory may be different from the units on the chip hosting the cache or buffer memory. For example, the memory units that implement the cache or buffer memory can be memory units of SRAM.
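A chip-side buffer of the kind described above can be sketched as a small fast store drained into a slower array. The model is illustrative only (not part of the disclosure); the names, the buffer size, and the flush policy are hypothetical:

```python
# Hypothetical sketch of an on-chip buffer (e.g., SRAM units) sitting in
# front of a slower main array (e.g., DRAM, NVRAM, or flash units).
class BufferedChip:
    def __init__(self, buffer_size=4):
        self.buffer = {}          # small, fast buffer for in-flight data
        self.array = {}           # larger, slower main memory array
        self.buffer_size = buffer_size

    def write(self, address, value):
        # Incoming data lands in the buffer first.
        self.buffer[address] = value
        if len(self.buffer) >= self.buffer_size:
            self.flush()

    def flush(self):
        # Drain buffered writes into the main memory array.
        self.array.update(self.buffer)
        self.buffer.clear()

    def read(self, address):
        # Outgoing data is served from the buffer when present there.
        if address in self.buffer:
            return self.buffer[address]
        return self.array[address]

chip = BufferedChip()
chip.write(0x1, b"a")
print(chip.read(0x1))   # served from the buffer before any flush
chip.flush()
print(chip.read(0x1))   # served from the main array after draining
```

The sketch shows why the buffer's memory units can differ from the host chip's units: the buffer is optimized for speed on in-flight data, while the main array is optimized for capacity.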
- In the foregoing specification, embodiments of the disclosure have been described with reference to specific example embodiments thereof. It will be evident that various modifications can be made thereto without departing from the broader spirit and scope of embodiments of the disclosure as set forth in the following claims. The specification and drawings are, accordingly, to be regarded in an illustrative sense rather than a restrictive sense.
Claims (20)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/837,565 US20220300437A1 (en) | 2019-09-17 | 2022-06-10 | Memory chip connecting a system on a chip and an accelerator chip |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/573,805 US11397694B2 (en) | 2019-09-17 | 2019-09-17 | Memory chip connecting a system on a chip and an accelerator chip |
US17/837,565 US20220300437A1 (en) | 2019-09-17 | 2022-06-10 | Memory chip connecting a system on a chip and an accelerator chip |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/573,805 Continuation US11397694B2 (en) | 2019-09-17 | 2019-09-17 | Memory chip connecting a system on a chip and an accelerator chip |
Publications (1)
Publication Number | Publication Date |
---|---|
US20220300437A1 true US20220300437A1 (en) | 2022-09-22 |
Family
ID=74869510
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/573,805 Active US11397694B2 (en) | 2019-09-17 | 2019-09-17 | Memory chip connecting a system on a chip and an accelerator chip |
US17/837,565 Pending US20220300437A1 (en) | 2019-09-17 | 2022-06-10 | Memory chip connecting a system on a chip and an accelerator chip |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/573,805 Active US11397694B2 (en) | 2019-09-17 | 2019-09-17 | Memory chip connecting a system on a chip and an accelerator chip |
Country Status (8)
Country | Link |
---|---|
US (2) | US11397694B2 (en) |
EP (1) | EP4032032A4 (en) |
JP (1) | JP2022548641A (en) |
KR (1) | KR20220041226A (en) |
CN (1) | CN114402308A (en) |
AU (1) | AU2020349448A1 (en) |
TW (1) | TW202117551A (en) |
WO (1) | WO2021055280A1 (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11416422B2 (en) | 2019-09-17 | 2022-08-16 | Micron Technology, Inc. | Memory chip having an integrated data mover |
US20230051863A1 (en) * | 2021-08-10 | 2023-02-16 | Micron Technology, Inc. | Memory device for wafer-on-wafer formed memory and logic |
TWI819480B (en) | 2022-01-27 | 2023-10-21 | 緯創資通股份有限公司 | Acceleration system and dynamic configuration method thereof |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030023958A1 (en) * | 2001-07-17 | 2003-01-30 | Patel Mukesh K. | Intermediate language accelerator chip |
US20140359219A1 (en) * | 2013-05-31 | 2014-12-04 | Altera Corporation | Cache Memory Controller for Accelerated Data Transfer |
US20180107406A1 (en) * | 2016-10-14 | 2018-04-19 | Snu R&Db Foundation | Memory module, memory device, and processing device having a processor mode, and memory system |
US20190057302A1 (en) * | 2017-08-16 | 2019-02-21 | SK Hynix Inc. | Memory device including neural network processor and memory system including the memory device |
US20190057303A1 (en) * | 2017-08-18 | 2019-02-21 | Microsoft Technology Licensing, Llc | Hardware node having a mixed-signal matrix vector unit |
US20190273782A1 (en) * | 2016-04-06 | 2019-09-05 | Reniac, Inc. | System and method for a database proxy |
US20200042247A1 (en) * | 2018-08-06 | 2020-02-06 | Samsung Electronics Co., Ltd. | Memory device and memory system including the same |
US10649672B1 (en) * | 2016-03-31 | 2020-05-12 | EMC IP Holding Company LLC | Offloading device maintenance to an external processor in low-latency, non-volatile memory |
US10659672B2 (en) * | 2015-02-17 | 2020-05-19 | Alpinereplay, Inc. | Systems and methods to control camera operations |
Family Cites Families (52)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030112613A1 (en) | 2002-10-22 | 2003-06-19 | Hitachi, Ltd. | IC card |
JP2003006041A (en) | 2001-06-20 | 2003-01-10 | Hitachi Ltd | Semiconductor device |
US20030212845A1 (en) | 2002-05-07 | 2003-11-13 | Court John William | Method for high-speed data transfer across LDT and PCI buses |
US20050086040A1 (en) | 2003-10-02 | 2005-04-21 | Curtis Davis | System incorporating physics processing unit |
US7895411B2 (en) | 2003-10-02 | 2011-02-22 | Nvidia Corporation | Physics processing unit |
US7739479B2 (en) | 2003-10-02 | 2010-06-15 | Nvidia Corporation | Method for providing physics simulation data |
US7210008B2 (en) | 2003-12-18 | 2007-04-24 | Intel Corporation | Memory controller for padding and stripping data in response to read and write commands |
US7185153B2 (en) | 2003-12-18 | 2007-02-27 | Intel Corporation | Packet assembly |
US7206915B2 (en) | 2004-06-03 | 2007-04-17 | Emc Corp | Virtual space manager for computer having a physical address extension feature |
US7406634B2 (en) | 2004-12-02 | 2008-07-29 | Cisco Technology, Inc. | Method and apparatus for utilizing an exception handler to avoid hanging up a CPU when a peripheral device does not respond |
US20070165457A1 (en) | 2005-09-30 | 2007-07-19 | Jin-Ki Kim | Nonvolatile memory system |
US7600081B2 (en) * | 2006-01-18 | 2009-10-06 | Marvell World Trade Ltd. | Processor architecture having multi-ported memory |
US9195602B2 (en) | 2007-03-30 | 2015-11-24 | Rambus Inc. | System including hierarchical memory modules having different types of integrated circuit memory devices |
US7627744B2 (en) | 2007-05-10 | 2009-12-01 | Nvidia Corporation | External memory accessing DMA request scheduling in IC of parallel processing engines according to completion notification queue occupancy level |
US8077644B2 (en) | 2007-07-20 | 2011-12-13 | Infineon Technologies Ag | Data transfer in a computing device |
US20090063786A1 (en) | 2007-08-29 | 2009-03-05 | Hakjune Oh | Daisy-chain memory configuration and usage |
US7721010B2 (en) | 2007-10-31 | 2010-05-18 | Qimonda North America Corp. | Method and apparatus for implementing memory enabled systems using master-slave architecture |
US20100217977A1 (en) | 2009-02-23 | 2010-08-26 | William Preston Goodwill | Systems and methods of security for an object based storage device |
US8219746B2 (en) | 2009-10-08 | 2012-07-10 | International Business Machines Corporation | Memory package utilizing at least two types of memories |
US8463984B2 (en) | 2009-12-31 | 2013-06-11 | Seagate Technology Llc | Dynamic data flow management in a multiple cache architecture |
US8595429B2 (en) | 2010-08-24 | 2013-11-26 | Qualcomm Incorporated | Wide input/output memory with low density, low latency and high density, high latency blocks |
US8726107B2 (en) | 2011-07-15 | 2014-05-13 | Seagate Technology Llc | Measurement of latency in data paths |
CN107608910B (en) | 2011-09-30 | 2021-07-02 | 英特尔公司 | Apparatus and method for implementing a multi-level memory hierarchy with different operating modes |
US9256915B2 (en) | 2012-01-27 | 2016-02-09 | Qualcomm Incorporated | Graphics processing unit buffer management |
US9055069B2 (en) | 2012-03-19 | 2015-06-09 | Xcelemor, Inc. | Hardware computing system with software mediation and method of operation thereof |
US9304828B2 (en) | 2012-09-27 | 2016-04-05 | Hitachi, Ltd. | Hierarchy memory management |
US10073626B2 (en) | 2013-03-15 | 2018-09-11 | Virident Systems, Llc | Managing the write performance of an asymmetric memory system |
WO2015099767A1 (en) | 2013-12-27 | 2015-07-02 | Intel Corporation | Scalable input/output system and techniques |
WO2015101827A1 (en) | 2013-12-31 | 2015-07-09 | Mosys, Inc. | Integrated main memory and coprocessor with low latency |
US10445025B2 (en) | 2014-03-18 | 2019-10-15 | Micron Technology, Inc. | Apparatuses and methods having memory tier structure and recursively searching between tiers for address in a translation table where information is only directly transferred between controllers |
US10437479B2 (en) | 2014-08-19 | 2019-10-08 | Samsung Electronics Co., Ltd. | Unified addressing and hierarchical heterogeneous storage and memory |
KR102208072B1 (en) | 2014-09-01 | 2021-01-27 | 삼성전자주식회사 | Data processing system |
US20170017576A1 (en) | 2015-07-16 | 2017-01-19 | Qualcomm Incorporated | Self-adaptive Cache Architecture Based on Run-time Hardware Counters and Offline Profiling of Applications |
US10387303B2 (en) * | 2016-08-16 | 2019-08-20 | Western Digital Technologies, Inc. | Non-volatile storage system with compute engine to accelerate big data applications |
KR20180075913A (en) * | 2016-12-27 | 2018-07-05 | 삼성전자주식회사 | A method for input processing using neural network calculator and an apparatus thereof |
US10261786B2 (en) | 2017-03-09 | 2019-04-16 | Google Llc | Vector processing unit |
US10872290B2 (en) | 2017-09-21 | 2020-12-22 | Raytheon Company | Neural network processor with direct memory access and hardware acceleration circuits |
US11222256B2 (en) * | 2017-10-17 | 2022-01-11 | Xilinx, Inc. | Neural network processing system having multiple processors and a neural network accelerator |
KR102424962B1 (en) | 2017-11-15 | 2022-07-25 | 삼성전자주식회사 | Memory Device performing parallel arithmetic process and Memory Module having the same |
US10860244B2 (en) | 2017-12-26 | 2020-12-08 | Intel Corporation | Method and apparatus for multi-level memory early page demotion |
CN108228387B (en) * | 2017-12-27 | 2019-11-05 | 中兴通讯股份有限公司 | A kind of starting control method, electronic equipment and computer readable storage medium |
US11398453B2 (en) * | 2018-01-09 | 2022-07-26 | Samsung Electronics Co., Ltd. | HBM silicon photonic TSV architecture for lookup computing AI accelerator |
US10956086B2 (en) | 2018-01-29 | 2021-03-23 | Micron Technology, Inc. | Memory controller |
KR20190106228A (en) | 2018-03-08 | 2019-09-18 | 에스케이하이닉스 주식회사 | Memory system and operating method of memory system |
US11562208B2 (en) | 2018-05-17 | 2023-01-24 | Qualcomm Incorporated | Continuous relaxation of quantization for discretized deep neural networks |
US11656775B2 (en) | 2018-08-07 | 2023-05-23 | Marvell Asia Pte, Ltd. | Virtualizing isolation areas of solid-state storage media |
US20190188386A1 (en) | 2018-12-27 | 2019-06-20 | Intel Corporation | Protecting ai payloads running in gpu against main cpu residing adversaries |
US10949356B2 (en) | 2019-06-14 | 2021-03-16 | Intel Corporation | Fast page fault handling process implemented on persistent memory |
US20210081353A1 (en) | 2019-09-17 | 2021-03-18 | Micron Technology, Inc. | Accelerator chip connecting a system on a chip and a memory chip |
US20210081318A1 (en) | 2019-09-17 | 2021-03-18 | Micron Technology, Inc. | Flexible provisioning of multi-tier memory |
US11163490B2 (en) | 2019-09-17 | 2021-11-02 | Micron Technology, Inc. | Programmable engine for data movement |
US11416422B2 (en) | 2019-09-17 | 2022-08-16 | Micron Technology, Inc. | Memory chip having an integrated data mover |
-
2019
- 2019-09-17 US US16/573,805 patent/US11397694B2/en active Active
-
2020
- 2020-09-07 TW TW109130611A patent/TW202117551A/en unknown
- 2020-09-14 EP EP20866396.3A patent/EP4032032A4/en active Pending
- 2020-09-14 AU AU2020349448A patent/AU2020349448A1/en not_active Withdrawn
- 2020-09-14 KR KR1020227008626A patent/KR20220041226A/en unknown
- 2020-09-14 CN CN202080064781.4A patent/CN114402308A/en active Pending
- 2020-09-14 WO PCT/US2020/050713 patent/WO2021055280A1/en unknown
- 2020-09-14 JP JP2022517123A patent/JP2022548641A/en active Pending
-
2022
- 2022-06-10 US US17/837,565 patent/US20220300437A1/en active Pending
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030023958A1 (en) * | 2001-07-17 | 2003-01-30 | Patel Mukesh K. | Intermediate language accelerator chip |
US20140359219A1 (en) * | 2013-05-31 | 2014-12-04 | Altera Corporation | Cache Memory Controller for Accelerated Data Transfer |
US10659672B2 (en) * | 2015-02-17 | 2020-05-19 | Alpinereplay, Inc. | Systems and methods to control camera operations |
US10649672B1 (en) * | 2016-03-31 | 2020-05-12 | EMC IP Holding Company LLC | Offloading device maintenance to an external processor in low-latency, non-volatile memory |
US20190273782A1 (en) * | 2016-04-06 | 2019-09-05 | Reniac, Inc. | System and method for a database proxy |
US20180107406A1 (en) * | 2016-10-14 | 2018-04-19 | Snu R&Db Foundation | Memory module, memory device, and processing device having a processor mode, and memory system |
US20190057302A1 (en) * | 2017-08-16 | 2019-02-21 | SK Hynix Inc. | Memory device including neural network processor and memory system including the memory device |
US20190057303A1 (en) * | 2017-08-18 | 2019-02-21 | Microsoft Technology Licensing, Llc | Hardware node having a mixed-signal matrix vector unit |
US20200042247A1 (en) * | 2018-08-06 | 2020-02-06 | Samsung Electronics Co., Ltd. | Memory device and memory system including the same |
Also Published As
Publication number | Publication date |
---|---|
EP4032032A1 (en) | 2022-07-27 |
KR20220041226A (en) | 2022-03-31 |
TW202117551A (en) | 2021-05-01 |
US20210081337A1 (en) | 2021-03-18 |
AU2020349448A1 (en) | 2022-01-20 |
JP2022548641A (en) | 2022-11-21 |
WO2021055280A1 (en) | 2021-03-25 |
CN114402308A (en) | 2022-04-26 |
US11397694B2 (en) | 2022-07-26 |
EP4032032A4 (en) | 2023-10-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20210081353A1 (en) | | Accelerator chip connecting a system on a chip and a memory chip |
US11599475B2 (en) | | Apparatuses and methods for compute enabled cache |
US11915741B2 (en) | | Apparatuses and methods for logic/memory devices |
US20220300437A1 (en) | | Memory chip connecting a system on a chip and an accelerator chip |
US11468944B2 (en) | | Utilization of data stored in an edge section of an array |
US10725952B2 (en) | | Accessing status information |
US11682449B2 (en) | | Apparatuses and methods for compute in data path |
KR102054335B1 (en) | | Translation index buffer in memory |
US10185674B2 (en) | | Apparatus and methods for in data path compute operations |
US20210181974A1 (en) | | Systems and methods for low-latency memory device |
US20220050639A1 (en) | | Programmable engine for data movement |
CN114945984A | | Extended memory communication |
CN111694513A | | Memory device and method including a circular instruction memory queue |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | AS | Assignment | Owner name: MICRON TECHNOLOGY, INC., IDAHO Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:EILERT, SEAN STEPHEN;CUREWITZ, KENNETH MARION;ENO, JUSTIN M.;REEL/FRAME:060169/0373 Effective date: 20190916 |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |