US20060206641A1 - Active memory data compression system and method - Google Patents
- Publication number
- US20060206641A1 (application US 11/431,455)
- Authority
- US
- United States
- Prior art keywords
- data
- memory device
- dram
- active memory
- host
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3877—Concurrent instruction execution, e.g. pipeline or look ahead using a slave processor, e.g. coprocessor
- G06F9/3879—Concurrent instruction execution, e.g. pipeline or look ahead using a slave processor, e.g. coprocessor for non-native instruction execution, e.g. executing a command; for Java instruction set
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G06F15/7807—System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
- G06F15/7821—Tightly coupled to memory, e.g. computational memory, smart memory, processor in memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G06F15/7839—Architectures of general purpose stored program computers comprising a single central processing unit with memory
- G06F15/7842—Architectures of general purpose stored program computers comprising a single central processing unit with memory on one IC chip (single chip microcontrollers)
- G06F15/785—Architectures of general purpose stored program computers comprising a single central processing unit with memory on one IC chip (single chip microcontrollers) with decentralized control, e.g. smart memories
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30036—Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
- G06F9/3887—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple data lanes [SIMD]
Definitions
- This invention relates to memory devices and, more particularly, to techniques for efficiently transferring data to and from active memory devices.
- A common computer processing task involves sequentially processing large numbers of data items, such as data corresponding to each of a large number of pixels in an array. Processing data in this manner normally requires fetching each item of data from a memory device, performing a mathematical or logical calculation on that data, and then returning the processed data to the memory device. Performing such processing tasks at high speed is greatly facilitated by a high data bandwidth between the processor and the memory devices. The data bandwidth between a processor and a memory device is proportional to the width of the data path between them and the frequency at which the data are clocked across that path. Increasing either parameter therefore increases the data bandwidth, and hence the rate at which data can be processed.
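This proportionality can be illustrated with a small calculation; the path widths and clock rate below are illustrative assumptions, not figures from the patent:

```python
def bandwidth_bytes_per_s(path_width_bits: int, clock_hz: float) -> float:
    """Peak bandwidth of a simple one-transfer-per-clock data path."""
    return (path_width_bits / 8) * clock_hz

# Hypothetical 32-bit external bus vs. a wide 2,048-bit on-chip path,
# both clocked at 100 MHz.
narrow = bandwidth_bytes_per_s(32, 100e6)     # 400 MB/s
wide = bandwidth_bytes_per_s(2048, 100e6)     # 25.6 GB/s
# Widening the path 64x multiplies the bandwidth by 64 at the same clock.
```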
- An active memory device is a memory device having its own processing resource. It is relatively easy to provide an active memory device with a wide data path, thereby achieving a high memory bandwidth.
- Conventional active memory devices have been provided for mainframe computers in the form of discrete memory devices having dedicated processing resources. However, it is now possible to fabricate a memory device, particularly a dynamic random access memory ("DRAM") device, and one or more processors on a single integrated circuit chip.
- Single-chip active memories have several advantageous properties. First, the data path between the DRAM and the processor can be made very wide to provide a high data bandwidth, whereas the data path between a discrete DRAM device and a processor is normally limited by constraints on the size of external data buses. Further, because the DRAM and the processor are on the same chip, the speed at which data can be clocked between them can be relatively high, which also maximizes data bandwidth. Finally, the cost of an active memory fabricated on a single chip is also less than the cost of a discrete memory device coupled to an external processor.
- An active memory device can be designed to operate at very high speed by processing data in parallel using a large number of processing elements ("PEs"), each of which processes a respective group of the data bits.
- One type of parallel processor is known as a single instruction, multiple data ("SIMD") processor. In a SIMD processor, each of a large number of PEs simultaneously receives the same instructions, but each processes separate data. The instructions are generally provided to the PEs by a suitable device, such as a microprocessor. The advantages of SIMD processing are simple control, efficient use of the available data bandwidth, and minimal logic hardware overhead.
- Another parallel processing architecture is multiple instruction, multiple data ("MIMD"), in which a large number of processing elements process separate data using separate instructions.
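The SIMD model described above can be sketched in a few lines; modeling the PE array as a plain list is purely an illustrative assumption:

```python
def simd_step(instruction, operands):
    """Apply one common instruction across all PE lanes, each lane holding
    its own operand -- the defining behavior of a SIMD array."""
    return [instruction(operand) for operand in operands]

lanes = [1, 2, 3, 4]                        # each PE's private data
result = simd_step(lambda x: x * 2, lanes)  # one instruction, many data
# result == [2, 4, 6, 8]
```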
- A high-performance active memory device can be implemented by fabricating a large number of SIMD or MIMD PEs and a DRAM on a single chip, and coupling each of the PEs to a respective group of columns of the DRAM. The instructions are provided to the PEs from an external device, such as a host microprocessor. The number of PEs included on the chip can be very large, resulting in a massively parallel processor capable of processing vast amounts of data.
- In operation, data to be operated on by the PEs are first written to the DRAM, generally from an external source such as a disk, network, or input/output ("I/O") device in a host computer system. In response to common instructions passed to all of the PEs, the PEs fetch respective groups of data, perform the operations called for by the instructions, and then pass data corresponding to the results of the operations back to the DRAM. After the results have been written to the DRAM, they can be either coupled back to the external source or processed further in a subsequent operation. By operating on the data using active memory devices, particularly active memory devices using SIMD or MIMD PEs, the data can be processed very efficiently; if the same data were operated on by a microprocessor or other central processing unit ("CPU"), it would be necessary to couple substantially smaller blocks of data from the memory device to the CPU for processing, and then write substantially smaller blocks of results data back to the memory device.
- Although active memory devices allow much more efficient processing of data stored in memory, the processing speed of a computer system using active memory devices is somewhat limited by the time required to transfer operand data to the active memory for processing and the time required to transfer results data from the active memory after the operand data have been processed. During such data transfer operations, active memory devices are essentially no more efficient than passive memory devices, which also require data stored in the memory device to be transferred to and from an external device, such as a CPU. There is therefore a need for a system and method for allowing data to be transferred more efficiently between active memory devices and an external system.
- An integrated circuit active memory device includes a memory device and an array of processing elements, such as SIMD or MIMD processing elements, coupled to the memory device. Compressed data transferred through a host/memory interface port are first written to the memory device. The processing elements then decompress the data stored in the memory device and write the decompressed data back to the memory device. The processing elements also read data from the memory device, compress the data, and write the compressed data back to the memory device, from which the compressed data are then transferred through the host/memory interface. Instructions are preferably provided to the processing elements by an array control unit, and memory commands are preferably issued to the memory device through a memory control unit. The array control unit and the memory control unit preferably execute instructions provided by a command engine responsive to task commands provided to the active memory device by a host computer system.
- FIG. 1 is a block diagram of a computer system using an active memory device according to one embodiment of the invention.
- FIG. 2 is a memory map showing the organization of intrinsics stored in a program memory in the active memory device of FIG. 1 .
- FIG. 3 is a block diagram of a computer system using several active memory devices according to one embodiment of the invention.
- FIG. 4 is a flow chart showing one embodiment of a procedure for transferring data from the active memory device to a mass storage device in the computer system of FIG. 3 .
- FIG. 5 is a flow chart showing one embodiment of a procedure for transferring data from a mass storage device to active memory devices in the computer system of FIG. 3 .
- FIG. 1 shows an active memory device 10 according to one embodiment of the invention.
- The active memory device 10 is preferably a component in a host system 14, which may include a memory controller 18, a host CPU 20, a mass storage device 24, such as a disk drive, a bus bridge 28 coupled between the memory controller 18 and the mass storage device 24, and other components that have been omitted from the host system 14 shown in FIG. 1 for purposes of brevity and clarity.
- A network, such as a local area network ("LAN"), may be coupled to the bus bridge 28.
- A high-speed interface (not shown), such as an InfiniBand or HyperTransport interface, could be coupled to the memory controller 18.
- Other variations to the host system 14 shown in FIG. 1 will be apparent to one skilled in the art.
- The active memory device 10 includes a first-in, first-out ("FIFO") buffer 38 that receives high-level task commands, which may also include a task address, from the host system 14.
- The received task commands are buffered by the FIFO buffer 38 and passed to a command engine 40 at the proper time and in the order in which they were received.
- The command engine 40 generates respective sequences of instructions corresponding to the received task commands; these instructions are at a lower level than the task commands.
- The instructions are coupled from the command engine 40 to either a processing element ("PE") FIFO buffer 44 or a dynamic random access memory ("DRAM") FIFO buffer 48, depending upon whether they are PE instructions or DRAM instructions.
- If the instructions are PE instructions, they are passed to the PE FIFO buffer 44 and then from the buffer 44 to a processing array control unit ("ACU") 50.
- The ACU 50 subsequently passes microinstructions to an array of PEs 54.
- The PEs 54 preferably operate as SIMD processors, in which all of the PEs 54 receive and simultaneously execute the same instructions, though possibly on different operands. However, the PEs 54 may alternatively operate as MIMD processors or some other type of processor.
- If the instructions are DRAM instructions, they are passed from the DRAM FIFO buffer 48 to a DRAM control unit ("DCU") 60. The DCU 60 couples memory commands and addresses to a DRAM 64 to read data from and write data to the DRAM 64.
- In one embodiment, there are 256 PEs 54, each of which is coupled to receive 8 bits of data from the DRAM 64 through register files 68.
- The register files 68 thus allow operand data to be coupled from the DRAM 64 to the PEs 54, and results data to be coupled from the PEs 54 to the DRAM 64.
- In this embodiment, the DRAM 64 stores 16 Mbytes of data.
- However, the number of PEs 54 used in the active memory device 10 can be greater or less than 256, and the storage capacity of the DRAM 64 can be greater or less than 16 Mbytes.
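Under the stated geometry, the internal DRAM-to-PE path is far wider than the 32-bit external HMI data bus described below; a quick check of the widths (per transfer, assuming the same clock on both paths):

```python
pes = 256
bits_per_pe = 8
internal_path_bits = pes * bits_per_pe   # DRAM-to-PE path: 2,048 bits
hmi_path_bits = 32                       # external HMI data bus width
width_ratio = internal_path_bits // hmi_path_bits
# The internal path moves 64 times more bits per transfer than the HMI port.
```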
- The ACU 50 executes intrinsic routines, each containing several microinstructions, responsive to the commands from the FIFO buffer 44. These microinstructions are stored in a program memory 70, which is preferably loaded at power-up or at some other time based on the specific operations that the active memory device 10 is to perform. Control and address ("C/A") signals are coupled to the program memory 70 from the ACU 50.
- A memory map 80 of the program memory 70 according to one embodiment is shown in FIG. 2.
- The memory map 80 shows a large number of intrinsics 84-1, -2, -3, -4 . . . -N, each of which is composed of one or more microinstructions, as previously explained.
- The microinstructions generally include both code that is executed by the ACU 50 and code that is executed by the PEs 54.
- The microinstructions in at least some of the intrinsics 84 cause the PEs 54 to perform respective operations on data received from the DRAM 64 through the register files 68.
- The microinstructions in others of the intrinsics 84 cause data to be transferred from the PEs 54 to the register files 68 or from the register files 68 to the PEs 54.
- The microinstructions in still others of the intrinsics 84 are involved in the transfer of data to and from the DRAM 64.
- The command engine 40 executes respective sequences of instructions stored in an internal program memory (not shown).
- The instructions generally include both code that is executed by the command engine 40 and PE instructions that are passed to the ACU 50.
- Each of the PE instructions passed to the ACU 50 is generally used to address the program memory 70 to select the first microinstruction in the intrinsic 84 corresponding to that PE instruction.
- The ACU 50 couples command and address signals to the program memory 70 to sequentially read from the program memory 70 each microinstruction in the intrinsic 84 being executed.
- A portion of each microinstruction from the program memory 70 is executed by the PEs 54 to operate on data received from the register files 68.
- The DRAM 64 may also be accessed directly by the host system 14 through a host/memory interface ("HMI") port 90.
- The HMI port 90 is adapted to receive a set of memory commands that are substantially similar to those of a conventional SDRAM, except that it also includes signals for performing a "handshaking" function with the host system 14. These commands include, for example, ACTIVE, PRECHARGE, READ, WRITE, etc.
- In one embodiment, the HMI port 90 includes a 32-bit data bus and a 14-bit address bus, which is capable of addressing 16,384 pages of 256 words each.
- The address mapping mode is configurable to allow data to be accessed as 8-, 16-, or 32-bit words. However, other memory configurations are, of course, possible.
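The stated geometry is self-consistent with the 16-Mbyte DRAM 64: 16,384 pages of 256 32-bit words is exactly 16 Mbytes. A quick arithmetic check:

```python
pages = 2 ** 14              # a 14-bit address bus selects 16,384 pages
words_per_page = 256
bytes_per_word = 4           # 32-bit words
total_bytes = pages * words_per_page * bytes_per_word
# total_bytes == 16 * 2**20, i.e. the full 16-Mbyte DRAM is addressable
```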
- In operation, the host system 14 passes a relatively large volume of data to the DRAM 64 through the HMI port 90, often from the mass storage device 24.
- The host system 14 then passes task commands to the active memory device 10, which cause subsets of the operand data to be read from the DRAM 64 and operated on by the PEs 54.
- Results data generated from the operations performed by the PEs 54 are then written to the DRAM 64.
- Finally, the relatively large volume of results data is read from the DRAM 64 and passed to the host system 14 through the HMI port 90.
- Alternatively, the DRAM 64 may simply be used as system memory for the host system 14 without the PEs 54 processing any of the data stored in the DRAM 64.
- The time required to transfer relatively large volumes of data from the host system 14 to the DRAM 64 and from the DRAM 64 to the host system 14 can markedly slow the operating speed of a system using active memory devices. If the data could be transferred through the HMI port 90 at a more rapid rate, the operating efficiency of the active memory device 10 could be materially increased.
- To that end, the host system 14 transfers compressed data through the HMI port 90 to the DRAM 64.
- The compressed data are then transferred to the PEs 54, which execute a decompression algorithm to decompress the data.
- The decompressed data are then stored in the DRAM 64 and operated on by the PEs 54, as previously explained.
- The results data are then stored in the DRAM 64.
- When the data stored in the DRAM 64 are to be transferred to the host system 14, the data are first transferred to the PEs 54, which execute a compression algorithm to compress the data.
- The compressed data are then stored in the DRAM 64 and subsequently transferred to the host system 14 through the HMI port 90.
- The PEs 54 preferably compress and decompress the data by executing microinstructions stored in the program memory 70.
- Some of the intrinsics 84 (FIG. 2) stored in the program memory 70, such as 84-2, cause the PEs 54 to decompress data transferred from the host system 14 through the HMI port 90.
- Others of the intrinsics 84 stored in the program memory 70, such as 84-3, cause the PEs 54 to compress data before the data are transferred to the host system 14 through the HMI port 90.
- The intrinsics 84 can compress and decompress the data using any of a wide variety of conventional or hereinafter-developed compression algorithms.
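The patent deliberately leaves the compression algorithm open. As one concrete stand-in, the following run-length codec is the kind of simple, lossless routine a compression intrinsic could implement; it is an illustrative choice, not the algorithm claimed by the patent:

```python
def rle_compress(data: bytes) -> bytes:
    """Encode data as (run length, byte value) pairs, runs capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes([run, data[i]])
        i += run
    return bytes(out)

def rle_decompress(data: bytes) -> bytes:
    """Invert rle_compress: expand each (run length, value) pair."""
    out = bytearray()
    for run, value in zip(data[0::2], data[1::2]):
        out += bytes([value]) * run
    return bytes(out)

page = b"\x00" * 200 + b"\xff" * 56   # a highly compressible "page" of data
packed = rle_compress(page)
# packed is 4 bytes; decompressing restores the original 256-byte page
```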
- A single active memory device 10 may be used in a computer system as shown in FIG. 1, or multiple active memory devices 10-1, 10-2 . . . 10-n may be used as shown in FIG. 3.
- The active memory devices 10 are coupled to the memory controller 18′, which is, in turn, coupled to the host CPU 20′.
- The memory controller 18′ of FIG. 3 is substantially identical to the memory controller 18 of FIG. 1, except that it outputs an N-bit control signal to specify which of the active memory devices 10 is to communicate with the memory controller 18′.
- Other components of the computer system, some of which are shown in FIG. 1, have been omitted from FIG. 3 in the interest of brevity and clarity.
- The use of several active memory devices 10 can substantially increase the memory bandwidth of a computer system in which they are included, because the host system 14′ can be passing data to or from one of the active memory devices 10 while another of the active memory devices 10 is decompressing data that has been transferred from the host system 14′, or compressing data prior to its transfer to the host system 14′.
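The overlap benefit can be quantified with a simple pipeline model; the per-page times below are hypothetical and chosen only to illustrate the effect:

```python
# Hypothetical per-page costs (milliseconds).
transfer_ms = 4.0      # HMI transfer of one compressed page
decompress_ms = 3.0    # PE array decompressing one page
pages = 8

# One device: transfer and decompression serialize.
serial_ms = pages * (transfer_ms + decompress_ms)

# Several devices: while one device decompresses, the host transfers to
# another, so after the first page the slower stage sets the pace.
pipelined_ms = (transfer_ms + decompress_ms
                + (pages - 1) * max(transfer_ms, decompress_ms))
# serial_ms == 56.0, pipelined_ms == 35.0: a 1.6x improvement here
```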
- FIG. 4 illustrates the execution of a "page to disk" task command from the host system 14.
- A "page to disk" command is a command that transfers data stored in a block of memory, known as a "page," to a storage location in a disk drive.
- The operation is entered at 100, and the host CPU 20 formulates a "page to disk" task command at 104.
- At 106, the host CPU 20 computes the location of the page to be transferred, which is designated by a DRAM address in the active memory devices 10.
- The memory controller 18′ in the host system 14′ preferably accesses each of the active memory devices 10-1, 10-2 . . . 10-n in sequence.
- A memory device index "I" is set to the number "N" of active memory devices 10 in the system at 108.
- The host CPU 20, through the memory controller 18′, then issues the task command to the highest-order active memory device 10 at 110.
- The task command consists of a "page to disk" command and the address in the active memory devices 10 from which the data are to be transferred. As explained above, this address was calculated at step 106.
- The memory device index I is decremented at 114, and a determination is made at 116 whether the previous task command was issued to the first active memory device 10-1. If not, the operation returns to 110, where the "page to disk" command is issued to the next active memory device 10. When the task command has been issued to the first active memory device 10-1, the operation progresses to 120, where a delay is initiated that allows the active memory devices 10 sufficient time to complete the tasks corresponding to the task commands. Thus, the task commands may be issued to the active memory devices 10 at a rate that is faster than the active memory devices 10 can complete the tasks.
- The DRAM 64 in each of the active memory devices 10 transfers the block of data in the designated page to the respective array of PEs 54 through the register files 68.
- The PEs 54 then compress the data by executing the microcode in an intrinsic 84 stored in the program memory 70 in each of the active memory devices 10.
- The PEs 54 then transfer the compressed data through the register files 68 back to the DRAM 64.
- The host system 14′ then initiates direct memory access ("DMA") operations to transfer the compressed data to the mass storage device 24′.
- The DMA operations may be initiated at a rate that is faster than the mass storage device 24′ can complete the operations.
- In that case, the DMA operations are simply stored as a list of DMA operations that are sequentially completed, which is detected at 126.
- Each DMA operation causes the compressed data stored in the DRAM 64 to be sequentially coupled to the mass storage device 24′ through the HMI port 90 and memory controller 18′.
- The "page to disk" task is then completed at 128.
- A "memory page from disk" algorithm, the reverse of the operation shown in FIG. 4, is shown in FIG. 5.
- The operation is initiated at 140, and a determination is made of the number of active memory devices 10 to which the data in the mass storage device 24 will be transferred.
- The memory device index I is then set to that number at 144.
- The host CPU 20′ then issues a command at 148 that causes the designated compressed data stored in the mass storage device 24′ to be transferred through the memory controller 18′ and the HMI port 90 to the DRAM 64 in the highest-order active memory device 10 to which data will be transferred.
- The operation waits at 150 until the data have been transferred from the mass storage device 24′.
- The host CPU 20′ then issues a decompress task command to the active memory device 10 at step 154.
- In response, the DRAM 64 in the active memory device 10 being addressed transfers the compressed data through the register files 68 to the array of PEs 54.
- The PEs 54 then decompress the data by executing one of the intrinsics 84 stored in the program memory 70, and then transfer the decompressed data through the register files 68 to the DRAM 64.
- Intrinsics 84 are stored in the program memory 70 to assist in carrying out these operations.
- For example, intrinsics 84 could be provided that cause the PEs 54 to compress and/or decompress all of the data stored in the DRAM 64, or to compress and/or decompress only data stored within certain ranges of addresses in the DRAM 64.
- Other operations in which the PEs 54 compress or decompress data will be apparent to one skilled in the art and, of course, can also be carried out in the active memory device 10.
- Similarly, the data need not be transferred only from the active memory device 10 to the mass storage device 24; the data may instead be transferred to other components, such as the host CPU 20, a graphics processor (not shown), etc., through a DMA operation or some other operation.
- Finally, the PEs 54 need not be SIMD PEs, but instead can be other types of processing devices, such as multiple instruction, multiple data ("MIMD") processing elements. Accordingly, the invention is not limited except as by the appended claims.
Abstract
An integrated circuit active memory device receives task commands from a component in a host computer system that may include the active memory device. The host system includes a memory controller coupling the active memory device to a host CPU and a mass storage device. The active memory device includes a command engine issuing instructions responsive to the task commands to either an array control unit or a DRAM control unit. The instructions provided to the DRAM control unit cause data to be written to or read from a DRAM and coupled to or from either the processing elements or a host/memory interface. The processing elements execute instructions provided by the array control unit to decompress data written to the DRAM through the host/memory interface and compress data read from the DRAM through the host/memory interface.
Description
- This invention relates memory devices, and, more particularly, to techniques for efficiently transferring data to and from active memory devices.
- A common computer processing task involves sequentially processing large numbers of data items, such as data corresponding to each of a large number of pixels in an array. Processing data in this manner normally requires fetching each item of data from a memory device, performing a mathematical or logical calculation on that data, and then returning the processed data to the memory device. Performing such processing tasks at high speed is greatly facilitated by a high data bandwidth between the processor and the memory devices. The data bandwidth between a processor and a memory device is proportional to the width of a data path between the processor and the memory device and the frequency at which the data are clocked between the processor and the memory device. Therefore, increasing either of these parameters will increase the data bandwidth between the processor and memory device, and hence the rate at which data can be processed.
- An active memory device is a memory device having its own processing resource. It is relatively easy to provide an active memory device with a wide data path, thereby achieving a high memory bandwidth. Conventional active memory devices have been provided for mainframe computers in the form of discrete memory devices having dedicated processing resources. However, it is now possible to fabricate a memory device, particularly a dynamic random access memory (“DRAM”) device, and one or more processors on a single integrated circuit chip. Single chip active memories have several advantageous properties. First, the data path between the DRAM device and the processor can be made very wide to provide a high data bandwidth between the DRAM device and the processor. In contrast, the data path between a discrete DRAM device and a processor is normally limited by constraints on the size of external data buses. Further, because the DRAM device and the processor are on the same chip, the speed at which data can be clocked between the DRAM device and the processor can be relatively high, which also maximizes data bandwidth. The cost of an active memory fabricated on a single chip can is also less than the cost of a discrete memory device coupled to an external processor.
- An active memory device can be designed to operate at a very high speed by parallel processing data using a large number of processing elements (“PEs”) each of which processes a respective group of the data bits. One type of parallel processor is known as a single instruction, multiple data (“SIMD”) processor. In a SIMD processor, each of a large number of PEs simultaneously receive the same instructions, but they each process separate data. The instructions are generally provided to the PE's by a suitable device, such as a microprocessor. The advantages of SIMD processing are simple control, efficient use of available data bandwidth, and minimal logic hardware overhead. Another parallel processing architecture is multiple instruction, multiple data (“MIMD”) in which a large number of processing elements process separate data using separate instructions.
- A high performance active memory device can be implemented by fabricating a large number of SIMD PEs or MIMD PEs and a DRAM on a single chip, and coupling each of the PEs to respective groups of columns of the DRAM. The instructions are provided to the PEs from an external device, such as a host microprocessor. The number of PE's included on the chip can be very large, thereby resulting in a massively parallel processor capable of processing vast amounts of data.
- In operation, data to be operated on by the PEs are first written to the DRAM, generally from an external source such as a disk, network or input/output (“I/O”) device in a host computer system. In response to common instructions passed to all of the PEs, the PE's fetch respective groups of data to be operated on by the PEs, perform the operations called for by the instructions, and then pass data corresponding to the results of the operations back to the DRAM. After they have been written to the DRAM, the results data can be either coupled back to the external source or processed further in a subsequent operation. By operating on the data using active memory devices, particularly active memory devices using SIMD PEs and MIMD PEs, the data can be processed very efficiently. If the same data were operated on by a microprocessor or other central processing unit (“CPU”), it would be necessary to couple substantially smaller blocks of data from the memory device to the CPU for processing, and then write substantially smaller blocks of results data back to the memory device. The wider data bus and faster data transfer speeds made possible by using an active memory instead of a conventional memory result in a significantly higher data bandwidth.
- Although an active memory device allows much more efficient processing of data stored in memory, the processing speed of a computer system using active memory devices is somewhat limited by the time required to transfer operand data to the active memory for processing and the time required to transfer results data from the active memory after the operand data has been processed. During such data transfer operations, active memory devices are essentially no more efficient than passive memory devices that also require data stored in the memory device to be transferred to and from an external device, such as a CPU.
- There is therefore a need for a system and method for allowing data to be more efficiently transferred between active memory devices and an external system.
- An integrated circuit active memory device includes a memory device and an array of processing elements, such as SIMD or MIMD processing elements, coupled to the memory device. Compressed data transferred through a host/memory interface port are first written to the memory device. The processing elements then decompress the data stored in the memory device and write the decompressed data back to the memory device. The processing elements also read data from the memory device, compress the data read from the memory device, and then write the compressed data to the memory device. The compressed data are then transferred through the host/memory interface port. Instructions are preferably provided to the processing elements by an array control unit, and memory commands are preferably issued to the memory device through a memory control unit. The array control unit and the memory control unit preferably execute instructions provided by a command engine responsive to task commands provided to the active memory device by a host computer system.
- FIG. 1 is a block diagram of a computer system using an active memory device according to one embodiment of the invention.
- FIG. 2 is a memory map showing the organization of intrinsics stored in a program memory in the active memory device of FIG. 1.
- FIG. 3 is a block diagram of a computer system using several active memory devices according to one embodiment of the invention.
- FIG. 4 is a flow chart showing one embodiment of a procedure for transferring data from the active memory device to a mass storage device in the computer system of FIG. 3.
- FIG. 5 is a flow chart showing one embodiment of a procedure for transferring data from a mass storage device to active memory devices in the computer system of FIG. 3.
FIG. 1 shows an active memory device 10 according to one embodiment of the invention. The memory device 10 is preferably a component in a host system 14, which may include a memory controller 18, a host CPU 20, a mass storage device 24, such as a disk drive, a bus bridge 28 coupled between the memory controller 18 and the mass storage device 24, and other components that have been omitted from the host system 14 shown in FIG. 1 for the purpose of brevity and clarity. For example, a network (not shown), such as a local area network (“LAN”), may be coupled to the bus bridge 28. Also, a high speed interface (not shown), such as an Infiniband or Hypertransport interface, could be coupled to the memory controller 18. Other variations to the host system 14 shown in FIG. 1 will be apparent to one skilled in the art. - The
active memory device 10 includes a first in, first out (“FIFO”) buffer 38 that receives high level task commands from the host system 14, which may also include a task address. The received task commands are buffered by the FIFO buffer 38 and passed to a command engine 40 at the proper time and in the order in which they are received. The command engine 40 generates respective sequences of instructions corresponding to the received task commands. These instructions are at a lower level than the task commands. The instructions are coupled from the command engine 40 to either a processing element (“PE”) FIFO buffer 44 or a dynamic random access memory (“DRAM”) FIFO buffer 48, depending upon whether the commands are PE commands or DRAM commands. - If the instructions are PE instructions, they are passed to the
PE FIFO buffer 44 and then from the buffer 44 to a processing array control unit (“ACU”) 50. The ACU 50 subsequently passes microinstructions to an array of PEs 54. The PEs 54 preferably operate as SIMD processors in which all of the PEs 54 receive and simultaneously execute the same instructions, but they may do so on different operands. However, the PEs 54 may alternatively operate as MIMD processors or some other type of processors. - If the instructions from the
command engine 40 are DRAM instructions, they are passed to the DRAM FIFO buffer 48 and then to a DRAM Control Unit (“DCU”) 60. The DCU 60 couples memory commands and addresses to a DRAM 64 to read data from and write data to the DRAM 64. In the embodiment shown in FIG. 1, there are 256 PEs 54, each of which is coupled to receive 8 bits of data from the DRAM 64 through register files 68. The register files 68 thus allow operand data to be coupled from the DRAM 64 to the PEs 54, and results data to be coupled from the PEs 54 to the DRAM 64. In the embodiment shown in FIG. 1, the DRAM 64 stores 16 Mbytes of data. However, it should be understood that the number of PEs 54 used in the active memory device 10 can be greater or lesser than 256, and the storage capacity of the DRAM 64 can be greater or lesser than 16 Mbytes. - The
ACU 50 executes intrinsic routines, each containing several microinstructions, responsive to the commands from the FIFO buffer 44. These microinstructions are stored in a program memory 70, which is preferably loaded at power-up or at some other time based on the specific operations that the active memory device 10 is to perform. Control and address (“C/A”) signals are coupled to the program memory 70 from the ACU 50. A memory map 80 of the program memory 70 according to one embodiment is shown in FIG. 2. The memory map 80 shows a large number of intrinsics 84-1, -2, -3, -4 . . . -N, each of which is composed of one or more microinstructions, as previously explained. These microinstructions generally include both code that is executed by the ACU 50 and code that is executed by the PEs 54. The microinstructions in at least some of the intrinsics 84 cause the PEs 54 to perform respective operations on data received from the DRAM 64 through the register files 68. The microinstructions in others of the intrinsics 84 cause data to be transferred from the PEs 54 to the register files 68 or from the register files 68 to the PEs 54. As explained in greater detail below, the microinstructions in still others of the intrinsics 84 are involved in the transfer of data to and from the DRAM 64. - In operation, in response to each task command from the
host system 14, the command engine 40 executes respective sequences of instructions stored in an internal program memory (not shown). The instructions generally include both code that is executed by the command engine 40 and PE instructions that are passed to the ACU 50. Each of the PE instructions passed to the ACU 50 is generally used to address the program memory 70 to select the first microinstruction in an intrinsic 84 corresponding to the PE instruction. Thereafter, the ACU 50 couples command and address signals to the program memory 70 to sequentially read from the program memory 70 each microinstruction in the intrinsic 84 being executed. As mentioned above, a portion of each microinstruction from the program memory 70 is executed by the PEs 54 to operate on data received from the register files 68. - With further reference to
FIG. 1, the DRAM 64 may also be accessed directly by the host system 14 through a host/memory interface (“HMI”) port 90. The HMI port 90 is adapted to receive a set of memory commands that are substantially similar to the commands of a conventional SDRAM, except that it includes signals for performing a “handshaking” function with the host system 14. These commands include, for example, ACTIVE, PRECHARGE, READ, WRITE, etc. In the embodiment shown in FIG. 1, the HMI port 90 includes a 32-bit data bus and a 14-bit address bus, which is capable of addressing 16,384 pages of 256 words. The address mapping mode is configurable to allow data to be accessed as 8, 16 or 32 bit words. However, other memory configurations are, of course, possible. - In a typical processing task, the
host system 14 passes a relatively large volume of data to the DRAM 64 through the HMI port 90, often from the mass storage device 24. The host system 14 then passes task commands to the active memory device 10, which cause subsets of operand data to be read from the DRAM 64 and operated on by the PEs 54. Results data generated from the operations performed by the PEs 54 are then written to the DRAM 64. After all of the subsets of data have been processed by the PEs 54, the relatively large volume of results data are read from the DRAM 64 and passed to the host system 14 through the HMI port 90. Also, of course, the DRAM 64 may simply be used as system memory for the host system 14 without the PEs 54 processing any of the data stored in the DRAM 64. - As mentioned above, the time required to transfer relatively large volumes of data from the
host system 14 to the DRAM 64 and from the DRAM 64 to the host system 14 can markedly slow the operating speed of a system using active memory devices. If the data could be transferred through the HMI port 90 at a more rapid rate, the operating efficiency of the active memory device 10 could be materially increased. - According to one embodiment of the invention, the
host system 14 transfers compressed data through the HMI port 90 to the DRAM 64. The compressed data are then transferred to the PEs 54, which execute a decompression algorithm to decompress the data. The decompressed data are then stored in the DRAM 64 and operated on by the PEs 54, as previously explained. The results data are then stored in the DRAM 64. When the data stored in the DRAM 64 are to be transferred to the host system 14, the data are first transferred to the PEs 54, which execute a compression algorithm to compress the data. The compressed data are then stored in the DRAM 64 and subsequently transferred to the host system 14 through the HMI port 90. By transferring only compressed data through the HMI port 90, the data bandwidth to and from the DRAM 64 is markedly increased. - The
PEs 54 preferably compress and decompress the data by executing microinstructions stored in the program memory 70. As previously mentioned, some of the intrinsics 84 (FIG. 2) stored in the program memory 70, such as 84-2, cause the PEs 54 to decompress data transferred from the host system 14 through the HMI port 90. Others of the intrinsics 84 stored in the program memory 70, such as 84-3, cause the PEs 54 to compress data before it is transferred to the host system 14 through the HMI port 90. The intrinsics 84 can compress and decompress the data using any of a wide variety of conventional or hereinafter developed compression algorithms. - A single
active memory device 10 may be used in a computer system as shown in FIG. 1, or multiple active memory devices 10-1, 10-2 . . . 10-n may be used as shown in FIG. 3. In the system of FIG. 3, the active memory devices 10 are coupled to the memory controller 18′, which is, in turn, coupled to the host CPU 20′. The memory controller 18′ of FIG. 3 is substantially identical to the memory controller 18 of FIG. 1 except that it outputs an N-bit control signal to specify which of the active memory devices 10 is to communicate with the memory controller 18′. Other components of the computer system, some of which are shown in FIG. 1, have been omitted from FIG. 3 in the interest of brevity and clarity. The use of several active memory devices 10 can substantially increase the memory bandwidth of a computer system in which they are included because the host system 14′ can be passing data to or from one of the active memory devices 10 while another of the active memory devices 10 is decompressing data that has been transferred from the host system 14′ or compressing data prior to its being transferred to the host system 14′. - The operation of the computer system shown in
FIG. 3 for a typical data transfer operation will now be explained with reference to the flow chart of FIG. 4, which illustrates the execution of a “page to disk” task command from the host system 14. As is well known in the art, a page to disk command is a command that transfers data stored in a block of memory, known as a “page,” to a storage location in a disk drive. The operation is entered at 100, and the host CPU 20 formulates a “page to disk” task command at 104. At 106, the host CPU 20 computes the location of the page to be transferred, which is designated by a DRAM address in the active memory devices 10. As explained below, the memory controller 18′ in the host system 14′ preferably accesses each of the active memory devices 10-1, 10-2 . . . 10-n in sequence. A memory device index “I” is set to the number “N” of active memory devices 10 in the system at 108. The host CPU 20, through the memory controller 18, then issues the task command to the highest order active memory device 10 at 110. The task command consists of a “page to disk” command and the address in the active memory devices 10 from which the data are to be transferred. As explained above, this address was calculated at step 106. After the task command has been issued by the memory controller 18, the memory device index I is decremented at 114 and a determination is made at 116 whether or not the previously issued task command was issued to the first active memory device 10-1. If the task command has not yet been issued to the first active memory device 10-1, the operation returns to 110, where the “page to disk” command is issued to the next active memory device 10. When the task command has been issued to the first active memory device 10-1, the operation progresses to 120, where a delay is initiated that allows the active memory devices 10 sufficient time to complete the tasks corresponding to the task commands.
Thus, the task commands may be issued to the active memory devices 10 at a rate that is faster than the active memory devices 10 can complete the tasks. During the time that the active memory devices 10 are processing the “page to disk” task commands at step 120, the DRAM 64 in each of the active memory devices 10 transfers the block of data in the designated page to the respective array of PEs 54 through the register files 68. The PEs 54 then compress the data by executing the microcode in an intrinsic 84 stored in the program memory 70 in each of the active memory devices 10. The PEs 54 then transfer the compressed data through the register files 68 back to the DRAM 64. - After sufficient time has elapsed for the
active memory devices 10 to complete the task of compressing the data stored in the designated page and making the compressed data available to the HMI port 90, direct memory access (“DMA”) operations to the mass storage device 24′ are initiated at 124. In this regard, the DMA operations may be initiated at a rate that is faster than the mass storage device 24′ can complete the operations. The DMA operations are simply stored as a list of DMA operations that are sequentially completed, which is detected at 126. Each DMA operation causes the compressed data stored in the DRAM 64 to be sequentially coupled to the mass storage device 24′ through the HMI port 90 and the memory controller 18′. The “page to disk” task is then completed at 128. - A “memory page from disk” algorithm that is the reverse of the operation shown in
FIG. 4 is shown in FIG. 5. The operation is initiated at 140, and a determination is made at 144 of the number of active memory devices 10 to which the data in the mass storage device 24 will be transferred. The memory device index I is then set to that number at 144. The host CPU 20′ then issues a command at 148 that causes the designated compressed data stored in the mass storage device 24′ to be transferred through the memory controller 18′ and the HMI port 90 to the DRAM 64 in the highest order active memory device 10 to which data will be transferred. The operation waits at 150 until the data have been transferred from the mass storage device 24′. The host CPU 20′ then issues a decompress task command to the active memory device 10 at step 154. In response to the decompress task command, the DRAM 64 in the active memory device 10 being addressed transfers the compressed data through the register files 68 to the array of PEs 54. The PEs 54 then decompress the data by executing one of the intrinsics 84 stored in the program memory 70, and then transfer the decompressed data through the register files 68 to the DRAM 64. - After the data from the
mass storage device 24 have been downloaded to the DRAM 64 and decompressed, the memory device index I is decremented at 158, and a determination is made at 160 whether I=1, corresponding to the data being transferred from the mass storage device 24 to the first active memory device 10-1. If not, the operation returns to 150 to repeat the process described above. If all of the data have been transferred from the mass storage device 24, the operation branches to 170, where it waits for all of the downloaded data to be decompressed by the PEs 54 and stored in the respective DRAM 64. The operation then exits through 174. - Although only the “page to disk” and the “memory page from disk” operations have been described herein, it will be understood that other operations can also occur, and corresponding intrinsics 84 are stored in the
program memory 70 to assist in carrying out these operations. For example, intrinsics 84 could be provided that cause the PEs 54 to compress and/or decompress all of the data stored in the DRAM 64, or to compress and/or decompress data stored in the DRAM 64 only within certain ranges of addresses. Other operations in which the PEs 54 compress or decompress data will be apparent to one skilled in the art and, of course, can also be carried out in the active memory device 10. - From the foregoing it will be appreciated that, although specific embodiments of the invention have been described herein for purposes of illustration, various modifications may be made without deviating from the spirit and scope of the invention. For example, rather than transferring the compressed data from the
HMI port 90 to the DRAM 64 prior to being decompressed by the PEs 54, it may be possible to transfer the compressed data directly from the HMI port 90 to the register files 68 or some other component (not shown) before the data are decompressed by the PEs 54. Similarly, rather than storing data compressed by the PEs 54 in the DRAM 64 before transferring the compressed data through the HMI port 90, it may be possible to store the data compressed by the PEs 54 in the register files 68 or some other location prior to their being transferred through the HMI port 90. As another example, instead of or in addition to transferring the data from the active memory device 10 to the mass storage device 24, the data may be transferred to other components, such as the host CPU 20, a graphics processor (not shown), etc., through a DMA operation or some other operation. Furthermore, as mentioned above, the PEs 54 need not be SIMD PEs, but instead can be other types of processing devices, such as multiple instruction, multiple data (“MIMD”) processing elements. Accordingly, the invention is not limited except as by the appended claims.
Claims (2)
1. An integrated circuit active memory device comprising:
a memory device having a data bus containing a plurality of data bus bits;
an array of processing elements each of which is coupled to a respective group of the data bus bits, each of the processing elements having an instruction input coupled to receive processing element instructions for controlling the operation of the processing elements;
a host interface port operable to transfer data to and from the active memory device; and
a control unit being operable to receive task commands and to generate corresponding sequences of instructions responsive to each of the task commands to control the operation of the memory device and the processing elements, at least some of the instructions generated by the control unit causing the processing elements to either decompress data transferred to the active memory device through the host interface port and then store the decompressed data in the memory device or to compress data transferred from the memory device that is to be transferred from the active memory device through the host interface port.
2-43. (canceled)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/431,455 US20060206641A1 (en) | 2003-04-25 | 2006-05-09 | Active memory data compression system and method |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/424,206 US9015390B2 (en) | 2003-04-25 | 2003-04-25 | Active memory data compression system and method |
US11/431,455 US20060206641A1 (en) | 2003-04-25 | 2006-05-09 | Active memory data compression system and method |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/424,206 Continuation US9015390B2 (en) | 2003-04-25 | 2003-04-25 | Active memory data compression system and method |
Publications (1)
Publication Number | Publication Date |
---|---|
US20060206641A1 true US20060206641A1 (en) | 2006-09-14 |
Family
ID=33299302
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/424,206 Active 2026-10-10 US9015390B2 (en) | 2003-04-25 | 2003-04-25 | Active memory data compression system and method |
US11/431,455 Abandoned US20060206641A1 (en) | 2003-04-25 | 2006-05-09 | Active memory data compression system and method |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/424,206 Active 2026-10-10 US9015390B2 (en) | 2003-04-25 | 2003-04-25 | Active memory data compression system and method |
Country Status (1)
Country | Link |
---|---|
US (2) | US9015390B2 (en) |
US10606587B2 (en) | 2016-08-24 | 2020-03-31 | Micron Technology, Inc. | Apparatus and methods related to microcode instructions indicating instruction types |
US10466928B2 (en) | 2016-09-15 | 2019-11-05 | Micron Technology, Inc. | Updating a register in memory |
US10387058B2 (en) | 2016-09-29 | 2019-08-20 | Micron Technology, Inc. | Apparatuses and methods to change data category values |
US10014034B2 (en) | 2016-10-06 | 2018-07-03 | Micron Technology, Inc. | Shifting data in sensing circuitry |
US10529409B2 (en) | 2016-10-13 | 2020-01-07 | Micron Technology, Inc. | Apparatuses and methods to perform logical operations using sensing circuitry |
US9805772B1 (en) | 2016-10-20 | 2017-10-31 | Micron Technology, Inc. | Apparatuses and methods to selectively perform logical operations |
US10373666B2 (en) | 2016-11-08 | 2019-08-06 | Micron Technology, Inc. | Apparatuses and methods for compute components formed over an array of memory cells |
US10423353B2 (en) | 2016-11-11 | 2019-09-24 | Micron Technology, Inc. | Apparatuses and methods for memory alignment |
US9761300B1 (en) | 2016-11-22 | 2017-09-12 | Micron Technology, Inc. | Data shift apparatuses and methods |
US10402340B2 (en) | 2017-02-21 | 2019-09-03 | Micron Technology, Inc. | Memory array page table walk |
US10268389B2 (en) | 2017-02-22 | 2019-04-23 | Micron Technology, Inc. | Apparatuses and methods for in-memory operations |
US10403352B2 (en) | 2017-02-22 | 2019-09-03 | Micron Technology, Inc. | Apparatuses and methods for compute in data path |
US10838899B2 (en) | 2017-03-21 | 2020-11-17 | Micron Technology, Inc. | Apparatuses and methods for in-memory data switching networks |
US10185674B2 (en) | 2017-03-22 | 2019-01-22 | Micron Technology, Inc. | Apparatus and methods for in data path compute operations |
US11222260B2 (en) | 2017-03-22 | 2022-01-11 | Micron Technology, Inc. | Apparatuses and methods for operating neural networks |
US10049721B1 (en) | 2017-03-27 | 2018-08-14 | Micron Technology, Inc. | Apparatuses and methods for in-memory operations |
US10043570B1 (en) | 2017-04-17 | 2018-08-07 | Micron Technology, Inc. | Signed element compare in memory |
US10147467B2 (en) | 2017-04-17 | 2018-12-04 | Micron Technology, Inc. | Element value comparison in memory |
US9997212B1 (en) | 2017-04-24 | 2018-06-12 | Micron Technology, Inc. | Accessing data in memory |
US10942843B2 (en) | 2017-04-25 | 2021-03-09 | Micron Technology, Inc. | Storing data elements of different lengths in respective adjacent rows or columns according to memory shapes |
US10236038B2 (en) | 2017-05-15 | 2019-03-19 | Micron Technology, Inc. | Bank to bank data transfer |
US10068664B1 (en) | 2017-05-19 | 2018-09-04 | Micron Technology, Inc. | Column repair in memory |
US10013197B1 (en) | 2017-06-01 | 2018-07-03 | Micron Technology, Inc. | Shift skip |
US10152271B1 (en) | 2017-06-07 | 2018-12-11 | Micron Technology, Inc. | Data replication |
US10262701B2 (en) | 2017-06-07 | 2019-04-16 | Micron Technology, Inc. | Data transfer between subarrays in memory |
US10318168B2 (en) | 2017-06-19 | 2019-06-11 | Micron Technology, Inc. | Apparatuses and methods for simultaneous in data path compute operations |
US10162005B1 (en) | 2017-08-09 | 2018-12-25 | Micron Technology, Inc. | Scan chain operations |
US10534553B2 (en) | 2017-08-30 | 2020-01-14 | Micron Technology, Inc. | Memory array accessibility |
US10416927B2 (en) | 2017-08-31 | 2019-09-17 | Micron Technology, Inc. | Processing in memory |
US10741239B2 (en) | 2017-08-31 | 2020-08-11 | Micron Technology, Inc. | Processing in memory device including a row address strobe manager |
US10346092B2 (en) | 2017-08-31 | 2019-07-09 | Micron Technology, Inc. | Apparatuses and methods for in-memory operations using timing circuitry |
US10409739B2 (en) | 2017-10-24 | 2019-09-10 | Micron Technology, Inc. | Command selection policy |
US10522210B2 (en) | 2017-12-14 | 2019-12-31 | Micron Technology, Inc. | Apparatuses and methods for subarray addressing |
US10332586B1 (en) | 2017-12-19 | 2019-06-25 | Micron Technology, Inc. | Apparatuses and methods for subrow addressing |
US10614875B2 (en) | 2018-01-30 | 2020-04-07 | Micron Technology, Inc. | Logical operations using memory cells |
US10437557B2 (en) | 2018-01-31 | 2019-10-08 | Micron Technology, Inc. | Determination of a match between data values stored by several arrays |
US11194477B2 (en) | 2018-01-31 | 2021-12-07 | Micron Technology, Inc. | Determination of a match between data values stored by three or more arrays |
US10725696B2 (en) | 2018-04-12 | 2020-07-28 | Micron Technology, Inc. | Command selection policy with read priority |
US10440341B1 (en) | 2018-06-07 | 2019-10-08 | Micron Technology, Inc. | Image processor formed in an array of memory cells |
US10769071B2 (en) | 2018-10-10 | 2020-09-08 | Micron Technology, Inc. | Coherent memory access |
US11175915B2 (en) | 2018-10-10 | 2021-11-16 | Micron Technology, Inc. | Vector registers implemented in memory |
US10483978B1 (en) | 2018-10-16 | 2019-11-19 | Micron Technology, Inc. | Memory device processing |
US11184446B2 (en) | 2018-12-05 | 2021-11-23 | Micron Technology, Inc. | Methods and apparatus for incentivizing participation in fog networks |
US11157423B2 (en) * | 2019-05-02 | 2021-10-26 | Dell Products L.P. | Pipelined-data-transform-enabled data mover system |
US10867655B1 (en) | 2019-07-08 | 2020-12-15 | Micron Technology, Inc. | Methods and apparatus for dynamically adjusting performance of partitioned memory |
US11360768B2 (en) | 2019-08-14 | 2022-06-14 | Micron Technology, Inc. | Bit string operations in memory |
US11449577B2 (en) | 2019-11-20 | 2022-09-20 | Micron Technology, Inc. | Methods and apparatus for performing video processing matrix operations within a memory array |
US11853385B2 (en) | 2019-12-05 | 2023-12-26 | Micron Technology, Inc. | Methods and apparatus for performing diversity matrix operations within a memory array |
US11227641B1 (en) | 2020-07-21 | 2022-01-18 | Micron Technology, Inc. | Arithmetic operations in memory |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6108460A (en) * | 1996-01-02 | 2000-08-22 | Pixelfusion Limited | Load balanced image generation |
US7418344B2 (en) * | 2001-08-02 | 2008-08-26 | Sandisk Corporation | Removable computer with mass storage |
- 2003-04-25 US US10/424,206 patent/US9015390B2/en active Active
- 2006-05-09 US US11/431,455 patent/US20060206641A1/en not_active Abandoned
Patent Citations (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4468688A (en) * | 1981-04-10 | 1984-08-28 | Ampex Corporation | Controller for system for spatially transforming images |
US5404553A (en) * | 1991-01-09 | 1995-04-04 | Mitsubishi Denki Kabushiki Kaisha | Microprocessor and data flow microprocessor having vector operation function |
US5528549A (en) * | 1993-05-28 | 1996-06-18 | Texas Instruments Incorporated | Apparatus, systems and methods for distributed signal processing |
US5528550A (en) * | 1993-05-28 | 1996-06-18 | Texas Instruments Incorporated | Apparatus, systems and methods for implementing memory embedded search arithmetic logic unit |
US6002411A (en) * | 1994-11-16 | 1999-12-14 | Interactive Silicon, Inc. | Integrated video and memory controller with data processing and graphical processing capabilities |
US6237786B1 (en) * | 1995-02-13 | 2001-05-29 | Intertrust Technologies Corp. | Systems and methods for secure transaction management and electronic rights protection |
US5915077A (en) * | 1997-07-28 | 1999-06-22 | Canon Kabushiki Kaisha | Image compression using adjacent pixels and predetermined colors |
US6191791B1 (en) * | 1997-09-30 | 2001-02-20 | Hewlett-Packard Company | Methods for high precision, memory efficient surface normal compression and expansion |
US6326966B1 (en) * | 1997-09-30 | 2001-12-04 | Hewlett Packard Company | Methods for high precision, memory efficient surface normal compression and expansion |
US6212628B1 (en) * | 1998-04-09 | 2001-04-03 | Teranex, Inc. | Mesh connected computer |
US6058056A (en) * | 1998-04-30 | 2000-05-02 | Micron Technology, Inc. | Data compression circuit and method for testing memory devices |
US6163863A (en) * | 1998-05-22 | 2000-12-19 | Micron Technology, Inc. | Method and circuit for compressing test data in a memory device |
US6337684B1 (en) * | 1998-05-29 | 2002-01-08 | Hewlett-Packard Company | Surface normal compression/decompression storing two vector components |
US20060233261A1 (en) * | 1999-04-06 | 2006-10-19 | Leonid Yavits | Video encoding and video/audio/data multiplexing device |
US6704022B1 (en) * | 2000-02-25 | 2004-03-09 | Ati International Srl | System for accessing graphics data from memory and method thereof |
US6754802B1 (en) * | 2000-08-25 | 2004-06-22 | Micron Technology, Inc. | Single instruction multiple data massively parallel processor systems on a chip and system using same |
US20040221135A1 (en) * | 2000-08-25 | 2004-11-04 | Graham Kirsch | Method for forming a single instruction multiple data massively parallel processor system on a chip |
US20020070943A1 (en) * | 2000-09-07 | 2002-06-13 | Hall Deirdre M. | Graphics memory system for volumeric displays |
US20030012062A1 (en) * | 2001-06-11 | 2003-01-16 | Emblaze Semiconductor Ltd. | Specialized memory device |
US20020199046A1 (en) * | 2001-06-22 | 2002-12-26 | Michael Ruehle | Method and apparatus for active memory bus peripheral control utilizing address call sequencing |
US20040193839A1 (en) * | 2003-03-27 | 2004-09-30 | Graham Kirsch | Data reordering processor and method for use in an active memory device |
US20040193784A1 (en) * | 2003-03-27 | 2004-09-30 | Graham Kirsch | System and method for encoding processing element commands in an active memory device |
US20040193840A1 (en) * | 2003-03-27 | 2004-09-30 | Graham Kirsch | Active memory command engine and method |
US20040193842A1 (en) * | 2003-03-31 | 2004-09-30 | Graham Kirsch | Active memory processing array topography and method |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3320429A4 (en) * | 2015-07-30 | 2018-07-18 | Huawei Technologies Co., Ltd. | System and method for variable lane architecture |
JP2018521427A (en) * | 2015-07-30 | 2018-08-02 | ホアウェイ・テクノロジーズ・カンパニー・リミテッド | System and method for variable lane architecture |
US10691463B2 (en) | 2015-07-30 | 2020-06-23 | Futurewei Technologies, Inc. | System and method for variable lane architecture |
US10884756B2 (en) | 2015-07-30 | 2021-01-05 | Futurewei Technologies, Inc. | System and method for variable lane architecture |
Also Published As
Publication number | Publication date |
---|---|
US9015390B2 (en) | 2015-04-21 |
US20040215852A1 (en) | 2004-10-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9015390B2 (en) | Active memory data compression system and method | |
US7454451B2 (en) | Method for finding local extrema of a set of values for a parallel processing element | |
US7793075B2 (en) | Active memory command engine and method | |
US7584343B2 (en) | Data reordering processor and method for use in an active memory device | |
US7447720B2 (en) | Method for finding global extrema of a set of bytes distributed across an array of parallel processing elements | |
US7574466B2 (en) | Method for finding global extrema of a set of shorts distributed across an array of parallel processing elements | |
US5872987A (en) | Massively parallel computer including auxiliary vector processor | |
JPH05502125A (en) | Microprocessor with last-in, first-out stack, microprocessor system, and method of operating a last-in, first-out stack | |
JP2002509302A (en) | A multiprocessor computer architecture incorporating multiple memory algorithm processors in a memory subsystem. | |
CN111274025A (en) | System and method for accelerating data processing in SSD | |
JP2021507352A (en) | Memory device and methods for controlling it | |
JP2620511B2 (en) | Data processor | |
US7073034B2 (en) | System and method for encoding processing element commands in an active memory device | |
US8001358B2 (en) | Microprocessor and method of processing data including peak value candidate selecting part and peak value calculating part | |
JP7507304B2 (en) | Clearing register data | |
US5717891A (en) | Digital signal processor with caching of instructions that produce a memory conflict | |
US6243822B1 (en) | Method and system for asynchronous array loading | |
KR20230095795A (en) | Host device performing near data processing function and accelerator system including the same | |
JP2023533795A (en) | Erasing register data | |
JP2583614B2 (en) | Vector arithmetic unit | |
WO2009136402A2 (en) | Register file system and method thereof for enabling a substantially direct memory access | |
GB2393286A (en) | Method for finding a local extreme of a set of values associated with a processing element by separating the set into an odd and an even position pair of sets | |
JPH0242530A (en) | Central processing unit | |
JPH0310325A (en) | Data processing system and data processor | |
JPH07219765A (en) | Microprogram controller |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |