US7451293B2 - Array of Boolean logic controlled processing elements with concurrent I/O processing and instruction sequencing - Google Patents
- Publication number
- US7451293B2
- Authority
- US
- United States
- Prior art keywords
- data
- processing engines
- instructions
- computer system
- array
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/80—Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
- G06F15/8007—Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors single instruction multiple data [SIMD] multiprocessors
- G06F15/8023—Two dimensional arrays, e.g. mesh, torus
Definitions
- The invention relates generally to computer processors. More specifically, the invention relates to an integrated processor array, instruction sequencer, and I/O controller.
- Processors are increasingly asked to perform mathematical operations, such as calculations and other data manipulation, at ever greater speeds.
- Processors are also increasingly required to transfer more data at higher speeds, as multimedia and other applications employ larger files storing greater amounts of data.
- The invention can be implemented in numerous ways, including as a method, a system, and a device. Various embodiments of the invention are discussed below.
- In one embodiment, a computer system comprises an instruction sequencing unit configured to sequence instructions for manipulating data and to transmit the sequenced instructions.
- The computer system also includes an array of processing engines configured to receive instructions corresponding to the sequenced instructions, each processing engine of the array being configured to receive the data.
- Each processing engine has a first memory configured to store the data, a decision unit configured to store decision data, and a Boolean unit configured to store a logic state and to modify the logic state according to the received instructions.
- Each processing engine also has an integer unit configured to conditionally perform integer operations on the stored data according to the stored decision data, the received instructions, and the logic state, so as to generate integer result data, as well as a second memory configured to store I/O data.
- The Boolean unit is configured to modify the logic state in the same clock cycle in which the integer unit performs the integer operations.
- The computer system also includes an I/O controller configured to transmit the I/O data to, and receive the I/O data from, the array of processing engines.
- In another embodiment, a computer system comprises a processing array having processing engines serially interconnected in rows and columns. The processing array is configured to execute I/O operations by shifting I/O data sequentially through the columns of processing engines, to shift computation data sequentially across the rows of processing engines, and to execute computation operations upon the shifted computation data in parallel with the I/O operations.
- This computer system also includes an instruction sequencing unit configured to sequence instructions and to transfer them to the processing engines of the processing array so as to control the computation operations, and an I/O controller configured to exchange the I/O data with the processing engines of the processing array.
- FIG. 1 illustrates a block diagram representation of a processor constructed in accordance with the invention, and including an integrated instruction sequencer, an array of processing engines, and an I/O controller.
- FIG. 2 illustrates further details of processing engines constructed in accordance with the invention, as well as their interconnection.
- FIG. 3 illustrates a block diagram representation of an individual processing engine in accordance with the invention.
- FIG. 4 is a vector representation of commands to be executed by the processing engines of FIG. 3.
- The invention relates to a computer processor having an integrated instruction sequencer, array of processing engines, and I/O controller.
- The instruction sequencer sequences instructions from a host and transfers them to the processing engines, thus directing their operation.
- The I/O controller controls the transfer of I/O data to and from the processing engines in parallel with the processing controlled by the instruction sequencer.
- The processing engines themselves are constructed with an integer arithmetic and logic unit (ALU), a 1-bit ALU, a decision unit, and registers. Instructions from the instruction sequencer direct the integer ALU to perform integer operations according to a logic state stored in the 1-bit ALU and data stored in the decision unit.
- The 1-bit ALU and the decision unit can modify their stored information in the same clock cycle in which the integer ALU carries out its operation, allowing faster and more efficient processing.
- The processing engines also contain a local memory for storing instructions and data to be shifted among the engines.
- FIG. 1 illustrates a processor of the invention in block diagram form.
- The processor 100 includes an instruction sequencer 102, an array 104 of processing engines, and an I/O controller 106.
- The instruction sequencer 102 receives tasks from a host (not shown) and transforms each task into sequences of instructions suitable for the array 104.
- Decoders 108, 110 can decode instructions from the instruction sequencer 102, translating instructions for various applications into corresponding native instructions understood by the array 104. The instructions are then passed to the pipeline registers 112, which feed them sequentially to the array 104.
- The array 104 is also configured to handle I/O data.
- The I/O controller 106 receives I/O data from the host or from an external memory and transfers it to an I/O interface 114, where it is formatted for the local memories of individual processing engines of the array 104.
- The processor 100 can transfer I/O data to individual processing engines in a number of ways, to maximize efficiency and speed.
- When the processing engines have finished performing their operations on their data, including shifting the data among the engines, the data is shifted out of the array 104. I/O data is shifted out to the I/O controller 106, while other data is shifted out to the instruction sequencer 102 for transfer to the host, via an adder 116 if desired.
- The processing engines can transfer I/O data and perform operations on other data simultaneously, adding to the speed and efficiency of the processor 100.
- FIG. 2 illustrates the interconnections between processing engines in the array 104 .
- The array 104 is constructed as a two-dimensional array of processing engines PE_ij.
- The processing engines PE_ij are serially interconnected in rows and columns. That is, the processing engines PE_ij are arranged in rows and columns, with each processing engine PE_ij able to exchange data with its neighboring processing engines, both within its row and within its column.
- The processing engine at the end of each row can exchange data with the first processing engine of the next row, and vice versa.
- The processing engine at the end of each column can transfer data to the first processing engine in the same column.
- The processing engines can thus be configured to transfer I/O data and other data both column-wise and row-wise.
- The I/O controller 106 transfers I/O data (perhaps after formatting by the I/O interface 114) to various processing engines, which shift the I/O data serially down their respective columns. Simultaneously, this I/O data, or other data inserted into the processing engines along with instructions from the instruction sequencer 102, can be operated on by each processing engine and shifted row-wise. In this manner, the array 104 can transfer I/O data while simultaneously performing operations on that or other data.
- I/O-bound processes are dominated by the need to transfer large amounts of data without performing significant computation on that data (e.g., multimedia file playback, file copying, or other transfers of large amounts of data).
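The row and column interconnect of FIG. 2 can be illustrated with a minimal Python sketch of one clock cycle. It assumes a small grid of integers stands in for the engines PE_ij, that each column behaves as a ring (the last engine feeds the first engine of the same column), and that the row-wise chain runs in row-major order, with a new word shifted in at the first engine and one word shifted out of the last. The function names and the `op` parameter are hypothetical, not taken from the patent.

```python
from typing import Callable, List, Tuple

Grid = List[List[int]]


def shift_columns_down(io: Grid) -> Grid:
    """Move every I/O word one engine down its column.

    The last engine of each column feeds the first engine of the same
    column, so every column behaves as a ring.
    """
    rows = len(io)
    return [list(io[(i - 1) % rows]) for i in range(rows)]


def shift_rows_serial(data: Grid, shift_in: int) -> Tuple[Grid, int]:
    """Move every computation word one engine along the row-major chain.

    The last engine of each row feeds the first engine of the next row.
    Returns the new grid and the word shifted out of the final engine.
    """
    rows, cols = len(data), len(data[0])
    flat = [w for row in data for w in row]
    shifted_out = flat[-1]
    flat = [shift_in] + flat[:-1]
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)], shifted_out


def clock(io: Grid, data: Grid, shift_in: int = 0,
          op: Callable[[int], int] = lambda w: w) -> Tuple[Grid, Grid, int]:
    """One hypothetical cycle: every engine applies `op` to its computation
    word and passes the result along its row while the I/O words move one
    step down the columns."""
    computed = [[op(w) & 0xFFFF for w in row] for row in data]  # 16-bit engines
    new_data, out = shift_rows_serial(computed, shift_in)
    return shift_columns_down(io), new_data, out
```

Because the I/O grid and the computation grid are updated independently, the sketch mirrors the idea that I/O movement and computation proceed side by side; in hardware both would happen in the same cycle rather than one after the other in software.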
- FIG. 3 illustrates a block diagram representation of an individual processing engine PE_ij in accordance with the invention.
- Each processing engine 300 includes an integer ALU 302, a 1-bit ALU 304, and a decision unit 306 that either execute, or facilitate the execution of, various operations.
- The processing engine 300 also includes a local data memory 308 and registers 310.
- The integer ALU 302, 1-bit ALU 304, and decision unit 306 are connected so as to operate in parallel with each other.
- The 1-bit ALU 304 and decision unit 306 can send their current logic states to the integer ALU 302 and modify those states in the same clock cycle.
- The processing engine 300 receives sequenced instructions from the instruction sequencer 102.
- The instructions are sent to the integer ALU 302, as well as to the registers 310 and local data memory 308.
- The instructions are also sent to the 1-bit ALU 304 and decision unit 306.
- Instructions requiring computation direct the registers 310 and/or local data memory 308 to transfer data to the integer ALU 302 for processing.
- For example, the data can be transferred from the registers 310 to the integer ALU 302 as left and right operands, although the invention includes any form of data transfer among the local data memory 308, registers 310, and integer ALU 302.
- The instructions also modify the logic state of the 1-bit ALU 304.
- The 1-bit ALU 304 stores a single bit whose logic state (binary “0” or “1”) is read by the integer ALU 302. Instructions from the instruction sequencer 102 can direct the integer ALU 302 to read the logic state of the 1-bit ALU 304 and execute different operations depending on that state.
- For example, an instruction can direct the integer ALU 302 to add its data to data from a neighboring processing engine 300 if the logic state is binary “0”, or to subtract its data from that of the neighboring processing engine 300 if the logic state is binary “1.”
- In this way, the 1-bit ALU 304 allows a single instruction to represent more than one operation.
- The instructions also modify a decision state stored in the decision unit 306. This decision state indicates whether the particular processing engine is “marked” for execution of its instruction, or “unmarked” and thus directed not to execute its instruction. This allows the instruction sequencer 102 to selectively direct individual processing engines 300 to carry out, or to refrain from carrying out, operations as necessary, which in turn allows the array 104 to execute more complex and detailed processes.
- The integer ALU 302, 1-bit ALU 304, and decision unit 306 are arranged in parallel, so that the 1-bit ALU 304 and decision unit 306 can modify their states in the same clock cycle in which the integer ALU 302 carries out its operations. This speeds the operation of each processing engine 300, as the integer ALU 302 can carry out a new operation every clock cycle rather than waiting for the 1-bit ALU 304 and decision unit 306 to update first.
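The interplay among the integer ALU 302, the 1-bit ALU 304, and the decision unit 306 can be modeled in a few lines of Python. This is a minimal sketch, assuming each engine holds a single 16-bit value, that the "neighboring processing engine" is the engine to its left, and that the instruction supplies the next 1-bit ALU state for every engine; the `Engine` class and `add_or_sub_neighbor` function are hypothetical. It shows one broadcast instruction producing different results per engine: marked engines add or subtract depending on their logic state, unmarked engines keep their value, and every engine reads its current logic state while latching a new one in the same step.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Engine:
    value: int            # operand held in the engine's registers (hypothetical)
    flag: int = 0         # logic state of the 1-bit ALU: 0 or 1
    marked: bool = True   # decision unit: only marked engines execute


def add_or_sub_neighbor(engines: List[Engine], next_flags: List[int]) -> None:
    """Broadcast one hypothetical instruction to every engine.

    Marked engines compute neighbor + own when flag == 0 and
    neighbor - own when flag == 1; unmarked engines keep their value.
    Each engine reads its *current* flag, then latches next_flags[i],
    mimicking a 1-bit ALU update in the same clock cycle.
    """
    old = [e.value for e in engines]  # snapshot: all engines read before any write
    for i, e in enumerate(engines):
        neighbor = old[i - 1] if i > 0 else 0  # leftmost engine sees 0
        if e.marked:
            if e.flag == 0:
                e.value = (neighbor + old[i]) & 0xFFFF
            else:
                e.value = (neighbor - old[i]) & 0xFFFF  # wraps like 16-bit hardware
        e.flag = next_flags[i]
```

The snapshot of old values is what lets every engine act on the same instruction at once without seeing a neighbor's freshly written result.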
- The local data memory 308 and registers 310 store data and instructions needed for the operations performed by the integer ALU 302.
- The registers 310 are in electronic communication with the registers of adjacent processing engines 300 (both row-wise and column-wise), allowing data to be exchanged between adjacent processing engines 300.
- The local data memory 308 can exchange data with the registers 310, so that data can be shifted from the registers 310 into the local data memory 308 for storage as necessary. This data can later be retrieved by the registers 310 and either sent to the integer ALU 302 for processing or shifted into the registers of adjacent processing engines 300 for eventual transfer out of the array 104.
- The local data memory 308 and registers 310 also allow for the transfer of I/O data.
- The I/O controller 106 and/or I/O interface 114 can place I/O data into various processing engines 300, typically by transferring the data to the registers 310. If calculations are required on this I/O data, they can be performed as described above; if not, the I/O data can be shifted down column-wise out of the array 104 and to the host. Alternatively, it can be shifted into the local data memory 308 for future processing or transfer.
- In one embodiment, the processing engine 300 has a local data memory 308 that can hold at least 256 16-bit words.
- The registers 310 can hold at least eight 16-bit words, as well as eight Boolean bits for selecting the active components of the integer vectors for processing in the integer ALU 302.
- FIG. 4 illustrates a vector representation of such an embodiment (a vector being simply a representation of data), in which 1024 processing engines 300 are shown along the top of the chart, while the various vectors, registers, and Boolean bits of each engine 300 run along the side.
- Instructions and data can be thought of as being transmitted to the processing engines 300 as vectors. For example, vector_000 is a 1024-component vector of data, each component of which is 16 bits long and is sent to one processing engine 300.
- Similarly, boolean_0 is a 1024-component vector of single bits, each of which is transmitted to the 1-bit ALU 304 of one processing engine 300.
- Each processing engine 300 can be represented as a column of FIG. 4, able to store 256 16-bit words of data, eight 16-bit words of register information, and eight Boolean bits.
- For example, processing engine “0” can store the first 16-bit word of each of vector_000 through vector_255 in its local data memory 308, for shifting down column-wise or for manipulation in its integer ALU 302. It can also store the first 16-bit word of each of register_0 through register_7 in its registers 310 as queued instructions or transferred data, and the first bit of each of boolean_0 through boolean_7 in its registers 310 or 1-bit ALU 304 as queued logic states.
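The FIG. 4 layout, in which component i of each vector lands in engine i, can be sketched as follows. The sketch assumes plain Python lists stand in for 1024 local data memories of 256 16-bit words each (at these minimum sizes, 1024 × 256 × 2 bytes = 512 KiB of local data memory across the array); `make_memory`, `store_vector`, and `load_vector` are hypothetical helpers. Storing `vector_k` writes address k of every engine at once, so each engine's memory reads as one column of FIG. 4.

```python
from typing import List

NUM_ENGINES = 1024      # engines along the top of FIG. 4
WORDS_PER_ENGINE = 256  # local data memory 308 depth (vector_000 .. vector_255)
MASK16 = 0xFFFF         # every stored word is 16 bits

Memory = List[List[int]]  # mem[engine][address]


def make_memory(num_engines: int = NUM_ENGINES,
                depth: int = WORDS_PER_ENGINE) -> Memory:
    """One list per engine, i.e. one column of FIG. 4 per engine."""
    return [[0] * depth for _ in range(num_engines)]


def store_vector(mem: Memory, index: int, vector: List[int]) -> None:
    """Store vector_<index>: component i goes to address `index` of engine i."""
    if len(vector) != len(mem):
        raise ValueError("vector length must equal the number of engines")
    for engine, word in enumerate(vector):
        mem[engine][index] = word & MASK16


def load_vector(mem: Memory, index: int) -> List[int]:
    """Read vector_<index> back by collecting address `index` from every engine."""
    return [column[index] for column in mem]
```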
- A first feature of the processor 100 relates to the decoding of instructions.
- The instruction sequencer 102 can include decoders 108, 110 for decoding instructions. These decoders 108, 110 can store microcode instructions corresponding to the instruction sets of various applications. The instruction sequencer 102 transmits sequenced instructions to the decoders 108, 110, which retrieve the corresponding microcode instructions and transmit them to the processing engines 300 of the array 104. This allows the processor 100 to be compatible with any application, so long as microcode corresponding to that application's instructions can be stored in the decoders 108, 110.
- The decoders 108, 110 can be SRAM decoders, which allow users to periodically update or otherwise alter the stored instruction sets, although the invention encompasses decoders 108, 110 that employ any form of memory for storing microcode instructions corresponding to the instructions of various applications. It is sometimes preferred that one decoder 108 be dedicated to storing the operation codes of the integer ALU 302, while the other decoder 110 is dedicated to storing Boolean operation codes for the 1-bit ALU 304.
- The invention is not limited to embodiments including two separate decoders 108, 110, although it is sometimes preferable to include separate decoders 108, 110 for integer and Boolean operation codes so that either set can be changed independently.
- Because the decoders 108, 110 can store microcode corresponding to multiple applications, the stored microcode is often longer than the instructions received from the host. The decoders 108, 110 thus often effectively expand the received instructions.
- For example, the expanded microcode instructions stored in the decoders 108, 110 can be 64-bit microcode instructions (allowing for 2^64 possible unique instructions).
- Thus, although the processor 100 may receive relatively small instructions, such as 8- or 16-bit instructions, it may work internally with larger 64-bit instructions.
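The instruction expansion performed by the decoders 108, 110 can be sketched as a pair of rewritable lookup tables, one for integer ALU operation codes and one for Boolean operation codes. The 16-bit host instruction format used here (high byte selecting the integer microcode, low byte selecting the Boolean microcode) is purely an assumption for illustration, as are the `Decoder` class and `expand` function; the text states only that short host instructions map to longer, for example 64-bit, microcode words stored in the decoders.

```python
from typing import Dict, Tuple

MASK64 = (1 << 64) - 1


class Decoder:
    """A rewritable table standing in for an SRAM decoder (hypothetical model)."""

    def __init__(self) -> None:
        self.table: Dict[int, int] = {}

    def program(self, opcode: int, microcode: int) -> None:
        # Rewriting an entry models updating the stored instruction set.
        if not 0 <= microcode <= MASK64:
            raise ValueError("microcode must fit in 64 bits")
        self.table[opcode] = microcode

    def decode(self, opcode: int) -> int:
        # An unprogrammed opcode raises KeyError.
        return self.table[opcode]


def expand(instruction: int, int_dec: Decoder, bool_dec: Decoder) -> Tuple[int, int]:
    """Expand one 16-bit host instruction into two 64-bit microcode words.

    Assumed format: bits 15..8 select integer-ALU microcode,
    bits 7..0 select Boolean (1-bit ALU) microcode.
    """
    int_op = (instruction >> 8) & 0xFF
    bool_op = instruction & 0xFF
    return int_dec.decode(int_op), bool_dec.decode(bool_op)
```

Keeping the two tables separate lets either instruction set be reprogrammed without touching the other, which matches the stated reason for preferring two decoders.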
- A second feature concerns data addressing.
- The I/O controller 106 and/or I/O interface 114 can transmit I/O data to any processing engine 300; that is, data can be transmitted to any arbitrarily selected processing engine 300. This allows more efficient use of the array 104, as I/O data can be preferentially sent to those processing engines 300 that are less active and thus able to handle the data more immediately.
- The arbitrary selection of particular processing engines 300 is accomplished by first instructing each processing engine 300 to transmit an available address in its local data memory 308 to the I/O controller 106.
- The addresses can be in any format, but it is often convenient to transmit them as a vector in which each element represents a different processing engine 300. Each element can thus be filled with the position in that engine's local data memory 308 that is available to hold data, if any. A zero value can represent a processing engine 300 that is unavailable for I/O data.
- In this manner, each processing engine 300 is directed to transmit a position in its local data memory 308, and these positions are assembled into a vector that effectively identifies each available processing engine 300 and its available memory position. This vector allows the I/O controller 106 to quickly determine where it can transfer I/O data.
- Vectors can also be used in the transfer of data to and from memories external to the processor 100.
- For example, the array 104 can be instructed to construct a vector containing addresses to be used in accessing an external memory. This vector can then be transferred out through the I/O controller 106 to address the desired portions of the external memory for data transfer to or from that memory.
- Similarly, processing engines 300 can be instructed to transmit the memory positions of I/O data they store, and these positions can be assembled into a vector informing the I/O controller 106 of the addresses from which it can retrieve data from the processing engines 300.
- This approach increases the overall efficiency of the processor 100, as a single instruction from the instruction sequencer 102 allows all available processing engines 300 to be identified, and data to be transferred to or from only those processing engines.
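A minimal sketch of the availability vector follows, assuming each engine answers a single broadcast query with either one free address or nothing. `availability_vector` and `schedule_io` are hypothetical names; the zero-means-unavailable encoding comes from the text above, which in this sketch implies that address 0 can never be offered as a free slot.

```python
from typing import List, Optional, Tuple


def availability_vector(free_slots: List[Optional[int]]) -> List[int]:
    """Assemble the per-engine answers into one vector.

    free_slots[pe] is the free local-memory address engine `pe` reports,
    or None if it cannot take I/O data. None becomes 0, so this sketch
    rejects a reported address of 0.
    """
    vec: List[int] = []
    for addr in free_slots:
        if addr == 0:
            raise ValueError("address 0 is reserved to mean 'unavailable'")
        vec.append(0 if addr is None else addr)
    return vec


def schedule_io(avail: List[int], words: List[int]) -> List[Tuple[int, int, int]]:
    """Pair I/O words with (engine, address) targets, skipping zero entries.

    Returns (engine, address, word) triples; words beyond the number of
    available engines are left for a later transfer.
    """
    targets = [(pe, addr) for pe, addr in enumerate(avail) if addr != 0]
    return [(pe, addr, word) for (pe, addr), word in zip(targets, words)]
```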
- A third feature concerns data formatting.
- The I/O controller 106 and/or I/O interface 114 can format data to fit the local data memories 308 of the processing engines 300.
- The invention encompasses the use of any data format.
- For example, the I/O controller 106 can load and store data in shuffled mode, direct transfer mode, and indirect transfer mode.
- The I/O controller 106 can also perform byte expanded loads and byte compacted stores, as well as word expanded loads and word compacted stores.
- In shuffled mode, data from the host is divided into two vectors, one holding the even-numbered words and one holding the odd-numbered words. That is, if the host transmits data as 16-bit (2-byte) words, each processing engine 300 stores data in 16-bit format, and the array 104 contains 1024 processing engines 300, then the I/O controller 106 can accumulate a 2048-component double-length vector of data from the host, [w0, w1, . . . , w2047], where each component wi is a 2-byte word.
- This vector is split into two 1024-component vectors, which are then sent to the 1024 processing engines 300, where each 2-byte (i.e., 16-bit) component is already formatted for storage in the registers 310 and local data memory 308.
- In this way, the I/O controller 106 breaks up host-formatted data into two 1024-component vectors, each component of which contains data formatted for the processing engines 300.
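The shuffled-mode split can be expressed with Python slicing. This is a sketch assuming 16-bit host words held as integers; `shuffled_load` follows the even/odd split described above, while `shuffled_store`, which re-interleaves the two vectors, is an assumed inverse that the text does not spell out.

```python
from typing import List, Tuple


def shuffled_load(words: List[int], num_engines: int = 1024) -> Tuple[List[int], List[int]]:
    """Split a double-length host vector into even- and odd-numbered words.

    words = [w0, w1, ..., w(2n-1)] gives v1 = [w0, w2, ...] and v2 = [w1, w3, ...],
    one 16-bit component per engine in each vector.
    """
    if len(words) != 2 * num_engines:
        raise ValueError("expected exactly two words per engine")
    return words[0::2], words[1::2]


def shuffled_store(v1: List[int], v2: List[int]) -> List[int]:
    """Assumed inverse: interleave the two vectors back into host order."""
    out: List[int] = []
    for even, odd in zip(v1, v2):
        out.extend((even, odd))
    return out
```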
- For byte expanded loads, the I/O controller 106 can accumulate 512 2-byte words [w0, w1, . . . , w511], which are then divided into 1024 2-byte words, each holding one byte of the original data with the most significant byte set to zero: {8′b0, w0[7:0]}, {8′b0, w0[15:8]}, {8′b0, w1[7:0]}, {8′b0, w1[15:8]}, . . .
- In other words, each byte from external memory is stored as a 16-bit number whose most significant byte is zero.
- Conversely, for byte compacted stores, a vector of stored 16-bit numbers [w0, w1, . . . , w1023] is retrieved, and the zero-value most significant bytes are stripped out to yield 1024 bytes, i.e., 512 2-byte words, again: {w0[7:0], w1[7:0], . . . , w1023[7:0]}.
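Byte expanded loads and byte compacted stores amount to unpacking and repacking bytes. The sketch below assumes the compacted output is packed low byte first (the first retained byte becomes bits 7..0 of each packed word), chosen so that the two functions are exact inverses; the function names are hypothetical.

```python
from typing import List


def byte_expanded_load(words: List[int]) -> List[int]:
    """Split each 16-bit word into two 16-bit values with a zero high byte.

    Order follows the text: {8'b0, w[7:0]} first, then {8'b0, w[15:8]}.
    """
    out: List[int] = []
    for w in words:
        out.append(w & 0xFF)         # low byte
        out.append((w >> 8) & 0xFF)  # high byte
    return out


def byte_compacted_store(values: List[int]) -> List[int]:
    """Strip the (zero) high byte of each value and repack pairs of bytes.

    The first byte of each pair becomes bits 7..0 of the packed word,
    matching the order byte_expanded_load produces.
    """
    if len(values) % 2:
        raise ValueError("need an even number of byte values")
    low = [v & 0xFF for v in values]
    return [low[2 * k] | (low[2 * k + 1] << 8) for k in range(len(low) // 2)]
```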
- For word expanded loads, the I/O controller 106 can accumulate a vector of 512 2-byte words [w0, w1, . . . , w511], which are then converted to 1024 2-byte words in which every other 2-byte word is set to zero.
- The 1024 2-byte words are then loaded into the array 104 as the vector [w0, 16′b0, w1, 16′b0, . . . , w510, 16′b0, w511, 16′b0].
- Conversely, for word compacted stores, every other 2-byte word (i.e., the zero-value words) is stripped out to once again yield a vector of 512 2-byte words, taken from positions 0, 2, . . . , 1020, 1022 of the stored 1024-word vector: [w0, w2, . . . , w1020, w1022].
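The word expanded and word compacted formats follow the same pattern one level up. A minimal sketch with hypothetical function names, assuming each 2-byte word is held as a Python integer:

```python
from typing import List


def word_expanded_load(words: List[int]) -> List[int]:
    """[w0, w1, ...] -> [w0, 16'b0, w1, 16'b0, ...]: one zero word after each."""
    out: List[int] = []
    for w in words:
        out.extend((w & 0xFFFF, 0))
    return out


def word_compacted_store(values: List[int]) -> List[int]:
    """Keep positions 0, 2, 4, ... and drop the zero padding words in between."""
    return values[0::2]
```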
- In direct transfer mode, the I/O controller 106 uses a specified increment and transfers data to the processing engines 300 according to this increment. For example, if the increment is 2, the I/O controller 106 transfers its data to every other processing engine 300.
- Indirect transfer mode uses addresses provided by the processing engines 300, similar to the data addressing technique described above. For instance, each processing engine 300 is instructed to provide its address depending on whether it is sufficiently available to receive data. The I/O controller 106 then transmits its data to the processing engines 300 from which it has received addresses.
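The two transfer modes differ only in how target engines are chosen, which a short sketch makes concrete. The `start` parameter and both function names are hypothetical; the text specifies only the increment for direct mode and the engine-supplied addresses for indirect mode.

```python
from typing import Dict, List, Optional, Tuple


def direct_transfer(words: List[int], num_engines: int, increment: int,
                    start: int = 0) -> Dict[int, int]:
    """Direct mode: successive words go to engines start, start + increment, ...

    Returns {engine: word}; words that do not fit in the array are not placed.
    """
    if increment < 1:
        raise ValueError("increment must be at least 1")
    return dict(zip(range(start, num_engines, increment), words))


def indirect_transfer(words: List[int],
                      addresses: List[Optional[int]]) -> Dict[int, Tuple[int, int]]:
    """Indirect mode: only engines that supplied an address receive data.

    addresses[pe] is the address engine `pe` supplied, or None/0 if it
    supplied none. Returns {engine: (address, word)}.
    """
    targets = [(pe, addr) for pe, addr in enumerate(addresses) if addr]
    return {pe: (addr, word) for (pe, addr), word in zip(targets, words)}
```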
- The ability of each processing engine 300 to shift data to and from adjacent processing engines 300, coupled with the ability of the instruction sequencer 102 to selectively mark engines 300 for executing computational operations, allows great flexibility and speed in computation, providing much faster execution of computation-bound processes.
- A single instruction from the instruction sequencer 102 can instruct every processing engine 300 in the array 104 to execute varying operations, with different engines 300 performing different operations according to the logic states set individually by the instruction, or performing no calculation at all.
- In effect, each individual instruction can control a “global” set of operations that can vary as necessary from engine 300 to engine 300.
- As a result, the array 104 can execute functions such as sequential multiplication algorithms much faster.
- Multiplication can be performed using a process that inspects 2 bits in each step, decides the appropriate addition, and performs a two-position shift. This can be accomplished with only three instructions (init_mult, mult, and end_mult, each having specific microcode generated by the programmable decoders 108 and 110) in the processor 100, greatly speeding multiplication.
- For example, two bits of the multiplicand can be tested in each cycle.
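As a generic illustration of a multiplication that inspects two bits per step (not necessarily the microcode behind `init_mult`, `mult`, and `end_mult`), the following Python sketch implements an unsigned radix-4 shift-and-add multiply. For each 2-bit digit of the tested operand it selects one of four addends (0, 1, 2, or 3 times the other operand) and advances by two bit positions, so a 16-bit operand needs 8 steps instead of 16. Booth recoding, which avoids the 3x multiple, is a common alternative; which scheme the patent intends is not stated, and `radix4_multiply` is a hypothetical name.

```python
def radix4_multiply(tested: int, other: int, width: int = 16) -> int:
    """Unsigned multiply that inspects two bits of `tested` per step.

    Only the low `width` bits of `tested` are examined; the result is the
    exact product (no truncation to `width` bits).
    """
    if width % 2:
        raise ValueError("width must be even")
    # The four possible addends, chosen by each 2-bit digit.
    addends = (0, other, other << 1, (other << 1) + other)
    product = 0
    for step in range(width // 2):
        digit = (tested >> (2 * step)) & 0b11    # inspect two bits
        product += addends[digit] << (2 * step)  # add, shifted two positions per step
    return product
```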
- The array 104 need not be limited to a two-dimensional array of rows and columns, but can be organized in any manner.
- Although certain components, such as the SRAM decoders 108, 110 and the I/O interface 114, may be desirable in certain embodiments, they are not required for the practice of the invention.
- The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, thereby enabling others skilled in the art to best utilize the invention and its various embodiments, with various modifications as are suited to the particular use contemplated.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Hardware Design (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Advance Control (AREA)
- Executing Machine-Instructions (AREA)
- Programmable Controllers (AREA)
Description
v1 = [w0, w2, . . . , w2046] and v2 = [w1, w3, . . . , w2047]
The two 1024-component vectors are then sent to the 1024 processing engines 300.
{8′b0, w0[7:0]}, {8′b0, w0[15:8]},
{8′b0, w1[7:0]}, {8′b0, w1[15:8]},
. . .
{8′b0, w510[7:0]}, {8′b0, w510[15:8]},
{8′b0, w511[7:0]}, {8′b0, w511[15:8]},
In other words, each byte from external memory is stored as a 16-bit number whose most significant byte is zero. Conversely, for byte compacted stores, a vector of stored 16-bit numbers [w0, w1, . . . , w1023] is retrieved, and the zero-value most significant bytes are stripped out to yield 1024 bytes, i.e., 512 2-byte words, again: {w0[7:0], w1[7:0], . . . , w1023[7:0]}.
[w0, 16′b0, w1, 16′b0, . . . , w510, 16′b0, w511, 16′b0]
Conversely, for word compacted stores, every other 2-byte word (i.e., the zero-value words) is stripped out to once again yield a vector of 512 2-byte words, taken from positions 0, 2, . . . , 1020, 1022 of the stored vector: [w0, w2, . . . , w1020, w1022].
Claims (17)
Priority Applications (8)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/584,480 US7451293B2 (en) | 2005-10-21 | 2006-10-19 | Array of Boolean logic controlled processing elements with concurrent I/O processing and instruction sequencing |
EP06836411A EP1941380A2 (en) | 2005-10-21 | 2006-10-20 | Integrated processor array, instruction sequencer and i/o controller |
PCT/US2006/040975 WO2007050444A2 (en) | 2005-10-21 | 2006-10-20 | Integrated processor array, instruction sequencer and i/o controller |
JP2008534793A JP2009512920A (en) | 2005-10-21 | 2006-10-20 | Integrated processor array, instruction sequencer, and I / O controller |
KR1020087009137A KR20080091754A (en) | 2005-10-21 | 2006-10-20 | Integrated processor array, instruction sequencer and i/o controller |
CA002626184A CA2626184A1 (en) | 2005-10-21 | 2006-10-20 | Integrated processor array, instruction sequencer and i/o controller |
TW095138731A TW200745876A (en) | 2005-10-21 | 2006-10-20 | Integrated processor array, instruction sequencer and I/O controller |
US12/128,528 US20080307196A1 (en) | 2005-10-21 | 2008-05-28 | Integrated Processor Array, Instruction Sequencer And I/O Controller |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US72917805P | 2005-10-21 | 2005-10-21 | |
US11/584,480 US7451293B2 (en) | 2005-10-21 | 2006-10-19 | Array of Boolean logic controlled processing elements with concurrent I/O processing and instruction sequencing |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/128,528 Division US20080307196A1 (en) | 2005-10-21 | 2008-05-28 | Integrated Processor Array, Instruction Sequencer And I/O Controller |
Publications (2)
Publication Number | Publication Date |
---|---|
US20070130444A1 US20070130444A1 (en) | 2007-06-07 |
US7451293B2 true US7451293B2 (en) | 2008-11-11 |
Family
ID=37968408
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/584,480 Expired - Fee Related US7451293B2 (en) | 2005-10-21 | 2006-10-19 | Array of Boolean logic controlled processing elements with concurrent I/O processing and instruction sequencing |
US12/128,528 Abandoned US20080307196A1 (en) | 2005-10-21 | 2008-05-28 | Integrated Processor Array, Instruction Sequencer And I/O Controller |
Family Applications After (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/128,528 Abandoned US20080307196A1 (en) | 2005-10-21 | 2008-05-28 | Integrated Processor Array, Instruction Sequencer And I/O Controller |
Country Status (7)
Country | Link |
---|---|
US (2) | US7451293B2 (en) |
EP (1) | EP1941380A2 (en) |
JP (1) | JP2009512920A (en) |
KR (1) | KR20080091754A (en) |
CA (1) | CA2626184A1 (en) |
TW (1) | TW200745876A (en) |
WO (1) | WO2007050444A2 (en) |
US11074214B2 (en) * | 2019-08-05 | 2021-07-27 | Arm Limited | Data processing |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3308436A (en) * | 1963-08-05 | 1967-03-07 | Westinghouse Electric Corp | Parallel computer system control |
US4212076A (en) * | 1976-09-24 | 1980-07-08 | Giddings & Lewis, Inc. | Digital computer structure providing arithmetic and boolean logic operations, the latter controlling the former |
US4783738A (en) * | 1986-03-13 | 1988-11-08 | International Business Machines Corporation | Adaptive instruction processing by array processor having processor identification and data dependent status registers in each processing element |
US6173386B1 (en) | 1998-12-14 | 2001-01-09 | Cisco Technology, Inc. | Parallel processor with debug capability |
US6336178B1 (en) | 1995-10-06 | 2002-01-01 | Advanced Micro Devices, Inc. | RISC86 instruction set |
US20020133688A1 (en) * | 2001-01-29 | 2002-09-19 | Ming-Hau Lee | SIMD/MIMD processing on a reconfigurable array |
US20020174318A1 (en) * | 1999-04-09 | 2002-11-21 | Dave Stuttard | Parallel data processing apparatus |
US6658578B1 (en) | 1998-10-06 | 2003-12-02 | Texas Instruments Incorporated | Microprocessors |
US20040006584A1 (en) | 2000-08-08 | 2004-01-08 | Ivo Vandeweerd | Array of parallel programmable processing engines and deterministic method of operating the same |
US6938183B2 (en) | 2001-09-21 | 2005-08-30 | The Boeing Company | Fault tolerant processing architecture |
Family Cites Families (93)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4575818A (en) * | 1983-06-07 | 1986-03-11 | Tektronix, Inc. | Apparatus for in effect extending the width of an associative memory by serial matching of portions of the search pattern |
JPS6224366A (en) * | 1985-07-03 | 1987-02-02 | Hitachi Ltd | Vector processor |
US4907148A (en) * | 1985-11-13 | 1990-03-06 | Alcatel U.S.A. Corp. | Cellular array processor with individual cell-level data-dependent cell control and multiport input memory |
GB2211638A (en) * | 1987-10-27 | 1989-07-05 | Ibm | Simd array processor |
US4873626A (en) * | 1986-12-17 | 1989-10-10 | Massachusetts Institute Of Technology | Parallel processing system with processor array having memory system included in system memory |
US5122984A (en) * | 1987-01-07 | 1992-06-16 | Bernard Strehler | Parallel associative memory system |
DE3877105D1 (en) * | 1987-09-30 | 1993-02-11 | Siemens Ag, 8000 Muenchen, De | |
US4876644A (en) * | 1987-10-30 | 1989-10-24 | International Business Machines Corp. | Parallel pipelined processor |
US4983958A (en) * | 1988-01-29 | 1991-01-08 | Intel Corporation | Vector selectable coordinate-addressable DRAM array |
US5241635A (en) * | 1988-11-18 | 1993-08-31 | Massachusetts Institute Of Technology | Tagged token data processing system with operand matching in activation frames |
AU624205B2 (en) * | 1989-01-23 | 1992-06-04 | General Electric Capital Corporation | Variable length string matcher |
US5497488A (en) * | 1990-06-12 | 1996-03-05 | Hitachi, Ltd. | System for parallel string search with a function-directed parallel collation of a first partition of each string followed by matching of second partitions |
US5319762A (en) * | 1990-09-07 | 1994-06-07 | The Mitre Corporation | Associative memory capable of matching a variable indicator in one string of characters with a portion of another string |
US5963746A (en) * | 1990-11-13 | 1999-10-05 | International Business Machines Corporation | Fully distributed processing memory element |
ATE180586T1 (en) * | 1990-11-13 | 1999-06-15 | Ibm | PARALLEL ASSOCIATIVE PROCESSOR SYSTEM |
US5765011A (en) * | 1990-11-13 | 1998-06-09 | International Business Machines Corporation | Parallel processing system having a synchronous SIMD processing with processing elements emulating SIMD operation using individual instruction streams |
US5150430A (en) * | 1991-03-15 | 1992-09-22 | The Board Of Trustees Of The Leland Stanford Junior University | Lossless data compression circuit and method |
US5228098A (en) * | 1991-06-14 | 1993-07-13 | Tektronix, Inc. | Adaptive spatio-temporal compression/decompression of video image signals |
US5706290A (en) * | 1994-12-15 | 1998-01-06 | Shaw; Venson | Method and apparatus including system architecture for multimedia communication |
US5373290A (en) * | 1991-09-25 | 1994-12-13 | Hewlett-Packard Corporation | Apparatus and method for managing multiple dictionaries in content addressable memory based data compression |
US5640582A (en) * | 1992-05-21 | 1997-06-17 | Intel Corporation | Register stacking in a computer system |
US5450599A (en) * | 1992-06-04 | 1995-09-12 | International Business Machines Corporation | Sequential pipelined processing for the compression and decompression of image data |
US5818873A (en) * | 1992-08-03 | 1998-10-06 | Advanced Hardware Architectures, Inc. | Single clock cycle data compressor/decompressor with a string reversal mechanism |
US5440753A (en) * | 1992-11-13 | 1995-08-08 | Motorola, Inc. | Variable length string matcher |
US5446915A (en) * | 1993-05-25 | 1995-08-29 | Intel Corporation | Parallel processing system virtual connection method and apparatus with protection and flow control |
JPH07114577A (en) * | 1993-07-16 | 1995-05-02 | Internatl Business Mach Corp <Ibm> | Data retrieval apparatus as well as apparatus and method for data compression |
US6073185A (en) * | 1993-08-27 | 2000-06-06 | Teranex, Inc. | Parallel data processor |
US5490264A (en) * | 1993-09-30 | 1996-02-06 | Intel Corporation | Generally-diagonal mapping of address space for row/column organizer memories |
US6085283A (en) * | 1993-11-19 | 2000-07-04 | Kabushiki Kaisha Toshiba | Data selecting memory device and selected data transfer device |
US5602764A (en) * | 1993-12-22 | 1997-02-11 | Storage Technology Corporation | Comparing prioritizing memory for string searching in a data compression system |
US5758176A (en) * | 1994-09-28 | 1998-05-26 | International Business Machines Corporation | Method and system for providing a single-instruction, multiple-data execution unit for performing single-instruction, multiple-data operations within a superscalar data processing system |
US5631849A (en) * | 1994-11-14 | 1997-05-20 | The 3Do Company | Decompressor and compressor for simultaneously decompressing and compressng a plurality of pixels in a pixel array in a digital image differential pulse code modulation (DPCM) system |
US5682491A (en) * | 1994-12-29 | 1997-10-28 | International Business Machines Corporation | Selective processing and routing of results among processors controlled by decoding instructions using mask value derived from instruction tag and processor identifier |
US6128720A (en) * | 1994-12-29 | 2000-10-03 | International Business Machines Corporation | Distributed processing array with component processors performing customized interpretation of instructions |
US5867726A (en) * | 1995-05-02 | 1999-02-02 | Hitachi, Ltd. | Microcomputer |
US6317819B1 (en) * | 1996-01-11 | 2001-11-13 | Steven G. Morton | Digital signal processor containing scalar processor and a plurality of vector processors operating from a single instruction |
US5963210A (en) * | 1996-03-29 | 1999-10-05 | Stellar Semiconductor, Inc. | Graphics processor, system and method for generating screen pixels in raster order utilizing a single interpolator |
US5828593A (en) * | 1996-07-11 | 1998-10-27 | Northern Telecom Limited | Large-capacity content addressable memory |
JP2882475B2 (en) * | 1996-07-12 | 1999-04-12 | 日本電気株式会社 | Thread execution method |
US5867598A (en) * | 1996-09-26 | 1999-02-02 | Xerox Corporation | Method and apparatus for processing of a JPEG compressed image |
US6212237B1 (en) * | 1997-06-17 | 2001-04-03 | Nippon Telegraph And Telephone Corporation | Motion vector search methods, motion vector search apparatus, and storage media storing a motion vector search program |
US5909686A (en) * | 1997-06-30 | 1999-06-01 | Sun Microsystems, Inc. | Hardware-assisted central processing unit access to a forwarding database |
US5951672A (en) * | 1997-07-02 | 1999-09-14 | International Business Machines Corporation | Synchronization method for work distribution in a multiprocessor system |
EP0905651A3 (en) * | 1997-09-29 | 2000-02-23 | Canon Kabushiki Kaisha | Image processing apparatus and method |
US6089453A (en) * | 1997-10-10 | 2000-07-18 | Display Edge Technology, Ltd. | Article-information display system using electronically controlled tags |
US6226710B1 (en) * | 1997-11-14 | 2001-05-01 | Utmc Microelectronic Systems Inc. | Content addressable memory (CAM) engine |
US6101592A (en) * | 1998-12-18 | 2000-08-08 | Billions Of Operations Per Second, Inc. | Methods and apparatus for scalable instruction set architecture with dynamic compact instructions |
US6145075A (en) * | 1998-02-06 | 2000-11-07 | Ip-First, L.L.C. | Apparatus and method for executing a single-cycle exchange instruction to exchange contents of two locations in a register file |
US6295534B1 (en) * | 1998-05-28 | 2001-09-25 | 3Com Corporation | Apparatus for maintaining an ordered list |
US6088044A (en) * | 1998-05-29 | 2000-07-11 | International Business Machines Corporation | Method for parallelizing software graphics geometry pipeline rendering |
US6119215A (en) * | 1998-06-29 | 2000-09-12 | Cisco Technology, Inc. | Synchronization and control system for an arrayed processing engine |
US6269354B1 (en) * | 1998-11-30 | 2001-07-31 | David W. Arathorn | General purpose recognition e-circuits capable of translation-tolerant recognition, scene segmentation and attention shift, and their application to machine vision |
FR2788873B1 (en) * | 1999-01-22 | 2001-03-09 | Intermec Scanner Technology Ct | METHOD AND DEVICE FOR DETECTING RIGHT SEGMENTS IN A DIGITAL DATA FLOW REPRESENTATIVE OF AN IMAGE, IN WHICH THE POINTS CONTOURED OF SAID IMAGE ARE IDENTIFIED |
US6542989B2 (en) * | 1999-06-15 | 2003-04-01 | Koninklijke Philips Electronics N.V. | Single instruction having op code and stack control field |
US6611524B2 (en) * | 1999-06-30 | 2003-08-26 | Cisco Technology, Inc. | Programmable data packet parser |
ATE310358T1 (en) * | 1999-07-30 | 2005-12-15 | Indinell Sa | METHOD AND DEVICE FOR PROCESSING DIGITAL IMAGES AND AUDIO DATA |
US6745317B1 (en) * | 1999-07-30 | 2004-06-01 | Broadcom Corporation | Three level direct communication connections between neighboring multiple context processing elements |
US7072398B2 (en) * | 2000-12-06 | 2006-07-04 | Kai-Kuang Ma | System and method for motion vector generation and analysis of digital video clips |
US7191310B2 (en) * | 2000-01-19 | 2007-03-13 | Ricoh Company, Ltd. | Parallel processor and image processing apparatus adapted for nonlinear processing through selection via processor element numbers |
US20020107990A1 (en) * | 2000-03-03 | 2002-08-08 | Surgient Networks, Inc. | Network connected computing system including network switch |
US7020671B1 (en) * | 2000-03-21 | 2006-03-28 | Hitachi America, Ltd. | Implementation of an inverse discrete cosine transform using single instruction multiple data instructions |
US6898304B2 (en) * | 2000-12-01 | 2005-05-24 | Applied Materials, Inc. | Hardware configuration for parallel data processing without cross communication |
US6772268B1 (en) * | 2000-12-22 | 2004-08-03 | Nortel Networks Ltd | Centralized look up engine architecture and interface |
US7013302B2 (en) * | 2000-12-22 | 2006-03-14 | Nortel Networks Limited | Bit field manipulation |
GB2377519B (en) * | 2001-02-14 | 2005-06-15 | Clearspeed Technology Ltd | Lookup engine |
US6985633B2 (en) * | 2001-03-26 | 2006-01-10 | Ramot At Tel Aviv University Ltd. | Device and method for decoding class-based codewords |
US6782054B2 (en) * | 2001-04-20 | 2004-08-24 | Koninklijke Philips Electronics, N.V. | Method and apparatus for motion vector estimation |
JP2003069535A (en) * | 2001-06-15 | 2003-03-07 | Mitsubishi Electric Corp | Multiplexing and demultiplexing device for error correction, optical transmission system, and multiplexing transmission method for error correction using them |
US6760821B2 (en) * | 2001-08-10 | 2004-07-06 | Gemicer, Inc. | Memory engine for the inspection and manipulation of data |
JP2003100086A (en) * | 2001-09-25 | 2003-04-04 | Fujitsu Ltd | Associative memory circuit |
US7116712B2 (en) * | 2001-11-02 | 2006-10-03 | Koninklijke Philips Electronics, N.V. | Apparatus and method for parallel multimedia processing |
JP3902741B2 (en) * | 2002-01-25 | 2007-04-11 | 株式会社半導体理工学研究センター | Semiconductor integrated circuit device |
US6901476B2 (en) * | 2002-05-06 | 2005-05-31 | Hywire Ltd. | Variable key type search engine and method therefor |
US7000091B2 (en) * | 2002-08-08 | 2006-02-14 | Hewlett-Packard Development Company, L.P. | System and method for independent branching in systems with plural processing elements |
GB2395299B (en) * | 2002-09-17 | 2006-06-21 | Micron Technology Inc | Control of processing elements in parallel processors |
US20040081238A1 (en) * | 2002-10-25 | 2004-04-29 | Manindra Parhy | Asymmetric block shape modes for motion estimation |
US7120195B2 (en) * | 2002-10-28 | 2006-10-10 | Hewlett-Packard Development Company, L.P. | System and method for estimating motion between images |
WO2004079916A2 (en) * | 2003-03-03 | 2004-09-16 | Mobilygen Corporation | Array arrangement for memory words and combination of video prediction data for an effective memory access |
US7581080B2 (en) * | 2003-04-23 | 2009-08-25 | Micron Technology, Inc. | Method for manipulating data in a group of processing elements according to locally maintained counts |
US9292904B2 (en) * | 2004-01-16 | 2016-03-22 | Nvidia Corporation | Video image processing with parallel processing |
JP4511842B2 (en) * | 2004-01-26 | 2010-07-28 | パナソニック株式会社 | Motion vector detecting device and moving image photographing device |
GB2411745B (en) * | 2004-03-02 | 2006-08-02 | Imagination Tech Ltd | Method and apparatus for management of control flow in a simd device |
US7196708B2 (en) * | 2004-03-31 | 2007-03-27 | Sony Corporation | Parallel vector processing |
US7983342B2 (en) * | 2004-07-29 | 2011-07-19 | Stmicroelectronics Pvt. Ltd. | Macro-block level parallel video decoder |
JP2006140601A (en) * | 2004-11-10 | 2006-06-01 | Canon Inc | Image processor and its control method |
US7725691B2 (en) * | 2005-01-28 | 2010-05-25 | Analog Devices, Inc. | Method and apparatus for accelerating processing of a non-sequential instruction stream on a processor with multiple compute units |
CL2006000541A1 (en) * | 2005-03-10 | 2008-01-04 | Qualcomm Inc | Method for processing multimedia data comprising: a) determining the complexity of multimedia data; b) classify multimedia data based on the complexity determined; and associated apparatus. |
US8149926B2 (en) * | 2005-04-11 | 2012-04-03 | Intel Corporation | Generating edge masks for a deblocking filter |
US8619860B2 (en) * | 2005-05-03 | 2013-12-31 | Qualcomm Incorporated | System and method for scalable encoding and decoding of multimedia data using multiple layers |
US20070071404A1 (en) * | 2005-09-29 | 2007-03-29 | Honeywell International Inc. | Controlled video event presentation |
US7451293B2 (en) * | 2005-10-21 | 2008-11-11 | Brightscale Inc. | Array of Boolean logic controlled processing elements with concurrent I/O processing and instruction sequencing |
US20070188505A1 (en) * | 2006-01-10 | 2007-08-16 | Lazar Bivolarski | Method and apparatus for scheduling the processing of multimedia data in parallel processing systems |
US20080126278A1 (en) * | 2006-11-29 | 2008-05-29 | Alexander Bronstein | Parallel processing motion estimation for H.264 video codec |
- 2006
  - 2006-10-19 US US11/584,480 patent/US7451293B2/en not_active Expired - Fee Related
  - 2006-10-20 JP JP2008534793A patent/JP2009512920A/en not_active Abandoned
  - 2006-10-20 EP EP06836411A patent/EP1941380A2/en not_active Withdrawn
  - 2006-10-20 KR KR1020087009137A patent/KR20080091754A/en not_active Application Discontinuation
  - 2006-10-20 WO PCT/US2006/040975 patent/WO2007050444A2/en active Application Filing
  - 2006-10-20 CA CA002626184A patent/CA2626184A1/en not_active Abandoned
  - 2006-10-20 TW TW095138731A patent/TW200745876A/en unknown
- 2008
  - 2008-05-28 US US12/128,528 patent/US20080307196A1/en not_active Abandoned
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080126757A1 (en) * | 2002-12-05 | 2008-05-29 | Gheorghe Stefan | Cellular engine for a data processing system |
US7908461B2 (en) | 2002-12-05 | 2011-03-15 | Allsearch Semi, LLC | Cellular engine for a data processing system |
US20080307196A1 (en) * | 2005-10-21 | 2008-12-11 | Bogdan Mitu | Integrated Processor Array, Instruction Sequencer And I/O Controller |
US20100066748A1 (en) * | 2006-01-10 | 2010-03-18 | Lazar Bivolarski | Method And Apparatus For Scheduling The Processing Of Multimedia Data In Parallel Processing Systems |
US20080028192A1 (en) * | 2006-07-31 | 2008-01-31 | Nec Electronics Corporation | Data processing apparatus, and data processing method |
US20080059467A1 (en) * | 2006-09-05 | 2008-03-06 | Lazar Bivolarski | Near full motion search algorithm |
Also Published As
Publication number | Publication date |
---|---|
TW200745876A (en) | 2007-12-16 |
KR20080091754A (en) | 2008-10-14 |
JP2009512920A (en) | 2009-03-26 |
US20080307196A1 (en) | 2008-12-11 |
WO2007050444A2 (en) | 2007-05-03 |
WO2007050444A3 (en) | 2009-04-30 |
CA2626184A1 (en) | 2007-05-03 |
EP1941380A2 (en) | 2008-07-09 |
US20070130444A1 (en) | 2007-06-07 |
Similar Documents
Publication | Title |
---|---|
US7451293B2 (en) | Array of Boolean logic controlled processing elements with concurrent I/O processing and instruction sequencing |
CN110678840B (en) | Tensor register file |
CN110678841B (en) | Tensor processor instruction set architecture |
KR101202445B1 (en) | Processor |
JP3559046B2 (en) | Data processing management system |
US8181003B2 (en) | Instruction set design, control and communication in programmable microprocessor cores and the like |
US5287532A (en) | Processor elements having multi-byte structure shift register for shifting data either byte wise or bit wise with single-bit output formed at bit positions thereof spaced by one byte |
KR100904318B1 (en) | Conditional instruction for a single instruction, multiple data execution engine |
CN110770701A (en) | Register based matrix multiplication |
CN100472505C (en) | Parallel processing array |
WO2001031418A2 (en) | Wide connections for transferring data between pe's of an n-dimensional mesh-connected simd array while transferring operands from memory |
US20050024983A1 (en) | Providing a register file memory with local addressing in a SIMD parallel processor |
US11907158B2 (en) | Vector processor with vector first and multiple lane configuration |
US20080059764A1 (en) | Integral parallel machine |
JP2021108104A (en) | Partially readable/writable reconfigurable systolic array system and method |
CN110050259B (en) | Vector processor and control method thereof |
US11443014B1 (en) | Sparse matrix multiplier in hardware and a reconfigurable data processor including same |
CN111158757A (en) | Parallel access device and method and chip |
US7577824B2 (en) | Methods and apparatus for storing expanded width instructions in a VLIW memory for deferred execution |
US6728863B1 (en) | Wide connections for transferring data between PE's of an N-dimensional mesh-connected SIMD array while transferring operands from memory |
US7069386B2 (en) | Associative memory device |
CN101217673B (en) | Format conversion apparatus from band interleave format to band separate format |
JP2812292B2 (en) | Image processing device |
CN112579971B (en) | Matrix operation circuit, matrix operation device and matrix operation method |
US20230059970A1 (en) | Weight sparsity in data processing engines |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: BRIGHTSCALE INC.,, CALIFORNIA; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:MITU, BOGDAN;STEFAN, GHEORGHE;TOMESCU, DAN;REEL/FRAME:018971/0744;SIGNING DATES FROM 20061228 TO 20070117 |
AS | Assignment | Owner name: SILICON VALLEY BANK, CALIFORNIA; Free format text: SECURITY AGREEMENT;ASSIGNOR:BRIGHTSCALE, INC.;REEL/FRAME:020353/0462; Effective date: 20080110 |
STCF | Information on status: patent grant | Free format text: PATENTED CASE |
AS | Assignment | Owner name: BRIGHTSCALE, INC., CALIFORNIA; Free format text: RELEASE;ASSIGNOR:SILICON VALLEY BANK;REEL/FRAME:022868/0330; Effective date: 20090622 |
FPAY | Fee payment | Year of fee payment: 4 |
AS | Assignment | Owner name: ALLSEARCH SEMI LLC, ILLINOIS; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:BRIGHTSCALE, INC.;REEL/FRAME:028292/0199; Effective date: 20090810 |
FPAY | Fee payment | Year of fee payment: 8 |
FEPP | Fee payment procedure | Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
LAPS | Lapse for failure to pay maintenance fees | Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
STCH | Information on status: patent discontinuation | Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
FP | Lapsed due to failure to pay maintenance fee | Effective date: 20201111 |