WO1994010630A1 - Data formatter - Google Patents
- Publication number
- WO1994010630A1 (PCT/AU1993/000572)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- formatter
- addresses
- matrix
- data formatter
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/34—Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes
- G06F9/345—Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes of multiple operands or results
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/0207—Addressing or allocation; Relocation with multidimensional access, e.g. row/column, matrix
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0844—Multiple simultaneous or quasi-simultaneous cache accessing
- G06F12/0846—Cache with multiple tag or data arrays being simultaneously accessible
Definitions
- This invention relates to the general field of digital computing and more particularly to a method and apparatus for addressing a memory space in an ordered manner to input and extract data structures.
- the invention may operate as part of a scalable array processing system.
- the data formatter can be used as a member of a set of formatters which provide data and instructions to a dataflow processor.
- An example of a dataflow processor can be a systolic array of processing elements.
- the subsystem formed by the controllers and the systolic array implements a high performance tensor or matrix processing engine.
- the formatter has two primary modes of operation. In the first mode it generates addresses to read scalar operands from a memory space, constructs a parallel set of operands comprising {instruction, data} 2-tuples, and outputs the set in bit-serial form to an appropriate interface in synchronism with other members of the set of formatters. In the second mode, the formatter accepts, in synchronism with a number of other data formatters, all or part of a parallel data structure which is presented in bit-skewed bit-serial form from an appropriate interface. When the formatter has stored enough of the parallel data structure in its internal buffers, it generates addresses to write the stored data structure word sequentially into a memory space.
- the parallel data structures can be considered as "wavefronts" which are either entered into the parallel interface or read from the interface. Wavefronts consist of sets of {instruction, data} 2-tuples which are bit-skewed between adjacent processing elements.
- It is an object of this invention to provide a data formatter adapted to provide data and instructions to a dataflow processor, or at least to offer the public a useful alternative. Therefore, according to one form of this invention, although this need not be the only or indeed the broadest form, there is proposed a data formatter comprising: a Bus Control means adapted to facilitate communication within the data formatter and between the data formatter and external memory means; an Address Generation means adapted to generate memory addresses for data fetch or storage; and a Shift Register means adapted to provide local data storage and communication with a dataflow processor.
- the data formatter is adapted to access at least one predetermined region of the external memory means.
- the data formatter further includes an Instruction Fetch means adapted to fetch and execute commands which determine the operation of the data formatter.
- the address generator means comprises a parallel datapath, a local memory means adapted to store microprograms and a sequencer means adapted to sequence the microprograms to generate addresses.
- the parallel datapath possesses an internal memory means which stores parameters used by the microprograms.
- the shift register means comprises a number of serial-to-parallel/parallel-to-serial registers adapted to provide local storage of wavefronts and communication with a dataflow processor.
- the data formatter is adapted to detect the presence of an IEEE infinity and to effect an output dependent on such detection status.
- the data formatter executes a linear sequence of commands.
- the address generator unit generates memory addresses from which data is read to load the registers of the shift register unit.
- the address generator unit generates memory addresses to which data is written from the registers of the shift register unit.
- the invention can be said to reside in a method of formatting data for provision to a dataflow processor including the steps of: (a) configuration, wherein internal registers of the data formatter are initialised and loaded with information including instructions to be concatenated with data during a wavefront execution phase; (b) wavefront execution, wherein addresses are generated, data is fetched from the generated addresses and instructions and data are concatenated to form {instruction, data} 2-tuples which are output to the dataflow processor; and (c) termination, wherein data formatting is terminated.
- steps (a) and (b) may be repeated an arbitrary number of times.
- the instructions are 5-bit opcodes.
- the configuration phase can be performed under the control of a bus control means by the fetching of commands from an external memory means, or alternatively by explicit loading of parameters by a host processor.
- FIG. 1 is a schematic diagram of a data formatter;
- FIG. 2 is a schematic diagram of a two-dimensional difference engine;
- FIG. 3 is a C-code listing of an implementation of the algorithm described in equation (1) for normal matrix storage and access;
- FIG. 4 is a schematic diagram of the address generation for normal matrix accesses;
- FIG. 5 is a C-code listing of an implementation of the algorithm for normal storage and lower triangular matrix access;
- FIG. 6 is a schematic diagram of the address generation for lower triangular matrix accesses;
- FIG. 7 is a C-code listing of an implementation of the algorithm for normal storage and upper triangular matrix access;
- FIG. 8 is a schematic diagram of the address generation for upper triangular matrix accesses;
- FIG. 9 is a C-code listing of an implementation of the algorithm for normal storage and strictly upper triangular matrix access;
- FIG. 10 is a schematic diagram of the address generation for strictly upper triangular matrix accesses;
- FIG. 11 is a summary of the interface signals between the formatter and both the host memory and the parallel data interface;
- FIG. 12 is a schematic diagram of a first embodiment of the implementation of a data formatter in a system; and
- FIG. 13 is a schematic diagram of a second embodiment of the implementation of a data formatter in a system.
- the data formatter comprises four modules: the Bus Control Unit; the Address Generation Unit; the Instruction Fetch Unit; and the Shift Register Unit.
- the bus control unit provides the control for the internal bus by which functional units communicate between themselves or with the external world. Requests for bus access are ordered in priority and serviced by the bus control unit interface. External communications are also controlled by the bus controller. The external address and data bus and their associated protocols are interfaced to the internal bus in the bus control unit. External bus request and bus grant are part of the interface, as is the multiplexing of address and data.
- the internal registers within the various units are made available to the external bus by the control unit so that they may be addressed as memory mapped registers. A number of memory spaces are supported by the bus control unit. This allows the use of partitioned memory to enhance system speed.
- An example is a partitioned cache (described later) in which different matrix operands are stored in different partitions to improve the efficiency of the cache.
- the Address Generation Unit (AGU) consists of a parallel datapath, a microprogram ROM (Read Only Memory) and a microprogrammed sequencer.
- the addresses for either source or destination data are computed by the AGU and passed to the bus control unit to be used in data reads and writes.
- a number of microprograms are held in the microprogram ROM which enable the AGU to perform a range of different addressing modes.
- the Shift Register Unit contains 20 serial-to-parallel/parallel-to-serial registers. These shift registers constitute the local storage for structured data which is input either from the sequential memory accesses of the address generator unit when reading structured operands from memory, or from the parallel bit-serial inputs prior to writing a result back to memory.
- the formatter is controlled by the host either directly or indirectly.
- Under direct control, the host writes configuration data directly into the registers of the formatter, and then starts the device by writing to a control register. Completion of a formatter sequence is determined by polling a status register.
- Indirect control of the formatter is effected by a program resident in an accessible memory space which is fetched and executed by an instruction fetch unit in the formatter.
- the initiation of program execution is carried out by writing the address of a program into the Program Address register.
- Fetched commands load internal registers which are used to specify the parameters of the data structures to be fetched from or stored to a specified memory space.
- data: a 32-bit immediate word.
- short data: a signed (2's complement) 23-bit immediate data word.
- parameters which specify the data structure include the length of a data vector, the starting address in memory, the number of rows and columns of the matrix and the linear spacing between matrix elements.
- Additional commands are used to initiate the transfer of parallel data in wavefronts, and to interrupt a host processor. No conditional or branch statements are present in the command set, although branching commands can be incorporated if desired, and the formatter executes a linear sequence of commands until a halt command is executed. The halt command causes the formatter to activate an interrupt signal and go into a wait state until a new program start address is written to the Program Address register.
- the typical program consists of the following sequence of phases: configuration; wavefront execution; termination.
- In the configuration phase, the internal registers of the formatters are initialised and the instructions which are to be concatenated with the data during the output of a wavefront are loaded.
- In the wavefront execution phase, data is fetched and stored in the internal shift registers, instructions are appended, and the {instruction, data} 2-tuples are output serially as wavefronts after the set of formatters has synchronised.
- the address generator unit is used either to generate memory addresses from which wavefront data is read to load the shift registers, or to generate memory addresses to which wavefront data in the shift registers is written.
- Two 5-bit opcodes are stored in the formatter for appending to the data during output. The first is output with the first data wavefront and the second is output during all subsequent wavefronts of a given data structure.
- In the termination phase, command fetching is terminated by a HALT command and an interrupt signal is asserted. Additional configuration and wavefront execution phases may occur before a termination command is executed.
- the length of a formatter program is limited only by the address space.
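- The three phases above can be sketched as a small software model. This is only an illustration under stated assumptions: the command names LOAD, WAVE and HALT appear in the text, but the tuple encoding of commands, the register names (`start`, `length`, `count`, `opcode_first`, `opcode_rest`) and the placement of the 5-bit opcode above a 32-bit data word are hypothetical choices made for the example.

```python
# Illustrative model only. Command names follow the text (LOAD, WAVE, HALT);
# the command encoding, register names and bit layout are assumptions.

OPCODE_BITS = 5
DATA_BITS = 32


def make_tuple(opcode, data):
    """Concatenate a 5-bit opcode with a 32-bit data word.

    Assumed layout: opcode in the high bits, data in the low bits.
    """
    assert 0 <= opcode < (1 << OPCODE_BITS)
    return (opcode << DATA_BITS) | (data & ((1 << DATA_BITS) - 1))


def run_program(program, memory):
    """Execute a linear command list until HALT.

    Returns (wavefronts, interrupt_asserted).
    """
    regs = {}
    wavefronts = []
    for command in program:
        name = command[0]
        if name == "LOAD":
            # Configuration phase: write an internal register.
            _, reg, value = command
            regs[reg] = value
        elif name == "WAVE":
            # Wavefront execution phase: fetch data, append an opcode.
            for w in range(regs["count"]):
                # The first wavefront carries the first opcode; all
                # subsequent wavefronts carry the second opcode.
                opcode = regs["opcode_first"] if w == 0 else regs["opcode_rest"]
                base = regs["start"] + w * regs["length"]
                wavefronts.append(
                    [make_tuple(opcode, memory[base + i])
                     for i in range(regs["length"])]
                )
        elif name == "HALT":
            # Termination phase: stop fetching and assert the interrupt.
            return wavefronts, True
    return wavefronts, False


# Example program: configure, emit two wavefronts of three operands, halt.
example_memory = list(range(100, 112))
example_program = [
    ("LOAD", "start", 0),
    ("LOAD", "length", 3),
    ("LOAD", "count", 2),
    ("LOAD", "opcode_first", 0b00001),
    ("LOAD", "opcode_rest", 0b00010),
    ("WAVE",),
    ("HALT",),
]
example_waves, example_interrupt = run_program(example_program, example_memory)
```

- Because the command list has no branches, the model is a single pass over the program, and further LOAD and WAVE commands before HALT correspond to repeated configuration and wavefront execution phases.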
- Table 2 is a read/write register map of the formatter. Fifteen registers are used for configuration and control information, and twenty registers are used in the shift register unit for the parallel loading and storing of structure data.
- the instruction fetch unit uses registers 0 and 1 as a program counter and a command holding register respectively.
- the program start address is initially written to register 0 and subsequent reads return the address of the next command to be fetched.
- Registers 2 and 3 contain 8-bit AND and OR masks for the command space and 8-bit AND and OR masks for the data space. They are used by the Bus Control Unit to calculate and output an 8-bit descriptor for both data and command addresses.
- Register 5 is a 3-bit status and control register providing the following information:
- Infinity Detected: an infinity has been detected in a value which has been entered into a shift register. Setting this register bit clears the Infinity Detected bit.
- Interrupt: the formatter is asserting an interrupt. Setting this register bit clears the interrupt.
- AGU Busy: the address generator is executing a program. Setting this register bit starts the AGU if the parameters have been written into the AGU.
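- The write-to-clear and write-to-start behaviour of the three status bits can be modelled as follows. This is a sketch under assumptions: the text names the three flags but does not give their bit positions, and it does not state the floating-point width, so the bit assignments and the use of IEEE 754 single precision for the infinity test are hypothetical.

```python
import struct

# Assumed bit positions: the source names the flags but not their order.
INFINITY_DETECTED = 1 << 0
INTERRUPT = 1 << 1
AGU_BUSY = 1 << 2


def is_ieee_infinity(word):
    """True if a 32-bit word encodes +inf or -inf in IEEE 754 single
    precision (exponent all ones, fraction zero). The single-precision
    format is an assumption."""
    return (word & 0x7FFFFFFF) == 0x7F800000


class StatusRegister:
    """Toy model of the 3-bit status and control register."""

    def __init__(self):
        self.value = 0

    def note_shift_register_load(self, word):
        # Flag an infinity in any value entered into a shift register.
        if is_ieee_infinity(word):
            self.value |= INFINITY_DETECTED

    def host_write(self, bits):
        # Writing 1 to Infinity Detected or Interrupt clears that flag.
        self.value &= ~(bits & (INFINITY_DETECTED | INTERRUPT))
        # Writing 1 to the AGU bit starts the address generator.
        if bits & AGU_BUSY:
            self.value |= AGU_BUSY


# Bit pattern of +infinity as a 32-bit word.
INF_WORD = struct.unpack("<I", struct.pack("<f", float("inf")))[0]
```

- A host polling loop would read `value` until `AGU_BUSY` is clear; how the hardware clears that bit on completion is not modelled here.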
- Registers 6 and 7 are two array control registers used to define the way in which the formatter communicates with the parallel interface.
- the first register specifies information concerning the properties of the first wavefront transmitted to the interface.
- inter-wavefront gap: a variable wait period between wavefronts.
- element length: the number of bits in each 2-tuple passed to the interface.
- wavefront type: a 2-bit field which identifies the type of wavefront.
- negate: a flag which causes the sign bit of all operands processed to be reversed, so negating the operand.
- opcode: a 5-bit field which is output as one element of the operand 2-tuples transmitted to the interface.
- the second control register contains an identical set of parameters to the first, with the exception of the negate flag which is specified by the first register.
- the parameters held in this second control register are used to specify the properties of all wavefronts subsequent to the first.
- the address generator unit (AGU) consists of a general purpose arithmetic datapath and two 20-bit increment/decrement datapaths. Control of the datapaths is effected by programs resident in a microcode ROM internal to the AGU. Microprograms for a number of different matrix addressing algorithms are present in the ROM. These programs are initiated either by a Wave command, or the setting of the AGU bit of the status and control register.
- the AGU utilizes registers 8 to 15 of the registers listed in Table 2.
- the eight destination registers are loaded either by host writes to the memory-mapped registers, or by the LOAD or LOADQ commands.
- the only readable register is 9, which contains the current address generated by the AGU.
- the address generated by the AGU is dependent upon the set of parameters {Argument type, Storage mode, Access mode}. Taking these parameters in order:
- Argument type can be one of three: Operand, Result and Hadamard Result.
- Operand programs are used to access operand matrices to be output to the parallel interface.
- Result programs are used for storing the data structures input from the parallel interface when the structures are generated from a conventional matrix multiplication.
- Hadamard Result programs are used when the structure input from the parallel interface has been generated with an element-wise operation. They cause additional synchronisation protocols to be observed between all data formatters in a system.
- Storage mode: the storage mode of a matrix operand is defined as one of the set {Normal, Triangular, No storage}. For matrix operands stored normally, every element in the matrix is written into a memory location, whereas for triangular operands the zero elements are not written to the memory, so allowing packed storage techniques to be used.
- Access mode: the access mode of a matrix structure can be an element of the set {Normal, Upper triangle, Strictly upper triangle, Lower triangle, No access}. Access for each is described as follows: Normal: addresses are generated for all elements, and all elements are accessed in host memory.
- a general approach to matrix addressing is to use a second order difference engine, implemented with a modulo arithmetic capability.
- the following expression is implemented in hardware:
- This maps an element of an arbitrary matrix [X], stored at address a in a linear address space starting at base_address, onto the (n-
- the parameters of the right hand side of this expression are loaded into the registers of the datapath in the address generator.
- n1 and n2 are indexed through their respective ranges (the dimensions of the matrix). This is carried out using the difference engine principle shown in FIG. 2.
- addresses are formed by n1 - 1 accumulations of the first difference value d1, where each operation is carried out modulo q.
- the address of the first element of the second row is computed by accumulating the second difference d2 modulo q, and the remaining addresses of the matrix elements are computed by repeating this procedure.
- Prime-radix mappings can be implemented directly with this technique.
- the dimensions {n_j} are variable. By linearly decreasing one of the two dimensions in a matrix it is possible to generate addresses for a triangular region of the matrix.
- the symbol <.> in FIG. 2 represents evaluation modulo q.
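- The accumulation procedure described above can be sketched in software. This is a minimal model, not the hardware algorithm of FIG. 2 or FIG. 3: it assumes a single accumulator holding an offset from base_address, that the second difference d2 is accumulated onto the last offset of a row to reach the first element of the next row, and that the modulus q applies to the offset only. The function name is hypothetical.

```python
def difference_engine_addresses(base_address, n1, n2, d1, d2, q):
    """Return the addresses of n2 rows of n1 elements each.

    Assumptions (not confirmed by the text): one accumulator holds the
    offset from base_address; d2 is added to the last offset of a row to
    reach the first element of the next row; q wraps the offset only.
    """
    offset = 0
    addresses = []
    for row in range(n2):
        addresses.append(base_address + offset)
        for _ in range(n1 - 1):
            # n1 - 1 accumulations of the first difference, modulo q.
            offset = (offset + d1) % q
            addresses.append(base_address + offset)
        if row != n2 - 1:
            # One accumulation of the second difference, modulo q.
            offset = (offset + d2) % q
    return addresses


# A 3-row by 4-column submatrix of a row-major matrix with 10 columns:
# d1 steps along a row, d2 = 10 - 3 jumps from row end to next row start.
submatrix = difference_engine_addresses(1000, n1=4, n2=3, d1=1, d2=7, q=1 << 20)

# A small modulus shows the wrap-around used by modulo mappings.
wrapped = difference_engine_addresses(0, n1=4, n2=2, d1=3, d2=3, q=8)
```

- With d1 = 3 and q = 8 the second call visits every offset from 0 to 7 exactly once, which illustrates how a stride coprime to the modulus permutes the address range.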
- FIG. 3 is a C-code listing modelling the normal storage, normal access matrix address generation algorithm derived from equation (1), and FIG. 4 is a schematic representation of the method of generation of the addresses.
- the access () function models the accessing of data in memory. If the matrix was of type Operand, the access (a, sreg) call would fetch the contents of memory at address a and write the data into the shift register number sreg. If the matrix was of type Result, the contents of shift register number sreg would be written into memory location a.
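- The behaviour of the access() call can be mirrored in a short model. This is a sketch only: the memory and the shift registers are passed as explicit arguments (an assumption made to keep the example self-contained), and only the Operand and Result cases described above are handled.

```python
def access(a, sreg, memory, shift_registers, matrix_type):
    """Model of the access(a, sreg) call described in the text.

    memory and shift_registers are explicit parameters here, which is an
    assumption of this sketch rather than the signature used in FIG. 3.
    """
    if matrix_type == "Operand":
        # Fetch memory[a] and write it into shift register number sreg.
        shift_registers[sreg] = memory[a]
    elif matrix_type == "Result":
        # Write the contents of shift register number sreg to memory[a].
        memory[a] = shift_registers[sreg]
    else:
        raise ValueError("unsupported matrix type: " + str(matrix_type))


# Load three operands into shift registers 0..2, then store them elsewhere.
demo_memory = {10: 7, 11: 8, 12: 9}
demo_sregs = [0] * 20  # the shift register unit holds 20 registers
for i, addr in enumerate([10, 11, 12]):
    access(addr, i, demo_memory, demo_sregs, "Operand")
for i, addr in enumerate([20, 21, 22]):
    access(addr, i, demo_memory, demo_sregs, "Result")
```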
- FIG. 5 is a C-code listing modelling the normal storage, lower access matrix address generation algorithm derived from equation (1)
- FIG. 6 is a schematic representation of the method of generation of the addresses.
- FIG. 7 is a C-code listing modelling the normal storage, upper access matrix address generation algorithm derived from equation (1)
- FIG. 8 is a schematic representation of the method of generation of the addresses.
- FIG. 9 is a C-code listing modelling the normal storage, strictly-upper access matrix address generation algorithm derived from equation (1), and FIG. 10 is a schematic representation of the method of generation of the addresses.
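- The element sets selected by the access modes named above can be enumerated directly. The sketch below is not the difference-engine code of the figures: it assumes a square matrix held in normal (full) storage in row-major order, computes each address by explicit multiplication, and uses hypothetical parameter names. The visiting order in the figures may differ.

```python
def matrix_addresses(base, n, element_spacing, row_spacing, access_mode="Normal"):
    """List addresses of an n x n normally stored matrix for one access mode.

    Assumes row-major storage; mode names follow the text, everything else
    is an assumption of this sketch.
    """
    keep = {
        "Normal": lambda r, c: True,
        "Upper triangle": lambda r, c: c >= r,
        "Strictly upper triangle": lambda r, c: c > r,
        "Lower triangle": lambda r, c: c <= r,
    }[access_mode]
    return [
        base + r * row_spacing + c * element_spacing
        for r in range(n)
        for c in range(n)
        if keep(r, c)
    ]


# A 3 x 3 matrix stored contiguously from address 0.
upper = matrix_addresses(0, 3, 1, 3, "Upper triangle")
strict_upper = matrix_addresses(0, 3, 1, 3, "Strictly upper triangle")
lower = matrix_addresses(0, 3, 1, 3, "Lower triangle")
```

- In the upper-triangle case each successive row is one element shorter, which corresponds to linearly decreasing one of the two dimensions as the text describes.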
- the formatter communicates with the host system memory via a multiplexed address/data bus and associated bus control signals.
- the bus is 32-bits wide.
- Multiple formatters can be connected to a common system bus with the use of an asynchronous bus-request/bus-grant protocol. One such interface is shown in FIG. 11.
- FIG. 12 shows a system diagram in which formatters are used to input two parallel data structures into a systolic processor array from a global system bus, and also to accept the output of the array and write the output back onto the system bus.
- In a second embodiment, shown in FIG. 13, the invention has been implemented in a system hosted by a Sun SPARCstation.
- the matrix processor is interfaced to the Sun SPARCstation via the SBus.
- This arrangement is convenient since it allows the SCAP hardware to operate using virtual addressing, with virtual to physical translation being performed by the SBus controller in the SPARCstation.
- the host processor and the matrix processor therefore share the same data space, so both can interact with the matrix data directly.
- This approach does however have its own disadvantages, the most critical being the fact that the data transfer rate across the SBus tends to be quite low (only 1.5 to 3.85 Mwords per second) due to the overheads of address translation.
- the matrix processor also includes a cache memory subsystem.
- the cache supports burst mode data transfers across the SBus on cache misses and can also be used to hold frequently used operand matrices (such as coefficient matrices in transform applications) and to store temporary or intermediate results.
- a novel cache partitioning scheme has been implemented.
- the technique allows the cache to be dynamically divided into a number of regions that are guaranteed not to interact thereby ensuring that fetches for one matrix operand do not interfere with fetches for the other.
- the data controllers determine how the cache is partitioned on a per-operand/result basis (it is also possible to assign a cache partition to the command streams) by issuing an 8-bit space address along with each address generated. Each bit of the space address can be set or cleared, or can take on the value of one of the generated address bits. In our system implementation, three bits of this space address are used to control non-cached accesses, temporary matrix accesses and temporary matrix initialization. Four bits are used to partition the cache into up to 16 independent regions.
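- One way to obtain bits that are set, cleared, or copied from the address is an AND mask followed by an OR mask, which matches the 8-bit AND and OR mask registers described for the Bus Control Unit. The sketch below is an assumption about how those masks combine: the text does not say which eight address bits feed the masks, so the `shift` parameter and the function name are hypothetical.

```python
def space_address(address, and_mask, or_mask, shift=0):
    """Derive an 8-bit space address from a generated address.

    Per bit: AND=0, OR=0 clears it; OR=1 sets it; AND=1, OR=0 copies the
    corresponding address bit. Which address bits are used (shift) is an
    assumption of this sketch.
    """
    addr_bits = (address >> shift) & 0xFF
    return ((addr_bits & and_mask) | or_mask) & 0xFF


# Pin every access for one operand to a fixed partition number.
fixed_partition = space_address(0xABCD, and_mask=0x00, or_mask=0x05)

# Let two address bits choose among four partitions within a fixed group.
address_selected = space_address(0x1234, and_mask=0x03, or_mask=0x08, shift=12)
```

- Giving each operand a different OR mask with a zero AND mask keeps their fetches in separate regions, which is the non-interference property the partitioning scheme is described as providing.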
- the data formatter chip was designed using a generic 1.2 micron double layer metal CMOS process rule-set and was retargeted for fabrication using Hewlett Packard's 1.0 micron HP34 process using a gate shrink.
- the processing element chip is described as part of a second embodiment in co-pending application number PL5697 entitled SYSTOLIC DIMENSIONLESS ARRAY.
- the data formatter chip was designed using a mixture of full custom and standard cell design styles. Data formatter chips are used to fetch operands from matrix data structures held in the host memory system, and to store results back into the host memory and/or cache.
- the data formatter chips implement matrix addressing. They access the elements of the matrix using information from a matrix descriptor that specifies the base address of the matrix, element spacing and row/column spacing, etc. The same chip can be used as either an operand data formatter or a result data formatter.
- a number of addressing modes have been implemented to support conventional matrix multiplication, element-wise operations and certain triangular access modes. Constant and circulant matrices can be stored and accessed efficiently. Both real and complex matrices are supported. Matrix transposition, negation, and submatrix evaluation can be performed by the data controllers, as can more complex mappings or permutations of the matrix elements (e.g., prime factor mappings).
- the data formatters fetch one operand for each processing element along the two edges X and Y of the array, and then transmit the data to the array as a block known as an operand wavefront.
- the operand wavefront also includes an instruction opcode that is transmitted to the array along with the data.
- the opcode specifies to the processing elements what type of computation is to be performed (e.g. multiply/accumulate, element-wise addition, clear accumulator, etc).
- Bit-serial communication to the processing element array is used, with a one clock cycle pipeline delay between each processing element in each dimension of the array. This approach approximates broadcast operation, but caters for arbitrary expansion.
- Result wavefronts are read back from the processing element array using a similar timing scheme.
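- The one-cycle skew between adjacent processing elements can be pictured with a small bit-stream model. This is an illustration under assumptions: least-significant-bit-first ordering and zero padding outside each word are choices made for the example, and the function names are hypothetical.

```python
def skewed_wavefront(words, word_bits):
    """Return one bit stream per processing element.

    Stream k starts k clock cycles after stream 0. LSB-first ordering and
    zero padding are assumptions of this sketch.
    """
    n = len(words)
    streams = []
    for k, word in enumerate(words):
        bits = [(word >> i) & 1 for i in range(word_bits)]
        streams.append([0] * k + bits + [0] * (n - 1 - k))
    return streams


def deskew(streams, word_bits):
    """Recover the words by undoing the per-element delay."""
    words = []
    for k, stream in enumerate(streams):
        bits = stream[k:k + word_bits]
        words.append(sum(b << i for i, b in enumerate(bits)))
    return words


# Three 3-bit words sent to three adjacent processing elements.
example_streams = skewed_wavefront([0b101, 0b011, 0b110], word_bits=3)
recovered = deskew(example_streams, word_bits=3)
```

- Each row of `example_streams` is the serial line of one processing element over time; reading a column gives the bits present on all lines in the same clock cycle.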
- Data formatter chips have the ability to fetch and execute their own command streams. This minimizes host intervention and thereby improves system performance.
- Data formatter programs describe the matrices involved in the computation and specify the methods by which the matrix data is to be accessed as well as the operation(s) to be computed by the array.
- a data formatter program can be as simple as a single matrix multiplication or as complex as an entire application.
- Each data formatter chip can provide data to or receive data from up to 20 processing elements along the edges of the array. Therefore, a system containing up to 400 processing elements (20 PE chips) can be controlled with just 3 data formatter chips: one for each of the X and Y operand data streams, and one for the result data stream R.
- Data formatter chips can be cascaded to support arbitrarily large processing element arrays.
- a system containing up to 1600 processing elements (80 PE chips) requires 6 data formatters, while a 3600 PE array (180 chips) requires 9 data formatter chips.
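- The chip counts quoted above follow from the 20-element limit per formatter. The sketch below reproduces them under stated assumptions: a square array, one group of formatters for each of the X, Y and R streams, and 20 processing elements per PE chip (implied by 400 processing elements occupying 20 PE chips). The function names are hypothetical.

```python
import math

PES_PER_FORMATTER = 20  # from the text: up to 20 processing elements per chip edge
PES_PER_PE_CHIP = 20    # implied by "400 processing elements (20 PE chips)"
STREAMS = 3             # X operands, Y operands, R results


def formatter_chips(edge):
    """Formatter chips needed for a square edge x edge array (assumed square)."""
    return STREAMS * math.ceil(edge / PES_PER_FORMATTER)


def pe_chips(edge):
    """Processing element chips needed for a square edge x edge array."""
    return math.ceil(edge * edge / PES_PER_PE_CHIP)


# Array edge lengths corresponding to 400, 1600 and 3600 processing elements.
sizes = {edge: (formatter_chips(edge), pe_chips(edge)) for edge in (20, 40, 60)}
```

- The formatter count grows with the array edge while the processing element count grows with its square, which is why cascading formatters scales cheaply.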
- Table 3 shows the statistics associated with one embodiment of the data formatter chip.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Executing Machine-Instructions (AREA)
- Advance Control (AREA)
- Radar Systems Or Details Thereof (AREA)
- Burglar Alarm Systems (AREA)
- Light Guides In General And Applications Therefor (AREA)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU54125/94A AU5412594A (en) | 1992-11-05 | 1993-11-05 | Data formatter |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AUPL569692 | 1992-11-05 | ||
AUPL5696 | 1992-11-05 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1994010630A1 (en) | 1994-05-11 |
Family
ID=3776519
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/AU1993/000572 WO1994010630A1 (en) | 1992-11-05 | 1993-11-05 | Data formatter |
Country Status (3)
Country | Link |
---|---|
AU (1) | AU5412594A (en) |
CA (1) | CA2148464A1 (en) |
WO (1) | WO1994010630A1 (en) |
-
1993
- 1993-11-05 CA CA002148464A patent/CA2148464A1/en not_active Abandoned
- 1993-11-05 AU AU54125/94A patent/AU5412594A/en not_active Abandoned
- 1993-11-05 WO PCT/AU1993/000572 patent/WO1994010630A1/en active Application Filing
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4864491A (en) * | 1984-08-07 | 1989-09-05 | Nec Corporation | Memory device |
US4814978A (en) * | 1986-07-15 | 1989-03-21 | Dataflow Computer Corporation | Dataflow processing element, multiprocessor, and processes |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1997043715A2 (en) * | 1996-05-15 | 1997-11-20 | Philips Electronics N.V. | Processor with an instruction cache |
WO1997043715A3 (en) * | 1996-05-15 | 1998-01-22 | Philips Electronics Nv | Processor with an instruction cache |
US7730031B2 (en) | 2000-03-01 | 2010-06-01 | Computer Associates Think, Inc. | Method and system for updating an archive of a computer file |
US8019730B2 (en) | 2000-03-01 | 2011-09-13 | Computer Associates Think, Inc. | Method and system for updating an archive of a computer file |
US8019731B2 (en) | 2000-03-01 | 2011-09-13 | Computer Associates Think, Inc. | Method and system for updating an archive of a computer file |
US8495019B2 (en) | 2011-03-08 | 2013-07-23 | Ca, Inc. | System and method for providing assured recovery and replication |
Also Published As
Publication number | Publication date |
---|---|
AU5412594A (en) | 1994-05-24 |
CA2148464A1 (en) | 1994-05-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AT AU BB BG BR BY CA CH CZ DE DK ES FI GB HU JP KP KR KZ LK LU LV MG MN MW NL NO NZ PL PT RO RU SD SE SK UA US UZ VN |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): AT BE CH DE DK ES FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN ML MR NE SN TD TG |
|
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
WWE | Wipo information: entry into national phase |
Ref document number: 2148464 Country of ref document: CA |
|
ENP | Entry into the national phase |
Ref document number: 1995 424376 Country of ref document: US Date of ref document: 19950505 Kind code of ref document: A |
|
REG | Reference to national code |
Ref country code: DE Ref legal event code: 8642 |
|
122 | Ep: pct application non-entry in european phase |