EP1573573A2 - Dataflow-synchronized embedded field programmable processor array - Google Patents
Dataflow-synchronized embedded field programmable processor array
- Publication number
- EP1573573A2 (application number EP03775666A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- processor
- array
- cells
- paths
- cell
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/80—Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
- G06F15/8007—Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors single instruction multiple data [SIMD] multiprocessors
- G06F15/8023—Two dimensional arrays, e.g. mesh, torus
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/80—Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/80—Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
- G06F15/8007—Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors single instruction multiple data [SIMD] multiprocessors
Definitions
- The present invention relates to array processors embedded in integrated circuits, such as those implemented in a semiconducting material like silicon, and particularly to reconfigurable embedded array processors.
- An embedded system is some combination of hardware or software that is specifically designed for a particular purpose or application within an overall system, and may be fixed in capability or programmable.
- A mobile phone may, for example, have a power saving integrated circuit (IC) or "chip" operable only with its respective type of phone and devoted exclusively to controlling the display and other elements to conserve power.
- The same mobile phone typically includes a digital signal processing integrated circuit, which executes the functions on a digital portion of the radio.
- Programmable radios would be desirable.
- Digital radio processing functions can entail high data sample rates, along with high computational loads, that are typically impractical to implement on programmable hardware.
- Embedded field programmable gate arrays (EFPGAs) are "chip macros" that can be programmed in the field, as well as integrated in a silicon chip, and are available from a limited number of vendors. These special purpose processors operate at high speeds, minimize the amount of hardware required, and minimize software development programming time. Although EFPGAs offer "post silicon" reconfigurability, their design density is poor and their clock speed is unpredictable, particularly for high speed demodulation functions in digital radios.
- The present invention is directed to an embedded processor consisting of a two-dimensional array of processing cells and a mechanism for reconfigurably connecting paths between a signal processing circuit and respective cells on a periphery of the array. The processor performs mathematical operations under dataflow control, and is thereby easily integrated within a signal processing circuit operating under the same mode of control. According to this invention, the signal processing behavior of the integrated circuit may be reconfigured in the field.
- FIG. 1 depicts an example of a device having an embedded array processor in accordance with the present invention.
- FIG. 2 depicts an exemplary flow of processing in controlling the array processor of FIG. 1.
- FIG. 3 depicts an example of a mixed-signal system on a chip using an embedded array processor according to the present invention.
- FIG. 1 shows an exemplary embodiment of an apparatus in accordance with the present invention.
- A receiver 100, such as one in a broadcast or cable television receiver, local area network wireless receiver or mobile phone receiver, contains an IC 102.
- The IC 102 includes a system controller 104 and an embedded array processor 106.
- An array processor is a processor capable of executing instructions that operate on input that may consist of arrays.
- The embedded array processor 106 has a two-dimensional rectangular array 108 and a mechanism or interface 110, which is shown in FIG. 1 to surround the array 108 on all four edges.
- The two-dimensional array 108 is composed of processing cells 112.
- Inter-cell connection within the array 108 is such that each cell 112 is connected only to cells 112 whose column is the same and whose row is immediately adjacent, and only to cells 112 whose row is the same and whose column is immediately adjacent, to realize a "nearest neighbor" connection architecture, as shown in FIG. 2 of commonly owned U.S. Patent Publication No. 2003/0065904, filed October 1, 2001 (hereinafter the '904 application), the entire disclosure of which is incorporated herein by reference. Since inter-cell connection is purely nearest-neighbor, the array offers the flexibility of being scalable.
- The interface 110 has border cells 114 connected to each respective processing cell 112 on the periphery of the array 108, each border cell 114 having a buffer 116.
- The periphery preferably consists of those processing cells 112 which are located on the array edges, i.e., in at least one of the first row, last row, first column and last column. Since internal cell-to-cell array connection, under the nearest neighbor scheme, leaves two neighbors missing for each corner cell 112 and one neighbor missing for each other cell 112 on the array edges, the missing connections are each made to a corresponding border cell 114.
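The neighbor counts in the preceding paragraph can be checked with a short sketch. It assumes zero-based (row, column) indexing and an array of at least 2 x 2 cells; the names `missing_neighbors` and `border_links` are illustrative and do not appear in the application.

```python
def missing_neighbors(row, col, rows, cols):
    """Count the nearest-neighbor links that cell (row, col) lacks inside the array."""
    missing = 0
    if row == 0:            # no neighbor above
        missing += 1
    if row == rows - 1:     # no neighbor below
        missing += 1
    if col == 0:            # no neighbor to the left
        missing += 1
    if col == cols - 1:     # no neighbor to the right
        missing += 1
    return missing


def border_links(rows, cols):
    """Total number of missing links, each of which goes to a border cell."""
    return sum(
        missing_neighbors(r, c, rows, cols)
        for r in range(rows)
        for c in range(cols)
    )


# Hypothetical 4 x 4 array: corners lack two neighbors, other edge cells one.
corner = missing_neighbors(0, 0, 4, 4)
edge = missing_neighbors(0, 1, 4, 4)
interior = missing_neighbors(1, 1, 4, 4)
total = border_links(4, 4)
```

Under these assumptions the total number of border connections grows with the perimeter, 2 x (rows + columns), rather than with the cell count, which is consistent with the scalability claimed for the nearest-neighbor scheme.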
- FIG. 1 shows an information path 122 that includes an I/O pad 118, the crossbar network 120, and a border cell 114. Reconfiguring a path causes the path to traverse either a different border cell 114, a different I/O pad 118, or both.
- The path 124 is a reconfiguration of the path 122 to traverse a different border cell 114.
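As a rough illustration of path reconfiguration, the crossbar can be modeled as a table from I/O pads to border cells. This is only a toy model: the `Crossbar` class, its methods, and the pad and border cell labels are hypothetical, and the application does not describe the crossbar network at this level.

```python
class Crossbar:
    """Toy crossbar: routes each I/O pad to one border cell."""

    def __init__(self):
        self._routes = {}

    def connect(self, pad, border_cell):
        # Overwriting an existing entry models reconfiguring the path.
        self._routes[pad] = border_cell

    def path(self, pad):
        # A path is the triple (I/O pad, crossbar, border cell).
        return (pad, "crossbar", self._routes[pad])


xbar = Crossbar()
xbar.connect("pad_0", "border_north_0")
path_122 = xbar.path("pad_0")            # original path

xbar.connect("pad_0", "border_north_1")  # reroute to a different border cell
path_124 = xbar.path("pad_0")            # reconfigured path, same pad
```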
- The array processor 106 is a systolic processing array, a special-purpose system which can be likened to an assembly line for input operands, although operations typically proceed not in a strictly linear direction but in changing directions.
- Differing mathematical operations are performed on the data by different cells, while data proceeds in an orderly, lock-step progression from one cell to another.
- An example of a systolic array would be one that multiplies matrices. Entries of a row are multiplied by corresponding entries of a column, and the products are summed to produce an ordered column of sums. Efficiency is achieved by arranging operations to be performed in parallel, so that the results are produced in the fewest clock cycles.
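The matrix multiplication example can be simulated cycle by cycle. The sketch below assumes an output-stationary arrangement, in which cell (i, j) accumulates one entry of the product while rows of the first matrix move right and columns of the second move down, each skewed by one cycle per row or column. This arrangement is a common textbook choice and is not taken from the application or from the '904 application; `systolic_matmul` is an illustrative name.

```python
def systolic_matmul(a, b):
    """Multiply a (n x m) by b (m x p) on a simulated n x p systolic array."""
    n, m, p = len(a), len(b), len(b[0])
    acc = [[0] * p for _ in range(n)]    # accumulator held by each cell
    a_reg = [[0] * p for _ in range(n)]  # operand moving left to right
    b_reg = [[0] * p for _ in range(n)]  # operand moving top to bottom

    for t in range(n + m + p - 2):       # cycles needed to drain the array
        # Shift a-operands one cell right; row i is fed with a delay of i cycles.
        for i in range(n):
            for j in range(p - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
            k = t - i
            a_reg[i][0] = a[i][k] if 0 <= k < m else 0
        # Shift b-operands one cell down; column j is fed with a delay of j cycles.
        for j in range(p):
            for i in range(n - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
            k = t - j
            b_reg[0][j] = b[k][j] if 0 <= k < m else 0
        # Every cell multiplies and accumulates in lock step.
        for i in range(n):
            for j in range(p):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc


product = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

Because of the skew, the operands that meet in cell (i, j) at cycle t are a[i][k] and b[k][j] with k = t - i - j, so each cell sees exactly the pairs it needs and only ever talks to its nearest neighbors.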
- The '904 application provides another example of a systolic processing array, implementing a 32-tap real finite impulse response (FIR) filter.
- The filter is enhanced by concatenating other levels, two-dimensional and otherwise, to the original two-dimensional array, border cells being connected to processing cells on the periphery of each level.
- Such an enhanced array, connected by the border cells 114, is also within the intended scope of the present invention.
- The border cells 114 not only provide input to the array 108 but also provide results of array processing to the I/O pads 118. The border cells 114 receive these results by neighbor-to-neighbor conveyance from the processing cells 112 producing the results. Optionally, the border cell 114 may validate the results and output a data valid signal to the external process.
- The IC 102 includes a memory from which array programs are downloaded by means of a bus to corresponding processing cells 112.
- The memory is preferably a random access memory (RAM) or other writeable storage device so that updated array programs can be provided, as by an array generator external to the receiver 100.
- The system controller 104 passes array programs to a master cell 126 of the embedded array processor 106 over a configuration bus such as the random access configuration bus shown in FIG. 16 of the '904 application.
- The master cell 126 forwards the array programs to the appropriate processing cells 112 (step 202) at system initialization or upon reconfiguration, e.g., implementation of a new algorithm for the processing array 106 (step 204). Due to the parallelism inherent in systolic processing, some of the processing cells 112 may receive identical programs. In an alternative implementation, the system controller 104 and RAM may instead reside within the embedded array processor 106.
- Further depicted in FIG. 2 is an exemplary dataflow into the array 108.
- When a new operand is received on an I/O pad 118, it continues flowing over a path that the crossbar network 120 directs to a corresponding border cell 114 (step 206), which checks the operand for validity (step 208). If the operand is invalid, error processing ensues (step 212), which may involve notifying a user of the receiver 100, and a new operand is requested from the IC application using the embedded array processor 106 (step 216).
- Forward error correction techniques may be applied to rectify the faulty operand.
- Validation may be performed further upstream, before buffering by the border cell 114.
- In the embodiment shown, a valid operand is added to the buffer 116 (step 214) and a counter (not shown) is incremented (step 216).
- The border cell 114 is implemented to stall the processor providing the new operand when the buffer 116 is full, as by issuing a stall instruction that is routed over the corresponding I/O pad 128 to that processor.
- A resume instruction is subsequently issued to the processor when an operand is de-buffered.
- Enough buffer space may be provided at the outset to ensure that the inflow of new operands is accommodated.
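The stall and resume behavior can be sketched as a bounded FIFO that records the instructions it would send back to the producing processor. The `OperandBuffer` class, the `signals` list, and the string instruction names are assumptions made for illustration; the application does not specify how the instructions are encoded.

```python
from collections import deque


class OperandBuffer:
    """Bounded FIFO that stalls its producer when full and resumes it on de-buffering."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()
        self.signals = []  # instructions that would be routed back over the I/O pad

    def push(self, operand):
        if len(self.items) >= self.capacity:
            raise OverflowError("producer wrote while stalled")
        self.items.append(operand)
        if len(self.items) == self.capacity:
            self.signals.append("stall")   # buffer just became full

    def pop(self):
        was_full = len(self.items) == self.capacity
        operand = self.items.popleft()
        if was_full:
            self.signals.append("resume")  # space is available again
        return operand


buf = OperandBuffer(capacity=2)
buf.push(10)
buf.push(20)           # fills the buffer, so a stall is issued
first = buf.pop()      # frees a slot, so a resume is issued
history = list(buf.signals)
```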
- A parameter corresponding to a predetermined number of input operands is compared to the buffer count.
- The parameters may vary among border cells 114 and are preferably programmable.
- The buffers, e.g., ring or circular buffers, are preferably implemented in software. Alternatively, simple first-in/first-out (FIFO) buffers may be employed.
- If the buffer count is not less than the parameter, a trigger is actuated, e.g., the border cell 114 signals the master cell 126 (step 220). If the buffer count is instead less than the parameter, control returns to the top of the loop (step 206), and a new operand is awaited.
- The counter is decremented (step 224).
- The master cell 126 has the additional role of directing array operations based on the inflow of operands.
- The master cell 126 checks whether it has received triggers from all active border cells 114, i.e., the border cells immediately adjacent to those of the needed processing cells on the array periphery (step 228). Once all of the triggers have been received, the operands are read from the buffers, the new operation or stage is commenced, and the triggers are reset (step 230).
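The trigger logic can be summarized in a minimal sketch, assuming that each border cell derives its trigger from its buffer count and that reading the operands implicitly resets the trigger. The class names, the `threshold` attribute (standing in for the programmable parameter), and the `try_fire` method are hypothetical.

```python
class BorderCell:
    """Buffers operands and raises a trigger once enough have arrived."""

    def __init__(self, threshold):
        self.threshold = threshold  # programmable per border cell
        self.buffer = []

    def receive(self, operand):
        self.buffer.append(operand)  # buffer the operand; the count is len(buffer)

    @property
    def triggered(self):
        return len(self.buffer) >= self.threshold

    def drain(self):
        # De-buffer exactly the operands the next operation needs.
        taken = self.buffer[:self.threshold]
        self.buffer = self.buffer[self.threshold:]
        return taken


class MasterCell:
    """Starts an array operation only when every active border cell has triggered."""

    def __init__(self, active_cells):
        self.active_cells = active_cells  # mapping: name -> BorderCell

    def try_fire(self, operation):
        if not all(cell.triggered for cell in self.active_cells.values()):
            return None  # keep waiting for operands
        operands = {name: cell.drain() for name, cell in self.active_cells.items()}
        return operation(operands)


west, north = BorderCell(threshold=2), BorderCell(threshold=1)
master = MasterCell({"west": west, "north": north})


def scale_sum(ops):
    return sum(ops["west"]) * ops["north"][0]


west.receive(3)
north.receive(10)
early = master.try_fire(scale_sum)   # west still needs one more operand
west.receive(4)
result = master.try_fire(scale_sum)  # all triggers present, so the operation runs
```

The timing of `scale_sum` is set entirely by operand arrival, not by a program counter, which is the sense in which the array is dataflow-synchronized.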
- The array processor 106 performs mathematical operations whose timing is based on a flow of input operands along the paths providing the operands to the array 108.
- The parameter for step 218 is set to zero.
- A Kahn process network is therefore implemented.
- The processors are interconnected by channels having first-in/first-out (FIFO) buffers.
- A processor can either send data to a FIFO channel or receive data from a FIFO channel. If a processor requests a read and no data is available, the processor stalls until the data is available.
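A Kahn process network of this kind can be imitated in software with threads and unbounded queues, where a blocking `get` plays the role of the stalled read. The two processes below, the `None` end-of-stream marker, and the function names are assumptions of this sketch rather than anything described in the application.

```python
import queue
import threading


def producer(out_ch, values):
    """Process that only writes to its output channel."""
    for value in values:
        out_ch.put(value)   # writes never block on an unbounded FIFO
    out_ch.put(None)        # end-of-stream marker (assumption of this sketch)


def scaler(in_ch, out_ch, gain):
    """Process that reads one channel, scales each token, and writes another."""
    while True:
        value = in_ch.get()  # stalls here until a token is available
        if value is None:
            out_ch.put(None)
            return
        out_ch.put(gain * value)


def run_network(values, gain):
    a, b = queue.Queue(), queue.Queue()  # FIFO channels between processes
    threads = [
        threading.Thread(target=producer, args=(a, values)),
        threading.Thread(target=scaler, args=(a, b, gain)),
    ]
    for thread in threads:
        thread.start()
    results = []
    while True:
        value = b.get()
        if value is None:
            break
        results.append(value)
    for thread in threads:
        thread.join()
    return results


scaled = run_network([1, 2, 3], gain=2)
```

Since each process blocks only on reads and each channel has a single writer and a single reader, the output does not depend on how the threads happen to be scheduled.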
- Step 216 can be retained to detect when the buffer 116 is full, at which point a stall instruction as described above is preferably issued to the processor providing the input operands. If step 216 is retained, the counter decrementing process (steps 222, 224) for the border cells would be retained as well, and a resume instruction would issue when an operand is de-buffered.
- Array programs may be prepared using a graphical user interface (GUI) that can edit and show the code to be downloaded to RAM on the IC 102 and then to each processing cell 112.
- The embedded array processor 106 is particularly useful for integration, in a manner similar to that of embedding an FPGA within a system on chip (SoC).
- The border cell-based interface 110 affords simple integration and a simple software programming flow in place of the proprietary hardware design flow characteristic of EFPGAs.
- The embedded array processor 106 may be integrated with a general system on a chip 102 that includes a digital circuit 302 and possibly an analog circuit 304, in order to introduce reconfigurability within the system.
- The digital circuit 302 may be composed of fixed-design digital circuit modules 306.
- One of the modules 306 may act as the system controller 104.
- The modules 306 have pins interconnected by routing switches 308, which normally connect the output of one digital circuit module 306 to the input of another.
- The routing switches 308 are also capable of replacing the connection between two modules 306 with an alternative input and output connector pair 310 to switch connection from one or both of the two modules 306 to a respective pin 128 of the embedded array processor 106.
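The effect of a routing switch on the signal chain can be pictured with plain function composition. In this sketch each fixed module and the array processor function are modeled as ordinary Python callables; `build_chain` and the three example functions are hypothetical stand-ins, not part of the described hardware.

```python
def build_chain(module_a, module_b, array_function=None):
    """Return the signal chain selected by the routing switch setting."""
    if array_function is None:
        # Normal setting: the output of module A feeds module B directly.
        return lambda sample: module_b(module_a(sample))
    # Diverted setting: the connector pair routes A's output through the
    # embedded array processor before it reaches module B.
    return lambda sample: module_b(array_function(module_a(sample)))


def double(x):
    return 2 * x


def add_one(x):
    return x + 1


def negate(x):
    return -x


direct = build_chain(double, add_one)
diverted = build_chain(double, add_one, array_function=negate)

direct_out = direct(3)      # add_one(double(3))
diverted_out = diverted(3)  # add_one(negate(double(3)))
```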
- The digital circuit 302 may also be integrated with the analog circuit 304 using one or more analog-to-digital converters 314 to convert the analog signals output by the analog circuit 304 to digital signals to be routed to the digital circuit modules 306.
- Digital circuit outputs to the analog circuit 304 may be converted from digital samples to analog signals by a digital-to-analog converter 316.
- A routing switch 318 may also be placed between the converter 314 and the digital circuit 302 in order to afford switchable connection from and to the processor 106.
- The input/output connector pair 320 affords switching between a signal pathway from the analog circuit to the digital circuit and a signal pathway to or from the one or more input/output pads.
- A routing switch 322 may be placed between the digital-to-analog converter 316 and the digital circuit 302.
- The routing switches 308, 318, 322, in combination with the reconfigurable interface 110 of the processor 106, provide the digital and analog circuits 302, 304 with one or more dataflow-driven signal processing functions programmed into the array processor, and insert such functions into the chain of the digital circuit.
- The processor array 106 may interface with a plurality of inhomogeneous parallel processing elements on a chip.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- Theoretical Computer Science (AREA)
- Computing Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Logic Circuits (AREA)
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US43297102P | 2002-12-12 | 2002-12-12 | |
US432971P | 2002-12-12 | ||
US47516603P | 2003-06-02 | 2003-06-02 | |
US475166P | 2003-06-02 | ||
PCT/IB2003/005623 WO2004053716A2 (en) | 2002-12-12 | 2003-11-28 | Dataflow-synchronized embedded field programmable processor array |
Publications (1)
Publication Number | Publication Date |
---|---|
EP1573573A2 (en) | 2005-09-14 |
Family
ID=32511684
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP03775666A (Withdrawn; published as EP1573573A2 (en)) | 2002-12-12 | 2003-11-28 | Dataflow-synchronized embedded field programmable processor array |
Country Status (5)
Country | Link |
---|---|
EP (1) | EP1573573A2 (ko) |
JP (1) | JP2006510128A (ko) |
KR (1) | KR20050091715A (ko) |
AU (1) | AU2003283685A1 (ko) |
WO (1) | WO2004053716A2 (ko) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE102004044976A1 (de) * | 2004-09-16 | 2006-03-30 | Siemens Ag | Computer device with reconfigurable architecture |
CN112738777B (zh) * | 2020-12-24 | 2022-04-08 | 山东高云半导体科技有限公司 | Near-field communication device and method, readable storage medium, and processor |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5193202A (en) * | 1990-05-29 | 1993-03-09 | Wavetracer, Inc. | Processor array with relocated operand physical address generator capable of data transfer to distant physical processor for each virtual processor while simulating dimensionally larger array processor |
US5457644A (en) * | 1993-08-20 | 1995-10-10 | Actel Corporation | Field programmable digital signal processing array integrated circuit |
US5892962A (en) * | 1996-11-12 | 1999-04-06 | Lucent Technologies Inc. | FPGA-based processor |
US5915123A (en) * | 1997-10-31 | 1999-06-22 | Silicon Spice | Method and apparatus for controlling configuration memory contexts of processing elements in a network of multiple context processing elements |
DE10081643D2 (de) * | 1999-06-10 | 2002-05-29 | Pact Inf Tech Gmbh | Sequence partitioning on cell structures |
US20020133688A1 (en) * | 2001-01-29 | 2002-09-19 | Ming-Hau Lee | SIMD/MIMD processing on a reconfigurable array |
-
2003
- 2003-11-28 EP EP03775666A patent/EP1573573A2/en not_active Withdrawn
- 2003-11-28 AU AU2003283685A patent/AU2003283685A1/en not_active Abandoned
- 2003-11-28 KR KR1020057010653A patent/KR20050091715A/ko not_active Application Discontinuation
- 2003-11-28 JP JP2005502339A patent/JP2006510128A/ja active Pending
- 2003-11-28 WO PCT/IB2003/005623 patent/WO2004053716A2/en active Application Filing
Non-Patent Citations (1)
Title |
---|
See references of WO2004053716A2 * |
Also Published As
Publication number | Publication date |
---|---|
JP2006510128A (ja) | 2006-03-23 |
AU2003283685A1 (en) | 2004-06-30 |
WO2004053716A3 (en) | 2005-03-17 |
WO2004053716A2 (en) | 2004-06-24 |
KR20050091715A (ko) | 2005-09-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6920545B2 (en) | Reconfigurable processor with alternately interconnected arithmetic and memory nodes of crossbar switched cluster | |
US6745317B1 (en) | Three level direct communication connections between neighboring multiple context processing elements | |
JP4241045B2 (ja) | Processor architecture | |
US7765382B2 (en) | Propagating reconfiguration command over asynchronous self-synchronous global and inter-cluster local buses coupling wrappers of clusters of processing module matrix | |
US7996652B2 (en) | Processor architecture with switch matrices for transferring data along buses | |
US7188192B2 (en) | Controlling multiple context processing elements based on transmitted message containing configuration data, address mask, and destination indentification | |
US7266672B2 (en) | Method and apparatus for retiming in a network of multiple context processing elements | |
US8107311B2 (en) | Software programmable multiple function integrated circuit module | |
EP0976059B1 (en) | A field programmable processor | |
US7856246B2 (en) | Multi-cell data processor | |
JPH04233326A (ja) | Configurable interconnect structure | |
US9564902B2 (en) | Dynamically configurable and re-configurable data path | |
US20090282213A1 (en) | Semiconductor integrated circuit | |
JP2005508532A5 (ko) | ||
US7961004B2 (en) | FPGA having a direct routing structure | |
US20030025132A1 (en) | Inputs and outputs for embedded field programmable gate array cores in application specific integrated circuits | |
EP1573573A2 (en) | Dataflow-synchronized embedded field programmable processor array | |
US20060075213A1 (en) | Modular integration of an array processor within a system on chip | |
US20130007411A1 (en) | Configurable Allocation of Hardware Resources | |
CN1726485A (zh) | Dataflow-synchronized embedded field programmable processor array |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
 | AK | Designated contracting states | Kind code of ref document: A2; Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LI LU MC NL PT RO SE SI SK TR |
 | AX | Request for extension of the european patent | Extension state: AL LT LV MK |
 | 17P | Request for examination filed | Effective date: 20050919 |
 | RBV | Designated contracting states (corrected) | Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LI LU MC NL PT RO SE SI SK TR |
 | DAX | Request for extension of the european patent (deleted) | |
 | RAP1 | Party data changed (applicant data changed or rights of an application transferred) | Owner name: NXP B.V. |
 | 17Q | First examination report despatched | Effective date: 20070904 |
 | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
 | 18D | Application deemed to be withdrawn | Effective date: 20100601 |