WO2008108005A1 - A data transfer network and control apparatus for a system with an array of processing elements each either self- or common controlled
- Publication number
- WO2008108005A1 (PCT/JP2007/054756)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- access
- processing elements
- processing
- network
- processing element
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/14—Handling requests for interconnection or transfer
- G06F13/16—Handling requests for interconnection or transfer for access to memory bus
- G06F13/1605—Handling requests for interconnection or transfer for access to memory bus based on arbitration
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/80—Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
- G06F15/8007—Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors single instruction multiple data [SIMD] multiprocessors
Abstract
A processor of SIMD/MIMD dual mode architecture comprises common controlled first processing elements, self-controlled second processing elements, and a pipelined (ring) network connecting the first PEs and the second PEs sequentially. An access controller has access control lines, each connected to a respective PE of the first and second PEs, to control the data access timing between each PE and the network. Since each PE can be either self-controlled or common controlled, as in dual mode SIMD/MIMD architectures, the wiring area requirement is reduced.
Description
DESCRIPTION
A DATA TRANSFER NETWORK AND CONTROL APPARATUS FOR A
SYSTEM WITH AN ARRAY OF PROCESSING ELEMENTS EACH EITHER SELF- OR COMMON CONTROLLED
FIELD OF THE INVENTION
The present invention relates to an architecture design for data transfer bandwidth cost reduction, particularly to measures for wiring area reduction, and to a control method for architectures with processing elements, each of which can be either self-controlled or common controlled, to reach a design optimized with regard to area requirements while offering maximal dual mode processor flexibility.

BACKGROUND OF THE INVENTION

By now, many processors operating in the single instruction, multiple data stream (SIMD) style or the multiple instruction stream, multiple data stream (MIMD) style have been proposed. While processors of the first mentioned style, whose architecture has been disclosed in Reference 1, are mostly used for processing computationally expensive, data independent low-level tasks with regular data and control flow, or medium-level tasks with regular data access but irregular data and control flow, processors of the second mentioned style, whose architecture has been disclosed in Reference 2, work on irregular input data with irregular data and control flow. This results in the problem that the SIMD processors waste unoccupiable processing elements (PEs) in tasks with irregular input data, while the MIMD processors waste unoccupiable logic in tasks with regular input data.
Many upcoming algorithms, for example H.264, are made up of a number of sub-algorithms which partly follow the SIMD control style and partly the MIMD control style. Therefore, new dual mode SIMD/MIMD architectures have been proposed, mainly starting from an MIMD approach and attaching an additional crossbar to enable SIMD functionality; examples are References 3 to 6. Other approaches provide a fixed ratio of SIMD and MIMD processing power, either by adding a fixed number each of SIMD units and MIMD units, as in Reference 7, or by adding, to an array of processing elements without memory control ability, a number of so-called user computers with memory control capability, as suggested in Reference 8.
The references are listed below.
[Reference 1] R. A. Stokes et al., "Parallel operating array computer", U.S. Pat. No. 3,537,074, Oct. 27, 1970

[Reference 2] A. Rosman, "MIMD instruction flow computer architecture", U.S. Pat. No. 4,837,676, June 6, 1989

[Reference 3] R. J. Gove et al., "Multi-processor reconfigurable in single instruction multiple data (SIMD) and multiple instruction multiple data (MIMD) modes and method of operation", U.S. Pat. No. 5,212,777, May 18, 1993

[Reference 4] N. K. Ing-Simmons et al., "Dual mode SIMD/MIMD processor providing reuse of MIMD instruction memories as data memories when operating in SIMD mode", U.S. Pat. No. 5,239,654, Aug. 24, 1993

[Reference 5] R. J. Gove et al., "Reconfigurable multi-processor operating in SIMD mode with one processor fetching instructions for use by remaining processors", U.S. Pat. No. 5,522,083, May 28, 1996

[Reference 6] J. A. Sgro et al., "Scalable multi-processor architecture for SIMD and MIMD operations", U.S. Pat. No. 5,903,771, May 11, 1999

[Reference 7] T. Kan, "Parallel data processing system combining a SIMD unit with a MIMD unit and sharing a common bus, memory, and system controller", U.S. Pat. No. 5,355,508, Oct. 11, 1994

[Reference 8] J. H. Jackson et al., "MIMD arrangement of SIMD machines", U.S. Pat. No. 6,487,651, Nov. 26, 2002

SUMMARY OF THE DISCLOSURE
The disclosures of the above references are incorporated herein by reference thereto and should be referred to as needed.
According to the analysis based on the present invention, all those approaches have in common that they need a complex crossbar for data transfer between the external memory and processing elements with internal memory, which results in large wiring area requirements for the data bus in architectures with processing elements, each of which can be either self-controlled or common controlled, such as dual mode SIMD/MIMD architectures. Thus there is much desired in the art.

Accordingly, it is an object of the present invention to provide a novel processor or processing system having a plurality of processing elements in which the wiring area needed for connecting the PEs can be reduced.
It is another object of the present invention to provide a novel solution for a processor or processing system with an array of PEs, each either self- or common controlled.

It is a further object of the present invention to improve a processor or processing system having an array of PEs, each of which can be either self- or common controlled, with respect to the wiring area requirement for the data bus.

Other objects of the present invention will become apparent from the entire disclosure.
According to the present invention, the reduction is achieved, in general, by using a pipelined bus system, preferably formed as a ring, connecting all processing elements and a global data transfer control unit sequentially.
Specifically, the present invention provides various aspects.
According to a first aspect of the present invention, there is provided a processor comprising: first processing elements that execute the same program of a common controller; second processing elements that each execute their own program independently from the other processing elements' programs; and a pipelined network connecting the first processing elements and the second processing elements sequentially.
In a second aspect, the processor further comprises an access controller with access control lines, each access control line being connected to a respective processing element of the first and second processing elements to control the data access timing between each processing element and the network.
In a third aspect, the data access timings of the first processing elements differ in kind from the data access timings of the second processing elements.
In a fourth aspect, the data access for the first processing elements is a concurrent access, so that each of the first processing elements accesses the network in the same timing slot, and the data access for the second processing elements is a standalone access, so that each of the second processing elements accesses the network independently.
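The distinction between concurrent access (all common controlled IMEMs addressed in one slot) and standalone access (one self-controlled IMEM at a time) can be modeled with a minimal Python sketch. This is an illustration of the text above, not the patent's implementation; all class and field names are assumptions.

```python
# Hedged model of the fourth aspect: a write from the arbiter reaches exactly
# one IMEM for a self-controlled (standalone) access, but all common
# controlled IMEMs at once for a concurrent access.

SELF_CONTROLLED, COMMON_CONTROLLED = "self", "common"

class IMEMSelector:
    def __init__(self, modes):
        self.modes = modes                    # per-PE operation mode
        self.imems = [dict() for _ in modes]  # per-PE internal memory (IMEM)

    def write(self, addr, data, target_pe=None):
        if target_pe is not None:
            # Standalone access: only the addressed self-controlled PE.
            assert self.modes[target_pe] == SELF_CONTROLLED
            self.imems[target_pe][addr] = data
        else:
            # Concurrent access: all common controlled IMEMs together.
            for pe, mode in enumerate(self.modes):
                if mode == COMMON_CONTROLLED:
                    self.imems[pe][addr] = data

sel = IMEMSelector([COMMON_CONTROLLED, SELF_CONTROLLED] * 2)
sel.write(0x10, 0xAB)               # concurrent: hits IMEMs 0 and 2
sel.write(0x10, 0xCD, target_pe=1)  # standalone: hits IMEM 1 only
print([m.get(0x10) for m in sel.imems])  # → [171, 205, 171, None]
```

The same selector idea reappears in the preferred modes (Figure 3), where the arbiter chooses per mode which IMEM control lines are driven.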
In a fifth aspect, the access controller controls the network so as to achieve an increased (high) efficiency in the use of the network.

In a sixth aspect, the access controller controls the network so as to hold a specified bus access waiting time for one processing element.
In a seventh aspect, the access controller assigns higher priority to one standalone access than to the concurrent access or to the other standalone accesses when assigning the data access timing to each processing element.
In an eighth aspect, the access controller assigns higher priority to the concurrent access than to the standalone access when assigning the data access timing to each processing element.

In a ninth aspect, the access controller controls the network so as to hold a specified bus access waiting time for each processing element.
In a tenth aspect, the access controller decides the data access timing of each processing element to minimize the necessary duration to send the required data.
In an eleventh aspect, the processing elements are configurable as either first processing elements or second processing elements, and the access controller decides which processing element is used as a first processing element and which is used as a second processing element.
In a twelfth aspect, the processor further comprises an arbitration unit that arbitrates the data transfer demands of the first and second processing elements.
In a thirteenth aspect, the first processing element comprises a SIMD architecture processing element; and the second processing element comprises a MIMD architecture processing element.
In a fourteenth aspect, the pipelined network is a pipelined ring network.
In a fifteenth aspect, there is provided an access controller having access control lines, wherein each access control line is connected to a respective one of the processing elements; the processing elements comprise first processing elements that execute the same program of the access controller, and second processing elements that each execute their own program independently from the other processing elements' programs; the first and second processing elements are sequentially connected by a pipelined network; and the access controller controls the data access timing between each processing element and the pipelined network.
In still further aspects, the access controller may be formulated in association with any one of the processors mentioned herein as the preceding aspects.
In a further aspect, there is provided a processing method comprising: executing the same program of a common controller by first processing elements; executing their own programs independently from the other processing elements' programs by second processing elements; and connecting the first processing elements and the second processing elements sequentially through a pipelined network.
In a still further aspect, there is provided a method of controlling an access controller having access control lines, comprising: providing each access control line connected to a respective processing element; controlling the processing elements by the steps of executing the same program of the access controller by first processing elements, and executing their own programs independently from the other processing elements' programs by second processing elements; sequentially connecting the first and second processing elements with a pipelined network; and controlling the data access timing between each processing element and the pipelined network by the access controller.
The meritorious effects of the present invention are summarized as follows.
Two positive effects are achieved on the chip area requirements.
First, the number of connections of all data signals to the global data transfer control unit is reduced to 1/number_of_processing_elements, which results in a smaller wiring area requirement around this unit. Furthermore, the wiring length of these data signals can be reduced, so that special driver cells, which would otherwise be necessary to prevent critical path length problems, can be eliminated.
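The claimed 1/number_of_processing_elements reduction follows from simple counting: a crossbar-style hookup terminates one full-width data port per PE at the control unit, while the pipelined ring exposes only a single ring segment there. A minimal sketch (bus width and PE count are illustrative values, not from the patent):

```python
# Comparing data-signal connection counts at the global data transfer
# control unit: dedicated per-PE data paths versus one pipelined ring port.

def crossbar_connections(num_pes: int, bus_width: int) -> int:
    # One bus_width-wide data port terminated per processing element.
    return num_pes * bus_width

def ring_bus_connections(bus_width: int) -> int:
    # On the ring, the control unit sees only its own segment,
    # independent of the number of PEs.
    return bus_width

pes, width = 8, 32
print(crossbar_connections(pes, width))  # → 256
print(ring_bus_connections(width))       # → 32, i.e. 1/8 = 1/num_PEs
```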
BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a schematic illustration of an 8 PE architecture, in which each PE can be either self-controlled or common controlled, with a pipelined data transfer network connected to a GCU.
Figure 2 is a schematic illustration of an example arbiter for external memory access.

Figure 3 is a schematic illustration showing the way of using the signal selection logic for controlled access to PE internal memory (IMEM) units.
Figure 4 is a schematic illustration of the data and control signal transfer network in an example dual mode SIMD/MIMD architecture.

Figure 5 is a timing chart of a data transfer from the arbiter to processing element PE2, which is working in MIMD mode.
Figure 6 is a timing chart of a data transfer from the arbiter to all PEs working in SIMD mode.
EXPLANATION OF NUMERALS

101: Global Control Unit (GCU), including instruction cache and data cache, serving as a common controller for common controlled processing elements (PEs)
102: Self-controlled PE, using its own control unit
103: Common controlled PE, using the common control unit in the GCU
104: Pipelined ring bus with registers (R) for data signal transfer
201: Arbiter, composed of main arbitration unit, pre-arbitration unit and access controller
202: Main arbitration unit inside the arbiter
203: Pre-arbitration unit for self-controlled PE requests, formed of request selection and request parameter decimation logic
204: Access controller controlling access timing between PEs and network
205: GCU, including instruction cache, data cache and a PE IMEM data transfer control unit for common controlled PEs
301: Selector to control the control signals which are transferred to self-controlled and common controlled PEs
302: PE IMEM
303: Control lines
401: PE with IMEM and ring bus register R
402: GCU with external memory arbiter for data signal transfer control
403: Access control lines to control access timing between PE and network
404: Unidirectional pipelined ring bus for data signal transfer

PREFERRED MODES OF THE INVENTION
Figure 1 shows an example architecture implementation with a global control unit 101, an array of 8 processing elements, each of which can be either self-controlled 102 or common controlled 103, and the pipelined bus 104 as the data transfer network connecting the GCU and all PEs sequentially. While for each PE the operation mode can be chosen freely by a mode decision unit, in this example all odd PEs are common controlled and all even PEs are self-controlled.

To enable the data transfer from the PEs to an external memory in such an architecture, an external memory arbiter 201 has to be added. In this arbiter, the requests from the GCU for the common controlled PEs and the requests from all self-controlled PEs are handled as shown in Figure 2. This handling can be done in different ways: first, by giving priority to the global controller request if the common controlled PEs are working on tasks which are more urgent to continue, or second, by giving priority to a single self-controlled PE if this self-controlled PE is working on a task which is more urgent to continue. Other possibilities are to assign priority in such a way that the necessary duration to send the data is minimized, or, as used here in this example for equal priority request sources, by giving access to the unit which has had no access for the longest time.
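The "longest time without access" policy for equal priority sources can be sketched as follows. This is a behavioral illustration only; the class and method names are assumptions, not taken from the patent.

```python
# Hedged sketch of the equal-priority policy described above: among all
# currently requesting sources, grant the one whose last access lies
# furthest in the past.

class LongestWaitingArbiter:
    def __init__(self, num_sources: int):
        # last_grant_time[i] records when source i last received access;
        # -1 means "never granted", i.e. it has waited the longest.
        self.last_grant_time = [-1] * num_sources
        self.clock = 0

    def grant(self, requests: list) -> int:
        # Returns the index of the granted source, or -1 if nobody requests.
        self.clock += 1
        candidates = [i for i, req in enumerate(requests) if req]
        if not candidates:
            return -1
        winner = min(candidates, key=lambda i: self.last_grant_time[i])
        self.last_grant_time[winner] = self.clock
        return winner

arb = LongestWaitingArbiter(4)
print(arb.grant([True, True, False, True]))  # → 0 (all tie; lowest index)
print(arb.grant([True, True, False, True]))  # → 1 (source 0 was just served)
print(arb.grant([True, True, False, True]))  # → 3
```

With the same request pattern repeated, the grant rotates among the requesters, which is the fairness property the text aims at.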
Overall in this example, four different request types have to be arbitrated by the main arbitration unit 202: three from the GCU 205, namely the instruction cache, the data cache and the PE IMEM data transfer control for common controlled PEs, and one from the self-controlled PEs, the PE's IMEM request. Because the last request can arrive from every self-controlled PE, a selection of the next accepted active self-controlled PE is first made in a pre-arbitration logic 203 formed of a request selection logic and a parameter decimation logic, as done in this implementation example for self-controlled PEs with equal priority. The request selection logic constructs, in a first step, a request tree from the leaf nodes to the root, with the PE requests forming the leaf node information, and performs an "OR"-operation to obtain each parent node's information from its children. In a second step, the longest inactive and now requesting PE is found by going through the tree from the root to the leaf nodes, with an additional update of each parent's "last child taken" information. After this, the parameters of the active PE are taken and passed to the main arbitration unit inside the arbiter by a request parameter decimation logic, which can be constructed, as in the example implementation, from a simple "OR"-gate if the information about the currently active PE is sent to all PEs and the currently inactive PEs disable their request parameters by sending zero vectors. The decision whether a PE is running in self-controlled mode or common controlled mode is made inside the access controller 204 of the arbiter. This assignment (allocation) can be changed at run-time to achieve a high efficiency of the network.
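The two-step request selection (an OR request tree built from the leaves to the root, then a root-to-leaf descent steered by "last child taken" flags) can be sketched in software as a recursive binary-tree arbiter. This is one plausible reading of the text, not the patent's actual logic; a hardware version would of course be combinational.

```python
# Sketch of the pre-arbitration request selection logic: OR-reduction from
# leaf PE requests upward, then a descent that prefers the child NOT taken
# last, updating the "last child taken" state on the way down.

def select_pe(requests, last_taken, node=1, lo=0, hi=None):
    # requests: list of booleans (one per PE), length a power of two.
    # last_taken: dict mapping internal tree node -> 0 (left) / 1 (right),
    # mutated during the descent to provide round-robin fairness.
    if hi is None:
        hi = len(requests)
    if hi - lo == 1:
        return lo if requests[lo] else None
    mid = (lo + hi) // 2
    left_active = any(requests[lo:mid])    # OR-reduction of left subtree
    right_active = any(requests[mid:hi])   # OR-reduction of right subtree
    prefer_right = last_taken.get(node, 1) == 0
    if (prefer_right and right_active) or not left_active:
        if not right_active:
            return None                    # nobody requesting in this subtree
        last_taken[node] = 1
        return select_pe(requests, last_taken, 2 * node + 1, mid, hi)
    last_taken[node] = 0
    return select_pe(requests, last_taken, 2 * node, lo, mid)

state = {}
reqs = [True, False, True, True]   # PE0, PE2 and PE3 are requesting
print(select_pe(reqs, state))  # → 0 (leftmost requester on first arbitration)
print(select_pe(reqs, state))  # → 2 (the "last child taken" flags rotate)
```

Repeated calls with a persistent `state` rotate the selection among requesting PEs, matching the "longest inactive" intent for equal-priority requests.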
By selecting, depending on the mode, the correct control lines 303 for the PE IMEM units 302 over a selector (unit) 301 inside the arbiter, different access types for self-controlled and common controlled PEs can be executed (Figure 3). While for a data transfer to a self-controlled PE only one IMEM is accessed at a time, all common controlled PE IMEM units are accessed simultaneously in the case of a data transfer to common controlled PEs. An implementation example of the architecture, with PEs 401 that can each be either self-controlled to enable MIMD-type or common controlled to enable SIMD-type processing, including IMEM and ring bus register R, and a GCU with external memory arbiter 402, is the dual mode SIMD/MIMD architecture shown in Figure 4. However, without limiting the concept, MISD (multiple instruction, single data) controlled PEs could have been chosen instead of MIMD controlled PEs. The transfer network in this example architecture is constructed such that the control lines 403 for address and control signals between the PEs and the GCU with external memory arbiter are non-pipelined, directly connected signals, while the data signals are transferred over a unidirectional pipelined ring bus system 404, which results, first, in a smaller wiring area requirement and, second, in a reduced critical path length. Furthermore, such a bus system enables data transfer from the arbiter to a PE IMEM as well as from a PE IMEM to the arbiter without bidirectional network problems.

EXAMPLES
In a system as shown in Figure 4, different transfer methods can be provided for the different PE control styles. For a request REQ from a PE operating in MIMD mode, after the selection of the currently active PE has been made inside the arbiter, a data transfer is initiated and controlled by the arbiter, as shown in the timing block diagram in Figure 5 for an example read transfer of the three bytes D0, D1 and D2 from the arbiter to the self-controlled processing element PE2. The data is transferred over the registers of the pipelined ring bus PEnR and then stored inside the desired PE by setting the signals PE select (SEL), PE number (NO), register shift (SFT), data load (LD) and data store (ST) correctly inside the arbiter and transferring those signals directly to the PEs to set the paths there in the desired way.
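The MIMD-mode transfer can be sketched behaviorally. This is an illustrative model of the Figure 5 scenario, not the patent's hardware; cycle counts and names are assumptions:

```python
# Behavioral sketch (illustrative, not RTL) of a MIMD-mode read transfer:
# three words are shifted from the arbiter over the pipelined ring bus
# registers (PEnR) and loaded only into the selected PE's IMEM.

N_PE = 8
DEST = 2                               # selected PE (signals SEL / NO)
ring = [None] * N_PE                   # one pipeline register per PE
imem = [[] for _ in range(N_PE)]       # per-PE internal memories

def shift(inject=None):
    """One clock cycle: register shift (SFT) plus injection by the arbiter."""
    ring[:] = [inject] + ring[:-1]

words = ["D0", "D1", "D2"]
for t in range(len(words) + DEST):     # fill the pipeline, then drain to PE2
    shift(words[t] if t < len(words) else None)
    if ring[DEST] is not None:
        # LD asserted for the destination PE only; all other PEs keep
        # their IMEM paths closed, so nothing is stored elsewhere.
        imem[DEST].append(ring[DEST])

assert imem[DEST] == ["D0", "D1", "D2"]
```

The total transfer latency in this model is the word count plus the destination's pipeline distance from the arbiter, which matches the general behavior of a unidirectional pipelined ring.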
In contrast to a data transfer request from a PE operating in MIMD mode, where for a read request data is sent to only one PE IMEM at a time, in SIMD mode the IMEMs of all PEs operating in SIMD mode (active) are filled at the same time. Therefore, first the data D0 to D7 is transferred from the arbiter to all registers of the pipelined ring bus PEnR, and then, for the currently active PEs, the data is loaded from the registers to the memory modules, as shown in the timing chart in Figure 6. For the read request in the example architecture from Figure 1 with eight PEs, where all odd PEs are active in SIMD mode, only for these active PEs is the data load control signal (LD_PEn) set to one for one clock cycle at the end of the transfer; the other data load control signals as well as all data store control signals (ST_PEn) are not changed and hold the value zero.
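The SIMD broadcast can be sketched the same way. Again this is an illustrative model of the Figure 6 scenario, with names and encodings as assumptions:

```python
# Behavioral sketch (illustrative, not RTL) of the SIMD-mode broadcast:
# the arbiter releases D0..D7 in the order that leaves word Dn in ring
# register n, then one load pulse stores into the active PEs' IMEMs.

N_PE = 8
ring = [None] * N_PE                        # pipelined ring registers PEnR
imem = [None] * N_PE
active = [n % 2 == 1 for n in range(N_PE)]  # example: all odd PEs in SIMD mode

def shift(inject):
    """One clock cycle of the unidirectional pipelined ring bus."""
    ring[:] = [inject] + ring[:-1]

# Release the words in reverse order: after eight shifts, Dn sits in ring[n].
for n in reversed(range(N_PE)):
    shift(f"D{n}")

# Final cycle: LD_PEn is set to one only for the active PEs; all other load
# signals and all store signals stay at zero, so inactive IMEMs are untouched.
for n in range(N_PE):
    if active[n]:
        imem[n] = ring[n]
```

Releasing the words in reverse order is what makes a single simultaneous load pulse sufficient: every destination register holds its own word at the same instant.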
The knowledge inside the arbiter about the kind of the current transfer to IMEM is provided by the requesting source: for MIMD mode it is a self-controlled PE, and for SIMD mode it is the GCU, while the decision whether a PE is working in self-controlled or common controlled mode is made in the access controller of the arbiter.
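The selector's role can be summarized in a few lines. This is a hedged sketch of the behavior of the selector (unit 301) driven by the access controller's per-PE mode information; the function name and list encoding are illustrative assumptions:

```python
# Illustrative sketch: which PE IMEM units see asserted control lines,
# depending on the transfer kind and the per-PE modes held in the access
# controller. Names and encodings are assumptions, not from the patent.

def drive_control_lines(transfer_kind, selected_pe, pe_modes):
    """Return, per PE, whether its IMEM control lines are asserted.

    transfer_kind -- "self" (MIMD-style, one PE) or "common" (SIMD-style)
    selected_pe   -- index of the requesting PE for a self-controlled transfer
    pe_modes      -- per-PE mode, "self" or "common"
    """
    if transfer_kind == "self":
        # Only one IMEM is accessed at a time for a self-controlled PE.
        return [i == selected_pe for i in range(len(pe_modes))]
    # All common controlled PE IMEM units are accessed at the same time.
    return [m == "common" for m in pe_modes]

modes = ["common", "self", "common", "self"]
assert drive_control_lines("self", 1, modes) == [False, True, False, False]
assert drive_control_lines("common", None, modes) == [True, False, True, False]
```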
The following example is given for better illustration of the present invention.
In dual mode single instruction multiple data (SIMD) / multiple instruction multiple data (MIMD) architectures, data has to be transferred between an external memory and processing elements having an internal memory (IMEM), which, in architectures with many PEs, creates a problem for the data bus because of the large wiring area requirements. This problem can be solved by the newly proposed formulation for architectures with PEs that can each be either self-controlled or common controlled, such as dual mode SIMD/MIMD architectures, which reduces the wiring area requirement by using, as the network for data transfer, a pipelined bus system, preferably formed as a ring, connecting all PEs and the global control unit with external memory arbiter sequentially. Data transfers over such a network can then be performed for one single PE in MIMD mode at a time, e.g. for a data transfer to a PE IMEM by shifting the data from the arbiter over the pipelined (ring) bus to the destination PE and by opening there the path to the IMEM while closing the paths to all the other PE IMEM units. In contrast, for SIMD mode the data is sent to all the common controlled PEs at the same time by releasing the data words from the arbiter to the bus in the correct order and shifting them over the pipelined (ring) bus until the data words have reached their destination registers on the pipelined bus. After this, the data is stored simultaneously from all the common controlled PEs to the IMEM units by opening the paths to the IMEM units only for the common controlled PEs.

INDUSTRIAL APPLICABILITY
This invention can be used to achieve a high-performance processor design at low cost for embedded systems.
It should be noted that other objects, features and aspects of the present invention will become apparent from the entire disclosure, and that modifications may be made to the disclosed embodiments without departing from the scope of the present invention as claimed in the appended claims.
Also, it should be noted that any combination of the disclosed and/or claimed elements, matters and/or items may fall under the modifications aforementioned.
Claims
1. A processor comprising: first processing elements that execute the same program of a common controller; second processing elements that execute their own program independently from other processing elements' programs; and a pipelined network connecting said first processing elements and said second processing elements sequentially.
2. The processor as defined in Claim 1, further comprising: an access controller with access control lines, each access control line being connected to each processing element of said first and second processing elements to control data access timing between each processing element and said network.
3. The processor as defined in Claim 2, wherein the relation of said data access timings from said first processing elements and said data access timings from said second processing elements is different.
4. The processor as defined in Claim 2 or 3, wherein said data access for said first processing elements is a concurrent access so that each of said first processing elements accesses said network at a same timing slot; and said data access for said second processing elements is a standalone access so that each of said second processing elements accesses said network independently.
5. The processor as defined in any one of Claims 2 to 4, wherein said access controller controls said network so as to achieve an increased efficiency in the use of said network.
6. The processor as defined in any one of Claims 2 to 5, wherein said access controller controls said network so as to hold a specified bus access waiting time for one processing element.
7. The processor as defined in any one of Claims 4 to 6, wherein said access controller assigns higher priority to one standalone access than to said concurrent access or to said other standalone accesses when assigning the data access timing to each processing element.
8. The processor as defined in any one of Claims 4 to 6, wherein said access controller assigns higher priority to said concurrent access than said standalone access when assigning the data access timing to each processing element.
9. The processor as defined in Claim 5, wherein said access controller controls said network so as to hold a specified bus access waiting time for each processing element.
10. The processor as defined in Claim 5, wherein said access controller decides the data access timing of each processing element to minimize the necessary duration to send the required data.
11. The processor as defined in any one of Claims 5 to 10, wherein processing elements are configurable to said first processing element and said second processing element; and said access controller decides which processing element is used as said first processing element and which processing element is used as said second processing element.
12. The processor as defined in any one of Claims 1 to 11, further comprising an arbitration unit that arbitrates demand of data transfer for said first and second processing elements.
13. The processor as defined in any one of Claims 1 to 12, wherein said first processing element is a SIMD architecture processing element; and said second processing element is a MIMD architecture processing element.
14. The processor as defined in any one of Claims 1 to 13, wherein said pipelined network is a pipelined ring network.
15. An access controller having access control lines, wherein each access control line is connected to each processing element; said processing elements comprising: first processing elements that execute the same program of said access controller, and second processing elements that execute their own program independently from other processing elements' programs, said first and second processing elements being sequentially connected with a pipelined network; and said access controller controls data access timing between each processing element and said pipelined network.
16. The access controller as defined in Claim 15, wherein the relation of said data access timings from said first processing elements and said data access timings from said second processing elements is different.
17. The access controller as defined in Claim 16, wherein said data access for said first processing elements is a concurrent access so that each of said first processing elements accesses said network at a same timing slot; and said data access for said second processing elements is a standalone access so that each of said second processing elements accesses said network independently.
18. The access controller as defined in Claim 17, wherein said access controller controls said network so as to achieve an increased efficiency in the use of said network.
19. The access controller as defined in Claim 17, wherein processing elements are configurable to said first processing element and said second processing element; and said access controller decides which processing element is used as said first processing element and which processing element is used as said second processing element.
20. The access controller as defined in Claim 19, further comprising an arbitration unit that arbitrates demand of data transfer for said first and second processing elements.
21. The access controller as defined in Claim 20, wherein said first processing element is a SIMD architecture processing element; and said second processing element is a MIMD architecture processing element.
22. The access controller as defined in any one of Claims 15 to 21, wherein said pipelined network is a pipelined ring network.
23. A processing method comprising: executing the same program of a common controller by first processing elements; executing own program independently from other processing elements' programs by second processing elements; and connecting said first processing elements and said second processing elements sequentially through a pipelined network.
24. A method of controlling an access controller having access control lines, comprising: providing each access control line connected to each processing element; said processing elements being controlled by the steps comprising: executing the same program of said access controller by first processing elements, and executing own program independently from other processing elements' programs by second processing elements; sequentially connecting said first and second processing elements with a pipelined network; and controlling data access timing between each processing element and said pipelined network by said access controller.
Priority Applications (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/449,977 US8190856B2 (en) | 2007-03-06 | 2007-03-06 | Data transfer network and control apparatus for a system with an array of processing elements each either self- or common controlled |
JP2009538540A JP5158091B2 (en) | 2007-03-06 | 2007-03-06 | Data transfer network and controller for systems with autonomously or commonly controlled PE arrays |
PCT/JP2007/054756 WO2008108005A1 (en) | 2007-03-06 | 2007-03-06 | A data transfer network and control apparatus for a system with an array of processing elements each either self- or common controlled |
EP07715314A EP2132645B1 (en) | 2007-03-06 | 2007-03-06 | A data transfer network and control apparatus for a system with an array of processing elements each either self- or common controlled |
AT07715314T ATE508415T1 (en) | 2007-03-06 | 2007-03-06 | DATA TRANSFER NETWORK AND CONTROL DEVICE FOR A SYSTEM HAVING AN ARRAY OF PROCESSING ELEMENTS EACH EITHER SELF-CONTROLLED OR JOINTLY CONTROLLED |
DE602007014413T DE602007014413D1 (en) | 2007-03-06 | 2007-03-06 | DATA TRANSFER NETWORK AND CONTROL DEVICE FOR A SYSTEM WITH AN ARRAY OF PROCESSING ELEMENTS, EITHER EITHER SELF- OR COMMONLY CONTROLLED |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2007/054756 WO2008108005A1 (en) | 2007-03-06 | 2007-03-06 | A data transfer network and control apparatus for a system with an array of processing elements each either self- or common controlled |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2008108005A1 true WO2008108005A1 (en) | 2008-09-12 |
Family
ID=38616413
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2007/054756 WO2008108005A1 (en) | 2007-03-06 | 2007-03-06 | A data transfer network and control apparatus for a system with an array of processing elements each either self- or common controlled |
Country Status (6)
Country | Link |
---|---|
US (1) | US8190856B2 (en) |
EP (1) | EP2132645B1 (en) |
JP (1) | JP5158091B2 (en) |
AT (1) | ATE508415T1 (en) |
DE (1) | DE602007014413D1 (en) |
WO (1) | WO2008108005A1 (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2010113340A1 (en) * | 2009-03-30 | 2010-10-07 | Nec Corporation | Single instruction multiple data (simd) processor having a plurality of processing elements interconnected by a ring bus |
WO2011064898A1 (en) | 2009-11-26 | 2011-06-03 | Nec Corporation | Apparatus to enable time and area efficient access to square matrices and its transposes distributed stored in internal memory of processing elements working in simd mode and method therefore |
JP2012522280A (en) * | 2009-03-30 | 2012-09-20 | 日本電気株式会社 | Single instruction multiple data (SIMD) processor having multiple processing elements interconnected by a ring bus |
WO2016051435A1 (en) * | 2014-10-01 | 2016-04-07 | Renesas Electronics Corporation | Data transfer apparatus and microcomputer |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9996500B2 (en) * | 2011-09-27 | 2018-06-12 | Renesas Electronics Corporation | Apparatus and method of a concurrent data transfer of multiple regions of interest (ROI) in an SIMD processor system |
ES2391733B2 (en) * | 2011-12-30 | 2013-05-10 | Universidade De Santiago De Compostela | DYNAMICALLY RECONFIGURABLE SIMD / MIMD HYBRID ARCHITECTURE OF A COPROCESSOR FOR VISION SYSTEMS |
US20140189298A1 (en) * | 2012-12-27 | 2014-07-03 | Teresa Morrison | Configurable ring network |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5050065A (en) * | 1987-11-06 | 1991-09-17 | Thomson-Csf | Reconfigurable multiprocessor machine for signal processing |
US5522083A (en) * | 1989-11-17 | 1996-05-28 | Texas Instruments Incorporated | Reconfigurable multi-processor operating in SIMD mode with one processor fetching instructions for use by remaining processors |
US5903771A (en) * | 1996-01-16 | 1999-05-11 | Alacron, Inc. | Scalable multi-processor architecture for SIMD and MIMD operations |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3537074A (en) * | 1967-12-20 | 1970-10-27 | Burroughs Corp | Parallel operating array computer |
US4837676A (en) * | 1984-11-05 | 1989-06-06 | Hughes Aircraft Company | MIMD instruction flow computer architecture |
US5239654A (en) | 1989-11-17 | 1993-08-24 | Texas Instruments Incorporated | Dual mode SIMD/MIMD processor providing reuse of MIMD instruction memories as data memories when operating in SIMD mode |
US5212777A (en) | 1989-11-17 | 1993-05-18 | Texas Instruments Incorporated | Multi-processor reconfigurable in single instruction multiple data (SIMD) and multiple instruction multiple data (MIMD) modes and method of operation |
US5355508A (en) | 1990-05-07 | 1994-10-11 | Mitsubishi Denki Kabushiki Kaisha | Parallel data processing system combining a SIMD unit with a MIMD unit and sharing a common bus, memory, and system controller |
JPH07122866B1 (en) * | 1990-05-07 | 1995-12-25 | Mitsubishi Electric Corp | |
JPH0668053A (en) * | 1992-08-20 | 1994-03-11 | Toshiba Corp | Parallel computer |
EP0791194A4 (en) * | 1994-11-07 | 1998-12-16 | Univ Temple | Multicomputer system and method |
AU2470701A (en) | 1999-10-26 | 2001-05-08 | Arthur D. Little, Inc. | Dual aspect ratio pe array with no connection switching |
WO2002065700A2 (en) * | 2001-02-14 | 2002-08-22 | Clearspeed Technology Limited | An interconnection system |
- 2007-03-06 EP EP07715314A patent/EP2132645B1/en not_active Not-in-force
- 2007-03-06 US US12/449,977 patent/US8190856B2/en active Active
- 2007-03-06 JP JP2009538540A patent/JP5158091B2/en active Active
- 2007-03-06 WO PCT/JP2007/054756 patent/WO2008108005A1/en active Application Filing
- 2007-03-06 AT AT07715314T patent/ATE508415T1/en not_active IP Right Cessation
- 2007-03-06 DE DE602007014413T patent/DE602007014413D1/en active Active
Non-Patent Citations (1)
Title |
---|
KYO S ET AL: "An Integrated Memory Array Processor Architecture for Embedded Image Recognition Systems", COMPUTER ARCHITECTURE, 2005. ISCA '05. PROCEEDINGS. 32ND INTERNATIONAL SYMPOSIUM ON MADISON, WI, USA 04-08 JUNE 2005, PISCATAWAY, NJ, USA,IEEE, 4 June 2005 (2005-06-04), pages 134 - 145, XP010807901, ISBN: 0-7695-2270-X * |
Also Published As
Publication number | Publication date |
---|---|
JP5158091B2 (en) | 2013-03-06 |
EP2132645A1 (en) | 2009-12-16 |
EP2132645B1 (en) | 2011-05-04 |
US20100088489A1 (en) | 2010-04-08 |
US8190856B2 (en) | 2012-05-29 |
JP2010520519A (en) | 2010-06-10 |
ATE508415T1 (en) | 2011-05-15 |
DE602007014413D1 (en) | 2011-06-16 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 07715314 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2009538540 Country of ref document: JP Ref document number: 12449977 Country of ref document: US |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2007715314 Country of ref document: EP |