WO2010032205A1 - Electronic circuit comprising a plurality of processing devices - Google Patents


Info

Publication number
WO2010032205A1
Authority
WO
WIPO (PCT)
Prior art keywords
distributor
processing devices
dispatchers
commands
electronic circuit
Application number
PCT/IB2009/054069
Other languages
French (fr)
Inventor
Cornelis Hermanus Van Berkel
Original Assignee
Nxp B.V.
Application filed by Nxp B.V.
Publication of WO2010032205A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/48: Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806: Task transfer initiation or dispatching
    • G06F9/4843: Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system

Definitions

  • the invention relates to an electronic circuit comprising a plurality of processing devices.
  • SoC: system-on-chip
  • a system on chip comprises a plurality of processing devices.
  • a system-on-chip is implemented as an integrated circuit (IC).
  • IC: integrated circuit
  • system-on-chip designs are implemented with components, such as transistors, of ever-decreasing dimensions. By decreasing these dimensions, the performance of a device can be increased. Increased performance can take the form of increased processing speed, decreased power consumption, or an increase in capabilities.
  • United States Patent 5,083,265 describes a known parallel computer system.
  • the computer system comprises a plurality of processing elements which communicate through a router.
  • the router operates independently of the individual processing elements. Data packets are delivered to the router. Once a data packet is delivered to the router, the packet is routed to its destination without any burden on the processing elements which may continue their processing.
  • the router delivers messages point to point between pairs of components.
  • the router can be implemented by an electronic or optical packet-switching network.
  • a computing task to be performed by the electronic circuit is partitioned into a number of smaller tasks; each smaller task is expressed as one or more commands in the third plurality of commands, which are to be executed by one or more processing devices.
  • the third plurality of commands is produced by a first plurality of dispatchers.
  • the processing devices in the second plurality of processing devices are capable of indicating their availability.
  • the dispatchers do not execute the commands themselves, or at least not all of them. Instead, the commands are dispatched to the second plurality of processing devices.
  • the first plurality of dispatchers typically holds the state of the overall computation.
  • one or more of the first plurality of dispatchers may comprise a memory or a state-machine to hold part or all of the state of the overall computation.
  • the dispatching is done using a distributor.
  • the distributor distributes on the basis of the indicated availability.
  • while a processing device is working on a command, it will typically indicate that it is not available. Likewise, once it is finished, the processing device may indicate to the distributor that it is available again. If, e.g., due to manufacturing differences, two of the second plurality of processing devices do not have the same processing speed, the slower of the two may be unavailable longer than the faster one, even if both receive similar commands. These differences will not, however, unduly influence the system as a whole, as the faster processing device can perform more commands and compensate for the slower one.
  • the dispatchers typically are fairly lightweight devices compared to the processing devices.
  • the result produced by the processing devices may be used elsewhere.
  • the results may be sent to the plurality of dispatchers again.
  • the results may be used in a further plurality of dispatchers.
  • the results may be used in a further subsystem of the system.
  • a slow processing device is not likely to become a bottleneck. Even if a slow processing device were working on a critical calculation whose result other components, in particular dispatchers, are waiting for, it would only slow down the circuit temporarily. The next time a critical calculation is made, it is likely that some other processing device, possibly one of the faster ones, will make it.
  • the invention is particularly suited for computationally intensive tasks that can be performed in parallel. Especially when such a computation needs to be done on a power-constrained device, e.g., a device operating from a battery, the invention helps to get more computation done with the same amount of power.
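A small simulation can check the load-balancing argument above. This is a minimal sketch, not the patent's implementation. It assumes that each command takes a fixed, device-specific time. It also assumes that the distributor hands each command to whichever processing device signals availability first. The function name `simulate` and the timing model are illustrative assumptions.

```python
import heapq
from typing import List, Tuple


def simulate(num_commands: int, device_times: List[float]) -> Tuple[List[int], float]:
    """Give each command to the processing device that becomes available first.

    device_times[i] is the (hypothetical) time device i needs per command.
    Returns how many commands each device processed and when the last one finished.
    """
    # Heap of (time at which the device signals availability, device index).
    free_at = [(0.0, i) for i in range(len(device_times))]
    heapq.heapify(free_at)
    counts = [0] * len(device_times)
    makespan = 0.0
    for _ in range(num_commands):
        # The distributor picks the earliest-available device; ties go to the lower index.
        t, dev = heapq.heappop(free_at)
        done = t + device_times[dev]
        counts[dev] += 1
        makespan = max(makespan, done)
        # Once finished, the device indicates its return to availability.
        heapq.heappush(free_at, (done, dev))
    return counts, makespan
```

Take one device twice as fast as the other. Then `simulate(12, [1.0, 2.0])` returns `([8, 4], 8.0)`. The faster device takes twice as many commands, and all work finishes at t = 8. A fixed split of 6 commands each would keep the slower device busy until t = 12.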
  • the invention will help to get more computations from the available hardware, avoiding the need to add more hardware.
  • the invention is particularly suited for digital signal processing.
  • the digital signal processing done on a battery-operated mobile phone will be aided by the present invention.
  • a personal digital assistant (PDA), a laptop, a database server, etc., may also benefit from this invention.
  • PDA: personal digital assistant
  • many digital signal processing algorithms are suitable for parallelization.
  • the fast Fourier transform is such an algorithm.
  • the invention is also particularly suited for use in digital image analysis and enhancement algorithms; for example: edge detection, contrast improvement, etc.
  • the electronic circuit comprises a router configured to receive from the second plurality of processing devices the fourth plurality of results.
  • the router is further configured to send the fourth plurality of results to the first plurality of dispatchers.
  • the router is configured to receive the specific result from the specific processing device and to send the specific result to a particular one of the first plurality of dispatchers depending on the result.
  • a first one of the second plurality of processing devices and a second one of the second plurality of processing devices are of substantially a same configuration and have substantially different processing speeds.
  • a first one of the first plurality of dispatchers and a second one of the first plurality of dispatchers are configured to operate in parallel.
  • although the dispatchers are typically faster than the processing devices, they may become a bottleneck if they were to operate sequentially rather than in parallel.
  • the first and second one of the dispatchers may operate in parallel by extending the capabilities of the distributor. The latter may be done in one of several ways.
  • the distributor may be arranged with an input buffer in which parallel inputs are serialized; the circuit may be equipped with a second distributor, which can distribute a second parallel request to the second plurality of processing devices; etc.
  • the distributor comprises a network of distributor circuits.
  • the network is arranged for selecting by which one of the second plurality of processing devices the third plurality of commands will be processed.
  • the network comprises a fifth plurality of distributor circuits.
  • Each distributor circuit has multiple source-side interfaces and multiple consumer-side interfaces.
  • Each distributor circuit is configured to select over which of the consumer-side interfaces commands from the source-side interfaces will be transmitted towards the processing devices, based at least partly on signals from the consumer-side interfaces that indicate a current availability to forward the commands via the consumer-side interfaces.
  • the first plurality of dispatchers are coupled to source-side interfaces of a sixth plurality of the distributor circuits in the network, and the processing devices are coupled to consumer-side interfaces of a seventh plurality of distributor circuits in the network.
  • the consumer-side interfaces of the distributor circuits in the sixth plurality are coupled directly or indirectly to the source-side interfaces of the distributor circuits in the seventh plurality. The connectivity is such that at least two of the first plurality of dispatchers that are coupled to different distributor circuits in the sixth plurality are both coupled to all of the plurality of processing devices via the distributor circuits of the seventh plurality.
  • Such a network of distributor circuits allows a high degree of parallelism in the distribution from the first plurality of dispatchers to the second plurality of processing devices.
  • Fig. 1 is a block diagram illustrating a first embodiment of the electronic circuit according to the invention.
  • Fig. 2 is a block diagram illustrating a network of distributor circuits.
  • Fig. 3 is a block diagram illustrating a further embodiment of the distributor.
  • Fig. 4 is a block diagram illustrating an embodiment of a dispatcher.
  • Fig. 5 is a flow chart illustrating a method according to the invention. Throughout the Figures, similar or corresponding features are indicated by the same reference numerals.
  • in Fig. 1, a first embodiment of an electronic circuit 100 according to the invention is illustrated.
  • Electronic circuit 100 comprises first plurality of dispatchers 110.
  • four dispatchers are shown: dispatcher 112, dispatcher 114, dispatcher 116, and dispatcher 118.
  • the first plurality of dispatchers 110 are connected via a connection 102 to a distributor 120.
  • a dispatcher is arranged to produce one or more commands.
  • a command indicates an action to be performed by a data processing device, typically on or with one or more data items. Typically, the processing of a command produces a result comprising a resulting data item. For example, a command may request the addition of two numbers, or vectors of numbers.
  • a command may comprise data on which the command acts.
  • a command may also comprise a reference, such as an address, to a storage location, such as a memory (not shown), where data is to be found.
  • a command is also sometimes referred to as a message.
  • a dispatcher may be a circuit that produces commands itself, or the dispatcher may pass commands generated by some other circuit (not shown).
  • Distributor 120 is connected to second plurality of processing devices 130.
  • examples of a processing device include: a digital signal processor (DSP), a graphics processing unit (GPU), a central processing unit (CPU), a memory such as flash-based memory, a field-programmable gate array (FPGA), etc.
  • DSP: digital signal processor
  • GPU: graphics processing unit
  • CPU: central processing unit
  • FPGA: field-programmable gate array
  • the processing devices are typically arranged to produce results under control of received commands.
  • four processing devices are shown: processing device 132, processing device 134, processing device 136 and processing device 138.
  • Both the number of dispatchers 110 and the number of processing devices 130 are design choices. For example, either number may be chosen to be 8 or 16, or any other power of two. They may also be chosen not to be a power of two, such as 3, 5, 7, etc.
  • the number of dispatchers may be chosen to be higher, equal, or lower, than the number of processing devices. Having more dispatchers than processing devices tends to more fully utilize the capacity of the second plurality of processing devices 130. On the other hand, having more processing devices than dispatchers tends to maximize processing performance of the system as a whole. Also, a single dispatcher may be used.
  • the second plurality of processing devices 130 are all of the same design, or substantially so.
  • Two processing devices of the same design may, however, differ slightly, or markedly, due to, e.g., process variation during manufacture. They may also differ in performance by design. A faster processing device is more expensive to produce than a slower one, e.g., because of the gate count. For that reason, combining a faster processing device with a slower one tends to be cheaper to manufacture than using two fast processing devices. Yet, in some circumstances, a faster processing device combined with a slower one may already provide some of the benefits of parallel computation when the invention is used.
  • the second plurality of processing devices 130 is connected to a router 140 via connection 106 and is arranged to send its results to router 140.
  • Router 140 forwards a received result to a particular one of the first plurality of dispatchers 110, via a connection 108.
  • Router 140 may be implemented as a standard multiplexer, sending a plurality of received signals to a plurality of addresses, in this case the fourth plurality of results to the first plurality of dispatchers.
  • Connection 102, connection 104, connection 106 and connection 108 may be implemented as direct connections from their source to their destination, but may also be implemented indirectly. For example, the sources may write to a memory or a bus, from which the destination can read. Any one of the connections 102, 104, 106 and 108 may advantageously be implemented as a parallel connection, e.g., using multiple parallel wires.
  • the first plurality of dispatchers 110 together produces a third plurality of commands.
  • the first plurality of dispatchers 110 preferably operate in parallel.
  • the operation of the first plurality of dispatchers 110 may be synchronized with a clock, but they may also work asynchronously.
  • a loader (not shown) may be used.
  • the loader initializes the dispatchers by providing them with, e.g., starting values, program code, parameters, etc.
  • the loader may be arranged to obtain a final result of the computations from one or more of the first plurality of dispatchers 110.
  • a final result may also be obtained directly from one or more of the second plurality of processing devices 130 and the router 140; for example, by reading a result from a memory of a dispatcher.
  • connection 102 may operate sequentially. For example, commands that are produced in parallel may be serialized before going over connection 102.
  • Connection 102 may also comprise more than a single connection from the first plurality of dispatchers 110 to distributor 120.
  • Connection 102 may also be fully parallel, in that each dispatcher has its own connection to the distributor 120.
  • the first plurality of dispatchers 110 act as source circuits to distributor 120.
  • distributor 120 may be arranged with multiple input ports.
  • Distributor 120 receives commands from the first plurality of dispatchers 110 via connection 102. Distributor 120 also receives availability signals from the second plurality of processing devices 130. The availability signals may travel upstream over connection 104, but in practice may also use another connection (not shown).
  • electronic circuit 100 works in an iterative manner. First, the first plurality of dispatchers 110 dispatch commands, which are processed by the second plurality of processing devices 130. The results of the processing are received by the first plurality of dispatchers, which, possibly depending on the results, produce new commands. The new commands are processed by the second plurality of processing devices 130, thereby producing new results. The first plurality of dispatchers 110 and the second plurality of processing devices 130 thus alternately use each other's results in a new round of computations.
  • This iterative process can at some point terminate. For example, the dispatchers may note that some predetermined termination condition is met, for example, the data on which the computations take place has been exhausted. Also, the iteration may terminate after a predetermined number of rounds.
  • the iteration may also terminate after some predetermined amount of time, for example, by adding interrupt capabilities to the electronic circuit.
  • the process may also be terminated by an external operator, for example, an operating system.
  • a computation performed by the electronic circuit according to the invention may proceed in a number of repeating, orderly rounds of computation. However, a preferred implementation proceeds more irregularly, with the components of the electronic circuit operating asynchronously. For example, a dispatcher may forward a new command as soon as it has the inputs needed for it. Similarly, the distributor may forward a command received from a dispatcher as soon as a processing device is available. Likewise, a processing device can forward a result as soon as it finishes processing.
  • an advantage of this asynchronous mode of operation is that a processing device can be occupied with useful work as soon as it is available. In this way the capabilities of the electronic circuit are more fully exploited.
  • upon receiving a specific command from the first plurality of dispatchers 110 via connection 102, the distributor selects a specific processing device from those processing devices that indicate via an availability signal that they are available. Embodiments of distributor 120 are expanded upon below.
  • the specific command is forwarded by distributor 120 to the specific processing device, via connection 104. Note that the command itself does not necessarily need to indicate by which processing device the command is to be processed.
  • each command may be distributed to any available one of the second plurality of processing devices 130.
  • some processing devices may only be reachable from some of the first plurality of dispatchers, for example, in case the distributor is implemented as a partial network.
  • This has the advantage that the complexity of the distributor is reduced, as fewer connections need to be made.
  • the potential disadvantage of not being able to reach all processing devices is significantly reduced as long as most dispatchers can reach more than one processing device.
  • after a specific processing device has received a specific command, the processing device processes the command.
  • the command may, e.g., comprise a first vector and a second vector and a command indication.
  • the command indication may indicate, e.g., that the first vector is to be added to the second vector, or subtracted, or that their dot product must be calculated, etc.
  • the second plurality of processing devices 130 together produces a fourth plurality of results.
  • the fourth plurality of results is sent to router 140 via connection 106.
  • a result comprises an indication to which one of the first plurality of dispatchers 110 the result is to be sent.
  • a dispatcher may include in a command it dispatches a number indicating to which dispatcher or dispatchers the eventual result should be sent.
  • a dispatcher may request the result to be sent to itself, but may also request the result to be sent to another dispatcher.
  • the indication of the dispatcher to which a result must be sent can also be a result of the processing of the command by a processing device.
  • a first one of the second plurality of processing devices 130 and a second one of the second plurality of processing devices 130 may have substantially different processing speeds, even though they are in principle able to perform the same processing. This may be the case even if the first processing device has substantially the same configuration as the second processing device. As a result of the ever-decreasing size of processing devices, their processing speeds may increasingly diverge.
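The command and result fields described above can be sketched as plain records. A command carries a command indication, two operand vectors and an indication of the destination dispatcher. The class names, the `reply_to` field and the operation table are hypothetical. The patent leaves the encoding open, and a real command could carry memory addresses instead of the data itself.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Command:
    op: str           # command indication, e.g. "add", "sub" or "dot"
    a: List[float]    # first vector, carried inside the command
    b: List[float]    # second vector
    reply_to: int     # dispatcher that should receive the result (hypothetical field)


@dataclass
class Result:
    value: object
    reply_to: int     # copied from the command so the router knows where to send it


# Hypothetical operation table shared by all processing devices of the same design.
OPS: Dict[str, Callable[[List[float], List[float]], object]] = {
    "add": lambda a, b: [x + y for x, y in zip(a, b)],
    "sub": lambda a, b: [x - y for x, y in zip(a, b)],
    "dot": lambda a, b: sum(x * y for x, y in zip(a, b)),
}


def process(cmd: Command) -> Result:
    """Work done by any processing device; it never needs to know which device it is."""
    return Result(OPS[cmd.op](cmd.a, cmd.b), cmd.reply_to)


def route(result: Result, inboxes: List[List[object]]) -> None:
    """Router: deliver the result to the dispatcher indicated in the result."""
    inboxes[result.reply_to].append(result.value)
```

For example, routing the result of `Command("dot", [1.0, 2.0, 3.0], [4.0, 5.0, 6.0], reply_to=2)` places `32.0` in the inbox of dispatcher 2. The command does not name a processing device, so the distributor is free to choose one.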
  • the configuration of electronic circuit 100 is able to utilize a large part of the combined capabilities of the first processing device and the second processing device.
  • each processing device may be occupied with a new command to process as soon as the processing device is finished with a previous command.
  • the second plurality of processing devices 130 includes designs of multiple types.
  • the second plurality of processing devices 130 may comprise a first set of processing devices with a design of a first type and a second set of processing devices with a design of a second type.
  • the second set may, for example, only contain a single processing device.
  • a command is called a typed command if the command comprises a type indication.
  • a typed command indicates to the distributor that the command must be executed on a processing device of the particular type indicated by the type indicator.
  • a type indicator may be a number, indexing in a list of possible types.
  • a type indicator may be a string of bits, each bit indicating a particular capability a processing device should at least have. In this case the distributor could distribute, e.g., forward, a command to the first available processing device that at least meets all the indicated capabilities. If distributor 120 receives a typed command, the distributor 120 will forward the typed command to a processing device which is of the indicated type and which is available, e.g., the first such one.
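The bit-string form of type indicator can be illustrated as a capability mask. In this sketch the capability names (`CAP_FLOAT`, `CAP_FFT`, `CAP_WIDE`) are hypothetical. "First available device that meets all indicated capabilities" is one selection rule consistent with the text. An untyped command corresponds to a mask of zero, which any available device satisfies.

```python
from typing import List, Optional

# Hypothetical capability bits; the patent does not define specific capabilities.
CAP_FLOAT = 0b001
CAP_FFT = 0b010
CAP_WIDE = 0b100


def select_device(required: int, caps: List[int], available: List[bool]) -> Optional[int]:
    """Return the first available device whose capabilities include every required bit."""
    for index, (device_caps, is_free) in enumerate(zip(caps, available)):
        # All bits set in `required` must also be set in the device's capabilities.
        if is_free and (device_caps & required) == required:
            return index
    return None  # no suitable device is free; the command must wait (e.g. in a buffer)
```

With `caps = [CAP_FLOAT, CAP_FLOAT | CAP_FFT, CAP_FLOAT | CAP_FFT | CAP_WIDE]`, take device 1 as busy. A command requiring `CAP_FFT` then goes to device 2, while a command requiring only `CAP_FLOAT` goes to device 0.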
  • Distributor 120 may employ a buffer for the situation that no processing device is available.
  • the buffer can temporarily store commands until a processing device becomes available.
  • Distributor 120 may also include a stalling module.
  • the stalling module signals the dispatchers with a stalling signal that they should cease sending commands.
  • the dispatchers are arranged to receive a stalling signal and will stop sending commands.
  • the stalling module may send a resume signal to the dispatchers; upon receiving it, the dispatchers will resume sending commands.
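Here is a minimal sketch of the buffer and stalling module. It assumes a fixed buffer capacity and a simple rule: stall when the buffer is full, and resume as soon as a slot frees up. The class and method names are hypothetical. A hardware implementation would use signals rather than method calls.

```python
from collections import deque


class BufferedDistributor:
    """Sketch of distributor 120 with a command buffer and a stalling module.

    The capacity and the stall/resume rule are illustrative assumptions.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.buffer = deque()
        self.stalled = False  # state of the stalling signal seen by the dispatchers

    def submit(self, command) -> bool:
        """Called by a dispatcher; False means the dispatcher must hold on to the command."""
        if self.stalled:
            return False
        self.buffer.append(command)
        if len(self.buffer) >= self.capacity:
            self.stalled = True  # stalling signal: dispatchers cease sending commands
        return True

    def device_available(self):
        """Called when a processing device signals availability; returns the next command."""
        if not self.buffer:
            return None
        command = self.buffer.popleft()
        if self.stalled and len(self.buffer) < self.capacity:
            self.stalled = False  # resume signal: dispatchers may send commands again
        return command
```

With a capacity of 2, a third `submit` is refused until a processing device takes a command out of the buffer. At that point the stall is lifted.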
  • Fig. 2 illustrates a distributor 200 comprising a network 202 comprising distributor circuits 210.
  • a network 202 of distributor circuits 210 is described more completely in the co-pending European patent application titled "Circuit with network of message distributor circuits", filed on 13.07.2007 with application number 07112419.2. It is also described in the corresponding PCT application filed on 07.07.2008 with application number IB2008/052728. Both are herein incorporated by reference in their entirety.
  • the co-pending application includes the following figures:
  • Fig. 1 showing a circuit
  • Fig. 1a showing a basic distributor circuit
  • Fig. 2 showing a distributor circuit
  • Fig. 3 showing an alternative distributor circuit 30
  • Fig. 3a showing a further distributor circuit
  • Fig. 4 showing part of a distributor circuit
  • Fig. 5 showing a handshake buffer circuit
  • Distributor 200 comprises a network 202 of distributor circuits 210.
  • a network 202 of distributor circuits 210 is the network 202 formed by a fifth plurality of distributor circuits 210.
  • Distributor 200 comprises a fifth plurality of distributor circuits 210.
  • the fifth plurality of distributor circuits 210 is interconnected to form the network 202 of distributor circuits 210.
  • the fifth plurality of distributor circuits 210 comprises a sixth plurality of distributor circuits 212 and a seventh plurality of distributor circuits 214.
  • Four distributor circuits are shown individually: a distributor circuit 220, a distributor circuit 222, a distributor circuit 230 and a distributor circuit 232.
  • the sixth plurality of distributor circuits 212 comprises distributor circuit 220 and distributor circuit 222.
  • the seventh plurality of distributor circuits 214 comprises distributor circuit 230 and distributor circuit 232.
  • the fifth plurality of distributor circuits 210 is coupled to form a network 202.
  • a distributor circuit typically has multiple source-side interfaces and multiple consumer-side interfaces. Each distributor circuit is configured to select over which of its consumer-side interfaces a command received at a source-side interface will be transmitted. This selection is based, at least partly, on the availability signals that the distributor circuit receives on its consumer-side interfaces. An availability signal received directly from a processing device indicates that the distributor circuit can forward a command to that processing device. An availability signal received from another distributor circuit indicates that the other distributor circuit can forward to an available processing device, either directly or indirectly via further distributor circuits.
  • a distributor circuit with a single source-side interface and multiple consumer-side interfaces may help to accommodate a particular number of dispatchers, e.g., an odd number of dispatchers.
  • the network 202 is coupled between the first plurality of dispatchers 110 and the second plurality of processing devices 130.
  • four dispatchers are shown: dispatcher 112, dispatcher 114, dispatcher 116 and dispatcher 118.
  • four processing devices are shown: processing device 132, processing device 134, processing device 136 and processing device 138.
  • Dispatcher 112 is connected to a first source-side interface of distributor circuit 220.
  • Dispatcher 114 is connected to a second source-side interface of distributor circuit 220.
  • Dispatcher 116 is connected to a first source-side interface of distributor circuit 222.
  • Dispatcher 118 is connected to a second source-side interface of distributor circuit 222.
  • a first consumer-side interface of distributor circuit 220 is connected to a first source-side interface of distributor circuit 230.
  • a second consumer-side interface of distributor circuit 220 is connected to a first source-side interface of distributor circuit 232.
  • a first consumer-side interface of distributor circuit 222 is connected to a second source-side interface of distributor circuit 230.
  • a second consumer-side interface of distributor circuit 222 is connected to a second source-side interface of distributor circuit 232.
  • a first consumer-side interface of distributor circuit 230 is connected to processing device 132.
  • a second consumer-side interface of distributor circuit 230 is connected to processing device 134.
  • a first consumer-side interface of distributor circuit 232 is connected to processing device 136.
  • a second consumer-side interface of distributor circuit 232 is connected to processing device 138.
  • Each of the processing devices 132, 134, 136 and 138 is configured to signal its availability upstream to distributor circuits 230 and 232.
  • Distributor circuit 230 combines the received availability signals and is configured to signal upstream, to distributor circuits 220 and 222, the availability of at least one of processing devices 132 and 134.
  • Distributor circuit 232 combines the received availability signals and is configured to signal upstream, to distributor circuits 220 and 222, the availability of at least one of processing devices 136 and 138.
  • in a larger network with an additional, eighth plurality of distributor circuits between these two layers, the consumer-side interfaces of the sixth plurality 212 would be connected to source-side interfaces of the eighth plurality. Consumer-side interfaces of the eighth plurality would in turn be connected to source-side interfaces of the seventh plurality 214.
  • a network 202 of distributor circuits such as in distributor 200, has the advantage that the commands produced by the first plurality of dispatchers 110 may be forwarded to the second plurality of processing devices 130 in parallel and with little overhead.
  • networks 202 of distributor circuits can easily be constructed in various sizes. For example, by adding more distributor circuits, more dispatchers and/or more processing devices can be accommodated. It is not necessary that all processing devices can be reached from all dispatchers, although preferably each dispatcher can reach multiple processing devices.
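The availability propagation and forwarding in the network of Fig. 2 can be modelled in software. This sketch is illustrative only. Calls are handled one at a time, whereas hardware would arbitrate between source-side interfaces concurrently. Each distributor circuit simply takes the first consumer-side interface that signals availability, which is one possible selection rule. The wiring reproduces Fig. 2, with reference numerals used as names.

```python
from typing import List, Union


class Device:
    """A processing device that signals availability and accepts one command at a time."""

    def __init__(self, name: str):
        self.name = name
        self.busy = False
        self.received: List[str] = []

    def available(self) -> bool:
        return not self.busy

    def forward(self, command: str) -> str:
        self.received.append(command)
        self.busy = True  # stays unavailable until it finishes (cleared externally here)
        return self.name


class DistributorCircuit:
    def __init__(self, consumers: List[Union["DistributorCircuit", Device]]):
        self.consumers = consumers  # whatever is wired to the consumer-side interfaces

    def available(self) -> bool:
        # Combined availability signalled upstream: at least one consumer can accept.
        return any(c.available() for c in self.consumers)

    def forward(self, command: str) -> str:
        # Select a consumer-side interface that currently signals availability.
        for consumer in self.consumers:
            if consumer.available():
                return consumer.forward(command)
        raise RuntimeError("no consumer available; the source must wait")


# Topology of Fig. 2: two layers of two distributor circuits each.
devices = [Device(n) for n in ("132", "134", "136", "138")]
d230 = DistributorCircuit([devices[0], devices[1]])
d232 = DistributorCircuit([devices[2], devices[3]])
d220 = DistributorCircuit([d230, d232])  # source side: dispatchers 112 and 114
d222 = DistributorCircuit([d230, d232])  # source side: dispatchers 116 and 118
```

Suppose dispatchers 112, 116, 114 and 118 each forward one command, in that order. The commands reach processing devices 132, 134, 136 and 138 respectively. After that, neither distributor circuit 220 nor 222 signals availability until a processing device is freed.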
  • Fig. 3 illustrates in a block diagram a further embodiment of the distributor: distributor 300.
  • Distributor 300 comprises a buffer 302, a multiplexer 304, a connection 306 and a connection 308.
  • the first plurality of dispatchers 110 sends the third plurality of commands via connection 102 to buffer 302.
  • in buffer 302, the third plurality of commands is serialized.
  • Buffer 302 forwards the third plurality of commands one-by-one to multiplexer 304 via connection 306.
  • Multiplexer 304 comprises multiple states, the multiple states indicating the availability of the second plurality of processing devices 130, respectively.
  • multiplexer 304 updates the multiple states, so that they reflect the current state of the second plurality of processing devices 130. In the Figure four such processing devices are shown: processing device 132, processing device 134, processing device 136 and processing device 138, each one connected to multiplexer 304.
  • upon receiving a command from buffer 302, multiplexer 304 sends the command to the first processing device that is available. For example, if processing device 138 is the first available device, multiplexer 304 uses connection 308 to processing device 138. Instead of using the first available device, it may be advantageous to use a randomly selected available device.
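Distributor 300 can be sketched as a FIFO buffer feeding a multiplexer that keeps one availability state per processing device. The `policy` parameter and its values `"first"` and `"random"` are hypothetical names for the two selection options mentioned above. Device indices 0 to 3 stand for processing devices 132 to 138.

```python
import random
from collections import deque
from typing import Deque, List, Optional, Tuple


class Multiplexer:
    """Sketch of multiplexer 304: one availability state per processing device."""

    def __init__(self, num_devices: int, policy: str = "first", seed: Optional[int] = None):
        self.available = [True] * num_devices
        self.policy = policy  # "first" or "random" (hypothetical option names)
        self.rng = random.Random(seed)

    def set_available(self, device: int, flag: bool) -> None:
        # Updated whenever a processing device changes its availability signal.
        self.available[device] = flag

    def pick(self) -> Optional[int]:
        free = [i for i, ok in enumerate(self.available) if ok]
        if not free:
            return None
        return free[0] if self.policy == "first" else self.rng.choice(free)


def drain(buffer: Deque[str], mux: Multiplexer) -> List[Tuple[str, int]]:
    """Forward serialized commands one by one while some device is available."""
    sent = []
    while buffer:
        device = mux.pick()
        if device is None:
            break  # remaining commands wait in buffer 302
        command = buffer.popleft()
        mux.set_available(device, False)  # the device is now busy with this command
        sent.append((command, device))
    return sent
```

Suppose the devices at indices 0 and 1 are busy. Draining `deque(["c1", "c2", "c3"])` then sends `c1` to index 2 and `c2` to index 3, and `c3` stays buffered until a device frees up.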
  • Fig. 4 illustrates dispatcher 400, which may be used for any one of the first plurality of dispatchers 110, for example, for dispatcher 112.
  • Dispatcher 400 comprises a read only memory (ROM) 402, a random access memory (RAM) 404 and a command producer 406.
  • ROM: read only memory
  • RAM: random access memory
  • the algorithm is loaded into the dispatchers in the first plurality of dispatchers 110.
  • the dispatchers may be implemented as dispatcher 400.
  • Dispatcher 400 executes a simple program to create commands from the values contained in the memories ROM 402 and RAM 404.
  • Command producer 406 is typically a processor, albeit a much lighter-weight one than the second plurality of processing devices 130.
  • Command producer 406 may also be implemented as a finite-state machine.
  • An executable program may be comprised in ROM 402 and/or RAM 404.
  • the dispatchers together apply the algorithm to input data.
  • the dispatchers control the operations that have to be performed and the data values to be subjected to the operations.
  • the input values can be read directly from an external memory or other storage location (neither is shown), but are typically placed inside RAM 404.
  • the results produced for the commands will typically be received out-of-order due to variability in the performance of the second plurality of processing devices 130.
  • the dispatcher is configured to take account of data dependencies. For example, if the result of a previous command is needed to create a new command, the receiving dispatcher performs the necessary bookkeeping. This bookkeeping verifies that the needed results have been received from the second plurality of processing devices 130, for example using locking, synchronization and barriers. If a needed result has not yet been received, the dispatchers that depend on it may temporarily stall the production of commands.
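The dependency bookkeeping described above can be sketched as follows. The sketch assumes that each task names the two operands it needs and that results arrive in arbitrary order. The `Dispatcher` class, the task tuple layout and the `execute` stand-in for a processing device are all hypothetical. The example computes (a + b) * (c + d), where the multiplication cannot be issued until both sums have been received.

```python
from typing import Dict, List, Tuple

Task = Tuple[str, str, Tuple[str, str]]   # (result name, operation, operand names)
Command = Tuple[str, str, float, float]   # (result name, operation, operand values)

OPS = {"add": lambda x, y: x + y, "mul": lambda x, y: x * y}


class Dispatcher:
    """Hypothetical dispatcher that issues a command only when its operands are known."""

    def __init__(self, tasks: List[Task], inputs: Dict[str, float]):
        self.pending: List[Task] = list(tasks)      # commands not yet issued
        self.values: Dict[str, float] = dict(inputs)  # stands in for RAM 404

    def ready_commands(self) -> List[Command]:
        """Issue every command whose operands have arrived; the rest stay stalled."""
        ready, waiting = [], []
        for name, op, (x, y) in self.pending:
            if x in self.values and y in self.values:
                ready.append((name, op, self.values[x], self.values[y]))
            else:
                waiting.append((name, op, (x, y)))
        self.pending = waiting
        return ready

    def receive(self, name: str, value: float) -> None:
        # Results may arrive out of order, from any processing device.
        self.values[name] = value


def execute(cmd: Command) -> Tuple[str, float]:
    """Stand-in for a processing device."""
    name, op, x, y = cmd
    return name, OPS[op](x, y)
```

Suppose the result for `s2` arrives before the result for `s1`. Then `ready_commands()` returns nothing until `s1` also arrives, at which point the `mul` command is issued.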
  • Fig. 5 illustrates in a flow chart a method 500 according to the invention.
  • Method 500 comprises: sending a third plurality of commands from the first plurality of dispatchers to a distributor (502); distributing the third plurality of commands from the distributor to the second plurality of processing devices 130 based on indicated availability (504); processing the third plurality of commands using the second plurality of processing devices 130 to produce a fourth plurality of results (506); receiving the fourth plurality of results from the second plurality of processing devices 130 in a router (508); and sending the fourth plurality of results to the first plurality of dispatchers (510).
  • The order of the steps of method 500 can be varied, and some steps may be executed in parallel, as will be apparent to a person skilled in the art. Other operations can also be interposed between the steps. In particular, there is no need for synchronized action; it is most advantageous to execute the method in a distributed fashion. For example, sending the third plurality of commands will typically overlap with, and occur in parallel to, the processing of those commands.
  • The present invention may be implemented using a programmed processor executing programming instructions, broadly described above in flow chart form, which can be stored on any suitable electronic storage medium.
  • The processes described above can be implemented in any number of variations and in many suitable programming languages without departing from the present invention.
  • The order of certain operations can often be varied, and operations can be added or deleted, without departing from the invention. Error trapping, enhancements and variations can be added without departing from the present invention. Such variations are contemplated and considered equivalent.
  • The present invention could be implemented using special-purpose hardware and/or dedicated processors.
  • General-purpose computers, microprocessor-based computers, digital signal processors, microcontrollers, dedicated processors, custom circuits, ASICs and/or dedicated hard-wired logic may be used to construct alternative equivalent embodiments of the present invention.
  • Several of these means may be embodied by one and the same item of hardware.
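The dependency bookkeeping described in the list above (inputs held in RAM, results arriving out of order, and a dispatcher stalling until a needed result has arrived) can be modelled in software. The following is a minimal Python sketch under stated assumptions, not the circuit of Fig. 4: the classes `Command` and `Dispatcher`, their fields, and the program format of `(tag, op, dependency_tags)` entries are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Command:
    tag: str           # hypothetical name of the result this command will produce
    op: str            # operation to perform, e.g. "add"
    operands: tuple    # operand values taken from the dispatcher's memory
    reply_to: int      # index of the dispatcher that should receive the result

@dataclass
class Dispatcher:
    ident: int
    program: list      # hypothetical program: ordered (tag, op, dependency_tags) entries
    results: dict = field(default_factory=dict)  # input values and received results, by tag
    pc: int = 0        # index of the next program entry to issue

    def receive(self, tag, value):
        """Record a result; results may arrive in any order."""
        self.results[tag] = value

    @property
    def finished(self):
        return self.pc >= len(self.program)

    def next_command(self):
        """Return the next Command, or None if finished or stalled on a missing input."""
        if self.finished:
            return None
        tag, op, deps = self.program[self.pc]
        if any(d not in self.results for d in deps):
            return None  # a needed result has not arrived yet: stall
        self.pc += 1
        return Command(tag, op, tuple(self.results[d] for d in deps), self.ident)
```

In this model a stalled dispatcher simply returns `None` and retries later; a hardware command producer 406 could equally be a finite-state machine doing the same bookkeeping.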

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

A computing task to be performed by the electronic circuit is partitioned into a number of smaller tasks; each smaller task is expressed as one or more commands in the third plurality of commands, which are to be executed by one or more processing devices. The third plurality of commands is produced by a first plurality of dispatchers. The processing devices in the second plurality of processing devices are capable of indicating their availability. The dispatchers do not execute the commands themselves, or at least not all of them; instead, the commands are dispatched to the second plurality of processing devices. The first plurality of dispatchers typically holds the state of the overall computation. For example, one or more of the first plurality of dispatchers may comprise a memory or a state machine to hold part or all of the state of the overall computation.

Description

ELECTRONIC CIRCUIT COMPRISING A PLURALITY OF PROCESSING DEVICES
FIELD OF THE INVENTION
The invention relates to an electronic circuit comprising a plurality of processing devices.
BACKGROUND OF THE INVENTION
To meet the demanding requirements for consumer appliances, the industry is moving towards the integration of a complete system on a single chip, a so-called system-on-chip (SoC). A system-on-chip comprises a plurality of processing devices and is implemented as an integrated circuit (IC). Furthermore, also to meet these demanding requirements, such system-on-chip designs are implemented with components, such as transistors, of ever-decreasing dimensions. By decreasing the dimensions, the performance of a device can be increased, for example through increased processing speed, decreased power consumption, or increased capabilities. United States Patent 5,083,265 describes a known parallel computer system.
The computer system comprises a plurality of processing elements which communicate through a router. The router operates independently of the individual processing elements. Data packets are delivered to the router. Once a data packet is delivered to the router, the packet is routed to its destination without any burden on the processing elements which may continue their processing. The router delivers messages point to point between pairs of components. The router can be implemented by an electronic or optical packet-switching network.
SUMMARY OF THE INVENTION
To aid the design of a system-on-chip, it is usual to work with components of uniform capabilities. It is especially important that the components have, as far as possible, the same switching speed. Due to physical variations in, e.g., the substrate material and the manufacturing process, there will be unavoidable variations between identical components at different locations on the chip. The prior art tries to deal with this problem by working from worst-case assumptions about the components. By treating each component as if it were as slow as the slowest component, all components appear equally fast. Unfortunately, a faster component is then made to wait idly after it has finished its current processing task.
It is a problem of the known systems that part of the capabilities of the individual processing devices remains unutilized.
It is an insight of the inventor that variations in processing speed between the components in a system-on-chip will increase as a result of decreasing dimensions, thereby exacerbating the problem.
It is an object of the invention to provide a system-on-chip design that includes a plurality of processing devices which operate at different speeds, while utilizing a larger part of the combined capabilities of the plurality of processing devices, than in the known system.
This and other objects are achieved by the electronic circuit according to the invention, as defined in claim 1. A computing task to be performed by the electronic circuit is partitioned into a number of smaller tasks; each smaller task is expressed as one or more commands in the third plurality of commands, which are to be executed by one or more processing devices. The third plurality of commands is produced by a first plurality of dispatchers. The processing devices in the second plurality of processing devices are capable of indicating their availability. The dispatchers do not execute the commands themselves, or at least not all of them; instead, the commands are dispatched to the second plurality of processing devices. The first plurality of dispatchers typically holds the state of the overall computation. For example, one or more of the first plurality of dispatchers may comprise a memory or a state machine to hold part or all of the state of the overall computation. The dispatching is done using a distributor, which distributes the commands on the basis of the indicated availability.
While a processing device is working on a command, it will typically indicate that it is not available. Likewise, once it has finished, the processing device may indicate to the distributor that it is available again. If, e.g., due to manufacturing differences, two of the second plurality of processing devices do not have the same processing speed, the slower of the two may remain unavailable longer than the faster one, even if both receive similar commands for processing. These differences will not, however, unduly influence the system as a whole, as the faster processing device can perform more commands and so compensate for the slower one. The dispatchers are typically fairly lightweight devices compared to the processing devices.
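The effect described above, where a faster device simply takes on more commands, can be illustrated with a small event-driven model. This Python sketch assumes that the dispatchers always have commands ready and that each device has a fixed, hypothetical per-command service time; each command goes to the device that becomes available earliest.

```python
import heapq

def simulate(num_commands, service_times):
    """Dispatch commands by availability and return (makespan, commands per device).

    service_times[i] is the (hypothetical) fixed time device i needs per command.
    Dispatchers are assumed to always have a command ready.
    """
    # Min-heap of (time the device becomes available, device index)
    free_at = [(0.0, i) for i in range(len(service_times))]
    heapq.heapify(free_at)
    counts = [0] * len(service_times)
    makespan = 0.0
    for _ in range(num_commands):
        t, dev = heapq.heappop(free_at)  # earliest-available device, ties by index
        finish = t + service_times[dev]
        counts[dev] += 1
        makespan = max(makespan, finish)
        heapq.heappush(free_at, (finish, dev))  # device signals availability again
    return makespan, counts
```

With service times of 1 and 2 time units, twelve commands finish at time 8, with the faster device handling eight of them. A worst-case lockstep scheme that treats both devices as slow would need six rounds of 2 units, i.e., 12 units.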
It is an insight of the inventor to separate the task of creating new processing commands from the actual processing of those commands. In a conventional parallel computing system, the processing devices would typically be tasked with processing the data, communicating with other processing devices, and creating new processing tasks. In such a system bottlenecks easily arise, for example, if one of the slower devices is tasked with critical computations on which many other processing devices depend.
The results produced by the processing devices may be used elsewhere. For example, the results may be sent back to the plurality of dispatchers, or they may be used in a further plurality of dispatchers. In a system which comprises the electronic circuit according to the invention as a subsystem, the results may be used in a further subsystem of the system.
However, by decoupling the dispatching of commands and the collecting of results from the processing devices, a slow processing device is unlikely to become a bottleneck. Even if, for example, a slow processing device were working on a critical calculation whose result other devices, in particular dispatchers, are waiting for, it would only slow down the circuit temporarily. The next time a critical calculation is made, it is likely that some other processing device, possibly one of the faster ones, will perform it. The invention is particularly suited for computationally intensive tasks which may be performed in parallel. Especially when such a computation needs to be done on a power-restricted device, e.g., a device operating from a battery, the invention helps to get more computation done with the same amount of power. Also, if such a computation needs to be done on a budget-restricted device, the invention helps to get more computation from the available hardware, avoiding the need to add more hardware. For example, the invention is particularly suited for digital signal processing, such as the digital signal processing done on a battery-operated mobile phone. Similarly, a personal digital assistant (PDA), a laptop, a database server, etc., may benefit from this invention. It is well known that many digital signal processing algorithms are suitable for parallelization; the fast Fourier transform is one example. Moreover, the invention is also particularly suited for use in digital image analysis and enhancement algorithms, for example edge detection, contrast improvement, etc.
In a preferred embodiment of the invention, the electronic circuit comprises a router configured to receive the fourth plurality of results from the second plurality of processing devices and to send the fourth plurality of results to the first plurality of dispatchers. The router is configured to receive a specific result from a specific processing device and to send the specific result to a particular one of the first plurality of dispatchers in dependence on the result. When a processing device has finished executing a command, the results are routed back to one of the dispatchers. This makes it possible to take conventional algorithms intended for a set of parallel processing devices and map them onto the lightweight dispatchers. At the same time, bottlenecks are avoided since the processing devices themselves are decoupled from the dispatchers. In principle, any processing device could pick up the slack of any other.
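Routing a result back according to a destination indication carried in the result can be sketched as follows. The `Result` record and its `dest` field are hypothetical; the text only states that the router sends each result to a particular dispatcher in dependence on the result.

```python
from typing import NamedTuple

class Result(NamedTuple):
    dest: int      # hypothetical: index of the dispatcher that should receive the result
    tag: str       # which command this result belongs to
    value: object

def route(results, num_dispatchers):
    """Deliver each result to the inbox of the dispatcher named in its dest field."""
    inboxes = [[] for _ in range(num_dispatchers)]
    for r in results:
        if not 0 <= r.dest < num_dispatchers:
            raise ValueError(f"no dispatcher with index {r.dest}")
        inboxes[r.dest].append((r.tag, r.value))
    return inboxes
```

Because the destination travels with the result, the router needs no knowledge of which processing device produced it, which is what decouples the processing devices from the dispatchers.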
In a preferred embodiment of the invention, a first one of the second plurality of processing devices and a second one of the second plurality of processing devices are of substantially the same configuration and have substantially different processing speeds.
It is an advantage of the invention that many processing devices may be made on a single SoC, all substantially to the same design, without being unduly hampered by the problem that some processing devices, for whatever reason, are slower or faster than other processing devices.
In a preferred embodiment of the invention, a first one of the first plurality of dispatchers and a second one of the first plurality of dispatchers are configured to operate in parallel.
Although the dispatchers are typically faster than the processing devices, they could become a bottleneck if they were to operate sequentially rather than in parallel. The first and second dispatchers may be made to operate in parallel by extending the capabilities of the distributor. This may be done in one of several ways. For example, the distributor may be arranged with an input buffer in which parallel inputs are serialized; the circuit may be equipped with a second distributor, which can distribute a second parallel request to the second plurality of processing devices; etc.
In a preferred embodiment, the distributor comprises a network of distributor circuits. The network is arranged for selecting by which of the second plurality of processing devices each of the third plurality of commands will be processed. The network comprises a fifth plurality of distributor circuits. Each distributor circuit has multiple source-side interfaces and multiple consumer-side interfaces. Each distributor circuit is configured to select over which of its consumer-side interfaces commands from its source-side interfaces will be transmitted towards the processing devices, based at least partly on signals from the consumer-side interfaces that indicate a current availability to forward the commands via those interfaces.
The first plurality of dispatchers are coupled to source-side interfaces of a sixth plurality of the distributor circuits in the network, and the processing devices are coupled to consumer-side interfaces of a seventh plurality of distributor circuits in the network. The consumer-side interfaces of the distributor circuits in the sixth plurality are coupled directly or indirectly to the source-side interfaces of the distributor circuits in the seventh plurality, with a connectivity such that at least two of the first plurality of dispatchers that are coupled to different distributor circuits in the sixth plurality are both coupled to all of the second plurality of processing devices via the distributor circuits of the seventh plurality.
Such a network of distributor circuits allows a high degree of parallelism in the distribution from the first plurality of dispatchers to the second plurality of processing devices.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention is explained in further detail by way of example and with reference to the accompanying drawings, wherein:
Fig. 1 is a block diagram illustrating a first embodiment of the electronic circuit according to the invention.
Fig. 2 is a block diagram illustrating a network of distributor circuits.
Fig. 3 is a block diagram illustrating a further embodiment of the distributor.
Fig. 4 is a block diagram illustrating an embodiment of a dispatcher.
Fig. 5 is a flow chart illustrating a method according to the invention.
Throughout the Figures, similar or corresponding features are indicated by the same reference numerals.
List of Reference Numerals:
100 an electronic circuit
102, 104, 106, 108 a connection
110 a first plurality of dispatchers
112, 114, 116, 118 a dispatcher
120 a distributor
130 a second plurality of processing devices
132, 134, 136, 138 a processing device
140 a router
200 a distributor comprising a network of distributor circuits
202 a network
210 a fifth plurality of distributor circuits
212 a sixth plurality of the distributor circuits
214 a seventh plurality of distributor circuits
220, 222, 230, 232 a distributor circuit
300 a distributor
302 a buffer
304 a multiplexer
306, 308 a connection
400 a dispatcher
402 a read only memory
404 a random access memory
406 a command producer
500 a method
502 sending a third plurality of commands from a first plurality of dispatchers to a distributor
504 distributing the third plurality of commands from the distributor to the second plurality of processing devices based on indicated availability
506 processing the third plurality of commands using the second plurality of processing devices to produce a fourth plurality of results
508 receiving from the second plurality of processing devices a fourth plurality of results in a router
510 sending the fourth plurality of results to the first plurality of dispatchers
DETAILED DESCRIPTION OF THE EMBODIMENTS
While this invention is susceptible of embodiment in many different forms, there is shown in the drawings and will herein be described in detail one or more specific embodiments, with the understanding that the present disclosure is to be considered as exemplary of the principles of the invention and not intended to limit the invention to the specific embodiments shown and described.
In Fig. 1 a first embodiment of an electronic circuit 100 according to the invention is illustrated.
Electronic circuit 100 comprises first plurality of dispatchers 110. In this embodiment four dispatchers are shown: dispatcher 112, dispatcher 114, dispatcher 116, and dispatcher 118.
The first plurality of dispatchers 110 are connected via a connection 102 to a distributor 120.
A dispatcher is arranged to produce one or more commands. A command indicates an action to be performed by a data processing device, typically on or with one or more data items. Typically, the processing of a command produces a result comprising a resulting data item. For example, a command may request the addition of two numbers, or vectors of numbers. A command may comprise data on which the command acts. A command may also comprise a reference, such as an address, to a storage location, such as a memory (not shown), where data is to be found. A command is also sometimes referred to as a message. A dispatcher may be a circuit that produces commands itself, or the dispatcher may pass commands generated by some other circuit (not shown).
Distributor 120 is connected to the second plurality of processing devices 130. Examples of a processing device include: a digital signal processor (DSP), a graphics processing unit (GPU), a central processing unit (CPU), a memory, such as flash-based memory, a field-programmable gate array (FPGA), etc. The processing devices are typically arranged to produce results under control of received commands. In the Figure, four processing devices are shown: processing device 132, processing device 134, processing device 136 and processing device 138. The numbers of dispatchers 110 and of processing devices 130 are free design choices. For example, either number may be chosen to be 8 or 16, or any other power of two; it may also be chosen not to be a power of two, such as 3, 5, 7, etc. The number of dispatchers may be chosen to be higher than, equal to, or lower than the number of processing devices. Having more dispatchers than processing devices tends to utilize the capacity of the second plurality of processing devices 130 more fully. On the other hand, having more processing devices than dispatchers tends to maximize the processing performance of the system as a whole. A single dispatcher may also be used.
Preferably, the second plurality of processing devices 130 are all of the same design, or substantially so. Two processing devices of the same design may however differ slightly, or markedly, due to, e.g., process variation during manufacture. They may also differ in performance by design. Since a faster processing device is more expensive to produce than a slower processing device, e.g., because of the gate count, having a faster processing device and a slower one tends to be cheaper to manufacture than having two fast processing devices. Yet, in some circumstances while using the invention a faster processing device and a slower one may already give some of the benefits of parallel computation.
The second plurality of processing devices 130 is connected to a router 140 via connection 106 and is arranged to send its results to router 140. Router 140 forwards a received result to a particular one of the first plurality of dispatchers 110, via a connection 108. Router 140 may be implemented as a standard multiplexer, sending a plurality of received signals to a plurality of addresses, in this case the fourth plurality of results to the first plurality of dispatchers.
Connection 102, connection 104, connection 106 and connection 108 may be implemented as direct connections from their source to their destination, but may also be implemented indirectly; for example, the sources may write to a memory or a bus from which the destination can read. Any of the connections 102, 104, 106 and 108 may advantageously be implemented as a parallel connection, e.g., using multiple parallel wires.
During operation, the first plurality of dispatchers 110 together produces a third plurality of commands. The first plurality of dispatchers 110 preferably operate in parallel. The operation of the first plurality of dispatchers 110 may be synchronized with a clock, but they may also work asynchronously. Optionally, a loader (not shown) may be used. The loader initializes the dispatchers by providing them with, e.g., starting values, program code, parameters, etc. Also, the loader may be arranged to obtain a final result of the computations from one or more of the first plurality of dispatchers 110. Optionally, a final result may also be obtained directly from one or more of the second plurality of processing devices 130 and the router 140; for example, by reading a result from a memory of a dispatcher.
The third plurality of commands is sent via connection 102 to distributor 120. Connection 102 may operate sequentially. For example, commands that are produced in parallel may be serialized before going over connection 102. Connection 102 may also comprise more than a single connection from the first plurality of dispatchers 110 to distributor 120. Connection 102 may also be fully parallel, in that each dispatcher has its own connection to the distributor 120. The first plurality of dispatchers 110 act as source circuits to distributor 120. To support parallel connections, distributor 120 may be arranged with multiple input ports.
Distributor 120 receives commands from the first plurality of dispatchers 110 via connection 102. Distributor 120 also receives availability signals from the second plurality of processing devices 130. The availability signals may travel upstream over connection 104, but in practice may also use another connection (not shown).
Typically, electronic circuit 100 works iteratively. First, the first plurality of dispatchers 110 dispatch commands, which are processed by the second plurality of processing devices 130. The results of the processing are received by the first plurality of dispatchers, which, possibly depending on the results, produce new commands. The new commands are processed by the second plurality of processing devices 130, thereby producing new results. The first plurality of dispatchers 110 and the second plurality of processing devices 130 thus alternately use each other's results in a new round of computations. This iterative process can terminate at some point. For example, the dispatchers may note that some predetermined termination condition is met, e.g., that the data on which the computations take place has been exhausted. The iteration may also terminate after a predetermined number of rounds, or after some predetermined amount of time, for example, by adding interrupt capabilities to the electronic circuit. Note that the process may also be terminated by an external agent, for example, an operating system. A computation performed by the electronic circuit according to the invention may proceed in a number of repeating, orderly rounds of computation. However, a more preferable implementation proceeds more irregularly, with the components of the electronic circuit operating asynchronously. For example, a dispatcher may forward a new command as soon as it has the inputs needed for that command. Similarly, when the distributor receives a command from a dispatcher, it may forward the command as soon as a processing device is available, and a processing device can forward a result as soon as it finishes processing. An advantage of this asynchronous mode of operation is that a processing device can be occupied with useful work as soon as it becomes available.
In this way the capabilities of the electronic circuit are more fully exploited. Upon receiving a specific command from first plurality of dispatchers 110, via connection 102, the distributor selects a specific processing device from those processing devices that indicate via an availability signal that they are available. Embodiments for the distributor 120 are expanded upon below. The specific command is forwarded by distributor 120 to the specific processing device, via connection 104. Note that the command itself does not necessarily need to indicate by which processing device the command is to be processed. Preferably, each command may be distributed to any available one of the second plurality of processing devices 130.
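The iterative exchange between dispatchers and processing devices, including a data-driven termination condition, can be illustrated by a pairwise summation. This Python sketch is a simplification under stated assumptions: it uses a single hypothetical dispatcher and the orderly round-by-round variant rather than the preferred asynchronous one, and it models each processing device simply as one addition per round.

```python
from collections import deque

def reduce_sum(values, num_devices=4):
    """Sum values by iteratively dispatching pairwise 'add' commands.

    A single hypothetical dispatcher holds the state of the computation (a pool
    of partial sums). In each round it dispatches up to num_devices additions,
    the devices process them, and the results are routed back into the pool.
    The iteration terminates when at most one value remains.
    Returns (sum, number_of_rounds).
    """
    if num_devices < 1:
        raise ValueError("need at least one processing device")
    pool = deque(values)
    rounds = 0
    while len(pool) > 1:
        commands = []
        while len(pool) >= 2 and len(commands) < num_devices:
            commands.append((pool.popleft(), pool.popleft()))
        results = [a + b for a, b in commands]  # one command per processing device
        pool.extend(results)                    # results return to the dispatcher
        rounds += 1
    return (pool[0] if pool else 0), rounds
```

Summing the values 1 to 8 takes three rounds with four devices but seven rounds with one, which shows how the number of rounds depends on the available processing capacity.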
Optionally, some processing devices may only be reachable from some of the first plurality of dispatchers, for example, if the distributor is implemented as a partial network. This has the advantage that the complexity of the distributor is reduced, as fewer connections need to be made. On the other hand, the potential disadvantage of not being able to reach all processing devices is significantly reduced as long as most dispatchers can reach more than one processing device. After a specific processing device has received a specific command, the processing device processes the command. For example, the command may comprise a first vector, a second vector and a command indication. The command indication may indicate, e.g., that the first and second vectors are to be added or subtracted, or that their dot product must be calculated, etc. The second plurality of processing devices 130 together produces a fourth plurality of results, which is sent to router 140 via connection 106. Typically, a result comprises an indication of the one of the first plurality of dispatchers 110 to which the result is to be sent. For example, a dispatcher may include in a command it dispatches a number indicating to which dispatcher or dispatchers the eventual result should be sent. A dispatcher may request the result to be sent to itself, but may also request the result to be sent to another dispatcher. The indication of the dispatcher to which a result must be sent can also be produced by the processing of the command in a processing device. By design or by accident, a first one and a second one of the second plurality of processing devices 130 may have substantially different processing speeds, even though they are in principle able to perform the same processing.
This may be the case even if the first processing device is of substantially the same configuration as the second processing device. As a result of the ever-decreasing size of processing devices, their processing speeds may increasingly diverge. The configuration of electronic circuit 100 is able to utilize a large part of the combined capabilities of the first and the second processing device. As the third plurality of commands is sent to the second plurality of processing devices 130, each processing device may be occupied with a new command as soon as it has finished a previous command.
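A processing device that interprets the command indication of a vector command, as described above, might behave like the following Python sketch. The `VectorCommand` layout and the string codes `"add"`, `"sub"` and `"dot"` are hypothetical encodings; a real command would use a bit-level format, and `"sub"` here computes the first vector minus the second.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VectorCommand:
    op: str          # command indication: "add", "sub" or "dot" (hypothetical encoding)
    a: tuple         # first vector
    b: tuple         # second vector
    reply_to: int    # dispatcher that should receive the result

def process(cmd):
    """Model of one processing device executing a single command.

    Returns (reply_to, result) so the router can forward the result.
    """
    if len(cmd.a) != len(cmd.b):
        raise ValueError("vectors must have equal length")
    if cmd.op == "add":
        out = tuple(x + y for x, y in zip(cmd.a, cmd.b))
    elif cmd.op == "sub":
        out = tuple(x - y for x, y in zip(cmd.a, cmd.b))
    elif cmd.op == "dot":
        out = sum(x * y for x, y in zip(cmd.a, cmd.b))
    else:
        raise ValueError(f"unknown command indication {cmd.op!r}")
    return cmd.reply_to, out
```

Here the destination is simply copied from the command; as noted above, it could instead be computed as part of processing the command.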
Optionally, the second plurality of processing devices 130 includes designs of multiple types. For example, the second plurality of processing devices 130 may comprise a first set of processing devices with a design of a first type and a second set of processing devices with a design of a second type. The second set may, for example, contain only a single processing device. A command is called a typed command if it comprises a type indication. A typed command indicates to the distributor that the command must be executed on a processing device of the particular type indicated by the type indicator. The advantage is that a frequently used processing device type, of which many devices may be available, can be used at its maximum combined processing speed. At the same time, there may be a few special processing devices that are needed less frequently and of which only a few, or even one, are available.
A type indicator may be a number, indexing in a list of possible types. A type indicator may be a string of bits, each bit indicating a particular capability a processing device should at least have. In this case the distributor could distribute, e.g., forward, a command to the first available processing device that at least meets all the indicated capabilities. If distributor 120 receives a typed command, the distributor 120 will forward the typed command to a processing device which is of the indicated type and which is available, e.g., the first such one.
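Selecting a device for a typed command whose type indicator is a capability bit string can be sketched as below: the command goes to the first available device whose capabilities include every requested bit. The capability names `FFT`, `MAC` and `DIV` are hypothetical examples, not taken from the source.

```python
# Hypothetical capability bits used as a type indicator
FFT = 1 << 0
MAC = 1 << 1
DIV = 1 << 2

def select_device(required, capabilities, available):
    """Return the index of the first available device whose capability mask
    contains every bit in `required`, or None if no such device is free."""
    for i, (caps, free) in enumerate(zip(capabilities, available)):
        if free and (caps & required) == required:
            return i
    return None
```

An untyped command corresponds to a `required` mask of 0, which any available device satisfies. Returning `None` corresponds to the case in which the distributor has to buffer the command.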
Distributor 120 may employ a buffer for the situation in which no processing device is available. The buffer can temporarily store commands until a processing device becomes available. Distributor 120 may also include a stalling module. When the buffer is full or almost full, the stalling module signals the dispatchers with a stalling signal that they should cease sending commands. In such an embodiment the dispatchers are arranged to receive the stalling signal and stop sending commands. When the buffer becomes sufficiently empty, the stalling module may send a resume signal to the dispatchers; upon receiving it, the dispatchers resume sending commands.
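The buffer with stalling module can be modelled with two thresholds, so that the stall signal is raised when the buffer is (almost) full and the resume signal is sent once it is sufficiently empty. The class name and the `high`/`low` threshold parameters are assumptions made for this Python sketch; the text does not specify the exact thresholds.

```python
from collections import deque

class StallingBuffer:
    """Command buffer with a stalling module (hypothetical thresholds).

    Raises `stall` when the buffer reaches `high` entries and clears it
    (the resume signal) once it has drained to `low` entries or fewer.
    """
    def __init__(self, high, low):
        if not 0 <= low < high:
            raise ValueError("need 0 <= low < high")
        self.high, self.low = high, low
        self.queue = deque()
        self.stall = False

    def push(self, cmd):
        """Accept a command from a dispatcher."""
        if self.stall:
            raise RuntimeError("dispatcher ignored the stall signal")
        self.queue.append(cmd)
        if len(self.queue) >= self.high:
            self.stall = True   # tell dispatchers to stop sending

    def pop(self):
        """Hand the oldest command to a processing device that became available."""
        cmd = self.queue.popleft()
        if self.stall and len(self.queue) <= self.low:
            self.stall = False  # resume signal
        return cmd
```

Using a lower resume threshold than the stall threshold (hysteresis) avoids toggling between stalling and resuming on every command.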
Fig. 2 illustrates a distributor 200 comprising a network 202 of distributor circuits 210. Such a network 202 of distributor circuits 210 is described more completely in co-pending European patent application "Circuit with network of message distributor circuits", filed on 13.07.2007 with application number 07112419.2, and in the corresponding PCT application filed on 07.07.2008 with application number IB2008/052728, herein incorporated by reference in their entirety. In particular, the descriptions of the Figures of that application (Fig. 1 showing a circuit; Fig. 1a showing a basic distributor circuit; Fig. 2 showing a distributor circuit; Fig. 3 showing an alternative distributor circuit 30; Fig. 3a showing a further distributor circuit; Fig. 4 showing part of a distributor circuit; and Fig. 5 showing a handshake buffer circuit) are helpful for implementing networks of distributor circuits and variants thereof.
Distributor 200 comprises a network 202 of distributor circuits 210. One embodiment of a network 202 of distributor circuits 210 is the network 202 formed by a fifth plurality of distributor circuits 210. Distributor 200 comprises a fifth plurality of distributor circuits 210. The fifth plurality of distributor circuits 210 is interconnected to form the network 202 of distributor circuits 210. The fifth plurality of distributor circuits 210 comprises a sixth plurality of distributor circuits 212 and a seventh plurality of distributor circuits 214. Four distributor circuits are shown individually: a distributor circuit 220, a distributor circuit 222, a distributor circuit 230 and a distributor circuit 232. The sixth plurality of distributor circuits 212 comprises distributor circuit 220 and distributor circuit 222. The seventh plurality of distributor circuits 214 comprises distributor circuit 230 and distributor circuit 232.
The fifth plurality of distributor circuits 210 is coupled to form the network 202. A distributor circuit typically has multiple source-side interfaces and multiple consumer-side interfaces. Each distributor circuit is configured to select over which of its consumer-side interfaces a command received at one of its source-side interfaces will be transmitted. This selection is based, at least partly, on the availability signals that the distributor circuit receives on its consumer-side interfaces. An availability signal received directly from a processing device indicates that the distributor circuit can forward a command to that processing device. An availability signal received from another distributor circuit indicates that the other distributor circuit is able to forward a command to an available processing device, either directly or indirectly via yet further distributor circuits. A distributor circuit with a single source-side interface and multiple consumer-side interfaces may help to accommodate a particular number of dispatchers, e.g., an odd number of dispatchers.
The network 202 is coupled between the first plurality of dispatchers 110 and the second plurality of processing devices 130. Four dispatchers are shown: dispatcher 112, dispatcher 114, dispatcher 116 and dispatcher 118. Four processing devices are also shown: processing device 132, processing device 134, processing device 136 and processing device 138.
Dispatcher 112 is connected to a first source-side interface of distributor circuit 220.
Dispatcher 114 is connected to a second source-side interface of distributor circuit 220.
Dispatcher 116 is connected to a first source-side interface of distributor circuit 222.
Dispatcher 118 is connected to a second source-side interface of distributor circuit 222.
A first consumer-side interface of distributor circuit 220 is connected to a first source-side interface of distributor circuit 230.
A second consumer-side interface of distributor circuit 220 is connected to a first source-side interface of distributor circuit 232.
A first consumer-side interface of distributor circuit 222 is connected to a second source-side interface of distributor circuit 230.
A second consumer-side interface of distributor circuit 222 is connected to a second source-side interface of distributor circuit 232.
A first consumer-side interface of distributor circuit 230 is connected to processing device 132.
A second consumer-side interface of distributor circuit 230 is connected to processing device 134.
A first consumer-side interface of distributor circuit 232 is connected to processing device 136.
A second consumer-side interface of distributor circuit 232 is connected to processing device 138.
Each of the processing devices 132, 134, 136 and 138 is configured to signal its availability upstream to the distributor circuit it is connected to, i.e., distributor circuit 230 or 232. Distributor circuit 230 combines the availability signals it receives and signals upstream to distributor circuits 220 and 222 when at least one of processing devices 132 and 134 is available. Likewise, distributor circuit 232 combines the availability signals it receives and signals upstream to distributor circuits 220 and 222 when at least one of processing devices 136 and 138 is available.
Suppose, as an example, that only processing device 136 is available and that dispatcher 114 sends a command. Distributor circuit 232 receives the availability signal of processing device 136. Distributor circuit 232 is therefore able to forward a command, namely to processing device 136, and signals its own availability to distributor circuits 220 and 222. Distributor circuit 220 receives the command sent by dispatcher 114. Because distributor circuit 220 has received an availability signal from distributor circuit 232, it forwards the command there. Finally, distributor circuit 232 forwards the command to the processing device that has signaled availability, i.e., to processing device 136.
Note that any number of further distributor circuits may be placed between the sixth plurality of distributor circuits 212 and the seventh plurality of distributor circuits 214; in particular, there may be an eighth plurality of distributor circuits (not shown). Typically, the consumer-side interfaces of the sixth plurality 212 would then be connected to source-side interfaces of the eighth plurality, and the consumer-side interfaces of the eighth plurality would be connected to source-side interfaces of the seventh plurality 214.
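The example with only processing device 136 available can be traced with a small software model of the network of Fig. 2. This is a sketch under stated assumptions. Availability is evaluated on demand by querying the consumer side, rather than being pushed upstream as hardware signals. Each distributor circuit picks the first consumer-side interface that reports availability; the description only requires the choice to be based at least partly on availability. The names `ProcessingDevice`, `DistributorCircuit` and `build_fig2` are hypothetical. The wiring follows the connections listed for Fig. 2.

```python
class ProcessingDevice:
    """Leaf of the network: signals availability while idle."""

    def __init__(self, ref, idle):
        self.ref, self.idle = ref, idle

    def available(self):
        return self.idle

    def accept(self, command):
        self.idle = False  # busy until it has processed the command
        return self.ref


class DistributorCircuit:
    """Hypothetical distributor circuit; only the consumer side is modelled."""

    def __init__(self, ref, consumers):
        self.ref, self.consumers = ref, consumers

    def available(self):
        # Combine the availability signals received on the consumer side:
        # available if at least one downstream path can take a command.
        return any(c.available() for c in self.consumers)

    def accept(self, command):
        # Forward over the first consumer-side interface signalling
        # availability (a simple fixed-priority choice, assumed here).
        for c in self.consumers:
            if c.available():
                return c.accept(command)
        raise RuntimeError(f"distributor circuit {self.ref}: nothing available")


def build_fig2(idle_refs):
    """Wire up the Fig. 2 network; idle_refs lists the idle devices."""
    dev = {r: ProcessingDevice(r, r in idle_refs) for r in (132, 134, 136, 138)}
    d230 = DistributorCircuit(230, [dev[132], dev[134]])
    d232 = DistributorCircuit(232, [dev[136], dev[138]])
    d220 = DistributorCircuit(220, [d230, d232])
    d222 = DistributorCircuit(222, [d230, d232])
    # Dispatchers 112/114 feed circuit 220; dispatchers 116/118 feed 222.
    return {112: d220, 114: d220, 116: d222, 118: d222}


if __name__ == "__main__":
    entry = build_fig2(idle_refs={136})
    print(entry[114].accept("cmd"))  # 136
```

With only processing device 136 idle, a command from dispatcher 114 enters distributor circuit 220, which sees no availability on its first consumer-side interface (distributor circuit 230) and forwards over its second one to distributor circuit 232, which hands it to processing device 136.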
Use of a network 202 of distributor circuits, as in distributor 200, has the advantage that the commands produced by the first plurality of dispatchers 110 may be forwarded to the second plurality of processing devices 130 in parallel and with little overhead. It will be appreciated that networks 202 of distributor circuits can easily be constructed in various sizes. For example, by adding more distributor circuits, more dispatchers and/or more processing devices can be accommodated. It is not necessary that all processing devices can be reached from all dispatchers, although preferably every dispatcher can reach multiple processing devices.
Fig. 3 illustrates in a block diagram a further embodiment of the distributor: distributor 300. Distributor 300 comprises a buffer 302, a multiplexer 304, a connection 306 and a connection 308. The first plurality of dispatchers 110 sends the third plurality of commands via connection 102 to buffer 302, where the commands are serialized. Buffer 302 forwards the third plurality of commands one by one to multiplexer 304 via connection 306. Multiplexer 304 maintains multiple states, one per processing device, each indicating the availability of the corresponding processing device of the second plurality 130. Upon receiving an availability signal, or detecting its absence, multiplexer 304 updates these states so that they reflect the current availability of the second plurality of processing devices 130. Four such processing devices are shown in the Figure: processing device 132, processing device 134, processing device 136 and processing device 138, each connected to multiplexer 304.
Upon receiving a command from buffer 302, multiplexer 304 sends the command to the first processing device that is available. For example, if processing device 138 is the first available device, multiplexer 304 sends the command to it via connection 308. Instead of using the first available device, it may be advantageous to use a randomly chosen available device.
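Distributor 300 can likewise be modelled as a FIFO (buffer 302) in front of a table of per-device availability states (multiplexer 304). This is a minimal sketch assuming that the insertion order of a Python `dict` defines which device counts as "first". The class `Multiplexer`, its `policy` parameter and the seeded random generator used for the random-choice variant are hypothetical illustrations, not part of the described circuit.

```python
import random
from collections import deque


class Multiplexer:
    """Hypothetical model of distributor 300: buffer 302 plus multiplexer 304."""

    def __init__(self, device_refs, policy="first", seed=None):
        # One availability state per processing device, in a fixed order.
        self.state = {ref: False for ref in device_refs}
        self.buffer = deque()  # buffer 302 serializes the incoming commands
        self.policy = policy   # "first" or "random"
        self.rng = random.Random(seed)

    def set_available(self, ref, available):
        # Update on receiving an availability signal, or detecting its absence.
        self.state[ref] = available

    def submit(self, command):
        self.buffer.append(command)

    def step(self):
        """Forward the oldest buffered command to an available device.

        Returns (command, device_ref), or None if nothing can be forwarded.
        """
        free = [ref for ref, ok in self.state.items() if ok]
        if not self.buffer or not free:
            return None
        target = free[0] if self.policy == "first" else self.rng.choice(free)
        self.state[target] = False  # the device is now busy
        return self.buffer.popleft(), target


if __name__ == "__main__":
    mux = Multiplexer([132, 134, 136, 138])
    mux.submit("a")
    mux.set_available(138, True)
    print(mux.step())  # ('a', 138)
```

Choosing randomly among the available devices, rather than always taking the first, may spread work more evenly; with the fixed-priority choice the lowest-ordered devices are preferred whenever they are free.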
Fig. 4 illustrates dispatcher 400, which may be used for any one of the first plurality of dispatchers 110, for example, for dispatcher 112. Dispatcher 400 comprises a read only memory (ROM) 402, a random access memory (RAM) 404 and a command producer 406. The amount of ROM 402 and/or RAM 404 that is needed, if any, depends on the algorithm that is mapped onto the electronic circuit 100.
The algorithm is loaded into the dispatchers in the first plurality of dispatchers 110. The dispatchers may be implemented as dispatcher 400. Dispatcher 400 executes a simple program to create the command from the values contained in the memories: ROM 402 and RAM 404. Command producer 406 is typically a processor, albeit a much lighter-weight one than the second plurality of processing devices 130. Command producer 406 may also be implemented as a finite-state machine. An executable program may be comprised in ROM 402 and/or RAM 404.
The dispatchers together apply the algorithm to input data. The dispatchers control the operations that have to be performed and the data values to be subjected to the operations. The input values can be read directly from an external memory or other storage location (neither is shown), but are typically placed inside RAM 404. The results produced for the commands will typically be received out-of-order due to variability in the performance of the second plurality of processing devices 130. The dispatcher is configured to take account of data dependencies. For example, if the result of a previous command is needed to create a new command, the receiving dispatcher will perform the necessary bookkeeping to verify that those needed results have been received from one of the second plurality of processing devices 130. For example, use may be made of locking, synchronizing and barriers. If a needed result is not yet received, the dispatchers dependent on the needed result may, temporarily, stall the production of commands.
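The dependency bookkeeping described above can be sketched as follows. The sketch assumes a dispatcher program given as a list of `(tag, op, deps)` entries, where a command may only be produced once the results of all commands named in `deps` have been received. The class `Dispatcher` and its methods are hypothetical, and a simple readiness check stands in for the locking, synchronization or barriers mentioned in the description.

```python
class Dispatcher:
    """Hypothetical dispatcher that respects data dependencies between commands.

    `program` is a list of (tag, op, deps) entries: the command `tag`
    applies the callable `op` to the results of the commands in `deps`.
    """

    def __init__(self, program):
        self.pending = list(program)  # commands not yet produced
        self.results = {}             # tag -> result received so far

    def ready_commands(self):
        """Produce every pending command whose needed results have arrived."""
        ready, waiting = [], []
        for entry in self.pending:
            deps = entry[2]
            (ready if all(d in self.results for d in deps) else waiting).append(entry)
        # Commands with missing inputs stay pending: their production stalls.
        self.pending = waiting
        return [(tag, op, [self.results[d] for d in deps]) for tag, op, deps in ready]

    def receive(self, tag, value):
        """Record a result, in whatever order the processing devices deliver it."""
        self.results[tag] = value


if __name__ == "__main__":
    d = Dispatcher([("a", lambda: 2, []),
                    ("b", lambda: 3, []),
                    ("c", lambda x, y: x * y, ["a", "b"])])
    print([tag for tag, _, _ in d.ready_commands()])  # ['a', 'b']
    d.receive("b", 3)                                 # out of order
    print(d.ready_commands())                         # []
    d.receive("a", 2)
    print([tag for tag, _, _ in d.ready_commands()])  # ['c']
```

Results may arrive in any order; a command whose inputs are still missing simply stays pending, which corresponds to the dispatcher temporarily stalling the production of that command.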
Fig. 5 illustrates in a flow chart a method 500 according to the invention. Method 500 comprises: sending (502) a third plurality of commands from a first plurality of dispatchers to a distributor; distributing (504) the third plurality of commands from the distributor to the second plurality of processing devices 130 based on their indicated availability; processing (506) the third plurality of commands using the second plurality of processing devices 130 to produce a fourth plurality of results; receiving (508) the fourth plurality of results from the second plurality of processing devices 130 in a router; and sending (510) the fourth plurality of results to the first plurality of dispatchers.
The order of the steps of method 500 can be varied and some steps may be executed in parallel, as will be apparent to a person skilled in the art. Other operations may also be interposed between the steps of the method. In particular, there is no need for synchronized action; it is most advantageous to execute the method in a distributed fashion. For example, sending the third plurality of commands will typically overlap with, and occur in parallel with, the processing of those commands.
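A software analogue of method 500 that needs no global synchronization can be built from threads and queues. This is a minimal sketch, assuming Python threads stand in for the hardware blocks. A processing device implicitly signals its availability by blocking on the shared distributor queue, squaring a number stands in for the real processing, and each command carries the index of its dispatcher so that the router can return the result to that dispatcher. The function `run_method_500` and its parameters are hypothetical.

```python
import queue
import threading


def run_method_500(num_dispatchers=3, num_devices=2, commands_each=4):
    """Hypothetical thread-based model of method 500 (steps 502 to 510)."""
    distributor = queue.Queue()  # commands waiting for an available device
    router = queue.Queue()       # results on their way back to the dispatchers
    inboxes = [queue.Queue() for _ in range(num_dispatchers)]
    collected = [None] * num_dispatchers

    def dispatcher(d):
        # Step 502: send commands, each tagged with this dispatcher's index.
        for k in range(commands_each):
            distributor.put((d, k, d * 10 + k))
        # Results arrive in arbitrary order; key them by command tag.
        collected[d] = dict(inboxes[d].get() for _ in range(commands_each))

    def processing_device():
        while True:
            # Step 504: blocking on get() acts as the availability indication.
            item = distributor.get()
            if item is None:  # shutdown marker
                return
            origin, tag, x = item
            router.put((origin, tag, x * x))  # step 506: process the command

    def route():
        while True:
            item = router.get()  # step 508: receive a result
            if item is None:
                return
            origin, tag, value = item
            inboxes[origin].put((tag, value))  # step 510: back to its dispatcher

    devices = [threading.Thread(target=processing_device) for _ in range(num_devices)]
    router_thread = threading.Thread(target=route)
    dispatchers = [threading.Thread(target=dispatcher, args=(d,))
                   for d in range(num_dispatchers)]
    for t in devices + [router_thread] + dispatchers:
        t.start()
    for t in dispatchers:
        t.join()  # every dispatcher has received all of its results
    for _ in devices:
        distributor.put(None)
    for t in devices:
        t.join()
    router.put(None)
    router_thread.join()
    return collected


if __name__ == "__main__":
    print(run_method_500())
```

Sending, processing and routing overlap freely here: the only ordering constraint is that each dispatcher waits for its own results, mirroring the observation that the steps of method 500 need not be synchronized.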
The present invention, as described in embodiments herein, may be implemented using a programmed processor executing programming instructions that are broadly described above in flow chart form that can be stored on any suitable electronic storage medium. However, those skilled in the art will appreciate that the processes described above can be implemented in any number of variations and in many suitable programming languages without departing from the present invention. For example, the order of certain operations carried out can often be varied, additional operations can be added or operations can be deleted without departing from the invention. Error trapping, enhancements and variations can be added without departing from the present invention. Such variations are contemplated and considered equivalent.
The present invention could be implemented using special-purpose hardware and/or dedicated processors. Similarly, general-purpose computers, microprocessor-based computers, digital signal processors, microcontrollers, dedicated processors, custom circuits, ASICs and/or dedicated hard-wired logic may be used to construct alternative equivalent embodiments of the present invention. In a claim enumerating several means, several of these means may be embodied by one and the same item of hardware. Those skilled in the art will appreciate that the program steps and associated data used to implement the embodiments described above can be implemented using disc storage as well as other forms of storage, such as, for example, Read Only Memory (ROM) devices, Random Access Memory (RAM) devices, optical storage elements, magnetic storage elements, magneto-optical storage elements, flash memory and/or other equivalent storage technologies without departing from the present invention. Such alternative storage devices should be considered equivalents.
While the invention has been described in conjunction with specific embodiments, it is evident that many alternatives, modifications, permutations and variations will become apparent to those of ordinary skill in the art in light of the foregoing description. Accordingly, it is intended that the present invention embrace all such alternatives, modifications and variations as fall within the scope of the appended claims.

Claims

1. An electronic circuit (100) comprising: a first plurality of dispatchers (110); a distributor (120); and a second plurality of processing devices (130) capable of indicating their individual availabilities through indications; wherein: the first plurality of dispatchers (110) is configured to dispatch a third plurality of commands to the second plurality of processing devices (130), the dispatching includes sending the third plurality of commands to the distributor (120); the distributor is configured to distribute the third plurality of commands to the second plurality of processing devices (130) under control of at least the indications; the second plurality of processing devices (130) is configured to process the third plurality of commands to produce a fourth plurality of results; and a specific one of the first plurality of dispatchers (110) is configured to dispatch a specific one of the third plurality of commands, the distributor (120) is configured to receive the specific command and to forward the specific command to a specific one of the second plurality of processing devices (130) under control of at least a specific indication indicated by the specific processing device, the specific processing device is configured to produce a specific one of the fourth plurality of results.
2. An electronic circuit (100) as in claim 1 comprising a router (140) configured to receive from the second plurality of processing devices (130) the fourth plurality of results, the router is further configured to send the fourth plurality of results to the first plurality of dispatchers; wherein the router is configured to receive the specific result from the specific processing device and to send the specific result to a particular one of the first plurality of dispatchers in dependency on the result.
3. An electronic circuit as in any one of the previous claims wherein a first one of the second plurality of processing devices (130) and a second one of the second plurality of processing devices (130) are of substantially a same configuration and have substantially different processing speed.
4. An electronic circuit as in any one of the previous claims wherein a first one of the first plurality of dispatchers and a second one of the first plurality of dispatchers are configured to operate in parallel.
5. An electronic circuit as in any one of the preceding claims, wherein the distributor (120) comprises a network (202) for selecting by which of the second plurality of processing devices (130) the commands will be processed, the network comprising a fifth plurality of distributor circuits (210), each distributor circuit (220; 222; 230; 232) having multiple source-side interfaces and multiple consumer-side interfaces, each distributor circuit (220; 222; 230; 232) being configured to select over which of the consumer-side interfaces commands from the source-side interfaces will be transmitted towards the processing devices (130), based at least partly on signals from the consumer-side interfaces that indicate a current availability to forward the commands via the consumer-side interfaces, wherein said first plurality of dispatchers (110) are coupled to source-side interfaces of a sixth plurality of the distributor circuits (212) in the network and said processing devices (130) are coupled to consumer-side interfaces of a seventh plurality of distributor circuits (214) in the network, the consumer-side interfaces of the distributor circuits in the sixth plurality (212) being coupled directly or indirectly to the source-side interfaces of the distributor circuits in the seventh plurality (214), with a connectivity so that at least two of the first plurality of dispatchers (110) that are coupled to different ones of the distributor circuits in the sixth plurality (212) are both coupled to all of said second plurality of processing devices (130) via the distributor circuits of the seventh plurality (214).
6. An electronic circuit as in any one of the preceding claims, wherein: a set of at least two of the processing devices are of a first type; a typed command of the third plurality of commands includes a type indication for the first type; and the distributor is arranged to distribute the typed command to a processing device from the set that indicates availability.
7. An electronic circuit as in any one of the preceding claims, wherein said electronic circuit is arranged to operate asynchronously.
8. A further electronic circuit (100) comprising a first electronic circuit as in claim 1 and a second electronic circuit as in claim 1, wherein the first plurality of dispatchers (110) comprised in the second electronic circuit are configured to receive the fourth plurality of results from the second plurality of processing devices (130) comprised in the first electronic circuit.
9. A method for exploiting a second plurality of processing devices, each one of the second plurality of processing devices capable of indicating its availability, comprising: sending a third plurality of commands from a first plurality of dispatchers to a distributor; distributing the third plurality of commands from the distributor to the second plurality of processing devices based on indicated availability; processing the third plurality of commands using the second plurality of processing devices to produce a fourth plurality of results; receiving from the second plurality of processing devices the fourth plurality of results in a router; and sending the fourth plurality of results to the first plurality of dispatchers.
PCT/IB2009/054069 2008-09-17 2009-09-17 Electronic circuit comprising a plurality of processing devices WO2010032205A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP08164474.2 2008-09-17
EP08164474 2008-09-17

Publications (1)

Publication Number Publication Date
WO2010032205A1 true WO2010032205A1 (en) 2010-03-25

Family

ID=41349673

Country Status (1)

Country Link
WO (1) WO2010032205A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107862042A (en) * 2017-11-06 2018-03-30 中国银行股份有限公司 A kind of control method and device of data base concurrency degree

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1992003784A1 (en) * 1990-08-23 1992-03-05 Supercomputer Systems Limited Partnership Scheduling method for a multiprocessing operating system
EP1039383A2 (en) * 1999-03-25 2000-09-27 International Business Machines Corporation System and method for scheduling system resources
EP1788491A2 (en) * 2005-11-16 2007-05-23 Alcatel Lucent Thread aware distributed software system for a multi-processor array

Similar Documents

Publication Publication Date Title
US11907726B2 (en) Systems and methods for virtually partitioning a machine perception and dense algorithm integrated circuit
CN110619595B (en) Graph calculation optimization method based on interconnection of multiple FPGA accelerators
CN107301455B (en) Hybrid cube storage system for convolutional neural network and accelerated computing method
US8954986B2 (en) Systems and methods for data-parallel processing
Hoefler et al. Towards efficient mapreduce using mpi
US7487302B2 (en) Service layer architecture for memory access system and method
KR102466984B1 (en) Improved function callback mechanism between a central processing unit (cpu) and an auxiliary processor
US10402223B1 (en) Scheduling hardware resources for offloading functions in a heterogeneous computing system
KR20210057184A (en) Accelerate data flow signal processing applications in heterogeneous CPU/GPU systems
CN103608776A (en) Dynamic work partitioning on heterogeneous processing device
US20090006296A1 (en) Dma engine for repeating communication patterns
US20220027716A1 (en) Neural network accelerator
WO2019147708A1 (en) A deep learning accelerator system and methods thereof
JP2021518591A (en) Systems and methods for implementing machine perception and high density algorithm integrated circuits
US8959319B2 (en) Executing first instructions for smaller set of SIMD threads diverging upon conditional branch instruction
CN111653317B (en) Gene comparison acceleration device, method and system
CN111475205B (en) Coarse-grained reconfigurable array structure design method based on data flow decoupling
JP4404228B2 (en) Task scheduling system, method, and program
US20130117533A1 (en) Coprocessor having task sequence control
CN103197917A (en) Compute thread array granularity execution preemption
WO2010032205A1 (en) Electronic circuit comprising a plurality of processing devices
Huang et al. Performance optimization of High-Performance LINPACK based on GPU-centric model on heterogeneous systems
US6768336B2 (en) Circuit architecture for reduced-synchrony on-chip interconnect
JP7357767B2 (en) Communication in computers with multiple processors
US11940940B2 (en) External exchange connectivity

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09787225

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2009787225

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 09787225

Country of ref document: EP

Kind code of ref document: A1