WO2009039566A1 - Reconfigurable digital accelerator - Google Patents

Reconfigurable digital accelerator

Info

Publication number
WO2009039566A1
Authority
WO
WIPO (PCT)
Prior art keywords
fpga
random number
functional nodes
chip
time slice
Prior art date
Application number
PCT/AU2008/001415
Other languages
English (en)
Inventor
Michael Reznik
Original Assignee
Michael Reznik
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from AU2007905235A external-priority patent/AU2007905235A0/en
Application filed by Michael Reznik filed Critical Michael Reznik
Publication of WO2009039566A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/30Circuit design
    • G06F30/34Circuit design for reconfigurable circuits, e.g. field programmable gate arrays [FPGA] or programmable logic devices [PLD]

Definitions

  • This invention concerns field programmable gate array (FPGA) chips, and in particular a dynamically reconfigurable computing system for executing a function, a method of executing a function, a method of developing a FPGA-based function, and a random number generator.
  • FPGA chips are programmable integrated circuits containing an array of configurable logic blocks connected via programmable switches or interconnects. Unlike devices that are custom-built for a particular design, FPGA chips are configurable to perform specific functionalities. The function of a FPGA chip is defined by a user's program, which typically alters the function of and selectively interconnects the blocks.
  • FPGA chips are used as replacements for DSP processors to implement complex signal processing applications.
  • the capacity of FPGA chips has exceeded 10 million gates per die, making it possible to implement highly complex algorithms on one chip.
  • the rapid evolution of development tools for FPGA chips also makes it possible to design, debug and simulate algorithms before transferring the debugged software to a chip.
  • FPGA-based applications may be developed using either hardware description languages (HDL) or high-level programming languages (HLL).
  • the invention is a dynamically reconfigurable computing system for executing a function, the system comprising: one or more interconnected FPGA chips, each chip comprising a plurality of functional nodes that are each associated with an array position and interconnected to other nodes by programmable data buses; a library associated with the one or more FPGA chips to store loadable objects, each object being precompiled for an array position on a FPGA chip and loadable into a functional node at that array position on any of the one or more FPGA chips; and a dynamic resource allocator operable to: receive a request to execute the function, select functional nodes in the one or more FPGA chips that are available to execute the function, retrieve, from the library, loadable objects that are associated with the function and the array position of the selected functional nodes, load the retrieved objects into the selected functional nodes, program the data buses to route data to and from the selected functional nodes, activate the loaded objects, and control operation of the loaded objects.
  • the invention allows an application to be built without the need for run-time compilation, simulation and verification.
  • the invention therefore provides a FPGA-based computing system that is more accessible from an application layer, allowing developers who are less familiar with FPGA hardware architecture to develop an entire application using HLL without considering the hardware that it will run on, provided that necessary objects exist in the library.
  • the invention may be used to execute high performance applications which require high volume data processing and parallel calculations. Examples of such applications are Monte Carlo and Stochastic Monte Carlo simulations.
  • the invention may also be used for risk and financial simulations, statistical physics analysis and modelling in the fields of biochemistry and molecular biology.
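The suitability for Monte Carlo workloads can be illustrated with a minimal Python sketch (illustrative only, not part of the patent): the same estimation task is partitioned over several hypothetical functional nodes, each consuming its own seeded sub-sequence, and the partial results are averaged.

```python
import random

def monte_carlo_pi(samples, seed):
    """Estimate pi from `samples` uniform points; one such task per node."""
    rng = random.Random(seed)
    inside = sum(1 for _ in range(samples)
                 if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return 4.0 * inside / samples

# Distribute 40,000 samples over 4 hypothetical functional nodes,
# each seeded with a distinct sub-sequence, then average the partials.
partials = [monte_carlo_pi(10_000, seed) for seed in range(4)]
estimate = sum(partials) / len(partials)
```

Each call stands in for one functional node; on the claimed system the four tasks would run in parallel on separate nodes or chips.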
  • Each FPGA chip may further comprise a time slice manager in communication with the functional nodes on the chip and the dynamic resource allocator, and the dynamic resource allocator is operable to activate and control operation of a loaded object on a FPGA chip by scheduling the time slice manager of that chip to send a command to the loaded object to perform the operation during a time slice.
  • the time slice manager allows dynamic activation, deactivation and connection of loaded objects without generating bit streams or interfering with neighbouring functional nodes.
  • Scheduling the time slice manager to send a command to a loaded object may involve updating an entry in a table associated with the time slice manager, the entry specifying one or more of: the command to be sent, the array position of the functional node of the loaded object, the period of the time slice to perform the command, and the I/O and memory access requirements to perform the command.
  • Period of the time slice may depend on I/O and memory access requirements of the loaded object. In this case, period of the time slice may depend on the array position of the functional node associated with the object if all other functional nodes on the chip have substantially similar I/O and memory access requirements.
  • the command to a loaded object may be one of: activating the object; deactivating the object; connecting the object to a memory or I/O interface; reading data from a connected memory or I/O interface; writing data to a connected memory or I/O interface; and disconnecting the object from a memory or I/O interface.
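The table entry and command set described above can be sketched as follows. This is an illustrative Python model: the `Command` names and `TableEntry` fields are assumptions derived from the claim wording, not the patent's actual encoding.

```python
from dataclasses import dataclass
from enum import Enum, auto

class Command(Enum):
    """Command set taken from the claim wording above."""
    ACTIVATE = auto()
    DEACTIVATE = auto()
    CONNECT_MEMORY = auto()
    CONNECT_IO = auto()
    READ = auto()
    WRITE = auto()
    DISCONNECT = auto()

@dataclass
class TableEntry:
    """One row of the time slice processing table (field names assumed)."""
    command: Command
    node: str              # array position of the loaded object's node
    slice_no: int          # time slice period in which to perform the command
    resources: tuple = ()  # I/O and memory access requirements

def schedule(table, entry):
    """Scheduling = updating the entry keyed by (node, time slice)."""
    table[(entry.node, entry.slice_no)] = entry
    return table

table = schedule({}, TableEntry(Command.ACTIVATE, "00", 1))
schedule(table, TableEntry(Command.CONNECT_MEMORY, "11", 7, ("mem1",)))
```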
  • Scheduling the time slice manager of a FPGA chip may be performed using either: an external processor in communication with the FPGA chip, the dynamic resource allocator and the time slice manager, or a dynamic reconfiguration port associated with the FPGA chip.
  • the system may further comprise a random number generator in communication with the FPGA chips, the random number generator is operable to: receive a request to deliver one or more random number sequences to one or more functional nodes of the one or more FPGA chips; generate the one or more random number sequences, each random number in a sequence being generated according to a probability distribution and one or more parameters associated with the distribution; and automatically deliver the one or more generated random number sequences to the one or more functional nodes.
  • the random number generator may be further operable to dynamically update the probability distribution or the one or more parameters associated with a random number sequence based on a request from the one or more functional nodes or the dynamic resource allocator.
  • the one or more random number sequences are delivered to the one or more functional nodes according to a delivery time specified in the request.
  • the dynamic resource allocator may be operable to load the retrieved objects into the selected functional nodes without interrupting operation of functional nodes at other array positions. This step may be performed using either an external processor in communication with the FPGA chip and the dynamic resource allocator, or a dynamic reconfiguration port associated with the FPGA chips.
  • a functional node may represent a physical location on a FPGA chip and comprises one or more configurable logic blocks, multiply units or FPGA memory blocks.
  • the invention is a method of executing a function using one or more interconnected FPGA chips, each chip comprising a plurality of functional nodes that are each associated with an array position and interconnected by programmable data buses, the method comprising the steps of: receiving a request to execute a function, selecting functional nodes in the one or more FPGA chips that are available to execute the function, retrieving, from a library, loadable objects that are associated with the function and the array position of the selected functional nodes, loading the retrieved objects into the selected functional nodes, and programming the data buses to route data to and from the selected functional nodes, activating and controlling operation of the loaded objects.
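The claimed method steps can be sketched as a hypothetical allocator loop. All names here are illustrative assumptions: `execute_function`, and the `"A"`/`"B"` availability flags mirroring the Resource Allocation Table described later in the specification.

```python
def execute_function(request, library, resource_table):
    """Hypothetical allocator mirroring the claimed method steps."""
    # step 1: find library objects implementing the requested function
    needed = [obj for obj in library if obj["function"] == request]
    if not needed:
        raise LookupError("required loadable objects not in library")
    # step 2: select available functional nodes, i.e. (chip, position) pairs
    free = [node for node, state in resource_table.items() if state == "A"]
    if len(free) < len(needed):
        raise RuntimeError("insufficient available functional nodes")
    selected = free[:len(needed)]
    # steps 3-5: retrieve the variant precompiled for each selected
    # position, "load" it, and mark the node Busy (bus programming and
    # activation are not modelled here)
    loaded = {}
    for obj, (chip, pos) in zip(needed, selected):
        loaded[(chip, pos)] = obj["variants"][pos]
        resource_table[(chip, pos)] = "B"
    return loaded

library = [
    {"function": "f", "variants": {"00": "a00", "11": "a11"}},
    {"function": "f", "variants": {"00": "b00", "11": "b11"}},
]
resource_table = {(1, "00"): "A", (1, "11"): "A"}
loaded = execute_function("f", library, resource_table)
```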
  • the step of activating and controlling operation of a loaded object on a chip may involve scheduling a time slice manager associated with that chip to send a command to the loaded object to perform the operation during a time slice.
  • Scheduling the time slice manager to send a command to a loaded object may involve updating an entry in a table associated with the time slice manager, the entry specifying one or more of: the command to be sent, the array position of the functional node of the loaded object, the period of the time slice to perform the command, and the I/O and memory access requirements to perform the command.
  • Period of the time slice may depend on I/O and memory access requirements of the loaded object. In this case, period of the time slice may also depend on the array position of the functional node associated with the object if all other functional nodes on the chip have substantially similar I/O and memory access requirements.
  • the command to a loaded object may be one of: activating the object; deactivating the object; connecting the object to a memory or interface; reading data from a connected memory or interface; writing data to a connected memory or interface; and disconnecting the object from a memory or interface.
  • the step of scheduling the time slice manager of a FPGA chip may be performed using either: an external processor in communication with the chip and the time slice manager, or a dynamic reconfiguration port associated with the chip.
  • the method may further comprise the step of sending a request to a random number generator to deliver one or more random number sequences to one or more functional nodes of the one or more FPGA chips, each number in a sequence being generated according to a probability distribution and one or more parameters associated with the distribution.
  • the method may further comprise sending a request to the random number generator to dynamically update the probability distribution or the one or more parameters, or both. Further, the step of loading the retrieved objects into the selected functional nodes is performed without interrupting operation of functional nodes at other array positions.
  • the loading step may be performed using either: an external processor in communication with the FPGA chip, or a dynamic reconfiguration port associated with the FPGA chip.
  • a functional node may represent a physical location on a FPGA chip and comprises one or more configurable logic blocks, multiply units or FPGA memory blocks.
  • the step of activating and controlling operation of a loaded object on a chip may involve sending a bit stream to the chip to connect the loaded object's functional node to a data bus, a memory or an interface.
  • the bit stream is either dynamically generated or retrieved from a database.
  • the request to execute a function may be received from a user application via an Application Program Interface (API).
  • a loadable object in the library may be either a core object containing minimum executable components, or a package object comprising plural core objects.
  • the library may further comprise a description table, an entry of which specifies the function of each loadable object and its I/O interface requirements.
  • the invention is a method of developing a FPGA-based function, the method comprising the steps of: developing the function using a library of objects, wherein each object is precompiled for an array position on a FPGA chip and loadable into a functional node at that array position on one or more FPGA chips, and wherein the function is developed without requiring compilation, simulation and verification prior to execution; and executing the developed function using the one or more FPGA chips according to the method of executing a function using one or more interconnected FPGA chips.
  • the invention is a random number generator in communication with one or more interconnected FPGA chips, each chip comprising a plurality of functional nodes interconnected by programmable data buses, the random number generator being operable to: receive a request to deliver one or more random number sequences to one or more functional nodes of the one or more FPGA chips; generate the one or more random number sequences, each random number in a sequence being generated according to a probability distribution and one or more parameters associated with the distribution; and automatically deliver the one or more generated random number sequences to the one or more functional nodes.
  • the random number generator may be further operable to dynamically update the probability distribution or the one or more parameters associated with a random number sequence based on a request from the one or more functional nodes or an external processor in communication with the functional nodes.
  • the one or more random number sequences may be delivered to the one or more functional nodes according to a delivery time specified in the request.
  • the random number generator according to the invention has the ability to deliver random numbers to one or more functional nodes of one or more FPGA chips.
  • processing time can be reduced because calculation of a function based on the same random number sequence can be distributed over multiple functional nodes on a chip, or over multiple FPGA chips.
  • Fig. 1 is a diagram of a computing system exemplifying the invention.
  • Fig. 2 is a diagram of a FPGA chip according to the system in Fig. 1.
  • Fig. 3 is a diagram of Input/Output (I/O) interfaces of a FPGA chip.
  • Fig. 4 is a diagram of an exemplary resource allocation table.
  • Fig. 5 is a diagram of an exemplary object library.
  • Fig. 6 is a flowchart of a method of executing a function.
  • Figs. 7(a) and (b) show two alternative methods for step 445 in Fig. 6.
  • Fig. 8(a) to (c) show exemplary outputs of the methods in Fig. 6 and Fig. 7.
  • Figs. 9(a) and (b) are exemplary time slice processing tables.
  • Fig. 10 is an operation timeline of a programmable internal data bus according to the time slice processing tables in Fig. 9.
  • Fig. 11 is a flowchart of a method of providing a random number sequence by the random number generator.
  • the computing system 100 comprises plural FPGA chips 200 in communication with an application program interface (API) 110; a Dynamic Resource Allocation Manager (DRAM) 120; an external processor 130; and an object library 160 via communications interface 150 and switch 155.
  • the system 100 may be used as a Numeric Accelerator for high volume data processing.
  • the API 110 provides an interface for user applications 105 developed in high level language to access the system 100.
  • the FPGA chips 200 may be connected to the rest of the system via a PCIe interface or a HyperTransport (HT) bus.
  • the following units are implemented on the external processor 130, which may also be a FPGA chip: a Dynamic Loader processor 132 to install library objects into FPGA chips; a Configuration Manager 136 to configure functional nodes 210; and a Random Number Generator 134 to generate random numbers according to various probability distributions.
  • the chip 200 comprises plural configurable functional nodes 210 arranged in a (m+1)-by-(n+1) array or grid, m+1 being the number of rows and n+1 being the number of columns.
  • Functional nodes 210 are each identifiable by an identifier (ID) associated with their array position on the grid. For example, functional nodes on the first column are identified by "00", "01" to "0m" while nodes "n0", "n1" to "nm" are on the (n+1)th or last column. All functional nodes 210 on the grid are interconnected by internal programmable data buses 230.
  • Neighbouring functional nodes 210 such as "n1" and "n0" are connected by direct data buses 220.
  • Each functional node 210 may comprise one or more configurable logic blocks, multiply units and FPGA memory blocks. Note that the functional nodes 210 may run the same or different type of library objects.
  • Time Slice Manager 260 is an internal block within a FPGA chip and is responsible for resource allocation and scheduling on the internal data buses 220 and 230. Time Slice Manager 260 also installs, activates and deactivates functional nodes 210, and connects the nodes 210 to the internal I/O interface 250, internal Configuration Manager Interface 280, internal Random Number Generator Interface 275 and memory interfaces 240. Every time an object is installed in a functional node, Time Slice Manager 260 activates the functional node and its connection to the internal data bus 230.
  • Time Slice Manager 260 organises functional node operations using a Time Slice Processing Table 265, which contains microcode for the state machine of the Time Slice Manager 260 and stores the Time Slice Manager's current and future operations on two pages. Each entry in the table represents an operation associated with a functional node on the chip in a particular processing loop.
  • the internal I/O interface 250 connects the FPGA chip 200 to an external I/O data interface 305; the internal Configuration Manager Interface 280 connects the FPGA chip 200 to a Configuration Manager 136 on the external processor 130 via an external Configuration Manager Interface 310; while the internal Random Number Generator Interface 275 connects the FPGA chip 200 to a Random Number Generator 134 on the external processor 130 via an external Random Number Generator Interface 320.
  • Each FPGA chip 200 further comprises one or more internal memory interfaces 240 to external memory devices that are used for temporary storage of incoming data and computational results.
  • Fig. 3 shows a FPGA chip that is connected to a DDR-2 DIMM memory chip 330 and two very fast RLDRAM-II memory chips 340. Other suitable memory may also be used.
  • Dynamic Resource Allocation Manager (DRAM) 120
  • DRAM 120 is an application responsible for dynamic resource allocation in the computer system 100. It makes all routing decisions for incoming and outgoing data to and from all FPGA chips 200 and reports results to the API when required. DRAM 120 also controls functional node operations in the system using Dynamic Loader 132, Configuration Manager 136 and Time Slice Manager 260 during execution of functions.
  • DRAM 120 maintains the status of all functional nodes in the system in a Resource Allocation Table 170 shown in Fig. 4.
  • a functional node is either Available (A) or Busy (B).
  • Each functional node is identified by (c,p), where c is the ID of its FPGA chip and p is the array position of the node on the chip.
  • functional node (1,0m-1) is Busy (B) 172 while functional node (2,00) is Available (A) 174.
  • DRAM 120 also maintains an object library 160 containing two types of object: core objects 162 and package objects 164.
  • Core objects 162 are minimum executable components while package objects 164 have a higher level function built from plural core objects 162.
  • a library object is compiled for each array position, and the same object can be loaded into a functional node 210 at that position on any of the chips 200.
  • the library of core objects 162 may be developed using existing CAD tools for FPGA development. During the core object development process, the source code for each core object is designed, implemented, debugged and then compiled for every array position on an FPGA chip. Library objects are generated using FPGA development tools that allow code to be compiled to a specific array position on a chip. A Description Table 166 of all core objects is then created to store the unique ID of each library object, the amount of resources required, and the I/O and memory interface requirements. Package objects 164 are created by compiling core objects 162 for specific component implementations. Similar to the development of the core object library, the compiled code is also tested for various array positions 210 within the FPGA 200 and encapsulated as package objects. Each package object 164 also contains information describing the functionality of the object, a list of used core components with their positions and I/O and memory interface requirements, and a unique object ID.
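The per-position compilation scheme can be modelled as a nested lookup: one precompiled variant per array position, plus a description table of requirements. This is an illustrative Python sketch; the field names are assumptions, not the patent's format.

```python
# Tiny 2-by-2 grid of array positions for illustration.
positions = [f"{col}{row}" for col in range(2) for row in range(2)]

# One precompiled variant of each core object per array position,
# e.g. object "a" compiled for position "00" is stored as "a00".
library = {
    name: {pos: f"{name}{pos}" for pos in positions}
    for name in ("a", "b")
}

# Description table: unique ID, resources and interface requirements.
description_table = {
    "a": {"id": 1, "nodes": 1, "io": ("io",), "mem": ()},
    "b": {"id": 2, "nodes": 1, "io": (), "mem": ("mem1",)},
}

def retrieve(name, pos):
    """Fetch the variant of `name` precompiled for array position `pos`."""
    return library[name][pos]
```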
  • A diagram of an exemplary object library 160 is shown in Fig. 5.
  • This library comprises four objects 162: a, b, c and d.
  • Associated with each object is an array of objects 164 that are precompiled for each of the array positions on a FPGA chip.
  • object a is precompiled for each of the array positions 00 ... 0m on the first column to positions n0 ... nm on the last column.
  • a precompiled, position-dependent object may be loaded into the relevant functional node on any of the FPGA chips in the system.
  • object a00 can be loaded into functional node 00 on any of the FPGA chips labelled 1 to N, as identified by (1,00), (2,00), ..., (N,00).
  • a core object 162 may occupy more than one functional node 210 within the system 100 if necessary.
  • a group of neighbouring functional nodes may be joined to form a larger node.
  • Object blocks may be extended in both the vertical and horizontal directions. For example, block 00 may be joined with block 01 on the upper row or block 10 on the next column to form an object that occupies two array positions.
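A sketch of how an extended block's footprint could be computed from its base position (illustrative only; the patent does not specify such a helper):

```python
def footprint(base, extend_up=0, extend_right=0):
    """Array positions occupied when a block at `base` is extended
    vertically (upper rows) and/or horizontally (next columns).
    Position IDs are column-then-row strings, as in the grid above."""
    col, row = int(base[0]), int(base[1])
    return [f"{c}{r}"
            for c in range(col, col + extend_right + 1)
            for r in range(row, row + extend_up + 1)]

# Block 00 joined with block 01 on the upper row, or with block 10 on
# the next column, occupies two array positions in each case.
vertical = footprint("00", extend_up=1)
horizontal = footprint("00", extend_right=1)
```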
  • the size of library components is also recorded in the description table.
  • the application 105 comprises one or more functions, each of which is developed using objects in the library 160 without requiring compilation, simulation and verification prior to execution.
  • the application may be developed using high level languages such as C and C++.
  • when the application 105 sends a request to the system 100 to execute one of its functions via API 110, the request is first forwarded to the DRAM 120, which then determines whether the library has the necessary objects to execute the function; steps 405 and 410.
  • the function 300 in Fig. 8 requires objects a, b and d, which are then located in the library 160. If the DRAM 120 fails to locate the required loadable objects, the DRAM reports an error to the API; step 450.
  • the DRAM proceeds to determine the number of functional nodes required to load the objects using Description Table 166; step 420. In the example in Fig. 8, objects a and b each require one functional node while d requires two. Therefore, a total of 5 functional nodes are required given that a is executed twice. DRAM then checks Resource Allocation Table 170 to locate one or more FPGA chips that have the amount of resources required; step 425. If the system 100 has no available resources, the DRAM 120 reports an error to the API 110; step 450.
  • the DRAM 120 selects the available functional nodes to execute the function; step 430.
  • Resource Allocation Table 170 shows that FPGA 1 has three and FPGA 2 has 13 available functional nodes.
  • the selected functional nodes are:
  • Functional nodes (1,00), (1,11), (1,34) are on FPGA 1 while (2,03) and (2,04) are on FPGA 2.
  • functional nodes (2,03) and (2,04) are neighbours in the vertical direction.
  • node (2,14) may also be selected instead of (2,03).
  • more than one FPGA chip may be selected if the objects require more resources.
  • DRAM 120 retrieves loadable objects from the library that are specific to the position of the selected functional nodes within the FPGA chip; step 435.
  • the retrieved objects are: a00, b11, a34 and [d03,d04].
  • Library objects a00, b11 and a34 require one functional node while object [d03,d04] requires two functional nodes.
  • Dynamic Loading (step 440): The retrieved library objects are then dynamically loaded into the selected functional nodes without interrupting other nodes; step 440 in Fig. 6.
  • objects a00, b11 and a34 are respectively loaded into functional nodes (1,00), (1,11) and (1,34) while [d03,d04] is loaded into [(2,03),(2,04)].
  • This step may be performed by the Dynamic Loader 132 using selective programming to place the retrieved library objects on the selected functional nodes.
  • DRAM 120 may load the objects to the relevant FPGA chips via a dynamic reconfiguration port, a port present in Xilinx FPGA chips which permits reconfiguration of a functional node while the system is operating.
  • the loaded library objects are activated in step 445.
  • the loaded objects are also connected to internal resources such as memory interface 240 and I/O data interface 250 via internal programmable data buses 220 and 230.
  • This activation and connection step may be performed using either:
  • (i) bit streams, see Fig. 7(a); or (ii) Configuration Manager and Time Slice Manager, see Fig. 7(b).
  • DRAM 120 first analyses the positioning and interface requirements of the retrieved objects and selected functional nodes. DRAM 120 then proceeds to generate a connection bit stream (link code) for the FPGA to configure the programmable internal data buses 220 and 230 to interconnect the selected functional nodes 210; step 450.
  • the connection bit stream may be generated using a Link Manager module within the DRAM 120 or retrieved from a local database connected to the DRAM.
  • the bit stream is a series of bits representing configuration data for the FPGA.
  • the generated bit streams are then dynamically loaded to the selected FPGA without interrupting other functional nodes; step 455. This may be done either: using a Dynamic Loader processor 132, or directly to the FPGA via a Dynamic reconfiguration port. Note that steps 450 and 455 in Fig. 7(a) may be combined with step 440 in Fig. 6.
  • the bit streams for dynamic loading, activation and connection may be combined and loaded into the FPGA chips at the same time.
  • DRAM 120 schedules Time Slice Manager 260 to activate or deactivate loaded objects or connect the objects to internal I/O and memory resources. Referring to Fig. 7(b), DRAM 120 first determines the resource and interface requirements of the retrieved objects to generate commands for the Time Slice Processing Table 265; step 462.
  • objects a00, b11, a34 and [d03,d04] are first loaded into functional nodes (1,00), (1,11), (1,34) and [(2,03),(2,04)] respectively.
  • After dynamic loading is complete, DRAM generates the following commands for functional nodes (1,00), (1,11) and (1,34) in the Time Slice Processing Table.
  • the generated commands are then uploaded onto the Time Slice Processing Table 265 either using Configuration Manager 136 or via the dynamic reconfiguration port associated with the chip; step 464.
  • Time Slice Manager 260 reads and performs the command in the Time Slice Processing Table 265 that is associated with the time slice; step 466.
  • Commands of the Time Slice Manager 260 include, but are not limited to, the following: activate a loaded library object, deactivate a loaded library object, connect a loaded library object and the associated functional node to I/O data interface 250 to read or write data, connect a loaded library object and the associated functional node to memory interface 240 to read data from or write data to a memory device, and connect to the Random Number Generator 134 to read a random number.
  • An exemplary Time Slice Processing Table 265 associated with FPGA 1 is shown in Fig. 9.
  • the Table is updated with the command entries for functional nodes 00, 11 and 34, which are scheduled to be activated during time slice period numbers 1, 7 and 15 respectively of a current loop.
  • Each entry in the table contains the command to be sent, array position or ID of the functional node of a loaded object, period of the time slice (time slice period number) to perform the command, and I/O and memory access requirements to perform the command.
  • the selected functional nodes may require access to I/O and memory data bus.
  • node 00 is deactivated
  • node 11 is connected to memory device 1 and node 34 is connected to Random Number Generator 134.
  • the Time Slice Processing Table 265 of FPGA 2 will have a similar "activate" command entry for functional nodes [(2, 03), (2,04)].
  • objects can exchange data between each other based on connection routing set up by the DRAM 120 via Configuration Manager 136 and Time Slice Manager 260.
  • DRAM will report the results to the user application 105 via API 110; step 470 in Fig. 6.
  • Operations of the Time Slice Manager 260 associated with the Time Slice Processing Table 265 in Fig. 9 are plotted in the time domain in Fig. 10.
  • a processing loop is divided into (m+1)×(n+1) time slice periods in the time domain, each of which is allocated to a functional node in the chip. Operations are performed according to an ordered list based on a functional node's internal ID, starting from block 00 in time slice period 1, 01 in time slice period 2, to block 44 in time slice period 25. The same order is used in the next processing loop.
  • the library objects loaded into each of the functional nodes on a FPGA chip are assumed to be of the same type. Such setup is common in applications like Monte Carlo. Since the functional nodes have similar I/O and memory access requirements, the internal data buses 230 can be managed effectively. Starting the operation in each functional node with a delay based on its ID creates an internal data bus with zero latency in terms of arbitration for data access. Alternatively, the functional nodes 210 on the grid may also be different, requiring data access at different times. In this case, a First-In-First-Out (FIFO) queue may be used to order the operations and alignment of data exchange. The order of the operations is controlled by DRAM 120 by updating the Time Slice Processing Table 265.
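The two scheduling regimes described above can be sketched in Python: a fixed round-robin schedule keyed to node IDs for homogeneous objects, and a FIFO queue for heterogeneous ones. Both functions are illustrative assumptions, not the patent's implementation.

```python
from collections import deque

def homogeneous_schedule(ids):
    """Identical objects: node k is served in time slice k+1, so the
    shared data bus never has to arbitrate (zero-latency round robin)."""
    return {node: slice_no for slice_no, node in enumerate(ids, start=1)}

def fifo_schedule(requests):
    """Heterogeneous objects: serve bus requests in arrival order."""
    queue, served = deque(requests), []
    while queue:
        served.append(queue.popleft())
    return served

ids = [f"{c}{r}" for c in range(2) for r in range(2)]  # "00","01","10","11"
round_robin = homogeneous_schedule(ids)
fifo_order = fifo_schedule(["34", "00", "11"])
```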
  • the Random Number Generator 134 is operable to receive a request to deliver one or more random number sequences each comprising random numbers; see step 510.
  • a request may be received from one or more functional nodes of one or more FPGA chips via the Random Number Generator Interface 320 and takes the form of:
  • “Distribution” is the probability distribution of the random numbers, such as normal, linear, binomial or Poisson;
  • “Parameters” is one or more parameters associated with the distribution;
  • “Sequence Length” is the length of each random number sequence, which is at least one;
  • “Seed” is the seed number used to initialise the random number generator, and may itself be random;
  • “Delivery Time” is the timing of the delivery, which may be represented by one or more time slice period numbers; and
  • “Destination” is the one or more functional nodes to which the random numbers are delivered.
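One way to encode such a request in software is sketched below; the field names mirror the patent's terms, but the concrete types and the node-naming convention (e.g. "FPGA1:00") are assumptions for illustration:

```python
from dataclasses import dataclass

@dataclass
class RandomNumberRequest:
    distribution: str     # e.g. "normal", "linear", "binomial", "poisson"
    parameters: dict      # one or more parameters of the distribution
    sequence_length: int  # length of each sequence; at least one
    seed: int             # seed for the generator; may itself be random
    delivery_time: list   # one or more time slice period numbers
    destination: list     # functional nodes to receive the numbers

# Example request: delivery during time slice periods 1, 26 and 43 to
# node 00 of FPGA 1 and node 03 of FPGA 2 (node labels are hypothetical).
request = RandomNumberRequest(
    distribution="normal",
    parameters={"mean": 0.0, "stddev": 1.0},
    sequence_length=1024,
    seed=42,
    delivery_time=[1, 26, 43],
    destination=["FPGA1:00", "FPGA2:03"],
)
```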
  • the Random Number Generator 134 then generates the sequence or sequences according to the request; see step 520.
  • the generated sequence or sequences are then delivered during time slice period numbers 1, 26 and 43 to node 00 of FPGA 1 and node 03 of FPGA 2; see step 530 in Fig.
  • the generated random number sequence or sequences can be delivered simultaneously: directly by the Random Number Generator 134, by the DRAM 120 using bit streams, or by scheduling the Time Slice Manager 260 to connect the nodes to the Random Number Generator 134 to allow the delivery.
  • the ability to deliver a random number sequence to multiple FPGA chips allows a calculation based on the same sequence to be distributed over the chips, reducing overall processing time. Further, by allowing the timing of the delivery to be specified in a request, the Random Number Generator 134 supports just-in-time delivery of a sequence just before it is required. This can be achieved from the external Generator 134 or, alternatively, a random number generation function can be installed inside the FPGA using an RNG object, that is, a package object 164 stored in the object library 160.
  • the parameters of the probability distributions can also be changed dynamically, based on feedback from the functional nodes, by updating the parameters in the request.
  • Multiple random number sequences of the same or different distributions may also be generated and delivered to one or more functional nodes.
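The generation side can be sketched in the same vein. Python's `random` module stands in here for the hardware Random Number Generator 134, and only two distributions are shown, with "uniform" as an assumed software reading of the patent's "linear":

```python
import random

def generate_sequence(distribution, parameters, length, seed):
    """Regenerate a sequence deterministically from its seed, so the same
    draws can be delivered to functional nodes on multiple FPGA chips."""
    rng = random.Random(seed)
    if distribution == "normal":
        return [rng.gauss(parameters["mean"], parameters["stddev"])
                for _ in range(length)]
    if distribution == "uniform":
        return [rng.uniform(parameters["low"], parameters["high"])
                for _ in range(length)]
    raise ValueError(f"unsupported distribution: {distribution}")

# Identical seeds yield identical sequences, which is what lets a Monte Carlo
# calculation spread over several chips stay consistent; re-issuing a request
# with updated parameters changes the distribution dynamically.
first = generate_sequence("normal", {"mean": 0.0, "stddev": 1.0}, 8, seed=7)
second = generate_sequence("normal", {"mean": 0.0, "stddev": 1.0}, 8, seed=7)
assert first == second
```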

Abstract

The present invention concerns field programmable gate array (FPGA) chips. In one aspect, the invention concerns a dynamically reconfigurable computing system for executing a function. The system comprises: one or more interconnected FPGA chips, each chip comprising a plurality of functional nodes that are each associated with a grid position and interconnected with the other nodes by programmable data buses; a library associated with the FPGA chip or chips for storing loadable objects, each object being precompiled for a grid position on an FPGA chip and loadable into a functional node at that grid position on the FPGA chip or one of the FPGA chips; and a dynamic resource allocator operable to receive a request to execute the function, select the functional nodes in the FPGA chip or chips that are available to execute the function, retrieve from the library the loadable objects associated with the function and the grid positions of the selected functional nodes, load the retrieved objects into the selected functional nodes, program the data buses to route data to and from the selected functional nodes, activate the loaded objects, and control the operation of the loaded objects. In other aspects, the invention concerns a method of executing a function, a method of developing an FPGA-based function, and a random number generator.
PCT/AU2008/001415 2007-09-25 2008-09-24 Reconfigurable numeric accelerator WO2009039566A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
AU2007905235A AU2007905235A0 (en) 2007-09-25 Reconfigurable Numeric Accelerator
AU2007905235 2007-09-25

Publications (1)

Publication Number Publication Date
WO2009039566A1 true WO2009039566A1 (fr) 2009-04-02

Family

ID=40510662

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/AU2008/001415 WO2009039566A1 (fr) 2007-09-25 2008-09-24 Reconfigurable numeric accelerator

Country Status (1)

Country Link
WO (1) WO2009039566A1 (fr)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6091263A (en) * 1997-12-12 2000-07-18 Xilinx, Inc. Rapidly reconfigurable FPGA having a multiple region architecture with reconfiguration caches useable as data RAM
US20050097305A1 (en) * 2003-10-30 2005-05-05 International Business Machines Corporation Method and apparatus for using FPGA technology with a microprocessor for reconfigurable, instruction level hardware acceleration
US7228531B1 (en) * 2003-02-03 2007-06-05 Altera Corporation Methods and apparatus for optimizing a processor core on a programmable chip

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2627005A3 (fr) * 2012-02-08 2017-05-17 Altera Corporation Method and apparatus for implementing periphery devices on a programmable circuit using partial reconfiguration
US9852255B2 2012-02-08 2017-12-26 Altera Corporation Method and apparatus for implementing periphery devices on a programmable circuit using partial reconfiguration
WO2015042684A1 (fr) * 2013-09-24 2015-04-02 University Of Ottawa Hardware accelerator virtualization
CN105579959A (zh) 2013-09-24 2016-05-11 University of Ottawa Hardware accelerator virtualization
US10037222B2 2013-09-24 2018-07-31 University Of Ottawa Virtualization of hardware accelerator allowing simultaneous reading and writing
CN107566524A (zh) * 2017-10-11 2018-01-09 中船重工(武汉)凌久电子有限责任公司 An Ethernet-based remote loading management system and remote loading management method
CN116107726A (zh) * 2023-04-13 2023-05-12 上海思尔芯技术股份有限公司 FPGA resource scheduling method, apparatus, device and storage medium
CN116107726B (zh) 2023-04-13 2023-07-18 上海思尔芯技术股份有限公司 FPGA resource scheduling method, apparatus, device and storage medium
CN117610472A (zh) * 2024-01-24 2024-02-27 上海合见工业软件集团有限公司 Very-large-scale cluster FPGA prototype verification system
CN117610472B (zh) 2024-01-24 2024-03-29 上海合见工业软件集团有限公司 Very-large-scale cluster FPGA prototype verification system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08800050

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112 (1) EPC (EPO FORM 1205A DATED 20/09/2010)

122 Ep: pct application non-entry in european phase

Ref document number: 08800050

Country of ref document: EP

Kind code of ref document: A1