DE19655265B4

DE19655265B4 - Phase-synchronous, flexible frequency operation for scalable, parallel, dynamically reconfigurable computer - uses reconfigurable processor units and general-purpose interconnect matrix

Info

Publication number: DE19655265B4
Application number: DE19655265A
Authority: DE
Inventors: Michael A. Baxter (Menlo Park)
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1995-04-17
Filing date: 1996-04-16
Publication date: 2005-04-14
Anticipated expiration: 2016-04-17

Abstract

The dynamically reconfigurable, scalable, parallel computer system uses a set of S-machines (12), a T-machine (14) for each S-machine, a general-purpose interconnect matrix (GPIM), a set of I/O T-machines and a master timing unit. Each S-machine is a dynamically reconfigurable computer with memory, a local timing unit and a dynamically reconfigurable processing unit (DRPU). The DRPU is implemented in programmable logic and comprises an instruction fetch unit (IFU), a data operate unit (DOU) and an address operate unit (AOU).

Description

The present invention relates to a method, according to claim 1, for generating instructions from a plurality of high-level statements. The generated instructions are executable by a reconfigurable computer having a dynamically reconfigurable processing unit with a changeable internal hardware organization.

The evolution of computer architecture is driven by the need for ever-greater computational performance. Solving different types of computational or numerical problems quickly and accurately typically requires different types of computational resources. For a given range of problem types, computational performance can be increased by using computational resources whose architecture was designed specifically for those problem types. For example, using Digital Signal Processing (DSP) hardware together with a general-purpose computer can significantly increase performance on certain types of signal processing. If the computer itself has been architected specifically for the problem types under consideration, computational performance will be increased further, or possibly even optimized relative to the available computational resources, for those problem types. Current parallel and massively parallel computers, which offer high performance on certain types of problems of O(n²) or greater complexity, are examples of this case.

The need for greater computational performance must be weighed against the need to minimize system cost and the need to maximize system productivity across the widest possible range of current and possible future applications. In general, adding computational resources dedicated to a limited number of problem types to a computer system increases system cost, because specialized hardware is typically more expensive than general-purpose hardware. Designing and building an entire special-purpose computer can be prohibitively expensive in terms of both engineering time and hardware cost. Moreover, dedicated hardware used to increase computational performance may offer little benefit once computational needs change. In the prior art, as computational requirements have changed, new types of specialized hardware or new special-purpose systems have been designed and manufactured, resulting in an ongoing cycle of undesirable nonrecurring engineering costs. Computational resources dedicated to particular problem types therefore make inefficient use of the available system silicon when changing computational needs are taken into account. For these reasons, trying to increase computational performance by using specialized hardware is undesirable.

In the prior art, various attempts have been made both to increase computational performance and to maximize the range of applicable problem types by using reprogrammable or reconfigurable hardware. A first such prior-art approach is the downloadable microcode computer architecture. In a downloadable microcode architecture, the behavior of fixed, non-reconfigurable hardware resources can be selectively altered by loading a particular version of microcode.

An example of such an architecture is the IBM System/360. Because the fundamental computational hardware in such prior-art systems is not itself reconfigurable, these systems do not provide optimized computational performance across a wide range of problem types.

A second prior-art approach to both increasing computational performance and maximizing the range of applicable problem types is the use of reconfigurable hardware coupled to a non-reconfigurable host processor or host system. This approach can be categorized as an Attached Reconfigurable Processor (ARP) architecture, in which some portion of the hardware within a processor set attached to a host is reconfigurable. Examples of current ARP systems that use a set of reconfigurable processors coupled to a host system include the SPLASH-1 and SPLASH-2 systems, designed at the Supercomputing Research Center (Bowie, MD, USA); the WILDFIRE configurable computer produced by Annapolis Micro Systems (Annapolis, MD, USA), which is a commercial version of SPLASH-2; and the EVC-1, produced by Virtual Computer Corporation (Reseda, CA, USA). In most compute-intensive problems, a considerable amount of time is spent executing relatively small portions of program code. ARP architectures are generally used to provide a reconfigurable computational accelerator for such portions of code. Unfortunately, a computational model based on one or more reconfigurable computational accelerators suffers from significant disadvantages, as described below.

A first disadvantage of ARP architectures arises because ARP systems attempt to provide an optimized implementation of a particular algorithm in reconfigurable hardware at a particular time. The philosophy behind Virtual Computer Corporation's EVC-1, for example, is to convert a specific algorithm into a specific configuration of reconfigurable hardware resources, providing optimized computational performance for that particular algorithm.

Reconfigurable hardware resources are used solely to provide optimum performance for a specific algorithm. Using them for more general purposes, such as managing instruction execution, is avoided. For a given algorithm, reconfigurable hardware resources are therefore viewed as individual gates coupled together to ensure optimum performance.

Certain ARP systems rely on a programming model in which a "program" contains both conventional program instructions and special-purpose instructions that specify how the various reconfigurable hardware resources are interconnected. Because ARP systems treat reconfigurable hardware resources in an algorithm-specific manner at the gate level, these special-purpose instructions must give explicit details about the nature of each reconfigurable hardware resource used and how it is connected to the other reconfigurable hardware resources. This increases program complexity. To reduce program complexity, attempts have been made to use a programming model in which a program contains both conventional high-level programming-language instructions and high-level special-purpose instructions. Current ARP systems therefore try to use a compiling system capable of compiling both the high-level programming-language instructions and these high-level special-purpose instructions. The target output of such a compiling system is assembly-language code for the conventional high-level instructions and Hardware Description Language (HDL) code for the special-purpose instructions. Unfortunately, automatically determining a set of reconfigurable hardware resources and an interconnection scheme that gives optimal computational performance for any particular algorithm is an NP-hard problem. A long-term goal of certain ARP systems is a compiling system that can compile an algorithm directly into an optimized interconnection scheme for a set of gates. Developing such a compiling system, however, is an exceedingly difficult task, particularly when multiple types of algorithms are considered.
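To make the NP-hard remark concrete, the sketch below takes one simplified subproblem: splitting a gate netlist across reconfigurable devices so that as few nets as possible cross device boundaries. It solves that subproblem by exhaustive search. The netlist format, the per-device capacity and the name `best_partition` are illustrative assumptions and are not part of any ARP system. Practical partitioners rely on heuristics because this search space grows as k^n for n gates and k devices.

```python
from itertools import product
from typing import List, Optional, Tuple

def best_partition(num_gates: int, nets: List[Tuple[int, int]], num_devices: int,
                   capacity: int) -> Tuple[Optional[Tuple[int, ...]], int, int]:
    """Exhaustively assign gates to devices, minimizing nets that cross devices.

    Hypothetical model: each net joins two gates; each device holds at most
    `capacity` gates. Returns (best_assignment, cut_count, assignments_examined).
    """
    best: Optional[Tuple[int, ...]] = None
    best_cut = len(nets) + 1  # worse than any feasible cut
    examined = 0
    # product() enumerates all num_devices ** num_gates assignments.
    for assignment in product(range(num_devices), repeat=num_gates):
        examined += 1
        # Reject assignments that overfill any device.
        if any(assignment.count(d) > capacity for d in range(num_devices)):
            continue
        # Count nets whose two endpoints sit on different devices.
        cut = sum(1 for a, b in nets if assignment[a] != assignment[b])
        if cut < best_cut:
            best, best_cut = assignment, cut
    return best, best_cut, examined

# A 4-gate chain split across 2 devices of capacity 2.
print(best_partition(4, [(0, 1), (2, 3), (1, 2)], num_devices=2, capacity=2))
```

With only two devices, a 32-gate netlist already has 2**32 (about 4.3 billion) candidate assignments. For this reason, exact search is abandoned long before realistic gate counts are reached.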

A second disadvantage of ARP architectures arises because an ARP apparatus distributes the computational work of the algorithm it is configured for across multiple reconfigurable logic devices. For example, consider an ARP apparatus built from a set of field-programmable gate arrays (FPGAs) and configured as a parallel multiplication accelerator. The computational work of the parallel multiplication is distributed across the entire set of FPGAs. The size of the algorithm the ARP apparatus can be configured for is therefore limited by the number of reconfigurable logic devices present. Similarly, the maximum data-set size the ARP apparatus can handle is limited. Examining the source code does not necessarily reveal these limitations of the ARP apparatus clearly, because some algorithms have data dependencies. In general, data-dependent algorithms are avoided.
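The capacity limit described above can be stated as simple arithmetic. The sketch assumes, optimistically, that the work divides perfectly evenly across devices, so its result is only a lower bound. The gate counts and the function names `devices_required` and `algorithm_fits` are hypothetical.

```python
def devices_required(gate_count: int, gates_per_device: int) -> int:
    """Lower bound on devices needed, assuming perfectly even distribution."""
    if gates_per_device <= 0:
        raise ValueError("gates_per_device must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return -(-gate_count // gates_per_device)

def algorithm_fits(gate_count: int, gates_per_device: int, devices_present: int) -> bool:
    """True if the configured algorithm can fit on the devices actually present."""
    return devices_required(gate_count, gates_per_device) <= devices_present

# Hypothetical figures: a 50,000-gate accelerator on devices holding 12,000 gates each.
print(devices_required(50_000, 12_000))   # 5
print(algorithm_fits(50_000, 12_000, 4))  # False
```

Data-dependent behavior can push the real requirement above this bound. That is consistent with the observation that source code alone does not reveal the limit.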

Furthermore, because ARP architectures distribute computational work across multiple reconfigurable logic devices, accommodating a new (or even slightly modified) algorithm requires reconfiguration en masse. In other words, multiple reconfigurable logic devices must be reconfigured. This limits the maximum rate at which reconfiguration can occur for alternative problems or cascaded subproblems.

A third disadvantage of ARP architectures arises from the fact that one or more portions of the program code are executed on the host. An ARP apparatus is not an independent computing system in itself. It does not execute entire programs, so interaction with the host is required. Because some of the program code runs on the non-reconfigurable host, the set of available silicon resources is not fully utilized over the time frame of program execution. In particular, during host-based instruction execution, the silicon resources of the ARP apparatus are idle or inefficiently used. Similarly, when the ARP apparatus operates on data, the silicon resources of the host are generally used inefficiently. To readily execute multiple entire programs, silicon resources must be grouped into readily reusable resources. As described above, ARP systems treat reconfigurable hardware resources as a set of gates optimally interconnected to implement a particular algorithm at a particular time. ARP systems therefore provide no means of treating a particular set of reconfigurable hardware resources as a resource that can readily be reused from one algorithm to the next, because reusability requires a certain degree of independence from the algorithm.

An ARP apparatus cannot treat the currently executing host program as data, and in general cannot contextualize itself. An ARP apparatus cannot readily be made to simulate itself by executing its own host program. Nor can an ARP apparatus be made to compile its own HDL or its own application programs onto itself, directly using the reconfigurable hardware resources from which it is built. An ARP apparatus is thus architecturally limited compared with self-contained computing models that are independent of a host processor.

Because an ARP apparatus acts as a computational accelerator, it is generally not capable of independent input/output (I/O) processing. An ARP apparatus typically requires interaction with the host for I/O processing, so its performance may be I/O-limited. Those skilled in the art will recognize that an ARP apparatus can nevertheless be configured to accelerate a specific I/O problem. However, because the entire ARP apparatus is configured for a single, specific problem, it cannot balance I/O processing against data processing without compromising one or the other. Moreover, an ARP apparatus provides no means for interrupt processing. ARP teachings offer no such mechanism, because they aim at maximum computational acceleration and interrupts degrade that acceleration.

A fourth disadvantage of ARP architectures exists because some software applications have inherent data parallelism that is difficult to exploit with an ARP apparatus. HDL compilation is one such example when net-name symbol resolution is required in a very large netlist.

A fifth disadvantage of ARP architectures is that they essentially follow a SIMD computer architecture model. ARP architectures are therefore architecturally less effective than one or more innovative non-reconfigurable prior-art systems. ARP systems reproduce only part of the process of executing a program. That part is chiefly the arithmetic logic for arithmetic computation, provided for each specific configuration instance and to whatever extent the available reconfigurable hardware allows. In contrast, in the system design of the SYMBOL machine developed at Fairchild in 1971, the entire computer used a single hardware context for every aspect of program execution. As a result, SYMBOL encompassed every element of a computer's system application, including the portion that ARP systems assign to the host.

ARP architectures have other shortcomings as well. For example, an ARP apparatus lacks an effective means of providing independent timing for multiple reconfigurable logic devices. Similarly, a cascaded ARP apparatus lacks an effective clock distribution means for providing independently timed units. As another example, it is difficult to correlate execution time accurately with the source-code statements being accelerated. To estimate the net system clock rate accurately, the ARP device must be modeled with a computer-aided design (CAD) tool after HDL compilation. This is a time-consuming process for obtaining such a basic parameter.

What is needed is a means for reconfigurable computing that overcomes the limitations of the prior art described above.

It is known from EP 0 253 530 A2 to use a reconfiguration directive to cause a hardware reconfiguration of a dynamically reconfigurable processing unit.

According to the invention, the subject matter of claim 1 is proposed.

The drawings are briefly described below:

FIG. 1 is a block diagram of a preferred embodiment of a system for scalable, parallel, dynamically reconfigurable computing constructed in accordance with the present invention;

FIG. 2 is a block diagram of a preferred embodiment of an S-machine according to the present invention;

FIG. 3A is an exemplary program listing that includes reconfiguration directives;

FIG. 3B is a flowchart of known compiling operations performed during the compilation of a sequence of program instructions;

FIGS. 3C and 3D are flowcharts of the preferred compiling operations performed by a compiler for dynamically reconfigurable computing;

FIG. 4 is a block diagram of a preferred embodiment of a Dynamically Reconfigurable Processing Unit (DRPU) according to the present invention;

FIG. 5 is a block diagram of a preferred embodiment of an Instruction Fetch Unit according to the present invention;

FIG. 6 is a state diagram showing a preferred set of states supported by the interrupt logic of the present invention;

FIGS. 7A and 7B show a flowchart of a preferred method for scalable, parallel, dynamically reconfigurable computing in accordance with the present invention.

The preferred embodiments are described in detail below.

FIG. 1 shows a block diagram of a preferred embodiment of a system 10 for scalable, parallel, dynamically reconfigurable computing constructed in accordance with the present invention. The system 10 preferably comprises at least one S-machine 12, a T-machine 14 corresponding to each S-machine 12, a General Purpose Interconnect Matrix (GPIM) 16, at least one I/O T-machine 18, one or more I/O devices 20, and a master timebase unit 22. In the preferred embodiment, the system 10 comprises multiple S-machines 12, and thus multiple T-machines 14, as well as multiple I/O T-machines 18 and multiple I/O devices 20.

Each of the S-machines 12, T-machines 14, and I/O T-machines 18 has a master timing input coupled to a timing output of the master timebase unit 22.

Each S-machine 12 has an input and an output coupled to its corresponding T-machine 14. In addition to the input and output coupled to its corresponding S-machine 12, each T-machine 14 has a routing input and a routing output coupled to the GPIM 16. Similarly, each I/O T-machine 18 has an input and an output coupled to an I/O device 20, and a routing input and a routing output coupled to the GPIM 16.

As described in more detail below, each S-machine 12 is a dynamically reconfigurable computer. The GPIM 16 forms a point-to-point parallel interconnect means that facilitates communication between T-machines 14. The set of T-machines 14 and the GPIM 16 form a point-to-point parallel interconnect means for data transfer between S-machines 12. Similarly, the GPIM 16, the set of T-machines 14, and the set of I/O T-machines 18 form a point-to-point parallel interconnect means for I/O transfer between the S-machines 12 and each I/O device 20. The master timebase unit 22 comprises an oscillator that provides a master timing signal to each S-machine 12 and T-machine 14.
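The transfer paths described above can be traced with a small model. The following Python sketch is illustrative only. The node labels (`S0`, `T0`, `GPIM`, `IOT0`, `IO0`) and the function names are hypothetical, and the GPIM is collapsed into a single node instead of being modeled as an interconnect fabric.

```python
def s_to_s_path(src: int, dst: int) -> list[str]:
    """Hops for a data transfer from S-machine `src` to S-machine `dst`.

    An S-machine reaches the GPIM only through its own T-machine.
    """
    if src == dst:
        return [f"S{src}"]  # local access, no interconnect involved
    return [f"S{src}", f"T{src}", "GPIM", f"T{dst}", f"S{dst}"]


def s_to_io_path(src: int, io: int) -> list[str]:
    """Hops for an I/O transfer from S-machine `src` to I/O device `io`."""
    return [f"S{src}", f"T{src}", "GPIM", f"IOT{io}", f"IO{io}"]


print(s_to_s_path(0, 3))   # ['S0', 'T0', 'GPIM', 'T3', 'S3']
print(s_to_io_path(1, 0))  # ['S1', 'T1', 'GPIM', 'IOT0', 'IO0']
```

The model makes one structural point explicit: every S-machine-to-S-machine or S-machine-to-I/O transfer passes through exactly two interface machines, one on each side of the GPIM.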

In an exemplary embodiment, each S-machine 12 is implemented using a Xilinx XC4013 (Xilinx, Inc., San Jose, CA, USA) Field Programmable Gate Array (FPGA) coupled to 64 megabytes of Random Access Memory (RAM). Each T-machine 14 is implemented using approximately 50% of the reconfigurable hardware resources in a Xilinx XC4013 FPGA, as is each I/O T-machine 18. The GPIM 16 is implemented as a toroidal interconnect mesh. The master timebase unit 22 is a clock oscillator coupled to clock distribution circuitry to provide a system-wide frequency reference, as described in the U.S. patent application entitled "System and Method for Phase-Synchronous, Flexible Frequency Clocking and Messaging." Preferably, the GPIM 16, the T-machines 14, and the I/O T-machines 18 transfer information in accordance with ANSI/IEEE Standard 1596-1992, which defines a Scalable Coherent Interface (SCI).
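As a rough illustration of what a toroidal interconnect mesh implies for routing, the sketch below assumes a two-dimensional torus of GPIM nodes. The text does not state the dimension or size of the mesh. The sketch computes a node's four wraparound neighbors and the minimum hop count between two nodes. The function names and the 4 × 4 example size are hypothetical.

```python
def torus_neighbors(node: tuple[int, int], rows: int, cols: int) -> list[tuple[int, int]]:
    """Four nearest neighbors of `node` in a rows x cols torus (edges wrap around)."""
    r, c = node
    return [
        ((r - 1) % rows, c),
        ((r + 1) % rows, c),
        (r, (c - 1) % cols),
        (r, (c + 1) % cols),
    ]


def torus_hops(a: tuple[int, int], b: tuple[int, int], rows: int, cols: int) -> int:
    """Minimum hop count between two nodes, taking the shorter way around each ring."""
    dr = abs(a[0] - b[0])
    dc = abs(a[1] - b[1])
    return min(dr, rows - dr) + min(dc, cols - dc)


print(torus_neighbors((0, 0), 4, 4))     # [(3, 0), (1, 0), (0, 3), (0, 1)]
print(torus_hops((0, 0), (3, 3), 4, 4))  # 2: one wraparound hop in each dimension
```

The wraparound links are what distinguish a torus from a plain mesh. In a plain 4 × 4 mesh, opposite corners would be 6 hops apart instead of 2.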

In the preferred embodiment, the system 10 comprises multiple S-machines 12 that operate in parallel. The structure and functionality of each individual S-machine 12 are described in detail below with reference to FIGS. 2 through 12B. Referring now to FIG. 2, a block diagram of a preferred embodiment of an S-machine 12 is shown. The S-machine 12 comprises a first local timebase unit 30, a Dynamically Reconfigurable Processing Unit (DRPU) 32 for executing program instructions, and a memory 34. The first local timebase unit 30 has a timing input that forms the S-machine's master timing input. The first local timebase unit 30 also has a timing output that provides a first local timing signal, or clock, to a timing input of the DRPU 32 and a timing input of the memory 34 via a first timing signal line 40. The DRPU 32 has:
- a control signal output coupled to a control signal input of the memory 34 via a memory control line 42;
- an address output coupled to an address input of the memory 34 via an address line 44; and
- a bidirectional data port coupled to a bidirectional data port of the memory 34 via a memory I/O line 46.

The DRPU 32 additionally has a bidirectional control port coupled to a bidirectional control port of its corresponding T-machine 14 via an external control line 48. As shown in FIG. 2, the memory control line 42 spans X bits, the address line 44 spans M bits, the memory I/O line 46 spans (N × k) bits, and the external control line 48 spans Y bits.
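FIG. 2 leaves the bus widths X, M, (N × k), and Y symbolic. The sketch below shows one way the address width M could be sized for the 64-megabyte memory of the exemplary embodiment. It makes three assumptions:
- "64 megabytes" means 2^26 bytes;
- memory is addressed in units of one (N × k)-bit word; and
- N × k is a whole number of bytes.

The class and function names and all concrete widths are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SMachineBuses:
    """Hypothetical concrete widths for the symbolic buses of FIG. 2."""
    x: int  # memory control line 42
    m: int  # address line 44
    n: int  # factor of the memory I/O width (meaning not defined in this excerpt)
    k: int  # factor of the memory I/O width (meaning not defined in this excerpt)
    y: int  # external control line 48

    @property
    def data_bits(self) -> int:
        """Width of memory I/O line 46, which spans N x k bits."""
        return self.n * self.k


def min_address_bits(memory_bytes: int, data_bits: int) -> int:
    """Smallest M that addresses every data_bits-wide word (word addressing assumed)."""
    if data_bits <= 0 or data_bits % 8:
        raise ValueError("data width must be a positive whole number of bytes")
    words = memory_bytes // (data_bits // 8)
    return (words - 1).bit_length()  # equals ceil(log2(words)) for words >= 1


MEMORY_BYTES = 64 * 2**20  # "64 megabytes" read as 2**26 bytes (assumption)
buses = SMachineBuses(x=4, m=24, n=4, k=8, y=8)  # illustrative values only
print(buses.data_bits)                                  # 32
print(min_address_bits(MEMORY_BYTES, buses.data_bits))  # 24
```

With a 32-bit data path, for example N = 4 and k = 8, the assumptions give M = 24. Doubling the data path to 64 bits halves the word count and reduces M to 23.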

In the preferred embodiment, the first local timebase unit 30 receives the master timing signal from the master timebase unit 22. The first local timebase unit 30 generates the first local timing signal from the master timing signal and delivers it to the DRPU 32 and the memory 34. In the preferred embodiment, the first local timing signal can vary from one S-machine 12 to another. Thus, the DRPU 32 and memory 34 within a given S-machine 12 operate at a clock rate that is independent of the DRPU 32 and memory 34 within any other S-machine 12. Preferably, the first local timing signal is phase-synchronized with the master timing signal. In the preferred embodiment, the first local timebase unit 30 is implemented using phase-synchronous frequency conversion circuitry. This circuitry includes phase-locked detection circuitry implemented with reconfigurable hardware resources. Those skilled in the art will recognize that, in an alternative embodiment, the first local timebase unit 30 could be implemented as a portion of a clock distribution tree.
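The text does not specify how the phase-synchronous frequency conversion is performed. One common approach, assumed here purely for illustration, is to derive the local clock as a rational multiple p/q of the master clock. When p and q are coprime, the edges of the two clocks coincide every q master cycles, which is the same as every p local cycles. This gives each S-machine a fixed phase relationship to the master clock even though the rates differ. The function name and parameters are hypothetical.

```python
from fractions import Fraction
from math import gcd


def local_clock(master_hz: int, p: int, q: int) -> tuple[Fraction, int, int]:
    """Model a local clock running at master_hz * p / q.

    Returns (local frequency in Hz, master cycles per realignment,
    local cycles per realignment). After p/q is reduced to lowest terms,
    q master periods and p local periods span exactly the same time, so
    the edges of the two clocks coincide at that interval.
    """
    if p <= 0 or q <= 0:
        raise ValueError("p and q must be positive")
    g = gcd(p, q)
    p, q = p // g, q // g
    return Fraction(master_hz * p, q), q, p


freq, master_cycles, local_cycles = local_clock(10_000_000, 5, 2)
print(freq, master_cycles, local_cycles)  # 25000000 2 5
```

Take a hypothetical 10 MHz master clock with p/q = 5/2. The local clock then runs at 25 MHz and realigns with the master every 2 master cycles, or every 5 local cycles.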

The memory 34 is preferably implemented as a RAM and stores program instructions, program data, and configuration data sets for the DRPU 32. The memory 34 of any given S-machine 12 is preferably accessible to every other S-machine 12 in the system 10 via the GPIM 16. Moreover, each S-machine 12 is preferably characterized as having a uniform memory address space. In the preferred embodiment, the program instructions stored in the memory 34 selectively include reconfiguration directives directed to the DRPU 32.

Referring now to FIG. 3A, an exemplary program listing 50 that includes reconfiguration directives is shown. As shown in FIG. 3A, the exemplary program listing 50 includes:
- a set of outer-loop portions 52;
- a first inner-loop portion 54;
- a second inner-loop portion 55;
- a third inner-loop portion 56;
- a fourth inner-loop portion 57; and
- a fifth inner-loop portion 58.

Those skilled in the art will readily recognize the following terms. An "inner loop" is an iterative portion of a program that performs a particular set of related operations. An "outer loop" is a portion of a program that mainly performs general-purpose operations and/or transfers control from one inner-loop portion to another. In general, the inner-loop portions 54, 55, 56, 57, 58 of a program perform operations on potentially large data sets.

In an image processing application, for example, the first inner-loop portion 54 might perform a color format conversion on the image data. The second through fifth inner-loop portions 55, 56, 57, 58 might then perform linear filtering, convolution, pattern searching, and compression operations. Those skilled in the art will recognize that a contiguous sequence of inner-loop portions 55, 56, 57, 58 can be thought of as a software pipeline. Each outer-loop portion 52 would be responsible for data I/O and/or for directing the transfer of data and control from the first inner-loop portion 54 to the second inner-loop portion 55. Those skilled in the art will additionally recognize that a given inner-loop portion 54, 55, 56, 57, 58 may include one or more reconfiguration directives. In general, for any given program, the outer-loop portions 52 of the program listing 50 will include a variety of general-purpose instruction types. The inner-loop portions 54, 55, 56, 57, 58 of the program listing 50 will instead consist of relatively few instruction types used to perform a specific set of operations.
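The following sketch illustrates the idea of a software pipeline built from inner-loop portions. Each stage stands in for one inner-loop portion: a single tight loop over the whole data set that uses very few operation types. The two stages are hypothetical stand-ins for the color format conversion and linear filtering examples in the text, not the patent's actual algorithms. One converts packed RGB to grayscale; the other is a 3-tap box filter.

```python
from functools import reduce
from typing import Callable, Sequence

Stage = Callable[[list[int]], list[int]]


def run_software_pipeline(data: list[int], stages: Sequence[Stage]) -> list[int]:
    """Apply each inner-loop stage to the whole data set, one stage after another."""
    return reduce(lambda acc, stage: stage(acc), stages, data)


def to_gray(pixels: list[int]) -> list[int]:
    """Hypothetical color-conversion stage: packed 0xRRGGBB -> 8-bit luma.

    Uses integer weights 77/150/29 (which sum to 256) as an approximation.
    """
    return [
        (77 * ((p >> 16) & 0xFF) + 150 * ((p >> 8) & 0xFF) + 29 * (p & 0xFF)) >> 8
        for p in pixels
    ]


def box_filter3(values: list[int]) -> list[int]:
    """Hypothetical linear-filter stage: 3-tap moving average, edge samples repeated."""
    if not values:
        return []
    padded = [values[0], *values, values[-1]]
    return [(padded[i] + padded[i + 1] + padded[i + 2]) // 3 for i in range(len(values))]


image = [0xFFFFFF, 0x000000, 0xFFFFFF]
print(run_software_pipeline(image, [to_gray, box_filter3]))  # [170, 170, 170]
```

Each stage is a candidate for its own inner-loop ISA. The glue that moves data between stages, represented here by `run_software_pipeline`, corresponds loosely to the outer-loop role.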

In the exemplary program listing 50, reconfiguration directives appear at the following positions:
- a first directive at the beginning of the first inner-loop portion 54;
- a second directive at the end of the first inner-loop portion 54;
- a third directive at the beginning of the second inner-loop portion 55;
- a fourth directive at the beginning of the third inner-loop portion 56;
- a fifth directive at the beginning of the fourth inner-loop portion 57; and
- a sixth and a seventh directive at the beginning and at the end, respectively, of the fifth inner-loop portion 58.

Each reconfiguration directive preferably references a configuration data set. This data set specifies an internal DRPU hardware organization dedicated to, and optimized for, implementing a particular Instruction Set Architecture (ISA). An ISA is a primitive or core set of instructions that can be used to program a computer. An ISA defines instruction formats, opcodes, data formats, addressing modes, execution control flags, and program-accessible registers. Those skilled in the art will recognize that this corresponds to the conventional definition of an ISA.

In the present invention, the DRPU 32 of each S-machine can be rapidly reconfigured in real time to directly implement multiple ISAs, using a unique configuration data set for each desired ISA. That is, each ISA is implemented with a unique internal DRPU hardware organization as specified by a corresponding configuration data set. Thus, in the present invention, the first through fifth inner-loop portions 54, 55, 56, 57, 58 each correspond to a unique ISA, namely ISA 1, 2, 3, 4, and k, respectively. Those skilled in the art will recognize that successive ISAs need not be unique. ISA k could therefore be ISA 1, 2, 3, or 4, or any other ISA. The set of outer-loop portions 52 likewise corresponds to a unique ISA, namely ISA 0. In the preferred embodiment, the selection of successive reconfiguration directives may be data-dependent during program execution. Once a given reconfiguration directive is selected, program instructions are executed sequentially according to the corresponding ISA. They run on a unique DRPU hardware configuration as specified by the corresponding configuration data set.
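The placement of the seven reconfiguration directives in listing 50 follows a simple rule. A directive is needed wherever the ISA required by the next portion differs from the ISA currently configured. The sketch below applies that rule to a hypothetical model of listing 50. It assumes the DRPU starts in the outer-loop ISA 0 and uses the integer 5 as a stand-in for ISA k. The names `LISTING_50` and `place_directives` are hypothetical.

```python
OUTER_ISA = 0
ISA_K = 5  # stand-in for "ISA k"; k may also repeat an earlier ISA

# Hypothetical model of listing 50: (portion, ISA it requires), in program order.
LISTING_50 = [
    ("outer-loop 52", OUTER_ISA),
    ("inner-loop 54", 1),
    ("outer-loop 52", OUTER_ISA),
    ("inner-loop 55", 2),
    ("inner-loop 56", 3),
    ("inner-loop 57", 4),
    ("inner-loop 58", ISA_K),
    ("outer-loop 52", OUTER_ISA),
]


def place_directives(sections, initial_isa=OUTER_ISA):
    """Return a program with ("reconfigure", isa) inserted wherever the ISA changes."""
    program = []
    current = initial_isa  # assumption: the DRPU starts configured for ISA 0
    for portion, isa in sections:
        if isa != current:
            program.append(("reconfigure", isa))
            current = isa
        program.append(("execute", portion))
    return program


program = place_directives(LISTING_50)
print([isa for op, isa in program if op == "reconfigure"])  # [1, 0, 2, 3, 4, 5, 0]
```

Under these assumptions the function emits exactly seven directives, selecting ISAs 1, 0, 2, 3, 4, k, and 0 in that order, which matches the positions described above. If ISA k happened to equal ISA 4, the directive at the start of portion 58 would be redundant, and the rule would omit it.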

In the present invention, a given ISA can be categorized as an inner-loop ISA or an outer-loop ISA according to the number and types of instructions it contains. An outer-loop ISA includes many instructions and is useful for performing general-purpose operations. An inner-loop ISA consists of relatively few instructions and is directed to performing specific types of operations. Because an outer-loop ISA is directed to general-purpose operations, it is particularly useful when sequential execution of program instructions is desirable. The execution performance of an outer-loop ISA is preferably characterized in terms of clock cycles per instruction executed. In contrast, because an inner-loop ISA is directed to specific types of operations, it is most useful when parallel execution of program instructions is desirable. The performance of an inner-loop ISA is preferably characterized in terms of instructions executed per clock cycle, or computational results produced per clock cycle.
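A small calculation makes the two figures of merit concrete. The helper functions and all cycle, instruction, and result counts below are hypothetical. They only show which quantity is divided by which.

```python
from fractions import Fraction


def cycles_per_instruction(cycles: int, instructions: int) -> Fraction:
    """Outer-loop ISA figure of merit: lower is better."""
    return Fraction(cycles, instructions)


def instructions_per_cycle(instructions: int, cycles: int) -> Fraction:
    """Inner-loop ISA figure of merit: higher is better."""
    return Fraction(instructions, cycles)


def results_per_cycle(results: int, cycles: int) -> Fraction:
    """Alternative inner-loop figure of merit: computational results per clock cycle."""
    return Fraction(results, cycles)


# Hypothetical counts, for illustration only.
print(cycles_per_instruction(1200, 400))  # 3
print(instructions_per_cycle(800, 100))   # 8
print(results_per_cycle(1600, 100))       # 16
```

An outer-loop configuration is judged by how few cycles each instruction costs. An inner-loop configuration is judged by how much work its parallel hardware completes in each cycle.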

Those skilled in the art will recognize that the preceding discussion of sequential and parallel program instruction execution concerns program instruction execution within a single DRPU 32. The presence of multiple S-machines 12 in the system 10 facilitates the parallel execution of multiple program instruction sequences at any given time, with each sequence executed by a given DRPU 32. At a particular time, each DRPU 32 is configured with either parallel hardware, to implement a particular inner-loop ISA, or serial hardware, to implement a particular outer-loop ISA. The internal hardware configuration of a given DRPU 32 changes over time according to the selection of one or more reconfiguration directives embedded within the sequence of program instructions being executed.

In the preferred embodiment, each ISA and its corresponding internal DRPU hardware organization are designed to provide optimum computational performance for a particular class of computational problems, given a set of available reconfigurable hardware resources. As mentioned previously, and as described in more detail below, an internal DRPU hardware organization corresponding to an outer-loop ISA is preferably optimized for sequential program instruction execution. An internal DRPU hardware organization corresponding to an inner-loop ISA is preferably optimized for parallel program instruction execution. An exemplary general-purpose outer-loop ISA is given in Appendix A, and an exemplary inner-loop ISA directed to convolution is given in Appendix B.

With the exception of each reconfiguration directive, the exemplary program listing 50 of FIG. 3A preferably comprises conventional high-level language statements, for example statements written in accordance with the C programming language. Those skilled in the art will recognize that including one or more reconfiguration directives in a sequence of program instructions requires a compiler that has been modified to account for the reconfiguration directives. Referring now to FIG. 3B, a prior art flowchart is shown of the compilation operations performed during the compilation of a sequence of program instructions. Herein, the prior art compilation operations correspond generally to those performed by the GNU C Compiler (GCC) from the Free Software Foundation (Cambridge, MA, USA). Those skilled in the art will recognize that the prior art compilation operations described below can readily be generalized to other compilers. The prior art compilation operations begin in step 500, where the compiler front end selects a next high-level statement from a sequence of program instructions. Next, in step 502, the compiler front end generates intermediate-level code corresponding to the selected high-level statement. In the case of GCC, this intermediate-level code consists of Register Transfer Level (RTL) statements. Following step 502, the compiler front end determines in step 504 whether another high-level statement requires consideration. If so, the method returns to step 500.
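The prior art front-end loop of steps 500 to 504 can be sketched as follows. The toy statement form `dst = a OP b` and the tuple-based intermediate code are hypothetical stand-ins. The tuples are only loosely modeled on GCC's RTL notation and are not actual GCC output.

```python
import re

# Hypothetical toy source language: each statement has the form "dst = a OP b".
_STMT = re.compile(r"^\s*(\w+)\s*=\s*(\w+)\s*([+\-*])\s*(\w+)\s*$")
_OPS = {"+": "plus", "-": "minus", "*": "mult"}


def lower_to_rtl(stmt):
    """Step 502 (sketch): produce RTL-like intermediate code for one high-level statement."""
    m = _STMT.match(stmt)
    if m is None:
        raise ValueError(f"unsupported statement: {stmt!r}")
    dst, a, op, b = m.groups()
    return [("set", dst, (_OPS[op], a, b))]


def front_end(statements):
    rtl = []
    pending = list(statements)
    while pending:                       # step 504: does another statement need consideration?
        stmt = pending.pop(0)            # step 500: select the next high-level statement
        rtl.extend(lower_to_rtl(stmt))   # step 502: generate intermediate-level code
    return rtl
```

For example, `front_end(["x = a + b", "y = x * c"])` yields one `("set", ...)` tuple per statement, in source order.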

If the compiler front end determines in step 504 that no other high-level statement requires consideration, the compiler back end next performs conventional register allocation operations in step 506. After step 506, the compiler back end selects, in step 508, a next RTL statement for consideration within a current RTL statement group. The compiler back end then determines in step 510 whether a rule exists that specifies how the current RTL statement group can be translated into a set of assembly language statements. If no such rule exists, the method returns to step 508 to select another RTL statement for inclusion in the current RTL statement group. If a rule corresponding to the current RTL statement group exists, the compiler back end generates a set of assembly language statements according to that rule in step 512. Following step 512, the compiler back end determines whether a next RTL statement requires consideration in the context of a next RTL statement group. If so, the method returns to step 508; otherwise, the method ends.
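The following is a minimal sketch of the back-end loop of steps 506 to 512. It assumes a hypothetical RTL encoding `(op, dst, src1, src2)`, a trivial register allocator that never spills, and two invented rules with made-up mnemonics (`add`, `mac`). It illustrates the key idea of the flowchart: a group keeps growing until some rule covers it. The fused `mac` rule assumes that the intermediate result of the `mult` is not used afterwards.

```python
def allocate_registers(rtl):
    """Step 506 (sketch): assign registers r0, r1, ... in order of first use; no spilling."""
    regs = {}
    for _op, *operands in rtl:
        for name in operands:
            regs.setdefault(name, f"r{len(regs)}")
    return regs


def rule_add(group, regs):
    """Covers a lone ("plus", d, a, b)."""
    if len(group) == 1 and group[0][0] == "plus":
        _, d, a, b = group[0]
        return [f"add {regs[d]}, {regs[a]}, {regs[b]}"]
    return None


def rule_mac(group, regs):
    """Covers ("mult", t, a, b) followed by ("plus", e, t, c), i.e. e = a*b + c.
    Assumes t is dead afterwards, so it never needs to be materialized."""
    if (len(group) == 2 and group[0][0] == "mult" and group[1][0] == "plus"
            and group[1][2] == group[0][1]):
        _, _t, a, b = group[0]
        _, e, _, c = group[1]
        return [f"mac {regs[e]}, {regs[a]}, {regs[b]}, {regs[c]}"]
    return None


RULES = [rule_add, rule_mac]  # note: no rule covers a lone "mult"


def back_end(rtl):
    regs = allocate_registers(rtl)          # step 506
    asm, group = [], []
    for stmt in rtl:
        group.append(stmt)                  # step 508: grow the current group
        for rule in RULES:                  # step 510: does a rule cover the group?
            out = rule(group, regs)
            if out is not None:
                asm.extend(out)             # step 512: emit assembly for the group
                group = []                  # the next statement starts a new group
                break
    if group:
        raise ValueError(f"no rule covers RTL group {group}")
    return asm
```

Because no rule covers a lone `mult`, a `mult` followed by a dependent `plus` is absorbed into a two-statement group and emitted as a single `mac`.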

The present invention preferably includes a compiler for dynamically reconfigurable computing. Referring now to FIGS. 3C and 3D, a flowchart is shown of the preferred compilation operations performed by a compiler for dynamically reconfigurable computing. The preferred compilation operations begin in step 600, where the front end of the dynamically reconfigurable computing compiler selects a next high-level statement within a sequence of program instructions. Next, in step 602, the front end determines whether the selected high-level statement is a reconfiguration directive. If so, the front end generates an RTL reconfiguration statement in step 604, after which the preferred method returns to step 600. In the preferred embodiment, the RTL reconfiguration statement is a nonstandard RTL statement that includes an ISA identification. If in step 602 the selected high-level statement is not a reconfiguration directive, the front end next generates a set of RTL statements in a conventional manner in step 606. After step 606, the front end determines in step 608 whether another high-level statement requires consideration. If so, the preferred method returns to step 600; if not, the preferred method proceeds to step 610 to initiate back-end operations.
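The front-end changes of steps 600 to 608 amount to one extra test per statement. The sketch below assumes a hypothetical directive syntax `reconfig(isa_name);`, because the actual form used in program listing 50 is not reproduced here. It represents the nonstandard RTL reconfiguration statement as `("reconfig", isa_name)`.

```python
import re

# Hypothetical directive syntax; the real form used in program listing 50 is not shown here.
_DIRECTIVE = re.compile(r"^\s*reconfig\s*\(\s*(\w+)\s*\)\s*;?\s*$")
_ASSIGN = re.compile(r"^\s*(\w+)\s*=\s*(\w+)\s*([+*])\s*(\w+)\s*;?\s*$")
_OPS = {"+": "plus", "*": "mult"}


def drc_front_end(statements):
    rtl = []
    for stmt in statements:                        # steps 600 and 608: visit each statement
        m = _DIRECTIVE.match(stmt)
        if m:                                      # step 602: reconfiguration directive?
            rtl.append(("reconfig", m.group(1)))   # step 604: nonstandard RTL with the ISA id
            continue
        m = _ASSIGN.match(stmt)                    # step 606: conventional RTL generation
        if m is None:
            raise ValueError(f"unsupported statement: {stmt!r}")
        dst, a, op, b = m.groups()
        rtl.append((_OPS[op], dst, a, b))
    return rtl
```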

In step 610, the back end of the dynamically reconfigurable computing compiler performs register allocation operations. In the preferred embodiment of the present invention, each ISA is defined such that the register architecture is consistent from one ISA to another; therefore, the register allocation operations are performed in a conventional manner. Those skilled in the art will recognize that, in general, a consistent register architecture across ISAs is not an absolute requirement. Next, in step 612, the back end selects a next RTL statement within a currently considered RTL statement group. The back end then determines in step 614 whether the selected RTL statement is an RTL reconfiguration statement. If it is not, the back end determines in step 618 whether a rule exists for the currently considered RTL statement group. If not, the preferred method returns to step 612 to select a next RTL statement for inclusion in the currently considered RTL statement group. If a rule for the currently considered RTL statement group exists in step 618, the back end next generates, in step 620, a set of assembly language statements corresponding to that group according to the rule. Following step 620, the back end determines in step 622 whether another RTL statement requires consideration within the context of a next RTL statement group. If so, the preferred method returns to step 612; if not, the preferred method ends.
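Step 610 relies on every ISA sharing one register architecture. Under that condition, a single conventional allocation pass serves the whole program, however often the ISA changes. The following is a minimal sketch of that precondition and pass. `ISADescriptor`, its fields, and the register counts are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ISADescriptor:
    """Illustrative ISA description; only the register architecture matters here."""
    name: str
    num_registers: int
    register_width: int  # bits


def allocate_once(rtl, isas):
    """Step 610 (sketch): one conventional allocation pass for the whole program,
    valid only if every ISA shares the same register architecture."""
    archs = {(isa.num_registers, isa.register_width) for isa in isas}
    if len(archs) != 1:
        raise ValueError("ISAs disagree on register architecture")
    num_regs, _width = next(iter(archs))
    regs = {}
    for stmt in rtl:
        if stmt[0] == "reconfig":
            continue  # carries an ISA identification, not register operands
        for name in stmt[1:]:
            if name not in regs:
                if len(regs) == num_regs:
                    raise ValueError("out of registers (this sketch does not spill)")
                regs[name] = f"r{len(regs)}"
    return regs
```

If two ISAs differed in register count or width, this shortcut would no longer apply, and allocation would have to be performed per ISA region.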

If in step 614 the selected RTL statement is an RTL reconfiguration statement, the back end selects, in step 616, a rule set corresponding to the ISA identification within the RTL reconfiguration statement. In the present invention, a unique rule set preferably exists for each ISA. Each rule set therefore provides one or more rules for converting groups of RTL statements into assembly language statements in accordance with a particular ISA. Following step 616, the preferred method proceeds to step 618. The rule set corresponding to any given ISA preferably includes a rule for translating the RTL reconfiguration statement into a set of assembly language statements that generate a software interrupt. This software interrupt results in the execution of a reconfiguration handler, as described in detail below.
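Steps 612 to 622 can be combined as follows. The sketch switches the active rule set whenever it meets an RTL reconfiguration statement. It then lets the newly selected rule set translate that statement into a software-interrupt sequence. The ISA names, numeric ISA identifiers, rules, and mnemonics (`li`, `swi RECONFIG`, `add`, `mac`) are hypothetical and do not come from the ISAs of Appendices A and B. The register map `regs` is assumed to come from step 610.

```python
ISA_IDS = {"outer_loop": 0, "inner_loop": 1}  # hypothetical numeric ISA identifiers


def reconfig_rule(group, regs):
    """Assumed convention: pass the target ISA id in a register, then raise a software interrupt."""
    if len(group) == 1 and group[0][0] == "reconfig":
        return [f"li r_arg, {ISA_IDS[group[0][1]]}", "swi RECONFIG"]
    return None


def outer_add(group, regs):
    if len(group) == 1 and group[0][0] == "plus":
        _, d, a, b = group[0]
        return [f"add {regs[d]}, {regs[a]}, {regs[b]}"]
    return None


def inner_mac(group, regs):
    # Inner-loop ISA only: ("mult", t, a, b) then ("plus", e, t, c) -> fused multiply-accumulate.
    if (len(group) == 2 and group[0][0] == "mult" and group[1][0] == "plus"
            and group[1][2] == group[0][1]):
        _, _t, a, b = group[0]
        _, e, _, c = group[1]
        return [f"mac {regs[e]}, {regs[a]}, {regs[b]}, {regs[c]}"]
    return None


# One rule set per ISA; each includes the reconfiguration rule.
RULE_SETS = {"outer_loop": [reconfig_rule, outer_add], "inner_loop": [reconfig_rule, inner_mac]}


def drc_back_end(rtl, regs, start_isa="outer_loop"):
    rules = RULE_SETS[start_isa]
    asm, group = [], []
    for stmt in rtl:                          # step 612: select the next RTL statement
        if stmt[0] == "reconfig":             # step 614: RTL reconfiguration statement?
            if group:
                raise ValueError(f"no rule covers RTL group {group}")
            rules = RULE_SETS[stmt[1]]        # step 616: select the rule set by ISA id
        group.append(stmt)
        for rule in rules:                    # step 618: does a rule cover the group?
            out = rule(group, regs)
            if out is not None:
                asm.extend(out)               # step 620: emit assembly for the group
                group = []
                break
    if group:                                 # step 622 reached the end with an uncovered group
        raise ValueError(f"no rule covers RTL group {group}")
    return asm
```

The reconfiguration statement is translated by the rule set it selects, which matches the text's ordering of step 616 before step 618.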

In the manner described above, the dynamically reconfigurable computing compiler selectively and automatically generates assembly language statements in accordance with multiple ISAs during compilation operations. In other words, during the compilation process, the dynamically reconfigurable computing compiler compiles a single set of program instructions according to a variable ISA. The dynamically reconfigurable computing compiler is preferably a conventional compiler modified to perform the preferred compilation operations described above with reference to FIGS. 3C and 3D. Those skilled in the art will recognize that, although the required modifications are not complex, such modifications are nonobvious in view of both prior art compilation techniques and prior art reconfigurable computing techniques.

Referring now to FIG. 4, a block diagram of a preferred embodiment of a dynamically reconfigurable processing unit 32 is shown. The DRPU 32 comprises an instruction fetch unit (IFU) 60, a data operation unit (DOU) 62, and an address operation unit (AOU) 64. Each of the IFU 60, the DOU 62, and the AOU 64 has a timing input coupled to the first timing signal line 40. The IFU 60 has a memory control output coupled to the memory control line 42, a data input coupled to the memory I/O line 46, and a bidirectional control port coupled to the external control line 48. The IFU 60 additionally has a first control output coupled to a first control input of the DOU 62 via a first control line 70. It also has a second control output coupled to a first control input of the AOU 64 via a second control line 72. In addition, the IFU 60 has a third control output coupled via a third control line 74 to a second control input of the DOU 62 and to a second control input of the AOU 64. The DOU 62 and the AOU 64 each have a bidirectional data port coupled to the memory I/O line 46. Finally, the AOU 64 has an address output that forms the address output of the DRPU.
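The port-level connectivity of FIG. 4 can be captured as a small netlist, as in the sketch below. The port names and the `address_out` entry label are descriptive assumptions; the line numbers follow the text.

```python
# Connections of DRPU 32 per FIG. 4: signal line -> list of (unit, port) endpoints.
DRPU_NETLIST = {
    "timing_40":      [("IFU", "timing_in"), ("DOU", "timing_in"), ("AOU", "timing_in")],
    "mem_control_42": [("IFU", "mem_control_out")],
    "mem_io_46":      [("IFU", "data_in"), ("DOU", "data_io"), ("AOU", "data_io")],
    "ext_control_48": [("IFU", "control_io")],
    "control_70":     [("IFU", "control_out_1"), ("DOU", "control_in_1")],
    "control_72":     [("IFU", "control_out_2"), ("AOU", "control_in_1")],
    "control_74":     [("IFU", "control_out_3"), ("DOU", "control_in_2"), ("AOU", "control_in_2")],
    "address_out":    [("AOU", "address_out")],  # forms the DRPU address output
}


def units_on(line):
    """Set of functional units attached to a signal line."""
    return {unit for unit, _port in DRPU_NETLIST[line]}


def shared_lines(a, b):
    """Signal lines that connect units a and b directly, sorted by name."""
    return sorted(line for line in DRPU_NETLIST if {a, b} <= units_on(line))
```

For example, `shared_lines("DOU", "AOU")` returns `['control_74', 'mem_io_46', 'timing_40']`. This reflects that the third control line 74 is the only control line the IFU shares with both operation units.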

The DRPU 32 is preferably implemented using a reconfigurable or reprogrammable logic device, for example an FPGA such as a Xilinx XC4013 (Xilinx, Inc., San Jose, CA, USA) or an AT&T ORCA™ 1C07 (AT&T Microelectronics, Allentown, PA, USA). The reprogrammable logic device preferably provides a plurality of each of the following:
1) selectively reprogrammable logic blocks, or configurable logic blocks (CLBs);
2) selectively reprogrammable I/O blocks (IOBs);
3) selectively reprogrammable interconnect structures;
4) data storage resources;
5) tri-state buffer resources; and
6) hardwired logic functions.

Each CLB preferably includes selectively reconfigurable circuitry for generating logic functions, storing data, and routing signals. Those skilled in the art will recognize that, depending on the exact design of the reconfigurable logic device used, reconfigurable data storage circuitry may also be included in one or more data storage blocks (DSBs) separate from the set of CLBs. Herein, the reconfigurable data storage circuitry within an FPGA is taken to reside within the CLBs; that is, the presence of DSBs is not assumed. Those skilled in the art will readily recognize that one or more elements described herein that use CLB-based reconfigurable data storage circuitry could instead use DSB-based circuitry if DSBs are present. Each IOB preferably includes selectively reconfigurable circuitry for transferring data between CLBs and an FPGA output pin.

A configuration data set defines a DRPU hardware configuration or organization by specifying the functions performed within the CLBs and the interconnections 1) within CLBs; 2) between CLBs; 3) within IOBs; 4) between IOBs; and 5) between CLBs and IOBs. Those skilled in the art will recognize that the number of bits in each of the memory control line 42, the address line 44, the memory I/O line 46, and the external control line 48 is reconfigurable via a configuration data set. Configuration data sets are preferably stored in one or more S-machine memories 34 within the system 10.

Those skilled in the art will recognize that the DRPU 32 is not limited to an FPGA-based implementation. For example, the DRPU 32 could be implemented as a RAM-based state machine that may include one or more lookup tables. Alternatively, the DRPU 32 could be implemented using a Complex Programmable Logic Device (CPLD). Those skilled in the art will also recognize that some of the S-machines 12 of the system 10 may include DRPUs 32 that are not reconfigurable.
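As a rough illustration of what a configuration data set specifies, the sketch below models one as a set of CLB function assignments plus a list of connections. It then classifies each connection into the five categories listed above. The endpoint naming scheme (`CLB3.out`, `IOB7.pin`), the field names, and the example bus width are assumptions. Real XC4013 or ORCA configuration bitstreams use vendor-specific binary formats.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigurationDataSet:
    """Hypothetical, highly simplified view of a configuration data set."""
    isa: str
    clb_functions: dict   # CLB name -> description of the logic it implements
    connections: list     # (endpoint, endpoint) pairs such as ("CLB0.cout", "CLB1.cin")
    bus_widths: dict = field(default_factory=dict)  # e.g. {"address_44": 24} (example value)


def connection_kind(a, b):
    """Classify a connection as within/between CLBs, within/between IOBs, or CLB-IOB."""
    block_a, block_b = a.split(".")[0], b.split(".")[0]
    type_a, type_b = block_a[:3], block_b[:3]  # "CLB" or "IOB"
    if type_a == type_b:
        where = "within" if block_a == block_b else "between"
        return f"{where} {type_a}s"
    return "CLB-IOB"


def summarize(cfg):
    """Count the connections in each category."""
    counts = {}
    for a, b in cfg.connections:
        kind = connection_kind(a, b)
        counts[kind] = counts.get(kind, 0) + 1
    return counts
```

Swapping one such object for another models a change of DRPU hardware organization, including the widths of lines 42, 44, 46 and 48.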

In the preferred embodiment, the IFU 60, the DOU 62, and the AOU 64 are each dynamically reconfigurable. Their internal hardware configuration can therefore be selectively changed during program execution. The IFU 60 manages instruction fetch and decode operations, memory access operations, and DRPU reconfiguration operations. It also issues control signals to the DOU 62 and the AOU 64 to facilitate instruction execution. The DOU 62 performs operations involving data computation, and the AOU 64 performs operations involving address computation. The internal structure and operation of each of the IFU 60, the DOU 62, and the AOU 64 will now be described in detail.

Referring now to FIG. 5, a block diagram of a preferred embodiment of the instruction fetch unit (IFU) 60 is shown. The IFU 60 comprises the following elements:
- an instruction state sequencer (ISS) 100;
- an architecture description memory 101;
- memory access logic 102;
- reconfiguration logic 104;
- interrupt logic 106;
- a fetch control unit 108;
- an instruction buffer 110;
- a decode control unit 112;
- an instruction decoder 114;
- an opcode storage register set 116;
- a register file (RF) address register set 118;
- a constants register set 120; and
- a process control register set 122.

The ISS 100 has a first and a second control output, which form the first and second control outputs of the IFU, respectively, and a timing input, which forms the timing input of the IFU. The ISS 100 also has a fetch/decode control output coupled via a fetch/decode control line 130 to a control input of the fetch control unit 108 and to a control input of the decode control unit 112. The ISS 100 additionally has a bidirectional control port coupled via a bidirectional control line 132 to a first bidirectional control port of each of the memory access logic 102, the reconfiguration logic 104, and the interrupt logic 106. The ISS 100 also has an opcode input coupled to an output of the opcode storage register set 116 via an opcode line 142. Finally, the ISS 100 has a bidirectional data port coupled to a bidirectional data port of the process control register set 122 via a process data line 144.

The memory access logic 102, the reconfiguration logic 104, and the interrupt logic 106 each have a second bidirectional control port coupled to the external control line 48. These three units additionally each have a data input coupled to a data output of the architecture description memory 101 via an implementation control line 131. The memory access logic 102 additionally has a control output that forms the memory control output of the IFU. The interrupt logic 106 has an output coupled to the process data line 144. The instruction buffer 110 has a data input that forms the data input of the IFU, a control input coupled to a control output of the fetch control unit 108 via a fetch control line 134, and an output coupled to an input of the instruction decoder 114 via an instruction line 136. The instruction decoder 114 has a control input coupled to a control output of the decode control unit 112 via a decode control line 138. Its output is coupled via a decoded instruction line 140 to 1) an input of the opcode storage register set 116; 2) an input of the RF address register set 118; and 3) an input of the constants register set 120. The RF address register set 118 and the constants register set 120 each have an output, and together these outputs form the third control output 74 of the IFU.
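The decode path of FIG. 5 can be sketched as a field extractor. This path consists of the instruction buffer 110, the instruction decoder 114, and the register sets 116, 118 and 120. The 32-bit instruction format used here is purely hypothetical; real formats depend on the ISA currently loaded.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical 32-bit format (not from Appendix A or B):
#   bits 31..26 opcode | 25..21 RF address A | 20..16 RF address B | 15..0 constant


@dataclass
class DecodeRegisters:
    opcode: int = 0               # opcode storage register set 116 (read by ISS 100 via line 142)
    rf_addresses: tuple = (0, 0)  # RF address register set 118 (part of control output 74)
    constant: int = 0             # constants register set 120 (part of control output 74)


def decode(word, regs):
    """Instruction decoder 114: split a word and latch the fields over decoded instruction line 140."""
    regs.opcode = (word >> 26) & 0x3F
    regs.rf_addresses = ((word >> 21) & 0x1F, (word >> 16) & 0x1F)
    regs.constant = word & 0xFFFF
    return regs


def fetch_and_decode(words):
    buffer = deque(words)         # instruction buffer 110, filled under fetch control unit 108
    regs = DecodeRegisters()
    decoded = []
    while buffer:                 # decode control unit 112 paces the decoder
        decode(buffer.popleft(), regs)
        decoded.append((regs.opcode, regs.rf_addresses, regs.constant))
    return decoded
```

The opcode field goes to the ISS 100, which sequences the instruction. The RF addresses and constant are the values the IFU drives onto its third control output 74 for the DOU and AOU.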

The architecture description memory 101 stores architecture specification signals that characterize the current DRPU configuration. Preferably, the architecture specification signals include 1) a reference to a default configuration data set; 2) a reference to a list of possible configuration data sets; 3) a reference to the configuration data set corresponding to the currently considered ISA, that is, a reference to the configuration data set that defines the current DRPU configuration; 4) an interconnect address list that identifies one or more interconnect I/O units 304 within the T-machine 14 associated with the S-machine 12 in which the IFU 60 resides; 5) a set of interrupt response signals that specify interrupt latency and interrupt precision information defining how the IFU 60 responds to interrupts; and 6) a memory access constant that defines an atomic memory address increment. In the preferred embodiment, each configuration data set implements the architecture description memory 101 as a set of CLBs configured as a read-only memory (ROM). The architecture specification signals that define the contents of the architecture description memory 101 are preferably included in each configuration data set. Because each configuration data set corresponds to a particular ISA, the contents of the architecture description memory 101 vary according to the ISA currently being considered. For a given ISA, program access to the contents of the architecture description memory 101 is preferably facilitated by including a memory-read instruction in the ISA. This allows a program to retrieve information about the current DRPU configuration during program execution.
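The six kinds of architecture specification signals can be pictured as one read-only record. The following is a minimal Python sketch under stated assumptions: the class name `ArchitectureDescription`, the field names, the string encodings of latency and precision, and the idea that a "ROM address" is simply the field position are all hypothetical illustrations, not the patent's encoding. `arch_read` stands in for the memory-read instruction that lets a program query the current configuration.

```python
from dataclasses import dataclass, fields
from typing import Tuple


@dataclass(frozen=True)  # frozen: contents cannot change at run time, like the CLB-based ROM
class ArchitectureDescription:
    default_config_ref: int                   # 1) default configuration data set
    possible_config_refs: Tuple[int, ...]     # 2) list of possible configuration data sets
    current_config_ref: int                   # 3) data set defining the current DRPU configuration
    interconnect_addresses: Tuple[int, ...]   # 4) interconnect I/O units 304 in the T-machine 14
    interrupt_latency: str                    # 5) interrupt response: latency (assumed "low"/"high")
    interrupt_precision: str                  # 5) interrupt response: precision (assumed labels)
    atomic_address_increment: int             # 6) memory access constant


# Hypothetical layout: each field occupies one ROM address, in declaration order.
ARCH_FIELDS = tuple(f.name for f in fields(ArchitectureDescription))


def arch_read(desc: ArchitectureDescription, address: int):
    """Hypothetical memory-read instruction: return one word of the description."""
    return getattr(desc, ARCH_FIELDS[address])
```

For example, with `desc = ArchitectureDescription(0, (0, 1, 2), 1, (304,), "low", "precise", 4)`, a program could call `arch_read(desc, 2)` to learn which configuration data set is currently loaded.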

In the present invention, the reconfiguration logic 104 is a state machine that controls a sequence of reconfiguration operations that facilitate reconfiguration of the DRPU 32 according to a configuration data set. Preferably, the reconfiguration logic 104 initiates the reconfiguration operations upon receipt of a reconfiguration signal. As described in more detail below, the reconfiguration signal is generated either by the interrupt logic 106 in response to a reconfiguration interrupt received on the external control line 48, or by the ISS 100 in response to a reconfiguration directive embedded in a program. Following a power-on or reset condition, the reconfiguration operations establish an initial DRPU configuration using the default configuration data set referenced by the architecture description memory 101. The reconfiguration operations also provide selective DRPU reconfiguration after the initial DRPU configuration has been established. Upon completion of the reconfiguration operations, the reconfiguration logic 104 issues a completion signal. In the preferred embodiment, the reconfiguration logic 104 is non-reconfigurable logic that controls the loading of configuration data sets into the reprogrammable logic device itself; the sequence of reconfiguration operations is therefore defined by the manufacturer of the reprogrammable logic device. These reconfiguration operations will be known to those skilled in the art.
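The control flow of the reconfiguration logic (initial load from the default data set, later selective reloads, completion signal at the end) can be sketched in Python as follows. This is an illustration only: the class and method names are hypothetical, and the vendor-defined load sequence is abstracted into a caller-supplied `load_config` function, because the text leaves that sequence to the device manufacturer.

```python
from typing import Callable, Optional


class ReconfigurationLogic:
    """Sketch of reconfiguration logic 104 as a simple sequencer (hypothetical names)."""

    def __init__(self, default_config_ref: int, load_config: Callable[[int], None]):
        self.default_config_ref = default_config_ref
        # Stands in for the manufacturer-defined load sequence of the logic device.
        self.load_config = load_config
        self.current_config_ref: Optional[int] = None
        self.completion_signal = False

    def power_on_reset(self) -> None:
        """Initial configuration uses the default data set referenced by memory 101."""
        self._reconfigure(self.default_config_ref)

    def reconfiguration_signal(self, config_ref: int) -> None:
        """Selective reconfiguration, only after an initial configuration exists."""
        if self.current_config_ref is None:
            raise RuntimeError("initial DRPU configuration not yet established")
        self._reconfigure(config_ref)

    def _reconfigure(self, config_ref: int) -> None:
        self.completion_signal = False       # deasserted while loading
        self.load_config(config_ref)
        self.current_config_ref = config_ref
        self.completion_signal = True        # lets interrupt logic 106 leave state P
```

Passing `loaded.append` as `load_config` records the order in which data sets were loaded, which is convenient for checking the power-on path followed by a selective reconfiguration.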

Each DRPU configuration is preferably given by a configuration data set that defines a particular hardware organization dedicated to implementing a corresponding ISA. In the preferred embodiment, the IFU 60 includes each of the elements indicated above, regardless of the DRPU configuration. At a basic level, the functionality provided by each element within the IFU 60 is independent of the currently considered ISA. In the preferred embodiment, however, the detailed structure and functionality of one or more elements of the IFU 60 may vary according to the nature of the ISA for which the IFU has been configured. The structure and functionality of the architecture description memory 101 and the reconfiguration logic 104 preferably remain constant from one DRPU configuration to another. The structure and functionality of the other elements of the IFU 60, and the manner in which they vary according to the type of ISA, are now described in detail.

The process control register set 122 stores signals and data used by the ISS 100 during instruction execution. In the preferred embodiment, the process control register set 122 comprises a register for storing a process control word, a register for storing an interrupt vector, and a register for storing a reference to a configuration data set. The process control word preferably includes a plurality of condition flags that may be selectively set and reset depending on conditions that arise during instruction execution. The process control word additionally includes a plurality of transition control signals that define one or more ways in which interrupts may be serviced, as described in more detail below. In the preferred embodiment, the process control register set 122 is implemented as a set of CLBs configured for data storage and gating logic.
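As a rough illustration of how condition flags and transition control signals might share one process control word, the Python sketch below packs both into a single integer. The bit layout, the flag names `Z`, `N`, `C`, `V`, and the "one interruptible bit per ISS state" encoding are assumptions made for this example; the text does not specify field positions or widths.

```python
from dataclasses import dataclass

# Hypothetical layout: the text does not fix bit positions or flag names.
FLAG_BITS = {"Z": 0, "N": 1, "C": 2, "V": 3}   # condition flags
STATES = "FDEMWY"                               # ISS states named in the text
INTERRUPTIBLE_BASE = 8                          # bits 8..13: state accepts interrupts


@dataclass
class ProcessControlRegisters:
    control_word: int = 0        # process control word
    interrupt_vector: int = 0    # interrupt vector register
    config_ref: int = 0          # reference to a configuration data set

    def _set_bit(self, pos: int, value: bool) -> None:
        mask = 1 << pos
        self.control_word = (self.control_word | mask) if value else (self.control_word & ~mask)

    def _get_bit(self, pos: int) -> bool:
        return bool((self.control_word >> pos) & 1)

    def set_flag(self, name: str, value: bool) -> None:
        self._set_bit(FLAG_BITS[name], value)

    def flag(self, name: str) -> bool:
        return self._get_bit(FLAG_BITS[name])

    def set_interruptible(self, state: str, value: bool) -> None:
        """Transition control signal: may an interrupt be taken in this ISS state?"""
        self._set_bit(INTERRUPTIBLE_BASE + STATES.index(state), value)

    def is_interruptible(self, state: str) -> bool:
        return self._get_bit(INTERRUPTIBLE_BASE + STATES.index(state))
```

Keeping flags and transition control in one word mirrors the single process control word described above; a real CLB implementation could of course use separate registers.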

The ISS 100 is preferably a state machine that controls the operation of the fetch control unit 108, the decode control unit 112, the DOU 62 and the AOU 64, and that issues memory read and memory write signals to the memory access logic 102 to facilitate instruction execution.

As described below with reference to FIG. 6, the interrupt logic 106 generates transition control signals and stores them in the process control word within the process control register set 122. The transition control signals preferably indicate which of the states F, D, E, M, W and Y are interruptible, the level of interrupt precision in each interruptible state, and, for each interruptible state, a next state at which instruction execution is to continue following state I. If the ISS 100 receives an interrupt notification signal while in a given state, the ISS 100 advances to state I if the transition control signals indicate that the current state is interruptible. Otherwise, the ISS 100 proceeds as if it had not received an interrupt signal until it reaches an interruptible state.
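The gating rule just described (take the interrupt only in an interruptible state, otherwise carry on until one is reached) can be sketched in Python as follows. This is a hypothetical model: the cyclic order F, D, E, M, W is assumed for an outer-loop ISA, state Y is not modeled, and the transition control signals are reduced to a dictionary mapping each interruptible state to the state at which execution continues after state I.

```python
from typing import Dict, Optional

# Assumed outer-loop order of ISS states; state Y is not modeled in this sketch.
CYCLE = ("F", "D", "E", "M", "W")


class ISSModel:
    def __init__(self, transition_control: Dict[str, str]):
        # Maps each interruptible state to the state at which execution continues after I.
        self.transition_control = transition_control
        self.state = "F"
        self.pending = False
        self.resume: Optional[str] = None

    def notify_interrupt(self) -> None:
        """Interrupt notification signal from the interrupt logic 106."""
        self.pending = True

    def step(self) -> str:
        if self.state == "I":
            # Interrupt serviced: continue at the recorded next state.
            self.state, self.resume = self.resume, None
        elif self.pending and self.state in self.transition_control:
            # Current state is interruptible: take the interrupt now.
            self.pending = False
            self.resume = self.transition_control[self.state]
            self.state = "I"
        else:
            # Not interruptible (or nothing pending): proceed as if no interrupt arrived.
            self.state = CYCLE[(CYCLE.index(self.state) + 1) % len(CYCLE)]
        return self.state
```

With only W marked interruptible (`ISSModel({"W": "F"})`), an interrupt that arrives during fetch is held until the write state, so the trace runs D, E, M, W, I and then resumes at F.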

Once the ISS 100 has advanced to state I, the ISS 100 preferably accesses the process control register set 122 to set an interrupt mask flag and to retrieve an interrupt vector. After retrieving the interrupt vector, the ISS 100 preferably services the current interrupt by means of a conventional subroutine jump to the interrupt handler indicated by the interrupt vector.

In the present invention, reconfiguration of the DRPU 32 is initiated in response to either 1) a reconfiguration interrupt asserted on the external control line 48; or 2) execution of a reconfiguration directive within a sequence of program instructions. In the preferred embodiment, both the reconfiguration interrupt and the execution of a reconfiguration directive result in a subroutine jump to a reconfiguration handler. The reconfiguration handler preferably saves program state information and issues a configuration data set address and the reconfiguration signal to the reconfiguration logic 104.

If the current interrupt is not a reconfiguration interrupt, the ISS 100 advances to the next state indicated by the transition control signals at the time the interrupt was received, thereby resuming, completing or initiating an instruction execution cycle.
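The work done in state I, together with the reconfiguration handler's two duties, can be sketched in Python. Everything named here is hypothetical: `StateIContext` collects the register contents involved, the handler table keyed by vector stands in for the subroutine jump, and the explicit `is_reconfiguration` argument replaces whatever mechanism the hardware uses to distinguish the two cases. Clearing the interrupt mask on return from the handler is not shown.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class StateIContext:
    interrupt_mask: bool = False
    interrupt_vector: int = 0               # handler selector stored by interrupt logic 106
    config_ref: Optional[int] = None        # configuration data set address (state R)
    saved_program_states: List[dict] = field(default_factory=list)
    reconfig_requests: List[Optional[int]] = field(default_factory=list)


Handler = Callable[[StateIContext, dict], None]


def reconfiguration_handler(ctx: StateIContext, program_state: dict) -> None:
    ctx.saved_program_states.append(dict(program_state))   # save program state information
    ctx.reconfig_requests.append(ctx.config_ref)            # address + reconfiguration signal to 104


def service_interrupt(ctx: StateIContext, handlers: Dict[int, Handler], program_state: dict,
                      resume_state: str, is_reconfiguration: bool) -> Optional[str]:
    ctx.interrupt_mask = True                               # set the interrupt mask flag
    handlers[ctx.interrupt_vector](ctx, program_state)      # conventional subroutine jump
    # A reconfiguration hands control to reconfiguration logic 104 (no ISS next state);
    # otherwise the ISS continues at the next state given by the transition control signals.
    return None if is_reconfiguration else resume_state
```

In this sketch, a reconfiguration directive executed by a program would simply call `reconfiguration_handler` through the same handler table, matching the statement that both triggers lead to the same subroutine jump.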

In the preferred embodiment, the set of states supported by the ISS 100 varies according to the nature of the ISA for which the DRPU 32 is configured. Thus, state M would be absent for an ISA in which one or more instructions can be executed in a single clock cycle, as would be the case for a typical inner-loop ISA. As shown, the state diagram of FIG. 6 preferably defines the states supported by the ISS 100 for implementing a general-purpose outer-loop ISA. To implement an inner-loop ISA, the ISS 100 preferably supports multiple sets of states F, D, E and W in parallel, thereby facilitating pipelined control of instruction execution in a manner readily understood by those skilled in the art. In the preferred embodiment, the ISS 100 is implemented as a CLB-based state machine that supports the states described above, or a subset of them, in accordance with the currently considered ISA.
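The effect of running several F, D, E, W state sets in parallel is the familiar pipeline diagram. The Python sketch below computes it under simplifying assumptions: one new instruction enters per clock cycle, every state takes exactly one cycle, and there are no stalls. The two state tuples are illustrative names, not definitions from the text.

```python
from typing import Dict, List, Sequence

OUTER_LOOP_STATES = ("F", "D", "E", "M", "W")   # M present for multi-cycle memory access
INNER_LOOP_STATES = ("F", "D", "E", "W")        # no M: single-cycle instructions


def pipeline_schedule(n_instructions: int,
                      states: Sequence[str] = INNER_LOOP_STATES) -> List[Dict[int, str]]:
    """For each clock cycle, map instruction index -> the state it occupies.

    Each instruction gets its own F/D/E/W state set, offset by one cycle.
    """
    n_cycles = n_instructions + len(states) - 1
    schedule = []
    for cycle in range(n_cycles):
        active = {}
        for instr in range(n_instructions):
            stage = cycle - instr
            if 0 <= stage < len(states):
                active[instr] = states[stage]
        schedule.append(active)
    return schedule


for cycle, active in enumerate(pipeline_schedule(3)):
    print(cycle, active)
```

Three instructions finish in six cycles instead of twelve, which is the throughput gain that motivates the parallel state sets.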

The interrupt logic 106 preferably comprises a state machine that generates transition control signals and performs interrupt notification operations in response to an interrupt signal received via the external control line 48. Referring to FIG. 6, a state diagram is shown illustrating a preferred set of states supported by the interrupt logic 106. The interrupt logic 106 begins operation in state P, which corresponds to a power-on, reset or reconfiguration condition. In response to the completion signal issued by the reconfiguration logic 104, the interrupt logic 106 advances to state A and retrieves the interrupt response signals from the architecture description memory 101. The interrupt logic 106 then generates the transition control signals from the interrupt response signals and stores them in the process control register set 122. In the preferred embodiment, the interrupt logic 106 includes a CLB-based programmable logic array (PLA) that receives the interrupt response signals and generates the transition control signals. Following state A, the interrupt logic 106 advances to state B to wait for an interrupt signal. Upon receipt of an interrupt signal, the interrupt logic 106 advances to state C if the interrupt mask flag within the process control register set 122 is reset. Once in state C, the interrupt logic 106 determines the source of the interrupt, an interrupt priority and an interrupt handler address. If the interrupt signal is a reconfiguration interrupt, the interrupt logic 106 advances to state R and stores a configuration data set address in the process control register set 122. After state R, or after state C if the interrupt signal is not a reconfiguration interrupt, the interrupt logic 106 advances to state N and stores the interrupt handler address in the process control register set 122. The interrupt logic 106 next advances to state X and issues an interrupt notification signal to the ISS 100. Following state X, the interrupt logic 106 returns to state B to await the next interrupt signal.
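The state sequence P, A, B, C, (R), N, X, B can be sketched as a small Python state machine. The assumptions are explicit: the process control register set is a plain dictionary, the PLA is any function from response signals to transition control, and an interrupt that arrives while the mask flag is set is simply rejected (the text only says state C is entered when the mask is reset, so what happens otherwise is an assumption here). All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Interrupt:
    source: str
    priority: int
    handler_address: int
    config_ref: Optional[int] = None   # present only for a reconfiguration interrupt


class InterruptLogic:
    """Sketch of the interrupt logic 106 state machine (states P, A, B, C, R, N, X)."""

    def __init__(self, pcr: Dict[str, object], read_response_signals: Callable[[], dict],
                 pla: Callable[[dict], dict], notify_iss: Callable[[], None]):
        self.pcr = pcr                                      # process control register set 122
        self.read_response_signals = read_response_signals  # from memory 101
        self.pla = pla                                      # response signals -> transition control
        self.notify_iss = notify_iss                        # interrupt notification to ISS 100
        self.trace: List[str] = []
        self._go("P")                                       # power-on, reset or reconfiguration

    def _go(self, state: str) -> None:
        self.state = state
        self.trace.append(state)

    def completion_signal(self) -> None:
        """Reconfiguration logic 104 has finished: P -> A -> B."""
        if self.state != "P":
            raise RuntimeError("completion signal only expected in state P")
        self._go("A")
        self.pcr["transition_control"] = self.pla(self.read_response_signals())
        self._go("B")

    def interrupt_signal(self, irq: Interrupt) -> bool:
        """Return False if the interrupt is not accepted (masked, or not waiting in B)."""
        if self.state != "B" or self.pcr.get("interrupt_mask", False):
            return False
        self._go("C")                                       # source, priority, handler known
        self.pcr["interrupt_source"] = (irq.source, irq.priority)
        if irq.config_ref is not None:
            self._go("R")
            self.pcr["config_ref"] = irq.config_ref
        self._go("N")
        self.pcr["handler_address"] = irq.handler_address
        self._go("X")
        self.notify_iss()
        self._go("B")                                       # wait for the next interrupt
        return True
```

The `trace` list makes the visited states explicit: an ordinary interrupt yields C, N, X, B, while a reconfiguration interrupt inserts R between C and N.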

In the preferred embodiment, the level of interrupt latency specified by the interrupt response signals, and hence by the transition control signals, varies according to the current ISA for which the DRPU 32 has been configured. For example, an ISA dedicated to high-performance real-time motion control requires fast and predictable interrupt response capabilities. The configuration data set corresponding to such an ISA therefore preferably includes interrupt response signals indicating that low-latency interruption is required. The corresponding transition control signals in turn identify multiple ISS states as interruptible, thereby allowing an interrupt to suspend an instruction execution cycle before that cycle has completed. In contrast to an ISA dedicated to real-time motion control, an ISA dedicated to image convolution operations requires interrupt response capabilities that ensure that the number of convolution operations performed per unit time is maximized. The configuration data set corresponding to the image convolution ISA preferably includes interrupt response signals specifying that high-latency interruption is required. The corresponding transition control signals preferably identify state W as interruptible. If the ISS 100 supports multiple sets of states F, D, E and W in parallel, the transition control signals, when configured to implement the image convolution ISA, identify each state W as interruptible and further specify that interrupt servicing is to be deferred until each of the parallel instruction execution cycles has completed its state W operations. This ensures that an entire group of instructions is executed before an interrupt is serviced, thereby maintaining reasonable pipelined execution performance.
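The contrast between the two latency profiles can be expressed as a PLA-like mapping plus a check of whether an interrupt may be serviced now. This Python sketch is illustrative only: the exact set of states enabled for the low-latency case, the dictionary encoding of the transition control signals, and the function names are assumptions. For the high-latency case it applies the deferral rule from the text, waiting until every parallel state set has finished its W operations.

```python
from typing import Dict, Sequence


def transition_control_for(latency: str) -> Dict[str, object]:
    """Hypothetical PLA-style mapping from the interrupt-latency response signal."""
    if latency == "low":     # e.g. real-time motion control: several states interruptible
        return {"interruptible": frozenset("FDEMW"), "defer_until_group_done": False}
    if latency == "high":    # e.g. image convolution: only W, after the whole group finishes
        return {"interruptible": frozenset("W"), "defer_until_group_done": True}
    raise ValueError(f"unknown latency level: {latency!r}")


def may_service(tc: Dict[str, object], lane_states: Sequence[str],
                lanes_w_complete: Sequence[bool]) -> bool:
    """lane_states: current state of each parallel F/D/E/W set;
    lanes_w_complete: whether each set has finished its state W operations."""
    if tc["defer_until_group_done"]:
        return all(lanes_w_complete)
    return all(state in tc["interruptible"] for state in lane_states)
```

Under the low-latency profile an interrupt can be accepted mid-cycle (for example in E), whereas under the convolution profile it waits for the slowest parallel set, trading response time for uninterrupted throughput.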

Analogous to the level of interrupt latency, the level of interrupt precision specified by the interrupt response signals also varies according to the ISA for which the DRPU 32 is configured. For example, if state M is designated as an interruptible state for an outer-loop ISA that supports interruptible multi-cycle operations, the interrupt response signals preferably specify that precise interrupts are required. The transition control signals thus specify that interrupts received in state M are treated as precise interrupts, ensuring that multi-cycle operations can be successfully restarted. As another example, for an ISA that supports non-faulting pipelined arithmetic operations, the interrupt response signals preferably specify that imprecise interrupts are required. The transition control signals then specify that interrupts received in state W are treated as imprecise interrupts.
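To make the precise/imprecise distinction concrete, the Python sketch below models the instructions in flight when an interrupt is accepted. It is a generic illustration of the two conventional notions, not the patent's mechanism: a precise interrupt keeps only the completed prefix of the instruction stream and re-executes everything from the first incomplete instruction, while an imprecise interrupt keeps whatever has completed, even if an older instruction has not. The `(pc, completed)` representation is hypothetical.

```python
from typing import List, Tuple


def take_interrupt(in_flight: List[Tuple[int, bool]], precise: bool) -> Tuple[List[int], List[int]]:
    """in_flight: (pc, completed) pairs, oldest first.

    Returns (kept_pcs, to_reexecute_pcs).
    """
    if precise:
        # Keep only the longest completed prefix; squash the first incomplete
        # instruction and everything younger, so execution restarts cleanly.
        kept = []
        for pc, done in in_flight:
            if not done:
                break
            kept.append(pc)
        redo = [pc for pc, _ in in_flight[len(kept):]]
        return kept, redo
    # Imprecise: results that finished are kept even if an older instruction did not.
    kept = [pc for pc, done in in_flight if done]
    redo = [pc for pc, done in in_flight if not done]
    return kept, redo
```

With `[(0, True), (4, False), (8, True)]`, a precise interrupt keeps `[0]` and restarts at 4, whereas an imprecise one keeps `[0, 8]` and only finishes 4. This is acceptable for non-faulting arithmetic but would break the restart of an interrupted multi-cycle operation.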

For any given ISA, the interrupt response signals are defined or programmed by a portion of the configuration data set corresponding to that ISA. Through programmable interrupt response signals and the generation of corresponding transition control signals, the present invention facilitates the implementation of an optimal interrupt scheme on an ISA-by-ISA basis. Those skilled in the art will recognize that the vast majority of prior-art computer architectures do not provide for the flexible specification of interrupt capabilities, namely programmable state transition enabling, programmable interrupt latency and programmable interrupt precision. In the preferred embodiment, the interrupt logic 106 is implemented as a CLB-based state machine that supports the states described above.

The fetch control unit 108 manages the loading of instructions into the instruction buffer 110 in response to the fetch signal issued by the ISS 100. In the preferred embodiment, the fetch control unit 108 is implemented as a conventional one-hot encoded state machine using flip-flops within a set of CLBs. Those skilled in the art will recognize that in an alternative embodiment, the fetch control unit 108 could be configured as a conventionally encoded state machine or as a ROM-based state machine. The instruction buffer 110 provides temporary storage for instructions loaded from the memory 34. For the implementation of an outer-loop ISA, the instruction buffer 110 is preferably implemented as a conventional RAM-based first-in, first-out (FIFO) buffer using a plurality of CLBs. For the implementation of an inner-loop ISA, the instruction buffer 110 is preferably implemented as a set of flip-flop registers. These use either a plurality of flip-flops within a set of IOBs or a plurality of flip-flops within both IOBs and CLBs.
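The Python sketch below illustrates one-hot encoding and FIFO buffering under stated assumptions. The three states (IDLE, REQUEST, LOAD), the `FetchControl` class, and the buffer depth are hypothetical. Memory 34 is modeled as a plain list of instruction words. Each state owns its own bit, just as a one-hot machine dedicates one flip-flop per state.

```python
from collections import deque

# One-hot encoding: each hypothetical state owns exactly one bit (flip-flop).
IDLE, REQUEST, LOAD = 0b001, 0b010, 0b100


class FetchControl:
    """Sketch of a one-hot fetch controller filling a FIFO instruction buffer."""

    def __init__(self, memory, depth=4):
        self.memory = memory      # stands in for memory 34
        self.depth = depth        # FIFO capacity
        self.buffer = deque()     # first-in, first-out instruction buffer
        self.pc = 0
        self.state = IDLE

    def step(self, fetch_signal: bool) -> None:
        """Advance one clock; fetch_signal models the fetch signal from the ISS."""
        assert bin(self.state).count("1") == 1, "state must be one-hot"
        if self.state & IDLE:
            room = len(self.buffer) < self.depth
            if fetch_signal and room and self.pc < len(self.memory):
                self.state = REQUEST
        elif self.state & REQUEST:
            self.state = LOAD     # memory read in progress
        elif self.state & LOAD:
            self.buffer.append(self.memory[self.pc])
            self.pc += 1
            self.state = IDLE
```

In this model, a decode step would remove the oldest word with `buffer.popleft()`. That frees a FIFO slot, and the controller resumes fetching on the next asserted fetch signal.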

The decode control unit 112 manages the transfer of instructions from the instruction buffer 110 into the instruction decoder 114 in response to the decode signal issued by the ISS 100. For an inner-loop ISA, the decode control unit 112 is preferably implemented as a ROM-based state machine comprising a CLB-based ROM coupled to a CLB-based register. For an outer-loop ISA, the decode control unit 112 is preferably implemented as a CLB-based encoded state machine. For each instruction received as input, the instruction decoder 114 outputs, in a conventional manner, a corresponding opcode, a register file address, and optionally one or more constants. For an inner-loop ISA, the instruction decoder 114 is preferably configured to decode a group of instructions received as input. In the preferred embodiment, the instruction decoder 114 is configured as a CLB-based decoder capable of decoding each of the instructions included in the ISA currently under consideration.
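To make the decoder outputs concrete, the sketch below assumes a hypothetical 16-bit instruction format:

- bits 15-12 hold the opcode;
- bits 11-8 and 7-4 hold two register file addresses;
- for some opcodes, bits 3-0 hold a constant.

The format and the opcode values are invented for illustration; the patent leaves the encoding to each ISA. `decode_group` mimics the inner-loop case, in which a group of instructions is decoded together.

```python
from typing import NamedTuple, Optional


class Decoded(NamedTuple):
    opcode: int
    rf_addrs: tuple            # register file address(es)
    constant: Optional[int]    # present only for immediate-form opcodes


# Hypothetical opcodes whose low nibble carries an immediate constant.
IMMEDIATE_OPCODES = {0x1, 0x2}


def decode(word: int) -> Decoded:
    """Split a hypothetical 16-bit instruction word into decoder outputs."""
    opcode = (word >> 12) & 0xF
    rd = (word >> 8) & 0xF
    rs = (word >> 4) & 0xF
    if opcode in IMMEDIATE_OPCODES:
        # Immediate form: one register address plus a 4-bit constant.
        return Decoded(opcode, (rd,), word & 0xF)
    return Decoded(opcode, (rd, rs), None)


def decode_group(words):
    """Inner-loop style: decode a whole group of instructions at once."""
    return [decode(w) for w in words]
```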

The opcode storage register set 116 provides temporary storage for each opcode output by the instruction decoder 114 and outputs each opcode to the ISS 100.

When an outer-loop ISA is implemented in the DRPU 32, the opcode storage register set 116 is preferably implemented using an optimal number of flip-flop register banks. The flip-flop register banks receive signals from the instruction decoder 114 that represent class or group codes. These codes are derived from the opcode literal bit fields of instructions previously queued through the instruction buffer 110. The flip-flop register banks store these class or group codes according to a decoding scheme that preferably minimizes ISS complexity.

In the case of an inner-loop ISA, the opcode storage register set 116 preferably stores opcode hint signals derived more directly from the opcode bit fields output by the instruction decoder 114. Inner-loop ISAs necessarily have smaller opcode literal bit fields. This minimizes the implementation requirements for instruction sequencing, namely buffering by the instruction buffer 110, decoding by the instruction decoder 114, and opcode indication by the opcode storage register set 116.

In summary, for outer-loop ISAs the opcode storage register set 116 is preferably implemented as a small aggregate of flip-flop register banks whose bit width equals the opcode literal size or a fraction thereof. For inner-loop ISAs, the opcode storage register set 116 is preferably a smaller and more uniform flip-flop register bank than for outer-loop ISAs. The reduced register bank size in the inner-loop case reflects the minimal instruction count characteristic of inner-loop ISAs relative to outer-loop ISAs.
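One way to picture the class or group codes stored for outer-loop ISAs is a table that maps each opcode literal to a dense class code. The ISS then only has to sequence on a few bits. The classes and opcode values below are hypothetical. The sketch shows how an assumed 8-bit opcode literal might collapse to a 2-bit class code, a fraction of the literal size.

```python
# Hypothetical outer-loop opcode literals (8-bit), grouped into classes that
# the ISS would sequence identically (e.g. all ALU ops share one state path).
OPCODE_CLASSES = {
    "alu":    {0x00, 0x01, 0x02, 0x03},
    "load":   {0x10, 0x11},
    "store":  {0x18},
    "branch": {0x20, 0x21, 0x22},
}


def build_class_table(classes):
    """Assign each class a dense code; return (opcode -> code map, code width in bits)."""
    table = {}
    for code, name in enumerate(sorted(classes)):
        for op in classes[name]:
            table[op] = code
    # Number of bits needed to represent codes 0 .. len(classes) - 1.
    width = max(1, (len(classes) - 1).bit_length())
    return table, width
```

With these four classes, the opcode storage register would need only 2 bits per entry instead of 8.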

The RF address register set 118 and the constants register set 120 provide temporary storage for each register file address and each constant, respectively, output by the instruction decoder 114. In the preferred embodiment, the opcode storage register set 116, the RF address register set 118, and the constants register set 120 are each implemented as a set of CLBs configured for data storage.

The memory access logic 102 is a memory control circuit. It manages and synchronizes the transfer of data between the memory 34, the DOU 62, and the AOU 64 according to the atomic memory address size specified in the architecture description memory 122. The memory access logic 102 additionally manages and synchronizes the transfer of data and commands between the S-machine 12 and a given T-machine 14. In the preferred embodiment, the memory access logic 102 supports burst-mode memory accesses and is preferably implemented as a conventional RAM controller using CLBs. Those skilled in the art will recognize that during reconfiguration, the input and output pins of the reconfigurable logic device are tri-stated. Resistive terminations then establish inactive (undriven) logic levels, so the pins do not disturb the memory 34. In an alternative embodiment, the memory access logic 102 could be implemented outside the DRPU 32.

Referring now to FIGS. 7A and 7B, a flowchart of a preferred method for scalable, parallel, dynamically reconfigurable computing is shown. Preferably, the method of FIGS. 7A and 7B is performed within each S-machine 12 in the system 10. The preferred method begins in step 1000 of FIG. 7A with the reconfiguration logic 104 retrieving a configuration data set corresponding to an ISA. Next, in step 1002, the reconfiguration logic 104 configures each element within the IFU 60, the DOU 62, and the AOU 64 according to the retrieved configuration data set. This produces a DRPU hardware organization for implementing the ISA currently under consideration. Following step 1002, in step 1004, the interrupt logic 106 retrieves the interrupt response signals stored in the architecture description memory 101. It then generates a corresponding set of transition control signals that specify how the current DRPU configuration responds to interrupts. The ISS 100 subsequently initializes program state information in step 1006, after which the ISS 100 initiates an instruction execution cycle in step 1008.

Next, in step 1010, the ISS 100 or the interrupt logic 106 determines whether reconfiguration is required. The ISS 100 determines that reconfiguration is required when a reconfiguration directive is selected during program execution. The interrupt logic 106 determines that reconfiguration is required in response to a reconfiguration interrupt. If reconfiguration is required, the preferred method proceeds to step 1012, in which a reconfiguration handler saves program state information. Preferably, the program state information includes a reference to the configuration data set corresponding to the current DRPU configuration. After step 1012, the preferred method returns to step 1000 to retrieve the next configuration data set, as referenced by the reconfiguration directive or the reconfiguration interrupt.

If no reconfiguration is required in step 1010, the interrupt logic 106 determines in step 1014 whether a non-reconfiguration interrupt requires servicing. If so, the ISS 100 next determines in step 1020 whether the transition control signals allow a state transition from the current ISS state within the instruction execution cycle to the interrupt service state. If such a transition is not allowed, the ISS 100 advances to the next state in the instruction execution cycle and returns to step 1020. If the transition is allowed, the ISS 100 next proceeds to the interrupt service state in step 1024. In step 1024, the ISS 100 saves the program state information and executes program instructions for servicing the interrupt. Following step 1024, the preferred method returns to step 1008. There it resumes the current instruction execution cycle if that cycle has not been completed, or otherwise initiates the next instruction execution cycle.

If no non-reconfiguration interrupt requires servicing in step 1014, the preferred method proceeds to step 1016 and determines whether execution of the current program is complete. If execution of the current program is to continue, the preferred method returns to step 1008 to initiate another instruction execution cycle. Otherwise, the preferred method ends.
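The control flow of FIGS. 7A and 7B can be sketched in Python as a simplified software model. This is not the hardware implementation. The following are all hypothetical assumptions:

- the `SMachine` class;
- the textual `reconfig X` directive syntax;
- the ISS state names;
- the representation of interrupts as a map from program counter to event.

Comments tie each part of the model to the step numbers of the flowchart.

```python
from dataclasses import dataclass, field

# Hypothetical ISS states for one instruction execution cycle.
ISS_STATES = ("FETCH", "DECODE", "EXECUTE", "WRITEBACK")


@dataclass
class SMachine:
    # ISA name -> ISS states from which interrupt service may be entered
    # (stands in for the transition control signals generated in step 1004).
    configs: dict
    isa: str = ""
    allowed: frozenset = frozenset()
    saved: list = field(default_factory=list)   # saved program state (step 1012)
    trace: list = field(default_factory=list)

    def configure(self, isa):
        # Steps 1000-1004: retrieve the data set, configure, derive controls.
        self.isa = isa
        self.allowed = frozenset(self.configs[isa])
        self.trace.append(f"config {isa}")

    def run(self, program, isa, interrupts):
        """program: instruction strings, where "reconfig X" is a directive.
        interrupts: maps a program counter to "irq" or "reconfig X"
        (entries are consumed as they are raised)."""
        self.configure(isa)
        pc = 0                                   # step 1006
        pending = None
        while pc < len(program):                 # step 1016
            insn = program[pc]                   # step 1008
            event = interrupts.pop(pc, None)
            if event == "irq":
                pending = "irq"
            # Step 1010: reconfiguration interrupt or reconfiguration directive.
            target = None
            if event and event.startswith("reconfig "):
                target = event.split()[1]
            elif insn.startswith("reconfig "):
                target = insn.split()[1]
                pc += 1                          # the directive is consumed
            if target:
                self.saved.append((self.isa, pc))  # step 1012, with config reference
                self.configure(target)             # back to step 1000
                continue
            for state in ISS_STATES:
                # Steps 1014-1024: enter interrupt service only from an ISS
                # state that the current transition controls allow.
                if pending and state in self.allowed:
                    self.trace.append(f"irq@{state}")
                    pending = None
            # The instruction is logged once its execution cycle completes.
            self.trace.append(f"{self.isa}:{insn}")
            pc += 1
        return self.trace
```

For example, take `SMachine({"A": {"WRITEBACK"}, "B": set(ISS_STATES)})` running `["add", "reconfig B", "mul"]` with interrupts at program counters 0 and 2. Under ISA `A`, the first interrupt is held off until WRITEBACK. After the directive switches to ISA `B`, the second interrupt is serviced immediately in FETCH. This mirrors the per-ISA transition control of step 1004.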

The teachings of the present invention differ significantly from other systems and methods for reprogrammable or reconfigurable computing. In particular, the present invention is not equivalent to downloadable microcode architectures, because such architectures generally rely on non-reconfigurable control means and non-reconfigurable hardware. The present invention is also clearly distinguished from an attached reconfigurable processor (ARP) system. In such a system, a set of reconfigurable hardware resources is coupled to a non-reconfigurable host processor or host system. An ARP apparatus depends on the host for the execution of some program instructions. The set of available silicon resources is therefore not maximally utilized over the time frame of program execution. While the host operates on data, the silicon resources of the ARP apparatus are idle or inefficiently utilized, and vice versa. In contrast, each S-machine 12 is an independent computer on which entire programs can readily be executed. Multiple S-machines 12 preferably execute programs simultaneously. The present invention therefore teaches the maximal utilization of silicon resources at all times. This holds both for individual programs executing on individual S-machines 12 and for multiple programs executing across the entire system 10.

An ARP apparatus provides a computational accelerator for a particular algorithm at a particular time and is implemented as a set of gates optimally interconnected for that algorithm. ARP systems avoid using reconfigurable hardware resources for general-purpose operations such as managing instruction execution. Moreover, an ARP system does not treat a given set of interconnected gates as a readily reusable resource. In contrast, the present invention teaches a dynamically reconfigurable processing unit configured to efficiently manage instruction execution. It does so according to the instruction execution model best suited to the computational requirements at a particular moment. Each S-machine 12 includes a plurality of readily reusable resources, for example the ISS 100, the interrupt logic 106, and the store/align logic 152. The present invention teaches the use of reconfigurable logic resources at the level of groups of CLBs, IOBs, and reconfigurable interconnects rather than at the level of interconnected gates. It thus teaches higher-level reconfigurable logic design constructs that are useful for operations on an entire class of computational problems. This contrasts with a single gate interconnection scheme that is useful for only one algorithm.

In general, ARP systems are directed toward translating a particular algorithm into a set of interconnected gates. Some ARP systems attempt to compile high-level instructions into an optimal gate-level hardware configuration, which in general is an NP-hard problem. In contrast, the present invention teaches a compiler for dynamically reconfigurable computing. This compiler translates high-level program instructions into assembly language instructions according to a variable ISA in a very straightforward manner.
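The straightforward compilation described above can be sketched as follows. A reconfiguration directive simply switches the active set of translation rules and is itself emitted as an instruction. The rule tables, ISA names, mnemonics, and the tuple form of statements are hypothetical. The medium-level intermediate stage recited in the claims is omitted for brevity.

```python
# Hypothetical rule sets: each maps a high-level operator to an
# assembly template for one ISA.
RULES = {
    "outer": {"+": "ADD {d}, {a}, {b}", "*": "MUL {d}, {a}, {b}"},
    "inner": {"+": "VADD {d}, {a}, {b}", "*": "VMAC {d}, {a}, {b}"},
}


def compile_program(statements, isa):
    """statements: ("reconfig", isa_name) directives or (dest, op, a, b) tuples."""
    rules = RULES[isa]                      # current set of rules
    out = []
    for stmt in statements:
        if stmt[0] == "reconfig":
            isa = stmt[1]
            rules = RULES[isa]              # switch translation rules
            out.append(f"RECONFIG {isa}")   # directive becomes an executable instruction
            continue
        dest, op, a, b = stmt
        out.append(rules[op].format(d=dest, a=a, b=b))
    return out
```

In this sketch, each rule set is just a lookup table, so switching ISAs costs the compiler no more than swapping dictionaries. No gate-level synthesis is involved.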

An ARP apparatus is generally unable to treat its own host program as data or to contextualize itself. In contrast, each S-machine in the system 10 can treat its own program as data and can thus readily contextualize itself. The system 10 can readily simulate itself by executing its own programs. The present invention is additionally able to compile its own compiler.

In the present invention, a single program may include a first group of instructions belonging to a first ISA, a second group belonging to a second ISA, a third group belonging to yet another ISA, and so on. The architecture taught herein executes each such group of instructions using hardware that is configured at run time to implement the ISA to which those instructions belong. No prior art system or method offers similar techniques.

While the present invention has been described with reference to certain preferred embodiments, those skilled in the art will recognize that various modifications can be made. Variations and modifications of the preferred embodiments are provided by the present invention, which is limited only by the following claims.

In summary:
The invention relates to a system and method for scalable, parallel, dynamically reconfigurable computing. The system is formed by:

- a set of S-machines;
- a T-machine corresponding to each S-machine;
- a general-purpose interconnect matrix (GPIM);
- a set of I/O T-machines;
- a set of I/O devices;
- a master time base unit.

Each S-machine is a dynamically reconfigurable computer having a memory, a first local time base unit, and a dynamically reconfigurable processing unit (DRPU). The DRPU is implemented using a reprogrammable logic device configured as an instruction fetch unit (IFU), a data operation unit (DOU), and an address operation unit (AOU). Each of these units is selectively reconfigured during program execution in response to a reconfiguration interrupt or to the selection of a reconfiguration directive embedded within a set of program instructions. Each reconfiguration interrupt and each reconfiguration directive references a configuration data set. That data set specifies a DRPU hardware organization optimized for implementing a particular instruction set architecture (ISA). The IFU manages reconfiguration operations, instruction fetch and decode operations, and memory access operations. It also issues control signals to the DOU and the AOU to facilitate instruction execution. The DOU performs data computations and the AOU performs address computations. Each T-machine is a data transfer device having a common interface and control unit, one or more interconnect I/O units, and a second local time base unit. The GPIM is a scalable interconnection network that facilitates parallel communication between T-machines. Together, the set of T-machines and the GPIM facilitate parallel communication between S-machines.

Claims

A method for generating, from a plurality of high-level statements, instructions executable by a reconfigurable computer having a dynamically reconfigurable processing unit with a variable internal hardware organization, the method comprising the steps of: a) providing a plurality of sets of rules for translating high-level statements into instructions executable by the computer when reconfigured for a respective one of a plurality of instruction set architectures; b) selecting one of the plurality of sets of rules as the current set of rules to be used for translating high-level statements into instructions executable by the reconfigured computer; c) if a high-level statement is a reconfiguration directive, c1) changing the current set of rules used for translating high-level statements to a set of rules specified in the reconfiguration directive, and c2) translating the selected high-level statement into at least one instruction executable by the reconfigured computer using the current set of rules; d) translating a high-level statement into a medium-level statement, wherein d1) the reconfiguration directive is translated into a medium-level reconfiguration statement if the selected high-level statement is a reconfiguration directive, and d2) a set of rules corresponding to the instruction set architecture specified by the medium-level reconfiguration statement is selected; and e) translating a medium-level statement into an assembly language statement using the selected set of rules corresponding to the instruction set architecture specified by the medium-level reconfiguration statement.

The method of claim 1, wherein the method is performed by a compiler.