DE3740834A1

DE3740834A1 - MAINTAINING COHERENCE BETWEEN A MICROPROCESSOR-INTEGRATED CACHE AND AN EXTERNAL MEMORY

Info

Publication number: DE3740834A1
Application number: DE19873740834
Authority: DE
Inventors: Alon Shacham; Jonathan Levy
Original assignee: National Semiconductor Corp
Current assignee: National Semiconductor Corp
Priority date: 1987-01-22
Filing date: 1987-12-02
Publication date: 1988-08-04
Also published as: JPS63193246A; GB2200481A; GB8729324D0; GB2200481B

Abstract

A method of maintaining data coherency between a microprocessor's integrated (on-chip) cache memory 14, 16 and its associated external main memory is provided. When data is written to external memory, the address of the external memory write is compared with the address tags of the integrated cache memory entries. If the comparison results in a match, data at locations within the cache corresponding to the write address are invalidated by execution of an invalidation instruction without adversely affecting the microprocessor's performance.

Description

Background of the invention

1. Field of the invention

The present invention relates to data processing systems and, in particular, to a method for maintaining coherence between a microprocessor-integrated cache memory and an external memory without adversely affecting microprocessor performance.

2. Discussion of the prior art

In conventional data processing system architectures, a central processing unit processes instructions and operands that it retrieves from memory over an external interface bus. Because the central processing unit can execute instructions and operands much faster than they can be retrieved from external memory, a small high-speed buffer, or cache memory, is often placed between the central processing unit and the external memory to minimize the time the central processing unit spends waiting for instructions and data.

A cache memory dynamically replaces its contents to ensure that the information most likely to be used is readily available to the central processing unit. When the central processing unit needs information, it accesses the cache; if the requested information is found there, no access to external memory over the external interface bus is needed.

Cache memory is a feature that has only recently been introduced to high-performance computers. In these microprocessor architectures, however, the cache, although located within the microprocessor's computing cluster, is not integrated on the same semiconductor chip as the microprocessor. Integrating the cache memory on-chip would further reduce the time delay inherent in going off-chip for information. An integrated cache is essential for achieving peak performance from a microprocessor.

To achieve peak performance and correct operation, a cache memory must reflect the most up-to-date information that may be brought into the central processing unit. This requirement is usually called "maintaining cache coherence". The coherence problem can be summarized by the following sequence of events. First, the cache obtains a copy of an information item from an address in external memory. The item at that address in external memory is then modified by data written by an external device. As a result, a "stale" item exists in the cache. To maintain coherence between the external memory and the cache, the stale item in the cache must be either updated or invalidated before the central processing unit requests the information for the corresponding address.
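The sequence of events above can be sketched in a few lines of Python. This is only an illustrative model, not the patented circuit: the dictionaries `memory` and `cache` and the functions `cpu_read` and `external_write` are hypothetical names introduced here.

```python
# Hypothetical model: dicts stand in for external memory and the cache.
memory = {0x1000: "old"}
cache = {}

def cpu_read(addr):
    """Return the cached copy if present; otherwise fetch from memory and cache it."""
    if addr not in cache:
        cache[addr] = memory[addr]
    return cache[addr]

def external_write(addr, value, invalidate):
    """An external device writes memory; optionally invalidate the cached copy."""
    memory[addr] = value
    if invalidate:
        cache.pop(addr, None)

first = cpu_read(0x1000)                          # cache obtains a copy
external_write(0x1000, "new", invalidate=False)   # memory changes behind the cache
stale = cpu_read(0x1000)                          # cache still holds the old item
external_write(0x1000, "newer", invalidate=True)  # write plus invalidation
fresh = cpu_read(0x1000)                          # miss, refetched from memory
```

Without the invalidation, the second read returns the stale item; with it, the next read misses and picks up the current memory contents.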

In conventional microprocessor designs that use off-chip caches, cache entry invalidations are performed by presenting the addresses of the modified memory locations in external memory to a set of cache address tags for comparison. In some cases, an extra set of cache tags is used to avoid interference with the microprocessor's cache references.

This comparison and the resulting cache invalidation, if any, are performed over the system interface bus.

However, applying the conventional cache entry invalidation technique to an integrated cache raises a number of problems. First, additional pins may be needed on the microprocessor to input the invalidation addresses. If additional pins are not added and the pins used for the microprocessor's external references are also used to input cache invalidation addresses, contention for the interface bus is created and microprocessor performance is degraded. Second, an additional set of on-chip cache tags may be needed for comparison with the invalidation addresses. Otherwise, if only one set of tags is used, there would be contention for the tags and, again, performance would suffer.

Summary of the invention

Accordingly, it is an object of the present invention to provide a method for maintaining coherence in an integrated cache memory without adversely affecting the performance of the associated central processing unit.

It is a further object of the present invention to provide a method for externally monitoring the contents of an on-chip cache.

It is a further object of the present invention to provide selective invalidation of on-chip cache memory locations.

The above objects are achieved by the present invention by limiting the number of pins on the microprocessor interface through specifying the cache locations to be invalidated; that is, the cache set to be invalidated is specified rather than the main memory address. By using a separate invalidation bus, cache invalidations can occur without interfering with the microprocessor's external references. By using dual-ported valid bits in the on-chip caches, invalidations can occur without interfering with the microprocessor's on-chip references.
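To illustrate why specifying a cache set needs far fewer pins than a full 32-bit address, the following sketch derives a set number from a physical address. The 16-byte block size and the 5-bit set address are taken from elsewhere in this document; the particular bit positions used, and the name `set_index`, are assumptions made for illustration only.

```python
BLOCK_BYTES = 16   # block size stated in this document
SET_BITS = 5       # width of the set address carried on CIA(0:4)

def set_index(phys_addr: int) -> int:
    """Assumed mapping: drop the 4 byte-offset bits, keep the next 5 bits."""
    return (phys_addr // BLOCK_BYTES) % (1 << SET_BITS)
```

Under this assumed mapping, external logic would drive only the 5-bit result onto the invalidation bus instead of routing the whole address into the chip.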

A "bus watcher" circuit is provided that holds the additional copy of the cache tags, rather than placing those tags on the microprocessor. This reduces the cost of the microprocessor. Moreover, the bus watcher is not needed if the number of invalidations is low. Whenever a location in main memory is modified, it is possible to invalidate the cache set that would contain that location, even if a different address is stored in the cache. This saves the cost of a dedicated bus watcher but reduces performance, because unnecessary invalidations are performed. The cost/performance tradeoff of whether to include a bus watcher is left to the system designer.
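The tradeoff between the two approaches can be sketched as follows. This is a simplified model under stated assumptions: one tag per set (direct-mapped), 16-byte blocks, 32 sets, and an assumed index/tag split. The class `BusWatcher` and the helper names are hypothetical and do not describe the actual circuit.

```python
BLOCK_BYTES = 16
NUM_SETS = 32

def split(addr):
    """Assumed address split: returns (set index, tag)."""
    block = addr // BLOCK_BYTES
    return block % NUM_SETS, block // NUM_SETS

class BusWatcher:
    """Holds a duplicate copy of the cache tags outside the microprocessor."""
    def __init__(self):
        self.tags = [None] * NUM_SETS

    def note_fill(self, addr):
        """Record that the CPU cached the block containing addr."""
        s, t = split(addr)
        self.tags[s] = t

    def on_external_write(self, addr):
        """Return the set to invalidate, or None if the block is not cached."""
        s, t = split(addr)
        if self.tags[s] == t:
            self.tags[s] = None
            return s
        return None

def invalidate_without_watcher(addr):
    """No tag copy available: always invalidate the set the address maps to."""
    return split(addr)[0]
```

With the watcher, a write to an address that merely shares a set with a cached block causes no invalidation; without it, the same write invalidates the set unnecessarily, which is the performance cost the text describes.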

The on-chip cache of the microprocessor described here includes a 512-byte instruction cache and a separate 1024-byte data cache. The instruction cache and the data cache can be enabled separately. The contents of the two caches can optionally be locked to fixed memory locations. By allowing specific locations to be locked in the caches, the central processing unit offers very fast on-chip access to critical instructions and data, which can be of great benefit in real-time applications.

A cache invalidate instruction can be executed to invalidate either the entire instruction cache and/or data cache, or only a single 16-byte block in one or both caches.

Use of the caches can be inhibited for individual memory locations by means of a cache inhibit input signal, which indicates to the central processing unit that the memory reference of the current bus cycle is not cacheable.
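The effect of a cache inhibit input on a read can be sketched as below. The function `read` and its `cache_inhibit` callback are hypothetical; the callback stands in for whatever external logic drives the inhibit signal during the bus cycle.

```python
def read(addr, cache, memory, cache_inhibit):
    """Read addr; fill the cache on a miss unless the location is inhibited."""
    if addr in cache:
        return cache[addr]
    value = memory[addr]
    if not cache_inhibit(addr):   # inhibit asserted: do not cache this reference
        cache[addr] = value
    return value
```

A system designer might, for example, assert the inhibit signal for a device register region so that every read of it goes to the bus and never returns a stale cached value.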

Description of the drawings

Fig. 1 is a schematic block diagram showing a general microprocessor architecture that uses a method for maintaining cache coherence in accordance with the present invention.

Fig. 2 is a schematic diagram illustrating the interface signals of the microprocessor described herein.

Fig. 3 is a schematic block diagram illustrating the major functional units and interconnecting buses of the microprocessor described herein.

Fig. 4 is a schematic block diagram illustrating the structure of the integrated instruction cache of the microprocessor described herein.

Fig. 5 is a schematic block diagram illustrating the structure of the integrated data cache of the microprocessor described herein.

Fig. 6 is a timing diagram illustrating the timing sequence for access to the data cache.

Fig. 7 is a schematic diagram illustrating the general structure of the four-stage instruction pipeline of the microprocessor described herein.

Fig. 8 is a timing diagram illustrating the pipeline timing for an internal data cache hit.

Fig. 9 is a timing diagram illustrating the pipeline timing for an internal data cache miss.

Fig. 10 is a timing diagram illustrating the effect of an address register interlock on the pipeline timing.

Fig. 11 is a timing diagram illustrating the effect of correctly predicting a branch instruction in the operation of the microprocessor described herein.

Fig. 12 is a timing diagram illustrating the effect of incorrectly predicting the resolution of a branch instruction in the operation of the microprocessor described herein.

Fig. 13 is a timing diagram illustrating the relationship between the clock (CLK) input and the BUSCLK output signals of the microprocessor described herein.

Fig. 14 is a timing diagram illustrating the basic read cycle of the microprocessor described herein.

Fig. 15 is a timing diagram illustrating the basic write cycle of the microprocessor described herein.

Fig. 16 is a timing diagram illustrating a read cycle of the microprocessor described herein, extended by two wait cycles.

Fig. 17 is a timing diagram illustrating a burst read cycle having three transfers that is terminated by the microprocessor described herein.

Fig. 18 is a timing diagram illustrating a burst read cycle terminated by the microprocessor described herein, the burst cycle having two transfers, the second of which is extended by one wait state.

Fig. 19 is a schematic block diagram illustrating a bus watcher used to maintain cache coherence in accordance with the present invention.

Fig. 20 is a schematic block diagram illustrating a cache coherence solution for a system with a low invalidation rate.

Fig. 21 is a schematic block diagram illustrating a cache coherence solution for a system with a high invalidation rate.

Fig. 22 is a schematic block diagram illustrating a cache coherence solution for a system with a high invalidation rate and a large external cache memory.

Detailed description of a preferred embodiment

Fig. 1 shows the general architecture of a microprocessor (CPU) 10 that uses a method of maintaining coherence in an integrated cache memory in accordance with the present invention.

CPU 10 initiates bus cycles to communicate with external memory and other devices in the system in order to fetch instructions, read and write data, perform floating-point operations, and respond to exception requests. CPU 10 includes a four-stage instruction pipeline 12 capable of executing up to 10 MIPS (million instructions per second) at 20 MHz. Integrated on-chip with instruction pipeline 12 are three storage buffers that sustain the heavy demand of pipeline 12 for instructions and data. The storage buffers comprise a 512-byte instruction cache 14, a 1024-byte data cache 16, and a 64-entry translation buffer located in an integrated memory management unit (MMU) 18. The primary functions of MMU 18 are to arbitrate requests for memory references and to translate virtual addresses into physical addresses. An integrated bus interface unit (BIU) 20 controls the bus cycles for external references.

Placing the cache and memory management functions on the same chip as instruction pipeline 12 provides excellent cost/performance through improved memory access time and bandwidth for all applications.

Both instruction cache 14 and data cache 16 are physical caches. This is important for supporting cache coherence with external caches and memory. In multiprocessor systems, or during direct memory access (DMA) operations in any system, data can be written to external memory while the same address exists in internal caches 14, 16 and must therefore be invalidated. If internal caches 14, 16 were virtual, a single cache entry would be very difficult to invalidate, since the external address is physical. Physical caches allow invalidation of individual entries.
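The difficulty with virtual caches can be made concrete with a small sketch. Everything here is a hypothetical model: the page size `PAGE`, the dictionary-based caches, and the `page_table` mapping are assumptions used only to contrast the two lookups.

```python
PAGE = 0x1000  # assumed page size for this illustration

def invalidate_physical(cache, paddr):
    """Physically addressed cache: the external address is the lookup key."""
    cache.pop(paddr, None)

def invalidate_virtual(cache, page_table, paddr):
    """Virtually addressed cache: every entry must be translated and compared."""
    for vaddr in list(cache):
        vpage, offset = vaddr - vaddr % PAGE, vaddr % PAGE
        if page_table[vpage] + offset == paddr:
            del cache[vaddr]
```

In the physical case a single lookup suffices. In the virtual case the invalidating logic needs the address translation and must consider every entry, since several virtual addresses may map to the same physical location.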

CPU 10 is also compatible with available peripheral devices, such as an interrupt control unit (ICU) 24, for example the NS32202. The ICU interface to CPU 10 is completely asynchronous, so that ICU 24 can be operated at lower frequencies than CPU 10.

CPU 10 incorporates its own clock generator. Therefore, no separate timing control unit is needed.

CPU 10 also supports both an external cache memory 25 and a "bus watcher" circuit 26, described in detail below, which helps maintain internal cache coherence. As shown in Fig. 2, CPU 10 has 114 interface signals for bus timing and control, cache control, exception requests, and other functions. The following list summarizes the CPU 10 interface signal functions:

Input signals

Burst Acknowledge (active low). When active in response to a burst request, indicates that the memory supports burst cycles.

Bus Error (active low). Indicates to CPU 10 that an error was detected during the current bus cycle.

Bus Retry (active low). Indicates that CPU 10 must perform the current bus cycle again.

BW0-BW1: Bus Width (2 encoded pins). These pins define the bus width (8, 16 or 32 bits) for each data transfer, as shown in Table 1.

CIA0-CIA6: Cache Invalidation Address (7 encoded pins). The cache invalidation address is presented on the CIA bus. Table 2 shows the CIA pins relevant to each of the internal caches of CPU 10.

Table 2
CIA(0:4)    Set address in DC and IC
CIA(5:6)    Reserved

CII: Cache Inhibit In (active high). Indicates to CPU 10 that the memory reference of the current bus cycle is not cacheable.

Cache Invalidation Enable. Input that determines whether the external cache invalidation option or the test mode operation is selected.

CLK: Clock. Input clock used to derive all timing for CPU 10.

Debug Trap Request (falling-edge activated). A high-to-low transition of this signal causes a trap (DBG).

Hold Request (active low). Requests CPU 10 to release the bus for DMA or multiprocessor purposes.

Interrupt (active low). Maskable interrupt request.

Invalidate Set (active low). When low, only one set in the on-chip caches is invalidated; when high, the entire cache is invalidated.

Invalidate Data Cache (active low). When low, an invalidation is performed in the data cache.

Invalidate Instruction Cache (active low). When low, an invalidation is performed in the instruction cache.

I/O Decode (active low). Indicates to CPU 10 that a peripheral device is addressed by the current bus cycle.

Nonmaskable Interrupt (falling-edge activated). A high-to-low transition of this signal requests a nonmaskable interrupt.

RDY: Ready (active high). While this signal is not active, CPU 10 extends the current bus cycle to support slow memory or peripheral devices.

Reset (active low). Generates a reset exception to initialize CPU 10.

Slave Done (active low). Indicates to CPU 10 that a slave processor has completed executing an instruction.

Slave Trap (active low). Indicates to CPU 10 that a slave processor has detected a trap condition while executing an instruction.

Output signals

A0-A31: Address bus (3-state, 32 lines). Transfers the 32-bit address during a bus cycle; A0 transfers the least significant bit.
Address strobe (active low, 3-state). Indicates that a bus cycle has begun and that a valid address is on the address bus.
Byte enables (active low, 3-state, 4 lines). Signals that enable transfers on each byte of the data bus, as shown in Table 3.

Table 3

Begin memory transaction (active low, 3-state). Indicates that the current bus cycle is valid, that is, the bus cycle has not been cancelled; available earlier in the bus cycle than the confirm bus cycle signal.
Breakpoint (active low). Indicates that CPU 10 has detected a debug condition.
Burst request (active low, 3-state). Indicates that CPU 10 is requesting to perform burst cycles.
BUSCLK: Bus clock. Output clock for bus timing.
CASEC: Cache section (3-state). For cacheable data read bus cycles, indicates the section of on-chip data cache 16 into which the data will be placed.
CIO: Cache inhibit out (active high). Indication by CPU 10 that the memory reference of the current bus cycle is not cacheable; controlled by the CI bit in the level-2 page table entry.
Confirm bus cycle (active low, 3-state). Indicates that the bus cycle begun with ADS is valid; that is, the bus cycle has not been cancelled.
Data direction (active low, 3-state). Indicates the direction of transfers on the data bus; when low during a bus cycle, CPU 10 is reading data; when high during a bus cycle, CPU 10 is writing data.
Hold acknowledge (active low). Asserted by CPU 10 in response to the HOLD input to indicate that CPU 10 has released the bus.
Interlocked bus cycle (active low). Indicates that a sequence of bus cycles with interlock protection is in progress.
I/O inhibit (active low). Indicates that the current bus cycle should be ignored if a peripheral device is addressed.
ISF: Internal sequential fetch. Indicates, along with PFS, that the instruction beginning execution is sequential (ISF = low) or non-sequential (ISF = high).
Program flow status (active low). A pulse on this signal indicates the beginning of execution for each instruction.
Slave processor control (active low). Data strobe for slave processor bus cycles.
ST0-ST4: Status (5 encoded lines). Bus cycle status code; ST0 is the least significant bit. The encoding is shown in Table 4.
U/: User/supervisor (3-state). Indicates user mode (high) or supervisor mode (low).

Bi-directional signals

D0-D31: Data bus (3-state, 32 lines). Transfers 8, 16 or 32 bits of data during a bus cycle; D0 transfers the least significant bit.

Table 4

Referring to Fig. 3, CPU 10 is organized internally as eight major functional units that operate in parallel to perform the following operations to execute instructions:

prefetch, decode, calculate effective addresses and read source operands, calculate results and store them to registers, store results to memory.

A loader 28 prefetches instructions and decodes them for use by an address unit 30 and an execution unit 32. Loader 28 transfers instructions received from instruction cache 14 on the IBUS bus into an 8-byte instruction queue. Loader 28 can extract one instruction field on each cycle, where a "field" is either an opcode (1 to 3 bytes, including addressing mode specifiers), a displacement or an immediate value. Loader 28 decodes the opcode to generate the initial microcode address, which is passed on the LADR bus to execution unit 32. The decoded general addressing modes are passed on the ADMS bus to address unit 30. Displacement values are passed to address unit 30 on the DISP bus. Immediate values are available on the GCBUS bus. Loader 28 also includes a branch prediction mechanism, which is described in greater detail below.

Address unit 30 calculates effective addresses using a dedicated 32-bit adder and reads source operands for execution unit 32. Address unit 30 controls a port from register file 34 to the GCBUS, through which it transfers base and index values to the address adder and data values to execution unit 32. Effective addresses for operand references are transferred to MMU 18 and data cache 16 on the GVA bus, which is the virtual address bus. Execution unit 32 includes the data path and the microcoded control for executing instructions and for the processing sequences. The data path includes a 32-bit arithmetic logic unit (ALU), a 32-bit barrel shifter, an 8-bit priority encoder and a number of counters. Special-purpose hardware incorporated in execution unit 32 supports multiplication, retiring one bit per cycle with optimization for multipliers of small absolute value.

Execution unit 32 controls a port to register file 34 from the GNA bus, on which it stores results. The GNA bus is also used by execution unit 32 to read the values of dedicated registers, such as the configuration and interrupt base registers, which are included in register file 34. A two-entry data buffer allows execution unit 32 to overlap the execution of one instruction with the storing of results to memory for previous instructions. The GVA bus is used by execution unit 32 to perform memory references for complex instructions, for example string operations, and for output control.

Register file 34 is dual-ported, allowing read access by address unit 30 on the GCBUS and read/write access by execution unit 32 on the GNA bus. Register file 34 holds the general-purpose registers, the dedicated registers and the program counter values for address unit 30 and execution unit 32.

MMU 18 is compatible with the memory management functions of CPU 10. Instruction cache 14, address unit 30 and execution unit 32 make requests to MMU 18 for memory references. MMU 18 handles the requests, granting access to transfer a virtual address on the GVA bus. MMU 18 translates the virtual address it receives on the GVA bus into the corresponding physical address, using its translation buffer. MMU 18 transfers the physical address on the MPA bus to either instruction cache 14 or data cache 16, depending on whether an instruction or a data reference is being performed. The physical address is also transferred to BIU 20 for an external bus cycle.

Bus interface unit (BIU) 20 controls the bus cycles for references by instruction cache 14, address unit 30 and execution unit 32. BIU 20 contains a three-entry buffer for external references. Thus, for example, BIU 20 can be performing a bus cycle for an instruction fetch while holding the information for another bus cycle to write to memory and simultaneously accepting the next data read.

Referring to Fig. 4, instruction cache 14 stores 512 bytes in a direct-mapped organization. Bits 4-8 of a referenced instruction's address select one of 32 sets. Each set contains 16 bytes of code and a log that holds address tags comprising the 23 most significant bits of the physical address for the locations stored in that set. A valid bit is associated with every double-word.
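The address split implied by these figures can be illustrated with a minimal Python sketch. The constant and function names are assumptions made for illustration only; the 4-bit byte offset follows from the 16-byte set size, the 5-bit index from bits 4-8, and the 23-bit tag from the remaining upper bits of a 32-bit address.

```python
# Illustrative split of a 32-bit address for a 512-byte direct-mapped cache
# with 32 sets of 16 bytes. All names here are hypothetical.
OFFSET_BITS = 4                             # 16 bytes per set -> address bits 0-3
INDEX_BITS = 5                              # 32 sets -> address bits 4-8
TAG_BITS = 32 - OFFSET_BITS - INDEX_BITS    # remaining 23 most significant bits


def split_address(addr: int) -> tuple[int, int, int]:
    """Return (tag, set_index, byte_offset) for a 32-bit address."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


print(split_address(0x1234))  # (9, 3, 4)
```

Because 32 sets of 16 bytes give exactly 512 bytes, every address maps to exactly one set, and only the stored tag decides whether a reference hits.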

Instruction cache 14 also includes a 16-byte instruction buffer from which it can transfer 32 bits of code per cycle on the IBUS to loader 28. In the event that the desired instruction is found in instruction cache 14 (a "hit"), the instruction buffer is loaded from the selected set of the instruction cache and no bus cycle to external memory is requested. In the event that a referenced instruction is not found in instruction cache 14 (a "miss"), instruction cache 14 transfers the address of the missing double-word on the GVA bus to MMU 18, which translates the address for BIU 20. BIU 20 initiates a burst read cycle to load the instruction buffer from external memory through the GBDI bus. The instruction buffer is then written to one of the sets of instruction cache 14.

Instruction cache 14 holds counters for both the virtual and the physical addresses from which the next double-word of the stream is to be prefetched. When instruction cache 14 must begin prefetching from a new instruction stream, the virtual address for the new stream is transferred from loader 28 on the IBUS. When crossing to a new page, instruction cache 14 transfers the virtual address to MMU 18 on the GVA bus and receives back the physical address on the MPA bus.

Instruction cache 14 supports an operating mode that locks its contents to memory-resident locations. This feature is enabled by setting a lock instruction cache (LIC) bit in the configuration register. It can be used in real-time systems to allow fast, on-chip access to the most critical routines. Instruction cache 14 can also be enabled by setting an instruction cache enable (IC) bit in the configuration register.

Data cache 16 stores 1024 bytes of data in a two-way set-associative organization, as shown in Fig. 5. Each set has two entries containing 16 bytes each and two address tags that hold the 23 most significant bits of the physical address for the locations stored in the two entries. A valid bit is associated with every double-word.

The timing of an access to data cache 16 is shown in Fig. 6. First, virtual address bits 4-8 on the GVA bus are used to select the appropriate set within data cache 16 and to read its two entries. Simultaneously, MMU 18 translates the virtual address and transfers the physical address to data cache 16 and to BIU 20 on the MPA bus. Data cache 16 then compares the two address tags with the physical address while BIU 20 initiates an external bus cycle to read the data from external memory. If the reference is a hit, data cache 16 aligns the selected data and transfers it to execution unit 32 on the GDATA bus, and BIU 20 cancels the external bus cycle and does not assert the BMT and confirm bus cycle signals. If the reference is a miss, BIU 20 completes the external bus cycle and transfers data from external memory to execution unit 32 and to data cache 16, which updates its cache entry. For references that hit, data cache 16 can sustain a throughput of one double-word per cycle, with a latency of 1.5 cycles.
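The read sequence described above can be sketched in Python as follows. This is a simplified model under stated assumptions, not the patented circuit: the class and method names are hypothetical, a single valid state per entry stands in for the per-double-word valid bits, a miss fills the whole 16-byte entry, and the alternating replacement choice is an assumption because the text does not name a replacement policy.

```python
NUM_SETS = 32      # 1024 bytes / (2 entries per set x 16 bytes per entry)
LINE_BYTES = 16


class TwoWayDataCache:
    """Hypothetical model of a two-way set-associative read lookup."""

    def __init__(self):
        # Each set has two entries; an entry is (tag, line_bytes) or None if invalid.
        self.sets = [[None, None] for _ in range(NUM_SETS)]
        # ASSUMPTION: simple alternating replacement; the text names no policy.
        self.victim = [0] * NUM_SETS

    def read(self, virt_addr, phys_addr, external_memory):
        """Return ("hit" or "miss", 16-byte line) for a read reference."""
        index = (virt_addr >> 4) & 0x1F   # virtual address bits 4-8 select the set
        tag = phys_addr >> 9              # 23 most significant physical address bits
        for entry in self.sets[index]:    # compare both tags of the selected set
            if entry is not None and entry[0] == tag:
                return "hit", entry[1]
        # Miss: the external bus cycle completes and the cache entry is updated.
        base = phys_addr & ~(LINE_BYTES - 1)
        line = bytes(external_memory[base:base + LINE_BYTES])
        way = self.victim[index]
        self.sets[index][way] = (tag, line)
        self.victim[index] = way ^ 1
        return "miss", line


memory = bytearray(2048)
cache = TwoWayDataCache()
print(cache.read(0x40, 0x40, memory)[0])  # miss
print(cache.read(0x44, 0x44, memory)[0])  # hit
```

In the hardware described, the set selection and the address translation proceed in parallel; the sketch performs them sequentially and only reproduces the resulting hit or miss decision.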

Data cache 16 is a write-through cache. For memory write references, data cache 16 checks whether the reference is a hit. If so, the contents of the cache are updated. In the event of either a hit or a miss, BIU 20 writes the data through to external memory.
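A minimal Python sketch of this write-through behaviour is given below. The function name and the dictionary-based cache representation are assumptions for illustration; the sketch also assumes that a write does not cross a 16-byte boundary and, following the text, does not allocate a cache entry on a write miss.

```python
def write_through(cache_lines, external_memory, addr, data):
    """Write `data` at `addr`; return True if the write hit the cache.

    cache_lines is a hypothetical stand-in for the data cache: a dict that
    maps a 16-byte-aligned address to a bytearray(16) holding that line.
    """
    base = addr & ~0xF
    hit = base in cache_lines
    if hit:
        offset = addr - base
        cache_lines[base][offset:offset + len(data)] = data  # update cached copy
    # Hit or miss, the data is always written through to external memory.
    external_memory[addr:addr + len(data)] = data
    return hit


memory = bytearray(64)
lines = {0x10: bytearray(16)}
print(write_through(lines, memory, 0x12, b"\xAA\xBB"))  # True
print(write_through(lines, memory, 0x20, b"\x01"))      # False
```

Because external memory is updated on every write, the cached copy never holds data newer than main memory.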

Like instruction cache 14, data cache 16 supports an operating mode that locks its contents to memory-resident locations. This feature is enabled by setting a lock data cache (LDC) bit in the configuration register. It can be used in real-time systems to allow fast on-chip access to the most critical data locations. Data cache 16 can be enabled by setting the data cache enable bit in the configuration register.

The configuration register is included in register file 34 and is 32 bits wide, of which 9 bits are implemented. The implemented bits enable various operating modes for CPU 10, including vectoring of interrupts, execution of slave instructions, and control of on-chip instruction cache 14 and data cache 16. When the configuration register is loaded, the values loaded to bits 4-7 are ignored; when the configuration register is stored, these bits are 1. The format of the configuration register is shown in Table 5. The various control bits are described below.

Table 5

I: Vectored interrupts. This bit controls whether maskable interrupts are handled in nonvectored (VI = 0) or vectored (VI = 1) mode.
F: Floating-point instruction set. This bit indicates whether a floating-point unit is present to execute floating-point instructions.
M: Memory management instruction set. This bit enables the execution of memory management instructions.
C: Custom instruction set. This bit indicates whether a custom slave processor is present to execute custom instructions.
DE: Direct exception enable. This bit enables the direct exception mode, a mode for processing exceptions that improves the response time of CPU 10 to interrupts and other exceptions.
DC: Data cache enable. This bit enables data cache 16 to be accessed for data reads and writes.
LDC: Lock data cache. This bit controls whether the contents of data cache 16 are locked to memory-resident locations (LDC = 1) or are updated when a data read misses the cache (LDC = 0).
IC: Instruction cache enable. This bit enables instruction cache 14 to be accessed for instruction fetches.
LIC: Lock instruction cache. This bit controls whether the contents of instruction cache 14 are locked to memory-resident locations (LIC = 1) or are updated when an instruction fetch misses the cache (LIC = 0).

As stated above, CPU 10 overlaps operations to execute several instructions simultaneously in four-stage pipeline 12. The general structure of pipeline 12 and the various buffers for instructions and data are shown in Fig. 7. While execution unit 32 is calculating the results for one instruction, address unit 30 can be calculating the effective address and reading the source operands for the following instruction, and loader 28 can be decoding a third instruction and prefetching a fourth instruction into its 8-byte queue.

Address unit 30 and execution unit 32 can process instructions at a peak rate of two cycles per instruction. Loader 28 can process instructions at a peak rate of one cycle per instruction, so it will typically maintain a steady supply of instructions to address unit 30 and execution unit 32. Loader 28 disrupts the throughput of pipeline 12 only when a gap appears in the instruction stream because of a branch instruction or a miss in instruction cache 14.

Fig. 8 shows the execution of two memory-to-register instructions by address unit 30 and execution unit 32. CPU 10 can sustain an execution rate of two cycles for most common instructions; delays typically occur only in the following cases:

1. Memory delays due to cache and translation buffer misses and misaligned references.
2. Resource contention between stages of pipeline 12.
3. Branch instructions and other non-sequential instruction fetches.
4. Complex addressing modes, such as scaled index, and complex operations, such as division.

Fig. 9 shows the effect of a miss in data cache 16 on the timing of pipeline 12. Execution unit 32 is delayed by two cycles until BIU 20 completes the bus cycle to read the data. The basic bus cycles performed by CPU 10 are discussed in greater detail below.

Fig. 10 shows the effect of an address register interlock on the timing of pipeline 12. One instruction modifies a register while the next instruction uses that register for an address calculation. Address unit 30 is delayed by three cycles until execution unit 32 completes the register update. Note that if the second instruction had used the register for a data value rather than for an address calculation (e.g., ADDD R0, R1), then bypass circuitry in execution unit 32 would be used to avoid any delay to pipeline 12.
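The cycle figures quoted for Figs. 8 to 10 can be combined into a rough back-of-the-envelope estimate. The Python sketch below is a simple additive model and only an assumption for illustration: it charges two cycles per instruction, adds two cycles for each data cache miss and three cycles for each address register interlock, and ignores any overlap between stalls as well as all other delay sources. The function name and the dictionary keys are hypothetical.

```python
BASE_CYCLES = 2            # peak rate: two cycles per common instruction
DATA_CACHE_MISS_STALL = 2  # execution unit delay on a data cache miss
ADDR_INTERLOCK_STALL = 3   # address unit delay on an address register interlock


def estimate_cycles(instructions):
    """Estimate total cycles for a list of instruction descriptors.

    Each descriptor is a dict that may set 'dcache_miss' and/or
    'addr_interlock' to True (hypothetical flags for this sketch).
    """
    total = 0
    for ins in instructions:
        total += BASE_CYCLES
        if ins.get("dcache_miss"):
            total += DATA_CACHE_MISS_STALL
        if ins.get("addr_interlock"):
            total += ADDR_INTERLOCK_STALL
    return total


program = [{}, {"dcache_miss": True}, {"addr_interlock": True}, {}]
print(estimate_cycles(program))  # 13
```

For the four-instruction example, the estimate is 4 x 2 = 8 base cycles plus 2 + 3 = 5 stall cycles.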

As noted above, provider 28 includes circuitry for handling branch instructions.

"Branch" instructions are instructions that may transfer control to an instruction at a destination address, which is computed by adding a displacement value encoded in the currently executing instruction to the address of that instruction. Branch instructions can be "unconditional" or "conditional"; in the latter case, a test is made to determine whether a specified condition on the state of CPU 10 is true. A branch instruction is said to be "taken" if it is unconditional, or if it is conditional and the specified condition is true.

When a branch instruction is decoded, provider 28 computes the destination address and selects between the sequential and non-sequential instruction streams. The selection is based on the condition and direction of the branch instruction. If provider 28 predicts that the branch instruction will be taken, the destination address is transferred to instruction cache 14 over the IBUS. Whether or not the branch instruction is predicted to be taken, provider 28 saves the address of the other instruction stream. Later, the branch instruction reaches execution unit 32, where the condition is resolved. Execution unit 32 signals provider 28 whether or not the branch instruction was taken. If the branch instruction was predicted incorrectly, pipeline 12 is flushed and instruction cache 14 begins fetching instructions from the correct stream.
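The destination-address calculation and the definition of a "taken" branch can be illustrated with a minimal Python sketch. The function names are hypothetical, and the wrap-around to 32 bits is an assumption based on the 32-bit addresses used elsewhere in this description; the prediction rule itself is not modeled because the text does not specify it.

```python
ADDRESS_MASK = 0xFFFFFFFF  # assumption: addresses are 32 bits wide


def branch_destination(instruction_address: int, displacement: int) -> int:
    """Destination = address of the branch instruction + signed displacement."""
    return (instruction_address + displacement) & ADDRESS_MASK


def is_taken(unconditional: bool, condition_true: bool = False) -> bool:
    """A branch is taken if it is unconditional, or conditional with a true condition."""
    return unconditional or condition_true


if __name__ == "__main__":
    # Backward branch by 16 bytes from address 0x1000.
    print(hex(branch_destination(0x1000, -16)))
    print(is_taken(unconditional=False, condition_true=True))
```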

Fig. 11 shows the effect of correctly predicting a branch instruction that is taken. A 2-cycle gap occurs in the decoding of instructions by provider 28. This gap at the top of pipeline 12 can often be closed, because one fully decoded instruction is buffered between provider 28 and address unit 30 and because other delays may arise at the same time in later stages of pipeline 12.

Fig. 12 shows the effect of incorrectly predicting the resolution of a branch instruction. A 4-cycle gap occurs in execution unit 32.
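The delays quoted for Figs. 9 to 12 can be combined into a rough cycle estimate. The following Python sketch is only an additive upper-bound model under stated assumptions: every instruction is taken to need the 2-cycle base rate, and no gap is hidden by buffering or by overlapping delays, although the text notes that gaps can often be closed. The names `PENALTY_CYCLES` and `estimate_cycles` are illustrative.

```python
BASE_CYCLES_PER_INSTRUCTION = 2  # execution rate quoted for most common instructions

# Delay figures taken from the descriptions of Figs. 9 to 12.
PENALTY_CYCLES = {
    "data_cache_miss": 2,          # Fig. 9
    "address_interlock": 3,        # Fig. 10
    "taken_branch_predicted": 2,   # Fig. 11 (gap can often be closed)
    "branch_mispredicted": 4,      # Fig. 12
}


def estimate_cycles(instruction_count: int, events: dict) -> int:
    """Additive upper bound: base cycles plus one full penalty per event."""
    penalty = sum(PENALTY_CYCLES[name] * count for name, count in events.items())
    return BASE_CYCLES_PER_INSTRUCTION * instruction_count + penalty


if __name__ == "__main__":
    print(estimate_cycles(10, {"data_cache_miss": 1, "branch_mispredicted": 1}))
```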

CPU 10 receives a single-phase input clock signal CLK whose frequency is twice the operating rate of CPU 10. For example, the input clock frequency is 40 MHz for a CPU 10 operating at 20 MHz. CPU 10 divides the CLK input by two to obtain an internal clock composed of two non-overlapping phases, PHI1 and PHI2. CPU 10 drives PHI1 onto the BUSCLK output signal.

Fig. 13 shows the relationship between the CLK input and BUSCLK output signals.

As illustrated in Fig. 14, each rising edge of the BUSCLK output defines a transition in the timing state ("T-state") of CPU 10. Bus cycles occur during a sequence of T-states, labeled T1, T2 and T2B in the associated timing diagrams. There may be idle T-states (Ti) between bus cycles. The phase relationship of the BUSCLK output to the CLK input can be established at reset.

The basic bus cycles performed by CPU 10 to read from and write to external main memory and peripheral devices occur during two cycles of the bus clock, called T1 and T2. The basic bus cycles can be extended beyond two clock cycles for two reasons. First, additional T2 cycles can be added to wait for slow memories and peripheral devices. Second, when reading from external memory, burst cycles (called "T2B") can be used to transfer multiple double words from consecutive memory locations. The timing of basic read and write bus cycles with no wait states is shown by way of example in Figs. 14 and 15. For both read and write bus cycles, CPU 10 asserts Address Strobe during the first half of T1, indicating the beginning of the bus cycle. From the beginning of T1 until the completion of the bus cycle, CPU 10 drives the address bus and the control signals for status (ST0-ST4), byte enables (BE0-BE3), Data Direction In, Cache Inhibit (CIO), I/O Inhibit and Confirm Bus Cycle.

If the bus cycle is not cancelled (that is, T2 will follow on the next clock), CPU 10 asserts Begin Memory Transaction during T1 and asserts Confirm Bus Cycle from the middle of T1 until the completion of the bus cycle, at which time Confirm Bus Cycle is negated.

At the end of T2, CPU 10 samples RDY. If RDY is active, the bus cycle is complete; that is, no additional T2 state is to be added. T2 is followed either by T1 of the next bus cycle or by Ti if CPU 10 has no bus cycle to perform.

As shown in Fig. 16, the basic read and write bus cycles just described can be extended to support longer access times. As stated, CPU 10 samples RDY at the end of each T2 state. If RDY is inactive, the bus cycle is extended by repeating T2 for another clock. The additional T2 states after the first are called "wait" states. Fig. 16 shows the extension of a read bus cycle by the addition of two wait states.

As shown in Fig. 17, the basic read cycles can also be extended to support burst transfers of up to four double words from consecutive memory locations. During a burst read cycle, the initial double word is transferred during a sequence of T1 and T2 states, as in a basic read cycle. Subsequent double words are transferred during states called "T2B". Burst cycles are used only to read from 32-bit-wide memories. The number of transfers in a burst read cycle is controlled by a handshake between a burst output signal of CPU 10 and a burst input signal. CPU 10 asserts the output signal during a T2 or T2B state to indicate that it requests another transfer following the current one. The memory asserts the input signal to indicate that it can support another transfer. Fig. 17 shows a burst read cycle of three transfers in which CPU 10 terminates the sequence by negating the output signal after the second transfer. Fig. 18 shows a burst cycle of two transfers that is terminated by the system because the input signal was inactive during the second transfer.
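The T-state sequences described above can be sketched as follows. This is a simplified illustration, not the bus interface logic itself: `rdy_samples` is a hypothetical list of the RDY values sampled at the end of successive T2 states, and the burst helper assumes no wait states during the T2B transfers.

```python
from typing import Iterable, List


def basic_bus_cycle(rdy_samples: Iterable[bool]) -> List[str]:
    """Return the T-states of a basic cycle; T2 repeats while RDY is inactive."""
    states = ["T1"]
    for rdy in rdy_samples:
        states.append("T2")
        if rdy:  # RDY active at the end of T2: the cycle is complete
            return states
    raise ValueError("RDY never became active; the bus cycle did not complete")


def burst_read_cycle(transfers: int) -> List[str]:
    """Return the T-states of a burst read of 1 to 4 double words (no wait states)."""
    if not 1 <= transfers <= 4:
        raise ValueError("a burst read transfers between one and four double words")
    # The first double word uses T1 and T2; each further one uses a T2B state.
    return ["T1", "T2"] + ["T2B"] * (transfers - 1)


if __name__ == "__main__":
    print(basic_bus_cycle([False, False, True]))  # read cycle with two wait states
    print(burst_read_cycle(3))                    # burst of three transfers
```

In this model the number of wait states of a basic cycle is the length of the returned list minus two.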

For each transfer after the first in the burst sequence, CPU 10 increments address bits 2 and 3 to select the next double word. As shown for the second transfer in Fig. 18, CPU 10 samples RDY at the end of each T2B state and extends the access time for the burst transfer if RDY is inactive.

High-speed address translation is performed on-chip by the above-mentioned translation buffer, which holds address mappings for 64 pages. The page size is 4 Kbytes. The translation buffer provides direct virtual-to-physical address mapping for recently used memory pages. Entries in the translation buffer are allocated and replaced automatically by MMU 18. If the information needed to translate a virtual address to a physical address is missing from the translation buffer, CPU 10 automatically locates the information from two levels of page table entries in external memory and updates the translation buffer. If MMU 18 detects a protection violation or page fault while translating an address for a reference required to execute an instruction, an abort trap occurs and the instruction being executed is suspended.

Each of the 64 entries in the translation buffer stores the virtual and physical page frame numbers (that is, the 20 most significant bits of the address), together with the address space for the virtual page, the protection level for the page, and the modified and cache inhibit bits from the second-level page table entry.

The protection level field specifies the protection level assigned to a particular page or group of pages. Table 6 shows the encoding of the protection level field.

Table 6

As stated above, a cache inhibit bit CI appears in the second-level page table entries. If the cache inhibit bit is 1, instruction fetch and data read references to locations on the page bypass the on-chip caches. The cache inhibit bit is indicated on the system interface during references to external memory.

The modified bit likewise appears in the second-level page table entries. MMU 18 sets the modified bit in the page table entry to 1 whenever a write to the page is performed and the modified bit in the page table entry is 0.

To translate a virtual address into the corresponding physical address, the virtual page frame number and the address space are compared with the entries in the translation buffer. If a valid entry with a matching page frame number and address space is already present in the translation buffer, the physical address is available immediately. Otherwise, if no valid entry in the translation buffer has the matching page frame number and address space, MMU 18 translates the virtual address and places the missing information in the translation buffer. MMU 18 also performs a translation on a write to a page that has not previously been modified.
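The lookup described above can be modeled with a short Python sketch. It is illustrative only: the class and parameter names are hypothetical, the page table walk is represented by a caller-supplied function, and first-in-first-out replacement is an assumption, since the text does not state the replacement policy. Protection checks and the modified and cache inhibit bits are omitted.

```python
from collections import OrderedDict
from typing import Callable

PAGE_SHIFT = 12   # 4-Kbyte pages
TB_ENTRIES = 64   # the translation buffer holds mappings for 64 pages


class TranslationBuffer:
    def __init__(self, walk_page_tables: Callable[[int, int], int]):
        # Maps (address space, virtual page frame number) to a physical frame number.
        self._entries = OrderedDict()
        self._walk = walk_page_tables  # hypothetical stand-in for the table walk

    def translate(self, address_space: int, virtual_address: int) -> int:
        vpn = virtual_address >> PAGE_SHIFT
        offset = virtual_address & ((1 << PAGE_SHIFT) - 1)
        key = (address_space, vpn)
        if key not in self._entries:          # miss: fetch the mapping and store it
            if len(self._entries) >= TB_ENTRIES:
                self._entries.popitem(last=False)  # assumption: FIFO replacement
            self._entries[key] = self._walk(address_space, vpn)
        return (self._entries[key] << PAGE_SHIFT) | offset


if __name__ == "__main__":
    tb = TranslationBuffer(lambda space, vpn: vpn + 0x100)  # toy mapping
    print(hex(tb.translate(0, 0x00001234)))
```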

When translation is enabled for a memory reference, MMU 18 translates the 32-bit virtual address into a 32-bit physical address, checking for protection violations on each reference and possibly inhibiting use of the on-chip caches for the reference, as described above. When translation is disabled for a reference, the physical address is identical to the virtual address, no protection check is performed, and the on-chip caches are not inhibited for the reference.

As stated above, MMU 18 translates addresses using 4-Kbyte pages and two levels of translation tables. The virtual address is divided into three components: INDEX1, INDEX2 and OFFSET. INDEX1 and INDEX2 are both 10-bit fields that are used to index into the first-level and second-level page tables, respectively. OFFSET is the lower 12 bits of the virtual address; it points to a byte within the selected page.
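The division of the virtual address can be sketched in Python as below. The text fixes OFFSET as the lower 12 bits and gives 10 bits each to INDEX1 and INDEX2; placing INDEX1 in the uppermost 10 bits and INDEX2 directly below it is an assumption of this sketch, and the function name is illustrative.

```python
from typing import Tuple


def split_virtual_address(virtual_address: int) -> Tuple[int, int, int]:
    """Split a 32-bit virtual address into (INDEX1, INDEX2, OFFSET)."""
    index1 = (virtual_address >> 22) & 0x3FF  # assumed: bits 31..22, first-level index
    index2 = (virtual_address >> 12) & 0x3FF  # assumed: bits 21..12, second-level index
    offset = virtual_address & 0xFFF          # bits 11..0, byte within the 4-Kbyte page
    return index1, index2, offset


if __name__ == "__main__":
    print([hex(field) for field in split_virtual_address(0xFFC01ABC)])
```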

When page table entries are read during address translation, MMU 18 bypasses data cache 16 and always references external memory. When a page table entry that is located in data cache 16 is updated, MMU 18 updates the contents of the page table entry in both data cache 16 and external memory.

The system interface of CPU 10 also supports the use of an external cache memory 25, as shown in Fig. 1. The CI bit from the second-level page table entries is presented on the CIO output signal during a bus cycle along with the address, which allows individual pages to be cached selectively. CPU 10 can also be made to retry a bus cycle by asserting the corresponding input signal during the bus cycle. Before retrying the bus cycle, CPU 10 releases the bus, which allows external cache 25 to handle misses by performing accesses to external main memory.

In accordance with the present invention, CPU 10 provides for maintaining coherence between its two on-chip caches and external memory. The techniques used by CPU 10 for this purpose are summarized in Table 7.

Table 7

As stated above, use of the caches can be inhibited for individual pages by means of the CI bit in the second-level page table entries.

Entries in instruction cache 14 and data cache 16 can be invalidated by using the cache invalidate instruction CINV. During execution of the CINV instruction, CPU 10 generates two slave bus cycles on the system interface to display the first three bytes of the instruction and the source operand. External circuitry can thereby detect the execution of the CINV instruction for use in monitoring the contents of the on-chip caches.

The CINV instruction can be used to invalidate either the entire contents of one or both of the internal caches or only a 16-byte block in a selected cache. In the latter case, the 28 most significant bits of the source operand specify the physical address of the aligned 16-byte block; the four least significant bits of the source operand are ignored. If the selected block is not located in the on-chip cache, the instruction has no effect. The CINV instruction applies to instruction cache 14 if the I option is specified and to data cache 16 if the D option is specified.
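For the single-block case, the way the source operand selects the aligned 16-byte block can be written as a one-line mask. This is a minimal sketch; the function name is hypothetical.

```python
def cinv_block_address(source_operand: int) -> int:
    """Keep the 28 most significant bits of a 32-bit operand; ignore the low four."""
    return source_operand & 0xFFFFFFF0


if __name__ == "__main__":
    # Any operand inside the block selects the same aligned 16-byte block.
    print(hex(cinv_block_address(0x1234567B)))
```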

The format of the CINV instruction is shown in Table 8.

Table 8

Options are specified by listing the letters A (invalidate all), I (instruction cache) and D (data cache). If neither the I option nor the D option is specified, nothing is invalidated.

In the machine instruction, the options are encoded in the A, I and D fields as follows:

A: 0 = invalidate only a 16-byte block
   1 = invalidate the entire cache
I: 0 = no effect on the instruction cache
   1 = invalidate the instruction cache
D: 0 = no effect on the data cache
   1 = invalidate the data cache
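The meaning of the three option fields can be summarized in a small Python sketch. It maps an option string such as "AI" to the A, I and D field values and lists which cache is affected and to what extent. The function names are hypothetical, and the bit positions of the fields within the instruction are not modeled.

```python
from typing import Dict, List, Tuple


def encode_cinv_options(options: str) -> Dict[str, int]:
    """Translate option letters into the values of the A, I and D fields."""
    letters = set(options.upper())
    unknown = letters - set("AID")
    if unknown:
        raise ValueError(f"unknown CINV option(s): {sorted(unknown)}")
    return {name: int(name in letters) for name in "AID"}


def cinv_effect(options: str) -> List[Tuple[str, str]]:
    """List (cache, scope) pairs; empty if neither I nor D is specified."""
    fields = encode_cinv_options(options)
    scope = "entire cache" if fields["A"] else "16-byte block"
    effect = []
    if fields["I"]:
        effect.append(("instruction cache", scope))
    if fields["D"]:
        effect.append(("data cache", scope))
    return effect


if __name__ == "__main__":
    print(encode_cinv_options("AD"))
    print(cinv_effect("ID"))
```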

CPU 10 also supports an external "bus watcher" circuit 26, as shown in Fig. 1. The primary function of bus watcher 26 is to maintain coherence between the instruction cache and data cache on the one hand and external memory on the other, in either a shared-memory multiprocessor system or a single-processor system with high-bandwidth direct memory access (DMA) devices. Bus watcher 26 observes the bus cycles of CPU 10 in order to maintain a copy of the cache address tags for instruction cache 14 and data cache 16, while at the same time monitoring writes to external memory by, for example, DMA controllers or other microprocessors in the system. When bus watcher 26 detects, by comparing the cache address tags with the address of a write reference, that a location held in the on-chip cache has been modified in external memory, it issues a cache invalidation request to CPU 10.

As shown in Fig. 19, bus watcher 26 connects to the following buses:

1. The CPU 10 address bus and CASEC output, to obtain information on which internal cache entries have had their tags modified and to maintain updated copies of the CPU 10 internal cache tags;
2. The system bus, to detect which external memory addresses are modified; and
3. The CPU 10 cache invalidation bus, consisting of the INVSET, INVIC, INVDC and CIA0-CIA6 input signals.

Referring to Fig. 19, bus watcher 26 maintains tag copies for the two internal caches of CPU 10, instruction cache 14 and data cache 16. When the address of a memory write cycle on the system bus matches one of the tags within bus watcher 26, bus watcher 26 issues a command to CPU 10, via the cache invalidation bus, to invalidate the corresponding entry in the internal cache. Because the invalidation signal is provided over the separate cache invalidation bus, invalidation of the internal cache entry by CPU 10 takes only one clock cycle and does not interfere with an ongoing external bus cycle of CPU 10. Instruction cache 14 and data cache 16 are invalidated a set at a time, for example 16 bytes in instruction cache 14 and 32 bytes in data cache 16.
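The compare-and-invalidate behavior of the bus watcher can be illustrated with a deliberately simplified Python model. It is a sketch under stated assumptions, not the circuit of Fig. 19: tag copies are kept as plain sets of block addresses rather than as RAM arrays organized by set, a uniform 16-byte block size is assumed for both caches, and the invalidation request is returned as a list instead of being driven on the cache invalidation bus. All names are hypothetical.

```python
from typing import List, Tuple


class BusWatcherModel:
    def __init__(self, block_size: int = 16):
        self.block_size = block_size  # assumption: same block size for both caches
        # Copies of the tags of the instruction cache ("I") and data cache ("D").
        self.tag_copies = {"I": set(), "D": set()}

    def observe_cpu_read(self, cache: str, address: int) -> None:
        """Record that a CPU read cycle loaded this block into the given cache."""
        self.tag_copies[cache].add(address // self.block_size)

    def observe_system_write(self, address: int) -> List[Tuple[str, int]]:
        """Compare a write address with the tag copies; return invalidation requests."""
        block = address // self.block_size
        requests = []
        for cache, tags in self.tag_copies.items():
            if block in tags:
                tags.discard(block)  # the CPU entry will be invalidated
                requests.append((cache, block * self.block_size))
        return requests


if __name__ == "__main__":
    watcher = BusWatcherModel()
    watcher.observe_cpu_read("D", 0x2004)
    print(watcher.observe_system_write(0x200C))  # write by a DMA device hits the block
```

A write to an address that is not mirrored in the tag copies yields no request, which corresponds to the case in which CPU 10 is not disturbed at all.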

The INVSET input signal indicates whether the invalidation applies to a single set (low) or to the entire cache (high).

If the invalidation request occurs before, or at the same time as, CPU 10 completes a T2 or T2B state of a read cycle to a location affected by the invalidation, the data read on the bus will be valid in the cache. If the invalidation request occurs after the T2 or T2B state of the read cycle, the data will be invalid in the cache. When the invalidate instruction cache input INVIC is low, the invalidation is performed in instruction cache 14. When the invalidate data cache input INVDC is low, the invalidation is performed in data cache 16.

The cache invalidation address CIA0-CIA6 is presented to CPU 10 on the CIA bus. These bits specify the set address to be invalidated in data cache 16 and instruction cache 14.

The bus watcher circuit consists primarily of three RAM arrays that store the copies of the cache address tags. As shown in Fig. 19, the RAM cells are dual-ported. One port is used to write the tags during bus read cycles by CPU 10. The second port is used to read the tags during invalidation requests from the system bus. Using dual-ported memory cells simplifies the problems associated with synchronizing the system bus invalidation requests with the bus cycles of CPU 10.

In addition to avoiding interference with the external references of CPU 10, the use of bus watcher 26 also avoids interference with the internal activity of CPU 10. This is accomplished by using dual-ported validity bits in both the instruction cache and the data cache, as described above with reference to Figs. 4 and 5.

The system requirements for using bus watcher 26 depend on the potential rate at which cache invalidations are caused by modifications to shared memory.

The cache invalidation mechanism of CPU 10, for example the CINV instruction described above, can be used without bus watcher 26 if the potential rate of invalidations is much lower than the rate of memory accesses by CPU 10. For example, systems that use write-through policies produce a higher potential rate of invalidations and would need a bus watcher 26; systems that use write-back policies may have a potential invalidation rate low enough that bus watcher 26 is unnecessary.

Three possible internal cache invalidation scenarios are shown in Figs. 20-22.

Fig. 20 shows a cache coherence solution for a system that has a low invalidation rate and therefore does not use a bus watcher 26. When a DMA controller or another CPU in the system writes to memory location A in main memory over the system bus, the seven low-order bits of the address of location A are presented to CPU 10 on the CIA bus, and both the INVIC and INVDC inputs are driven low, so that the set containing location A is invalidated on-chip. That is, without the filtering provided by a bus watcher 26, every write on the system bus invalidates a set in instruction cache 14 and in data cache 16. This arrangement is applicable to uniprocessor systems or to certain types of multiprocessor organizations, for example those that use ownership schemes for memory.

Fig. 21 shows a cache coherence solution for a system that must support a high cache invalidation rate and thus justifies the use of bus watcher 26. As stated above, bus watcher 26 maintains a copy of the on-chip cache tags. Thus, any write on the system bus that matches a bus watcher tag triggers an invalidation in the corresponding internal cache, based on the CIA, INVDC, INVIC and INVSET inputs.
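The difference between the arrangements of Fig. 20 and Fig. 21 can be illustrated by counting the invalidation requests that reach the CPU for a given sequence of system bus writes. The following is a sketch under stated assumptions only: the function name `count_invalidations`, the use of line addresses, the single direct-mapped tag store and the figure of 128 sets are hypothetical and are not taken from the specification.

```python
def count_invalidations(writes, cached_lines, use_bus_watcher, num_sets=128):
    """Count invalidation requests delivered to the CPU.

    writes:       line addresses written on the system bus by other devices.
    cached_lines: line addresses assumed to be held in the on-chip cache.
    """
    # Shadow tag store, as a bus watcher would keep it (set index -> tag).
    shadow = {}
    for line in cached_lines:
        shadow[line % num_sets] = line // num_sets

    count = 0
    for line in writes:
        set_index, tag = line % num_sets, line // num_sets
        if not use_bus_watcher:
            # Fig. 20 style: every bus write invalidates a set.
            count += 1
        elif shadow.get(set_index) == tag:
            # Fig. 21 style: only writes that match a shadow tag are forwarded.
            count += 1
            del shadow[set_index]  # the entry is now invalid on-chip
    return count
```

With writes to lines 0, 1, 2, 300 and 0 and with lines 0 and 300 cached, the unfiltered arrangement delivers five requests, whereas the filtered arrangement delivers two.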

A third cache coherence solution is shown in Fig. 22. This system has a high cache invalidation rate and also includes a large external cache 25. In this case, external cache 25 maintains coherence with main memory by using its own bus watcher. The internal caches, however, need only maintain coherence with external cache 25. Any invalidation in external cache 25 invalidates a set in the internal cache. Any update to external cache 25 likewise invalidates a set in the internal cache.

Additional information regarding the operation of CPU 10 can be found in copending and commonly assigned U.S. patent application Serial No. 006,016, "High Performance Microprocessor", filed by Alpert et al. on the same date as this application, which is hereby incorporated by reference. It should be understood that various alternatives to the embodiment described herein may be employed in practicing the present invention. It is intended that the following claims define the invention and that structures and methods within the scope of these claims and their equivalents be covered thereby.

Claims

1. A method for maintaining coherence between an integrated cache memory of a microprocessor and the external memory connected to the microprocessor, the microprocessor communicating with the external memory via an external data bus, characterized in that the method executes a cache invalidation command which invalidates the information that is stored in the built-in cache.

2. The method according to claim 1, characterized in that the integrated cache memory comprises an instruction cache and a separate data cache.

3. The method according to claim 2, characterized in that the cache invalidation command invalidates the entire contents of both the instruction cache and the data cache.

4. The method according to claim 2, characterized in that the cache invalidation command invalidates the entire contents of the instruction cache.

5. The method according to claim 2, characterized in that the cache invalidation command invalidates designated contents of the instruction cache.

6. The method according to claim 2, characterized in that the cache invalidation command invalidates the entire contents of the data cache.

7. The method according to claim 2, characterized in that the cache invalidation command invalidates designated contents of the data cache.

8. The method according to claim 2, characterized in that the cache invalidation command invalidates designated contents of both the instruction cache and the data cache simultaneously.

9. A system for maintaining coherence between an integrated cache memory of a microprocessor and an external memory connected to the microprocessor, wherein the microprocessor communicates with the external memory via an external data bus and wherein the external data bus is used by devices external to the microprocessor to modify the information stored in the external memory, characterized by
Storage means for managing address identifiers corresponding to addresses of the information stored in the integrated cache memory;
Means for monitoring the external data bus to identify write addresses to the external memory from the external devices;
Means for comparing a write address with the stored address identifiers to detect a match between the write address and the information address stored in the integrated cache memory; and
Means for generating a request to the microprocessor to invalidate the information stored in the integrated cache in response to a match between the write address and the information address stored in the integrated cache.

10. System according to claim 9, characterized in that it further includes a cache invalidation bus, separate from the external data bus, that transfers the cache invalidation request to the microprocessor.

11. System according to claim 10, characterized in that the cache invalidation request specifies the address of the memory location that is to be invalidated in the integrated cache memory.

12. System according to claim 10, characterized in that the cache invalidation request specifies that all information stored in the integrated cache memory is to be invalidated.

13. System according to claim 10, characterized in that the integrated cache memory comprises an instruction cache and a separate data cache.

14. System according to claim 13, characterized in that the cache invalidation request specifies that all information stored in both the instruction cache and the data cache is to be invalidated.

15. System according to claim 13, characterized in that the cache invalidation request specifies that all information stored in the instruction cache is to be invalidated.

16. System according to claim 13, characterized in that the cache invalidation request specifies the address of the memory locations to be invalidated within the instruction cache.

17. System according to claim 13, characterized in that the cache invalidation request specifies that all information stored in the data cache is to be invalidated.

18. System according to claim 13, characterized in that the cache invalidation request specifies the address of the memory location to be invalidated within the data cache.

19. System according to claim 13, characterized in that the cache invalidation request specifies the address of the memory locations to be invalidated within both the instruction cache and the data cache.

20. System according to claim 13, characterized in that the data cache has a plurality of sets and the cache invalidation request specifies the set that is to be invalidated.

21. A method of maintaining coherence between an integrated cache memory of a microprocessor and the external memory of the microprocessor, the microprocessor communicating with the external memory via an external data bus, and the external data bus being used by devices external to the microprocessor to modify the information stored in the external memory, characterized by the following method steps:
Managing the address identifiers corresponding to the addresses of information stored in the built-in cache;
Monitoring the external data bus to identify the write addresses for the external memory from the external devices;
Comparing a write address with the stored address identifiers to detect a match between the write address and an address of information stored in the integrated cache memory; and
in response to a match between the write address and an address of information stored in the integrated cache memory, generating a request to the microprocessor to invalidate information stored in the integrated cache memory.

22. The method according to claim 21, characterized in that the cache invalidation request is provided to the microprocessor on a cache invalidation bus that is separate from the external data bus.

23. The method according to claim 22, characterized in that the cache invalidation request specifies the memory location within the integrated cache memory that is to be invalidated.

24. The method of claim 23, including the further step of externally monitoring the memory locations within the integrated cache memory that are invalidated.