DE112009005006T5

DE112009005006T5 - Optimizations for an Unbounded Transactional Memory (UTM) System

Info

Publication number: DE112009005006T5
Application number: DE112009005006T
Authority: DE
Inventors: Gad Sheaffer; Jan Gray; Burton Smith; Robert Geva; Vadim Bassin; David Callahan; Yang Ni; Bratin Saha; Martin Taillefer; Shlomo Raikin; Arun Kishan; Ali-Reza Adl-Tabatabai; Landy Wang; Koichi Yamada
Original assignee: Intel Corp
Current assignee: Intel Corp
Priority date: 2009-06-26
Filing date: 2009-06-26
Publication date: 2013-01-10
Also published as: KR101370314B1; GB2484416A; JP2012530960A; WO2010151267A1; CN102460376A; BRPI0925055A2; GB201119084D0; KR20130074726A; CN102460376B; JP5608738B2; GB2484416B

Abstract

A method and apparatus for optimizing an unbounded transactional memory (UTM) system are described herein. Hardware support for monitors, buffering, and metadata is provided, wherein orthogonal metaphysical address spaces for metadata may be separately associated with threads and/or software subsystems within threads. In addition, the metadata may be held by hardware in a compressed manner with respect to data, transparently to software. Furthermore, in response to metadata access instructions/operations, the hardware is capable of supporting a forced metadata value to enable multiple modes of transactional execution. However, if monitors, buffered data, metadata, or other information is lost, or conflicts are detected, the hardware provides variations of a loss instruction that is able to poll a transaction status register for such a loss or conflict and to jump execution to a label in response to detecting the loss or conflict. Similarly, multiple variations of a commit instruction are provided to allow software to define commit conditions and the information to be cleared upon commit. Furthermore, the hardware provides support to enable suspension and resumption of transactions upon ring level transitions.

Description

FIELD

This invention relates to the field of processor execution and, in particular, to the execution of groups of instructions.

BACKGROUND

Advances in semiconductor processing and logic design have permitted an increase in the amount of logic that may be present on integrated circuit devices. As a result, computer system configurations have evolved from a single or multiple integrated circuits in a system to multiple cores and multiple logical processors present on individual integrated circuits. A processor or integrated circuit typically comprises a single processor die, where the processor die may include any number of cores or logical processors.

The ever increasing number of cores and logical processors on integrated circuits enables multiple software threads to be executed concurrently. However, the increase in the number of software threads that may be executed simultaneously has created problems with synchronizing data shared among the software threads. One common solution for accessing shared data in multiple-core or multiple-logical-processor systems comprises the use of locks to guarantee mutual exclusion across multiple accesses to shared data. However, the ever increasing ability to execute multiple software threads potentially results in false contention and a serialization of execution.

For example, consider a hash table holding shared data. With a lock system, a programmer may lock the entire hash table, allowing one thread to access the entire hash table. However, the throughput and performance of other threads are potentially adversely affected, as they are unable to access any entries in the hash table until the lock is released. Alternatively, each entry in the hash table may be locked. Either way, after extrapolating this simple example to a large scalable program, it is apparent that the complexity of lock contention, serialization, fine-grain synchronization, and deadlock avoidance becomes an extremely cumbersome burden for programmers.
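The two locking strategies described above can be sketched as follows. This is only an illustrative sketch in Python, not code from the patent: the class names `CoarseLockedTable` and `StripedLockedTable` and the helper `fill` are hypothetical, and the second class locks per bucket as a stand-in for locking each entry.

```python
import threading


class CoarseLockedTable:
    """One lock guards the whole table: simple, but every access serializes."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def put(self, key, value):
        with self._lock:
            self._data[key] = value

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)


class StripedLockedTable:
    """One lock per bucket (assumed stand-in for per-entry locking):
    threads that touch different buckets do not contend."""

    def __init__(self, num_buckets=16):
        self._locks = [threading.Lock() for _ in range(num_buckets)]
        self._buckets = [{} for _ in range(num_buckets)]

    def _index(self, key):
        return hash(key) % len(self._buckets)

    def put(self, key, value):
        i = self._index(key)
        with self._locks[i]:
            self._buckets[i][key] = value

    def get(self, key, default=None):
        i = self._index(key)
        with self._locks[i]:
            return self._buckets[i].get(key, default)


def fill(table, start, count):
    """Hypothetical worker: store k*k under each key k in a range."""
    for k in range(start, start + count):
        table.put(k, k * k)
```

The coarse version is easy to reason about but forces all threads through one lock. The striped version reduces contention, at the cost of more locks to manage and, once an operation needs several buckets at the same time, a lock-ordering discipline to avoid deadlock.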

Another recent data synchronization technique includes the use of transactional memory (TM). Often, transactional execution includes executing a grouping of a plurality of micro-operations, operations, or instructions. In the example above, both threads execute within the hash table, and their memory accesses are monitored/tracked. If both threads access/alter the same entry, conflict resolution may be performed to ensure data validity. One type of transactional execution includes Software Transactional Memory (STM), where tracking of memory accesses, conflict resolution, abort tasks, and other transactional tasks are performed in software, often without hardware support.
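A minimal software-only sketch of this idea, assuming a version-validation scheme that the text itself does not prescribe, might look as follows. All names (`TVar`, `Transaction`, `Conflict`, `atomically`, `_commit_lock`) are hypothetical. Reads are tracked in a read set, writes are buffered in a write set, and a conflict is detected at commit time when a tracked location has changed.

```python
import threading


class TVar:
    """A shared cell with a version counter used for conflict detection."""

    def __init__(self, value):
        self.value = value
        self.version = 0


class Conflict(Exception):
    """Raised when a location read by the transaction changed before commit."""


# Hypothetical global lock that makes validation and write-back atomic.
_commit_lock = threading.Lock()


class Transaction:
    def __init__(self):
        self.read_set = {}   # TVar -> version observed at first read
        self.write_set = {}  # TVar -> buffered (not yet visible) value

    def read(self, tvar):
        if tvar in self.write_set:          # read-your-own-writes
            return self.write_set[tvar]
        with _commit_lock:                  # consistent value/version pair
            value, version = tvar.value, tvar.version
        self.read_set.setdefault(tvar, version)
        return value

    def write(self, tvar, value):
        self.write_set[tvar] = value        # buffered until commit

    def commit(self):
        with _commit_lock:
            for tvar, seen in self.read_set.items():
                if tvar.version != seen:    # someone committed in between
                    raise Conflict()
            for tvar, value in self.write_set.items():
                tvar.value = value
                tvar.version += 1


def atomically(body):
    """Run body(tx) and retry from scratch whenever commit detects a conflict."""
    while True:
        tx = Transaction()
        try:
            result = body(tx)
            tx.commit()
            return result
        except Conflict:
            continue
```

This toy keeps all tracking in software and validates only at commit, so a running transaction may briefly observe inconsistent values before it is retried. The bookkeeping on every read and write is the kind of overhead that hardware support aims to reduce.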

Another type of transactional execution includes a Hardware Transactional Memory (HTM) system, where hardware is included to support access tracking, conflict resolution, and other transactional tasks. Previously, actual memory data arrays were extended with additional bits to hold information, such as hardware attributes to track reads, writes, and buffering, and as a result, this information travels with the data from the processor to memory. Often this information is referred to as persistent, i.e. it is not lost upon a cache eviction, since the information travels with the data throughout the memory hierarchy. Yet, this persistency imposes additional overhead on the memory hierarchy system.

In addition, previous Hardware Transactional Memory (HTM) systems have been burdened with a number of inefficiencies. As a first example, HTMs currently provide no efficient method for transitioning from unbuffered, or buffered and unmonitored, states to a buffered and monitored state to ensure consistency before commit of a transaction. As another example, numerous inefficiencies exist in the interface between an HTM and software. Specifically, hardware does not provide any mechanism to properly accelerate software memory access barriers that take into account different forms of strong and weak atomicity between transactional and non-transactional operations. Additionally, during an attempted commit of a transaction, hardware does not provide any facilities for determining whether the transaction is to abort or commit based on a loss of monitoring, buffering, and/or other attribute information. Similarly, the instruction set for these previous HTMs does not provide commit instructions that define the information to be maintained or cleared upon commit of a transaction. Other exemplary inefficiencies include HTMs not providing instructions to efficiently vector or jump execution upon detection of a conflict or a loss of information, and the inability of current HTMs to handle ring level privilege transitions during the execution of transactions.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated by way of example and is not intended to be limited by the figures of the accompanying drawings.

FIG. 1 illustrates an embodiment of a processor including multiple processing elements capable of executing multiple software threads concurrently;

FIG. 2 illustrates an embodiment of associating metadata with a data item;

FIG. 3 illustrates an embodiment of multiple orthogonal metaphysical address spaces for separate software subsystems within a plurality of processing elements;

FIG. 4 illustrates an embodiment of compression of metadata to data;

FIG. 5 illustrates an embodiment of a flow diagram for a method of accessing metadata;

FIG. 6 illustrates an embodiment of a metadata storage element to support acceleration of transitions within strong and weak atomicity environments;

FIG. 7 illustrates an embodiment of a flow diagram for accelerating non-transactional operations while maintaining atomicity in a transactional environment;

FIG. 8 illustrates an embodiment of a flow diagram for a method of efficiently transitioning a block of data to a buffered and monitored state before commit of a transaction;

FIG. 9 illustrates an embodiment of hardware to support a loss instruction to jump to a destination label based on a status value in a transaction status register;

FIG. 10 illustrates an embodiment of a flow diagram for a method of executing a loss instruction to jump to a destination label based on a conflict or a loss of specific information;

FIG. 11 illustrates an embodiment of hardware to support the definition of commit conditions and clear controls in a commit instruction;

FIG. 12 illustrates an embodiment of a flow diagram for a method of executing a commit instruction that defines commit conditions and clear controls;

FIG. 13 illustrates an embodiment of hardware to support handling privilege level transitions during the execution of transactions.

DETAILED DESCRIPTION

In the following description, numerous specific details are set forth, such as examples of specific hardware structures for transactional execution, specific types and implementations of access monitors, specific types of cache coherency models for detecting access conflicts, specific data granularities, and specific types of memory accesses and locations, in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that these specific details need not be employed to practice the present invention. In other instances, well-known components or methods, such as the coding of transactions in software, the insertion of operations to perform enumerated functions by a compiler, the demarcation of transactions, specific and alternative multi-core and multi-threaded processor architectures, specific compiler methods/implementations, and specific operational details of microprocessors, have not been described in detail in order to avoid unnecessarily obscuring the present invention.

The methods and apparatus described herein are for optimizing hardware and software for Unbounded Transactional Memory (UTM) execution. Specifically, the optimizations are primarily discussed with reference to supporting a UTM system. However, the methods and apparatus described herein may be utilized in any form of transactional memory system, such as in hardware to support or accelerate Software Transactional Memory (STM) systems, pure Hardware Transactional Memory (HTM) systems, or a hybrid thereof that differs in implementation from a UTM system.

Referring to FIG. 1, an embodiment of a processor capable of executing multiple threads concurrently is illustrated. Note that processor 100 may include hardware support for hardware transactional execution. Either in conjunction with hardware transactional execution or separately, processor 100 may also provide hardware support for hardware acceleration of a Software Transactional Memory (STM), separate execution of an STM, or a combination thereof, such as a hybrid Transactional Memory (TM) system. Processor 100 includes any processor, such as a microprocessor, an embedded processor, a digital signal processor (DSP), a network processor, or other device to execute code. Processor 100, as illustrated, includes a plurality of processing elements.

In one embodiment, a processing element refers to a thread unit, a process unit, a context, a logical processor, a hardware thread, a core, and/or any other element that is capable of holding a state for a processor, such as an execution state or an architectural state. In other words, a processing element, in one embodiment, refers to any hardware capable of being independently associated with code, such as a software thread, an operating system, an application, or other code.

A physical processor typically refers to an integrated circuit, which potentially includes any number of other processing elements, such as cores or hardware threads.

A core often refers to logic located on an integrated circuit that is capable of maintaining an independent architectural state, wherein each independently maintained architectural state is associated with at least some dedicated execution resources. In contrast to cores, a hardware thread typically refers to any logic located on an integrated circuit that is capable of maintaining an independent architectural state, wherein the independently maintained architectural states share access to execution resources. As can be seen, when certain resources are shared and others are dedicated to an architectural state, the nomenclature of a hardware thread and that of a core overlap. Yet often, a core and a hardware thread are viewed by an operating system as individual logical processors, where the operating system is able to individually schedule operations on each logical processor.

Physical processor 100, as illustrated in FIG. 1, includes two cores, core 101 and core 102, which share access to higher-level cache 110. Although processor 100 may include asymmetric cores, i.e. cores with different configurations, functional units, and/or logic, symmetric cores are illustrated. As a result, core 102, which is illustrated as identical to core 101, will not be discussed in detail to avoid repetitive discussion. In addition, core 101 includes two hardware threads 101a and 101b, while core 102 includes two hardware threads 102a and 102b. Therefore, software entities, such as an operating system, potentially view processor 100 as four separate processors, i.e. four logical processors or processing elements capable of executing four software threads concurrently.
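As a small illustration of how software might see this topology, the sketch below models the two cores and their hardware threads and flattens them into the list of logical processors. The dictionary `CORES` and the function `logical_processors` are hypothetical names chosen for this example, and the reference numerals are used only as labels.

```python
# Hypothetical model of processor 100: each core maps to its hardware threads.
CORES = {
    "101": ["101a", "101b"],
    "102": ["102a", "102b"],
}


def logical_processors(cores):
    """Flatten cores into the logical processors an operating system would see."""
    return [thread for threads in cores.values() for thread in threads]
```

With the symmetric configuration described above, this yields four entries, one per hardware thread, each of which an operating system could schedule independently.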

Here, a first thread is associated with architecture state registers 101a, a second thread is associated with architecture state registers 101b, a third thread is associated with architecture state registers 102a, and a fourth thread is associated with architecture state registers 102b. As illustrated, architecture state registers 101a are replicated in architecture state registers 101b, so that individual architecture states/contexts can be stored for logical processor 101a and logical processor 101b. Other smaller resources, such as instruction pointers and renaming logic in rename allocator logic 130, may also be replicated for threads 101a and 101b. Some resources, such as reorder buffers in reorder/retirement unit 135, I-TLB 120, load/store buffers, and queues, may be shared through partitioning. Other resources, such as general purpose internal registers, the page-table base register, low-level data cache and data TLB 115, execution unit(s) 140, and portions of out-of-order unit 135, are potentially fully shared.

Processor 100 often includes other resources, which may be fully shared, shared through partitioning, or dedicated by/to processing elements. In FIG. 1, an embodiment of a purely exemplary processor with illustrative functional units/resources of a processor is illustrated. Note that a processor may include or omit any of these functional units, and may also include any other known functional units, logic, or firmware not depicted.

As illustrated, processor 100 includes bus interface module 105 to communicate with devices external to processor 100, such as system memory 175, a chipset, a northbridge, or other integrated circuit. Memory 175 may be dedicated to processor 100 or shared with other devices in a system. Higher-level or further-out cache 110 is to cache recently fetched elements. Note that higher-level or further-out refers to cache levels increasing or getting further away from the execution unit(s). In one embodiment, higher-level cache 110 is a second-level data cache. However, higher-level cache 110 is not so limited, as it may be associated with or include an instruction cache. A trace cache, i.e. a type of instruction cache, may instead be coupled after decoder 125 to store recently decoded traces. Module 120 also potentially includes a branch target buffer to predict branches to be executed/taken and an instruction translation buffer (I-TLB) to store address translation entries for instructions.

Decode module 125 is coupled to fetch unit 120 to decode fetched elements. In one embodiment, processor 100 is associated with an Instruction Set Architecture (ISA), which defines/specifies the instructions executable on processor 100. Here, machine code instructions recognized by the ISA often include a portion of the instruction referred to as an opcode, which references/specifies an instruction or operation to be performed.

In one example, allocator and renamer block 130 includes an allocator to reserve resources, such as register files to store instruction processing results. However, threads 101a and 101b are potentially capable of out-of-order execution, in which case allocator and renamer block 130 also reserves other resources, such as reorder buffers to track instruction results. Unit 130 may also include a register renamer to rename program/instruction reference registers to other registers internal to processor 100. Reorder/retirement unit 135 includes components, such as the reorder buffers mentioned above, load buffers, and store buffers, to support out-of-order execution and later in-order retirement of instructions executed out of order.

Scheduler and execution unit(s) block 140, in one embodiment, includes a scheduler unit to schedule instructions/operations on execution units. For example, a floating-point instruction is scheduled on a port of an execution unit that has an available floating-point execution unit. Register files associated with the execution units are also included to store instruction processing results. Exemplary execution units include a floating-point execution unit, an integer execution unit, a jump execution unit, a load execution unit, a store execution unit, and other known execution units.

Lower-level data cache and data translation buffer (D-TLB) 150 are coupled to execution unit(s) 140. The data cache is to store recently used/operated-on elements, such as data operands, which are potentially held in memory coherency states. The D-TLB is to store recent virtual/linear-to-physical address translations. As a specific example, a processor may include a page table structure to break physical memory into a plurality of virtual pages.
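The translation role of the D-TLB can be illustrated with a short Python sketch. It is a software model only, not the circuit described here; the class name `Tlb`, the 4096-byte page size, and the dictionary-based page table are assumptions made for illustration.

```python
PAGE_SIZE = 4096  # assumed page size, chosen only for this illustration


class Tlb:
    """Tiny model of a translation buffer placed in front of a page table."""

    def __init__(self, page_table):
        self.page_table = page_table   # virtual page number -> physical frame
        self.entries = {}              # recently used translations
        self.misses = 0                # number of page-table lookups needed

    def translate(self, virtual_addr):
        vpn, offset = divmod(virtual_addr, PAGE_SIZE)
        if vpn not in self.entries:
            # Miss: consult the page table and remember the translation.
            self.misses += 1
            self.entries[vpn] = self.page_table[vpn]
        return self.entries[vpn] * PAGE_SIZE + offset
```

In this model, a second access to the same virtual page is served from `entries` without another page-table lookup, which is the effect a translation buffer is meant to provide.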

In one embodiment, processor 100 is capable of hardware transactional execution, software transactional execution, or a combination or hybrid thereof. A transaction, which may also be referred to as a critical or atomic section of code, includes a grouping of instructions, operations, or micro-operations to be executed as an atomic group. For example, instructions or operations may be used to demarcate a transaction or a critical section. In one embodiment, described in more detail below, these instructions are part of a set of instructions, such as an Instruction Set Architecture (ISA), which are recognizable by hardware of processor 100, such as the decoders described above. Often, these instructions, once compiled from a high-level language to hardware-recognizable assembly language, include operation codes (opcodes) or other portions of the instructions that decoders recognize during a decode stage.

Typically, during execution of a transaction, updates to memory are not made globally visible until the transaction is committed. As an example, a transactional write to a location is potentially visible to the local thread, yet, in response to a read from another thread, the write data is not forwarded until the transaction including the transactional write is committed. While the transaction is still pending, data items/elements loaded from and written to memory are tracked, as discussed in more detail below. Once the transaction reaches a commit point, if no conflicts have been detected for the transaction, the transaction is committed and the updates made during the transaction are made globally visible.

However, if the transaction is invalidated while it is pending, the transaction is aborted and potentially restarted without making the updates globally visible. As a result, pendency of a transaction, as used herein, refers to a transaction that has begun execution and has not been committed or aborted, i.e., is pending.
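The visibility rule for pending, committed, and aborted transactions can be sketched in Python as follows. This is a minimal software model, not the hardware mechanism of the disclosure; the names `SharedMemory`, `Transaction`, `commit`, and `abort` are hypothetical, and conflict detection is reduced to a flag that some other component is assumed to set.

```python
class SharedMemory:
    """Globally visible memory, modeled as a dict of address -> value."""

    def __init__(self):
        self.cells = {}

    def read(self, addr):
        return self.cells.get(addr, 0)


class Transaction:
    """Hypothetical pending transaction with a private write buffer."""

    def __init__(self, memory):
        self.memory = memory
        self.write_buffer = {}   # tentative updates, visible only locally
        self.conflict = False    # assumed to be set by a conflict detector
        self.state = "pending"

    def write(self, addr, value):
        self.write_buffer[addr] = value

    def read(self, addr):
        # The local thread sees its own tentative writes first.
        if addr in self.write_buffer:
            return self.write_buffer[addr]
        return self.memory.read(addr)

    def commit(self):
        # Updates become globally visible only if no conflict was detected.
        if self.conflict:
            self.abort()
            return False
        self.memory.cells.update(self.write_buffer)
        self.state = "committed"
        return True

    def abort(self):
        # Discard tentative updates without making them globally visible.
        self.write_buffer.clear()
        self.state = "aborted"
```

A reader that goes through `SharedMemory.read` stands in for another thread: it never observes a value held only in `write_buffer`.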

A Software Transactional Memory (STM) system often refers to performing access tracking, conflict resolution, or other transactional memory tasks in software, or at least partially in software. In one embodiment, processor 100 is capable of executing a compiler to compile program code to support transactional execution. Here, the compiler may insert operations, calls, functions, and other code to enable execution of transactions.

A compiler often includes a program or set of programs to translate source text/code into target text/code. Usually, compilation of program/application code with a compiler is done in multiple phases and passes to transform high-level programming language code into low-level machine or assembly language code. Yet single-pass compilers may still be utilized for simple compilation. A compiler may utilize any known compilation technique and perform any known compiler operation, such as lexical analysis, preprocessing, parsing, semantic analysis, code generation, code transformation, and code optimization.

Larger compilers often include multiple phases, but most often these phases fall within two general phases: (1) a front end, i.e., generally where syntactic processing, semantic processing, and some transformation/optimization may take place, and (2) a back end, i.e., generally where analysis, transformations, optimizations, and code generation take place. Some compilers refer to a middle end, which illustrates the blurring of the delineation between a front end and a back end of a compiler. As a result, reference to insertion, association, generation, or another operation of a compiler may take place in any of the aforementioned phases or passes, as well as in any other known phase or pass of a compiler. As an illustrative example, a compiler potentially inserts transactional operations, calls, functions, etc. in one or more phases of compilation, such as insertion of calls/operations in a front-end phase of compilation and then transformation of the calls/operations into lower-level code during a transactional memory transformation phase.

Nevertheless, despite the execution environment and the dynamic or static nature of a compiler, the compiler, in one embodiment, compiles program code to enable transactional execution. Therefore, reference to execution of program code, in one embodiment, refers to (1) execution of a compiler program or programs, either dynamically or statically, to compile main program code, to maintain transactional structures, or to perform other transaction-related operations; (2) execution of main program code including transactional operations/calls; (3) execution of other program code, such as libraries, associated with the main program code; or (4) a combination thereof.

Often, within Software Transactional Memory (STM) systems, a compiler is utilized to insert some operations, calls, and other code inline with the application code to be compiled, while other operations, calls, functions, and code are provided separately within libraries. This potentially gives library distributors the ability to optimize and update the libraries without having to recompile the application code. As a specific example, a call to a commit function may be inserted inline within application code at a commit point of a transaction, while the commit function is separately provided in an updatable library. Additionally, the choice of where to place specific operations and calls potentially affects the efficiency of the application code. For example, if a filter operation, which is discussed in more detail in reference to access barriers in FIG. 6, is inserted inline with code, the filter operation may be performed before vectoring execution to a barrier, instead of inefficiently vectoring to the barrier and then performing the filter operation.
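The benefit of placing a filter inline can be illustrated with a small Python sketch. It assumes, purely for illustration, that the filter is a membership test on a per-transaction read set and that the barrier lives in a separately shipped library; `BarrierLibrary`, `read_barrier`, and `transactional_load` are hypothetical names, not functions defined by this disclosure.

```python
class BarrierLibrary:
    """Stand-in for barrier code shipped separately in an updatable library."""

    def __init__(self):
        self.calls = 0           # how often execution vectored to the barrier

    def read_barrier(self, read_set, addr):
        self.calls += 1
        read_set.add(addr)       # e.g. log the access for later validation


def transactional_load(lib, memory, read_set, addr):
    """A load as a compiler might emit it, with the filter placed inline."""
    # Inline filter: vector to the barrier only for addresses not yet handled.
    if addr not in read_set:
        lib.read_barrier(read_set, addr)
    return memory[addr]
```

In this model, repeated loads of the same address pay only for the inline membership test; the call into the library is skipped.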

In one embodiment, processor 100 is capable of executing transactions utilizing hardware/logic, i.e., within a Hardware Transactional Memory (HTM) system. Numerous specific implementation details exist, from both an architectural and a microarchitectural perspective, when implementing an HTM; most of them are not discussed herein to avoid unnecessarily obscuring the invention. However, some structures and implementations are disclosed for illustrative purposes. Note that these structures and implementations are not required and may be augmented and/or replaced with other structures having different implementation details.

As a combination, processor 100 may be capable of executing transactions within an unbounded transactional memory (UTM) system, which attempts to take advantage of the benefits of both STM and HTM systems. For example, an HTM is often fast and efficient for executing small transactions, because it does not rely on software to perform all of the access tracking, conflict detection, validation, and commit for transactions. However, HTMs are usually only able to handle smaller transactions, while STMs are able to handle transactions of unbounded size. Therefore, in one embodiment, a UTM system utilizes hardware to execute smaller transactions and software to execute transactions that are too big for the hardware. As can be seen from the discussion below, even when software is handling transactions, hardware may be utilized to assist and accelerate the software. Furthermore, note that the same hardware may also be utilized to support and accelerate a pure STM system.
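The division of labor in a UTM system can be sketched as a try-hardware-first policy with a software fallback. The Python below is a simplified model under stated assumptions: the hardware limit is represented by a made-up `capacity` of distinct tracked addresses, and `run_in_hardware`, `run_in_software`, and `CapacityExceeded` are hypothetical names.

```python
class CapacityExceeded(Exception):
    """Raised when a transaction does not fit the modeled hardware limit."""


def run_in_hardware(accesses, capacity):
    # Hypothetical hardware path: bounded by a fixed tracking capacity.
    if len(set(accesses)) > capacity:
        raise CapacityExceeded
    return "hardware"


def run_in_software(accesses):
    # Hypothetical software path: no size bound in this model.
    return "software"


def run_transaction(accesses, capacity=4):
    """Try the hardware path first and fall back to software if too large."""
    try:
        return run_in_hardware(accesses, capacity)
    except CapacityExceeded:
        return run_in_software(accesses)
```

The sketch captures only the size-based choice of path; it does not model the hardware assistance that the software path may still receive.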

As stated above, transactions include transactional memory accesses to data items both by local processing elements within processor 100 and potentially by other processing elements. Without safety mechanisms in a transactional memory system, some of these accesses would potentially result in invalid data and execution, i.e., a write to data invalidating a read, or a read of invalid data. As a result, processor 100 potentially includes logic to track or monitor memory accesses to and from data items for identification of potential conflicts, such as read monitors and write monitors, as discussed below.

A data item or data element may include data at any granularity level, as defined by hardware, software, or a combination thereof. A non-exhaustive list of examples of data, data elements, data items, or references thereto includes a memory address, a data object, a class, a field of a type of dynamic language code, a type of dynamic language code, a variable, an operand, a data structure, and an indirect reference to a memory address. However, any known grouping of data may be referred to as a data element or data item. A few of the examples above, such as a field of a type of dynamic language code and a type of dynamic language code, refer to data structures of dynamic language code. To illustrate, dynamic language code, such as Java™ from Sun Microsystems, Inc., is a strongly typed language. Each variable has a type that is known at compile time. The types are divided into two categories: primitive types (boolean and numeric, e.g., int, float) and reference types (classes, interfaces, and arrays). The values of reference types are references to objects. In Java™, an object, which consists of fields, may be a class instance or an array. Given object a of class A, it is customary to use the notation A::x to refer to the field x of type A and a.x to refer to the field x of object a of class A. For example, an expression may be written as a.x = a.y + a.z. Here, field y and field z are loaded to be added, and the result is written to field x.
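The loads and the store implied by the expression a.x = a.y + a.z can be made explicit with a short Python sketch. The class `TrackedObject` and its `load` and `store` methods are hypothetical helpers introduced only to record which fields are read and written; they are not part of Java or of this disclosure.

```python
class TrackedObject:
    """Object whose field loads and stores are recorded (illustration only)."""

    def __init__(self, **fields):
        self.fields = dict(fields)
        self.loads = []     # names of fields that were loaded, in order
        self.stores = []    # names of fields that were stored, in order

    def load(self, name):
        self.loads.append(name)
        return self.fields[name]

    def store(self, name, value):
        self.stores.append(name)
        self.fields[name] = value


a = TrackedObject(x=0, y=2, z=3)
a.store("x", a.load("y") + a.load("z"))   # models a.x = a.y + a.z
```

After the last line, `a.loads` holds the two loaded fields and `a.stores` holds the single written field, which are the accesses a monitoring mechanism would need to track.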

Therefore, monitoring/buffering of memory accesses to data items may be performed at any data-level granularity. For example, in one embodiment, memory accesses to data are monitored at a type level. Here, a transactional write to a field A::x and a non-transactional load of field A::y may be monitored as accesses to the same data item, i.e., type A. In another embodiment, memory access monitoring/buffering is performed at a field-level granularity. Here, a transactional write to A::x and a non-transactional load of A::y are not monitored as accesses to the same data item, as they are references to separate fields. Note that other data structures or programming techniques may be taken into account in tracking memory accesses to data items. As an example, assume that fields x and y of an object of class A, i.e., A::x and A::y, point to objects of class B, are initialized to newly allocated objects, and are never written to after initialization. In one embodiment, a transactional write to a field B::z of an object pointed to by A::x is not monitored as a memory access to the same data item as a non-transactional load of field B::z of an object pointed to by A::y. Extrapolating from these examples, it is possible to determine that monitors may perform monitoring/buffering at any data granularity level.
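The difference between type-level and field-level monitoring can be sketched as a choice of key under which an access is tracked. The Python below is an illustrative model; `data_item` and `same_item` are hypothetical helper names, and accesses are represented as (type, field) pairs.

```python
def data_item(type_name, field_name, granularity):
    """Map an access to the data item it is tracked under."""
    if granularity == "type":
        return type_name                  # all fields of A share one item
    if granularity == "field":
        return (type_name, field_name)    # A::x and A::y are distinct items
    raise ValueError(f"unknown granularity: {granularity}")


def same_item(access_a, access_b, granularity):
    """True if two (type, field) accesses map to the same monitored item."""
    return data_item(*access_a, granularity) == data_item(*access_b, granularity)
```

With type-level granularity, a write to A::x and a load of A::y map to the same item; with field-level granularity they do not.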

In one embodiment, processor 100 includes monitors to detect or track accesses, and potential subsequent conflicts, associated with data items. As one example, hardware of processor 100 includes read monitors and write monitors to track loads and stores that are determined to be monitored, accordingly. As an example, hardware read monitors and write monitors are to monitor data items at the granularity of the data items, regardless of the granularity of the underlying storage structures. In one embodiment, a data item is bounded by tracking mechanisms associated at the granularity of the storage structures to ensure that at least the entire data item is monitored appropriately.

As a specific illustrative example, read and write monitors include attributes associated with cache locations, such as locations within lower-level data cache 150, to monitor loads from and stores to addresses associated with those locations. Here, a read attribute for a cache location of data cache 150 is set upon a read event to an address associated with the cache location, to monitor for potentially conflicting writes to the same address. In this case, write attributes operate in a similar manner for write events, to monitor for potentially conflicting reads and writes to the same address. To further this example, hardware is capable of detecting conflicts based on snoops for reads and writes to cache locations whose read and/or write attributes are set to indicate that the cache locations are monitored, accordingly. Conversely, setting read and write monitors, or updating a cache location to a buffered state, in one embodiment, results in snoops, such as read requests or read-for-ownership requests, which allow conflicts with addresses monitored in other caches to be detected.

Therefore, depending on the design, different combinations of cache coherency requests and monitored coherency states of cache lines result in potential conflicts, such as a cache line holding a data item in a shared, read-monitored state and a snoop indicating a write request to the data item. Conversely, a cache line holding a data item in a buffered write state and an external snoop indicating a read request to the data item may be considered potentially conflicting. In one embodiment, to detect such combinations of access requests and attribute states, snoop logic is coupled to conflict detection/reporting logic, such as monitors and/or logic for conflict detection/reporting, as well as status registers to report the conflicts.
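As a purely illustrative sketch, the combinations of snoop type and monitored state described above may be modeled in software as follows. The names `CacheLine`, `Snoop`, and `snoop_conflicts` are hypothetical, and the policy shown is only one possible design choice; the description leaves the exact set of conflicting combinations open.

```python
from dataclasses import dataclass
from enum import Enum


class Snoop(Enum):
    READ = "read"                  # external read request
    READ_FOR_OWNERSHIP = "rfo"     # external request signalling intent to write


@dataclass
class CacheLine:
    # Hypothetical per-line attributes; a real design may encode them differently.
    read_monitor: bool = False
    write_monitor: bool = False
    buffered: bool = False


def snoop_conflicts(line: CacheLine, snoop: Snoop) -> bool:
    """Return True if the snoop is a potential conflict under this assumed policy."""
    if snoop is Snoop.READ_FOR_OWNERSHIP:
        # An external write conflicts with any monitored or buffered line.
        return line.read_monitor or line.write_monitor or line.buffered
    # An external read conflicts only with write-monitored or buffered lines.
    return line.write_monitor or line.buffered
```

Under this assumed policy, a read-monitored line tolerates external reads but reports a potential conflict on an external read-for-ownership request.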

However, any combination of conditions and scenarios may be considered invalidating for a transaction, and that combination may be defined by an instruction, such as a commit instruction. This is discussed in more detail below with reference to FIGS. 11-12. Examples of factors that may be considered for non-commit of a transaction include detecting a conflict to a transactionally accessed memory location, losing monitor information, losing buffered data, losing metadata associated with a transactionally accessed data item, and detecting another invalidating event, such as an interrupt, a ring transition, or an explicit user instruction.

In one embodiment, hardware of processor 100 is to hold transactional updates in a buffered manner. As stated above, transactional writes are not made globally visible until commit of a transaction. However, a local software thread associated with the transactional writes is able to access the transactional updates for subsequent transactional accesses. As a first example, a separate buffer structure is provided in processor 100 to hold the buffered updates, and it is able to provide the updates to the local thread but not to other external threads. However, including a separate buffer structure is potentially expensive and complex.

In contrast, as another example, a cache memory, such as data cache 150, is used to buffer the updates while providing the same transactional functionality. Here, cache 150 is able to hold data items in a buffered coherency state; in one case, a new buffered coherency state is added to a cache coherency protocol, such as a Modified Exclusive Shared Invalid (MESI) protocol, to form a MESIB protocol. In response to local requests for a buffered data item, that is, a data item held in a buffered coherency state, cache 150 provides the data item to the local processing element to ensure internal transactional sequential ordering. However, in response to external access requests, a miss response is provided to ensure that the transactionally updated data item is not made globally visible until commit. Furthermore, when a line of cache 150 is held in a buffered coherency state and selected for eviction, the buffered update is not written back to higher-level cache memories; the buffered update is not to be proliferated through the memory system, i.e., not made globally visible, until after commit. Upon commit, the buffered lines are transitioned to a modified state to make the data item globally visible.
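The behavior of the buffered coherency state described above may be sketched in software as follows. This is a simplified model under stated assumptions, not the hardware implementation: `BufferingCache`, its method names, and the dictionary standing in for higher-level memory are all hypothetical, and only the Modified and Buffered states are modeled.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    BUFFERED = "B"   # the state added to MESI to form MESIB


@dataclass
class Line:
    state: State
    value: int


class BufferingCache:
    def __init__(self, backing: dict):
        self.backing = backing   # stands in for higher-level memory
        self.lines = {}

    def tx_store(self, addr: int, value: int) -> None:
        # A transactional write is held locally in the buffered state.
        self.lines[addr] = Line(State.BUFFERED, value)

    def local_load(self, addr: int) -> int:
        # Local requests see the buffered update.
        line = self.lines.get(addr)
        return line.value if line else self.backing[addr]

    def external_load(self, addr: int) -> int:
        # External requests miss on buffered lines and see the unbuffered version.
        line = self.lines.get(addr)
        if line is None or line.state is State.BUFFERED:
            return self.backing[addr]
        return line.value

    def evict(self, addr: int) -> bool:
        # Buffered lines are dropped without write-back; returns True on such a loss.
        line = self.lines.pop(addr)
        if line.state is State.MODIFIED:
            self.backing[addr] = line.value
        return line.state is State.BUFFERED

    def commit(self) -> None:
        # On commit, buffered lines transition to modified and become visible.
        for line in self.lines.values():
            if line.state is State.BUFFERED:
                line.state = State.MODIFIED
```

In this model, an eviction of a buffered line returns `True` so that a caller could treat the lost update as an invalidating event for the transaction.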

Note that the terms internal and external are often relative to the perspective of a thread associated with execution of a transaction, or of processing elements that share a cache. For example, a first processing element for executing a software thread associated with execution of a transaction is referred to as a local thread. Therefore, in the discussion above, if a store to or load from an address previously written by the first thread is received, and that earlier write resulted in the cache line for the address being held in a buffered coherency state, then the buffered version of the cache line is provided to the first thread, since it is the local thread. In contrast, a second thread may be executing on another processing element within the same processor but is not associated with execution of the transaction responsible for the cache line being held in the buffered state, making it an external thread; therefore, a load or store from the second thread to the address misses the buffered version of the cache line, and normal cache replacement is utilized to retrieve the unbuffered version of the cache line from higher-level memory.

Here, the internal/local and external/remote threads execute on the same processor, and in some embodiments they may execute on separate processing elements within the same core of a processor that share access to the cache. However, the use of these terms is not so limited. As stated above, local may refer to multiple threads sharing access to a cache, rather than being specific to a single thread associated with execution of the transaction, while external or remote may refer to threads that do not share access to the cache.

As stated above in the initial reference to FIG. 1, the architecture of processor 100 is purely illustrative for purposes of discussion. Similarly, the specific examples of translating data addresses to reference metadata are also exemplary, as any method of associating data with metadata in separate entries of the same memory may be utilized.

Metaphysical address spaces for metadata

Metadata

Turning to FIG. 2, an embodiment of holding metadata for a data item in a processor is illustrated. As depicted, metadata 217 for data item 216 is held locally in memory 215. Metadata includes any property or attribute associated with data item 216, such as transactional information relating to data item 216. Some illustrative examples of metadata are included below; however, the disclosed examples are purely illustrative and do not form an exhaustive list. In addition, metadata location 217 may hold any combination of the examples discussed below and other attributes for data item 216 that are not specifically discussed.

As a first example, metadata 217 includes a reference to a backup or buffer location for transactionally written data item 216, if data item 216 has previously been accessed, buffered, and/or backed up within a transaction. Here, in some implementations, a backup copy of a previous version of data item 216 is held in a different location, and as a result metadata 217 includes an address of, or other reference to, the backup location.

Alternatively, metadata 217 itself may act as a backup or buffer location for data item 216.

As another example, metadata 217 includes a filter value to accelerate repeated transactional accesses to data item 216. Often, during execution of a transaction utilizing software, access barriers are performed at transactional memory accesses to ensure consistency and data validity. For example, before a transactional load operation, a read barrier is executed to perform read barrier operations, such as testing whether data item 216 is unlocked, determining whether the current read set of the transaction is still valid, updating a filter value, and logging version values in the read set for the transaction to enable later validation. However, if a read of that location has already been performed during execution of the transaction, then the same read barrier operations are potentially unnecessary.

As a result, one solution includes utilizing a read filter to hold a first default value to indicate that data item 216, or the address for it, has not been read during execution of the transaction, and a second accessed value to indicate that data item 216, or the address for it, has already been accessed during a pendency of the transaction. Essentially, the second accessed value indicates whether the read barrier should be accelerated. In this instance, if a transactional load operation is received and the read filter value in metadata location 217 indicates that data item 216 has already been read, then, in one embodiment, the read barrier is elided (not executed) to accelerate transactional execution by not performing unnecessary, redundant read barrier operations. Note that a write filter value may operate in the same manner with regard to write operations. However, individual filter values are purely illustrative, as in one embodiment a single filter value is utilized to indicate whether an address has already been accessed, whether read or written. Here, metadata access operations that check metadata 217 for data item 216 for both loads and stores utilize the single filter value, in contrast to the examples above, where metadata 217 includes a separate read filter value and write filter value.

As a specific illustrative embodiment, four bits of metadata 217 are allocated to a read filter to indicate whether a read barrier is to be accelerated with regard to an associated data item, a write filter to indicate whether a write barrier is to be accelerated with regard to an associated data item, an undo filter to indicate whether undo operations are to be accelerated, and a miscellaneous filter that may be utilized in any manner by software as a filter value.
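The read filter described above may be sketched as follows. The bit positions of the four filters, the `Transaction` class, and its fields are assumptions made only for illustration; the description specifies that four bits are allocated but not how they are laid out, and the read barrier here performs only version logging rather than the full set of barrier operations.

```python
# Hypothetical bit assignments for the four filter bits of metadata.
READ_FILTER = 0b0001
WRITE_FILTER = 0b0010
UNDO_FILTER = 0b0100
MISC_FILTER = 0b1000


class Transaction:
    def __init__(self, memory: dict, versions: dict):
        self.memory = memory        # addr -> data value
        self.versions = versions    # addr -> version value
        self.metadata = {}          # addr -> filter bits, default 0 (not accessed)
        self.read_set = {}          # addr -> version logged for later validation
        self.slow_barriers = 0      # counts full read barriers actually executed

    def tx_load(self, addr: int):
        # Elide the read barrier when the read filter shows a prior read.
        if not self.metadata.get(addr, 0) & READ_FILTER:
            self._read_barrier(addr)
        return self.memory[addr]

    def _read_barrier(self, addr: int) -> None:
        # Simplified barrier: log the version, then set the filter bit.
        self.slow_barriers += 1
        self.read_set[addr] = self.versions[addr]
        self.metadata[addr] = self.metadata.get(addr, 0) | READ_FILTER
```

With this sketch, a second `tx_load` of the same address returns the value without repeating the barrier, which is the acceleration the filter value is intended to provide.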

A few other examples of metadata include an indication of, representation of, or reference to an address for a handler, either generic or specific to a transaction associated with a data item; an irrevocable/obstinate nature of a transaction associated with data item 216; a loss of data item 216; a loss of monitoring information for data item 216; a conflict being detected for data item 216; an address of a read set, or of a read entry within a read set, associated with a data item; a previously logged version of data item 216; a current version of data item 216; a lock for allowing access to data item 216; a version value for data item 216; a transaction descriptor for the transaction associated with data item 216; and other transaction-related descriptive information. Furthermore, as described above, use of metadata is not limited to transactional information. As a corollary, metadata 217 may also include information, properties, attributes, or states associated with data item 216 that are not involved with a transaction.

Continuing the discussion of illustrations for metadata, the hardware monitors and buffered coherency states described above are also considered metadata in some embodiments. The monitors indicate whether a location is to be monitored for external read requests or external read-for-ownership requests, while the buffered coherency state indicates whether an associated data cache line holding a data item is buffered. In the examples given above, however, monitors are maintained as attribute bits that are appended to, or otherwise directly associated with, cache lines, while the buffered coherency state is added to the cache line coherency state bits. As a result, in that case hardware monitors and buffered coherency states are part of the cache line structure, rather than being held in a separate metaphysical address space like the illustrated metadata 217. In other embodiments, however, monitors may be held as metadata 217 in a memory location separate from data item 216, and similarly metadata 217 may include a reference to indicate that data item 216 is a buffered data item. Conversely, instead of an update-in-place architecture, in which data item 216 is updated and held in a buffered state as described above, metadata 217 may hold the buffered data item while the globally visible version of data item 216 is maintained in its original location. Here, upon commit, the buffered update held in metadata 217 replaces data item 216.

Lossy metadata

Similar to the discussion above with reference to buffered cache coherency states, metadata 217, in one embodiment, is lossy: local information that is not provided to external requests outside the memory's domain. Assuming, for one embodiment, that memory 215 is a shared cache memory, a miss in response to a metadata access operation is not serviced outside the cache memory's domain. Essentially, since lossy metadata 217 is held only locally within the cache domain and does not exist as persistent data throughout the memory subsystem, there is no reason to forward the miss externally to service the request from a higher-level memory. As a result, misses to lossy metadata are potentially serviced quickly and efficiently; storage in the processor may be allocated immediately, without waiting for an external request for the metadata to be generated or serviced.
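The handling of a miss to lossy metadata may be illustrated with the following minimal sketch. The class `LossyMetadataStore`, the first-in-first-out eviction, the default value of zero, and the `lost` flag are all assumptions for illustration; the point shown is only that a miss is satisfied by immediate local allocation and never by a request to a higher memory level.

```python
class LossyMetadataStore:
    """Local-only metadata storage: misses allocate in place, evictions drop data."""

    def __init__(self, capacity: int, default: int = 0):
        self.capacity = capacity
        self.default = default
        self.entries = {}    # insertion-ordered dict used as a simple FIFO
        self.lost = False    # hypothetical flag recording that metadata was dropped

    def load(self, maddr: int) -> int:
        if maddr not in self.entries:
            # Miss: allocate locally with the default value, no external request.
            self._allocate(maddr)
        return self.entries[maddr]

    def store(self, maddr: int, value: int) -> None:
        if maddr not in self.entries:
            self._allocate(maddr)
        self.entries[maddr] = value

    def _allocate(self, maddr: int) -> None:
        if len(self.entries) >= self.capacity:
            # Evicted metadata is not written back anywhere; it is simply lost.
            victim = next(iter(self.entries))
            del self.entries[victim]
            self.lost = True
        self.entries[maddr] = self.default
```

Because evicted entries are discarded, software relying on such a store would need some way to learn that metadata was lost, which the `lost` flag stands in for here.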

Metaphysical address space

As the illustrated embodiment depicts, metadata 217 is held in a memory location separate from data item 216, i.e., at a different address, which results in a separate metaphysical address space for metadata. Since the metaphysical address space is orthogonal to the data address space, a metadata access operation to the metaphysical address space does not hit or modify a physical data entry. However, in an embodiment where metadata is held within the same memory, such as memory 215, the metaphysical address space potentially affects the data address space through competition for allocation in memory 215. As an example, data item 216 is cached in one entry of memory 215, while metadata 217 for data item 216 is held in another entry of the cache. Here, a subsequent metadata operation may result in the memory location of data item 216 being selected for eviction and replaced with metadata for a different data item. As a result, operations associated with the address of metadata 217 do not hit data item 216. However, a metadata address for a metadata element may replace physical data, such as data item 216, within memory 215.

Although in this example metadata potentially competes with data for space in the cache memory, the ability to hold metadata locally potentially results in efficient support for metadata without the high cost of proliferating persistent metadata throughout a memory hierarchy. As the assumption of this example implies, this metadata is held in the same memory, i.e., memory 215. In an alternative embodiment, however, metadata 217 for, or associated with, data item 216 is held in a separate memory structure. Here, addresses for metadata and for data may be the same, while a metaphysical portion of the metadata address indexes into the separate metadata storage structure instead of the data storage structure.

With a 1:1 ratio of metadata to data, the metaphysical address space shadows the data address space but remains orthogonal, as discussed above. In contrast, as discussed below, metadata may be compressed with respect to physical data. In this case, the metaphysical address space for metadata does not shadow the data address space in size, yet it still remains orthogonal.

Metaphysical Address Translation

Continuing the discussion of metaphysical address spaces, any method may be used to translate a data address, such as an address for data item 216 in a data address space, into a metaphysical address, such as a metadata address for metadata 217 in a metaphysical address space. In one embodiment, metaphysical translation logic 210 is used to translate an address, such as data address 200, into a metadata address. As illustrated, address 200 includes an address that is associated with, or references, data item 216. Normal data translation, such as translation between virtual (or linear) and physical addresses, may be used to index to data item 216 in memory 215. In addition, associating metadata 217 with data item 216 involves a similar translation of address 200, which references data item 216, into another, distinct address that references metadata 217. Translating address 200 into a data address with data translation logic 205, and into a distinct metaphysical address with metaphysical translation 210, therefore results in separate accesses that do not interfere with one another, which is what makes the two address spaces orthogonal. As discussed in more detail below, in one embodiment the use of data translation 205 or metaphysical translation 210 is based on the type of operation referencing address 200: a normal data access operation to access data item 216 uses data translation 205, while a metadata access operation to access metadata 217 uses metaphysical translation 210. The type of operation may be identified by a portion of an instruction/operation code (opcode).

In another embodiment, an instruction identified by its opcode may potentially access both data and metadata for a given metadata address and thereby perform complex operations, such as a conditional store to data based on metadata. For example, an instruction is decoded into a test-and-set metadata operation, which tests the metadata and sets it to a value, and an additional operation that sets the data to a value if the test of the metadata succeeded. As another example, a data item may be moved to the matching metadata address based on a read of the data from data memory.
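The decoded instruction described above can be sketched in software as follows. This is only an illustrative model under stated assumptions: the `Memory` class, the function names, and the convention that a metadata value of 0 means "test succeeds" are hypothetical and do not come from the described hardware.

```python
class Memory:
    """Toy memory with separate data and metadata maps (an assumption of this sketch)."""

    def __init__(self):
        self.data = {}
        self.meta = {}


def metadata_test_and_set(mem, addr, new_meta):
    """First operation: return the old metadata value and set the new one."""
    old = mem.meta.get(addr, 0)
    mem.meta[addr] = new_meta
    return old


def conditional_store(mem, addr, value, expected_meta=0, new_meta=1):
    """Model of the decoded instruction: test-and-set metadata, then store on success."""
    succeeded = metadata_test_and_set(mem, addr, new_meta) == expected_meta
    if succeeded:
        mem.data[addr] = value  # second operation: the data store
    return succeeded
```

In this model a first `conditional_store` to an address succeeds and writes the data, while a second one to the same address fails because the metadata has already been set.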

Examples of translating data address 200 into a metadata address for metadata 217 are given immediately below. As a first example, translating a data address into a metadata address includes using a physical address or a virtual address, after normal data translation 205, plus the addition of a metaphysical value by metaphysical translation logic 210 to separate data addresses from metadata addresses. Where a virtual address is used without translation, metaphysical translation logic 210 includes logic to combine the virtual address with a metaphysical value. However, where normal virtual-to-physical address translation is used, normal data translation 205 is used to obtain a translated address from address 200, and metaphysical translation logic 210 then includes logic to combine the translated address with a metaphysical value to form a metadata address. As another example, data address 200 may be translated using separate translation structures, tables, and/or logic within metaphysical translation 210 to obtain a distinct metadata address. Here, metaphysical translation logic 210 may mirror data translation logic 205 or include separate logic by comparison (logic to combine address 200 with a metaphysical value); however, metaphysical translation logic 210 includes table information to translate address 200 into a different, distinct metadata address. It can be seen that, whether a metadata address is obtained by adding information to a data address, extending it with appended information, replacing information in it, or translating it, the resulting distinct metadata address is associated with the data item through the addition, extension, replacement, or translation algorithm, while remaining orthogonal in the sense that it cannot incorrectly update or read the data item.

A few specific illustrative examples of translating a data address into a metadata address, or in other words of determining a metadata address from or based on a data address, are described below: (1) translating a first address into a second address using normal virtual-to-physical address translation and adding, appending, or incorporating a metaphysical value to or into the data address to form the metadata address; (2) not performing virtual-to-physical address translation of the data address and adding, appending, or incorporating a metaphysical value to or into the data address to form the metadata address; (3) translating a data address into a translated metadata address using metaphysical translation table logic, which may also include a metaphysical value, without necessarily requiring a metaphysical value to be incorporated, added, appended, or inserted to or into the translated metadata address to form the metadata address. In addition, any of the aforementioned translation techniques may incorporate, i.e. be based on, a compression ratio of data to metadata, so that metadata for each compression ratio is stored separately.
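The three techniques listed above can be sketched as follows. This is a minimal software model, not the hardware logic itself: the page size, the 32-bit address width, the choice of a single high bit as the metaphysical value, and all function names are assumptions made for illustration.

```python
# All widths and values below are assumptions made for this sketch.
PAGE_SHIFT = 12                     # assumed 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1
META_VALUE = 1 << 32                # assumed metaphysical value: one bit above a 32-bit address


def data_translate(vaddr, page_table):
    """Stand-in for normal data translation 205 (virtual to physical)."""
    frame = page_table[vaddr >> PAGE_SHIFT]
    return (frame << PAGE_SHIFT) | (vaddr & PAGE_MASK)


def meta_after_translation(vaddr, page_table):
    """Technique (1): translate normally, then add the metaphysical value."""
    return data_translate(vaddr, page_table) | META_VALUE


def meta_without_translation(vaddr):
    """Technique (2): no translation; add the metaphysical value to the data address."""
    return vaddr | META_VALUE


def meta_via_table(vaddr, meta_table):
    """Technique (3): a separate metaphysical table produces the metadata address."""
    frame = meta_table[vaddr >> PAGE_SHIFT]
    return (frame << PAGE_SHIFT) | (vaddr & PAGE_MASK)
```

In each case the resulting metadata address differs from the data address it was derived from, which is the property that keeps the two address spaces orthogonal in this model.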

Here, an address may be modified for translation and/or compression, for example by ignoring specific bits of an address, removing specific bits of an address, changing the bit ranges of an address that are used to select different granularities of data, translating specific bits, and adding or replacing specific bits with metadata-related information. Compression is discussed in more detail below with reference to FIG. 4.
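As a rough illustration of removing address bits for compression, the sketch below maps a power-of-two group of adjacent data addresses onto one metadata address and folds the ratio into the address so that each ratio lands in its own region. The field positions, the restriction to power-of-two ratios, and the name `compressed_meta_addr` are assumptions of this sketch and are not taken from the description of FIG. 4.

```python
META_VALUE = 1 << 40    # assumed metaphysical value, above the address and ratio fields
RATIO_SHIFT = 32        # assumed position of the compression-ratio field


def compressed_meta_addr(data_addr, ratio_log2):
    """Map 2**ratio_log2 adjacent data addresses onto one metadata address.

    Assumes data_addr fits in 32 bits and ratio_log2 fits in 8 bits.
    """
    index = data_addr >> ratio_log2     # remove the low-order address bits
    return META_VALUE | (ratio_log2 << RATIO_SHIFT) | index
```

With `ratio_log2 = 3`, data addresses 0x1000 through 0x1007 share one metadata address, while 0x1008 maps to the next one.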

Multiple Metaphysical Address Spaces

Referring to FIG. 3, an embodiment of supporting multiple metaphysical address spaces is illustrated. In one embodiment, each processing element is associated with a metaphysical address space, so that each processing element is able to maintain independent metadata. Four processing elements 301-304 are shown. As discussed above, a processing element may include any of the elements described above with reference to FIG. 1. As a first example, the processing elements include cores of a processor. However, as an illustrative example to further the discussion below, processing elements 301-304 are discussed with reference to hardware threads (threads) in a processor. Each hardware thread is to execute a software thread and potentially multiple software subsystems.

It is therefore potentially advantageous to allow individual ones of threads 301-304 to maintain separate metadata. In one embodiment, metaphysical translation logic 310 is to associate accesses from the different threads 301-304 with their appropriate metaphysical address spaces. As an example, a thread identifier (ID), used in conjunction with the address referenced by a metadata access operation, indexes into the correct metaphysical address space.

To illustrate, assume that a metadata access operation associated with thread 302 and referencing data address 300 for data item 316 is received. Any translation method described above may be used to translate the data address for data item 316 into a metadata address. In addition, however, the translation includes a combination with a thread ID 312, which may be obtained, for example, from a control register for thread 302 or from an opcode of the instruction received from thread 302. The combination may include appending the thread ID of thread 302 to the address, replacing bits in the address, or any other known method of associating a thread ID with an address. As a result, metaphysical translation logic 310 is able to select, or index into, the metaphysical address space associated with data item 316 for processing element 302.

From this example it follows that, by using the thread ID for threads 301-304 as part of the translation into a metaphysical address, each of processing elements 301-304 is able to maintain independent metadata for data item 316. Nevertheless, a programmer does not have to manage the metaphysical address spaces individually, because the hardware is able to keep them separate by using a thread ID in a manner that is transparent to software. Furthermore, the metaphysical address spaces are orthogonal: a metadata access from one thread does not access metadata from another thread, because each metadata access is associated with a separate set of addresses that includes a reference to a unique thread ID.
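The per-thread separation described above can be modeled with a short sketch. The address width, the placement of the thread ID directly above the address, the 2-bit thread ID field, and the helper names are all assumptions made for illustration; the description leaves the exact combination method open.

```python
ADDR_BITS = 32                      # assumed data address width
TID_SHIFT = ADDR_BITS               # assumed: thread ID field sits just above the address
META_VALUE = 1 << (ADDR_BITS + 2)   # assumed metaphysical value above a 2-bit thread ID


def thread_meta_addr(data_addr, tid):
    """Combine the data address with a thread ID to form a per-thread metadata address."""
    return META_VALUE | (tid << TID_SHIFT) | data_addr


def meta_store(store, data_addr, tid, value):
    """Metadata store: the thread ID is folded in without software managing it."""
    store[thread_meta_addr(data_addr, tid)] = value


def meta_load(store, data_addr, tid):
    """Metadata load: returns None if this thread holds no metadata for the address."""
    return store.get(thread_meta_addr(data_addr, tid))
```

Here a metadata store by thread ID 1 is invisible to a metadata load by thread ID 2 for the same data address, since the two accesses resolve to different metadata addresses.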

Nevertheless, with regard to instructions/operations that access metadata, there may be certain situations in which a metadata access from one thread is granted access to the metadata of another thread. In other words, in some implementations access across PEIDs and/or MDIDs (as discussed below) may be advantageous. For example, to determine whether hardware has detected conflicts, to check monitor metadata from another thread in order to determine whether an associated data item is monitored by that thread, to clear other threads' metadata, or to determine commit conditions, a thread may have to check, modify, or clear other threads' metadata associated with data item 316.

Here, a specific opcode for operations that access another thread's metadata is recognized, and as a result metaphysical translation logic 310 translates address 300 into all of the metadata addresses for the metadata to be accessed. As a specific illustrative example, assume that four bits are appended to address 300, with each bit representing one of processing elements 301-304, and that a metadata access operation, such as a clear operation, is to clear all metadata for data item 316. Metaphysical translation logic 310 then sets each of the four bits in order to access all metadata 317. Here, the lookup logic for memory 315 may be designed so that a single access with all four bits set hits all metadata 317, or metaphysical translation logic 310 may generate four separate accesses, each with a different one of the four thread ID bits set, to access all metadata 317. As an illustrative example, a mask may be applied to an address value to allow one thread to hit the metadata of another thread.
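Both lookup options in the four-bit example above can be sketched in a few lines. This is a software model under assumptions: the 32-bit address width, the dictionary standing in for memory 315, and the function names are hypothetical.

```python
ADDR_BITS = 32                      # assumed data address width
ADDR_MASK = (1 << ADDR_BITS) - 1
NUM_PE = 4                          # one appended bit per processing element


def meta_addr(data_addr, pe_bits):
    """Append the per-processing-element bits above the data address."""
    return (pe_bits << ADDR_BITS) | data_addr


def clear_single_access(store, data_addr, pe_mask=0b1111):
    """Option 1: one access whose set bits hit every entry with an overlapping bit."""
    hits = [a for a in store
            if (a & ADDR_MASK) == data_addr and ((a >> ADDR_BITS) & pe_mask)]
    for a in hits:
        del store[a]
    return len(hits)


def clear_separate_accesses(store, data_addr):
    """Option 2: four accesses, each with a different single bit set."""
    cleared = 0
    for pe in range(NUM_PE):
        key = meta_addr(data_addr, 1 << pe)
        if key in store:
            del store[key]
            cleared += 1
    return cleared
```

Passing a narrower `pe_mask`, such as `0b0100`, models the masked case in which one thread hits only the metadata of one other thread.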

In addition, as illustrated, each processing element 301-304 may be associated with multiple metaphysical address spaces, so that multiple contexts or software subsystems within a single thread can be interleaved with multiple metadata address spaces. For example, in some situations it is potentially advantageous to allow multiple software subsystems within a single processing element to hold independent sets of metadata. Therefore, in one example, orthogonal metadata address spaces may be provided at multiple processing element levels, such as a core level, a hardware thread level, and/or a software subsystem level. In the illustration, each processing element 301-304 is associated with two metaphysical address spaces, and each of the two metaphysical address spaces is to be associated with a software subsystem that is to execute on the processing element.

A software subsystem includes any task or code to be executed on a processing element that may use a separate metaphysical address space. As an illustrative example, four subsystems that may be associated with individual metaphysical address spaces include a transactional runtime subsystem, a garbage collection runtime subsystem, a memory protection subsystem, and a software translation subsystem, all of which may execute on a single processing element. Here, each software subsystem may have control of the processing element at different times. As another example, a software subsystem includes individual transactions executing within a single processing element. In fact, it may be desirable for nested transactions executing on the same thread to be associated with separate metaphysical address spaces. To illustrate, if a filter test for an access to a data item in an outer transaction fails, it is still potentially advantageous to provide a second, distinct filter for an access to the same data item in an inner nested transaction, which may succeed separately and so accelerate the access within the inner transaction. Furthermore, to ensure that the metadata for the outer transaction is preserved when a nested inner transaction aborts, each nested transaction (i.e. each subsystem) is associated with a different metadata space, so that clearing the metadata of the inner nested transaction does not affect the metadata of the outer transaction. However, a software subsystem is not limited in this way, since any task or code may be able to manage metadata.

In one embodiment, to provide orthogonal metaphysical address spaces at the software subsystem level, the address is combined with the processing element ID (PEID), as discussed above, and additionally with a metadata ID (MDID) or context ID. Separate metadata can therefore be uniquely identified for a subsystem within a processing element. Continuing an example from above, assume that processing elements 301-304 are hardware threads and that thread 302 is executing an outer transaction and an inner transaction nested within the outer transaction. For the outer transaction, metadata 317c is associated with data item 316 through metaphysical translation 310, which translates data address 300 of data item 316 into an address that includes a thread ID (TID) and a metadata ID (MDID) for the outer transaction and that references metadata 317c.

As a purely illustrative example, metadata 317c includes four filter values (a read filter value, a write filter value, an undo filter value, and a miscellaneous filter value), a pointer or other reference to a backup location for data item 316, a monitor value to indicate whether monitors on data item 316 have been lost, a transaction descriptor value, and a version of data item 316. Similarly, the inner transaction is associated with metadata 317d for data item 316, which includes the same metadata fields as metadata 317c. As above, metaphysical translation 310 translates data address 300 for data item 316 into an address that is associated with the thread ID and the metadata ID for the inner transaction and that references metadata 317d.

Here, the only difference between the metadata address that references metadata 317c and the metadata address that references metadata 317d may be the metadata ID for the outer transaction versus the inner transaction. Nevertheless, this difference in the address ensures that the address spaces are disjoint/orthogonal: an access to metadata by the inner transaction does not affect metadata of the outer transaction, because the MDID for an inner-transaction access differs from that of the outer transaction. As noted above, this can be advantageous for rolling back nested transactions or for holding different metadata values for transactions at different levels. In particular, if the inner transaction is aborted, the backup data for data element 316 held in metadata 317d can be cleared, or used to roll data element 316 back to an entry point before the inner transaction, without clearing or affecting the backup data for the outer transaction held in metadata 317c.
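The rollback behavior described here can be sketched in a few lines of Python. This is only an illustrative software model, not the hardware mechanism: the dictionary keyed by (thread ID, MDID, data address), the helper names `log_backup` and `abort`, and the use of 0x316 as a stand-in address for data element 316 are all assumptions made for the example.

```python
# Hypothetical model: metadata is keyed by (thread ID, metadata ID, data address),
# so the inner and outer transactions never share an entry.
metadata = {}

def log_backup(tid, mdid, data_addr, old_value):
    """Record the pre-transaction value once per (tid, mdid, address)."""
    metadata.setdefault((tid, mdid, data_addr), {"backup": old_value})

def abort(tid, mdid, memory):
    """Roll back every address logged under this MDID and drop its metadata."""
    for key in [k for k in metadata if k[0] == tid and k[1] == mdid]:
        memory[key[2]] = metadata.pop(key)["backup"]

OUTER, INNER = 0, 1          # assumed MDIDs for the two nesting levels
memory = {0x316: 10}         # arbitrary stand-in for data element 316

log_backup(2, OUTER, 0x316, memory[0x316])   # outer transaction backs up 10
memory[0x316] = 20
log_backup(2, INNER, 0x316, memory[0x316])   # inner transaction backs up 20
memory[0x316] = 30

abort(2, INNER, memory)      # only the inner transaction's metadata is touched
```

After the inner abort, the data element holds the value it had on entry to the inner transaction, and the outer transaction's backup entry is still present.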

Note that the metadata ID (MDID) used to separate the metaphysical address spaces of software subsystems can be of any size and can come from a variety of sources. As a highly simplified illustrative example with four processing elements (PEs) 301-304, a PEID may comprise a combination of two bits: 00, 01, 10, or 11. Similarly, if four separate metaphysical address spaces are supported, a two-bit MDID (00, 01, 10, or 11) can distinguish among four subsystems. To illustrate, a value representing processing element 302 and subsystem 2 within PE 302 is 0101, where the first two bits, 01, identify PE 302 and the second two bits, 01, identify the second subsystem. In this example, the metaphysical translation logic combines this value with data address 300, or a translation of it, to reference PE 302, MDID 01, which contains metadata location 317d.
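The two-bit PEID and two-bit MDID of this simplified example can be combined as in the following sketch. The function names, the 64-bit address width, and the choice to place the identifier above the data address are assumptions for illustration; the text leaves the exact way of combining them open.

```python
PEID_BITS = 2    # four processing elements, as in the example
MDID_BITS = 2    # four metaphysical address spaces per element
ADDR_BITS = 64   # assumed width of a data address

def id_prefix(peid, mdid):
    """Concatenate PEID and MDID into one four-bit identifier."""
    if not 0 <= peid < (1 << PEID_BITS) or not 0 <= mdid < (1 << MDID_BITS):
        raise ValueError("PEID and MDID must each fit in two bits")
    return (peid << MDID_BITS) | mdid

def metadata_address(data_addr, peid, mdid):
    """Place the identifier above the data address (one possible combination)."""
    return (id_prefix(peid, mdid) << ADDR_BITS) | data_addr

# PE 302 is encoded as 01 and its second subsystem as 01, giving 0101.
print(format(id_prefix(0b01, 0b01), "04b"))  # 0101
```

Two accesses to the same data address with different MDIDs then yield different metadata addresses, which is what keeps the spaces orthogonal.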

However, both thread IDs and MDIDs can be more complex. For example, assume that threads 301-302 share access to memory 315, while threads 303-304 are remote processing elements that do not share access to memory 315. In addition, assume that threads 301-302 each support two software subsystems, for a total of four orthogonal address spaces for threads 301-302: the PE 301 MD0, PE 301 MD1, PE 302 MD0, and PE 302 MD1 address spaces. In this case, the value for the combined thread ID and MDID used to obtain the metadata address can come from an opcode, a control register, or a combination of the two. To illustrate, an opcode supplies one bit for the context/MDID, a control register supplies one bit for a processing element ID (PEID), assuming only two processing elements, and a metadata control register, such as MDCR 320, supplies four bits that identify a specific software subsystem/context at a finer granularity. Thus, when a metadata access operation referencing address 300 for data element 316 is received from the second thread, PE 302, the bit from the opcode (a first bit holding a 1 to indicate a second context) and the bit from the control register for processing element 302 (a second bit holding a 1 to indicate processing element 302) are combined with an MDID from the metadata control register (MDCR) 320 associated with the second thread. The MDCR was previously updated with the MDID of the subsystem currently controlling the second thread, 0010, so that the correct subsystem associated with the received operation is identified. The metaphysical translation logic takes the combined value, such as 110010, and further combines it with the referenced data address 300, or a translation of it, to obtain a metadata address. The 110010 portion of the metadata address is unique to the subsystem from which the access operation originated, so the operation hits or modifies only metadata 317d in memory 315, without hitting or affecting metadata 317a, b, c, e, f, g, or h, which belong to the orthogonal metaphysical address spaces of other subsystems in both the second thread and other threads.
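The six-bit value 110010 in this example is simply the concatenation of the three sources. A minimal sketch, assuming the bit order given in the text (opcode bit first, control register bit second, four MDCR bits last); the function name `combined_id` is hypothetical:

```python
def combined_id(opcode_ctx_bit, peid_bit, mdcr_mdid):
    """Concatenate one opcode bit, one PEID bit, and a four-bit MDID from the MDCR."""
    if opcode_ctx_bit not in (0, 1) or peid_bit not in (0, 1):
        raise ValueError("opcode and PEID contributions are single bits")
    if not 0 <= mdcr_mdid < 16:
        raise ValueError("the MDCR contributes four bits in this example")
    return (opcode_ctx_bit << 5) | (peid_bit << 4) | mdcr_mdid

# Second context (1), processing element 302 (1), MDCR holding 0010.
print(format(combined_id(1, 1, 0b0010), "06b"))  # 110010
```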

As a specific illustrative example, a discussion of one particular form of an MDCR is included. In some embodiments, an ISA may be extended with a per-thread metadata identifier register (MDID register) that supplies an MDID to MDID-sensitive metadata load/store/test/set instructions. In some embodiments it is useful to have a plurality of such registers. For example, the MDCR (metadata control register) is a 32-bit read-write register that contains the current metadata context ID (MDID). It can be updated by a CR MOV. Exemplary bit field definitions are as follows:

Table A. Exemplary embodiment of bits for MDCR

MDID 0 and MDID 1 are the metadata IDs that the instruction set can access at the same time. The number of bits of these fields that is actually used is MDID_SIZE, which in one embodiment can only be read, at a privilege level specified by the processor design. In other embodiments, however, different privilege levels may be able to modify the size. There may be no hardware checks to ensure that the MDID fits within the allocated bit size. In one embodiment, MDID 0 and MDID 1 can be written and read at any privilege level. It is also possible to use special MDID values to designate special metadata spaces that always read as 0 or 1.

Software can use this, in a manner similar to the register discussed with reference to Figures 6 and 7, to force a metadata value, that is, to force all metadata tests in a block to be true or false.

However, in another example discussed above, metaphysical translation logic 310, in conjunction with decoders (not shown), is configured to recognize metadata access operations from thread 302 that are intended to access metadata in the metadata address space of thread 301, and to permit those specific instructions/operations to read or modify the metadata of thread 301.

Compression of Metadata to Data

A one-to-one mapping of data to metadata, that is, uncompressed metadata, was discussed above. In some circumstances, however, it is more efficient to use a smaller amount of metadata than data: a compression of metadata, in which the size of the metadata is smaller than that of the data. Note that metaphysical address translation logic 210 and 310 of Figures 2 and 3 can take compression into account when translating and modifying an address to reference compressed metadata. Referring now to Figure 4, an embodiment of modifying an address to achieve compression of metadata is illustrated. In particular, an embodiment with a data-to-metadata compression ratio of 8 is shown. Control logic, such as metaphysical address translation logic 210 and 310 of Figures 2 and 3, receives a data address 400 referenced by a metadata access operation. One example of compression includes shifting or removing log₂(N) bits of address 400, where N is the compression ratio of data to metadata. In the example shown, for a compression ratio of 8, the address is shifted down by three bits and those bits are removed to form metadata address 405. In essence, address 400, which comprises 64 bits that reference a specific data byte in memory, is shortened by three bits to form metadata byte address 405, which references metadata in memory at byte granularity. Within that byte, a single bit of metadata is selected using the three bits that were removed from the address to form the metadata byte address. In one embodiment, the shifted/removed bits are replaced by other bits. As illustrated, the high-order bits are replaced by zeros after address 400 has been shifted. However, the removed/shifted bits can be replaced by other data or information, such as a processing element ID, a context identifier (ID), and/or a metadata ID (MDID) associated with the metadata access operation. Although the lowest-numbered bits are removed in this example, bits at any position can be removed and replaced based on any number of factors, such as cache organization, cache circuit timing, locality of metadata to data, and minimizing conflicts between data and metadata.
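The 8:1 address modification can be sketched as follows. This is a software illustration under stated assumptions, not the translation logic of Figure 4 itself: the function names are hypothetical, and filling the vacated high-order bits with an identifier is only one of the replacement options the text mentions.

```python
RATIO = 8                        # compression ratio of data to metadata
SHIFT = RATIO.bit_length() - 1   # log2(8) = 3 bits removed from the address

def to_metadata_address(data_addr):
    """Return (metadata byte address, bit index within that byte)."""
    md_byte_addr = data_addr >> SHIFT      # drop the three low-order bits
    bit_index = data_addr & (RATIO - 1)    # the dropped bits pick one bit
    return md_byte_addr, bit_index

def tag_with_id(md_byte_addr, ident, addr_bits=64):
    """Fill the three vacated high-order bits with an identifier."""
    if not 0 <= ident < (1 << SHIFT):
        raise ValueError("identifier must fit in the vacated bits")
    return (ident << (addr_bits - SHIFT)) | md_byte_addr

byte_addr, bit = to_metadata_address(0x1005)
print(hex(byte_addr), bit)  # 0x200 5
```

Eight consecutive data bytes share one metadata byte, and each of them owns one bit in it.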

For example, instead of shifting a data address by log₂(N), address bits 0:2 could be set to zero. As a result, the bits that are the same in the physical address and the virtual address are not shifted as in the example above, which allows preselection of a set and a bank using unmodified bits, such as bits 11:3.
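The difference between masking and shifting can be seen in a short sketch. The helper `index_bits`, which extracts bits 11:3 as a stand-in for a set/bank index, is an assumption for illustration.

```python
def clear_low_bits(data_addr, ratio=8):
    """Zero address bits 0:2 (for a ratio of 8) without shifting the rest."""
    return data_addr & ~(ratio - 1)

def index_bits(addr):
    """Extract bits 11:3, used here as a hypothetical set/bank index."""
    return (addr >> 3) & 0x1FF

addr = 0x1ABD
masked = clear_low_bits(addr)    # 0x1AB8: bits 11:3 are untouched
shifted = addr >> 3              # 0x357: every bit has moved down by three

print(index_bits(masked) == index_bits(addr))   # True
print(index_bits(shifted) == index_bits(addr))  # False
```

Because the masked address keeps bits 11:3 in place, an index derived from those bits can be computed before the metadata address is fully formed.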

Note that the translation discussed above can be combined with compression. In other words, a compression ratio can be an input to metaphysical address translation logic 210 and 310 of Figures 2 and 3, and the translation logic uses the compression ratio together with a PEID, CID, MDID, metaphysical value, or other information to translate a data address into a metadata address. The metadata address is then used to access the memory that holds the metadata. As explained above, because metadata is a local, lossy construct, misses to the memory based on the metadata address can be serviced quickly and efficiently: a storage location is allocated without generating an external miss-service request and without waiting for an external request to be serviced. Here, an entry is allocated for the metadata in the normal fashion. For example, an entry, such as entry 217 of Figure 2, is selected, allocated, and initialized to the metadata default value based on metadata address 405 and a cache replacement algorithm, such as a least-recently-used (LRU) algorithm. As a result, metadata potentially competes with regular data for space, but remains compressed and separate from other software subsystems/processing elements.
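The miss handling described above (allocate locally, initialize to a default, never fetch from outside) can be modeled with a small LRU structure. This is a sketch under assumptions: the class name `MetadataCache`, a default value of 0, and the use of `collections.OrderedDict` as the LRU mechanism are illustrative choices, not details from the text.

```python
from collections import OrderedDict

class MetadataCache:
    """Lossy metadata store: a miss allocates a default entry locally."""

    def __init__(self, capacity, default=0):
        self.capacity = capacity
        self.default = default
        self.entries = OrderedDict()   # oldest entry first

    def access(self, md_addr):
        """Return the metadata at md_addr, allocating on a miss."""
        if md_addr in self.entries:
            self.entries.move_to_end(md_addr)      # mark as most recently used
        else:
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)   # evict LRU; its metadata is lost
            self.entries[md_addr] = self.default   # no external request is made
        return self.entries[md_addr]

    def store(self, md_addr, value):
        """Write metadata, allocating the entry first if needed."""
        self.access(md_addr)
        self.entries[md_addr] = value

cache = MetadataCache(capacity=2)
cache.store(0x200, 0xFF)
cache.access(0x201)
cache.access(0x202)          # evicts 0x200, the least recently used entry
print(cache.access(0x200))   # 0: the evicted metadata reads back as the default
```

A real design would pair eviction with some loss indication so that software can react; this model omits that.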

Note that a compression ratio of eight was given purely as an illustration, and any compression ratio can be used. As another example, a compression ratio of 512:1 is used, in which one bit of metadata corresponds to 64 bytes of data. As above, a data address is translated/modified to form a metadata address by shifting the data address down by log₂(512) = 9 bits. Here, bits 6:8, rather than bits 0:2, are still used to select a bit, which effectively produces the compression by selecting at a granularity of 512 bits. Because the data address has been shifted by 9 bits, the high-order portion of the data address has nine open bit positions to hold information. In one embodiment, the 9 bits hold identifiers, such as a context ID, thread ID, and/or MDID. In addition, values of the metaphysical space can also be held in these bits, or the address can be extended by the metaphysical value.
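The 8:1 and 512:1 cases follow one pattern, which the sketch below generalizes to any power-of-two ratio. The function name `compress`, the restriction to ratios of at least 8, and the placement of the identifier in the vacated high-order bits are assumptions for the example.

```python
def compress(data_addr, ratio, ident=0, addr_bits=64):
    """Map a data byte address to (metadata byte address, bit index).

    ratio is the number of data bits per metadata bit; it is assumed to be
    a power of two of at least 8. ident fills the vacated high-order bits.
    """
    if ratio < 8 or ratio & (ratio - 1):
        raise ValueError("ratio must be a power of two >= 8")
    shift = ratio.bit_length() - 1                 # log2(ratio)
    if not 0 <= ident < (1 << shift):
        raise ValueError("identifier does not fit in the vacated bits")
    md_byte_addr = data_addr >> shift
    bit_index = (data_addr >> (shift - 3)) & 0x7   # bits 6:8 when ratio is 512
    return (ident << (addr_bits - shift)) | md_byte_addr, bit_index

print(compress(0x12C0, 512))  # (9, 3)
print(compress(0x12FF, 512))  # (9, 3): same 64-byte block, same metadata bit
print(compress(0x1300, 512))  # (9, 4): next 64-byte block, next bit
```

With a ratio of 512, the nine vacated positions at the top of a 64-bit address can carry a nine-bit identifier.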

In one embodiment, hardware supports multiple concurrent compression ratios. Here, a representation of the compression ratio is held as part of a metaphysical value that is combined with a data address to obtain a metadata address. As a result, the compression ratio is taken into account when a memory is searched with the address, and the address does not match addresses with different compression ratios. Furthermore, software may be able to rely on hardware not to forward store information to loads of a different compression ratio.

In one embodiment, hardware is implemented using a single compression ratio but includes other hardware support to present multiple compression ratios to software. As an example, assume that cache hardware is implemented using a compression ratio of 8:1, as shown in Figure 4. A metadata access operation that accesses metadata at a different granularity is decoded to include a micro-operation that reads a default amount of metadata and a test micro-operation that tests the appropriate portion of the metadata read. As an example, the default amount of metadata read is 32 bits. A test operation for a granularity/compression other than 8:1 then tests the correct bits of the 32 bits of metadata read, which may be based on a certain number of bits of an address, such as a number of LSBs of a metadata address, and/or on a context ID.

As an illustration, in a scheme that supports metadata for unaligned data with one bit of metadata per byte of data, a single bit is selected from the least significant eight bits of the 32 bits of metadata read, based on the three LSBs of a metadata address. For a data word, two consecutive metadata bits are selected from the least significant 16 bits of the 32 bits of metadata read, based on the three LSBs of the address, and so on up to 16 bits for a metadata size of 128 bits.
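One reading of this selection scheme is sketched below: the three LSBs of the address give a starting bit position, and one metadata bit is taken per byte of the operand. The function name and the interpretation that a 16-byte operand selects 16 consecutive bits are assumptions; the text does not spell out the exact hardware behavior.

```python
def select_metadata_bits(md32, addr, size_bytes):
    """Pick size_bytes consecutive bits from a 32-bit metadata read.

    The starting position is given by the three LSBs of the address,
    so an unaligned operand still maps to consecutive metadata bits.
    """
    if not 1 <= size_bytes <= 16:
        raise ValueError("operand size assumed to be 1 to 16 bytes")
    offset = addr & 0x7                  # three LSBs of the address
    mask = (1 << size_bytes) - 1         # one metadata bit per data byte
    return (md32 >> offset) & mask

md32 = 0x000001A4   # arbitrary example value: bits 2, 5, 7 and 8 are set

print(select_metadata_bits(md32, 0x1002, 1))  # 1: bit 2 is set
print(select_metadata_bits(md32, 0x1003, 1))  # 0: bit 3 is clear
print(select_metadata_bits(md32, 0x1007, 2))  # 3: bits 7 and 8 are both set
```

With an offset of at most 7 and at most 16 bits selected, the highest bit touched is bit 22, so a 32-bit read always covers the selection.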

Metadata Access Instructions/Operations

Referring now to Figure 5, a flow diagram of a method for accessing metadata associated with data is illustrated. Although the flows of Figure 5 are shown in a substantially serial fashion, they may be performed at least partially in parallel, and potentially in a different order.

In flow 505, a metadata operation that references a data address for a given data element is encountered. The discussion above mentioned that metadata instructions/operations may be supported in hardware to read, modify, and/or clear metadata. In other words, instructions may be supported in the instruction set architecture (ISA) of a processor, so that the decoders of the processor recognize the operation codes (opcodes) of these instructions, and logic performs the accesses accordingly. Note that the term instruction may also refer to an operation. Some processors use the concept of a macro-instruction that can be decoded into a plurality of micro-operations that perform individual tasks. For example, a test-and-set metadata macro-instruction is decoded into a metadata test operation/micro-operation that tests the metadata and, if the test operation yields the correct Boolean value, a set operation that updates the metadata to a specific value.
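The decomposition of a test-and-set macro-instruction into two micro-operations can be mimicked in software as follows. The function names, the dictionary standing in for metadata storage, and a default metadata value of 0 are assumptions made for the sketch; it does not model atomicity.

```python
def md_test(metadata, md_addr, expected):
    """Test micro-operation: compare the stored metadata with a value."""
    return metadata.get(md_addr, 0) == expected

def md_set(metadata, md_addr, value):
    """Set micro-operation: update the metadata to a specific value."""
    metadata[md_addr] = value

def md_test_and_set(metadata, md_addr, expected, new_value):
    """Macro-instruction: run the set only if the test yields True."""
    if md_test(metadata, md_addr, expected):
        md_set(metadata, md_addr, new_value)
        return True
    return False

metadata = {}
print(md_test_and_set(metadata, 0x200, expected=0, new_value=1))  # True
print(md_test_and_set(metadata, 0x200, expected=0, new_value=1))  # False
```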

However, metadata access operations are not limited to explicit software instructions that access metadata; they can also include implicit micro-operations decoded as part of a larger, more complex instruction that includes an access to a data element associated with metadata. Here, the data access instruction can be decoded into a plurality of operations, such as an access to the data element and an implicit update of the associated metadata.

As discussed earlier, in one embodiment the physical mapping of metadata to data in hardware is not directly visible to software. As a result, metadata access operations in this example reference data addresses and rely on the hardware to perform the correct translations, that is, the mapping, to access the metadata appropriately. Nevertheless, metadata access operations can individually reference separate metaphysical address spaces, depending on the thread, context, and/or software subsystem from which they originate. A memory can therefore hold metadata for data elements in a manner that is transparent to software. When the hardware detects a metadata access operation, either through an explicit operation code (the opcode of an instruction) or by decoding an instruction into a metadata access micro-operation, the hardware performs the required translation of the data address referenced by the access operation to access the metadata accordingly.

As this example illustrates, a program may include separate operations, such as a data access operation and a metadata access operation, that reference the same address of a data item, such as data items 216 and 316 of Figures 2-3, and the hardware may map these accesses to different address spaces, such as a physical address space and a metaphysical address space. In some embodiments, the ISA may be extended with instructions to load/store/test/set metadata for a given virtual address, MDID, compression ratio, and operand width. Any of these parameters may be explicit instruction operands, may be encoded in the opcode, or may be obtained from a separate control register. Instructions may combine the metadata load/store operation with other operations, such as loading some data, testing some bits of it, and setting a condition code for a subsequent conditional jump. Instructions may also flush all metadata or only the metadata for a particular MDID. A number of illustrative metadata access operations are listed below. Note that some of the example instructions are specific to a 64x compression ratio; however, similar instructions may be used for other compression ratios, as well as for uncompressed metadata, even if they are not specifically disclosed.

Metadata Bit Test and Set (MDLT)

The metadata load and test instruction (MDLT) takes two arguments: the data address with which the metadata is associated, as a source operand, and a register (destination operand) into which the byte, word, dword, qword, or other size of metadata containing the bit is written. The value of the tested metadata bit is written to the register. The programmer should not assume any knowledge of the data stored in the destination register of the MDLT instruction and should not manipulate that register. The register is to be used solely as a source operand for a metadata store and set instruction (MDSS) to the same address. In one embodiment, the MDLT instruction combines the test and set operations but suppresses the set operation if the test succeeds.

Metadata Store and Set (MDSS)

The metadata store and set instruction (MDSS) takes two arguments: the metadata address with which the metadata is associated, and a register (source operand) from which the byte, word, dword, qword, or other size of metadata containing the bit is to be stored. The MDSS instruction sets the correct bit in the value from its source operand.

Metadata Store and Reset Instruction (MDSR)

The MDSR instruction takes two source arguments: the data address with which the metadata is associated, as a source operand, and a register (source operand) from which the byte, word, dword, qword, or other size of metadata containing the bit is to be reset. The MDSR instruction resets the correct bit in the value from its source operand.

A metadata address is determined from the referenced data address. Examples of determining a metadata address are given above in the sections on metaphysical address translation and multiple metaphysical address spaces. Note, however, that the translation may incorporate, i.e. be based on, a compression ratio of data to metadata, so that metadata for each compression ratio is stored separately.
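The interplay of the three instructions above can be illustrated with a small Python model. This is a sketch under stated assumptions, not the hardware's behavior: the class `ToyMetadata`, the packing of one metadata bit per eight data bytes, and the byte-wide metadata word are all hypothetical choices made for illustration.

```python
class ToyMetadata:
    """Hypothetical model: one metadata bit per 8 data bytes, packed 8 bits per byte."""

    def __init__(self):
        self.md_bytes = {}  # metadata byte address -> byte value (0..255)

    @staticmethod
    def locate(data_addr):
        """Map a data address to (metadata byte address, bit position)."""
        bit_index = data_addr // 8  # assumed 64x compression
        return bit_index // 8, bit_index % 8

    def mdlt(self, data_addr):
        """Load the metadata byte holding the bit and report whether the bit is set."""
        md_addr, bit = self.locate(data_addr)
        reg = self.md_bytes.get(md_addr, 0)
        return reg, bool((reg >> bit) & 1)

    def mdss(self, data_addr, reg):
        """Set the bit for data_addr in reg and store the result."""
        md_addr, bit = self.locate(data_addr)
        self.md_bytes[md_addr] = reg | (1 << bit)

    def mdsr(self, data_addr, reg):
        """Reset the bit for data_addr in reg and store the result."""
        md_addr, bit = self.locate(data_addr)
        self.md_bytes[md_addr] = reg & ~(1 << bit) & 0xFF


md = ToyMetadata()
reg, was_set = md.mdlt(0x1000)  # load and test the bit for address 0x1000
if not was_set:
    md.mdss(0x1000, reg)        # reg is passed through without being inspected
```

In this model, software treats the register returned by `mdlt` as opaque and only hands it back to `mdss` or `mdsr` for the same address, mirroring the usage rule stated for MDLT.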

Test Metadata (CMDT)

Table B: Illustrative embodiment of a test metadata operation

The CMDT instruction converts the memory data address to a memory metadata address using a compressed mapping function, which is implementation-dependent, and tests whether the metadata bit corresponding to the memory metadata address is set. As an example, the compression ratio CR is one bit for eight bytes. The metadata address calculation includes one of the context IDs from the MDCR register to guarantee a unique set of MD for each individual context ID, addressing MDBLK[CR][MDCR.MDID[MDID number]].META. The instruction aligns the address "mem" to the specified data size, thereby enforcing alignment. The instruction tests whether the metadata is set.

Exemplary pseudo-code for CMDT follows (the ZF flag is set to represent a zero metadata value; all other flags are cleared):
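The sketch below is not the listing from the specification; it is a minimal Python model of the CMDT behavior described above, written under stated assumptions. The dictionary `meta`, the helper `md_location`, and the `(MDID, bit index)` key layout are hypothetical, and the flags register is reduced to a dictionary holding only ZF.

```python
DATA_BYTES_PER_MD_BIT = 8  # compression ratio from the text: one bit per eight bytes


def md_location(mem, size, mdid):
    """Return a hypothetical (MDID, metadata bit index) key for a data address."""
    if size <= 0 or size & (size - 1):
        raise ValueError("size must be a power of two")
    aligned = mem & ~(size - 1)  # force alignment of "mem" to the operand size
    return mdid, aligned // DATA_BYTES_PER_MD_BIT


def cmdt(meta, mem, size, mdid):
    """Test the metadata bit; ZF=1 represents a zero metadata value."""
    bit = meta.get(md_location(mem, size, mdid), 0)
    return {"ZF": int(bit == 0)}  # all other flags are treated as cleared
```

Because the MDID is part of the key, the same data address tested under two different MDIDs reads two independent metadata bits.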

Compressed Metadata Store (CMDS)

Table C: Illustrative embodiment of a metadata store operation

The CMDS instruction converts the memory data address to a memory metadata address using a compressed mapping function, which is implementation-dependent. The compression ratio is one bit for 8 bytes of data. The imm8 value is encoded as follows: bit 0 → MD_Value, the value to store in MD; bits 7:1 → reserved, not used. Exemplary pseudo-code for CMDS is included below.
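As a rough illustration of the imm8 encoding (again a sketch, not the specification's listing), the following Python function stores bit 0 of imm8 as the metadata value. The dictionary `meta` and its `(MDID, bit index)` key are hypothetical, and ignoring the reserved bits is an assumption, since the text does not say how hardware treats them.

```python
def cmds(meta, mem, imm8, mdid):
    """Store MD_Value (imm8 bit 0) into the metadata bit covering mem."""
    md_value = imm8 & 0x01  # bit 0 -> MD_Value
    # Bits 7:1 are reserved; this sketch simply ignores them (an assumption).
    meta[(mdid, mem // 8)] = md_value  # one metadata bit per 8 data bytes
    return md_value
```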

Compressed Metadata Clear (CMDCLR)

Table D: Illustrative embodiment of the clear metadata operation

The CMDCLR instruction clears all MDBLK[CR][MDCR.MDID[MDID number]].META that correspond to any data in the range spanned by MBLK(mem). Exemplary pseudo-code for CMDCLR follows:
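The following Python sketch models the range-clearing behavior described above; it is not the specification's listing. The 64-byte `MBLK_SIZE`, the dictionary `meta`, and its `(MDID, bit index)` key are assumptions chosen for illustration.

```python
MBLK_SIZE = 64        # hypothetical memory block size in bytes
BYTES_PER_MD_BIT = 8  # one metadata bit per 8 data bytes


def cmdclr(meta, mem, mdid):
    """Clear every metadata bit of `mdid` that covers data inside MBLK(mem)."""
    base = mem - (mem % MBLK_SIZE)  # start of the block containing mem
    for addr in range(base, base + MBLK_SIZE, BYTES_PER_MD_BIT):
        meta.pop((mdid, addr // BYTES_PER_MD_BIT), None)
```

Metadata outside the block, or belonging to another MDID, is left untouched in this model.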

Next, in flow 510, a metadata address is determined from the data address referenced by the metadata access operation, based on a compression ratio, processing element ID, context ID, MDID, metaphysical value, operand size, and/or another value associated with metaphysical address space translation. Any of the methods described above, such as combining ID values without translating the data address, a normal translation of the data address, or a separate metaphysical address translation of the data address, may be used to obtain the appropriate metadata address.

In addition, in some examples, as discussed above, a version of the test, set, clear, or other instructions is provided to allow one thread or metadata context to test, set, or clear the metadata of other threads or metadata contexts. As a result, the translation to a metadata address may include a modification of the address, such as applying a mask, to allow one thread or context ID to access the metadata of another thread or context ID.
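One way to picture the mask-based modification is the Python sketch below. The address layout is entirely hypothetical: an 8-bit context ID field at bit 40 placed above the compressed data address. The text does not specify where the ID sits or how wide it is.

```python
CTX_SHIFT = 40                # hypothetical position of the context ID field
CTX_MASK = 0xFF << CTX_SHIFT  # hypothetical 8-bit context ID


def md_address(data_addr, ctx_id):
    """Combine a context ID with the compressed data address (assumed layout)."""
    return ((ctx_id << CTX_SHIFT) & CTX_MASK) | (data_addr >> 3)


def md_address_as(data_addr, own_ctx, target_ctx):
    """Translate as own_ctx, then mask in target_ctx to reach its metadata."""
    addr = md_address(data_addr, own_ctx)
    return (addr & ~CTX_MASK) | ((target_ctx << CTX_SHIFT) & CTX_MASK)
```

The sketch assumes the compressed data address fits below bit 40, so that the context field and the address bits never overlap.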

In flow 515, the metadata referenced by the metadata address is accessed. In the normal case, the separate location for the metadata associated with the local, requesting thread or context ID is accessed, and the appropriate operations, such as test, set, and clear, are performed. However, in the second case described above, metadata for other threads or context IDs may also be accessed in this flow.

Abstractions

One embodiment of abstractions for software is included herein. A given CR is a power of two that indicates how many data bits map to one bit of metadata. The implementation defines which CR values, if any, may be used. CR > 1 denotes compressed metadata. CR = 1 denotes uncompressed metadata.

MDBLK[CR][*]s are ceil(CR/8) bytes in size and are naturally aligned. MDBLKs are associated with physical data, not with their linear virtual addresses. All valid physical addresses A with the same value of floor(A/MDBLK[CR][*]_SIZE) designate the same sets of MDBLKs.

For a given CR, there may be any number of distinct MDIDs, each designating a unique instance of metadata. The metadata for a given CR and MDID is distinct from the metadata for any other CR or MDID. For example, for Thd#0, assuming addr is QWORD-aligned, the metadata block referenced by MDBLK[CR=64][MDID=3](addr) is the same as MDBLK[CR=64][MDID=3](addr+7), but is distinct from MDBLK[CR=64][MDID=4](addr) and from MDBLK[CR=512][MDID=3](addr).
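The block-size and block-identity rules can be checked with a few lines of Python. This is a minimal sketch: the function names and the tuple used to identify a block are hypothetical, while the ceil(CR/8) size and the floor(A/size) grouping come from the text.

```python
import math


def mdblk_size(cr):
    """Size in bytes of an MDBLK for compression ratio cr: ceil(CR/8)."""
    if cr < 1 or cr & (cr - 1):
        raise ValueError("CR must be a power of two")
    return math.ceil(cr / 8)


def mdblk(cr, mdid, addr):
    """Identify a metadata block by (CR, MDID, floor(addr / MDBLK size))."""
    return cr, mdid, addr // mdblk_size(cr)


addr = 0x1000  # QWORD-aligned physical address
same_block = mdblk(64, 3, addr) == mdblk(64, 3, addr + 7)
other_mdid = mdblk(64, 3, addr) != mdblk(64, 4, addr)
other_cr = mdblk(64, 3, addr) != mdblk(512, 3, addr)
```

All three comparisons evaluate to `True`, matching the Thd#0 example: addr and addr+7 share a block for CR=64, while a different MDID or CR selects a different block.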

A given implementation may support multiple concurrent contexts, where the number of contexts depends on the CR and on certain configuration information associated with the specific system of which the processor is a part. For uncompressed metadata, there is one QWORD of metadata for each QWORD of physical data.

Metadata is interpreted solely by software. Software may set, reset, or test META for a specific MDBLK[CR][MDID], reset META for all of the thread's MDBLK[*][*]s, or reset META for all of the thread's MDBLK[CR][MDID]s that may intersect a given MBLK(addr).

Metadata loss. Any of the thread's META properties may spontaneously reset to 0, generating a metadata loss event.

Forced Metadata Value

Referring to Figure 6, an embodiment of providing hardware support for a forced metadata value is illustrated. STMs typically ensure consistency between memory access operations by using access barriers. For example, before a memory access to a data item, a metadata location or lock location associated with the data item is checked to determine whether the data item is available. Other potential barrier operations include acquiring a lock, such as a read lock, write lock, or other lock on the data item, at the metadata or lock location; logging/storing a version of the data item in a read or write set for a transaction; determining whether the read set of a transaction is still valid up to that point; buffering or backing up a value of the data item; setting monitors; updating a filter value; and any other transactional operation.

Often, however, subsequent accesses to the same data item within a transaction incur the overhead of executing the associated transactional barrier each time the access is encountered. To illustrate, suppose three writes to address A are performed within a transaction; in this scenario the write barrier is executed three separate times to acquire a write lock for address A. However, the lock for address A was already acquired by executing the write barrier at the first transactional write, so the two subsequent executions of the write barrier before the last two transactional writes are superfluous: the lock on address A does not need to be reacquired.

Therefore, in one embodiment, hardware holds a filter value to accelerate execution associated with these barriers. The filter value may be held in a cache as an annotation bit, like the read and write monitors, or may be held at a metadata location in a metaphysical address space, as described above. Using the example above, upon first encountering the write barrier, a write filter value is updated from an unaccessed value to an accessed value to indicate that a write barrier for address A has already been encountered within the transaction. For the two subsequent transactional writes within the transaction, the write filter value is therefore checked before vectoring to the write barrier for address A. Here, the filter holds the accessed value, indicating that the write barrier does not need to be executed because it has already been executed within the transaction. As a result, execution is not vectored to the write barrier for at least the last two write operations. In other words, the filter value accelerates transactional execution: compared with the previous example without a filter, execution of the write barrier is elided for at least those two accesses.
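The three-writes example can be modeled in Python as follows. This is a software-only sketch with hypothetical names (`Transaction`, `write_filter`, `barrier_calls`); in the embodiment described, the filter value would be held by hardware, for instance as a metadata bit, rather than in a Python set.

```python
class Transaction:
    """Toy transaction that filters redundant write barriers per address."""

    def __init__(self):
        self.write_filter = set()  # addresses whose filter holds the accessed value
        self.locks = set()         # write locks acquired by the barrier
        self.memory = {}
        self.barrier_calls = 0

    def write_barrier(self, addr):
        """Acquire the write lock and mark the filter as accessed."""
        self.barrier_calls += 1
        self.locks.add(addr)
        self.write_filter.add(addr)

    def tx_write(self, addr, value):
        """Check the filter first; vector to the barrier only when it misses."""
        if addr not in self.write_filter:
            self.write_barrier(addr)
        self.memory[addr] = value


tx = Transaction()
A = 0x1000
for value in (1, 2, 3):
    tx.tx_write(A, value)  # the barrier runs for the first write only
```

After the loop, `tx.barrier_calls` is 1 rather than 3, which is the saving the filter is meant to provide.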

Note that read filters for load/read operations, undo filters for undo operations, and miscellaneous filters for generic filter operations may be used in the same manner as the write filter was used above for write/store operations.

Concepts also associated with transactional barriers are strong and weak atomicity, which deal with the isolation of transactional operations from non-transactional operations. Just as a transactional write to a transactionally loaded memory location represents a potential conflict, a transactional write to a non-transactionally loaded memory location represents a potential conflict that results in the non-transactional load operation using invalid data. In weak atomicity systems, no barriers, or only minimal barriers, are inserted at non-transactional operations, so weak atomicity systems are exposed to the risk of invalid execution. In contrast, in strong atomicity systems, transactional barriers are also inserted at non-transactional operations. This ensures protection and isolation between transactional and non-transactional operations, but at a cost: the cost of executing a transactional barrier at every non-transactional operation.

Therefore, in one embodiment, the filters described above may be leveraged in combination with strong atomicity barriers at non-transactional operations to support different modes of strong and weak atomicity operation. To illustrate, a simplified exemplary embodiment is shown in Figure 6. Here, metadata 610 is held in hardware for data 605, as discussed above. A metadata access 600 is received to access metadata 610. In one embodiment, the metadata access includes a test metadata operation to test a filter, such as a read filter, write filter, undo filter, or miscellaneous filter.

A test metadata operation to test a filter may originate from a transactional or a non-transactional access operation. In one embodiment, a compiler, when compiling application code, inserts the test filter operation inline in the application code as a condition for executing a call to a transactional barrier at transactional and non-transactional accesses. Within a transaction, therefore, the filter operation is executed before a call to a barrier, and if it returns success, the call to the transactional barrier is not executed, yielding the acceleration discussed above.

For non-transactional operations as well, in one embodiment the hardware is capable of operating in a weak atomicity mode, in which transactional barriers are not executed at non-transactional operations, and in a strong atomicity mode, in which transactional barriers are executed.

The operating mode, or control 625, may be set in metadata control register (MDCR) 615, which may be combined with the version of the MDCR described above for holding MDIDs, or may be a separate control register. In another embodiment, control 625 for the operating mode may be held in a general transactional control register or status register. Here, a first execution mode includes a strong atomicity mode, in which transactional barriers are to be executed at non-transactional operations. In this case, control 625 represents a first value, such as 00, to indicate a strong atomicity, non-transactional operating mode. In response, logic 620, which is illustrated as an exemplary multiplexer, selects the metadata value from metadata 610, which is held by hardware and associated with data address A, to be provided to destination register 650 for metadata access 600. In essence, in a strong atomicity mode, barriers are accelerated based on the actual metadata held by hardware. Alternatively, during a second execution mode, such as a non-transactional mode with weak atomicity, a fixed or forced value from the MDCR is provided to destination register 650 in response to metadata access 600, instead of the metadata 610 held by hardware, as indicated by control 625 representing a second value, such as 01.

In essence, in a weak atomicity mode, a forced value is provided to destination register 650 in response to a test filter operation 600, to ensure that the test of the filter value always succeeds and that the call to the transactional barrier is not executed before the non-transactional memory access. Note that this description assumes that the test filter operation returns a Boolean value indicating whether the filter test succeeds (the barrier is not to be executed) or fails (the barrier is to be executed). As a result, the same filter software construct that accelerates transactions by skipping barriers based on the filter value is leveraged to provide two operating modes: a weak atomicity mode, in which all barriers at non-transactional operations are skipped, and a strong atomicity mode, in which barriers at non-transactional operations are executed or accelerated based on metadata maintained by hardware. In another embodiment, different forced values may be provided for each mode. Here, in a strong atomicity mode the forced value would ensure that the test filter operation fails, so that the barrier is always executed, while in a weak atomicity mode the forced value would ensure that the test filter operation succeeds, so that the barrier is not executed.
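
The selection between hardware-maintained metadata and a forced value can be sketched roughly as follows. This is a minimal illustration, not the claimed hardware: the names `Mdcr`, `metadata_access`, and `non_transactional_store` are hypothetical, the encodings 00 and 01 are only the example values given above, and the convention that a nonzero metadata value means "filter test succeeds" is an assumption.

```python
from dataclasses import dataclass

# Example encodings for control 625; the text offers 00 and 01 only as examples.
STRONG_ATOMICITY = 0b00
WEAK_ATOMICITY = 0b01


@dataclass
class Mdcr:
    """Illustrative model of the metadata control register (MDCR 615)."""
    control: int        # operating-mode control (625)
    forced_value: int   # fixed value delivered in weak atomicity mode


def metadata_access(mdcr, hw_metadata, address):
    """Model of logic 620: choose the value delivered to the destination register."""
    if mdcr.control == WEAK_ATOMICITY:
        return mdcr.forced_value           # hardware-held metadata is ignored
    return hw_metadata.get(address, 0)     # strong atomicity: real metadata


def non_transactional_store(mdcr, hw_metadata, memory, address, value, barrier):
    """Compiler-style sequence: test the filter, call the barrier only on failure."""
    # Assumption: a nonzero value means the filter test succeeds (skip the barrier).
    filter_passed = metadata_access(mdcr, hw_metadata, address) != 0
    if not filter_passed:
        barrier(address)
    memory[address] = value
    return filter_passed
```

With `forced_value` set to 1 and the control set to `WEAK_ATOMICITY`, the barrier callback is never invoked, whatever the hardware metadata holds. Under `STRONG_ATOMICITY` the same code path consults the per-address metadata.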

Although providing a forced or fixed value from a control register, such as MDCR 615, based on control information, such as control 625, has been described in connection with selecting between a fixed/forced value and a metadata value based on an operating mode, providing a forced or fixed value may be used for any generic use of metadata, such as enabling data-invariant behavior for debugging and for generic monitoring of memory accesses, which can be activated on demand.

Referring to FIG. 7, an embodiment of a flow diagram for accelerating non-transactional operations while maintaining atomicity in a transactional environment is shown. In flow 705, a metadata (MD) access operation referencing a data address is encountered. As a specific illustrative example, the MD access operation includes a test operation previously inserted by a compiler in-line with application code, to skip a transactional barrier at a non-transactional memory access if the test returns one value (success) and to execute the barrier if the test returns a second value (failure). However, a test MD operation is not so limited, as it may include any test operation that returns a Boolean success or failure value.

In flow 710, an operating mode is determined. Here, examples of an operating mode include transactional or non-transactional, in combination with strong atomicity or weak atomicity. Therefore, one register or two separate registers may hold a first bit to indicate a transactional or non-transactional operating mode and a second bit to indicate a strong or weak atomicity operating mode.

If the operating mode is transactional or non-transactional and has strong atomicity, the hardware-maintained metadata value is provided to the metadata access operation; that is, the hardware-maintained value is placed in a destination register specified by the MD access operation. In contrast, if the operating mode is non-transactional and has weak atomicity, the forced, fixed MDCR value is provided to the MD access operation instead of the hardware-maintained MD value. Consequently, during strong atomicity mode, barriers are accelerated, or not, based on a hardware-maintained MD value, while in weak atomicity mode, barriers are accelerated based on the forced MDCR value.

Efficient transition to a buffered and monitored state

Referring next to FIG. 8, an embodiment of a flow diagram for a method of efficiently transitioning a block of data to a buffered and monitored state before commit of a transaction is illustrated. As described above, blocks of memory, such as a cache line holding a data item or metadata, may be buffered and/or monitored. For example, coherency bits for a cache line include a representation of a buffered state, and attribute bits for a cache line indicate whether the cache line is unmonitored, read monitored, or write monitored.

In some embodiments, a cache line is buffered but not monitored, which means that the data held in the cache line is lossy and that conflicts regarding the cache line are not detected, since no monitoring is applied to it. For example, data that is local to a transaction and does not need to be committed, such as metadata, may be held in a buffered and unmonitored state.

When conflicts between buffered data and writes to the same address are to be detected, read monitoring is applied to the data. The cache line is then transitioned to a buffered and read-monitored state. However, to reach this state, a read request is sent to external processing elements, which forces all other copies to transition to a shared state. These external read requests may result in a conflict with another processing element that maintains a write monitor on the same block/cache line.

Similarly, write monitoring is applied to the cache line when conflicts between the buffered data and reads of the same memory block are to be detected. The line is then transitioned to a buffered and write-monitored state, which is reached by sending a read-for-ownership request to other processing elements, forcing all other copies to transition to an invalid state. Likewise, a conflict is detected at any processing element that maintains either a read or a write monitor on the same memory block.

However, to minimize transactional conflicts, a block of memory that the transaction updates but may not need to commit can be maintained in the buffered but unmonitored state, as described above. If it is later determined that a block held in the buffered but unmonitored state must be committed, then according to one embodiment an efficient path from the buffered and unmonitored state to a committable state is provided, as shown in FIG. 8.

As an example, a buffered update to a block of memory (a cache line holding the block) is received in flow 805. Either before the buffered update or concurrently with it, read monitoring is applied to the block. For example, a read attribute for the cache line is set to a read-monitored value to indicate that the block is read monitored. However, to apply the read monitoring, a read request is first sent to other processing elements in flow 815. In response to receiving the read request, the other processing elements either detect a conflict, because they hold the line in a write-monitored state, or transition their copies to a shared state in flow 820. In flow 825, if there are no conflicts, the cache line is transitioned to a buffered and read-monitored state: the cache line coherency bits are updated to a buffered coherency state and the read monitor attribute is set.

In flow 830, conflicting writes to the cache line are detected based on the read monitoring. In one embodiment, the read attributes are coupled to snoop logic, such that an external read-for-ownership request to the cache line detects a conflict with the read monitor set on the cache line.

Later, when the block is to be committed as part of the state of a transaction in flow 835, write monitoring is applied in flow 840. Here, a read-for-ownership request is sent to the other processing elements in flow 845, each of which either detects a conflict, in response to holding the cache line in a read-monitored or write-monitored state, or transitions its copy to an invalid state in flow 850. As a result, detecting conflicts on the read-for-ownership request allows any conflicts to be detected at this point, essentially placing the line in a committable state.

Consequently, transitioning the buffered and unmonitored block to a committable state in two stages, flow 810 and flow 840, is potentially advantageous. Delaying the acquisition of ownership through the staged acquisition of read and write monitors allows multiple concurrent transactions to update the same block while reducing conflicts between those transactions. If a transaction does not reach the commit stage for some reason, its having updated the block in a buffered and read-monitored manner does not cause another transaction that does reach the commit stage to abort unnecessarily. In addition, delaying the acquisition of exclusive ownership of the block until the commit stage is a way to achieve higher concurrency between threads without sacrificing the validity of data.
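
The benefit of the two-stage acquisition can be illustrated with a small model. This is a sketch under simplifying assumptions, not the described hardware: coherency states are omitted, only monitors and detected conflicts are tracked, and the names `ProcessingElement`, `buffered_update`, and `prepare_commit` are hypothetical.

```python
class ProcessingElement:
    """Tracks which lines this element monitors (illustrative model only)."""

    def __init__(self, name):
        self.name = name
        self.read_monitored = set()
        self.write_monitored = set()
        self.conflicts = set()   # lines on which a snoop hit one of our monitors

    def snoop_read(self, line):
        # An external read request conflicts only with a write monitor.
        if line in self.write_monitored:
            self.conflicts.add(line)

    def snoop_rfo(self, line):
        # An external read-for-ownership request conflicts with any monitor.
        if line in self.read_monitored or line in self.write_monitored:
            self.conflicts.add(line)

    def clear(self):
        # Models an abort: drop all monitors and recorded conflicts.
        self.read_monitored.clear()
        self.write_monitored.clear()
        self.conflicts.clear()


def buffered_update(pe, others, line):
    """Stage 1: send a read request, then hold the line read monitored."""
    for other in others:
        other.snoop_read(line)
    pe.read_monitored.add(line)


def prepare_commit(pe, others, line):
    """Stage 2: send a read-for-ownership request, then hold the line write monitored."""
    for other in others:
        other.snoop_rfo(line)
    pe.write_monitored.add(line)
```

In this model, two elements can both perform `buffered_update` on the same line without either recording a conflict. A conflict appears only when one of them calls `prepare_commit`, and if the other has already aborted (`clear`), no conflict is recorded at all.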

Table E below illustrates an embodiment of conflicting states between two processing elements, P0 and P1. For example, a line held by P1 in a buffered read-monitored state, denoted by the R-B column, conflicts with any state of P0 in which the cache line is held with a write monitor, as indicated by -W, RW-, WB, RWB, and as represented by the x in the intersecting cells.

Table E: an embodiment of conflicting states between two processing elements

In addition, Table F below shows the loss of associated properties in processing element P1 in response to the operation listed under P0. For example, if P1 holds a line in a buffered read-monitored state, as indicated by column R-B, and either a store or a set-write-monitor operation occurs on P0, P1 loses both the read monitor and the buffering of the line, as indicated by the x-x at the intersection of the Store/Set WM row and the R-B column.

Table F: an embodiment of a loss of attributes as a result of an operation

Branch instruction (JLOSS) for a conflict or loss of transactional data

Referring to FIG. 9, an embodiment of hardware to support a loss instruction that jumps to a destination label based on a status value in a transaction status register is shown. In one embodiment, hardware provides an accelerated way to check the consistency of a transaction. For example, hardware may support consistency checking by providing a mechanism that tracks the loss of monitored or buffered data from a cache (the eviction of buffered or monitored lines), or that tracks potentially conflicting accesses to such data (monitors that detect conflicting snoops, such as a read-for-ownership request to a monitored line).

In addition, in one embodiment hardware provides architectural interfaces that allow software to access these mechanisms based on the status of monitored or buffered data. Two such interfaces include the following: (1) instructions to read or write a status register, which allow software to poll the register explicitly during execution; and (2) an interface that allows software to set up a handler that is invoked whenever the status register indicates a potential loss of consistency.

In another embodiment, hardware supports a new instruction, referred to as JLOSS, that performs a conditional branch based on the status of hardware-monitored or buffered data. The JLOSS instruction branches to a label if the hardware detects a potential loss of any monitored or buffered data from the cache, or detects potential conflicts for any such data. A label includes any destination, such as the address of a handler or other code to be executed as a result of a loss of data or detection of a conflict.

As an illustrative embodiment, FIG. 9 shows decoders 910, which recognize JLOSS as part of a processor ISA and decode the instruction to allow logic of the processor to perform the conditional branch based on the status of a transaction. As an example, the status of a transaction is held in a transaction status register (TSR) 915. The transaction status register may represent the status of transactions, such as when hardware detects a conflict or a loss of data, which is referred to herein as a loss event. To illustrate, a conflict flag in TSR 915 is set when a monitor indicates that an address is monitored and a snoop to the monitored address occurs; the conflict flag in TSR 915 then indicates that a conflict was detected. Similarly, a loss flag is set on a loss of data, such as an eviction of a line that holds transactional data or metadata.

Therefore, when JLOSS is encountered and executed, it tests the status register flags, and if a loss event (a loss and/or a conflict) has occurred, logic 925 provides the label referenced by JLOSS to execution resources 930 as the jump destination address. As a result, a single instruction enables software to determine the status of a transaction and, based on that status, to vector execution to a label specified by that same instruction. Because JLOSS checks consistency, reporting false conflicts is acceptable: JLOSS may conservatively report that a conflict has occurred.

In one embodiment, software, such as a compiler, inserts JLOSS instructions into program code to poll for consistency. Although JLOSS may be used in-line with main application code, JLOSS instructions are often used within read and write barriers, which are frequently provided in libraries, to determine consistency on demand. Therefore, execution of program code may include a compiler inserting JLOSS into code, execution of JLOSS from program code, or any other form of inserting or executing an instruction. Polling with JLOSS is expected to be much faster than an explicit read of the status register, since the JLOSS instruction requires no additional registers: no destination register is needed to receive the status information, as it would be for an explicit read. There are several embodiments of this instruction, in which the conditions to check for consistency are provided either explicitly in the instruction or implicitly in a separate control register.

For example, transaction status register 915 or another storage element holds specific conflict and loss status information, such as whether a read-monitored location has been written by another agent (a read conflict), a write-monitored location has been read or written by another agent (a write conflict), a loss of physical transactional data has occurred, or a loss of metadata has occurred. Consequently, different versions of the JLOSS instruction may be used. For example, a JLOSS.rm <label> instruction branches to its label if any read-monitored location has been written by another agent. A hardware-accelerated STM (HASTM) can use the JLOSS.rm instruction to accelerate consistency checking, that is, to quickly check for conflicting updates to a read set with JLOSS.rm whenever read set consistency must be ensured, such as after every transactional load in a native-code TM system. In this case, a read set may be validated using JLOSS in a read barrier, so that the JLOSS instruction is inserted either in the barrier within a library or after the load operation in-line with main application code. Similar to the JLOSS instruction for detecting writes to read-monitored locations, a JLOSS.wm instruction may be used to detect any writes or reads to write-monitored locations. As yet another example, in a processor that can buffer locations, a JLOSS.buf instruction may be used to determine whether buffered data has been lost and, if so, to jump to a specified label.
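
The branch behavior of the JLOSS variants can be modeled in a few lines. The sketch below is illustrative only: the flag layout of `Tsr`, the `JLOSS_MASKS` table, and the `jloss` function are hypothetical, and the choice that the plain form tests every flag, including metadata loss, is an assumption.

```python
from enum import Flag


class Tsr(Flag):
    """Hypothetical flag layout for the transaction status register (TSR 915)."""
    NONE = 0
    READ_CONFLICT = 1    # read-monitored location written by another agent
    WRITE_CONFLICT = 2   # write-monitored location read or written by another agent
    BUFFER_LOSS = 4      # buffered data lost
    METADATA_LOSS = 8    # metadata lost


# Conditions tested by each variant; the plain form is assumed to test every flag.
JLOSS_MASKS = {
    "jloss": (Tsr.READ_CONFLICT | Tsr.WRITE_CONFLICT
              | Tsr.BUFFER_LOSS | Tsr.METADATA_LOSS),
    "jloss.rm": Tsr.READ_CONFLICT,
    "jloss.wm": Tsr.WRITE_CONFLICT,
    "jloss.buf": Tsr.BUFFER_LOSS,
}


def jloss(variant, tsr, label, fallthrough):
    """Return the next instruction address: the label on a tested event, else fall through."""
    if tsr & JLOSS_MASKS[variant]:
        return label
    return fallthrough
```

For instance, `jloss("jloss.rm", Tsr.WRITE_CONFLICT, label, next_pc)` falls through to `next_pc`, because the `.rm` form only tests the read-conflict flag. No destination register is involved: the status flags are consumed directly by the branch decision.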

The following pseudocode, labeled Pseudocode A, shows a native-code STM read barrier that provides a consistent read set and uses JLOSS. The setrm(void* address) function sets the read monitor on the given address, and the JLOSS_rm() function is an intrinsic for the JLOSS instruction that returns true if conflicting accesses to read-monitored locations have occurred. This pseudocode monitors the loaded data, although it is also possible to monitor the transaction records (ownership records) instead. It is possible to use an instruction that combines setting the read monitor with loading the data, e.g. a movxm instruction that both loads and monitors the data. It is likewise possible to use this in a read barrier that performs filtering in addition to monitoring, as well as in an STM system that uses only hardware monitoring for read-set validation, that is, an STM system that performs no software read logging and no software validation.

Pseudocode A: an in-place update STM, optimistic read, native-code read barrier
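As a rough, non-authoritative illustration of a read barrier of this kind (not the Pseudocode A listing itself), the following Python sketch models the described steps in software: set a read monitor, load the value, then perform a JLOSS.rm-style check. The class `MonitorModel`, its methods `setrm`, `jloss_rm`, and `remote_write`, and the `TxConflict` exception are hypothetical names for a toy model and are not a real hardware interface.

```python
class TxConflict(Exception):
    """Raised where a real barrier would branch to its abort/retry label."""


class MonitorModel:
    """Toy software model of read monitors and a read-conflict status bit."""

    def __init__(self, memory):
        self.memory = memory          # address -> value
        self.read_monitored = set()   # addresses with a read monitor set
        self.loss_rm = False          # assumed stand-in for a read-conflict flag

    def setrm(self, address):
        """Model of setrm(void* address): set a read monitor."""
        self.read_monitored.add(address)

    def jloss_rm(self):
        """Model of the JLOSS_rm() intrinsic: true once a read conflict occurred."""
        return self.loss_rm

    def remote_write(self, address, value):
        """Another agent writes; a write to a monitored address is a conflict."""
        self.memory[address] = value
        if address in self.read_monitored:
            self.loss_rm = True


def read_barrier(hw, address):
    """Monitor the data, load it, then validate the whole read set."""
    hw.setrm(address)
    value = hw.memory[address]
    if hw.jloss_rm():
        raise TxConflict("read set is no longer consistent")
    return value
```

In this model a single check after each load covers every previously monitored location, which is the property that lets the barrier avoid walking a software read log.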

Similarly, an STM system that does not maintain read-set consistency, such as an STM for managed code, can avoid infinite loops or other incorrect control flow (for example, exceptions) caused by inconsistency by inserting a JLOSS instruction at loop back edges or at other critical control flow points, such as instructions that may raise exceptions.

The following pseudocode, labeled Pseudocode B, shows another native-code read barrier that ensures consistency. The TM system in this version uses cache-resident write sets, using buffered updates for writes within transactions. Reading from a location that was previously buffered and then lost causes an inconsistency, so to maintain consistency this read barrier avoids reading from any lost buffered location. The COMMIT_LOCKING flag is true if the STM uses commit-time locking for buffered locations. The JLOSS_buf() check is used for reads from a previously locked location if commit-time locking is not used. Otherwise, it is used on all reads.

Pseudocode B: an in-place update, native-code STM read barrier
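The rule for when the buffered-loss check applies can be sketched as follows. This is a minimal software model under stated assumptions, not the Pseudocode B listing: `BufferModel`, `evict`, `jloss_buf`, `TxAbort`, and the `previously_locked` parameter are hypothetical names, and the choice to load first and check before returning is one possible ordering.

```python
class TxAbort(Exception):
    """Stands in for the branch to the label named by JLOSS.buf."""


class BufferModel:
    """Toy model of cache-resident buffered writes that can be lost."""

    def __init__(self, memory):
        self.memory = dict(memory)   # globally visible values
        self.buffered = {}           # address -> transactional value
        self.loss_buf = False        # assumed stand-in for a buffer-loss flag

    def buffered_write(self, address, value):
        self.buffered[address] = value

    def evict(self, address):
        """Losing a buffered line discards its value and records the loss."""
        if address in self.buffered:
            del self.buffered[address]
            self.loss_buf = True

    def load(self, address):
        """Return the buffered value if present, else the visible value."""
        return self.buffered.get(address, self.memory[address])

    def jloss_buf(self):
        return self.loss_buf


def read_barrier(hw, address, commit_locking, previously_locked):
    """Return a value only if no required buffered data has been lost."""
    # With commit-time locking every read is checked; without it, only
    # reads from previously locked locations are checked.
    must_check = commit_locking or previously_locked
    value = hw.load(address)
    if must_check and hw.jloss_buf():
        raise TxAbort("buffered data was lost")
    return value
```

After an eviction the unchecked load silently returns the stale, globally visible value, which is the inconsistency the check is meant to catch.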

TM systems may combine read monitoring with buffering and write monitoring, as discussed above, and may therefore also need to check for conflicts on either monitored or buffered lines in order to maintain consistency. To accommodate such systems, various embodiments may also provide JLOSS variants that branch on logical combinations of different monitoring and buffering events, such as JLOSS.rm.buf (conflicts on read-monitored or buffered locations), JLOSS.rm.wm (conflicts on read-monitored or write-monitored locations), or JLOSS.* (a conflict on a read-monitored, write-monitored, or buffered location).

In another embodiment, the architectural interface decouples the JLOSS instruction from the conditions under which it branches by allowing software to set the conditions (a conflict on read-monitored or write-monitored lines, or on buffered lines) in a separate control register. In this embodiment, only a single JLOSS instruction is needed, and it can support future extensions to the set of events on which JLOSS should branch.
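Both styles, one instruction per event combination and a single instruction driven by a control register, reduce to testing a status value against a condition mask. The Python sketch below illustrates this under assumptions: the `Loss` flag type, the `VARIANTS` table, and the `Core` class with its `status` and `control` fields are hypothetical and do not describe a real register layout.

```python
from enum import Flag, auto


class Loss(Flag):
    """Events a JLOSS variant can test for (bit assignment is assumed)."""
    RM = auto()    # conflict on a read-monitored location
    WM = auto()    # conflict on a write-monitored location
    BUF = auto()   # loss of buffered data


# Variant names from the text mapped to the events they branch on.
VARIANTS = {
    "JLOSS.rm": Loss.RM,
    "JLOSS.wm": Loss.WM,
    "JLOSS.buf": Loss.BUF,
    "JLOSS.rm.buf": Loss.RM | Loss.BUF,
    "JLOSS.rm.wm": Loss.RM | Loss.WM,
    "JLOSS.*": Loss.RM | Loss.WM | Loss.BUF,
}


def jloss_taken(status, conditions):
    """True if any selected event is recorded in the status value."""
    return bool(status & conditions)


class Core:
    """Control-register style: software picks the conditions once."""

    def __init__(self):
        self.status = Loss(0)    # recorded conflict/loss events
        self.control = Loss(0)   # events the single JLOSS should branch on

    def record(self, event):
        self.status |= event

    def jloss(self):
        return jloss_taken(self.status, self.control)
```

With the control-register form, supporting a new event type only requires a new bit in `Loss` and in the control value, not a new instruction variant.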

Referring to FIG. 10, an embodiment of a flow diagram for a method of executing a loss instruction to jump to a target label based on a conflict or loss of specific information is shown. In one embodiment, a JLOSS instruction is received in flow 1005. As discussed above, the JLOSS instruction may be inserted by a programmer or a compiler either in main code, such as after a load operation to ensure read-set consistency, or within a barrier, such as a read or write barrier. In one embodiment, the JLOSS instruction and its variants discussed above are recognizable as part of the processor's ISA. Here, decoders are able to decode the opcodes for the JLOSS instruction.

In flow 1010, it is determined whether a conflict or a loss of information has occurred. In one embodiment, the type of conflict or loss depends on the variant of the JLOSS instruction. For example, if the received JLOSS instruction is a JLOSS.rm instruction, it is determined whether a read-monitored line has been conflictingly accessed by an external write. However, as described above, any variant of JLOSS may be received, including a JLOSS instruction that allows the user to specify conditions in a control register.

Therefore, once the conditions are established, either from the control register or from the type of JLOSS instruction, it is determined whether those conditions are satisfied. As a first example, information in a transaction status register, such as TSR 915, is used to determine whether the conditions are satisfied. Here, TSR 915 may include a read-monitor status flag that is set by default to a no-conflict value and is updated to a conflict value to indicate that a conflict has occurred. However, a status register is not the only way to determine whether a conflict has occurred; in fact, any known method of determining a loss or conflict may be used.

In response to no conflict being detected, such as when a read-monitor conflict flag in TSR 915 is still set to its default value, a false value is returned in flow 1025 and execution continues normally. However, if a conflict or loss is detected, such as by the read-monitor conflict flag being set, JLOSS returns true in flow 1015 and directs execution to jump, in flow 1020, to the label specified by the received JLOSS instruction.

Hardware support for transactional memory commit

As discussed above, hardware-supported transactions can accelerate software version management by buffering transactional writes in the cache without making them globally visible. In this case, a simple commit instruction may be used that makes the buffered values visible to all processors but fails if any buffered lines have been lost. However, the ability of hardware to also hold metadata that software can use for acceleration, such as a filter to eliminate redundant barriers, may require a commit instruction to fail if hardware has detected any conflicts. In addition, on commit it may be desirable to clear different combinations of the information held in hardware for a transaction, such as metadata, monitors, and buffered lines.

Therefore, in one embodiment, hardware supports multiple forms of a commit instruction, so that the commit instruction can specify both the conditions for commit and the information to be cleared on commit. Referring to FIG. 11, an embodiment of a general case is shown in which hardware supports the definition of commit conditions and clear controls in a commit instruction.

As illustrated, a commit instruction 1105 includes an opcode 1110 that is recognizable as part of a processor's ISA; decoders 1115 are able to decode opcode 1110. In the illustrated example, opcode 1110 includes two portions: commit conditions 1111 and a clear control 1112. Commit conditions 1111 specify the conditions for committing a transaction, while commit clear control 1112 specifies the information to be cleared when a transaction is committed.

In one embodiment, both portions include four values: read monitoring (RM), write monitoring (WM), buffering (Buf), and metadata (MD). Essentially, if any of the four values in portion 1111 is set, that is, holds a value indicating that the associated attribute/property is a commit condition, then the corresponding property is a condition for commit. In other words, if the first bit of conditions 1111, corresponding to read-monitor information, is set, then the loss of any read-monitor data from monitors 1135 associated with the transaction results in an abort (no commit), because a specified condition of the commit instruction has failed. Similarly, if a value in 1112 is set, the corresponding property is cleared on commit. Continuing the example, if RM is set in portion 1112, the read-monitor information in monitors 1135 for the transaction is cleared when the transaction is committed. In this example there are therefore four possible commit conditions and four possible clear controls, which yields 256 possible combinations (16 × 16) as variants of a commit instruction. In one embodiment, because the commit conditions can be specified in the opcode, the hardware is able to support all of the variants. Nevertheless, a few variants are discussed below to further illustrate the different kinds of commit instructions and how they may be used.
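The count of 256 follows from two independent 4-bit fields. The sketch below checks it by enumeration, under an assumed packing that puts the commit conditions in the high nibble and the clear controls in the low nibble, with bits ordered RM, WM, Buf, MD. That layout and the function names `encode`, `decode`, and `count_variants` are illustrative assumptions only; the text does not specify an encoding.

```python
from itertools import combinations

ATTRS = ("RM", "WM", "BUF", "MD")  # bit 0 .. bit 3 (ordering assumed)


def _nibble(names):
    """Pack a set of attribute names into a 4-bit value."""
    return sum(1 << ATTRS.index(name) for name in set(names))


def _names(nibble):
    """Unpack a 4-bit value into the set of attribute names it selects."""
    return {name for bit, name in enumerate(ATTRS) if (nibble >> bit) & 1}


def encode(conditions, clears):
    """Hypothetical layout: conditions in the high nibble, clears in the low."""
    return (_nibble(conditions) << 4) | _nibble(clears)


def decode(field):
    """Return (commit conditions, clear controls) for an 8-bit field."""
    return _names(field >> 4), _names(field & 0xF)


def all_subsets():
    """Yield every subset of the four attributes, including the empty set."""
    for size in range(len(ATTRS) + 1):
        for combo in combinations(ATTRS, size):
            yield set(combo)


def count_variants():
    """Count distinct (conditions, clears) encodings by enumeration."""
    return len({encode(c, k) for c in all_subsets() for k in all_subsets()})
```

For example, `encode({"WM"}, {"BUF"})` describes a commit that requires no loss of write monitors and clears buffered state, and `count_variants()` evaluates to 256.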

TXCOMWM

As a first example, a Txcomwm instruction is discussed. This instruction ends the transaction and makes all write-monitored buffered data globally visible if no write-monitored data has been lost (success); otherwise, if write-monitored data has been lost, it fails. Txcomwm sets (or clears) a flag to indicate success (or failure). On success, Txcomwm clears the buffered state of all write-monitored data. Txcomwm does not affect any read-monitored or write-monitored state, which allows software to reuse that state in subsequent transactions. It also does not affect the state of locations that are buffered but not write-monitored, which allows software to persist information held in such locations. The pseudocode labeled Pseudocode C below illustrates an algorithmic description of Txcomwm. If TSR.LOSS_WM is 0, the BF property of all write-monitored buffered BBLKs is atomically cleared and all such buffered data becomes visible to other agents. TCR.IN_TX is cleared. Buffered blocks that do not have WM are not affected and remain buffered. The CF flag is set on completion. If TSR.LOSS_WM is 1, the CF flag is cleared and TCR.IN_TX is cleared. The CF flag is set to 1 if the operation succeeded and cleared to 0 on failure. The OF, SF, ZF, AF, and PF flags are set to 0.

Pseudocode C: an embodiment of an algorithm for a Txcomwm operation
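The behavior described for Txcomwm can be modeled in a few lines. The following Python sketch is a single-threaded illustration under assumptions, not the Pseudocode C listing: the `Block` and `Core` classes and their field names are hypothetical, the OF, SF, ZF, AF, and PF flags are not modeled, and buffered data is simply left untouched on failure because the text does not say what happens to it.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    visible: int          # value seen by other agents
    pending: int = 0      # buffered transactional value
    bf: bool = False      # buffered (BF) property
    wm: bool = False      # write-monitor (WM) property
    rm: bool = False      # read-monitor (RM) property


@dataclass
class Core:
    blocks: dict = field(default_factory=dict)
    loss_wm: bool = False   # models TSR.LOSS_WM
    in_tx: bool = True      # models TCR.IN_TX
    cf: int = 0             # models the CF flag


def txcomwm(core):
    """Commit if no write monitor was lost; return the resulting CF value."""
    if not core.loss_wm:
        # Single-threaded model, so this loop stands in for the atomic step.
        for blk in core.blocks.values():
            if blk.bf and blk.wm:
                blk.visible = blk.pending   # data becomes globally visible
                blk.bf = False              # BF cleared; RM/WM left intact
        core.cf = 1
    else:
        core.cf = 0
    core.in_tx = False   # TCR.IN_TX is cleared on success and on failure
    return core.cf
```

Blocks that are buffered but not write-monitored keep both their BF property and their pending value, matching the statement that such locations remain buffered.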

The pseudocode labeled Pseudocode D below shows how a HASTM system can use the Txcomwm instruction to commit a transaction that uses hardware write buffering to avoid undo logging in an in-place update STM. The CACHE-RESIDENT-WRITES flag indicates this execution mode.

Pseudocode D: an embodiment of pseudocode for using a TXCOMWM instruction

TXCOMWMRM

In one variant, Txcomwmrm extends the Txcomwm instruction so that it also fails if any read-monitored locations have been lost. This variant is useful for transactions that use only hardware to detect read-set conflicts. The pseudocode labeled Pseudocode E below illustrates an algorithmic description of Txcomwmrm. If TSR.LOSS_WM and TSR.LOSS_RM are 0, the BF property of all write-monitored buffered BBLKs is atomically cleared and all such buffered data becomes visible to other agents. TCR.IN_TX is cleared. Buffered blocks that do not have WM are not affected and remain buffered. The CF flag is set on completion. If TSR.LOSS_WM or TSR.LOSS_RM is 1, the CF flag is cleared and TCR.IN_TX is cleared. The CF flag is set to 1 if the operation succeeded and cleared to 0 on failure. The OF, SF, ZF, AF, and PF flags are set to 0.

Pseudocode E: an embodiment of an algorithmic description of Txcomwmrm

The next pseudocode, Pseudocode F, shows the commit algorithm using the Txcomwmrm instruction for an STM system that uses hardware both to buffer transactional writes and to detect read-set conflicts. The HW_READ_MONITORING flag indicates whether the algorithm uses only hardware for read-set conflict detection.

Pseudocode F: an embodiment of pseudocode that uses the Txcomwmrm instruction

TXCOMWMIRMC

A third variant is shown below in the algorithmic description of Pseudocode F. If TSR.LOSS_WM and TSR.LOSS_IRM are 0, the BF property of all write-monitored buffered BBLKs is atomically cleared and all such buffered data becomes visible to other agents. RM, WM, and IRM, as well as TCR.IN_TX, are cleared. Buffered blocks that do not have WM are not affected and remain buffered. The CF flag is set on completion. If TSR.LOSS_WM or TSR.LOSS_IRM is 1, the CF flag is cleared and TCR.IN_TX is cleared. The CF flag is set to 1 if the operation succeeded and cleared to 0 on failure. The OF, SF, ZF, AF, and PF flags are set to 0.

Pseudocode F: an embodiment of an algorithmic description for a TXCOMWMIRMC instruction

Referring to FIG. 12, an embodiment of a flow diagram for a method of executing a commit instruction that defines commit conditions and clear controls is shown. In flow 1205, a commit instruction is received. As discussed above, a compiler may insert a commit instruction into program code. As a specific illustrative example, a call to a commit function is inserted in main code, and the commit function, such as those included in the pseudocode above, is provided in a library; a compiler may also insert the commit instruction into the commit function within the library.

After the commit instruction is received, decoders are able to decode it. From the decoded information, the conditions specified by the opcode of the commit instruction are determined in flow 1210. As discussed above, the opcode may set some flags and reset others to indicate which conditions are to be used for commit. If the conditions are not satisfied, false is returned and the transaction may be aborted separately. However, if the commit conditions are satisfied, such as any combination of no loss of read monitors, write monitors, metadata, and/or buffers, the clear conditions/controls are determined in flow 1215. As an example, any combination of read monitors, write monitors, metadata, and/or buffers for the transaction is designated for clearing. As a result, the information designated for clearing is cleared in flow 1225.
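The flow just described, checking the decoded conditions and then applying the clear controls, can be sketched as a small function. This is a simplified model under assumptions: `execute_commit`, the `loss` dictionary of per-attribute loss flags, and the `tracked` dictionary of per-attribute state are hypothetical names, and making buffered data globally visible is not modeled.

```python
def execute_commit(loss, tracked, conditions, clears):
    """Model of the commit flow: check conditions, then clear state.

    loss       -- attribute name -> True if that information was lost
    tracked    -- attribute name -> set of locations holding that state
    conditions -- attributes whose loss must make the commit fail
    clears     -- attributes whose state is cleared on a successful commit
    """
    # Evaluate the commit conditions decoded from the opcode.
    if any(loss.get(attr, False) for attr in conditions):
        return False   # commit fails; software may abort separately

    # Conditions hold: apply the decoded clear controls.
    for attr in clears:
        tracked[attr].clear()
    return True
```

A failed commit leaves all tracked state in place in this model, so software could still inspect it before deciding how to abort or retry.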

Optimized memory management for UTM

As discussed above, the unbounded transactional memory (UTM) architecture and its hardware implementation extend the processor architecture by introducing the following properties: monitoring, buffering, and metadata. Combined, they give software the means necessary to implement a variety of sophisticated algorithms, including a wide range of transactional memory designs. Each property can be implemented in hardware either by extending the existing cache protocols in the cache implementation or by allocating independent new hardware resources.

When the UTM properties are implemented in hardware, the UTM architecture and its hardware implementations potentially offer a performance boost for transactions over a software-only solution, provided that disruptions such as UTM transaction aborts and subsequent transaction retry operations can be effectively avoided or minimized. One of the main causes of hardware transaction aborts has been frequent ring transitions caused by external interrupts, system call events, and page faults.

A suspension mechanism based on the current privilege level (CPL) makes a hardware transaction active (enabling a hardware-accelerated transaction with UTM properties such as buffering and monitoring, and enabling the ejection mechanism) while the processor operates at privilege level 3 (user mode). Ring transitions out of ring 3 automatically suspend the currently active transaction (stopping the generation of UTM properties and disabling the ejection mechanism). Similarly, ring transitions back to ring 3 automatically resume the previously suspended hardware transaction as if it were active. The potential disadvantage of this approach is that it mostly rules out the use of hardware transactional memory resources in kernel code or at any ring level other than ring 3.

In another approach, duplicate TM control resources, such as a transaction control register (TxCR) for ring 0, are introduced so that hardware transactions can still be enabled for ring 0 code using these separate TM resources. However, this approach potentially still lacks an effective solution for handling nested interrupts and exceptions during ring 0 transaction operations.

Accordingly, FIG. 13 illustrates an embodiment of hardware that supports the handling of privilege level transitions during the execution of transactions. It enables ring 0 transactions on top of user-mode (ring 3) transactions, and also allows the OS and a hypervisor, such as a virtual machine monitor (VMM), to handle unbounded levels of nested interrupts and NMI cases in the presence of ring 0 transactions. A storage element, such as an EFLAGS register 1310, includes a transaction enable field (TEF) 1311. When TEF 1311 holds an active value, it indicates that a transaction is currently active and enabled; when TEF 1311 holds an inactive value, it indicates that a transaction has been suspended.

In one embodiment, a transaction begin operation, or another operation at the start of a transaction, sets the TEF field 1311 to the active value. On a ring level transition event in flow 1300, such as an interrupt, an exception, a system call, a virtual machine exit, or a virtual machine entry, the state of the EFLAGS register 1310 of PE 0 is pushed onto the kernel stack 1320 in flow 1301. In flow 1302, the TEF field 1311 is cleared/updated to the inactive value to suspend the transaction. The ring level transition event is handled or serviced appropriately while the transaction is suspended. When a return event is detected in flow 1303, the state of the EFLAGS register 1310 that was pushed onto the stack in flow 1301 is popped in flow 1304 to restore EFLAGS 1310 to its previous state. Restoring the previous state returns TEF 1311 to the active value and resumes the transaction as active and enabled.
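The push, clear, and pop sequence of flows 1301 to 1304 can be sketched in software as follows. This is a minimal model under stated assumptions, not the disclosed hardware: the bit position TEF_BIT, the Processor class, and its method names are hypothetical, and the kernel stack is modeled as a Python list.

```python
from dataclasses import dataclass, field

TEF_BIT = 1 << 0  # assumption: the text does not give the bit position of TEF


@dataclass
class Processor:
    """Hypothetical model of a processing element with an EFLAGS register."""
    eflags: int = 0
    kernel_stack: list = field(default_factory=list)

    def begin_transaction(self) -> None:
        # A transaction begin operation sets TEF to the active value.
        self.eflags |= TEF_BIT

    def transaction_active(self) -> bool:
        return bool(self.eflags & TEF_BIT)

    def ring_transition(self) -> None:
        # Flow 1301: push the current EFLAGS state onto the kernel stack.
        self.kernel_stack.append(self.eflags)
        # Flow 1302: clear TEF, which suspends the transaction.
        self.eflags &= ~TEF_BIT

    def ring_return(self) -> None:
        # Flow 1304: pop the saved state, restoring TEF as it was.
        self.eflags = self.kernel_stack.pop()


cpu = Processor()
cpu.begin_transaction()               # ring 3 transaction becomes active
cpu.ring_transition()                 # interrupt: ring 3 transaction suspended
cpu.begin_transaction()               # handler starts its own ring 0 transaction
cpu.ring_transition()                 # nested interrupt: ring 0 transaction suspended
suspended = cpu.transaction_active()  # False while the nested handler runs
cpu.ring_return()                     # ring 0 transaction resumes
cpu.ring_return()                     # ring 3 transaction resumes
```

Because each transition saves the complete register state on the stack, the depth of nesting in this model is limited only by the stack, which mirrors the handling of nested interrupts described above.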

Specific examples of the process for illustrative ring level transition events are listed below. On interrupts and exceptions, the processor pushes the EFLAGS register onto the kernel stack and clears the "Transaction Enable" bit if it is set, thereby suspending the previously enabled transaction. On IRET, the processor restores the entire EFLAGS register state of the interrupted thread, including the "Transaction Enable" bit, from the kernel stack, thereby resuming the transaction if it was previously enabled.

On SYSCALL, the processor pushes the EFLAGS register and clears the "Transaction Enable" bit if it is set, thereby suspending the previously enabled transaction. On SYSRET, the processor restores the entire EFLAGS register state of the interrupted thread, including the "Transaction Enable" bit, from the kernel stack, thereby resuming the transaction if it was previously enabled.

On VM exit, the processor saves the guest's EFLAGS register, including the state of the "Transaction Enable" bit, into the virtual machine control structure (VMCS) and loads the host's EFLAGS register state, in which the "Transaction Enable" bit is cleared, thereby suspending the guest's previously enabled transaction if it was enabled.

On VM entry, the processor restores the guest's EFLAGS register, including the state of the "Transaction Enable" bit, from the VMCS, thereby resuming the guest's previously enabled transaction if it was enabled.
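The VM exit and VM entry cases differ from the interrupt case only in where the register state is saved. The following sketch assumes a hypothetical Vmcs class with two fields and a hypothetical bit position TEF_BIT; a real VMCS holds far more state.

```python
from dataclasses import dataclass

TEF_BIT = 1 << 0  # assumption: bit position of the "Transaction Enable" bit


@dataclass
class Vmcs:
    """Hypothetical, heavily reduced virtual machine control structure."""
    guest_eflags: int = 0
    host_eflags: int = 0


def vm_exit(eflags: int, vmcs: Vmcs) -> int:
    # Save the guest state, including its "Transaction Enable" bit.
    vmcs.guest_eflags = eflags
    # Load the host state with the bit cleared, suspending the guest transaction.
    return vmcs.host_eflags & ~TEF_BIT


def vm_entry(vmcs: Vmcs) -> int:
    # Restore the guest state; the transaction resumes if the bit was set.
    return vmcs.guest_eflags
```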

This enables hardware-accelerated UTM transactions in kernel mode (ring 0) on top of hardware-accelerated UTM transactions in user mode (ring 3), while also providing ways for both the OS and the VMM to handle unbounded levels of nested interrupts and NMI cases in the presence of ring 0 transactions. No prior art provides such mechanisms.

A module as used herein refers to any hardware, software, firmware, or combination thereof. Module boundaries that are illustrated as separate commonly vary and potentially overlap. For example, a first and a second module may share hardware, software, firmware, or a combination thereof, while potentially retaining some independent hardware, software, or firmware. In one embodiment, use of the term logic includes hardware, such as transistors or registers, or other hardware, such as programmable logic devices. In another embodiment, however, logic also includes software or code integrated with hardware, such as firmware or microcode.

A value, as used herein, includes any known representation of a number, a state, a logical state, or a binary logical state. Logic levels, logic values, or logical values are often also referred to as 1s and 0s, which simply represent binary logic states. For example, a 1 refers to a high logic level and a 0 refers to a low logic level. In one embodiment, a storage cell, such as a transistor or a flash cell, may be capable of holding a single logical value or multiple logical values. However, other representations of values in computer systems have also been used. For example, the decimal number 10 may also be represented as the binary value 1010 and as the hexadecimal letter A. Therefore, a value includes any representation of information capable of being held in a computer system.
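The equivalence of the three representations mentioned above can be checked in a few lines; the variable names are illustrative only.

```python
value = 10

binary_text = format(value, "b")  # binary representation of decimal 10
hex_text = format(value, "X")     # hexadecimal representation of decimal 10

# Parsing either representation yields the same underlying value.
same_value = int(binary_text, 2) == int(hex_text, 16) == value
```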

Moreover, states may be represented by values or portions of values. As an example, a first value, such as a logical one, may represent a default or initial state, while a second value, such as a logical zero, may represent a non-default state. In addition, the terms reset and set, in one embodiment, refer to a default and an updated value or state, respectively. For example, a default value potentially includes a high logical value, i.e., reset, while an updated value potentially includes a low logical value, i.e., set. Note that any combination of values may be used to represent any number of states.

The embodiments of methods, hardware, software, firmware, or code set forth above may be implemented via instructions or code stored on a machine-accessible or machine-readable medium and executable by a processing element. A machine-accessible/readable medium includes any mechanism that provides (i.e., stores and/or transmits) information in a form readable by a machine, such as a computer or an electronic system. For example, a machine-accessible medium includes random access memory (RAM), such as static RAM (SRAM) or dynamic RAM (DRAM); ROM; a magnetic or optical storage medium; flash memory devices; electrical storage devices; optical storage devices; acoustic storage devices; or a storage device for another form of propagated signal (e.g., carrier waves, infrared signals, digital signals); etc. For example, a machine may access a storage device by receiving a propagated signal, such as a carrier wave, from a medium capable of holding the information to be transmitted on the propagated signal.

Throughout this description, reference to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places in this description do not necessarily all refer to the same embodiment. Moreover, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

In the foregoing description, a detailed description has been given with reference to specific exemplary embodiments. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention as set forth in the appended claims. The description and drawings are accordingly to be regarded in an illustrative sense rather than a restrictive sense. Moreover, the foregoing use of "embodiment" and other exemplary language does not necessarily refer to the same embodiment or the same example, but may refer to different and distinct embodiments, as well as potentially to the same embodiment.

Claims

Apparatus comprising: a plurality of processing elements, wherein a processing element of the plurality of processing elements is to be associated with a plurality of software subsystems; and metaphysical logic to associate a metadata access operation, which is to be associated with a current software subsystem of the plurality of software subsystems and is to reference a data address, with a metaphysical address space associated with the current software subsystem based on at least the data address and a metadata identifier (MDID) assigned to the current software subsystem.

The apparatus of claim 1, wherein the metaphysical address space associated with the current software subsystem is to be orthogonal to a data address space comprising the data address and to at least one other metaphysical address space associated with a second software subsystem of the plurality of software subsystems.

The apparatus of claim 2, wherein each of the plurality of software subsystems is individually selected from a group consisting of a transactional runtime subsystem, a garbage collection runtime subsystem, a memory protection subsystem, a software translation subsystem, an outer transaction of a nested set of transactions, and an inner transaction of a nested group of transactions.

The apparatus of claim 1, further comprising decoding logic for decoding the metadata access operation, wherein the metadata access operation comprises an opcode that is recognized as one of the plurality of supported operations within the decode logic.

The apparatus of claim 1, wherein the metaphysical logic comprises metaphysical translation logic to translate the data address into a metadata address in the metaphysical address space associated with the current software subsystem based on at least the MDID.

The apparatus of claim 5, wherein the metaphysical translation logic for translating the data address into a metadata address within the metaphysical address space associated with the current software subsystem is further based on a processing element identifier (PEID) connected to the processing element.

The apparatus of claim 6, wherein the metaphysical translation logic for translating the data address into a metadata address within the metaphysical address space associated with the current software subsystem is further based on a compression ratio of data to metadata.
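One way to picture a translation that depends on a compression ratio of data to metadata is integer division of the data address by that ratio. The sketch below is an assumption for illustration only: the function name, the base address, and the ratio of 8 are hypothetical and are not taken from the claims.

```python
def compressed_metadata_address(data_addr: int, base: int, ratio: int = 8) -> int:
    """Map `ratio` consecutive data bytes onto one metadata byte.

    `base` is a hypothetical start of the metadata region.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return base + data_addr // ratio


# All data addresses in the same 8-byte group share one metadata address.
first = compressed_metadata_address(0x100, base=0x8000)
last = compressed_metadata_address(0x107, base=0x8000)
next_group = compressed_metadata_address(0x108, base=0x8000)
```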

The apparatus of claim 6, further comprising a register that is modifiable by the current software subsystem, the register being configured to hold the MDID in response to a write from the current software subsystem to indicate that the current software subsystem is currently executing on the processing element, and wherein the metaphysical translation logic for translating the data address into a metadata address within the metaphysical address space associated with the current software subsystem based on the PEID and the MDID comprises the metaphysical translation logic to combine a representation of the data address with the PEID and the MDID.

The apparatus of claim 8, wherein the metaphysical translation logic to combine a representation of the data address with the PEID and the MDID is based on a combination algorithm selected from a group consisting of: an algorithm for adding the PEID and the MDID to the data address to form the metadata address; an algorithm for translating the data address into a translated data address using normal data translation tables and adding the PEID and the MDID to the translated data address to form the metadata address; and an algorithm for translating the data address into a translated metadata address using metaphysical translation tables that are separate from normal data translation tables and adding the PEID and the MDID to the translated metadata address to form the metadata address.
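The three combination algorithms can be sketched as follows. Everything concrete here is an assumption made for illustration: the field widths, the page size, the choice to place the PEID and MDID above the address bits, and the use of dictionaries as translation tables.

```python
ADDR_BITS = 32   # hypothetical address width
PAGE_BITS = 12   # hypothetical page size of 4 KiB
MDID_BITS = 2    # hypothetical MDID width


def add_ids(addr: int, peid: int, mdid: int) -> int:
    """Place the PEID and MDID above the address bits (one possible way of adding)."""
    return (((peid << MDID_BITS) | mdid) << ADDR_BITS) | addr


def translate(addr: int, table: dict) -> int:
    """Translate the page number through a table and keep the page offset."""
    page, offset = addr >> PAGE_BITS, addr & ((1 << PAGE_BITS) - 1)
    return (table[page] << PAGE_BITS) | offset


def direct(addr: int, peid: int, mdid: int) -> int:
    # First algorithm: add the IDs to the untranslated data address.
    return add_ids(addr, peid, mdid)


def via_data_tables(addr: int, peid: int, mdid: int, data_table: dict) -> int:
    # Second algorithm: translate with the normal data translation tables first.
    return add_ids(translate(addr, data_table), peid, mdid)


def via_metaphysical_tables(addr: int, peid: int, mdid: int, md_table: dict) -> int:
    # Third algorithm: translate with separate metaphysical translation tables first.
    return add_ids(translate(addr, md_table), peid, mdid)
```

In this model, two software subsystems with different MDIDs obtain different metadata addresses for the same data address, which is one way to read the orthogonality of the metaphysical address spaces.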

Method, comprising: encountering a metadata operation that references a data address located within a data address space and associated with a data item held in a data entry of a cache memory; determining a metadata address in a metaphysical address space separate from the data address space based on the data address, a processing element identifier (PEID) of a processing element associated with the metadata operation, and a metadata identifier (MDID) for a software subsystem associated with the processing element; and accessing a metadata entry of the cache memory based on the metadata address.

The method of claim 10, wherein the metaphysical address space is also separated from an additional metaphysical address space associated with an additional software subsystem also connected to the processing element.

The method of claim 10, wherein the software subsystem is selected from a group consisting of a transactional runtime subsystem, a garbage collection runtime subsystem, a storage protection subsystem, a software translation subsystem, an outer transaction of a nested set of transactions, and an inner transaction a nested set of transactions.

The method of claim 10, further comprising: writing the MDID into a control register associated with the processing element in response to encountering a write operation to the control register from the software subsystem in response to the software subsystem currently executing on the processing element; and determining the MDID from the control register.

The method of claim 13, further comprising: determining the PEID from a part of an opcode for the metadata operation.

The method of claim 13, wherein determining the metadata address from the data address, the PEID, and the MDID comprises: combining the data address, the PEID, and the MDID with an algorithm selected from a group consisting of: an algorithm to add the PEID and the MDID to the data address to form the metadata address; an algorithm to translate the data address into a translated data address using normal data translation tables and add the PEID and the MDID to the translated data address to form the metadata address; and an algorithm to translate the data address into a translated metadata address using metaphysical translation tables that are separate from normal data translation tables and add the PEID and the MDID to the translated metadata address to form the metadata address.

Apparatus comprising: Decoding logic for decoding a metadata access instruction to reference a data address of a data item, the metadata access instruction comprising an op code recognizable as part of an instruction set capable of being properly decoded by the decode logic; and metadata logic for translating the data address into a different metadata address, transparent to software, and for accessing metadata referenced by the different metadata address in response to the decoding logic decoding the metadata access instruction.

The apparatus of claim 16, wherein the metadata access instruction is selected from a group of instructions consisting of a Metadata Bit Test and Set (MDLT) Instruction, a Metadata Store and Set (MSS) Instruction, and a Metadata Store and Reset Instruction (MDSR).

The apparatus of claim 16, wherein the metadata access instruction is selected from a group of instructions consisting of a compressed metadata testing (CMDT) instruction, a compressed metadata storage (CMS) instruction, and a compressed metadata clearing instruction (CMDCLR).

The apparatus of claim 16, wherein the metadata logic for translating the data address into a different metadata address, transparent to software, comprises translating the data address based at least on a metadata identifier (MDID) specified in a control register by a software subsystem associated with the metadata access instruction.

The apparatus of claim 16, wherein the metadata access instruction also includes a reference to a destination register and wherein the metadata logic for accessing metadata referenced by the different metadata address comprises the metadata logic for loading the metadata at the referenced different metadata address into the destination register.

The apparatus of claim 20, wherein the op code comprises a thread identifier field for identifying the thread from which the metadata access instruction originated.

The apparatus of claim 20, wherein the metadata logic for accessing metadata referenced by the different metadata address further comprises the metadata logic for setting the metadata at the referenced different metadata address to a set value in response to the metadata loaded into the destination register being an unset value.

The apparatus of claim 22, wherein the set and unset values are specified in the metadata access instruction.
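The load-then-conditionally-set behavior recited in the preceding claims resembles a test-and-set operation. The sketch below models it in software under stated assumptions: the metadata store is a plain dictionary, the function name is hypothetical, and the default set and unset values of 1 and 0 stand in for values that would be specified in the instruction.

```python
def metadata_test_and_set(metadata: dict, md_addr: int,
                          set_value: int = 1, unset_value: int = 0) -> int:
    """Return the metadata previously held at md_addr, setting it if it was unset.

    The return value models the content of the destination register.
    """
    loaded = metadata.get(md_addr, unset_value)
    if loaded == unset_value:
        metadata[md_addr] = set_value
    return loaded


store = {}
first_result = metadata_test_and_set(store, 0x8020)   # metadata was unset
second_result = metadata_test_and_set(store, 0x8020)  # metadata is now set
```

A caller can tell from the returned value whether it was the one that changed the metadata, which is what makes this pattern useful for filtering redundant barrier work.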

Machine-readable medium holding a program code which, when executed by a machine, causes the machine to perform the following operations: in response to a data access operation that references a data address, generating a metadata access operation that references the data address at the data access operation, the metadata access operation, when executed by the machine, causing the machine to translate the data address into a metadata address that is separate from the data address, and to access metadata for a data item at the data address based on the metadata address.

The machine readable medium of claim 24, wherein the metadata access operation is selected from a group of instructions consisting of a metadata bit test and set (MDLT) instruction, a metadata storage and set (MSS) instruction, and a Metadata Store and Reset Instruction (MDSR).

The machine-readable medium of claim 24, wherein the metadata access operation is selected from a group of compression instructions consisting of a compressed metadata testing (CMDT) instruction, a compressed metadata storage (CMS) instruction, and a compressed metadata delete instruction (CMDCLR).

The machine readable medium of claim 26, wherein the metadata access operation, when executed by the machine, causing the machine to translate the data address into a metadata address comprises the metadata access operation, when executed by the machine, causing the machine to combine the data address with a processing element identifier (PEID) associated with the metadata access operation and a metadata identifier (MDID) associated with the metadata access operation based on a compression ratio of data to metadata.

The machine readable medium of claim 27, wherein the data address is also adapted to be translated by logic to translate a virtual to a physical address in the machine to refer to the data item.

The machine-readable medium of claim 24, wherein the metadata access operation also references an operand register, and wherein the metadata access operation, when executed by the machine, causing the machine to access metadata for the data item comprises the metadata access operation, when executed by the machine, causing the machine to update the metadata for the data item with a value held in the operand register.

The machine readable medium of claim 24, wherein the program code comprises compiler code, wherein the compiler code is for compiling application code that includes the data access operation, and wherein generating the metadata access operation at the data access operation comprises generating the metadata access operation in a compiled version of the application code.

A machine-readable medium holding program code which, when executed by a machine, causes the machine to perform the following operations: translating a data address referenced by a metadata access instruction in the program code into a metadata address based on a metadata identifier (MDID) associated with a software subsystem currently active on a processing element associated with the metadata access instruction; and accessing metadata based on the metadata address.

The machine readable medium of claim 31, wherein the metadata access instruction is selected from a group of instructions consisting of a metadata load instruction for loading the metadata, a metadata store instruction for storing the metadata, and a metadata clear instruction for resetting the metadata.

The machine readable medium of claim 31, wherein the software subsystem is selected from the group consisting of a transactional runtime subsystem, a garbage collection runtime subsystem, a storage protection subsystem, a software translation subsystem, an outer transaction of a nested set of transactions, and an inner transaction a nested set of transactions.

The machine-readable medium of claim 31, wherein translating a data address referenced by a metadata access instruction in the program code into a metadata address based on a metadata identifier (MDID) associated with a software subsystem currently active on a processing element associated with the metadata access instruction comprises combining the data address with the MDID based on a combination algorithm selected from a group consisting of: an algorithm to add the MDID to the data address to form the metadata address; an algorithm to translate the data address into a translated data address using normal data translation tables and to add the MDID to the translated address to form the metadata address; and an algorithm to translate the data address into a translated metadata address using metaphysical translation tables, which are separate from normal data translation tables, and to add the MDID to the translated metadata address to form the metadata address.

The machine-readable medium of claim 34, wherein adding the MDID comprises an algorithm for adding the MDID selected from a group consisting of an algorithm for appending the MDID in an MSB position, an algorithm for appending the MDID in an LSB position, and an algorithm for replacing address bits with the MDID.
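The three ways of adding the MDID named in the claim above can be illustrated with a minimal sketch. The address width, the MDID width, the bit position used for replacement, and all function names are assumptions made for illustration only; the claims do not fix any of them.

```python
# Hypothetical widths; the claims do not specify any sizes.
ADDR_BITS = 32  # assumed width of a data address
MDID_BITS = 4   # assumed width of a metadata identifier (MDID)


def append_msb(data_addr: int, mdid: int) -> int:
    """Append the MDID above the most significant address bit."""
    return (mdid << ADDR_BITS) | data_addr


def append_lsb(data_addr: int, mdid: int) -> int:
    """Append the MDID below the least significant address bit."""
    return (data_addr << MDID_BITS) | mdid


def replace_bits(data_addr: int, mdid: int,
                 shift: int = ADDR_BITS - MDID_BITS) -> int:
    """Overwrite MDID_BITS address bits, starting at bit `shift`, with the MDID."""
    mask = ((1 << MDID_BITS) - 1) << shift
    return (data_addr & ~mask) | (mdid << shift)
```

In each variant, two software subsystems with different MDIDs obtain different metadata addresses for the same data address, which is what keeps their metadata address spaces separate.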

The machine-readable medium of claim 34, wherein the program code, when executed by the machine, further causes the machine to perform the following operation: determining the MDID from a control register for the processing element, the MDID representing the software subsystem currently active on the processing element.

A system comprising: a memory for holding program code including a metadata access instruction that references a data memory address associated with a data item; and a processor coupled to the memory, the processor comprising: a processing element, of a plurality of processing elements, associated with execution of the metadata access instruction; fetch logic for fetching the metadata access instruction from the memory; decode logic for decoding the metadata access instruction into at least one metadata access operation; a control register for holding a metadata identifier (MDID) associated with an active context on the processing element; a data cache to include a data entry for holding the data item; and execution logic for performing the metadata access operation, wherein the execution logic comprises metaphysical address translation logic in the processor to translate the data memory address into a metadata memory address based on the MDID held in the control register, and cache control logic coupled to the data cache to perform the metadata access operation on a separate entry of the data cache based on the metadata memory address.

The system of claim 37, wherein the metadata access instruction is selected from a group of instructions consisting of a metadata load instruction for loading the metadata, a metadata store instruction for storing the metadata, and a metadata delete instruction for resetting the metadata.

The system of claim 37, wherein the active context is selected from a group consisting of a transactional runtime subsystem, a garbage collection runtime subsystem, a storage protection subsystem, an outer transaction of a nested set of transactions, and an inner transaction of a nested set of transactions.

The system of claim 37, wherein the metaphysical address translation logic in the processor for translating the data memory address into a metadata memory address is further based on a processing element identifier (PEID) for the processing element, and wherein the metaphysical address translation logic translating the data memory address into a metadata memory address based on the MDID held in the control register and the PEID comprises combining the data address with the MDID and the PEID based on a combination algorithm selected from a group consisting of: an algorithm for adding the MDID and the PEID to the data address to form the metadata address; an algorithm for translating the data address into a translated data address using normal data translation tables and adding the MDID and the PEID to the translated address to form the metadata address; and an algorithm for translating the data address into a translated metadata address using metaphysical translation tables separate from normal data translation tables and adding the MDID and the PEID to the translated metadata address to form the metadata address.

A processor comprising: an execution module for executing a metadata load operation that references an address; and a strength module for providing, for the metadata load operation, a metadata value associated with the address in response to the processor operating in a first mode, and for providing a fixed value in response to the processor operating in a second mode.

The processor of claim 41, wherein the first mode comprises a strong atomic mode and the second mode comprises a weak atomic mode.

The processor of claim 42, further comprising a first register for holding the fixed value.

The processor of claim 43, further comprising a second register for holding a mode value, the mode value representing a first value to indicate that the processor is operating in the strong atomic mode and representing a second value to indicate that the processor is operating in the weak atomic mode.

The processor of claim 44, wherein the first and second registers are the same metadata control register.

The processor of claim 44, wherein the strength module for providing a metadata value associated with an address in response to the processor operating in the strong atomic mode and providing a fixed value in response to the processor operating in the weak atomic mode comprises the strength module loading the metadata value into a destination register specified by the metadata load operation in response to the mode value held in the second register representing the first value to indicate that the processor is operating in the strong atomic mode, and loading the fixed value from the first register into the destination register in response to the mode value held in the second register representing the second value to indicate that the processor is operating in the weak atomic mode.

A method comprising: finding a metadata access operation that references an address; determining a mode of processor execution; providing a metadata value associated with the address for the metadata access operation in response to determining that the processor execution mode is a first execution mode; and providing a fixed value from a register for the metadata access operation in response to determining that the processor execution mode is a second execution mode.

The method of claim 47, wherein determining a mode of processor execution comprises reading a mode flag from a first control register, wherein the mode flag is to hold a first value to indicate that the processor execution mode is a first execution mode and hold a second value to indicate that the processor execution mode is a second execution mode.

The method of claim 47, wherein providing a metadata value associated with the address for the metadata access operation comprises loading the metadata value from a storage location associated with the address into a destination register referenced by the metadata access operation.

The method of claim 49, wherein providing a fixed value from a register for the metadata access operation comprises loading the fixed value from the register into the destination register.
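The mode-dependent metadata load described in the processor and method claims above can be sketched as follows. This is an illustrative model only: the class name, the mode encodings, the dictionary standing in for metadata storage, and the default of 0 for untouched metadata are all assumptions, not details taken from the claims.

```python
from dataclasses import dataclass, field

STRONG, WEAK = 0, 1  # hypothetical encodings of the mode flag


@dataclass
class Processor:
    mode: int = STRONG      # mode flag held in a control register
    forced_value: int = 0   # fixed value held in a register
    # Stand-in for metadata storage: address -> metadata value.
    metadata: dict = field(default_factory=dict)

    def metadata_load(self, addr: int) -> int:
        """Return the value written to the destination register."""
        if self.mode == STRONG:
            # First mode: supply the metadata value for the address
            # (0 if none was stored, an assumption of this sketch).
            return self.metadata.get(addr, 0)
        # Second mode: supply the fixed value, ignoring stored metadata.
        return self.forced_value
```

The same metadata load operation therefore yields different results depending only on the mode flag, so software can change execution modes without changing the instruction stream.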

A system comprising: a memory for holding a metadata load operation that references an address and a destination register; and a processor associated with the memory, wherein the processor includes execution logic to perform the metadata load operation, a metadata register to hold a forced value, a cache memory to hold a metadata value associated with the address, and strength logic that, in response to the execution logic executing the metadata load operation, supplies the metadata value to the destination register in response to the processor operating in a first mode and provides the forced value from the metadata register to the destination register in response to the processor operating in a second mode.

The system of claim 51, wherein the strength logic is further configured to determine whether the processor is operating in the first mode or the second mode.

The system of claim 52, wherein the first mode comprises a strong atomic mode and the second mode comprises a weak atomic mode.

The system of claim 52, wherein the metadata register is further configured to hold a mode value, the mode value representing a first value when the processor is operating in the first mode and representing a second value when the processor is operating in the second mode, and wherein the strength logic being further configured to determine whether the processor is operating in the first mode or the second mode comprises the strength logic interpreting the mode value from the metadata register.

The system of claim 52, further comprising a control register for holding a mode value, the mode value representing a first value when the processor is operating in the first mode and representing a second value when the processor is operating in the second mode, and wherein the strength logic being further configured to determine whether the processor is operating in the first mode or the second mode comprises the strength logic interpreting the mode value from the control register.

The system of claim 51, wherein the memory is selected from a group consisting of dynamic random access memory (DRAM), static random access memory (SRAM), and nonvolatile memory.

An apparatus comprising: a data cache array to hold a cache entry; and cache control logic coupled to the data cache array, the cache control logic configured to transition the cache entry from an unmonitored state to a buffered coherency and read-monitored state in response to a buffered update to the cache entry, and subsequently to transition the cache entry to a buffered coherency and write-monitored state prior to transitioning the cache entry to a modified state to commit the buffered update.

The apparatus of claim 57, wherein the buffered update to the cache entry comprises an update selected from a group consisting of a transactional memory access to a data address for a data item to be held in the cache entry, a metadata access to a data address that is associated with metadata to be cached, and a local update to the cache entry.

The apparatus of claim 57, wherein the cache control logic transitioning the cache entry from an unmonitored state to a buffered coherency and read-monitored state comprises the cache control logic updating coherency bits associated with the cache entry to a buffered value to represent the buffered coherency state, and updating a read monitor attribute bit associated with the cache entry to a read-monitored value to represent the read-monitored state.

The apparatus of claim 59, wherein the cache control logic subsequently transitioning the cache entry to a buffered coherency and write-monitored state prior to transitioning the cache entry to a modified state to commit the buffered update comprises the cache control logic maintaining the coherency bits associated with the cache entry at the buffered value to represent the buffered coherency state, and updating a write monitor attribute bit associated with the cache entry to a write-monitored value to represent the write-monitored state.

The apparatus of claim 60, wherein the cache control logic transitioning the cache entry to the modified state comprises the cache control logic updating the coherency bits associated with the cache entry to a modified value to represent the modified coherency state.

The apparatus of claim 57, further comprising execution logic for performing the buffered update and subsequently performing a commit operation, wherein the cache control logic subsequently transitions the cache entry to a buffered coherency and write-monitored state, prior to transitioning the cache entry to a modified state to commit the buffered update, in response to the execution logic performing the commit operation.

A method comprising: finding a buffered update to a block of a cache memory; applying a read monitor to the block upon finding the buffered update to the block of the cache memory; and subsequently applying a write monitor to the block before committing the block.

The method of claim 63, wherein the buffered update to the block of the cache comprises a transactional write to the block of the cache.

The method of claim 63, further comprising performing the buffered update to the block of the cache simultaneously with the application of a read monitor, wherein the block is held in a buffered coherency state after performing the buffered update.

The method of claim 63, further comprising performing the buffered update to the block of the cache memory after applying the read monitor, wherein the block is held in a buffered coherency state after performing the buffered update.

The method of claim 63, wherein applying a read monitor to the block upon finding the buffered update to the block of the cache memory comprises: generating a read request for the block to processing elements that are external to a cache domain of the cache memory; and updating a read monitor attribute associated with the block of the cache memory to a read-monitored value, to apply read monitoring to the block, in response to detecting no conflicts from the processing elements external to the cache domain in response to the read request for the block.

The method of claim 67, wherein subsequently applying a write monitor to the block before committing the block comprises: generating a read-for-ownership request for the block to the processing elements external to the cache domain of the cache memory; and updating a write monitor attribute associated with the block of the cache memory to a write-monitored value, to apply write monitoring to the block, in response to detecting no conflicts from the processing elements external to the cache domain in response to the read-for-ownership request for the block.

The method of claim 68, wherein committing the block comprises: transitioning a cache coherency state of the block from a buffered coherency state to a modified coherency state.
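The two-step monitoring sequence in the method claims above (read monitor at the buffered update, write monitor just before commit) can be modeled with a small state sketch. The state names, the `CacheLine` class, and the boolean conflict flags that stand in for responses to the read and read-for-ownership requests are hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class CacheLine:
    coherency: str = "invalid"   # hypothetical coherency state names
    read_monitor: bool = False   # read monitor attribute
    write_monitor: bool = False  # write monitor attribute


def buffered_update(line: CacheLine, conflict_on_read: bool = False) -> bool:
    """Apply read monitoring and hold the line in the buffered state."""
    if conflict_on_read:  # stand-in for a conflict reported on the read request
        return False
    line.read_monitor = True
    line.coherency = "buffered"
    return True


def commit(line: CacheLine, conflict_on_rfo: bool = False) -> bool:
    """Apply write monitoring, then move the line from buffered to modified."""
    if line.coherency != "buffered" or conflict_on_rfo:
        return False  # nothing to commit, or a conflict on read-for-ownership
    line.write_monitor = True    # buffered and write-monitored
    line.coherency = "modified"  # the buffered update becomes globally visible
    return True
```

Deferring the write monitor until commit means exclusive ownership of the line is requested only at the end, which is the design point the claims describe.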

A machine-accessible medium holding program code which, when executed by a machine, causes the machine to perform the following operations: applying a read monitor to a block of a cache memory upon a buffered write to the block; performing the buffered write to the block; and applying a write monitor to the block after applying the read monitor and before committing the block.

The machine-accessible medium of claim 70, wherein applying a read monitor to the block upon a buffered write to the block of the cache memory comprises: generating a read request for the block to processing elements that are external to a cache domain of the cache memory; and updating a read monitor attribute associated with the block of the cache memory to a read-monitored value, to apply read monitoring to the block, in response to detecting no conflicts from the processing elements external to the cache domain in response to the read request for the block.

The machine-accessible medium of claim 71, wherein applying a write monitor to the block after applying the read monitor and before committing the block comprises: generating a read-for-ownership request for the block to the processing elements external to the cache domain of the cache memory; and updating a write monitor attribute associated with the block of the cache memory to a write-monitored value, to apply write monitoring to the block, in response to detecting no conflicts from the processing elements external to the cache domain in response to the read-for-ownership request for the block.

The machine-accessible medium of claim 70, wherein applying write-monitoring to the block occurs after applying the read-monitoring and before committing the block in response to finding a commit operation.

The machine-accessible medium of claim 70, wherein committing the block comprises: transitioning a cache coherency state of the block to a modified coherency state.

A system comprising: a system memory to hold a transactional write referencing a memory address and a commit operation; and a processor associated with the system memory, the processor comprising a cache memory to: generate a read request for a cache line associated with the memory address in response to receiving the transactional write; transition the cache line to a buffered and read-monitored state in response to detecting no conflicts based on the read request; generate a read-for-ownership request in response to receiving the commit operation; transition the cache line to a buffered and write-monitored state in response to detecting no conflicts based on the read-for-ownership request; and transition the cache line to a modified state in response to transitioning the cache line to the buffered and write-monitored state.

The system of claim 75, wherein the cache memory transitioning the cache line to a buffered and read-monitored state comprises the cache memory updating coherency bits associated with the cache line to a buffered value to represent the buffered portion of the buffered and read-monitored state, and updating a read monitor attribute bit associated with the cache line to a read-monitored value to represent the read-monitored portion of the buffered and read-monitored state.

The system of claim 76, wherein the cache memory transitioning the cache line to a buffered and write-monitored state comprises the cache memory maintaining the coherency bits associated with the cache line at the buffered value to represent the buffered portion of the buffered and write-monitored state, and updating a write monitor attribute bit associated with the cache line to a write-monitored value to represent the write-monitored portion of the buffered and write-monitored state.

The system of claim 77, wherein the cache memory transitioning the cache line to a modified state in response to transitioning the cache line to the buffered and write-monitored state comprises the cache memory updating the coherency bits to a modified value to represent the modified state.

The system of claim 75, wherein the memory is selected from a group consisting of Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), and non-volatile memory.

An apparatus comprising: decode logic for decoding a loss instruction to provide a decoded element, the loss instruction referencing a label and comprising an opcode that is part of an instruction set recognizable by the decode logic; a status storage element comprising a loss field to hold a loss value, the loss value indicating that a loss event has been detected; and jump logic coupled to the status storage element to transfer control to the label based on the decoded element and the loss value indicating that the loss event has been detected.

The apparatus of claim 80, wherein the label comprises a jump destination address, and wherein the loss event is selected from a group consisting of a read monitor conflict indicating that a write to a read-monitored cache line may have occurred, a write monitor conflict indicating that an access to a write-monitored cache line may have occurred, and a loss of a buffered cache line.

The apparatus of claim 80, wherein the status storage element comprises a register, and wherein the loss field to hold a loss value comprises a first bit to be set if a read monitor conflict has been detected, a second bit to be set if a write monitor conflict has been detected, a third bit to be set if a loss of buffered physical data has been detected, and a fourth bit to be set if a loss of buffered metadata has been detected.

The apparatus of claim 80, wherein the loss instruction comprises a read monitor loss instruction and the opcode is to specify a read monitor loss event type, and wherein the jump logic transferring control to the label based on the decoded element and the loss value indicating that the loss event has been detected comprises the jump logic causing execution to jump to the label in response to the loss field holding the loss value indicating that the loss event that occurred was of the read monitor loss event type specified by the opcode of the read monitor loss instruction.

The apparatus of claim 80, wherein the loss instruction comprises a write monitor loss instruction and the opcode is to specify a write monitor loss event type, and wherein the jump logic transferring control to the label based on the decoded element and the loss field holding the loss value indicating that the loss event has been detected comprises the jump logic causing execution to jump to the label in response to the loss field holding the loss value indicating that the loss event that occurred was of the write monitor loss event type specified by the opcode of the write monitor loss instruction.

The apparatus of claim 80, wherein the loss instruction comprises a buffered loss instruction and the opcode is to specify a buffered loss event type, and wherein the jump logic transferring control to the label based on the decoded element and the loss field holding the loss value indicating that the loss event has been detected comprises the jump logic causing execution to jump to the label in response to the loss field holding the loss value indicating that the loss event that occurred was of the buffered loss event type specified by the opcode of the buffered loss instruction.

A machine-accessible medium holding program code that, when executed by a machine, causes the machine to perform the following operations in response to a loss instruction: determining a status of a transaction held in a transaction status register specified by the loss instruction and residing in the machine; and directing execution to a label specified by the loss instruction in response to the status of the transaction indicating that a loss event associated with the loss instruction has been detected.

The machine-accessible medium of claim 86, wherein the label comprises a jump destination address, and wherein the loss event is selected from a group consisting of a read monitor conflict indicating that a write to a read-monitored cache line may have occurred, a write monitor conflict indicating that an access to a write-monitored cache line may have occurred, and a loss of a buffered cache line.

The machine-accessible medium of claim 86, wherein the loss instruction comprises a read monitor jump loss (JLOSS) instruction to specify that the loss event is a read monitor conflict, wherein determining a status of a transaction held in the transaction status register comprises determining a status of a read monitor conflict bit held in the transaction status register, and wherein directing execution to a label specified by the loss instruction in response to the status of the transaction indicating that a loss event associated with the loss instruction has been detected comprises directing execution to the label specified by the loss instruction in response to the status of the read monitor conflict bit held in the transaction status register indicating that a read monitor conflict has been detected.

The machine-accessible medium of claim 86, wherein the loss instruction comprises a write monitor jump loss (JLOSS) instruction to specify that the loss event is a write monitor conflict, wherein determining a status of a transaction held in the transaction status register comprises determining a status of a write monitor conflict bit held in the transaction status register, and wherein directing execution to a label specified by the loss instruction in response to the status of the transaction indicating that a loss event associated with the loss instruction has been detected comprises directing execution to the label specified by the loss instruction in response to the status of the write monitor conflict bit held in the transaction status register indicating that a write monitor conflict has been detected.

The machine-accessible medium of claim 86, wherein the loss instruction comprises a buffered monitor jump loss (JLOSS) instruction to specify that the loss event is a buffered monitor conflict, wherein determining a status of a transaction held in the transaction status register comprises determining a status of a buffered monitor conflict bit held in the transaction status register, and wherein directing execution to a label specified by the loss instruction in response to the status of the transaction indicating that a loss event associated with the loss instruction has been detected comprises directing execution to the label specified by the loss instruction in response to the status of the buffered monitor conflict bit held in the transaction status register indicating that a buffered monitor conflict has been detected.

A method comprising: finding a loss instruction in a processor; determining whether a loss event associated with the loss instruction has been detected in the processor in response to finding the loss instruction; and branching to a label referenced by the loss instruction in response to finding the loss instruction and determining that the loss event associated with the loss instruction has been detected in the processor.

The method of claim 91, wherein the label comprises a jump address.

The method of claim 91, wherein the loss instruction comprises a read monitor loss instruction, and wherein the loss event associated with the read monitor loss instruction comprises a write to a read-monitored cache line.

The method of claim 91, wherein the loss instruction comprises a write monitor loss instruction, and wherein the loss event associated with the write monitor loss instruction comprises access to a write-monitored cache line.

The method of claim 91, wherein the loss instruction comprises a buffered loss instruction, and wherein the loss event associated with the buffered loss instruction comprises flushing a buffered cache line.

The method of claim 95, wherein determining whether a flush of a buffered cache line has been detected in the processor comprises checking a buffered loss status bit in a transaction status register and determining that a flush of a buffered cache line has been detected in response to the buffered loss status bit being set to a loss value.
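The loss instruction in the claims above polls a transaction status register and branches to a label only if the loss event type it names has been recorded. A minimal sketch follows; the bit positions, the constant names, and the representation of the label and fall-through address as plain integers are assumptions for illustration.

```python
# Hypothetical bit positions in a transaction status register.
RM_LOSS = 0b0001   # read monitor conflict detected
WM_LOSS = 0b0010   # write monitor conflict detected
BUF_LOSS = 0b0100  # loss of buffered data detected
MD_LOSS = 0b1000   # loss of buffered metadata detected


def jloss(status_reg: int, loss_mask: int, label: int, next_pc: int) -> int:
    """Return the next program counter for a loss instruction.

    `loss_mask` models the loss event type encoded in the opcode; control
    transfers to `label` only if a matching status bit is set, otherwise
    execution falls through to `next_pc`.
    """
    return label if status_reg & loss_mask else next_pc
```

For example, a read monitor variant would pass `RM_LOSS` as the mask, so a pending write monitor conflict alone would not trigger its branch.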

An apparatus comprising: decode logic for decoding a commit instruction for a transaction to provide a decoded element, wherein the commit instruction specifies a commit condition and comprises an opcode that is part of an instruction set recognizable by the decode logic; and commit logic for determining, in response to the decoded element, whether the commit condition specified by the commit instruction is satisfied for the transaction.

The apparatus of claim 97, wherein the commit condition includes any specified combination of no loss of read-monitored data, no loss of write-monitored data, no loss of buffered data, and no loss of metadata, and wherein the commit logic determining whether the commit condition is satisfied includes determining that the specified combination of no loss of read-monitored data, no loss of write-monitored data, no loss of buffered data, and no loss of metadata has occurred.

The apparatus of claim 97, wherein the commit instruction specifying a commit condition comprises the commit instruction holding four bits: a first bit, if set, to indicate that any loss of read-monitored data is a condition to commit; a second bit, if set, to indicate that any loss of write-monitored data is a condition to commit; a third bit, if set, to indicate that any loss of buffered data is a condition to commit; and a fourth bit, if set, to indicate that any loss of metadata is a condition to commit.

The apparatus of claim 99, wherein the four bits are to be included in the opcode.

The apparatus of claim 99, wherein the commit logic determining whether the commit condition specified by the commit instruction is satisfied for the transaction comprises the commit logic checking a corresponding status bit in a transaction status register for each of the four bits set in the commit instruction, and determining that the commit condition is satisfied if none of the corresponding status bits checked in the transaction status register is set to indicate an associated loss.

The apparatus of claim 97, wherein the commit instruction is further to specify clear controls to indicate a combination of read-monitored data, write-monitored data, buffered data, and metadata to be cleared at commit, and wherein the commit logic is to clear the specified combination of read-monitored data, write-monitored data, buffered data, and metadata after the transaction has been committed in response to determining that the commit condition specified by the commit instruction is satisfied for the transaction.

A machine-readable medium holding program code which, when executed by a machine, causes the machine to perform the following operations: finding a commit instruction for a transaction in the program code, the commit instruction specifying at least one commit error condition; determining whether the at least one commit error condition specified by the commit instruction was detected during a pendency of the transaction; and providing a value to indicate that the at least one commit error condition specified by the commit instruction was detected during the pendency of the transaction, in response to determining that the at least one commit error condition specified by the commit instruction was detected during the pendency of the transaction.

The machine-readable medium of claim 103, wherein the at least one commit error condition is selected from a group consisting of a loss of read-monitored data, a loss of write-monitored data, a loss of buffered data, and a loss of metadata.

The machine-readable medium of claim 103, wherein providing a value to indicate that the at least one commit error condition specified by the commit instruction was detected during the pendency of the transaction comprises loading, into a destination register, the value to indicate that the at least one commit error condition specified by the commit instruction was detected during the pendency of the transaction.

The machine-readable medium of claim 103, wherein determining whether the at least one commit error condition specified by the commit instruction was detected during the pendency of the transaction comprises: checking a status bit in a transaction status register that is associated with the at least one commit error condition; determining that the at least one commit error condition specified by the commit instruction was detected during the pendency of the transaction in response to the status bit associated with the at least one commit error condition being set to indicate that the at least one commit error condition was detected during the pendency of the transaction; and determining that the at least one commit error condition specified by the commit instruction was not detected during the pendency of the transaction in response to the status bit associated with the at least one commit error condition being reset to indicate that the at least one commit error condition was not detected during the pendency of the transaction.

The machine-readable medium of claim 106, wherein the operations further comprise committing the transaction in response to determining that the at least one commit error condition specified by the commit instruction was not detected during the pendency of the transaction.

A method comprising: finding a commit instruction in a transaction, wherein the commit instruction comprises an opcode that specifies commit error conditions for the transaction; determining that no commit error conditions for the transaction specified in the opcode of the commit instruction were detected during a pendency of the transaction; and committing the transaction in response to determining that no commit error conditions for the transaction specified in the opcode of the commit instruction were detected during the pendency of the transaction.

The method of claim 108, wherein the opcode specifying commit error conditions for the transaction includes a first bit of the opcode that, when set, specifies that a loss of read-monitored data is a commit error condition, a second bit of the opcode that, when set, specifies that a loss of write-monitored data is a commit error condition, a third bit of the opcode that, when set, specifies that a loss of buffered data is a commit error condition, and a fourth bit of the opcode that, when set, specifies that a loss of metadata is a commit error condition.

The method of claim 109, wherein determining that no commit error conditions for the transaction specified in the opcode of the commit instruction were detected during the pendency of the transaction comprises: determining that a read monitor bit of a transaction status register is not set, to indicate no loss of read-monitored data, in response to the first bit of the opcode being set; determining that a write monitor bit of the transaction status register is not set, to indicate no loss of write-monitored data, in response to the second bit of the opcode being set; determining that a buffered bit of the transaction status register is not set, to indicate no loss of buffered data, in response to the third bit of the opcode being set; and determining that a metadata bit of the transaction status register is not set, to indicate no loss of metadata, in response to the fourth bit of the opcode being set.

The method of claim 109, wherein the opcode further specifies erase controls, and wherein the opcode specifying erase controls includes a fifth bit of the opcode that, when set, specifies that read-monitored data is to be erased at commit, a sixth bit of the opcode that, when set, specifies that write-monitored data is to be erased at commit, a seventh bit of the opcode that, when set, specifies that buffered data is to be erased at commit, and an eighth bit of the opcode that, when set, specifies that metadata is to be erased at commit.

The method of claim 111, wherein committing the transaction comprises erasing read-monitored data if the fifth bit is set, erasing write-monitored data if the sixth bit is set, erasing buffered data if the seventh bit is set, and erasing metadata if the eighth bit is set.

A system comprising: a memory to hold program code including a commit instruction for a transaction, wherein the commit instruction comprises an opcode specifying commit error conditions for the transaction and erase control information; a processor having decode logic to decode the opcode of the commit instruction; and commit logic to determine whether none of the commit error conditions specified in the opcode has been detected during a pendency of the transaction, and to commit the transaction in response to the commit logic determining that none of the commit error conditions was detected during the pendency of the transaction, wherein the commit logic committing the transaction includes the commit logic erasing transactional information based on the erase control information specified in the opcode of the commit instruction.

The system of claim 113, wherein the commit error conditions are based on a combination of a loss of read-monitored data, a loss of write-monitored data, a loss of buffered data, and a loss of metadata.

The system of claim 114, wherein the commit error condition is selected from a group consisting of: a loss of write-monitored data; a loss of read-monitored data or a loss of write-monitored data; a loss of write-monitored data or a loss of buffered data; a loss of read-only data or a loss of metadata; and a loss of read-only data, a loss of read-monitored data, a loss of buffered data, or a loss of metadata.

The system of claim 113, wherein the opcode specifying erase control information comprises the opcode specifying which of read monitors, write monitors, buffered coherency, and metadata are to be erased at commit, and wherein the commit logic erasing transactional information based on the erase control information specified in the opcode of the commit instruction comprises the commit logic erasing those of the read monitors, write monitors, buffered coherency, and metadata that the opcode specifies are to be erased.

An apparatus comprising: a memory element comprising a transaction enable field (TEF), the TEF, when holding an active value, indicating that an associated transaction is active and enabled and, when holding an inactive value, indicating that the associated transaction is suspended; and logic to store a state of at least the TEF in a memory structure in response to a ring level transition event and to restore the state of at least the TEF from the memory structure to the memory element in response to a return event.

The apparatus of claim 117, wherein the ring level transition event comprises an event selected from a group consisting of an interrupt, an exception, a system call, a virtual machine entry, and a virtual machine exit.

The apparatus of claim 117, wherein the return event comprises an event selected from a group consisting of an interrupt return (IRET), a system return (SYSRET), a virtual machine (VM) entry, and a virtual machine (VM) exit.

The apparatus of claim 117, wherein the memory element comprises a flag register and wherein the TEF comprises a transaction enable flag.

The apparatus of claim 117, wherein the memory structure comprises a stack, wherein the logic to store the state of at least the TEF in the stack comprises push logic to push the state of at least the TEF onto the stack, and wherein the logic to restore the state of at least the TEF from the stack to the memory element comprises pop logic to pop the state of at least the TEF from the stack and to restore the TEF in the memory element.

A system comprising: a memory to hold code which, when executed, generates a ring level transition event; and a processor including a register that includes a transaction enable field (TEF) to hold an active value to indicate that an associated transaction is active, and stack logic to push a previous state of the register onto a stack in response to the ring level transition event, clear the TEF to an inactive value to indicate that the associated transaction is suspended, and restore the previous state of the register from the stack to the register in response to a return event.

The system of claim 122, wherein the ring level transition event comprises an event selected from a group consisting of an interrupt, an exception, a system call, a virtual machine entry, and a virtual machine exit.

The system of claim 122, wherein the return event comprises an event selected from a group consisting of an interrupt return (IRET), a system return (SYSRET), a virtual machine (VM) entry, and a virtual machine (VM) exit.

The system of claim 122, wherein the register comprises a flag register, the TEF comprises a transaction enable flag, the active value comprises a high logical value of the flag, and the inactive value comprises a low logical value of the flag.

A method comprising: detecting a ring level transition event from a current ring level; storing a previous state of a register including a transaction enable field in a memory structure; clearing the transaction enable field to indicate that an associated transaction is suspended; detecting a return to the current ring level event; and restoring the previous state of the register from the memory structure in response to detecting the return to the current ring level event.

The method of claim 126, wherein the memory structure comprises a kernel stack, wherein storing the previous state of the register in the kernel stack comprises pushing the previous state of the register onto the kernel stack, and wherein restoring the previous state of the register from the kernel stack comprises popping the previous state of the register from the kernel stack and restoring the previous state of the register.

The method of claim 126, wherein the current ring level comprises a user ring level.

The method of claim 128, wherein the ring level transition event comprises an event selected from a group consisting of an interrupt, an exception, a system call and a virtual machine entry.

The method of claim 129, wherein the return to the current ring level event comprises an event selected from a group consisting of an interrupt return (IRET), a system return (SYSRET), and a virtual machine (VM) exit.
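The suspend-and-resume behavior recited in the method claims above (store the register state, clear the transaction enable field on a ring level transition, restore the state on return) can be illustrated with a minimal sketch. The class `Cpu`, the mask `TEF_MASK`, and the use of a Python list as the kernel stack are hypothetical modeling choices. The claims do not specify where the field sits within the register.

```python
# Hypothetical position of the transaction enable field (TEF) in the
# modeled flag register.
TEF_MASK = 0x1


class Cpu:
    def __init__(self, flags: int) -> None:
        self.flags = flags       # modeled register containing the TEF
        self.kernel_stack = []   # modeled memory structure (kernel stack)

    @property
    def transaction_enabled(self) -> bool:
        """True while the TEF holds the active value."""
        return bool(self.flags & TEF_MASK)

    def ring_transition(self) -> None:
        """Interrupt, exception, system call, or similar transition."""
        self.kernel_stack.append(self.flags)  # push the previous state
        self.flags &= ~TEF_MASK               # clear TEF: suspended

    def ring_return(self) -> None:
        """Return event: pop and restore the previous register state."""
        self.flags = self.kernel_stack.pop()
```

Because each transition pushes the full previous state, nested transitions in this model unwind in order: a transaction that was active before the first transition becomes active again only after the matching outermost return.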