JP5608738B2

JP5608738B2 - Optimizations for an unbounded transactional memory (UTM) system

Info

Publication number: JP5608738B2
Application number: JP2012516043A
Authority: JP
Inventors: Gad Sheaffer; Jan Gray; Burton Smith; Ali-Reza Adl-Tabatabai; Robert Geva; Vadim Bassin; David Callahan; Yang Ni; Bratin Saha; Martin Taillefer; Shlomo Raikin; Koichi Yamada; Landy Wang; Arun Kishan
Original assignee: Intel Corp
Current assignee: Intel Corp
Priority date: 2009-06-26
Filing date: 2009-06-26
Publication date: 2014-10-15
Anticipated expiration: 2029-06-26
Also published as: CN102460376B; BRPI0925055A2; GB2484416B; WO2010151267A1; CN102460376A; KR20130074726A; DE112009005006T5; GB201119084D0; JP2012530960A; KR101370314B1; GB2484416A

Description

The present invention relates to processor execution and, in particular, to the execution of groups of instructions.

Advances in semiconductor processing and logic design have increased the amount of logic that can be placed on an integrated circuit device. As a result, computer system configurations have evolved from one or more integrated circuits in a system to multiple cores and multiple logical processors on individual integrated circuits. A processor or integrated circuit typically comprises a single processor die, and the die may include any number of cores or logical processors.

The ever-increasing number of cores and logical processors on integrated circuits allows more software threads to be executed concurrently. However, the increase in the number of software threads that may be executed simultaneously has created problems with synchronizing data shared among those threads. One common solution for accessing shared data in systems with multiple cores or multiple logical processors is to use locks to guarantee mutual exclusion across multiple accesses to the shared data. However, as the ability to execute multiple software threads keeps growing, locks can cause false contention and serialize execution.

For example, consider a hash table holding shared data. With a lock system, a programmer may lock the entire hash table, allowing one thread to access the whole table. However, the throughput and performance of other threads suffer, because they cannot access any entry in the hash table until the lock is released. Alternatively, each entry in the hash table may be locked individually. Either way, extrapolating this simple example to a large, scalable program makes it apparent that the complexity of lock contention, serialization, fine-grained synchronization, and deadlock avoidance becomes an extremely cumbersome burden for programmers.
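The two locking strategies in the hash table example can be sketched as follows. This is a minimal illustration rather than anything taken from the patent: the class names `CoarseLockedTable` and `BucketLockedTable`, the bucket count, and the `swap` helper are all assumptions introduced here to show where contention and deadlock-avoidance effort arise.

```python
import threading


class CoarseLockedTable:
    """Hash table guarded by a single lock: every access serializes all threads."""

    def __init__(self, n_buckets=8):
        self._lock = threading.Lock()
        self._buckets = [dict() for _ in range(n_buckets)]

    def put(self, key, value):
        with self._lock:  # blocks every other thread, even for unrelated keys
            self._buckets[hash(key) % len(self._buckets)][key] = value

    def get(self, key):
        with self._lock:
            return self._buckets[hash(key) % len(self._buckets)].get(key)


class BucketLockedTable:
    """Hash table with one lock per bucket: only same-bucket accesses contend."""

    def __init__(self, n_buckets=8):
        self._locks = [threading.Lock() for _ in range(n_buckets)]
        self._buckets = [dict() for _ in range(n_buckets)]

    def _index(self, key):
        return hash(key) % len(self._buckets)

    def put(self, key, value):
        i = self._index(key)
        with self._locks[i]:
            self._buckets[i][key] = value

    def get(self, key):
        i = self._index(key)
        with self._locks[i]:
            return self._buckets[i].get(key)

    def swap(self, key_a, key_b):
        """Multi-entry update: locks are taken in a fixed order to avoid deadlock."""
        ia, ib = self._index(key_a), self._index(key_b)
        order = sorted({ia, ib})  # deduplicated when both keys share a bucket
        for i in order:
            self._locks[i].acquire()
        try:
            a = self._buckets[ia].get(key_a)
            b = self._buckets[ib].get(key_b)
            self._buckets[ia][key_a] = b
            self._buckets[ib][key_b] = a
        finally:
            for i in reversed(order):
                self._locks[i].release()
```

The per-bucket variant admits more concurrency, but any operation that touches more than one entry (such as the hypothetical `swap`) has to follow a lock-ordering discipline, which is the kind of burden the passage describes.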

Another data synchronization technique in recent use is transactional memory (TM). Transactional execution often involves executing a grouping of multiple micro-operations, operations, or instructions. In the example above, both threads execute within the hash table, and their memory accesses are monitored/tracked. If both threads access/alter the same entry, conflict resolution may be performed to ensure data validity. One type of transactional execution is software transactional memory (STM), in which tracking of memory accesses, conflict resolution, abort tasks, and other transactional tasks are performed in software, often without hardware support.

Another type of transactional execution is a hardware transactional memory (HTM) system, in which hardware is included to support access tracking, conflict resolution, and other transactional tasks. Previously, the actual memory data arrays were extended with additional bits to hold information, such as hardware attributes, for tracking reads, writes, and buffering. As a result, this information travels with the data from the processor to memory. It is often referred to as persistent information: because it moves with the data throughout the memory hierarchy, it is not lost when a cache eviction occurs. This persistence, however, imposes additional overhead throughout the memory hierarchy.

Conventional hardware transactional memory (HTM) systems also suffer from a number of inefficiencies. As a first example, HTMs currently provide no efficient way to transition data from an unbuffered state, or a buffered but unmonitored state, to a buffered and monitored state in order to ensure consistency before a transaction commits. As another example, the interface between an HTM and software has many inefficiencies. Specifically, hardware does not provide mechanisms to properly accelerate software memory access barriers that take into account the various forms of strong and weak atomicity between transactional and non-transactional operations. In addition, when a transaction attempts to commit, hardware provides no mechanism for determining when the transaction is to abort or commit based on the loss of monitoring information, buffering information, and/or other attribute information. Similarly, the instruction sets of such conventional HTMs offer no commit instruction that defines which information is to be held and which is to be cleared when a transaction commits. As further examples, HTMs provide no instruction to efficiently vector or jump execution when a conflict or loss of information is detected, and current HTMs cannot handle ring-level priority transitions during the execution of a transaction.

The accompanying drawings illustrate the present invention by way of example and not by way of limitation.
FIG. 1 illustrates an embodiment of a processor including multiple processing elements capable of executing multiple software threads concurrently.
FIG. 2 illustrates an embodiment of associating metadata with a data item.
FIG. 3 illustrates an embodiment of multiple orthogonal metaphysical address spaces for separate software subsystems on multiple processing elements.
FIG. 4 illustrates an embodiment of compression of metadata to data.
FIG. 5 is a flow diagram of an embodiment of a method of accessing metadata.
FIG. 6 illustrates an embodiment of a metadata storage element that supports acceleration of transactions in strong and weak atomicity environments.
FIG. 7 is a flow diagram of an embodiment of accelerating non-transactional operations while maintaining atomicity in a transactional environment.
FIG. 8 is a flow diagram of an embodiment of a method of efficiently transitioning a block of data to a buffered and monitored state before commit of a transaction.
FIG. 9 illustrates an embodiment of hardware that supports a loss instruction that jumps to a destination label based on a status value in a transaction status register.
FIG. 10 is a flow diagram of an embodiment of a method of executing a loss instruction that jumps to a destination label based on a conflict or loss of specified information.
FIG. 11 illustrates an embodiment of hardware that supports definition of commit conditions and clearing controls in a commit instruction.
FIG. 12 is a flow diagram of an embodiment of a method of executing a commit instruction that defines commit conditions and clearing controls.
FIG. 13 illustrates an embodiment of hardware that supports handling of privilege-level transitions during execution of a transaction.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, such as examples of specific hardware structures for transactional execution, specific types and implementations of access monitors, specific cache coherency models for detecting access conflicts, specific data granularities, and specific types of memory accesses and memory locations. It will be apparent to one skilled in the art, however, that these specific details need not be employed to practice the present invention. In other instances, well-known components or methods have not been described in detail in order to avoid unnecessarily obscuring the present invention; these include coding of transactions in software, insertion by a compiler of operations to perform the enumerated functions, demarcation of transactions, specific and alternative multi-core and multi-threaded processor architectures, specific compiler methods/implementations, and specific operational details of microprocessors.

The method and apparatus described herein are for optimizing hardware and software that implement an unbounded transactional memory (UTM) system. Specifically, the optimizations are discussed primarily in reference to supporting a UTM system. However, the methods and apparatus described herein may be utilized in any form of transactional memory system, such as hardware that supports or accelerates a software transactional memory (STM) system, a pure hardware transactional memory (HTM) system, or a hybrid of the two that is implemented differently from a UTM system.

FIG. 1 illustrates an embodiment of a processor capable of executing multiple threads concurrently. Processor 100 may include hardware support for hardware transactional execution. Either in conjunction with hardware transactional execution or separately, processor 100 may also provide hardware support for hardware acceleration of a software transactional memory (STM), separate execution of an STM, or a combination thereof, such as a hybrid transactional memory (TM) system. Processor 100 includes any processor, such as a microprocessor, an embedded processor, a digital signal processor (DSP), a network processor, or other device that executes code. As illustrated, processor 100 includes a plurality of processing elements.

In one embodiment, a processing element refers to a thread unit, a process unit, a context, a logical processor, a hardware thread, a core, and/or any other element capable of holding a state for a processor, such as an execution state or architectural state. In other words, a processing element, in one embodiment, refers to any hardware capable of being independently associated with code, such as a software thread, operating system, application, or other code. A physical processor typically refers to an integrated circuit that includes any number of other processing elements, such as cores or hardware threads.

A core often refers to logic located on an integrated circuit that is capable of maintaining an independent architectural state, where each independently maintained architectural state is associated with at least some dedicated execution resources. In contrast to cores, a hardware thread typically refers to any logic located on an integrated circuit that is capable of maintaining an independent architectural state, where the independently maintained architectural states share access to execution resources. As can be seen, when certain resources are shared and others are dedicated to an architectural state, the line between the nomenclature of a hardware thread and a core overlaps. Yet often, a core and a hardware thread are viewed by an operating system as individual logical processors, and the operating system is able to schedule operations on each logical processor individually.

As illustrated in FIG. 1, physical processor 100 includes two cores, core 101 and core 102, which share access to higher-level cache 110. Although processor 100 may include asymmetric cores, that is, cores with different configurations, functional units, and/or logic, symmetric cores are illustrated. Core 102 is illustrated as identical to core 101 and is therefore not discussed in detail, to avoid repetition. In addition, core 101 includes two hardware threads, 101a and 101b, and core 102 includes two hardware threads, 102a and 102b. Software entities such as an operating system therefore view processor 100 as four separate processors, that is, four logical processors or processing elements capable of executing four software threads concurrently.

Here, a first thread is associated with architecture state registers 101a, a second thread with architecture state registers 101b, a third thread with architecture state registers 102a, and a fourth thread with architecture state registers 102b. As illustrated, architecture state registers 101a are replicated in architecture state registers 101b, so individual architecture states/contexts can be stored for logical processor 101a and logical processor 101b. Other smaller resources, such as instruction pointers and the renaming logic in rename/allocator logic 130, may also be replicated for threads 101a and 101b. Some resources, such as the reorder buffers in reorder/retirement unit 135, I-TLB 120, load/store buffers, and queues, may be shared through partitioning. Other resources, such as general-purpose internal registers, the page-table base register, the low-level data cache and data TLB 115, execution unit(s) 140, and portions of out-of-order unit 135, may be fully shared.

Processor 100 often includes other resources, which may be fully shared, shared through partitioning, or dedicated to particular processing elements. The processor embodiment of FIG. 1, with its illustrated functional units/resources, is purely exemplary. Note that a processor may include or omit any of these functional units, and may include any other known functional units, logic, or firmware not depicted.

As illustrated, processor 100 includes bus interface module 105 to communicate with devices external to processor 100, such as system memory 175, a chipset, a northbridge, or other integrated circuits. Memory 175 may be dedicated to processor 100 or shared with other devices in the system. Higher-level or further-out cache 110 caches recently fetched elements. Note that "higher-level" or "further-out" refers to cache levels that increase with distance from the execution unit(s). In one embodiment, higher-level cache 110 is a second-level data cache. However, higher-level cache 110 is not so limited, as it may be associated with or include an instruction cache. A trace cache, that is, a type of instruction cache, may instead be coupled after decoder 125 to store recently decoded traces. Module 120 further includes a branch target buffer to predict branches to be executed/taken and an instruction translation buffer (I-TLB) to store address translation entries for instructions.

Decode module 125 is coupled to fetch unit 120 to decode fetched elements. In one embodiment, processor 100 is associated with an instruction set architecture (ISA), which defines/specifies the instructions executable on processor 100. Here, machine code instructions recognized by the ISA often include a portion of the instruction, referred to as an opcode, that references/specifies the instruction or operation to be performed.

In one example, allocator and renamer block 130 includes an allocator to reserve resources, such as register files, to store instruction processing results. However, threads 101a and 101b are potentially capable of out-of-order execution, in which case allocator and renamer block 130 also reserves other resources, such as reorder buffers to track instruction results. Block 130 may also include a register renamer to rename program/instruction reference registers to other registers internal to processor 100. Reorder/retirement unit 135 includes components, such as the reorder buffers mentioned above, load buffers, and store buffers, to support out-of-order execution and the later in-order retirement of instructions executed out of order.

Scheduler and execution unit(s) block 140, in one embodiment, includes a scheduler unit to schedule instructions/operations on the execution units. For example, a floating-point instruction is scheduled on a port of an execution unit that has an available floating-point execution unit. Register files associated with the execution units are also included to store information about instruction processing results. Exemplary execution units include a floating-point execution unit, an integer execution unit, a jump execution unit, a load execution unit, a store execution unit, and other known execution units.

Lower-level data cache and data translation buffer (D-TLB) 150 are coupled to execution unit(s) 140. The data cache stores recently used/operated-on elements, such as data operands, which are potentially held in memory coherency states. The D-TLB stores recent virtual/linear-to-physical address translations. As a specific example, a processor may include a page table structure to break physical memory into a plurality of virtual pages.
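The role of a D-TLB as a cache of recent virtual-to-physical translations can be illustrated with a small software model. This is only a sketch under assumptions made here: the 4 KiB page size, the dictionary-based page table, the LRU replacement policy, and the `TinyTLB` name are not taken from the patent.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size for this sketch


class TinyTLB:
    """Caches recent virtual-page to physical-frame translations (LRU eviction)."""

    def __init__(self, page_table, capacity=4):
        self.page_table = page_table  # maps virtual page number -> physical frame number
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)  # mark as most recently used
        else:
            self.misses += 1
            # Miss: consult the page table (a KeyError here models a page fault).
            self.entries[vpn] = self.page_table[vpn]
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict the least recently used entry
        return self.entries[vpn] * PAGE_SIZE + offset
```

A second access to the same page is served from the cached entry without consulting the page table again, which is the benefit the D-TLB provides to the execution units.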

In one embodiment, processor 100 is capable of hardware transactional execution, software transactional execution, or a combination or hybrid thereof. A transaction, which may also be referred to as a critical or atomic section of code, includes a grouping of instructions, operations, or micro-operations to be executed as an atomic group. For example, instructions or operations may be used to demarcate a transaction or a critical section. In one embodiment, described in more detail below, these instructions are part of a set of instructions, such as an instruction set architecture (ISA), that is recognizable by hardware of processor 100, such as the decoders described above. Often, once compiled from a high-level language into hardware-recognizable assembly language, these instructions include operation codes (opcodes) or other portions of the instructions that decoders recognize during a decode stage.

Typically, during execution of a transaction, updates to memory are not made globally visible until the transaction is committed. As an example, a transactional write to a location is potentially visible to the local thread, yet, in response to a read from another thread, the write data is not forwarded until the transaction that includes the transactional write is committed. While the transaction is still pending, data items/elements loaded from and written to memory are tracked, as discussed in more detail below. Once the transaction reaches a commit point, if no conflicts have been detected for the transaction, the transaction is committed and the updates made during the transaction are made globally visible.

However, if the transaction is invalidated during its pendency, the transaction is aborted and restarted without making the updates globally visible. As a result, pendency of a transaction, as used herein, refers to a transaction that has begun execution and has not been committed or aborted, that is, a transaction that is pending.
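The commit and abort behavior described above (writes buffered until commit, accesses tracked while the transaction is pending, abort and restart on conflict) can be sketched as a toy software transactional memory. This is an illustrative model only, not the mechanism of the patent: the names `TVar`, `Transaction`, `Conflict`, and `atomically`, the per-location version counter, and the single global commit lock are all assumptions chosen for brevity.

```python
import threading


class TVar:
    """A shared location with a version number bumped on every committed write."""

    def __init__(self, value):
        self.value = value
        self.version = 0


class Conflict(Exception):
    """Raised at commit when a location read by the transaction has changed."""


_commit_lock = threading.Lock()  # one global lock keeps the sketch simple


class Transaction:
    def __init__(self):
        self.read_set = {}   # TVar -> version observed at first read
        self.write_set = {}  # TVar -> tentative value, not yet globally visible

    def read(self, tvar):
        if tvar in self.write_set:  # a transaction sees its own buffered writes
            return self.write_set[tvar]
        with _commit_lock:          # take a consistent (version, value) snapshot
            version, value = tvar.version, tvar.value
        self.read_set.setdefault(tvar, version)
        return value

    def write(self, tvar, value):
        self.write_set[tvar] = value  # buffered until commit

    def commit(self):
        with _commit_lock:
            for tvar, seen in self.read_set.items():
                if tvar.version != seen:
                    raise Conflict()  # abort: nothing has been published
            for tvar, value in self.write_set.items():
                tvar.value = value    # updates become globally visible here
                tvar.version += 1


def atomically(body, max_retries=1000):
    """Run body(tx) as a transaction, restarting it whenever commit detects a conflict."""
    for _ in range(max_retries):
        tx = Transaction()
        try:
            result = body(tx)
            tx.commit()
            return result
        except Conflict:
            continue  # buffered writes are simply discarded
    raise RuntimeError("transaction did not commit")
```

In this model a transaction is pending from the creation of the `Transaction` object until `commit` either publishes its write set or raises `Conflict`. Real systems differ in how they detect conflicts and when they validate reads.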

A software transactional memory (STM) system often refers to performing access tracking, conflict resolution, or other transactional memory tasks within software, or at least partially within software. In one embodiment, processor 100 is capable of executing a compiler that compiles program code to support transactional execution. Here, the compiler may insert operations, calls, functions, and other code to enable execution of transactions.

A compiler often includes a program or set of programs to translate source text/code into target text/code. Usually, compilation of program/application code with a compiler is done in multiple phases and passes to transform high-level programming language code into low-level machine or assembly language code. Yet single-pass compilers may still be utilized for simple compilation. A compiler may utilize any known compilation techniques and perform any known compiler operations, such as lexical analysis, preprocessing, parsing, semantic analysis, code generation, code transformation, and code optimization.

Larger compilers often include multiple phases, but most often these phases fall within two general phases: (1) a front end, where syntactic processing, semantic processing, and some transformation/optimization take place, and (2) a back end, where analysis, transformations, optimizations, and code generation take place. Some compilers refer to a middle end, which illustrates the blurring of the delineation between a compiler's front end and back end. As a result, references to insertion, association, generation, or other compiler operations may take place in any of the aforementioned phases or passes, as well as in any other known phases or passes of a compiler. As an illustrative example, a compiler may insert transactional operations, calls, functions, and the like in one or more phases of compilation, such as inserting calls/operations in a front-end phase and then transforming the calls/operations into lower-level code during a transactional memory transformation phase.

Nevertheless, regardless of the execution environment and the dynamic or static nature of a compiler, the compiler, in one embodiment, compiles program code to enable transactional execution. Therefore, a reference to execution of program code, in one embodiment, refers to (1) execution of a compiler program, either dynamically or statically, to compile main program code, to maintain transactional structures, or to perform other transaction-related operations; (2) execution of main program code that includes transactional operations/calls; (3) execution of other program code, such as libraries, associated with the main program code; or (4) a combination thereof.

In software transactional memory (STM) systems, a compiler is often used to insert some operations, calls, and other code inline with the application code to be compiled, while other operations, calls, functions, and code are provided separately within libraries. This arrangement allows library distributors to optimize and update the libraries without recompiling the application code. As a specific example, a call to a commit function may be inserted inline within application code at a commit point of a transaction, while the commit function itself is provided separately in an updatable library. Additionally, the choice of where to place specific operations and calls affects the efficiency of the application code. For example, if a filter operation, discussed in more detail below in connection with access barriers and FIG. 6, is inserted inline with the code, the filter operation may be performed before execution is vectored to a barrier, rather than inefficiently vectoring to the barrier and then performing the filter operation.
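The split between inline code and library code can be pictured with a hand-expanded example of what a compiler might emit for a small atomic block. Everything named here is hypothetical: `TxRuntime` stands in for a separately shipped library, and a plain Python set stands in for whatever filter state a real system would consult (the patent discusses metadata-based filtering later, in connection with FIG. 6).

```python
class TxRuntime:
    """Stand-in for a separately shipped, updatable STM library (hypothetical API)."""

    def __init__(self):
        self.barrier_calls = 0
        self.committed = 0

    def begin(self):
        # Per-transaction filter state: ids of objects whose barrier already ran.
        return set()

    def read_barrier(self, obj):
        # A real library would log or validate the access; here we only count calls.
        self.barrier_calls += 1

    def commit(self):
        self.committed += 1


def sum_balance_twice(rt, account):
    """Hand-expanded form of: atomic { return account.balance + account.balance }"""
    seen = rt.begin()
    total = 0
    for _ in range(2):
        # Inline filter check: decide locally whether the barrier is needed,
        # instead of calling the barrier and filtering inside it.
        if id(account) not in seen:
            rt.read_barrier(account)
            seen.add(id(account))
        total += account["balance"]
    rt.commit()  # call inserted inline at the commit point; its body lives in the library
    return total
```

Because the filter test sits inline, the second access to `account` never leaves the application code, while `read_barrier` and `commit` can be replaced by an updated library without recompiling `sum_balance_twice`.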

In one embodiment, processor 100 is capable of executing transactions using hardware/logic, that is, within a hardware transactional memory (HTM) system. Numerous specific implementation details exist, from both an architectural and a microarchitectural perspective, when implementing an HTM; most of them are not discussed herein to avoid unnecessarily obscuring the invention. Some structures and implementations are nevertheless disclosed for illustrative purposes. Note that these structures and implementations are not required and may be augmented and/or replaced with other structures having different implementation details.

As a combination, processor 100 may be capable of executing transactions within an unlimited transactional memory (UTM) system, which attempts to take advantage of the benefits of both STM and HTM systems. For example, an HTM is often fast and efficient for executing small transactions, because it does not rely on software to perform all of the access tracking, conflict detection, validation, and commit for transactions. However, HTMs are usually only able to handle relatively small transactions, while STMs are able to handle transactions of unlimited size. Therefore, in one embodiment, a UTM system uses hardware to execute relatively small transactions and software to execute transactions that are too large for the hardware. As can be seen from the discussion below, even when software is handling transactions, hardware may be used to assist and accelerate the software. Furthermore, it is important to note that the same hardware may also be used to support and accelerate a pure STM system.
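The division of labor described above can be sketched in a few lines of Python. This is a minimal illustration only, not the mechanism of the specification: the capacity limit `HW_CAPACITY`, the exception `CapacityExceeded`, and the functions `run_hardware`, `run_software`, and `run_transaction` are hypothetical names, and a dictionary stands in for memory.

```python
class CapacityExceeded(Exception):
    """Raised by the toy hardware path when a transaction is too large."""


HW_CAPACITY = 4  # hypothetical limit on locations the hardware path can track


def run_hardware(memory, writes):
    # Toy HTM path: fast, but bounded by what the hardware can buffer.
    if len(writes) > HW_CAPACITY:
        raise CapacityExceeded(len(writes))
    memory.update(writes)  # all buffered writes become visible at commit
    return "hardware"


def run_software(memory, writes):
    # Toy STM path: a software write log of unbounded size, applied at commit.
    log = {}
    for addr, value in writes.items():
        log[addr] = value
    memory.update(log)
    return "software"


def run_transaction(memory, writes):
    # UTM policy from the text: try hardware first, fall back to software
    # when the transaction is too large for the hardware.
    try:
        return run_hardware(memory, writes)
    except CapacityExceeded:
        return run_software(memory, writes)
```

A small write set takes the hardware path, while a write set larger than `HW_CAPACITY` falls back to the software path.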

As stated above, transactions include transactional memory accesses to data items both by local processing elements within processor 100 and by other processing elements. Without safety mechanisms in a transactional memory system, some of these accesses would result in invalid data and execution, for example a write to data that invalidates a read, or a read of invalid data. Processor 100 therefore includes logic to track or monitor memory accesses to data items in order to identify potential conflicts, such as the read monitors and write monitors discussed below.

A data item or data element may include data at any granularity level, as defined by hardware, software, or a combination thereof. A non-exhaustive list of examples of data, data elements, data items, or references thereto includes a memory address, a data object, a class, a field of a type of dynamic language code, a type of dynamic language code, a variable, an operand, a data structure, and an indirect reference to a memory address. However, any known grouping of data may be referred to as a data element or data item. A few of these examples, such as a field of a type of dynamic language code and a type of dynamic language code, refer to data structures of dynamic language code. To illustrate, dynamic language code such as Java (registered trademark) from Sun Microsystems, Inc. is a strongly typed language. Each variable has a type that is known at compile time. The types are divided into two categories: primitive types (boolean and numeric, e.g., integer, floating point) and reference types (classes, interfaces, and arrays). The values of reference types are references to objects. In Java (registered trademark), an object, which consists of fields, may be a class instance or an array. Given an object a of class A, it is customary to use the notation A::x to refer to field x of type A and a.x to refer to field x of object a of class A. For example, an expression may be written as a.x = a.y + a.z. Here, field y and field z are loaded and added, and the result is written to field x.

Monitoring/buffering of memory accesses to data items may therefore be performed at any data-level granularity. For example, in one embodiment, memory accesses to data are monitored at the type level. Here, a transactional write to field A::x and a non-transactional load of field A::y may be monitored as accesses to the same data item, namely type A. In another embodiment, memory access monitoring/buffering is performed at field-level granularity. In that case, a transactional write to A::x and a non-transactional load of A::y are not monitored as accesses to the same data item, because they reference separate fields. Other data structures or programming techniques may also be taken into account when tracking memory accesses to data items. As an example, assume that fields x and y of an object of class A, i.e., A::x and A::y, point to objects of class B, are initialized to newly allocated objects, and are never written to after initialization. In one embodiment, a transactional write to field B::z of the object pointed to by A::x is not monitored as a memory access to the same data item as a non-transactional load of field B::z of the object pointed to by A::y. Extrapolating from these examples, monitors may perform monitoring/buffering at any data granularity level.
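The difference between type-level and field-level monitoring can be made concrete with a short Python sketch. The functions `data_item` and `same_item` and the string labels `"type"` and `"field"` are illustrative assumptions, not names from the specification; an access is modeled simply as a (type, field) pair.

```python
def data_item(type_name, field_name, granularity):
    """Return the key under which an access would be monitored."""
    if granularity == "type":
        # Every field of the type maps to a single monitored item.
        return (type_name,)
    if granularity == "field":
        # Each field of the type is its own monitored item.
        return (type_name, field_name)
    raise ValueError(f"unknown granularity: {granularity}")


def same_item(access_a, access_b, granularity):
    """True if two (type, field) accesses are monitored as the same data item."""
    return data_item(*access_a, granularity) == data_item(*access_b, granularity)
```

With these definitions, a write to A::x and a load of A::y fall on the same data item under `"type"` granularity and on different data items under `"field"` granularity, which mirrors the two embodiments described above.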

In one embodiment, processor 100 includes monitors to detect or track accesses, and potential subsequent conflicts, associated with data items. As one example, the hardware of processor 100 includes read monitors and write monitors to track loads and stores that are determined to be monitored. For example, the hardware read and write monitors monitor data items at the granularity of the data items, regardless of the granularity of the underlying storage structures. In one embodiment, a data item is bounded by tracking mechanisms associated at the granularity of the storage structures, to ensure that at least the entire data item is monitored appropriately.

As a specific illustrative example, the read and write monitors include attributes associated with cache locations, such as locations within lower-level data cache 150, to monitor loads from and stores to addresses associated with those locations. Here, a read attribute for a cache location of data cache 150 is set upon a read event to an address associated with that cache location, in order to monitor for potentially conflicting writes to the same address. Write attributes operate in a similar manner for write events, to monitor for potentially conflicting reads of and writes to the same address. To further this example, hardware can detect conflicts by snooping for reads and writes to cache locations whose read and/or write attributes are set to indicate that the locations are monitored. Conversely, in one embodiment, setting read and write monitors, or updating a cache location to a buffered state, results in snoops such as read requests or read-for-ownership requests, which allow conflicts with addresses monitored in other caches to be detected.
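The monitor attributes can be modeled as two flags per cache location, as in the following Python sketch. It is a simplified illustration under stated assumptions: the class names `CacheLine` and `MonitoredCache` and the method names are hypothetical, a dictionary keyed by address stands in for the cache, and coherency states are omitted.

```python
from dataclasses import dataclass


@dataclass
class CacheLine:
    read_monitor: bool = False   # set by a monitored local load
    write_monitor: bool = False  # set by a monitored local store


class MonitoredCache:
    def __init__(self):
        self.lines = {}  # address -> CacheLine

    def _line(self, addr):
        return self.lines.setdefault(addr, CacheLine())

    def local_load(self, addr):
        # A read event sets the read attribute for the location.
        self._line(addr).read_monitor = True

    def local_store(self, addr):
        # A write event sets the write attribute for the location.
        self._line(addr).write_monitor = True

    def snoop(self, addr, is_write):
        """Return True if an external request conflicts with a monitor."""
        line = self.lines.get(addr)
        if line is None:
            return False
        if is_write:
            # An external write conflicts with a monitored read or write.
            return line.read_monitor or line.write_monitor
        # An external read conflicts only with a monitored write.
        return line.write_monitor
```

In this model an external read of a location that was only read locally is not a conflict, whereas an external write to it is.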

Therefore, depending on the design, different combinations of cache coherency requests and monitored coherency states of cache lines result in potential conflicts, for example a cache line holding a data item in a shared, read-monitored state and a snoop indicating a write request to that data item. Inversely, a cache line holding a data item in a buffered write state and an external snoop indicating a read request to that data item may be considered potentially conflicting. In one embodiment, to detect such combinations of access requests and attribute states, snoop logic is coupled to conflict detection/reporting logic, such as monitors and/or logic for conflict detection/reporting, as well as to status registers for reporting conflicts.

However, any combination of conditions and scenarios may be considered invalidating for a transaction, which may be defined by an instruction such as a commit instruction, as discussed in more detail below with reference to FIGS. 11 and 12. Examples of factors that may be considered for non-commit of a transaction include detecting a conflict to a transactionally accessed memory location, losing monitor information, losing buffered data, losing metadata associated with a transactionally accessed data item, and detecting another invalidating event, such as an interrupt, a ring transition, or an explicit user instruction.

In one embodiment, the hardware of processor 100 holds transactional updates in a buffered manner. As stated above, transactional writes are not made globally visible until the transaction commits. However, the local software thread associated with the transactional writes is able to access the transactional updates for subsequent transactional accesses. As a first example, a separate buffer structure is provided in processor 100 to hold the buffered updates; it can supply the updates to the local thread but not to other, external threads. However, a separate buffer structure is potentially expensive and complex.

In contrast, as another example, a cache memory such as data cache 150 is used to buffer the updates while providing the same transactional functionality. Here, cache 150 is capable of holding data items in a buffered coherency state. In one case, a new buffered coherency state is added to a cache coherency protocol such as the Modified, Exclusive, Shared, Invalid (MESI) protocol to form a MESIB protocol. In response to local requests for a buffered data item, i.e., a data item held in the buffered coherency state, cache 150 provides the data item to the local processing element to ensure internal transactional sequential ordering. In response to external access requests, however, it provides a miss response, so that the transactionally updated data item is not made globally visible until commit. Furthermore, when a line of cache 150 is held in the buffered coherency state and is selected for eviction, the buffered update is not written back to higher-level cache memories; it is not proliferated through the memory system, i.e., not made globally visible, until after commit. Upon commit, the buffered lines transition to the modified state, which makes the data item globally visible.
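The behavior of a buffered coherency state can be summarized with a toy Python model. It is a sketch only: the names `State` and `BufferingCache` are hypothetical, a dictionary stands in for higher-level memory, and a miss response is modeled as `None` (a real requester would then obtain the unbuffered version from higher-level memory).

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"
    BUFFERED = "B"  # the state added to MESI to form MESIB


class BufferingCache:
    def __init__(self, backing):
        self.backing = backing  # stands in for higher-level memory
        self.lines = {}         # address -> [State, value]

    def transactional_store(self, addr, value):
        # Transactional updates are held in the buffered state.
        self.lines[addr] = [State.BUFFERED, value]

    def local_load(self, addr):
        # Local requests are served from the buffered line.
        if addr in self.lines:
            return self.lines[addr][1]
        return self.backing.get(addr)

    def external_load(self, addr):
        # External requests get a miss response (None) for buffered lines.
        line = self.lines.get(addr)
        if line is None or line[0] is State.BUFFERED:
            return None
        return line[1]

    def evict(self, addr):
        state, value = self.lines.pop(addr)
        if state is State.MODIFIED:
            self.backing[addr] = value  # normal write-back
        # A buffered line is dropped without write-back.

    def commit(self):
        # On commit, buffered lines transition to modified.
        for line in self.lines.values():
            if line[0] is State.BUFFERED:
                line[0] = State.MODIFIED
```

The model deliberately ignores what happens to the transaction when a buffered line is evicted; as the later discussion of non-commit factors notes, loss of buffered data is one reason a transaction may fail to commit.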

Note that the terms internal and external are often relative to the perspective of a thread associated with execution of a transaction, or of processing elements that share a cache. For example, a first processing element executing a software thread associated with execution of a transaction is referred to as a local thread. Thus, in the discussion above, if a store to or load from an address previously written by the first thread is received, and the corresponding cache line is held in the buffered coherency state, the buffered version of the cache line is provided to the first thread because it is the local thread. In contrast, a second thread executing on another processing element of the same processor, but not associated with execution of the transaction responsible for the cache line being held in the buffered state, is an external thread. A load or store from the second thread to the address therefore misses the buffered version of the cache line, and normal cache replacement is used to retrieve the unbuffered version of the cache line from higher-level memory.

Here, the internal/local and external/remote threads are executing on the same processor and, in some embodiments, may be executing on separate processing elements within the same core of the processor that share access to the cache. However, the use of these terms is not so limited. As stated above, local may refer to multiple threads sharing access to a cache, rather than being specific to a single thread associated with execution of the transaction, while external or remote may refer to threads not sharing access to the cache.

As stated in the initial reference to FIG. 1, the architecture of processor 100 is purely illustrative. Similarly, the specific examples of translating data addresses to reference metadata are also exemplary; any method of associating data with metadata in separate entries of the same memory may be used.

<Metaphysical address space for metadata>
<Metadata>
Referring to FIG. 2, an embodiment of holding metadata for a data item in a processor is illustrated. As depicted, metadata 217 for data item 216 is held locally in memory 215. Metadata includes any property or attribute associated with data item 216, such as transactional information relating to data item 216. Some illustrative examples of metadata are given below; the disclosed examples are purely illustrative and not exhaustive. In addition, metadata location 217 may hold any combination of the examples discussed below and other attributes of data item 216 that are not specifically discussed.

As a first example, metadata 217 includes a reference to a backup or buffer location for transactionally written data item 216, if data item 216 has previously been accessed, buffered, and/or backed up within a transaction. Here, in some implementations, a backup copy of a previous version of data item 216 is held in a different location, and metadata 217 therefore includes an address of, or other reference to, the backup location. Alternatively, metadata 217 itself may act as the backup or buffer location for data item 216.

As another example, metadata 217 includes a filter value to accelerate repeated transactional accesses to data item 216. Often, during execution of a transaction utilizing software, access barriers are performed at transactional memory accesses to ensure consistency and data validity. For example, before a transactional load operation, a read barrier is executed to perform read barrier operations, such as testing whether data item 216 is unlocked, determining whether the current read set of the transaction is still valid, updating a filter value, and logging version values in the read set of the transaction to enable later validation. However, if a read of that location has already been performed during execution of the transaction, the same read barrier operations are potentially unnecessary.

As a result, one solution uses a read filter that holds a first default value to indicate that data item 216, or the address thereof, has not been read during execution of the transaction, and a second accessed value to indicate that data item 216, or the address thereof, has already been accessed during execution of the transaction. Essentially, the second accessed value indicates whether the read barrier should be accelerated. In this example, if a transactional load operation is received and the read filter value in metadata location 217 indicates that data item 216 has already been read, then, in one embodiment, the read barrier is elided (not executed), which accelerates transactional execution by not performing unnecessary, redundant read barrier operations. A write filter value may operate in the same manner with regard to write operations. The individual filter values are purely illustrative, however; in one embodiment a single filter value is used to indicate whether an address has already been accessed, whether written or read. Here, metadata access operations that check metadata 217 for data item 216 for both loads and stores use the single filter value, in contrast to the example above in which metadata 217 includes a separate read filter value and write filter value. As a specific illustrative embodiment, four bits of metadata 217 are allocated to a read filter indicating whether a read barrier is to be accelerated for the associated data item, a write filter indicating whether a write barrier is to be accelerated for the associated data item, an undo filter indicating whether undo operations are to be accelerated, and a miscellaneous filter that software may use in any manner as a filter value.
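The read-filter idea can be sketched as follows in Python. This is a minimal illustration under stated assumptions: the bit positions of the four filters, the class `Transaction`, and its methods are hypothetical, the read barrier is reduced to logging a version value and setting the filter bit, and dictionaries stand in for memory, version storage, and metadata.

```python
# Hypothetical bit assignment for the four filter bits held in metadata.
READ_FILTER = 0b0001
WRITE_FILTER = 0b0010
UNDO_FILTER = 0b0100   # the undo (cancel) filter
MISC_FILTER = 0b1000   # free for software to use in any manner


class Transaction:
    def __init__(self):
        self.metadata = {}     # address -> 4-bit filter value, default 0
        self.read_set = {}     # address -> version logged for later validation
        self.barriers_run = 0  # counts full read barriers actually executed

    def read_barrier(self, addr, versions):
        # Stand-in for the full barrier: log the version, update the filter.
        self.barriers_run += 1
        self.read_set[addr] = versions.get(addr, 0)
        self.metadata[addr] = self.metadata.get(addr, 0) | READ_FILTER

    def load(self, addr, memory, versions):
        # Test the filter first; run the barrier only on the first read.
        if not (self.metadata.get(addr, 0) & READ_FILTER):
            self.read_barrier(addr, versions)
        return memory[addr]
```

Loading the same address twice within one transaction executes the barrier once; the second load sees the filter bit set and skips it.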

A few other examples of metadata include an indication of, representation of, or reference to an address for a handler, either generic or specific to a transaction associated with data item 216; an irrevocable/obstinate nature of a transaction associated with data item 216; a loss of data item 216; a loss of monitoring information for data item 216; a conflict being detected for data item 216; an address of a read set or of a read entry within a read set associated with data item 216; a previously logged version of data item 216; a current version of data item 216; a lock for allowing access to data item 216; a version value for data item 216; a transaction descriptor for the transaction associated with data item 216; and other known transaction-related descriptive information. Furthermore, as described above, use of metadata is not limited to transactional information. As a corollary, metadata 217 may also include information, properties, attributes, or states associated with data item 216 that do not involve a transaction.

Continuing the discussion of metadata, the hardware monitors and buffered coherency states described above are also considered metadata in some embodiments. The monitors indicate whether a location is to be monitored for external read requests or external read-for-ownership requests, while the buffered coherency state indicates whether the associated data cache line holding a data item is buffered. In the examples above, however, the monitors are maintained as attribute bits appended to, or otherwise directly associated with, cache lines, and the buffered coherency state is added to the cache line coherency state bits. In that case, the hardware monitors and buffered coherency states are part of the cache line structure and are not held in a separate metaphysical address space, such as the one illustrated for metadata 217. In other embodiments, however, monitors may be held as metadata 217 in a memory location separate from data item 216, and likewise metadata 217 may include a reference indicating that data item 216 is a buffered data item. Conversely, instead of the update-in-place architecture described above, in which data item 216 is updated and held in a buffered state, metadata 217 may hold the buffered data item while the globally visible version of data item 216 is maintained in its original location. Here, upon commit, the buffered update held in metadata 217 replaces data item 216.

<Lossy metadata>
As discussed above with reference to buffered cache coherency states, metadata 217, in one embodiment, is lossy: it is local information that is not provided in response to external requests from outside the domain of memory 215. Assuming, in one embodiment, that memory 215 is a shared cache memory, a miss in response to a metadata access operation is not serviced outside the domain of cache memory 215. Essentially, because lossy metadata 217 is held only locally within the cache domain and does not exist as persistent data throughout the memory subsystem, there is no reason to forward the miss externally to service the request from a higher-level memory. Misses to lossy metadata can therefore be serviced quickly and efficiently: memory in the processor may be allocated immediately, without waiting for an external request for the metadata to be generated or serviced.
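A lossy metadata store can be sketched as a small local structure whose misses are satisfied by immediate allocation rather than by a request to higher-level memory. The Python model below is illustrative only: the class `LossyMetadataStore`, the default value `0` for freshly allocated metadata, and the oldest-first eviction policy are assumptions rather than details from the specification.

```python
class LossyMetadataStore:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}  # metadata address -> value, in insertion order

    def _allocate(self, meta_addr):
        if len(self.entries) >= self.capacity:
            # Evicted metadata is simply lost; nothing is written back.
            oldest = next(iter(self.entries))
            del self.entries[oldest]
        # A miss is serviced locally with a default value (assumed to be 0).
        self.entries[meta_addr] = 0

    def load(self, meta_addr):
        if meta_addr not in self.entries:
            self._allocate(meta_addr)
        return self.entries[meta_addr]

    def store(self, meta_addr, value):
        if meta_addr not in self.entries:
            self._allocate(meta_addr)
        self.entries[meta_addr] = value
```

Because evicted entries are not preserved anywhere, a later load of an evicted metadata address returns the default value; software that relies on such metadata has to tolerate that loss.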

<Metaphysical address space>
As the illustrated embodiment depicts, metadata 217 is held in a memory location separate from data item 216, i.e., at a different address, which results in a separate metaphysical address space for metadata. The metaphysical address space is orthogonal to the data address space: a metadata access operation to the metaphysical address space does not hit or modify a physical data entry. However, in embodiments where metadata is held within the same memory, such as memory 215, the metaphysical address space may still affect the data address space through competition for allocation in memory 215. As an example, data item 216 is cached in an entry of memory 215, while metadata 217 for data item 216 is held in another entry of the cache. A subsequent metadata operation may then select the memory location of data item 216 for eviction and replace it with metadata for a different data item. As a result, operations associated with the address of metadata 217 do not hit data item 216, yet a metadata address for a metadata element may displace physical data, such as data item 216, within memory 215.
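The two properties described here, orthogonality and competition for allocation, can be illustrated with a toy direct-mapped cache in Python. The class `SharedCache`, the tag layout `(is_meta, addr)`, and the modulo index are assumptions chosen for brevity, not the organization of memory 215.

```python
class SharedCache:
    """Toy direct-mapped cache holding both data and metadata entries."""

    def __init__(self, num_slots):
        # Each slot is None or a tuple (is_meta, addr, value).
        self.slots = [None] * num_slots

    def lookup(self, is_meta, addr):
        slot = self.slots[addr % len(self.slots)]
        # The tag includes the metaphysical flag, so a metadata access
        # never hits a data entry and vice versa.
        if slot is not None and slot[:2] == (is_meta, addr):
            return slot[2]
        return None

    def fill(self, is_meta, addr, value):
        # Allocation is shared: filling may displace whatever entry is there,
        # whether it holds data or metadata.
        self.slots[addr % len(self.slots)] = (is_meta, addr, value)
```

A metadata lookup for an address whose slot holds data misses, which reflects orthogonality, while a metadata fill to the same slot evicts that data, which reflects the competition for space.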

Even though metadata potentially competes with data for space in the cache memory in this example, the ability to hold metadata locally results in efficient support for metadata without the expensive cost of proliferating persistent metadata throughout the memory hierarchy. This example assumes that metadata is held in the same memory, memory 215. In an alternative embodiment, however, metadata 217 associated with data item 216 is held in a separate memory structure. Here, the addresses for the metadata and the data may be the same, while a metaphysical portion of the metadata address indexes into the separate metadata storage structure instead of the data storage structure.

When the ratio of metadata to data is one-to-one, the metaphysical address space shadows the data address space but remains orthogonal to it, as described above. In contrast, as described below, metadata may be compressed relative to the physical data. In that case the metaphysical address space for the metadata no longer shadows the data address space in size, but it is still orthogonal.

<Metaphysical address translation>
Continuing the discussion of the metaphysical address space, any method may be used to translate a data address in the data address space, such as the address of the data item 216, into a metaphysical address in the metaphysical address space, such as a metadata address for the metadata 217. According to one embodiment, metaphysical translation logic 210 translates an address, such as the data address 200, into a metadata address. As illustrated, the address 200 includes an address associated with, or referencing, the data item 216. Normal data translation, such as translation between a virtual address and a linear or physical address, may be used to index to the data item 216 in the memory 215. The association of the metadata 217 with the data item 216 involves a similar translation of the address 200, which references the data item 216, into a different, distinct address that references the metadata 217. Consequently, translating the address 200 into a data address with data translation logic 205 and translating it into a distinct metaphysical address with the metaphysical translation logic 210 produce separate accesses that do not interfere with each other, which is what makes the two address spaces orthogonal. As described in more detail below, in one embodiment either the data translation logic 205 or the metaphysical translation logic 210 is used depending on the type of access operation directed at the address 200: a normal data access operation to the data item 216 uses the data translation logic 205, and a metadata access operation to the metadata 217 uses the metaphysical translation logic 210. The type of access may be identified by a portion of the operation code (opcode) of the instruction or operation.

In another embodiment, an instruction, as identified by its opcode, may access both metadata and data for a given metadata address, which makes it possible to perform complex operations such as a conditional store to data based on metadata. As one example, an instruction is decoded into a metadata test-and-set operation, which tests the metadata and sets it to a value, and an additional operation that sets the data to a value if the metadata test succeeds. As another example, a data item may be moved to a matching metadata address based on the data read from the data memory.

Examples of translating the data address 200 into a metadata address for the metadata 217 follow. As a first example, translating a data address into a metadata address includes using a physical or virtual address, after any processing by the normal data translation logic 205, and adding a metaphysical value in the metaphysical translation logic 210 that separates metadata addresses from data addresses. When the virtual address is used without translation, the metaphysical translation logic 210 includes logic that combines the virtual address with the metaphysical value. When normal virtual-to-physical address translation is used, the normal data translation logic 205 produces a translated address from the address 200, and logic in the metaphysical translation logic 210 combines the translated address with the metaphysical value to generate the metadata address.

As another example, the data address 200 may be translated using separate translation structures, translation tables, and/or translation logic within the metaphysical translation logic 210 to obtain a distinct metadata address. Here, the metaphysical translation logic 210 may mirror the data translation logic 205 or include separate logic, such as logic that combines the address 200 with a metaphysical value, but it holds its own page table information for translating the address 200 into a different, distinct metadata address. Whether the metadata address is formed by adding information to an address, extending it with appended information, replacing information in it, or translating the data address, the resulting distinct metadata address is associated with the data item through the algorithm used for the addition, extension, replacement, or translation, while remaining orthogonal: a metadata access cannot erroneously update or read the data item.

To illustrate, a few specific examples of translating a data address into a metadata address, that is, of determining a metadata address from a data address, are listed below.

(1) Translate a first data address into a second data address using normal virtual-to-physical address translation, then add, append, or otherwise include a metaphysical value in that data address to form the metadata address.
(2) Perform no virtual-to-physical address translation on the data address, and add, append, or otherwise include a metaphysical value in the data address to form the metadata address.
(3) Translate the data address into a translated metadata address using metaphysical translation table logic. Optionally, add, append, or otherwise include a metaphysical value in the translated metadata address to form the metadata address.

In addition, any of the translation methods above may incorporate, that is, be based on, a data-to-metadata compression ratio, with the metadata for each compression ratio stored separately.
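The three methods listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the translation logic of the embodiment: the 48-bit address width, the single tag bit used as the metaphysical value, the 4 KiB page size, and the dictionary-based page tables are all hypothetical.

```python
ADDRESS_BITS = 48                       # assumed width of a data address
METAPHYSICAL_TAG = 1 << ADDRESS_BITS    # assumed metaphysical value: one tag bit
PAGE_BITS = 12                          # assumed 4 KiB pages


def translate(vaddr, page_table):
    """Toy page-table lookup standing in for normal address translation."""
    page, offset = vaddr >> PAGE_BITS, vaddr & ((1 << PAGE_BITS) - 1)
    return (page_table[page] << PAGE_BITS) | offset


def metadata_address_translated(vaddr, page_table):
    """Method (1): translate first, then include the metaphysical value."""
    return translate(vaddr, page_table) | METAPHYSICAL_TAG


def metadata_address_untranslated(vaddr):
    """Method (2): skip translation and include the metaphysical value."""
    return vaddr | METAPHYSICAL_TAG


def metadata_address_from_table(vaddr, metaphysical_page_table):
    """Method (3): use a separate metaphysical translation table."""
    return translate(vaddr, metaphysical_page_table)
```

In methods (1) and (2) the tag bit lies outside the range of any data address, so the metadata address can never equal a data address. In method (3) the separation depends entirely on the contents of the separate table.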

Here, the address may be modified for translation and/or compression. For example, specific bits of the address may be ignored or removed, the bit range used to select the data granularity may be changed, specific bits may be translated, or specific bits may be added or replaced with metadata-related information. Compression is described in more detail below with reference to FIG. 4.

<Multiple metaphysical address spaces>
FIG. 3 illustrates an embodiment that supports multiple metaphysical address spaces. According to one embodiment, each processing element is associated with its own metaphysical address space, so that each processing element can maintain independent metadata. Four processing elements 301-304 are illustrated. As stated above, a processing element may include any of the elements described with reference to FIG. 1. As a first example, the processing elements are multiple cores of one processor. In the illustrative examples below, however, the processing elements 301-304 are discussed as multiple hardware threads (threads) of one processor. Each hardware thread executes one software thread and multiple software subsystems.

It is therefore beneficial to let each of the threads 301-304 maintain separate metadata. According to one embodiment, metaphysical translation logic 310 associates accesses from the different threads 301-304 with the appropriate metaphysical address space. As an example, a thread identifier (ID), used together with the address referenced by a metadata access operation, directs the access to the correct metaphysical address space.

To illustrate, assume that a metadata access operation associated with the thread 302 and referencing the data address 300 of the data item 316 is received. Any of the translation methods described above may be used to translate the data address for the data item 316 into a metadata address. Here, however, the translation additionally combines in the thread ID 302. The thread ID 302 may be obtained, for example, from a control register for the thread 302 or from the opcode of the instruction received from the thread 302. The combination may include appending the thread ID 302 to the address, replacing bits of the address, or any other known method of associating a thread ID with an address. In this way, the metaphysical translation logic 310 selects, or is directed to, the metaphysical address space of the processing element 302 that is associated with the data item 316.

Extrapolating from this example, by using the thread IDs of the threads 301-304 as part of the translation to metaphysical addresses, each of the processing elements 301-304 can maintain independent metadata for the data item 316. The programmer, however, does not need to manage the multiple metaphysical address spaces separately, because hardware can keep them apart using the thread ID in a manner transparent to software. Moreover, because each metadata access is associated with a distinct set of addresses that includes a reference to a unique thread ID, the metaphysical address spaces are orthogonal: a metadata access from one thread does not access the metadata of another thread.

However, as described below, some instructions or operations that access metadata may allow a metadata access from one thread to reach the metadata of another thread. That is, in some implementations access across multiple PEIDs and/or MDIDs (described below) may be beneficial. For example, a thread may check, modify, or clear another thread's metadata associated with the data item 316 in order to determine whether hardware has detected a conflict, to determine whether another thread is monitoring the associated data item, to clear the other thread's metadata, or to determine a commit condition that the thread needs to check.

Here, a specific opcode for an operation that accesses another thread's metadata is recognized, and the metaphysical translation logic 310 translates the address 300 into the metadata addresses for all of the metadata to be accessed. As a purely illustrative example, four bits are appended to the address 300, each bit representing one of the processing elements 301-304. For a metadata access operation, such as a clear operation, that clears all metadata for the data item 316, the metaphysical translation logic 310 sets each of the four bits so that all of the metadata 317 is accessed. Either the lookup logic for the memory 315 is designed to reach all of the metadata 317 in a single access with all four bits set, or the metaphysical translation logic 310 generates four separate accesses, each with a different thread ID bit of the four set. As a further illustrative example, a mask may be applied to the address value to allow one thread to hit another thread's metadata.
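The second alternative, four separate accesses with one thread ID bit set in each, can be sketched as follows. The sketch is illustrative only: the 48-bit address width, the placement of the four bits above the address, and the dictionary standing in for the memory 315 are assumptions.

```python
ADDRESS_BITS = 48   # assumed width of the data address field
PE_COUNT = 4        # one bit per processing element 301-304


def metadata_address(data_addr, pe_bits):
    """Append the four per-PE bits above the data address."""
    return (pe_bits << ADDRESS_BITS) | data_addr


def clear_all_metadata(metadata_store, data_addr):
    """Issue one access per PE, each with a different single bit set."""
    for pe in range(PE_COUNT):
        metadata_store.pop(metadata_address(data_addr, 1 << pe), None)
```

A design that reaches all four entries in one access would instead pass `pe_bits = 0b1111` and leave the fan-out to the lookup logic; that variant is not modeled here.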

Additionally, as illustrated, each of the processing elements 301-304 is associated with multiple metaphysical address spaces, so that multiple contexts or software subsystems within one thread may be interleaved across multiple metadata address spaces. For example, it may be beneficial to let multiple software subsystems on one processing element maintain independent sets of metadata. Orthogonal metadata address spaces may therefore be provided at several processing element levels, such as the core level, the hardware thread level, and/or the software subsystem level. In the figure, each of the processing elements 301-304 is associated with two metaphysical address spaces, each of which is to be associated with a software subsystem executing on one of the processing elements.

A software subsystem includes any task or code to be executed on a processing element that uses a separate metaphysical address space. As an illustrative example, four subsystems that may each be associated with their own metaphysical address space are a transactional runtime subsystem, a garbage collection runtime subsystem, a memory protection subsystem, and a software translation subsystem, all executing on one processing element. Here, each software subsystem may have control of the processing element at a different time. As another example, a software subsystem includes individual transactions executing on one processing element. In fact, it may be desirable for nested transactions executing on the same thread to be associated with separate metaphysical address spaces. To illustrate, a filter test for an access to a data item in an outer transaction may fail, yet it is beneficial to provide a second, separate filter for accesses to the same data item in an inner nested transaction; this second filter may independently succeed in accelerating accesses in the inner transaction.

Furthermore, to preserve the metadata of the outer transaction when a nested inner transaction aborts, each nested transaction, that is, each subsystem, is associated with a different metadata space, so that clearing the inner transaction's metadata does not affect the outer transaction's metadata. A software subsystem is not limited to these examples, however; it may be any task or code capable of managing metadata.

According to one embodiment, to provide orthogonal metaphysical address spaces at the software subsystem level, an address is combined with a processing element ID (PEID), as described above, and additionally with a metadata ID (MDID) or context ID. Distinct metadata can thus be uniquely identified for each subsystem within a processing element. Continuing the example above, assume that the processing elements 301-304 are hardware threads and that the thread 302 is executing an outer transaction and an inner transaction nested within it. For the outer transaction, the metadata 317c is associated with the data item 316 by the metaphysical translation logic 310, which translates the data address 300 of the data item 316 into an address combined with the thread ID (TID) and the metadata ID (MDID) of the outer transaction; that address references the metadata 317c.

Purely as an illustration, the metadata 317c includes four filter values (a read filter value, a write filter value, an undo filter value, and a miscellaneous filter value), a pointer or other reference to a backup location for the data item 316, a monitor value indicating whether a monitor on the data item 316 has been lost, a transaction descriptor value, and a version value for the data item 316. Similarly, the inner transaction is associated with metadata 317d for the data item 316, which includes the same metadata fields as the metadata 317c. As above, the metaphysical translation logic 310 translates the data address 300 for the data item 316 into an address combined with the thread ID and the metadata ID of the inner transaction; that address references the metadata 317d.

Here, the metadata address referencing the metadata 317c and the metadata address referencing the metadata 317d differ only in the metadata ID for the outer and inner transactions. That difference is nevertheless enough to make the address spaces distinct and orthogonal: because the MDID of an access from the inner transaction differs from that of an access from the outer transaction, a metadata access from the inner transaction does not affect the outer transaction's metadata. As noted above, this may be useful when rolling back nested transactions or when holding different metadata values for different levels of transactions. Specifically, when the inner transaction aborts, the backup data for the data item 316 held in the metadata 317d may be used to roll the data item 316 back to its state at the entry point of the inner transaction, or may be cleared, without clearing or affecting the outer transaction's backup data held in the metadata 317c.

Note that the metadata ID (MDID) used to separate software subsystem metaphysical address spaces may be of any size and may be obtained from various sources. As a highly simplified example, with four processing elements (PEs) 301-304, a PEID may be one of the two-bit combinations 00, 01, 10, and 11. Similarly, if four separate metaphysical address spaces are supported, a two-bit MDID of 00, 01, 10, or 11 distinguishes four subsystems. To illustrate, the value representing the second subsystem within the processing element 302 is 0101: the first two bits, 01, represent the PE 302, and the following two bits, 01, represent the second subsystem. In this example, the metaphysical translation logic combines this value with the data address 300, or with a translation of the data address 300, to reference the address space for MDID 01 of the PE 302, which contains the metadata location 317d.
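The two-bit PEID and two-bit MDID of this simplified example can be packed as in the sketch below. The field order follows the 0101 example in the text; the 48-bit address width and the placement of the combined value above the address are assumptions made for illustration.

```python
PEID_BITS = 2       # four processing elements: 00, 01, 10, 11
MDID_BITS = 2       # four subsystems per processing element
ADDRESS_BITS = 48   # assumed width of the data address field


def space_id(peid, mdid):
    """Concatenate PEID (upper two bits) and MDID (lower two bits)."""
    return (peid << MDID_BITS) | mdid


def metadata_address(data_addr, peid, mdid):
    """Place the combined PEID/MDID value above the data address."""
    return (space_id(peid, mdid) << ADDRESS_BITS) | data_addr


# PE 302 has PEID 01 and its second subsystem has MDID 01.
print(format(space_id(0b01, 0b01), "04b"))  # prints 0101
```

Two accesses to the same data address from different subsystems differ only in the upper bits, so they can never resolve to the same metadata location.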

Both the thread ID and the MDID may be more complex, however. For example, assume that the threads 301-302 share access to the memory 315, while the threads 303-304 are remote processing elements that do not share access to the memory 315. Further assume that the threads 301-302 each support two software subsystems, so that a total of four orthogonal address spaces are supported for the threads 301-302: PE301MD0, PE301MD1, PE302MD0, and PE302MD1. Here, the combined thread ID and MDID value used to obtain the metadata address may come from the opcode, from control registers, or from a combination of the two. To illustrate, the opcode provides one bit for the context/MDID; a control register provides one bit for the processing element ID (PEID), assuming only two processing elements; and a metadata control register, such as the MDCR 320, provides four bits that identify the specific software subsystem or context at a finer granularity.

Consequently, when a metadata access operation referencing the address 300 of the data item 316 is received from the second thread, the PE 302, one bit from the opcode (a first bit of 1, indicating the second context) and one bit from the control register of the processing element 302 (a second bit of 1, indicating the processing element 302) are combined with the MDID from the metadata control register (MDCR) 320 associated with the second thread. The MDCR has already been updated with the MDID of the subsystem currently in control of the second thread, "0010", so that the subsystem associated with the received operation is identified. The metaphysical translation logic takes the combined value, for example "110010", and combines it further with the referenced data address 300, or with a translation of the data address 300, to obtain the metadata address. Because the "110010" portion of the metadata address is unique to the subsystem that issued the access, the access hits or modifies only the metadata location 317d in the memory 315. It does not hit or affect the metadata locations 317a, b, c, e, f, g, and h, which belong to the mutually orthogonal metaphysical address spaces of other subsystems in the second thread and in other threads.
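The "110010" value in this example can be reproduced with the sketch below. The bit order (opcode bit, then control register bit, then the four MDCR bits) is read off the example itself; the function name and argument names are hypothetical.

```python
def combined_id(opcode_context_bit, peid_bit, mdcr_mdid):
    """Concatenate the opcode bit, the control register bit, and a 4-bit MDID."""
    if opcode_context_bit not in (0, 1) or peid_bit not in (0, 1):
        raise ValueError("opcode and PEID fields are single bits")
    if not 0 <= mdcr_mdid < 16:
        raise ValueError("the MDCR supplies four bits in this example")
    return (opcode_context_bit << 5) | (peid_bit << 4) | mdcr_mdid


# Second context (1), processing element 302 (1), MDCR value 0010.
print(format(combined_id(1, 1, 0b0010), "06b"))  # prints 110010
```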

To illustrate, a specific form of the MDCR is described next. According to some embodiments, the ISA may be extended with a per-thread metadata identifier register (MDID register) that supplies an MDID to MDID-dependent metadata load/store/test/set instructions. In some embodiments it is convenient to provide more than one such register. For example, the metadata control register (MDCR) is a 32-bit read/write register that holds the current metadata context ID (MDID). It may be updated with a CR MOV. An example of the bit field definition is shown below.

MDID0 and MDID1 are metadata IDs that can be accessed simultaneously by instructions. The number of bits actually used in these fields is MDID_size. According to one embodiment, this field is read-only, since it is specified by the processor design. According to other embodiments, however, the size may be modifiable at other privilege levels. There may be no hardware mechanism that checks that an MDID fits within the allocated bit size. According to one embodiment, MDID0 and MDID1 can be read and written at any privilege level. A special MDID value may also be used to designate a special metadata space whose reads always return 0 or 1. Software may use this to force all metadata tests in a block to true or false, in a manner similar to the register that forces metadata values, described with reference to FIGS. 6 and 7.

In another example, however, as noted above, the metaphysical translation logic 310, together with a decoder (not shown), may recognize a metadata access operation from the thread 302 that is intended to access metadata in the metadata address space of the thread 301, and permit that specific instruction or operation to read or modify the metadata of the thread 301.

<Compression of metadata with respect to data>
The discussion above covered uncompressed metadata, where data is mapped to metadata one-to-one. In some cases, however, it is more efficient to use less metadata than data, that is, to compress the metadata so that it is smaller than the data it describes. The metaphysical address translation logic 210 and 310 shown in FIGS. 2 and 3 may take compression into account when translating and modifying addresses, so that the resulting addresses reference compressed metadata. FIG. 4 illustrates an embodiment in which an address is modified to achieve metadata compression, specifically with a data-to-metadata compression ratio of 8. Control logic, such as the metaphysical address translation logic 210 and 310 of FIGS. 2 and 3, receives the data address 400 referenced by a metadata access operation. As one example, compression includes shifting the address 400 by log2(N) bits, or removing log2(N) bits from the address 400, where N is the data-to-metadata compression ratio. In the illustrated example, with a compression ratio of 8, the address is shifted down by three bits, which are removed, to form the metadata address 405. In essence, the address 400, which contains 64 bits and references a specific data byte in memory, is truncated by three bits to form the metadata byte address 405, which references metadata in memory at byte granularity. The three bits removed from the address to form the metadata byte address are then used to select one bit of metadata within that byte.
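The shift described above can be sketched as follows for a compression ratio N that is a power of two. The function name and the returned pair are hypothetical, and the bit index is only meaningful as a position within a metadata byte when N is 8, as in the illustrated example.

```python
def compress_address(data_addr, ratio=8):
    """Return (metadata byte address, selector taken from the removed bits)."""
    if ratio < 1 or ratio & (ratio - 1):
        raise ValueError("ratio must be a power of two")
    shift = ratio.bit_length() - 1           # log2(N): 3 when N is 8
    metadata_byte_addr = data_addr >> shift  # vacated high-order bits become zero
    bit_index = data_addr & (ratio - 1)      # the removed low-order bits
    return metadata_byte_addr, bit_index


# Data bytes 0x1000 through 0x1007 share metadata byte 0x200,
# and each one selects a different bit within it.
print(compress_address(0x1005))  # prints (512, 5)
```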

According to one embodiment, other bits are used in place of the shifted/removed bits. As illustrated, after address 400 is shifted, the high-order bits are replaced with zeros. However, other data or information may be used in place of the removed/shifted bits, for example a metadata ID (MDID), a context identifier (ID), and/or a processing element ID associated with the metadata access operation. Although a predetermined number of the least significant bits are removed in this example, bits at any position may be removed and replaced based on any number of factors, such as cache organization, cache circuit timing, locality of metadata to data, and minimizing conflicts between data and metadata.

For example, instead of shifting the data address by log2(N), address bits 0:2 may be zeroed. The physical and virtual address bits that are identical are then not shifted as in the example above, so the set and bank can be pre-selected using unmodified bits such as bits 11:3.

Note that compression may be combined with the translation described above. In other words, the compression ratio may be an input to the metaphysical address translation logic 210 and 310 of FIGS. 2 and 3, which uses it together with a PEID, CID, MDID, metaphysical value, or other information to translate a data address into a metadata address. The metadata address is then used to access the memory holding the metadata. As noted above, because metadata is local and lossy, a miss in the memory based on the metadata address is serviced immediately and efficiently: a memory location is allocated without generating an external miss-service request and without waiting for an external request to be serviced. Here, an entry is allocated for the metadata in the normal manner. For example, an entry such as entry 217 shown in FIG. 2 is selected based on metadata address 405 and a cache replacement algorithm such as least recently used (LRU), allocated, and initialized to a metadata default value. Consequently, although metadata competes with regular data for space, it remains compressed and separate from other software subsystems/processing elements.

A compression ratio of 8 is only an example; any compression ratio may be used. As another example, the compression ratio is 512:1, where one bit of metadata represents 64 bytes of data. As above, the data address is translated/modified to form the metadata address by shifting the data address down by log2(512) bits, i.e., 9 bits. In this case bits 6:8, rather than bits 0:2, are used to select the bit, which effectively provides the compression by selecting at a granularity of 512 bits. Because the data address is shifted by 9 bits, nine bit positions in the high-order portion of the address are freed and can hold information. According to one embodiment, these 9 bits hold identifiers such as a context ID, thread ID, and/or MDID. Metaphysical space values may also be held in these bits, or the address may be extended with a metaphysical value.
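As a rough sketch of how the nine freed high-order bit positions might carry an identifier, the following Python function places an MDID there after the 9-bit shift. The name `metadata_address_with_id`, the 64-bit address width, and the choice to place the MDID in the topmost bits are assumptions made for illustration only.

```python
ADDR_BITS = 64  # assumed address width


def metadata_address_with_id(data_addr: int, mdid: int, shift: int = 9) -> int:
    """Shift the data address down by `shift` bits and place `mdid`
    in the high-order bit positions vacated by the shift (hypothetical layout)."""
    if not 0 <= mdid < (1 << shift):
        raise ValueError("mdid does not fit in the vacated bits")
    low = (data_addr & ((1 << ADDR_BITS) - 1)) >> shift
    return (mdid << (ADDR_BITS - shift)) | low
```

Two accesses to the same data address with different MDIDs then yield different metadata addresses, which keeps their metadata separate.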

According to one embodiment, multiple concurrent compression ratios are supported in hardware. In this case, an indication of the compression ratio is held as part of the metaphysical value that is combined with the data address to obtain the metadata address. The compression ratio is therefore taken into account when memory is searched with that address, so a lookup does not match an address belonging to a different compression ratio. Software may also be able to rely on hardware not to forward store information to a load of a different compression ratio.

According to one embodiment, hardware is implemented with a single compression ratio but includes other hardware support to present multiple compression ratios to software. As an example, assume the cache hardware is implemented with an 8:1 compression ratio, as shown in FIG. 4. A metadata access operation that accesses metadata at a different granularity is then decoded to include a micro-operation that reads a default amount of metadata and a test micro-operation that tests the appropriate part of the metadata read. As an example, the default amount of metadata read is 32 bits. The test operation for a granularity/compression that differs from the implemented 8:1 then tests the correct bits among the 32 bits read, based on a certain number of address bits, e.g., a number of LSBs of the metadata address, and/or a context ID.

As an illustration, in a scheme with one bit of metadata per byte of data and unaligned data, a single bit is selected from the low 8 bits of the 32 metadata bits read, based on the three least significant bits of the metadata address. For a word of data, two consecutive metadata bits are selected from the low 16 bits of the 32 bits read, based on the three LSBs of the address, and so on up to 16 bits for a 128-bit metadata size.
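The bit selection from the default 32-bit read can be sketched as follows. This is only an illustration under stated assumptions: one metadata bit per data byte, an operand of 1 to 16 bytes, and a hypothetical function name `select_metadata_bits`.

```python
def select_metadata_bits(md_read32: int, addr: int, operand_bytes: int) -> int:
    """Pick the metadata bits covering an operand out of a 32-bit default read.

    Assumes one metadata bit per data byte; `operand_bytes` is 1 for a byte,
    2 for a word, and up to 16 for a 128-bit operand.
    """
    if not 1 <= operand_bytes <= 16:
        raise ValueError("operand_bytes must be between 1 and 16")
    offset = addr & 0x7                      # three least significant address bits
    mask = (1 << operand_bytes) - 1          # one metadata bit per operand byte
    return ((md_read32 & 0xFFFFFFFF) >> offset) & mask
```

The largest case (offset 7 plus 16 bits) still fits inside the 32 bits read, which is consistent with a 32-bit default read being sufficient for all of these operand sizes.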

<Metadata access instructions / operations>
FIG. 5 is a flowchart of a method for accessing metadata associated with data. Although the steps of the flowchart are illustrated as being executed substantially serially, at least some of them may be executed in parallel or in a different order.

In step 505, a metadata operation referencing the data address of a given data item is encountered. As described above, metadata instructions/operations are supported in hardware to read, modify, and/or clear metadata. That is, the instructions are supported in the processor's instruction set architecture (ISA), so the processor's decoders may recognize the operation code (opcode) of the access instruction, and logic performs the access. Note that the term instruction may also refer to an operation. Some processors use the concept of a macro-instruction that can be decoded into multiple micro-operations, each performing an individual task. For example, a metadata test-and-set macro-instruction is decoded into a metadata test operation/micro-operation that tests the metadata and, if the test yields the correct Boolean value, a set operation that updates the metadata to a specified value.

However, metadata access operations are not limited to explicit software instructions that access metadata; they may also include implicit micro-operations decoded as part of a larger, more complex instruction that includes an access to the data item associated with the metadata. Here, a data access instruction may be decoded into multiple operations, for example an operation that accesses the data item and an operation that implicitly updates the associated metadata.

As noted earlier, in one embodiment the physical mapping of metadata to data in hardware is not directly visible to software. In this example, therefore, a metadata access operation references a data address and relies on hardware to perform the correct translation, i.e., mapping, so that the metadata is accessed appropriately. Metadata access operations may, however, reference different metaphysical address spaces depending on the originating thread, context, and/or software subsystem. Memory may thus hold metadata for data items in a manner transparent to software. When hardware detects an access operation to metadata, either through an explicit operation code (the opcode of an instruction) or through decoding an instruction into a metadata access micro-operation, the hardware performs the necessary translation of the data address referenced by the access operation in order to access the metadata.

As this example illustrates, a program may include multiple separate operations, such as a data access operation and a metadata access operation, that reference the same address of a data item such as data items 216 and 316 shown in FIGS. 2 and 3, and hardware may map those accesses to different address spaces, such as a physical address space and a metaphysical address space. In some embodiments, the ISA may be extended with instructions to load/store/test/set metadata for a given virtual address, MDID, compression ratio, and operand width. Any of these parameters may be an explicit instruction operand, may be encoded in the opcode, or may be obtained from a separate control register. An instruction may combine a metadata load/store operation with other operations, for example loading some data, testing some of its bits, and setting a condition code for a subsequent conditional jump. An instruction may also flush all metadata, or only the metadata of a specific MDID. Examples of metadata access operations are listed below. Some of the example instructions specifically concern a 64X compression ratio, but similar instructions may be used with other compression ratios, or with uncompressed metadata, even where not specifically disclosed.

<Metadata bit test and set (MDLT)>
The metadata load and test instruction (MDLT) takes two arguments: as a source operand, the data address with which the metadata is associated, and a register (the destination operand) into which the metadata, comprising a byte, word, dword, qword, or other size of bits, is written. The value of the tested metadata bit is written to the register. The programmer should assume no knowledge of the data stored in the destination register of an MDLT instruction and should not manipulate this register. The register is used only as a source operand for a metadata store and set instruction (MDSS) to the same address. According to one embodiment, the MDLT instruction combines a test operation and a set operation, but discards the set operation if the test succeeds.

<Metadata store and set (MDSS)>
The metadata store and set instruction (MDSS) takes two arguments: the data address with which the metadata is associated, and a register (the source operand) holding the metadata, comprising a byte, word, dword, qword, or other size of bits, to be stored in memory. The MDSS instruction sets the correct bit in the value from the source operand.

<Metadata store and reset instruction (MDSR)>
The MDSR instruction takes two source arguments: as a source operand, the data address with which the metadata is associated, and a register (also a source operand) holding the metadata, comprising a byte, word, dword, qword, or other size of bits, to be reset. The MDSR instruction resets the correct bit in the value from the source operand.

The metadata address is determined from the referenced data address. Examples of determining a metadata address are included in the discussion of metaphysical address translation and multiple metaphysical address spaces above. Note, however, that the translation incorporates, i.e., is based on, the data-to-metadata compression ratio, so that metadata for each compression ratio is stored separately.

The CMDT instruction converts a memory data address into a memory metadata address using an implementation-dependent compression mapping function and tests whether the metadata bit corresponding to that memory metadata address is set. As an example, the compression ratio CR is 1 bit per 8 bytes. The metadata address computation incorporates one of the context IDs from the MDCR register, providing a unique set of MD for each context ID, and addresses MDBLK[CR][MDCR.MDID[MDID number]].META. The instruction force-aligns the address "mem" to the specified data size. The instruction tests whether the metadata is set.

The following is an example of pseudocode associated with CMDT (the ZF flag is set to reflect a metadata value of zero; all other flags are cleared).

The CMDS instruction converts a memory data address into a memory metadata address using an implementation-dependent compression mapping function. The compression ratio is 1 bit per 8 bytes of data. The imm8 value is encoded as follows: bit 0 → MD_Value, the value to be stored in the MD; bits 7:1 → reserved, unused.

The following is an example of pseudocode associated with CMDS.

The CMDCLR instruction resets all MDBLK[CR][MDCR.MDID[MDID number]].META corresponding to any data within the range spanned by MBLK(mem).

The following is an example of pseudocode associated with CMDCLR.
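As a rough behavioral model only, and not the original pseudocode listings, the three compressed-metadata instructions described above might be sketched in Python as follows. The class `CompressedMetadata`, the dictionary-based storage, and the 64-byte MBLK size are assumptions made for this illustration; the text only fixes the compression ratio of 1 metadata bit per 8 bytes, the ZF convention, and the use of imm8 bit 0.

```python
class CompressedMetadata:
    """Toy model of CMDT / CMDS / CMDCLR (hypothetical, for illustration)."""

    def __init__(self, cr_bytes: int = 8, mblk_bytes: int = 64):
        self.cr_bytes = cr_bytes        # data bytes covered by one metadata bit
        self.mblk_bytes = mblk_bytes    # assumed MBLK size; a multiple of cr_bytes
        self.meta = {}                  # (mdid, block index) -> 0 or 1

    def _key(self, mdid: int, mem: int):
        # Force-align `mem` to the granule covered by one metadata bit.
        return (mdid, mem // self.cr_bytes)

    def cmdt(self, mdid: int, mem: int) -> bool:
        """Test the metadata bit; return ZF, which is True when the bit is zero."""
        return self.meta.get(self._key(mdid, mem), 0) == 0

    def cmds(self, mdid: int, mem: int, imm8: int) -> None:
        """Store imm8 bit 0 as the metadata value; bits 7:1 are ignored."""
        self.meta[self._key(mdid, mem)] = imm8 & 1

    def cmdclr(self, mdid: int, mem: int) -> None:
        """Reset every metadata bit for data inside the MBLK containing `mem`."""
        base = (mem // self.mblk_bytes) * self.mblk_bytes
        for addr in range(base, base + self.mblk_bytes, self.cr_bytes):
            self.meta.pop(self._key(mdid, addr), None)
```

Keying the storage by MDID keeps each context's metadata separate, which mirrors the statement that the address computation incorporates a context ID from the MDCR.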

Next, in step 510, a metadata address is determined from the data address referenced by the metadata access operation, based on a compression ratio, processing element ID, context ID, MDID, metaphysical value, operand size, and/or other values relevant to metaphysical address space translation. Any of the methods described above may be used to obtain the appropriate metadata address, e.g., no translation of the data address, normal translation of the data address, or a separate metaphysical address translation of the data address, in combination with ID values.

Furthermore, as noted above, in some examples a version of an instruction, such as a test, set, or clear instruction, allows one thread or metadata context to test, set, or clear the metadata of another thread or metadata context. The translation to a metadata address may therefore include modifying the address, e.g., by applying a mask, so that an access from one thread or context ID reaches the metadata of another thread or context ID.

In step 515, the metadata referenced by the metadata address is accessed. In the normal case, the disjoint metadata location associated with the requesting local thread or context ID is accessed and the appropriate operation, such as a test, set, or clear, is performed. In other cases, as described above, the metadata of another thread or context ID may be accessed in this step.

<Abstraction>
An embodiment of a software abstraction is described below. A given CR is a power of two indicating the number of bits of data mapped to one bit of metadata. Which CR values, if any, are available is implementation-defined. CR > 1 denotes compressed metadata; CR = 1 denotes uncompressed metadata.

MDBLK[CR][*] is ceil(CR/8) bytes in size and is naturally aligned. An MDBLK is associated with physical data, not with its linear virtual address. All valid physical addresses A with the same floor(A / MDBLK[CR][*]_SIZE) value designate the same set of MDBLKs.

For a given CR there may be any number of distinct MDIDs, each designating a unique instance of metadata. The metadata for a given CR and MDID is distinct from the metadata for any other CR or MDID. For example, for Thd#0, assuming addr is QWORD-aligned, the metadata block designated by MDBLK[CR=64][MDID=3](addr) is the same as MDBLK[CR=64][MDID=3](addr+7), but distinct from MDBLK[CR=64][MDID=4](addr) and MDBLK[CR=512][MDID=3](addr).
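The identity rules above can be checked with a small Python sketch. The function `mdblk_id` is a hypothetical helper that names an MDBLK by the tuple (CR, MDID, floor(A / size)), using the ceil(CR/8)-byte block size stated above; it is an illustration of the abstraction, not a description of any hardware structure.

```python
import math


def mdblk_id(cr: int, mdid: int, addr: int) -> tuple[int, int, int]:
    """Identify the MDBLK designated by a physical address (hypothetical helper)."""
    size = math.ceil(cr / 8)            # MDBLK[CR][*] size in bytes
    return (cr, mdid, addr // size)     # floor(A / MDBLK[CR][*]_SIZE)


addr = 0x1000  # QWORD-aligned example address
same = mdblk_id(64, 3, addr) == mdblk_id(64, 3, addr + 7)
other_mdid = mdblk_id(64, 3, addr) != mdblk_id(64, 4, addr)
other_cr = mdblk_id(64, 3, addr) != mdblk_id(512, 3, addr)
```

All three comparisons evaluate to true for a QWORD-aligned `addr`, matching the example in the text.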

A given implementation may support multiple concurrent contexts, where the number of contexts depends on the CR and on certain configuration information for the specific system of which the processor is a part. For uncompressed metadata, there is a QWORD of metadata for each QWORD of physical data.

Metadata is interpreted only by software. Software may set, reset, or test META for a specific MDBLK[CR][MDID], reset META for all of a Thd's MDBLK[*][*], or reset META for all of a Thd's MDBLK[CR][MDID] that intersect a given MBLK(addr).

Regarding metadata loss, any META property of a Thd may spontaneously reset to 0, in which case a metadata loss event is generated.

<Forced metadata value>
FIG. 6 illustrates an embodiment that provides hardware support for forced metadata values. An STM typically uses access barriers to ensure consistency between memory access operations. For example, before a memory access to a data item, a metadata location or lock location associated with the data item is checked to determine whether the data item is available. Other barrier operations may include acquiring a lock, such as a read lock or write lock, on the data item at the metadata or lock location; logging/storing a version of the data item in the transaction's read set or write set; determining whether the transaction's read set up to that point is still valid; buffering or backing up the value of the data item; setting monitors; updating filter values; and any other transactional operation.

In a transaction, however, subsequent accesses to the same data item often incur the overhead of executing the corresponding transactional barrier each time an access to that data item is encountered. To illustrate, if address A is written three times within a transaction, the write barrier that acquires the write lock for address A is executed three separate times. Yet the lock on address A was already acquired by the write barrier at the first transactional write, so the two barrier executions before the following two transactional writes are redundant; the lock does not need to be acquired again.

Therefore, according to one embodiment, hardware holds filter values that accelerate the execution associated with these barriers. A filter value may be included in the cache as annotation bits, like the read and write monitors, or may be held at a metadata location in a metaphysical address space as described above. Continuing the example, when the first write barrier is encountered, the write filter value is updated from an unaccessed value to an accessed value to indicate that a write barrier for address A has already been encountered within the same transaction. When the two subsequent transactional writes occur in the same transaction, the write filter value for address A is checked before vectoring to the write barrier. The filter value now holds the accessed value, indicating that the write barrier has already been executed within the same transaction and need not be executed again. Execution is therefore not vectored to the write barrier for the two subsequent writes. In other words, the filter value accelerates transactional execution: compared with the earlier example without filtering, execution of the write barrier is elided for the two subsequent accesses.
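The effect of the write filter on the three-writes example can be sketched in Python. This is a software-only illustration: the class `Transaction`, the use of a set as the filter, and the counter standing in for the write barrier are all assumptions, whereas in the embodiment the filter value is held in hardware.

```python
class Transaction:
    """Toy model of write-barrier filtering within a single transaction."""

    def __init__(self):
        self.write_filter = set()   # addresses whose filter holds the accessed value
        self.barrier_calls = 0      # how many times the write barrier actually ran
        self.memory = {}

    def _write_barrier(self, addr: int) -> None:
        # Stands in for acquiring the write lock and any other barrier work.
        self.barrier_calls += 1

    def write(self, addr: int, value: int) -> None:
        # Check the filter first; only vector to the barrier on the first access.
        if addr not in self.write_filter:
            self._write_barrier(addr)
            self.write_filter.add(addr)   # unaccessed -> accessed
        self.memory[addr] = value


tx = Transaction()
for v in (1, 2, 3):
    tx.write(0xA, v)
```

After the three writes to the same address, `tx.barrier_calls` is 1 rather than 3, because the second and third writes find the accessed value in the filter.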

Note that a read filter for loads/reads, an undo filter for undo operations, and a miscellaneous filter for generic filtering operations may be used in the same manner as the write filter described above for write/store operations.

Transactional barriers are further associated with the concepts of strong and weak atomicity, which concern the isolation of transactional operations from non-transactional operations. Just as a transactional write to a transactionally loaded memory location may conflict, a transactional write to a non-transactionally loaded memory location may also conflict, leaving the non-transactional load with invalid data. In a weak atomicity system, no barriers, or only minimal barriers, are inserted at non-transactional operations, so such a system runs the risk of invalid execution. In contrast, a strong atomicity system inserts transactional barriers at non-transactional operations as well, which provides isolation and protection between transactional and non-transactional operations, but at the cost of executing a transactional barrier for every non-transactional operation.

Therefore, according to one embodiment, the filters described above may be used together with strong atomicity barriers at non-transactional operations to support different modes of strong and weak atomicity operation. For illustration, a simplified example embodiment is shown in FIG. 6. Here, metadata 610 is held in hardware for data 605, as described above. A metadata access 600 to access metadata 610 is received. According to one embodiment, the metadata access includes a test metadata operation that tests a filter, such as a read filter, write filter, undo filter, or miscellaneous filter.

The test metadata operation to test a filter may originate from a transactional or non-transactional access operation. According to one embodiment, when compiling application code, a compiler inserts test filter operations into the application code that condition the execution of transactional barrier calls at both transactional and non-transactional accesses. Within a transaction, the filter operation is thus executed before the barrier call, and if it succeeds, the call to the transactional barrier is not made, providing the acceleration described above.

For non-transactional operations, however, in one embodiment the hardware can operate in a weak atomicity mode, in which transactional barriers at non-transactional operations are not executed, and in a strong atomicity mode, in which they are executed.

The mode of operation, control 625, may be set in a metadata control register (MDCR) 615. MDCR 615 may be combined with the version of the MDCR described above that holds MDIDs, or it may be a separate control register. According to another embodiment, the operating-mode control 625 may be held in a general transactional control register or status register. A first execution mode is a strong atomicity mode, in which transactional barriers are executed at non-transactional operations. Here, control 625 holds a first value, such as "00", representing the strong atomicity, non-transactional mode of operation. Response logic 620, a multiplexer in one example, then selects the metadata value from the hardware-maintained metadata 610 associated with data address A to be supplied to destination register 650 of metadata access 600. In essence, in strong atomicity mode, barriers are accelerated based on the metadata actually held in hardware. Alternatively, in a second execution mode, such as a weak atomicity, non-transactional mode indicated by control 625 holding a second value such as "01", a fixed or forced value from the MDCR, rather than the hardware-held metadata 610, is supplied to destination register 650 in response to metadata access 600.

In essence, in the weak atomicity mode a forced value is provided to destination register 650 in response to filter test operation 600, so that the test of the filter value always succeeds and the call to the transactional barrier before the non-transactional memory access is never executed. This description assumes that the filter test operation returns a Boolean value indicating whether the filter test succeeded (the barrier is not executed) or failed (the barrier is executed). As a result, the same filtering software construct that accelerates transactions by eliding barriers based on filter values provides two modes for non-transactional operations: a first mode of operation, the weak atomicity mode, in which all barriers are elided, and a second mode of operation, the strong atomicity mode, in which barriers in non-transactional operations are executed or accelerated based on the hardware-maintained metadata. In another embodiment, a different forced value may be provided for each mode. In that case, in the strong atomicity mode the forced value makes the filter test operation always fail, so the barrier is always executed, whereas in the weak atomicity mode the forced value makes the filter test operation always succeed, so the barrier is never executed.
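The selection performed by the response logic, and its effect on a compiler-inserted filter test, can be sketched as a small software model. This is a minimal illustration only, not the hardware design: the names `Mdcr`, `metadata_access`, `nontransactional_load`, and `FILTER_PASS` are hypothetical, and the convention that a metadata value of 1 means "filter test succeeded" is an assumption. Only the control values “00” and “01” come from the text.

```python
from dataclasses import dataclass

# Example encodings for control 625 taken from the text.
STRONG_NON_TX = 0b00  # strong atomicity, non-transactional
WEAK_NON_TX = 0b01    # weak atomicity, non-transactional

FILTER_PASS = 1  # assumption: this value means "filter test succeeded, skip barrier"


@dataclass
class Mdcr:
    """Hypothetical model of the metadata control register."""
    control: int       # stands in for control 625
    forced_value: int  # fixed/forced value supplied in weak atomicity mode


def metadata_access(mdcr, hw_metadata, address):
    """Model of the response logic: pick the value for the destination register."""
    if mdcr.control == WEAK_NON_TX:
        return mdcr.forced_value        # ignore hardware-maintained metadata
    return hw_metadata.get(address, 0)  # use hardware-maintained metadata


def nontransactional_load(mdcr, hw_metadata, memory, address, barrier):
    """Filter test followed by a conditional barrier call, then the load."""
    if metadata_access(mdcr, hw_metadata, address) != FILTER_PASS:
        barrier(address)  # filter test failed: execute the transactional barrier
    return memory[address]
```

With the forced value set to `FILTER_PASS`, the weak mode never calls the barrier, while the strong mode calls it only when the hardware-maintained metadata for the address does not already indicate a pass.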

Providing a forced or fixed value from a control register such as MDCR 615, based on control information such as control 625, has been described in connection with providing either a fixed/forced value or a metadata value depending on the mode of operation. However, providing a forced or fixed value may also be applied to metadata usage in general. For example, the fact that the data does not change may be exploited to perform debugging and general monitoring of memory accesses that can be enabled on demand.

FIG. 7 shows a flowchart for an embodiment of accelerating non-transactional operations while maintaining atomicity in a transactional environment. In step 705, a metadata (MD) access operation referencing a data address is encountered. As one specific illustrative example, the MD access operation includes a test operation that the compiler has inserted in line with application code, so that a transactional barrier at a non-transactional memory access is elided if the test returns a first value (succeeds) and executed if the test returns a second value (fails). However, the MD test operation is not so limited and may include any test operation that returns a Boolean success or failure value.

In step 710, the mode of operation is determined. Examples of the mode of operation include combinations of transactional or non-transactional with strong atomicity or weak atomicity. Accordingly, one register or two separate registers may hold a first bit indicating a transactional or non-transactional mode of operation and a second bit indicating a strong or weak atomicity mode of operation.

If the mode of operation is transactional, or non-transactional with strong atomicity, the hardware-maintained metadata value is provided to the metadata access operation; that is, the hardware-maintained value is placed in the destination register specified by the MD access operation. In contrast, if the mode of operation is non-transactional with weak atomicity, the forced, fixed value from the MDCR is provided to the MD access operation instead of the hardware-maintained MD value. Consequently, in the strong atomicity mode barriers are accelerated based on the hardware-maintained MD value, whereas in the weak atomicity mode they are accelerated based on the forced MDCR value.

<Efficient transition to a buffered and monitored state>
FIG. 8 is a flowchart for an embodiment of a method for efficiently transitioning a block of data to a buffered and monitored state before a transaction commits. As described above, blocks of memory, for example cache lines holding data items or metadata, may be buffered and/or monitored. For example, the coherency bits of a cache line represent the buffered state, and the attribute bits of the cache line indicate whether the line is unmonitored, read monitored, or write monitored.

In some embodiments, a cache line is buffered but not monitored. Because no monitoring is applied, the data held in that cache line is lossy, that is, it may be lost, and conflicts on the line are not detected. For example, data that is local to a transaction and is not to be committed, such as metadata, may be held in the buffered but unmonitored state.

If conflicts between buffered data and writes to the same address are to be detected, read monitoring is applied to the data. The cache line then transitions to a buffered and read-monitored state; to make that transition, a read request that forces all other copies into the shared state is sent to external processing elements. Such an external read request may cause a conflict with another processing element that maintains a write monitor on the same block or cache line.

Similarly, if conflicts between buffered data and reads of the same memory block are to be detected, write monitoring is applied to the cache line. The cache line then transitions to a buffered and write-monitored state. This transition is accomplished by sending a read-for-ownership request to other processing elements, forcing all other copies into the invalid state. Here too, a conflict is detected with any processing element that maintains read monitoring or write monitoring on the same memory block.

However, to minimize transactional conflicts, memory blocks that need to be updated by a transaction but are not ultimately committed may be kept in the buffered but unmonitored state, as described above. When it is later determined that a block held in the buffered but unmonitored state should be committed, one embodiment provides, as shown in FIG. 8, an efficient path from the buffered but unmonitored state to a committable state.

As an example, in step 805 a buffered update to a memory block held by a cache line is received. In step 810, read monitoring is applied to the block before, or at the same time as, the buffered update; for example, the read attribute for the cache line is set to a read-monitored value to indicate that the block is read monitored. To apply read monitoring, however, a read request is first sent to other processing elements in step 815. On receiving this read request, in step 820 another processing element either detects a conflict, because it already holds the cache line in a write-monitored state, or transitions its copy to the shared state. If there is no conflict, in step 825 the cache line transitions to the buffered and read-monitored state; that is, the coherency bits of the cache line are updated to the buffered coherency value and the read monitor attribute is set.

In step 830, conflicting writes to the cache line are detected based on the read monitoring. In one embodiment, the read attribute is coupled to snoop logic, so that an external read-for-ownership request to the cache line is detected as a conflict with the read monitor set on the line.

Subsequently, if it is determined in step 835 that the block should be committed as part of the transaction's state, write monitoring is applied in step 840. Here, a read-for-ownership request is sent to other processing elements in step 845. In step 850, each such processing element either detects a conflict, because it holds the cache line in a read-monitored or write-monitored state, or transitions its copy to the invalid state. Because the read-for-ownership request exposes any remaining conflict at this point, the cache line is now in a committable state.

It is therefore beneficial to move a buffered but unmonitored block to a committable state in two stages, namely steps 810 and 840. Deferring the acquisition of ownership by acquiring read monitoring and then write monitoring in stages allows multiple concurrent transactions to update the same block with a lower rate of conflicts. If a transaction does not reach its commit stage for any reason, its having upgraded the block to the buffered and read-monitored state does not unnecessarily abort another transaction that does reach its commit stage. Deferring acquisition of exclusive ownership of a block until the commit stage is thus one way to increase concurrency among threads without sacrificing the validity of the data.
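The two-stage transition can be illustrated with a simplified software model of one cache line as seen by several processing elements. This is a sketch under stated assumptions rather than the coherence protocol itself: the `Line` class, the `conflict_seen` flag, and both function names are hypothetical, and coherence states such as shared and invalid are reduced to clearing a remote copy's attributes.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """One processing element's copy of a cache line (simplified)."""
    buffered: bool = False
    read_monitor: bool = False
    write_monitor: bool = False
    conflict_seen: bool = False  # stands in for a status-register conflict flag


def request_read_monitor(mine, others):
    """Stage 1: send a read request, then become buffered and read monitored."""
    conflict = False
    for other in others:
        if other.write_monitor:  # a remote write monitor detects the read
            other.conflict_seen = True
            conflict = True
    if not conflict:
        mine.buffered = True
        mine.read_monitor = True
    return not conflict


def request_write_monitor(mine, others):
    """Stage 2: send a read-for-ownership request just before commit."""
    for other in others:
        if other.read_monitor or other.write_monitor:
            other.conflict_seen = True  # a remote monitor detects the request
        # The remote copy is invalidated, so its attributes are lost.
        other.buffered = other.read_monitor = other.write_monitor = False
    mine.write_monitor = True
```

In this model two elements can both hold the line buffered and read monitored without any conflict; a conflict surfaces only when one of them requests write monitoring on its way to commit.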

Table 8 below illustrates one embodiment of conflicts between two processing elements P0 and P1. For example, a line held by P1 in the buffered and read-monitored state (the R-B column) conflicts with any state in P0 in which the cache line is held with a write monitor (-W-, RW-, WB, and RWB), as shown by the x marks in the intersecting cells.

Table 9 below shows which of its associated properties processing element P1 loses in response to the operations listed in the P0 column. For example, if P1 holds a line in the buffered and read-monitored state (the R-B column) and a store operation or a set-write-monitor operation occurs on P0, then P1 loses both the read monitor attribute and the buffering attribute of that line, as indicated by x-x at the intersection of the Store/Set WM row and the R-B column.

<Branch instruction for transactional data loss or conflict (JLOSS)>
FIG. 9 illustrates an embodiment of hardware supporting a loss instruction that jumps to a destination label based on a status value in a transaction status register. In one embodiment, the hardware accelerates the procedure for checking the consistency of a transaction. As an example, the hardware may support consistency checks by providing a mechanism that tracks the loss of monitored or buffered data from the cache, that is, the eviction of a buffered or monitored line, or a mechanism that tracks conflicting accesses to such data, that is, monitors that detect conflicting snoops such as a read-for-ownership request to a monitored line.

In addition, in one embodiment the hardware provides an architectural interface that lets software access these mechanisms based on the status of the monitored or buffered data. Two such interfaces are (1) instructions to read and write a status register, which let software explicitly poll the register during execution, and (2) an interface that lets software set up a handler that is invoked whenever the status register indicates a potential loss of consistency.

In another embodiment, the hardware supports a new instruction, called JLOSS, that performs a conditional branch based on the status of the hardware's monitored or buffered data. The JLOSS instruction branches to a label if the hardware has detected that monitored or buffered data was lost from the cache, or has detected a conflict on monitored or buffered data. The label may be any destination, such as the address of a handler or of other code to be executed as a result of the data loss or conflict detection.

As an illustrative embodiment, FIG. 9 shows decoder 910, which recognizes JLOSS as part of the processor's ISA and decodes the instruction so that processor logic performs a conditional branch based on the status of the transaction. As an example, the status of the transaction is held in transaction status register (TSR) 915. The transaction status register may represent the status of the transaction, for example that the hardware has detected a conflict or a loss of data, both referred to herein as loss events. To illustrate, the conflict flag in TSR 915 is set when a monitor indicates that an address is monitored and a snoop to that monitored address occurs; the conflict flag thus indicates that a conflict has been detected. Similarly, the loss flag is set in response to a loss of data, such as the eviction of a line containing transactional data or metadata.

Accordingly, when JLOSS is decoded and executed, it tests the flags in the status register, and if a loss event, that is, a loss and/or a conflict, has occurred, logic 925 provides the label referenced by JLOSS to execution resources 930 as the jump destination address. With a single instruction, software can therefore determine the status of the transaction and, based on that status, direct execution to the label specified by the instruction. Because JLOSS checks consistency, reporting false conflicts is acceptable; JLOSS may conservatively report that a conflict has occurred.
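The behavior just described, testing the status flags and redirecting execution on a loss event, can be modelled in a few lines. This is an illustrative sketch only: the `Tsr` class, the `snoop` and `evict` helpers, and the representation of labels as integer program-counter values are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Tsr:
    """Hypothetical model of the transaction status register flags."""
    conflict: bool = False  # set when a snoop hits a monitored address
    loss: bool = False      # set when a tracked line is evicted


def snoop(tsr, monitored, address):
    """An external snoop sets the conflict flag if the address is monitored."""
    if address in monitored:
        tsr.conflict = True


def evict(tsr, tracked, address):
    """Evicting a monitored or buffered line sets the loss flag."""
    if address in tracked:
        tsr.loss = True


def jloss(tsr, label, next_pc):
    """Return the next program counter: the label on a loss event, else fall through."""
    return label if (tsr.conflict or tsr.loss) else next_pc
```

Note that no destination register is involved: the only inputs are the flags and the label, which mirrors why a single branch can replace an explicit status read followed by a compare and branch.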

In one embodiment, software such as a compiler inserts JLOSS instructions into program code to poll for consistency. Although JLOSS may be used in line with main application code, JLOSS instructions are more often used within read barriers and write barriers provided in libraries, to determine consistency on demand. Execution of program code may therefore include a compiler inserting JLOSS into the code, executing JLOSS from program code, or any other form of inserting or executing the instruction. Polling with JLOSS is expected to be considerably faster than explicitly reading the status register, because the JLOSS instruction needs no additional register; that is, no destination register is required to receive the status information, as it would be for an explicit read. In embodiments of this instruction, the condition under which consistency is checked is either specified explicitly in the instruction or specified implicitly in a separate control register.

As an example, transaction status register 915 or another storage element holds specific conflict and loss status information, for example: a read conflict, if another agent wrote to a read-monitored location; a write conflict, if another agent read or wrote a write-monitored location; a loss of physical transactional data; or a loss of metadata. Different versions of the JLOSS instruction may therefore be used. For example, a JLOSS.rm <label> instruction branches to the label if another agent has written to a read-monitored location. A hardware-accelerated STM (HASTM) can use this JLOSS.rm instruction to accelerate consistency checks. Whenever the consistency of the read set is to be checked, for example after each transactional load in a native-code TM system, JLOSS.rm quickly checks for conflicting updates to the read set. Because the read set may be validated by a JLOSS in the read barrier, the JLOSS instruction is either inserted into the read barrier in a library or inserted in line with main application code after a load operation. Just as the JLOSS.rm instruction detects writes to read-monitored locations, a JLOSS.wm instruction may be used to detect reads or writes to write-monitored locations. As yet another example, on a processor that can buffer locations, a JLOSS.buf instruction may be used to determine whether buffered data has been lost and, if so, to jump to the specified label.

The pseudo code below, Pseudo Code A, shows a read barrier for a native-code STM that provides read set consistency and uses JLOSS. The setrm(void* address) function sets the read monitor on a given address, and the jloss_rm() function is an intrinsic for the JLOSS instruction that returns true if a conflicting access to a read-monitored location has occurred. This pseudo code monitors the loaded data, but it could instead monitor the transaction record (ownership record). An instruction that combines setting the read monitor with loading the data may also be used, for example a movxm instruction that both loads and monitors the data. The same approach can be used in a read barrier that performs filtering in addition to monitoring, and in an STM system that relies solely on hardware monitoring to validate the read set, that is, an STM system that performs no software read logging and no software validation.
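The read-barrier pattern described here can be modelled in software as follows. This is only an illustrative sketch and is not the patent's Pseudo Code A: the `Core` class, `remote_write`, `read_barrier`, and the `Abort` exception are hypothetical, while `setrm` and `jloss_rm` are modelled after the functions named in the text. The sketch assumes that the barrier sets the monitor before loading and aborts on any reported conflict.

```python
class Abort(Exception):
    """Raised when the read set is found to be inconsistent."""


class Core:
    """Hypothetical model of one core's read monitors over a shared memory."""

    def __init__(self, memory):
        self.memory = memory
        self.read_monitored = set()
        self.rm_conflict = False

    def setrm(self, address):
        """Set the read monitor on an address."""
        self.read_monitored.add(address)

    def jloss_rm(self):
        """Return True if a read-monitored location has been written remotely."""
        return self.rm_conflict

    def remote_write(self, address, value):
        """Another agent's store; trips the conflict flag if the address is monitored."""
        self.memory[address] = value
        if address in self.read_monitored:
            self.rm_conflict = True


def read_barrier(core, address):
    core.setrm(address)           # monitor first so that a later write is noticed
    value = core.memory[address]  # optimistic load
    if core.jloss_rm():           # has any read-monitored location been written?
        raise Abort(address)
    return value
```

Because the check covers every monitored location rather than only the one just loaded, a single call validates the whole read set accumulated so far.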

<Pseudo Code A: in-place update STM, optimistic reads, native-code read barrier>

Similarly, an STM system that does not maintain read set consistency, for example an STM for managed code, may avoid infinite loops or other incorrect control flow, such as exceptions, caused by inconsistency by inserting JLOSS.rm instructions at loop back edges or at other critical control flow points, for example at instructions that may raise an exception.

The pseudo code below, Pseudo Code B, shows another native-code read barrier that provides consistency. This version of the TM system uses a cache-resident write set, with buffered updates for writes inside a transaction. Because a read from a location that was buffered and subsequently lost (evicted) is inconsistent, this read barrier avoids reading from lost buffered locations in order to maintain consistency. The COMMIT_LOCKING flag is true if the STM uses commit-time locking for buffered locations. If commit-time locking is not used, the jloss_buf() check is applied to reads from previously locked locations; otherwise it is applied to all reads.

<Pseudo Code B: in-place update, native-code STM read barrier>

As described above, a TM system may combine read monitoring with buffering and write monitoring, so maintaining consistency may further include checking for conflicts on monitored or buffered lines. To support such systems, different embodiments may provide JLOSS flavors that branch on logical combinations of the various monitoring and buffering events, such as JLOSS.rm.buf (conflict on a read-monitored or buffered location), JLOSS.rm.wm (conflict on a read-monitored or write-monitored location), or JLOSS.* (conflict on a read-monitored, write-monitored, or buffered location).

In another embodiment, the architectural interface decouples the JLOSS instruction from the condition on which it branches by letting software set the condition, that is, a conflict on a read-monitored, write-monitored, or buffered line, in a separate control register. This embodiment requires only a single encoding of the JLOSS instruction and can also support future extensions to the set of events on which JLOSS branches.
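Both approaches, flavors encoded in the instruction and a condition held in a control register, reduce to testing a mask of events against the recorded status. The sketch below illustrates this with Python's `enum.Flag`; the `Event` type, the `FLAVORS` table, and `should_branch` are hypothetical names, and treating each flavor as a bitwise OR of events is an assumption consistent with the "or" wording of each flavor's description.

```python
from enum import Flag, auto


class Event(Flag):
    RM = auto()   # conflict on a read-monitored location
    WM = auto()   # conflict on a write-monitored location
    BUF = auto()  # loss of a buffered location


# Per-flavor condition masks, as if encoded in the instruction itself.
FLAVORS = {
    "JLOSS.rm": Event.RM,
    "JLOSS.wm": Event.WM,
    "JLOSS.buf": Event.BUF,
    "JLOSS.rm.buf": Event.RM | Event.BUF,
    "JLOSS.rm.wm": Event.RM | Event.WM,
    "JLOSS.*": Event.RM | Event.WM | Event.BUF,
}


def should_branch(condition, status):
    """Branch if any event selected by the condition has been recorded in status."""
    return bool(condition & status)
```

In the decoupled variant, `condition` would come from a control register written by software instead of from the `FLAVORS` table, so new events could be added to `Event` without defining new instruction encodings.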

FIG. 10 is a flowchart for an embodiment of a method of executing a loss instruction that jumps to a destination label based on a conflict or on a loss of specific information. In one embodiment, a JLOSS instruction is received in step 1005. As described above, the JLOSS instruction may be inserted by a programmer or a compiler into main code, for example after a load operation to ensure read set consistency, or into a barrier, for example a read barrier or a write barrier. In one embodiment, the JLOSS instruction and its variants described above are recognizable as part of the processor's ISA; that is, the decoders can decode the opcode of the JLOSS instruction.

In step 1010, it is determined whether a conflict or a loss of information has occurred. In one embodiment, the type of conflict or loss depends on the version of the JLOSS instruction. For example, if the received instruction is a JLOSS.rm instruction, it is determined whether a conflict has occurred between a read-monitored line and an external write access. However, as described above, any version of JLOSS may be received, including a JLOSS instruction that lets the user specify the condition in a control register.

Thus, once the condition has been established, from either the control register or the type of JLOSS instruction, it is determined whether that condition is met. As a first example, information in a transaction status register such as TSR 915 is used to determine whether the condition is met. Here, TSR 915 may include a read monitor status flag that is set to a no-conflict value by default and is updated to a conflict value to indicate that a conflict has occurred. However, determining whether a conflict has occurred is not limited to the use of a status register; any known method of determining a loss or a conflict may be used.

If no conflict is detected, for example if the read monitor conflict flag in TSR 915 is still set to its default value, a value representing “false” is returned in step 1025 and execution continues normally. If, however, a conflict or loss is detected, for example if the read monitor conflict flag is set, JLOSS returns a value representing “true” in step 1015, and in step 1020 execution jumps to the label defined by the received JLOSS instruction.

<Hardware support for committing transactional memory>
As described above, hardware-supported transactions may accelerate software versioning by buffering transactional writes in the cache without making them globally visible. In that case, a simple commit instruction that makes the buffered values visible to all processors may be used; the instruction fails if any of the buffered lines has been lost. However, because hardware can also hold metadata that software uses for acceleration, such as filters for eliding redundant barriers, it is desirable for the commit instruction to fail if the hardware has detected a conflict. It is also desirable, upon commit, to clear various combinations of the information that hardware holds for the transaction, for example metadata, monitors, or buffered lines.

このため、一実施形態によると、ハードウェアは、さまざまな形態のコミット命令をサポートして、コミット命令に、コミットするための条件およびコミット時にクリアする情報の両方を特定させる。図１１は、ハードウェアがコミット命令でコミット条件およびクリア制御の定義をサポートする一般的な場合の実施形態を示す。 Thus, according to one embodiment, the hardware supports various forms of commit instructions, causing the commit instruction to specify both the conditions for committing and the information to clear upon commit. FIG. 11 illustrates a general case embodiment where the hardware supports definition of commit conditions and clear control with a commit instruction.

同図に示すように、コミット命令１１０５は、プロセッサのＩＳＡの一部として認識可能な、つまり、デコーダ１１１５がデコード可能であるオペコード１１１０を含む。図示している例では、オペコード１１１０は、コミット条件１１１１およびクリア制御１１１２の２つの部分を含む。コミット条件１１１１は、コミットすべきトランザクションの条件を特定しており、コミットクリア制御１１１２は、トランザクションのコミット時にクリアすべき情報を特定している。 As shown in the figure, the commit instruction 1105 includes an operation code 1110 that can be recognized as a part of the ISA of the processor, that is, can be decoded by the decoder 1115. In the illustrated example, the opcode 1110 includes two parts: a commit condition 1111 and a clear control 1112. The commit condition 1111 specifies the condition of the transaction to be committed, and the commit clear control 1112 specifies the information to be cleared when the transaction is committed.

According to one embodiment, both parts include four values: read monitoring (RM), write monitoring (WM), buffering (Buf), and metadata (MD). In essence, if any of these four values is set in portion 1111, that is, if it holds a value indicating that the associated property is a commit condition, then the corresponding property becomes a condition for committing. In other words, if the first bit of condition 1111, which corresponds to read monitoring, is set, then loss of any read-monitored data of the transaction in monitors 1135 results in an abort: the commit is not performed because the condition specified by the commit instruction is not satisfied. Similarly, if a value in portion 1112 is set, the corresponding property is cleared upon commit. Continuing the example, if RM is set in portion 1112, the read-monitoring information in monitors 1135 for the transaction is cleared when that transaction commits. In this example, the commit condition and the clear control each have four values that can be set independently (16 combinations each), so 256 versions of the commit instruction are possible. According to one embodiment, because the commit conditions are specified in the opcode, the hardware can support all of these versions.
To aid understanding of the various types of commit instruction and to illustrate how they are used, a few versions are described below.
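The count of 256 follows from treating each of the two opcode parts as four independent bits. The sketch below enumerates the variants under an assumed layout (condition bits in the upper nibble, clear bits in the lower nibble); the layout, the `Prop` enumeration, and the `encode`/`decode` helpers are illustrative assumptions, not the encoding used by the described hardware.

```python
from enum import IntFlag


class Prop(IntFlag):
    RM = 0b0001   # read monitoring
    WM = 0b0010   # write monitoring
    BUF = 0b0100  # buffering
    MD = 0b1000   # metadata


def encode(condition, clear):
    """Pack the condition and clear parts into one 8-bit field (assumed layout)."""
    return (int(condition) << 4) | int(clear)


def decode(bits):
    """Recover (condition, clear) from the packed field."""
    return Prop((bits >> 4) & 0xF), Prop(bits & 0xF)


# 16 condition settings x 16 clear settings
variants = {encode(c, k) for c in range(16) for k in range(16)}
print(len(variants))
print(decode(encode(Prop.RM | Prop.WM, Prop.BUF)))
```

A variant such as "commit only if no write-monitored or read-monitored data was lost, and clear buffering on success" then corresponds to `encode(Prop.RM | Prop.WM, Prop.BUF)`.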

<TXCOMWM>
As a first example, the Txcomwm instruction is described. If no write-monitored data has been lost (success), this instruction ends the transaction and makes all write-monitored, buffered data globally visible. If write-monitored data has been lost, the instruction fails. Txcomwm sets (or resets) a flag to indicate success (or failure). On success, Txcomwm clears the buffered state of all write-monitored data. Txcomwm does not affect the read-monitoring or write-monitoring state, so software can reuse that state in subsequent transactions. It also does not affect the state of locations that are buffered but not write monitored, so software can retain information kept in such locations. The pseudocode below, labeled pseudocode C, describes Txcomwm algorithmically. If TSR.LOSS_WM is 0, the BF property of every BBLK that is both write monitored and buffered is cleared atomically, all of that buffered data becomes visible to other agents, and TCR.IN_TX is cleared. Blocks that are buffered but not write monitored are unaffected and remain buffered. If TSR.LOSS_WM is 1, TCR.IN_TX is cleared and the instruction reports failure. The CF flag is set to 1 when the operation succeeds and to 0 when it fails. The OF, SF, ZF, AF, and PF flags are set to 0.

<Pseudocode C: Embodiment of an algorithm for the Txcomwm operation>
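The pseudocode listing itself is not reproduced in this text. As a stand-in, the following is a minimal software model of the behavior stated in the preceding description, not the original listing. The `Block` and `Core` classes and their field names are assumptions made for illustration; the model does not capture atomicity, and it omits the OF, SF, ZF, AF, and PF flags.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    wm: bool = False  # write monitored
    bf: bool = False  # buffered (not yet globally visible)


@dataclass
class Core:
    blocks: dict = field(default_factory=dict)
    loss_wm: bool = False  # models TSR.LOSS_WM
    in_tx: bool = True     # models TCR.IN_TX
    cf: int = 0            # models the CF flag


def txcomwm(core):
    """Commit if no write-monitored data was lost; report success via CF."""
    if not core.loss_wm:
        for blk in core.blocks.values():
            if blk.wm and blk.bf:
                blk.bf = False   # buffered data becomes globally visible
        core.cf = 1
    else:
        core.cf = 0
    core.in_tx = False           # cleared on both paths
    return core.cf == 1


core = Core(blocks={0x40: Block(wm=True, bf=True), 0x80: Block(wm=False, bf=True)})
print(txcomwm(core), core.blocks[0x40], core.blocks[0x80])
```

In the usage lines, the block at `0x40` loses its buffered state while keeping its write monitor, and the block at `0x80`, which is buffered but not write monitored, is left untouched.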

The pseudocode below, labeled pseudocode D, shows how a HASTM system may use the Txcomwm instruction to commit a transaction that uses hardware write buffering to avoid undo logging in an in-place update STM. The CACHE_RESIDENT_WRITES flag indicates this execution mode.

<Pseudocode D: Embodiment of pseudocode that uses the Txcomwm instruction>

<TXCOMWMRM>
One variant, Txcomwmrm, extends the Txcomwm instruction so that it also fails if any read-monitored location has been lost. This variant is useful for transactions that rely solely on hardware to detect read-set conflicts. The pseudocode below, labeled pseudocode E, describes Txcomwmrm algorithmically. If TSR.LOSS_WM and TSR.LOSS_RM are both 0, the BF property of every BBLK that is both write monitored and buffered is cleared atomically, all of that buffered data becomes visible to other agents, and TCR.IN_TX is cleared. Blocks that are buffered but not write monitored are unaffected and remain buffered. If TSR.LOSS_WM or TSR.LOSS_RM is 1, TCR.IN_TX is cleared and the instruction reports failure. The CF flag is set to 1 when the operation succeeds and cleared to 0 when it fails. The OF, SF, ZF, AF, and PF flags are set to 0.

<Pseudocode E: Embodiment of an algorithmic description of Txcomwmrm>

The pseudocode that follows, labeled pseudocode F, presents a commit algorithm that uses the txcomwmrm instruction in an STM system that uses hardware both to buffer transactional writes and to detect read-set conflicts. The HW_READ_MONITORING flag indicates whether the algorithm relies solely on hardware for read-set conflict detection.

<Pseudocode F: Embodiment of pseudocode that uses the txcomwmrm instruction>
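The listing is not reproduced in this text. One way such a commit routine could be organized is sketched below under explicit assumptions: `stm_commit`, `TransactionAborted`, and `FakeHardware` are hypothetical names, the hardware instructions are represented by methods that return a success flag, and software read-set validation is reduced to a callable. It is not the original pseudocode, and a real runtime would also need to handle locking and retry.

```python
class TransactionAborted(Exception):
    """Raised when the commit cannot complete and the transaction must retry."""


def stm_commit(hw, validate_read_set, hw_read_monitoring):
    """Commit a transaction whose writes are buffered in hardware.

    hw                 -- object exposing txcomwm() and txcomwmrm(), each returning bool
    validate_read_set  -- callable performing software read-set validation
    hw_read_monitoring -- plays the role of the HW_READ_MONITORING flag
    """
    if hw_read_monitoring:
        # Hardware alone detects read-set conflicts: one instruction checks both.
        committed = hw.txcomwmrm()
    else:
        # Software validates the read set; hardware only checks write monitors.
        committed = validate_read_set() and hw.txcomwm()
    if not committed:
        raise TransactionAborted


class FakeHardware:
    """Test double standing in for the commit instructions."""

    def __init__(self, lost_rm=False, lost_wm=False):
        self.lost_rm, self.lost_wm = lost_rm, lost_wm

    def txcomwm(self):
        return not self.lost_wm

    def txcomwmrm(self):
        return not (self.lost_wm or self.lost_rm)


stm_commit(FakeHardware(), lambda: True, hw_read_monitoring=True)
```

The design point the sketch tries to show is that the flag selects which commit instruction is issued, so that loss of a read monitor matters only when hardware is the sole read-set conflict detector.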

<TXCOMWMIRMC>
A third variant is described algorithmically below as pseudocode F. If TSR.LOSS_WM and TSR.LOSS_IRM are both 0, the BF property of every BBLK that is both write monitored and buffered is cleared atomically, and all of that buffered data becomes visible to other agents. RM, WM, and IRM are cleared, as is TCR.IN_TX. Blocks that are buffered but not write monitored are unaffected and remain buffered. If TSR.LOSS_WM or TSR.LOSS_RM is 1, TCR.IN_TX is cleared and the instruction reports failure. The CF flag is set to 1 when the operation succeeds and cleared to 0 when it fails. The OF, SF, ZF, AF, and PF flags are set to 0.

<Pseudocode F: Embodiment of an algorithmic description of the Txcomwmirmc instruction>

FIG. 12 is a flowchart illustrating an embodiment of a method for executing a commit instruction that defines commit conditions and clear controls. In step 1205, a commit instruction is received. As described above, a compiler may insert the commit instruction into program code. As one illustrative example, a call to a commit function is inserted into the main code, the commit function, such as those in the pseudocode above, is provided in a library, and the compiler may also insert the commit instruction into that commit function within the library.

Once the commit instruction is received, a decoder can decode it. In step 1210, the conditions specified by the opcode of the commit instruction are determined from the decoded information. As described above, the opcode may indicate which conditions apply to the commit by setting some flags and resetting others. If the conditions are not satisfied, “false” is returned, and the transaction may be aborted separately. If the conditions for commit are satisfied, for example because none of the specified combination of read monitoring, write monitoring, metadata, and/or buffering has been lost, then in step 1215 the clear conditions/controls are determined. As an example, any combination of the transaction's read monitoring, write monitoring, metadata, and/or buffering is determined to be cleared. In step 1225, the information so identified is cleared.
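The flow of FIG. 12 can be summarized in a few lines of code. The sketch below assumes the opcode has already been decoded into two sets of property names; the function name `execute_commit`, the string property names, and the dictionary of held state are all illustrative choices rather than part of the described hardware, and making buffered data visible is not modeled.

```python
def execute_commit(conditions, clears, lost, held):
    """Model the FIG. 12 flow for one decoded commit instruction.

    conditions -- properties that must not have been lost (from the opcode)
    clears     -- properties to clear when the commit succeeds (from the opcode)
    lost       -- properties the hardware reports as lost
    held       -- dict mapping property name to the state held for the transaction
    """
    if conditions & lost:        # condition check (step 1210)
        return False             # "false"; software may abort separately
    for prop in clears:          # clear determination and clearing (steps 1215, 1225)
        held[prop] = None
    return True


held = {"RM": {0x10}, "WM": {0x20}, "BUF": {0x20}, "MD": {0x30}}
print(execute_commit({"WM"}, {"BUF"}, lost={"RM"}, held=held), held)
```

In the usage line, a lost read monitor does not prevent the commit because only WM was named as a condition, and only the buffering state is cleared.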

<Optimization of UTM memory management>
As described above, the unlimited transactional memory (UTM) architecture and its hardware implementation extend the processor architecture by introducing the properties of monitoring, buffering, and metadata. This provides software with the means to implement a variety of sophisticated algorithms, including a wide range of transactional memory designs. Each property may be implemented in hardware either by extending existing cache protocols in the cache implementation or by allocating new, independent hardware resources.

When the UTM properties are implemented in hardware, performance may improve relative to a software-only transactional memory (STM) if the UTM architecture and its hardware implementation can avoid or minimize UTM transaction aborts and the transaction retries that follow. One of the main causes of hardware transaction aborts has been frequent ring transitions caused by external interrupts, system calls, and page faults.

One approach is a suspend mechanism based on the current privilege level (CPL), in which hardware transactions are active (hardware-accelerated transactions with UTM properties such as buffering and monitoring are enabled, and the ejection mechanism is enabled) only while the processor operates at privilege level 3 (user mode). Any ring transition out of ring 3 automatically suspends the currently active transaction (UTM properties stop being generated and the ejection mechanism is disabled). Likewise, a ring transition back to ring 3 automatically resumes the previously suspended hardware transaction, if one was active. The drawback of this approach is that it essentially precludes the use of hardware transactional memory resources in kernel code or at any ring level other than ring 3.

別の手法では、リング０のトランザクション制御レジスタ（ＴｘＣＲ）等のＴＭ制御リソースを複製して、この別個に用意されたＴＭリソースを用いてリング０コードのハードウェアトランザクションをイネーブルできるようにする。しかし、この手法では、リング０のトランザクション処理でのネスト化された割り込みおよび例外を効率的に処理することができない。 Another approach replicates TM control resources, such as the Ring 0 transaction control register (TxCR), to enable ring 0 code hardware transactions using this separately prepared TM resource. However, this method cannot efficiently handle nested interrupts and exceptions in ring 0 transaction processing.

FIG. 13 therefore shows an embodiment of hardware that supports handling of privilege-level transitions during transaction execution. This embodiment enables ring 0 transactions in addition to user-mode (ring 3) transactions, while still allowing the OS and a hypervisor, such as a virtual machine monitor (VMM), to handle an unbounded depth of nested interrupts and NMIs in the presence of ring 0 transactions.

ＥＦＬＡＧＳレジスタ１３１０等の格納要素は、トランザクションイネーブルフィールド（ＴＥＦ）１３１１を含む。ＴＥＦ１３１１は、アクティブ値を保持する場合には、トランザクションが現在アクティブでありイネーブルされている旨を示し、非アクティブ値を保持する場合には、トランザクションが一時停止している旨を示す。 Storage elements such as the EFLAGS register 1310 include a transaction enable field (TEF) 1311. The TEF 1311 indicates that the transaction is currently active and enabled if it holds an active value, and indicates that the transaction is suspended if it holds an inactive value.

According to one embodiment, a transaction-begin operation, or another operation at the start of a transaction, sets TEF field 1311 to the active value. When a ring-level transition event such as an interrupt, exception, system call, virtual machine exit, or virtual machine entry occurs in step 1300, the state of Eflags register 1310 of PE0 is pushed onto kernel stack 1320 in step 1301. In step 1302, TEF field 1311 is cleared/updated to the inactive value to suspend the transaction. The ring-level transition event is then handled or serviced as appropriate while the transaction is suspended. When a return event is detected in step 1303, the Eflags state that was pushed onto the stack in step 1301 is popped in step 1304, restoring Eflags register 1310 to its previous state. Restoring the previous state returns TEF 1311 to the active value, and the transaction resumes as active and enabled.
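Because the flag state is saved on a stack, the same mechanism handles nested transitions without extra bookkeeping. The following toy model illustrates this; the `ProcessingElement` class, its method names, and the choice of bit 0 for TEF are assumptions for illustration only.

```python
TEF = 1 << 0  # hypothetical bit position for the transaction enable field


class ProcessingElement:
    def __init__(self):
        self.eflags = 0
        self.kernel_stack = []

    @property
    def transaction_active(self):
        return bool(self.eflags & TEF)

    def begin_transaction(self):
        self.eflags |= TEF                      # TEF set to the active value

    def end_transaction(self):
        self.eflags &= ~TEF                     # transaction finished

    def ring_transition(self):
        self.kernel_stack.append(self.eflags)   # push Eflags (step 1301)
        self.eflags &= ~TEF                     # clear TEF: transaction suspended

    def ring_return(self):
        self.eflags = self.kernel_stack.pop()   # pop Eflags (step 1304): TEF restored


pe = ProcessingElement()
pe.begin_transaction()      # user-mode transaction
pe.ring_transition()        # interrupt: user transaction suspended
pe.begin_transaction()      # ring 0 code starts its own transaction
pe.ring_transition()        # nested interrupt: ring 0 transaction suspended
pe.ring_return()            # back in the first handler: ring 0 transaction resumes
pe.end_transaction()        # ring 0 transaction completes
pe.ring_return()            # back in user mode: user transaction resumes
print(pe.transaction_active)
```

Each level of nesting costs one saved copy of the flags, which is why the depth of nested interrupts is limited only by the stack.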

Specific examples of handling for ring-level transition events follow. On an interrupt or exception, the processor pushes the EFLAGS register onto the kernel stack and clears the “transaction enable” bit, if it is set, thereby suspending the previously enabled transaction. On IRET, the processor restores from the kernel stack the entire EFLAGS state of the interrupted thread, including the “transaction enable” bit, and resumes the transaction if it was previously enabled.

On SYSCALL, the processor pushes the EFLAGS register and clears the “transaction enable” bit, if it is set, thereby suspending the previously enabled transaction. On SYSRET, the processor restores from the kernel stack the entire EFLAGS state of the interrupted thread, including the “transaction enable” bit, and resumes the transaction if it was previously enabled.

On VM-Exit, the processor saves the guest's EFLAGS register, including the state of the “transaction enable” bit, to the virtual machine control structure (VMCS) and loads the host's EFLAGS state, in which the “transaction enable” bit is clear, thereby suspending the guest's previously enabled transaction, if any.

On VM-Enter, the processor restores the guest's EFLAGS register, including the state of the “transaction enable” bit, from the VMCS and resumes the guest's transaction if it was previously enabled.
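The virtual machine case differs from the interrupt case mainly in where the flags are saved: in the VMCS rather than on the kernel stack. A minimal sketch follows, assuming a hypothetical `VirtualCpu` class, a dictionary standing in for the VMCS, and bit 0 as the “transaction enable” bit; saving of host state on entry is omitted.

```python
TEF = 1 << 0  # hypothetical bit position for the "transaction enable" bit


class VirtualCpu:
    def __init__(self, guest_eflags, host_eflags=0):
        self.eflags = guest_eflags
        # Hypothetical VMCS fields; a real VMCS holds far more state.
        self.vmcs = {"guest_eflags": 0, "host_eflags": host_eflags & ~TEF}

    def vm_exit(self):
        self.vmcs["guest_eflags"] = self.eflags   # save guest state, TEF included
        self.eflags = self.vmcs["host_eflags"]    # host state has TEF clear

    def vm_enter(self):
        self.eflags = self.vmcs["guest_eflags"]   # restore guest state, TEF included


vcpu = VirtualCpu(guest_eflags=TEF)   # guest is inside a transaction
vcpu.vm_exit()
suspended = not (vcpu.eflags & TEF)
vcpu.vm_enter()
resumed = bool(vcpu.eflags & TEF)
print(suspended, resumed)
```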

With this arrangement, hardware-accelerated kernel-mode (ring 0) UTM transactions are enabled in addition to hardware-accelerated user-mode (ring 3) UTM transactions, while both the OS and the VMM remain able to handle an unbounded depth of nested interrupts and NMIs in the presence of ring 0 transactions. The prior art provides no such mechanism.

“Module”, as used herein, refers to any hardware, software, firmware, or combination thereof. Module boundaries that are shown as separate often vary and may overlap. For example, a first module and a second module may share hardware, software, firmware, or a combination thereof, while each possibly retaining some independent hardware, software, or firmware. In one embodiment, the term “logic” includes hardware such as transistors and registers, or other hardware such as programmable logic devices. In another embodiment, however, “logic” also includes software or code integrated with hardware, such as firmware or microcode.

“Value”, as used herein, includes any known representation of a number, a state, a logical state, or a binary logical state. Logic levels or logical values are often written as 1s and 0s, which simply represent binary logic states. For example, 1 refers to a logical high level and 0 to a logical low level. In one embodiment, a storage cell such as a transistor or flash cell may be capable of holding a single logical value or multiple logical values. Other representations of values are also used in computer systems. For example, the decimal number ten is written as 1010 in binary and as the letter A in hexadecimal. A value therefore includes any representation of information that can be held in a computer system.

また、値または値の一部で状態を表現するとしてもよい。一例として、論理値「１」等の第１の値は、デフォルト状態または初期状態を表す一方、論理値「０」等の第２の値はデフォルトでない状態を表すとしてよい。また、「リセット」および「設定」という用語はそれぞれ、一実施形態によると、デフォルト値またはデフォルト状態、および、更新後の値または更新後の状態を意味する。例えば、デフォルト値は論理Ｈｉｇｈ値、つまり、リセット値を含み、更新後の値は論理Ｌｏｗ値、つまり、設定値を含む。尚、値を任意に組み合わせて、任意の数の状態を表現するとしてよい。 Further, the state may be expressed by a value or a part of the value. As an example, a first value such as a logical value “1” may represent a default or initial state, while a second value such as a logical value “0” may represent a non-default state. Also, the terms “reset” and “setting” respectively refer to a default value or default state, and an updated value or updated state, according to one embodiment. For example, the default value includes a logical high value, that is, a reset value, and the updated value includes a logical low value, that is, a set value. Any number of states may be expressed by arbitrarily combining values.

上述した方法、ハードウェア、ソフトウェア、ファームウェア、または、コードの実施形態は、処理要素で実行可能な機械アクセス可能媒体または機械可読媒体に格納されている命令またはコードで実装されるとしてよい。機械アクセス可能／可読媒体は、コンピュータまたは電子システム等の機械が読出可能な状態で情報を提供する（つまり、格納および／または送信する）任意のメカニズムを含む。例えば、機械アクセス可能媒体は、スタティックランダムアクセスメモリ（ＳＲＡＭ）またはダイナミックＲＡＭ（ＤＲＡＭ）等のＲＡＭ、ＲＯＭ、磁気格納媒体または光格納媒体、フラッシュメモリデバイス、電気ストレージデバイス、光ストレージデバイス、音響ストレージデバイスまたはその他の形態の伝播信号（例えば、搬送波、赤外線信号、デジタル信号）ストレージデバイス等を含む。例えば、機械は、搬送波等の伝播信号を、当該伝播信号で送信される情報を保持可能な媒体から受信することによってストレージデバイスにアクセスするとしてよい。 The method, hardware, software, firmware, or code embodiments described above may be implemented with instructions or code stored on a machine-accessible or machine-readable medium that is executable on a processing element. A machine-accessible / readable medium includes any mechanism that provides (ie, stores and / or transmits) information in a state readable by a machine, such as a computer or electronic system. For example, the machine accessible medium is a RAM such as static random access memory (SRAM) or dynamic RAM (DRAM), ROM, magnetic storage medium or optical storage medium, flash memory device, electrical storage device, optical storage device, acoustic storage device Or other forms of propagated signal (eg, carrier wave, infrared signal, digital signal) storage devices and the like. For example, a machine may access a storage device by receiving a propagation signal, such as a carrier wave, from a medium capable of holding information transmitted with the propagation signal.

Reference in this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with that embodiment is included in at least one embodiment of the invention. Thus, the phrases “in one embodiment” and “in an embodiment”, which appear repeatedly in this specification, do not necessarily all refer to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

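In the foregoing description, the invention has been described in detail with reference to specific exemplary embodiments. It will be evident, however, that various modifications and changes may be made to those embodiments without departing from the broader spirit and scope of the invention as set forth in the claims. The specification and drawings are accordingly to be regarded as illustrative rather than restrictive. Furthermore, the foregoing use of “embodiment” and other exemplary language does not necessarily refer to the same embodiment or the same example; it may refer to different and distinct embodiments, and potentially to the same embodiment.
According to the present embodiments, the following items are also disclosed.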
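(Item 1)
An apparatus comprising:
a plurality of processing elements; and
metaphysical logic,
wherein a processing element of the plurality of processing elements is associated with a plurality of software subsystems, and
wherein the metaphysical logic associates a metadata access operation that is associated with a current software subsystem of the plurality of software subsystems and that references a data address with a metaphysical address space associated with the current software subsystem, based at least on a metadata identifier (MDID) associated with the current software subsystem and on the data address.
(Item 2)
The apparatus of item 1, wherein the metaphysical address space associated with the current software subsystem is orthogonal to at least one other metaphysical address space associated with a second software subsystem of the plurality of software subsystems and to a data address space that includes the data address.
(Item 3)
The apparatus of item 2, wherein each of the plurality of software subsystems is selected from the group consisting of a transactional runtime subsystem, a garbage collection runtime subsystem, a memory protection subsystem, a software translation subsystem, an outer transaction of a nested group of transactions, and an inner transaction of a nested group of transactions.
(Item 4)
The apparatus of item 1, further comprising decode logic to decode the metadata access operation,
wherein the metadata access operation includes an operation code (opcode) recognized as one of a plurality of operations supported within the decode logic.
(Item 5)
The apparatus of item 1, wherein the metaphysical logic includes metaphysical translation logic to translate the data address, based at least on the MDID, to a metadata address within the metaphysical address space associated with the current software subsystem.
(Item 6)
The apparatus of item 5, wherein the metaphysical translation logic translates the data address to the metadata address within the metaphysical address space associated with the current software subsystem further based on a processing element identifier (PEID) associated with the processing element.
(Item 7)
The apparatus of item 6, wherein the metaphysical translation logic translates the data address to the metadata address within the metaphysical address space associated with the current software subsystem further based on a compression ratio of data to metadata.
(Item 8)
The apparatus of item 6, further comprising a register modifiable by the current software subsystem,
wherein the register, in response to a write from the current software subsystem, holds the MDID to indicate that the current software subsystem is currently executing on the processing element, and
wherein the metaphysical translation logic translating the data address, based on the PEID and the MDID, to the metadata address within the metaphysical address space associated with the current software subsystem comprises the metaphysical translation logic combining information representing the data address with the PEID and the MDID.
(Item 9)
The apparatus of item 8, wherein the metaphysical translation logic combines the information representing the data address with the PEID and the MDID according to a combination algorithm selected from the group consisting of: an algorithm that appends the PEID and the MDID to the data address to form the metadata address; an algorithm that translates the data address to a translated data address using normal data translation tables and appends the PEID and the MDID to the translated data address to form the metadata address; and an algorithm that translates the data address to a translated metadata address using metaphysical translation tables separate from the normal data translation tables and appends the PEID and the MDID to the translated metadata address to form the metadata address.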
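(Item 10)
A method comprising:
encountering a metadata operation that references a data address within a data address space, the data address being associated with a data item held in a data entry of a cache memory;
determining a metadata address within a metaphysical address space separate from the data address space, based on the data address, a processing element identifier (PEID) of a processing element associated with the metadata operation, and a metadata identifier (MDID) of a software subsystem associated with the processing element; and
accessing a metadata entry of the cache memory based on the metadata address.
(Item 11)
The method of item 10, wherein the metaphysical address space is also separate from an additional metaphysical address space associated with an additional software subsystem that is likewise associated with the processing element.
(Item 12)
The method of item 10, wherein the software subsystem is selected from the group consisting of a transactional runtime subsystem, a garbage collection runtime subsystem, a memory protection subsystem, a software translation subsystem, an outer transaction of a nested group of transactions, and an inner transaction of a nested group of transactions.
(Item 13)
The method of item 10, further comprising:
writing the MDID to a control register associated with the processing element in response to encountering a write operation to the control register from the software subsystem, the write operation being responsive to the software subsystem currently executing on the processing element; and
determining the MDID based on the control register.
(Item 14)
The method of item 13, further comprising determining the PEID based on a portion of an opcode of the metadata operation.
(Item 15)
The method of item 13, wherein determining the metadata address from the data address, the PEID, and the MDID comprises combining the data address, the PEID, and the MDID using an algorithm selected from the group consisting of: an algorithm that appends the PEID and the MDID to the data address to form the metadata address; an algorithm that translates the data address to a translated data address using normal data translation tables and appends the PEID and the MDID to the translated data address to form the metadata address; and an algorithm that translates the data address to a translated metadata address using metaphysical translation tables separate from the normal data translation tables and appends the PEID and the MDID to the translated metadata address to form the metadata address.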
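(Item 16)
An apparatus comprising:
decode logic to decode a metadata access instruction that references a data address of a data item; and
metadata logic to translate the data address, transparently to software, to a distinct metadata address and, in response to the decode logic decoding the metadata access instruction, to access the metadata referenced by the distinct metadata address,
wherein the metadata access instruction includes an opcode recognizable as part of an instruction set that the decode logic is able to decode.
(Item 17)
The apparatus of item 16, wherein the metadata access instruction is selected from the group of instructions consisting of a metadata bit test and set (MDLT) instruction, a metadata store and set (MSS) instruction, and a metadata store and reset (MDSR) instruction.
(Item 18)
The apparatus of item 16, wherein the metadata access instruction is selected from the group of instructions consisting of a compressed metadata test (CMDT) instruction, a compressed metadata store (CMS) instruction, and a compressed metadata clear (CMDCLR) instruction.
(Item 19)
The apparatus of item 16, wherein the metadata logic translating the data address, transparently to software, to the distinct metadata address comprises translating the data address based at least on a metadata identifier (MDID) that is associated with the metadata access instruction and is specified in a control register by a software subsystem.
(Item 20)
The apparatus of item 16, wherein the metadata access instruction further includes a reference to a destination register, and
wherein the metadata logic accessing the metadata referenced by the distinct metadata address comprises the metadata logic loading the metadata at the referenced distinct metadata address into the destination register.
(Item 21)
The apparatus of item 20, wherein the opcode includes a thread identifier field that identifies the thread that issued the metadata access instruction.
(Item 22)
The apparatus of item 20, wherein the metadata logic accessing the metadata referenced by the distinct metadata address further comprises the metadata logic setting the metadata at the referenced distinct metadata address to a set value in response to the metadata loaded into the destination register being an unset value.
(Item 23)
The apparatus of item 22, wherein the set value and the unset value are defined by the metadata access instruction.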
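(Item 24)
A machine-readable medium holding program code that, when executed by a machine, causes the machine to:
generate, in response to a data access operation that references a data address, a metadata access operation that references the data address at the data access operation,
wherein the metadata access operation, when executed by the machine, causes the machine to:
translate the data address to a metadata address distinct from the data address; and
access metadata of the data item at the data address based on the metadata address.
(Item 25)
The machine-readable medium of item 24, wherein the metadata access operation is selected from the group of instructions consisting of a metadata bit test and set (MDLT) instruction, a metadata store and set (MSS) instruction, and a metadata store and reset (MDSR) instruction.
(Item 26)
The machine-readable medium of item 24, wherein the metadata access operation is selected from the group of compressed instructions consisting of a compressed metadata test (CMDT) instruction, a compressed metadata store (CMS) instruction, and a compressed metadata clear (CMDCLR) instruction.
(Item 27)
The machine-readable medium of item 26, wherein the machine translating the data address to the metadata address when the metadata access operation is executed by the machine comprises the machine combining, based on a compression ratio of data to metadata, the data address with a processing element identifier (PEID) associated with the metadata access operation and a metadata identifier (MDID) associated with the metadata access operation.
(Item 28)
The machine-readable medium of item 27, wherein the data address is also translatable by virtual-to-physical address translation logic of the machine to reference the data item.
(Item 29)
The machine-readable medium of item 24, wherein the metadata access operation further references an operand register, and
wherein the machine accessing the metadata of the data item when the metadata access operation is executed by the machine comprises the machine updating the metadata of the data item with a value held in the operand register.
(Item 30)
The machine-readable medium of item 24, wherein the program code includes compiler code,
the compiler code compiles application code that includes the data access operation, and
generating the metadata access operation at the data access operation comprises generating the metadata access operation in a compiled version of the application code.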
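(Item 31)
A machine-readable medium holding program code that, when executed by a machine, causes the machine to:
translate a data address referenced by a metadata access instruction included in the program code to a metadata address, based on a metadata identifier (MDID) associated with a software subsystem that is currently active on a processing element associated with the metadata access instruction; and
access metadata based on the metadata address.
(Item 32)
The machine-readable medium of item 31, wherein the metadata access instruction is selected from the group of instructions consisting of a metadata load instruction to load the metadata, a metadata store instruction to store to the metadata, and a metadata clear instruction to reset the metadata.
(Item 33)
The machine-readable medium of item 31, wherein the software subsystem is selected from the group consisting of a transactional runtime subsystem, a garbage collection runtime subsystem, a memory protection subsystem, a software translation subsystem, an outer transaction of a nested group of transactions, and an inner transaction of a nested group of transactions.
(Item 34)
The machine-readable medium of item 31, wherein translating the data address referenced by the metadata access instruction to the metadata address based on the MDID comprises combining the MDID with the data address according to a combination algorithm selected from the group consisting of: an algorithm that appends the MDID to the data address to form the metadata address; an algorithm that translates the data address to a translated data address using normal data translation tables and appends the MDID to the translated address to form the metadata address; and an algorithm that translates the data address to a translated metadata address using metaphysical translation tables separate from the normal data translation tables and appends the MDID to the translated metadata address to form the metadata address.
(Item 35)
The machine-readable medium of item 34, wherein appending the MDID comprises an algorithm for adding the MDID selected from the group consisting of an algorithm that appends the MDID at a most significant bit (MSB) position, an algorithm that appends the MDID at a least significant bit (LSB) position, and an algorithm that replaces address bits with the MDID.
(Item 36)
The machine-readable medium of item 34, wherein the program code, when executed by the machine, further causes the machine to determine the MDID based on a control register for the processing element that indicates the current software subsystem currently active on the processing element.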
（項目３７）
データアイテムに対応付けられているデータメモリアドレスを参照するメタデータアクセス命令を含むプログラムコードを保持するメモリと、
前記メモリに対応付けられているプロセッサと
を備え、
前記プロセッサは、
前記メタデータアクセス命令の実行に対応付けられる複数の処理要素のうちの一の処理要素と、
前記メモリから前記メタデータアクセス命令をフェッチするフェッチロジックと、
前記メタデータアクセス命令を少なくとも一のメタデータアクセス処理にデコードするデコードロジックと、
前記処理要素におけるアクティブコンテクストに対応付けられているメタデータ識別子（ＭＤＩＤ）を保持する制御レジスタと、
前記データアイテムを保持するためのデータエントリを含むデータキャッシュメモリと、
前記メタデータアクセス処理を実行する実行ロジックと
を有し、
前記実行ロジックが前記メタデータアクセス処理を実行することは、前記プロセッサが有するメタフィジカルアドレス変換ロジックが、前記制御レジスタに保持されている前記ＭＤＩＤに基づいて、前記データメモリアドレスをメタデータメモリアドレスに変換することを含み、前記データキャッシュメモリに結合されているキャッシュ制御ロジックが、前記メタデータメモリアドレスに基づいて、前記データキャッシュメモリの別のエントリに対して前記メタデータアクセス処理を実行することを含むシステム。
（項目３８）
前記メタデータアクセス命令は、前記メタデータをロードするメタデータロード命令、前記メタデータにストアするメタデータストア命令、および、前記メタデータをリセットするメタデータクリア命令から成る命令群から選択される項目３７に記載のシステム。
（項目３９）
前記アクティブコンテクストは、トランザクショナル・ランタイム・サブシステム、ガベージ・コレクション・ランタイム・サブシステム、メモリ保護サブシステム、ソフトウェア変換サブシステム、ネスト化された一群のトランザクションのうち一の外側トランザクション、および、ネスト化された一群のトランザクションのうち一の内側トランザクションから成る群から選択される項目３７に記載のシステム。
（項目４０）
前記プロセッサが有する前記メタフィジカルアドレス変換ロジックが前記データメモリアドレスをメタデータメモリアドレスに変換することはさらに、前記処理要素のための処理要素識別子（ＰＥＩＤ）に基づいて行われ、
前記プロセッサが有する前記メタフィジカルアドレス変換ロジックが、前記ＰＥＩＤおよび前記制御レジスタに保持されている前記ＭＤＩＤに基づいて、前記データメモリアドレスをメタデータメモリアドレスに変換することは、前記ＰＥＩＤおよび前記ＭＤＩＤを前記データメモリアドレスに追加して前記メタデータメモリアドレスを形成するアルゴリズム、通常のデータトランザクションテーブルを用いて前記データメモリアドレスを変換後データメモリアドレスに変換して、前記ＰＥＩＤおよび前記ＭＤＩＤを前記変換後データメモリアドレスに追加して前記メタデータメモリアドレスを形成するアルゴリズム、および、通常のデータ変換テーブルとは別のメタフィジカル変換テーブルを用いて前記データメモリアドレスを変換後メタデータメモリアドレスに変換して、前記ＰＥＩＤおよび前記ＭＤＩＤを前記変換後メタデータメモリアドレスに追加して前記メタデータメモリアドレスを形成するアルゴリズムから成る群から選択される組み合わせアルゴリズムに基づいて、前記ＰＥＩＤおよび前記ＭＤＩＤと前記データメモリアドレスとを組み合わせることを含む項目３７に記載のシステム。
（項目４１）
プロセッサであって、
アドレスを参照するメタデータロード処理を実行する実行モジュールと、
前記メタデータロード処理に応じて、前記プロセッサが第１のモードで動作している場合にはアドレスに対応付けられているメタデータ値を提供し、前記プロセッサが第２のモードで動作している場合には固定値を提供する強制モジュールと
を備えるプロセッサ。
（項目４２）
前記第１のモードは強アトミック性モードを含み、前記第２のモードは弱アトミック性モードを含む項目４１に記載のプロセッサ。
（項目４３）
前記固定値を保持する第１のレジスタをさらに備える項目４２に記載のプロセッサ。
（項目４４）
モード値を保持する第２のレジスタをさらに備え、
前記モード値は、前記プロセッサが前記強アトミック性モードで動作している旨を示す場合には第１の値を表し、
前記モード値は、前記プロセッサが前記弱アトミック性モードで動作している旨を示す場合には第２の値を表す項目４３に記載のプロセッサ。
（項目４５）
前記第１のレジスタおよび前記第２のレジスタは、同じメタデータ制御レジスタである項目４４に記載のプロセッサ。
（項目４６）
前記強制モジュールが、前記プロセッサが前記強アトミック性モードで動作している場合にはアドレスに対応付けられているメタデータ値を提供し、前記プロセッサが前記弱アトミック性モードで動作している場合には固定値を提供することは、前記強制モジュールが、前記第２のレジスタに保持される前記モード値が前記プロセッサが前記強アトミック性モードで動作している旨を示す前記第１の値を表している場合には、前記メタデータロード処理が定めるデスティネーションレジスタに前記メタデータ値をロードして、前記第２のレジスタに保持される前記モード値が前記プロセッサが前記弱アトミック性モードで動作している旨を示す前記第２の値を表す場合には、前記第１のレジスタから前記デスティネーションレジスタへと前記固定値をロードすることを含む項目４４に記載のプロセッサ。
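Items 41 to 46 describe a metadata load that returns the metadata value associated with the address in a strong atomicity mode, and a fixed value taken from a register in a weak atomicity mode. The selection can be sketched as follows. This is only an illustration: the class name `MetadataControlRegister`, the mode encodings, and the dictionary standing in for metadata storage are assumptions, not part of the disclosure.

```python
from dataclasses import dataclass

STRONG = 0  # assumed encoding of the first mode value (strong atomicity)
WEAK = 1    # assumed encoding of the second mode value (weak atomicity)


@dataclass
class MetadataControlRegister:
    """Hypothetical single register holding the mode value and the fixed value (item 45)."""
    mode: int
    fixed_value: int


def metadata_load(address, metadata_store, mcr):
    """Return the value a metadata load would place in its destination register."""
    if mcr.mode == STRONG:
        # Strong atomicity: provide the metadata actually associated with the address.
        return metadata_store.get(address, 0)
    # Weak atomicity: ignore stored metadata and provide the fixed value instead.
    return mcr.fixed_value
```

With the fixed value chosen as "no conflict", code running in the weak mode can execute the same metadata test without ever observing stored metadata.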
（項目４７）
アドレスを参照するメタデータアクセス処理を発見する段階と、
プロセッサの実行モードを判断する段階と、
前記プロセッサの実行モードが第１の実行モードであると判断されると、前記メタデータアクセス処理に対して、前記アドレスに対応付けられているメタデータ値を提供する段階と、
前記プロセッサの実行モードが第２の実行モードであると判断されると、前記メタデータアクセス処理に対して、レジスタから固定値を提供する段階と
を備える方法。
（項目４８）
前記プロセッサの実行モードを判断する段階は、第１の制御レジスタからモードフラグを読み出す段階を有し、
前記モードフラグは、前記プロセッサの実行モードが第１の実行モードである旨を示す第１の値を保持し、前記プロセッサの実行モードが第２の実行モードである旨を示す第２の値を保持する項目４７に記載の方法。
（項目４９）
前記メタデータアクセス処理に対して、前記アドレスに対応付けられているメタデータ値を提供する段階は、前記アドレスに対応付けられているメモリ位置から、前記メタデータアクセス処理が参照しているデスティネーションレジスタへと、前記メタデータ値をロードする段階を有する項目４７に記載の方法。
（項目５０）
前記メタデータアクセス処理に対して、レジスタから固定値を提供する段階は、前記レジスタから前記デスティネーションレジスタへと前記固定値をロードする段階を有する項目４９に記載の方法。
（項目５１）
アドレスおよびデスティネーションレジスタを参照するメタデータロード処理を保持するメモリと、
前記メモリに対応付けられているプロセッサと
を備え、
前記プロセッサは、
前記メタデータロード処理を実行する実行ロジックと、
強制値を保持するメタデータレジスタと、
前記アドレスに対応付けられているメタデータ値を保持するキャッシュメモリと、
前記実行ロジックが前記メタデータロード処理を実行すると、前記プロセッサが第１のモードで動作する場合には前記メタデータ値を前記デスティネーションレジスタに提供し、前記プロセッサが第２のモードで動作している場合に前記メタデータレジスタから前記デスティネーションレジスタへと前記強制値を提供する強制ロジックと
を有するシステム。
（項目５２）
前記強制ロジックはさらに、前記プロセッサが前記第１のモードまたは前記第２のモードのいずれで動作しているかを判断する項目５１に記載のシステム。
（項目５３）
前記第１のモードは、強アトミック性モードを含み、前記第２のモードは、弱アトミック性モードを含む項目５２に記載のシステム。
（項目５４）
前記メタデータレジスタはさらに、モード値を保持し、
前記モード値は、前記プロセッサが前記第１のモードで動作している場合に第１の値を示し、前記プロセッサが前記第２のモードで動作している場合に第２の値を示し、
前記強制ロジックがさらに前記プロセッサが前記第１のモードまたは前記第２のモードのいずれで動作しているかを判断することは、前記強制ロジックが前記メタデータレジスタの前記モード値を解釈することを含む項目５２に記載のシステム。
（項目５５）
モード値を保持する制御レジスタをさらに備え、
前記モード値は、前記プロセッサが前記第１のモードで動作している場合に第１の値を示し、前記プロセッサが前記第２のモードで動作している場合に第２の値を示し、
前記強制ロジックがさらに前記プロセッサが前記第１のモードまたは前記第２のモードのいずれで動作しているかを判断することは、前記強制ロジックが前記制御レジスタの前記モード値を解釈することを含む項目５２に記載のシステム。
（項目５６）
前記メモリは、ダイナミックランダムアクセスメモリ（ＤＲＡＭ）、スタティックランダムアクセスメモリ（ＳＲＡＭ）、および、不揮発性メモリから成る群から選択される項目５１に記載のシステム。
（項目５７）
キャッシュエントリを保持するデータキャッシュアレイと、
前記データキャッシュアレイに結合されているキャッシュ制御ロジックと
を備え、
前記キャッシュ制御ロジックは、
前記キャッシュエントリに対するバッファ済み更新に応じて、前記キャッシュエントリを、監視されていない状態からバッファ済みコヒーレンシ状態および読出監視状態へと遷移させて、
その後に、前記バッファ済み更新をコミットするために前記キャッシュエントリを修正済み状態に遷移させる前に、前記キャッシュエントリを、バッファ済みコヒーレンシ状態および書込監視状態に遷移させる、装置。
（項目５８）
前記キャッシュエントリに対する前記バッファ済み更新は、前記キャッシュエントリに保持されるデータアイテムのデータアドレスに対するトランザクショナルメモリアクセス、前記キャッシュエントリに保持されるメタデータに対応付けられているデータアドレスへのメタデータアクセス、および、前記キャッシュエントリへのローカル更新から成る群から選択される更新を含む項目５７に記載の装置。
（項目５９）
前記キャッシュ制御ロジックが、前記キャッシュエントリを、監視されてない状態からバッファ済みコヒーレンシ状態および読出監視状態へと遷移させることは、前記キャッシュ制御ロジックが前記キャッシュエントリに対応付けられているコヒーレンシビットをバッファ済み値に更新して前記バッファ済みコヒーレンシ状態を表し、前記キャッシュエントリに対応付けられている読出監視属性ビットを読出監視値に更新して前記読出監視状態を表すことを含む項目５７に記載の装置。
（項目６０）
前記キャッシュ制御ロジックが、前記バッファ済み更新をコミットするために前記キャッシュエントリを修正済み状態に遷移させる前に、前記キャッシュエントリを、バッファ済みコヒーレンシ状態および書込監視状態に遷移させることは、前記キャッシュ制御ロジックが、前記キャッシュエントリに対応付けられている前記コヒーレンシビットの前記バッファ済み値を維持して前記バッファ済みコヒーレンシ状態を表し、前記キャッシュエントリに対応付けられている書込監視属性ビットを書込監視値に更新して前記書込監視状態を表すことを含む項目５９に記載の装置。
（項目６１）
前記キャッシュ制御ロジックが前記キャッシュエントリを前記修正済み状態に遷移させることは、前記キャッシュ制御ロジックが、前記キャッシュエントリに対応付けられている前記コヒーレンシビットを修正済み値に更新して前記修正済みコヒーレンシ状態を表すことを含む項目６０に記載の装置。
（項目６２）
前記バッファ済み更新を実行した後にコミット処理を実行する実行ロジックをさらに備え、
前記キャッシュ制御ロジックが、前記バッファ済み更新をコミットするために前記キャッシュエントリを修正済み状態に遷移させる前に、前記キャッシュエントリをバッファ済みコヒーレンシ状態および書込監視状態に遷移させることは、前記実行ロジックが前記コミット処理を実行することに応じて行なう項目５７に記載の装置。
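Items 57 to 62 describe a cache entry that becomes buffered and read-monitored on a buffered update, is upgraded to buffered and write-monitored before commit, and only then moves to the modified state. A minimal sketch of that ordering is given below. The `CacheEntry` class and the state labels are hypothetical, and the initial "shared" coherency value is an assumption, since the items only say the entry starts unmonitored.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    """Hypothetical model of one cache entry's coherency and monitor bits."""
    coherency: str = "shared"    # assumed starting coherency state
    read_monitor: bool = False
    write_monitor: bool = False


def buffered_update(entry):
    # Buffered update: unmonitored -> buffered coherency + read monitored.
    entry.coherency = "buffered"
    entry.read_monitor = True


def commit(entry):
    if entry.coherency != "buffered":
        raise ValueError("only a buffered entry can be committed")
    # Keep the buffered coherency value and add the write monitor first ...
    entry.write_monitor = True
    # ... then transition the entry to the modified state.
    entry.coherency = "modified"
```

Deferring the write monitor until commit lets other agents keep reading the line while the update is still only buffered.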
（項目６３）
キャッシュメモリのブロックに対するバッファ済み更新を発見する段階と、
前記キャッシュメモリの前記ブロックに対する前記バッファ済み更新を発見すると、前記ブロックに対して読出監視を適用する段階と、
前記読出監視を適用する段階の後に、前記ブロックをコミットする前に、前記ブロックに書込監視を適用する段階と
を備える方法。
（項目６４）
前記キャッシュメモリの前記ブロックに対する前記バッファ済み更新は、前記キャッシュメモリの前記ブロックに対するトランザクション的書込を含む項目６３に記載の方法。
（項目６５）
前記読出監視を適用する段階と同時に前記キャッシュメモリの前記ブロックに対して前記バッファ済み更新を実行する段階をさらに備え、
前記ブロックは、前記バッファ済み更新を実行した後は、バッファ済みコヒーレンシ状態で保持される項目６３に記載の方法。
（項目６６）
前記読出監視を適用する段階の後に前記キャッシュメモリの前記ブロックに対して前記バッファ済み更新を実行する段階をさらに備え、
前記ブロックは、前記バッファ済み更新を実行した後は、バッファ済みコヒーレンシ状態で保持される項目６３に記載の方法。
（項目６７）
前記キャッシュメモリの前記ブロックに対する前記バッファ済み更新を発見すると、前記ブロックに対して読出監視を適用する段階は、
前記キャッシュメモリのキャッシュドメインの外部の複数の処理要素に対して、前記ブロックに対する読出要求を生成する段階と、
前記ブロックに対する前記読出要求に応じて前記キャッシュドメインの外部の前記複数の処理要素からコンフリクトがないことを検出すると、前記キャッシュメモリの前記ブロックに対応付けられている読出監視属性を読出監視値に更新して前記ブロックに読出監視を適用する段階と
を有する項目６３に記載の方法。
（項目６８）
前記読出監視を適用する段階の後に、前記ブロックをコミットする前に、前記ブロックに書込監視を適用する段階は、
前記キャッシュメモリの前記キャッシュドメインの外部の前記複数の処理要素に対して、前記ブロックについての所有権読出要求を生成する段階と、
前記ブロックについての前記所有権読出要求に応じて前記キャッシュドメインの外部の前記複数の処理要素からコンフリクトがないと検出することに応じて、前記キャッシュメモリの前記ブロックに対応付けられている書込監視属性を書込監視値に更新して、前記ブロックに書込監視を適用する段階と
を有する項目６７に記載の方法。
（項目６９）
前記ブロックをコミットすることは、前記ブロックのキャッシュコヒーレンシ状態を、バッファ済みコヒーレンシ状態から修正済みコヒーレンシ状態へと遷移させることを含む項目６８に記載の方法。
（項目７０）
プログラムコードを保持する機械アクセス可能媒体であって、
前記プログラムコードを機械が実行すると、前記機械は、
キャッシュメモリのブロックへのバッファ済み書込があると、前記ブロックに読出監視を適用し、
前記ブロックに対して前記バッファ済み書込を実行し、
前記読出監視を適用した後、且つ、前記ブロックをコミットする前に、前記ブロックに書込監視を適用する機械アクセス可能媒体。
（項目７１）
前記キャッシュメモリの前記ブロックへの前記バッファ済み書込があると、前記ブロックに読出監視を適用することは、
前記キャッシュメモリのキャッシュドメインの外部の複数の処理要素に対して前記ブロックについての読出要求を生成することと、
前記ブロックについての前記読出要求に応じて前記キャッシュドメインの外部の前記複数の処理要素からコンフリクトがないことを検出することに応じて、前記キャッシュメモリの前記ブロックに対応付けられている読出監視属性を読出監視値に更新して前記ブロックに対して読出監視を適用することと
を含む
項目７０に記載の機械アクセス可能媒体。
（項目７２）
前記読出監視を適用した後、且つ、前記ブロックをコミットする前に、前記ブロックに書込監視を適用することは、
前記キャッシュメモリの前記キャッシュドメインの外部の前記複数の処理要素に対して前記ブロックについての所有権読出要求を生成することと、
前記ブロックについての前記所有権読出要求に応じて前記キャッシュドメインの外部の前記複数の処理要素からコンフリクトがないと検出することに応じて、前記キャッシュメモリの前記ブロックに対応付けられている書込監視属性を書込監視値に更新して前記ブロックに書込監視を適用することと
を含む項目７１に記載の機械アクセス可能媒体。
（項目７３）
コミット処理を発見することに応じて、前記読出監視を適用した後、且つ、前記ブロックをコミットする前に、前記ブロックに書込監視を適用する項目７０に記載の機械アクセス可能媒体。
（項目７４）
前記ブロックをコミットすることは、前記ブロックのキャッシュコヒーレンシ状態を修正済みコヒーレンシ状態に遷移させることを含む項目７０に記載の機械アクセス可能媒体。
（項目７５）
メモリアドレスを参照するトランザクション的書込、および、コミット処理を保持するシステムメモリと、
キャッシュメモリを有し、前記システムメモリに対応付けられているプロセッサと
を備え、
前記プロセッサが有する前記キャッシュメモリは、
前記トランザクション的書込を受信することに応じて前記メモリアドレスに対応付けられているキャッシュラインについての読出要求を生成し、
前記読出要求に基づきコンフリクトが検出されないことに応じて、バッファされており読出監視されている状態に前記キャッシュラインを遷移させ、
前記コミット処理を受信することに応じて所有権読出要求を生成し、前記所有権読出要求に基づきコンフリクトが検出されないことに応じて、バッファされており書込監視されている状態に前記キャッシュラインを遷移させ、
前記バッファされており書込監視されている状態に前記キャッシュラインを遷移させることに応じて、修正済み状態に前記キャッシュラインを遷移させる、システム。
（項目７６）
前記キャッシュメモリが、バッファされており読出監視されている状態に前記キャッシュラインを遷移させることは、前記キャッシュメモリが、前記キャッシュラインに対応付けられているコヒーレンシビットをバッファ済み値に更新して、前記バッファされており読出監視されている状態のうちバッファされている状態の部分を表し、前記キャッシュラインに対応付けられている読出監視属性ビットを読出監視値に更新して、前記バッファされており読出監視されている状態のうち読出監視されている状態の部分を表すことを含む項目７５に記載のシステム。
（項目７７）
前記キャッシュメモリが、バッファされており書込監視されている状態に前記キャッシュラインを遷移させることは、前記キャッシュメモリが、前記キャッシュラインに対応付けられている前記コヒーレンシビットの前記バッファ済み値を維持して、前記バッファされており書込監視されている状態のうちバッファされている状態の部分を表し、前記キャッシュラインに対応付けられている書込監視属性ビットを書込監視値に更新して、前記バッファされており書込監視されている状態のうち書込監視されている状態部分を表すことを含む項目７６に記載のシステム。
（項目７８）
前記キャッシュメモリが、前記キャッシュラインを前記バッファされており書込監視されている状態に遷移させることに応じて、前記キャッシュラインを修正済み状態に遷移させることは、前記コヒーレンシビットを修正済み値に更新して前記修正済み状態を表すことを含む項目７７に記載のシステム。
（項目７９）
前記メモリは、ダイナミックランダムアクセスメモリ（ＤＲＡＭ）、スタティックランダムアクセスメモリ（ＳＲＡＭ）、および、不揮発性メモリから成る群から選択される項目７５に記載のシステム。
（項目８０）
損失命令をデコードしてデコード済み要素を提供するデコードロジックと、
損失イベントが検出された旨を示す損失値を保持する損失フィールドを有するステータス格納要素と、
前記ステータス格納要素に結合されているジャンプロジックと
を備え、
前記損失命令は、ラベルを参照しており、オペレーションコード（オペコード）を含み、前記デコードロジックによって認識可能な命令群の一部となり、
前記ジャンプロジックは、前記デコード済み要素、および、前記損失イベントが検出された旨を示す前記損失値に基づいて前記ラベルに制御を渡す、装置。
（項目８１）
前記ラベルはジャンプ先アドレスを含み、
損失イベントは、読出監視されているキャッシュラインへの書込みが発生した旨を示す読出監視コンフリクト、書込監視されているキャッシュラインへのアクセスが発生した旨を示す書込監視コンフリクト、および、バッファされているキャッシュラインの損失から成る群から選択される項目８０に記載の装置。
（項目８２）
前記ステータス格納要素はレジスタを有し、
前記損失値を保持する前記損失フィールドは、読出監視コンフリクトが検出されると設定される第１のビットと、書込監視コンフリクトが検出されると設定される第２のビットと、バッファされている物理データの損失が検出されると設定される第３のビットと、バッファされているメタデータの損失が検出されると設定される第４のビットとを含む項目８０に記載の装置。
（項目８３）
前記損失命令は読出監視損失命令を含み、前記オペコードは読出監視損失イベント型を規定し、
前記ジャンプロジックが、前記デコード済み要素、および、前記損失イベントが検出された旨を示す前記損失値に基づいて前記ラベルに制御を渡すことは、前記ジャンプロジックが、発生した前記損失イベントの種類が、前記読出監視損失命令の前記オペコードによって規定されている前記読出監視損失イベント型である旨を示す前記損失値を前記損失フィールドが保持していることに応じて、前記ラベルに実行をジャンプさせることを含む項目８０に記載の装置。
（項目８４）
前記損失命令は書込監視損失命令を含み、前記オペコードは書込監視損失イベント型を規定し、
前記ジャンプロジックが、前記デコード済み要素、および、前記損失イベントが検出された旨を示す前記損失値を保持する前記損失フィールドに基づいて前記ラベルに制御を渡すことは、前記ジャンプロジックが、発生した前記損失イベントの種類が、前記書込監視損失命令の前記オペコードによって規定されている前記書込監視損失イベント型である旨を示す前記損失値を前記損失フィールドが保持していることに応じて、前記ラベルに実行をジャンプさせることを含む項目８０に記載の装置。
（項目８５）
前記損失命令はバッファ済み損失命令を含み、前記オペコードはバッファ済み損失イベント型を規定し、
前記ジャンプロジックが、前記デコード済み要素、および、前記損失イベントが検出された旨を示す前記損失値を保持する前記損失フィールドに基づいて前記ラベルに制御を渡すことは、前記ジャンプロジックが、発生した前記損失イベントの種類が、前記バッファ済み損失命令の前記オペコードによって規定されている前記バッファ済み損失イベント型である旨を示す前記損失値を前記損失フィールドが保持していることに応じて、前記ラベルに実行をジャンプさせることを含む項目８０に記載の装置。
（項目８６）
プログラムコードを保持する機械アクセス可能媒体であって、
前記プログラムコードを機械が実行すると、前記機械は、
損失命令に応じて、
前記損失命令によって規定されており、前記機械に設けられているトランザクションステータスレジスタに保持されているトランザクションのステータスを判断して、
前記損失命令に対応付けられている損失イベントが検出された旨を前記トランザクションの前記ステータスが示していることに応じて、前記損失命令によって規定されているラベルに実行を誘導する機械アクセス可能媒体。
（項目８７）
前記ラベルはジャンプ先アドレスを含み、
損失イベントは、読出監視されているキャッシュラインへの書込みが発生した旨を示す読出監視コンフリクト、書込監視されているキャッシュラインへのアクセスが発生した旨を示す書込監視コンフリクト、および、バッファされているキャッシュラインの損失から成る群から選択される項目８６に記載の機械アクセス可能媒体。
（項目８８）
前記損失命令は、前記損失イベントを読出監視コンフリクトと規定する読出監視ジャンプ損失（ＪＬＯＳＳ）命令を含み、
前記トランザクションステータスレジスタに保持されているトランザクションのステータスを判断することは、前記トランザクションステータスレジスタに保持されている読出監視コンフリクトビットのステータスを判断することを含み、
前記損失命令に対応付けられている損失イベントが検出された旨を前記トランザクションの前記ステータスが示していることに応じて、前記損失命令によって規定されているラベルに実行を誘導することは、読出監視コンフリクトが検出された旨を、前記トランザクションステータスレジスタに保持されている前記読出監視コンフリクトビットの前記ステータスが示していることに応じて、前記損失命令によって規定されているラベルに実行を誘導することを含む項目８６に記載の機械アクセス可能媒体。
（項目８９）
前記損失命令は、前記損失イベントを書込監視コンフリクトと規定する書込監視ジャンプ損失（ＪＬＯＳＳ）命令を含み、
前記トランザクションステータスレジスタに保持されているトランザクションのステータスを判断することは、前記トランザクションステータスレジスタに保持されている書込監視コンフリクトビットのステータスを判断することを含み、
前記損失命令に対応付けられている損失イベントが検出された旨を前記トランザクションの前記ステータスが示していることに応じて、前記損失命令によって規定されているラベルに実行を誘導することは、書込監視コンフリクトが検出された旨を、前記トランザクションステータスレジスタに保持されている前記書込監視コンフリクトビットの前記ステータスが示していることに応じて、前記損失命令によって規定されているラベルに実行を誘導することを含む項目８６に記載の機械アクセス可能媒体。
（項目９０）
前記損失命令は、前記損失イベントをバッファ済み監視コンフリクトと規定するバッファ済み監視ジャンプ損失（ＪＬＯＳＳ）命令を含み、
前記トランザクションステータスレジスタに保持されているトランザクションのステータスを判断することは、前記トランザクションステータスレジスタに保持されているバッファ済み監視コンフリクトビットのステータスを判断することを含み、
前記損失命令に対応付けられている損失イベントが検出された旨を前記トランザクションの前記ステータスが示していることに応じて、前記損失命令によって規定されているラベルに実行を誘導することは、バッファ済み監視コンフリクトが検出された旨を、前記トランザクションステータスレジスタに保持されている前記バッファ済み監視コンフリクトビットの前記ステータスが示していることに応じて、前記損失命令によって規定されているラベルに実行を誘導することを含む項目８６に記載の機械アクセス可能媒体。
（項目９１）
プロセッサにおいて損失命令を発見する段階と、
前記損失命令を発見する段階に応じて、前記プロセッサにおいて前記損失命令に対応付けられている損失イベントが検出されたか否かを判断する段階と、
前記損失命令を発見する段階に応じて前記損失命令が参照しているラベルに分岐して、前記損失命令に対応付けられている前記損失イベントが前記プロセッサにおいて検出されたと判断する段階と
を備える方法。
（項目９２）
前記ラベルはジャンプアドレスを含む項目９１に記載の方法。
（項目９３）
前記損失命令は読出監視損失命令を含み、
前記読出監視損失命令に対応付けられている前記損失イベントは、読出監視されているキャッシュラインへの書込を含む項目９１に記載の方法。
（項目９４）
前記損失命令は書込監視損失命令を含み、
前記書込監視損失命令に対応付けられている前記損失イベントは、書込監視されているキャッシュラインへのアクセスを含む項目９１に記載の方法。
（項目９５）
前記損失命令はバッファ済み損失命令を含み、
前記バッファ済み損失命令に対応付けられている前記損失イベントは、バッファされているキャッシュラインのエビクションを含む項目９１に記載の方法。
（項目９６）
前記プロセッサにおいてバッファされているキャッシュラインのエビクションが検出されたか否かを判断することは、トランザクションステータスレジスタのバッファ済み損失ステータスビットを確認して、前記バッファ済み損失ステータスビットが損失値に設定されていることに応じて、バッファされているキャッシュラインのエビクションが検出されたと判断することを含む項目９５に記載の方法。
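Items 80 to 96 describe a loss instruction that names a label and, through its opcode, a loss event type, and that transfers control to the label when the transaction status register records that event. The decision can be sketched as follows. The bit positions and the function name `jloss` are assumptions chosen for illustration only.

```python
# Assumed bit positions within a transaction status register.
RM_LOSS = 0b0001   # read-monitor conflict detected
WM_LOSS = 0b0010   # write-monitor conflict detected
BUF_LOSS = 0b0100  # buffered cache line lost (evicted)


def jloss(status, event_mask, label, fallthrough):
    """Return the next instruction address for a loss instruction.

    status      -- current contents of the status register
    event_mask  -- loss event type selected by the instruction's opcode
    label       -- jump target named by the instruction
    fallthrough -- address of the instruction that follows
    """
    return label if status & event_mask else fallthrough
```

For example, a read-monitor variant would be modeled as `jloss(status, RM_LOSS, handler, next_pc)`.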
（項目９７）
トランザクションのコミット命令をデコードしてデコード済み要素を提供するデコードロジックと、
前記デコード済み要素に応じて、前記コミット命令が規定するコミット条件が前記トランザクションについて満たされるか否かを判断するコミットロジックと
を備え、
前記コミット命令は、前記コミット条件を規定しており、オペレーションコード（オペコード）を含み、前記デコードロジックによって認識可能な命令群の一部となる、装置。
（項目９８）
前記コミット条件は、読出監視データの損失無し、書込監視データの損失無し、バッファデータの損失無し、および、メタデータの損失無しのうち任意の規定された組み合わせを含み、
前記コミットロジックが前記コミット条件が満たされていると判断することは、読出監視データの損失無し、書込監視データの損失無し、バッファデータの損失無し、および、メタデータの損失無しのうち前記規定された組み合わせが発生したと判断することを含む項目９７に記載の装置。
（項目９９）
前記コミット条件を規定する前記コミット命令は、４ビットを保持する前記コミット命令を含み、前記４ビットのうち、第１のビットは、設定されると、読出監視データの損失がコミットすべき条件であることを示し、第２のビットは、設定されると、書込監視データの損失がコミットすべき条件であることを示し、第３のビットは、設定されると、バッファデータの損失がコミットすべき条件であることを示し、第４のビットは、設定されると、メタデータの損失がコミットすべき条件であることを示す項目９７に記載の装置。
（項目１００）
前記４ビットは、前記オペコードに含まれる項目９９に記載の装置。
（項目１０１）
前記コミットロジックが前記コミット命令が規定する前記コミット条件が前記トランザクションについて満たされているか否か判断することは、前記コミットロジックが、前記コミット命令で設定される前記４ビットの各ビットについてトランザクションステータスレジスタの対応ステータスビットを確認して、確認した前記トランザクションステータスレジスタの前記対応ステータスビットのいずれも対応する損失を示すべく設定されていない場合に、前記コミット条件が満たされていると判断することを含む項目９９に記載の装置。
（項目１０２）
前記コミット命令はさらに、読出監視データ、書込監視データ、バッファ済みデータ、および、メタデータのうちコミット時にクリアすべき組み合わせを示すクリア制御を規定しており、
前記コミットロジックは、前記コミット命令で規定する前記コミット条件が前記トランザクションについて満たされていると判断することに応じて前記トランザクションをコミットした後で、読出監視データ、書込監視データ、バッファデータ、および、メタデータのうち規定された前記組み合わせをクリアする項目９７に記載の装置。
（項目１０３）
プログラムコードを保持する機械可読媒体であって、
前記プログラムコードを機械が実行すると、前記機械は、
前記プログラムコードに含まれるトランザクションについて、少なくとも１つのコミット失敗条件を規定しているコミット命令を発見し、
前記トランザクションの実行中に、前記コミット命令が規定する前記少なくとも１つのコミット失敗条件が検出されたか否かを判断し、
前記トランザクションの実行中に、前記コミット命令が規定する前記少なくとも１つのコミット失敗条件が検出されたと判断したことに応じて、前記トランザクションの実行中に、前記コミット命令が規定する前記少なくとも１つのコミット失敗条件が検出されたことを示す値を提供する
機械可読媒体。
（項目１０４）
前記少なくとも１つのコミット失敗条件は、読出監視データの損失、書込監視データの損失、バッファデータの損失、および、メタデータの損失から成る群から選択される項目１０３に記載の機械可読媒体。
（項目１０５）
前記トランザクションの実行中に、前記コミット命令が規定する前記少なくとも１つのコミット失敗条件が検出されたことを示す値を提供することは、前記トランザクションの実行中に、前記コミット命令が規定する前記少なくとも１つのコミット失敗条件が検出されたことを示す値をデスティネーションレジスタにロードすることを含む項目１０３に記載の機械可読媒体。
（項目１０６）
前記トランザクションの実行中に、前記コミット命令が規定する前記少なくとも１つのコミット失敗条件が検出されたか否かを判断することは、
前記少なくとも１つのコミット失敗条件に対応付けられているトランザクションステータスレジスタのステータスビットを確認することと、
前記少なくとも１つのコミット失敗条件に対応付けられている前記ステータスビットが、前記トランザクションの実行中に前記少なくとも１つのコミット失敗条件が検出された旨を示すように設定されていることに応じて、前記トランザクションの実行中に、前記コミット命令が規定する前記少なくとも１つのコミット失敗条件が検出されたと判断することと、
前記少なくとも１つのコミット失敗条件に対応付けられている前記ステータスビットが、前記トランザクションの実行中に前記少なくとも１つのコミット失敗条件が検出されなかった旨を示すようにリセットされていることに応じて、前記トランザクションの実行中に、前記コミット命令が規定する前記少なくとも１つのコミット失敗条件が検出されなかったと判断することと
を含む項目１０３に記載の機械可読媒体。
（項目１０７）
前記トランザクションの実行中に前記コミット命令が規定する前記少なくとも１つのコミット失敗条件が検出されなかったと判断することに応じて、前記トランザクションをコミットすることをさらに含む項目１０６に記載の機械可読媒体。
（項目１０８）
トランザクションにおいて、前記トランザクションについての複数のコミット失敗条件を規定しているオペレーションコード（オペコード）を含むコミット命令を発見する段階と、
前記トランザクションの実行中に前記コミット命令の前記オペコードにおいて規定されている前記トランザクションについての前記複数のコミット失敗条件がいずれも検出されなかったと判断する段階と、
前記トランザクションの実行中に前記コミット命令の前記オペコードにおいて規定されている前記トランザクションについての前記複数のコミット失敗条件がいずれも検出されなかったと判断することに応じて、前記トランザクションをコミットする段階と
を備える方法。
（項目１０９）
前記トランザクションについての前記複数のコミット失敗条件を規定している前記オペコードは、設定されると読出監視データの損失がコミット失敗条件であると規定する前記オペコードの第１のビットと、設定されると書込監視データの損失がコミット失敗条件であると規定する前記オペコードの第２のビットと、設定されるとバッファデータの損失がコミット失敗条件であると規定する前記オペコードの第３のビットと、設定されるとメタデータの損失がコミット失敗条件であると規定する前記オペコードの第４のビットとを含む項目１０８に記載の方法。
（項目１１０）
前記トランザクションの実行中に前記コミット命令の前記オペコードにおいて規定されている前記トランザクションについての前記複数のコミット失敗条件がいずれも検出されなかったと判断する段階は、
前記オペコードの前記第１のビットが設定されていることに応じて、トランザクションステータスレジスタの読出監視ビットが設定されておらず読出監視データの損失無しを示していると判断する段階と、
前記オペコードの前記第２のビットが設定されていることに応じて、前記トランザクションステータスレジスタの書込監視ビットが設定されておらず書込監視データの損失無しを示していると判断する段階と、
前記オペコードの前記第３のビットが設定されていることに応じて、前記トランザクションステータスレジスタのバッファ済みビットが設定されておらずバッファデータの損失無しを示していると判断する段階と、
前記オペコードの前記第４のビットが設定されていることに応じて、前記トランザクションステータスレジスタのメタデータビットが設定されておらずメタデータの損失無しを示していると判断する段階と
を有する項目１０９に記載の方法。
（項目１１１）
前記オペコードはさらに、クリア制御を規定しており、
前記クリア制御を規定している前記オペコードは、設定されると読出監視データをコミット時にクリアすることを規定している前記オペコードの第５のビットと、設定されると書込監視データをコミット時にクリアすることを規定している前記オペコードの第６のビットと、設定されるとバッファデータをコミット時にクリアすることを規定している前記オペコードの第７のビットと、設定されるとメタデータをコミット時にクリアすることを規定している前記オペコードの第８のビットとを含む項目１０９に記載の方法。
（項目１１２）
前記トランザクションをコミットする段階は、前記第５のビットが設定されている場合に読出監視データをクリアする段階と、前記第６のビットが設定されている場合に書込監視データをクリアする段階と、前記第７のビットが設定されている場合にバッファデータをクリアする段階と、前記第８のビットが設定されている場合にメタデータをクリアする段階を含む項目１１１に記載の方法。
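Items 108 to 112 describe a commit instruction whose opcode carries four bits selecting commit failure conditions and four further bits selecting what to clear at commit. The check-then-clear behavior can be sketched as below. The bit order, the use of two separate masks in place of one opcode, and the `state` dictionary are assumptions made for the example.

```python
# Assumed bit order: read monitor, write monitor, buffered data, metadata.
RM, WM, BUF, MD = 0b0001, 0b0010, 0b0100, 0b1000


def try_commit(fail_mask, clear_mask, status, state):
    """Commit unless a loss selected by fail_mask is recorded in status.

    fail_mask  -- opcode bits 1 to 4: losses that must prevent the commit
    clear_mask -- opcode bits 5 to 8: information to clear at commit
    status     -- loss bits recorded in the transaction status register
    state      -- hypothetical dict mapping RM/WM/BUF/MD to tracked sets
    Returns True if the transaction committed, False otherwise.
    """
    if status & fail_mask:
        return False
    for bit in (RM, WM, BUF, MD):
        if clear_mask & bit:
            state[bit].clear()
    return True
```

A loss that is recorded but not selected by the failure bits does not block the commit, which is what lets software choose a weaker commit condition.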
（項目１１３）
トランザクションのためのコミット命令であって、クリア制御情報および前記トランザクションについての複数のコミット失敗条件を規定するオペレーションコード（オペコード）を含むコミット命令を含むプログラムコードを保持するメモリと、
前記コミット命令の前記オペコードをデコードするデコードロジック、および、コミットロジックを有するプロセッサと
を備え、
前記コミットロジックは、前記オペコードで規定される前記複数のコミット失敗条件のいずれも前記トランザクションの実行中に検出されなかったか否かを判断し、前記コミットロジックが前記トランザクションの実行中に前記複数のコミット失敗条件のいずれも検出されなかったと判断することに応じて、前記トランザクションをコミットし、
前記コミットロジックが前記トランザクションをコミットすることは、前記コミットロジックが、前記コミット命令の前記オペコードで規定される前記クリア制御情報に基づいてトランザクション情報をクリアすることを含む、システム。
（項目１１４）
前記コミット失敗条件は、読出監視データの損失、書込監視データの損失、バッファデータの損失、および、メタデータの損失を組み合わせて決まる項目１１３に記載のシステム。
（項目１１５）
前記コミット失敗条件は、書込監視データの損失、読出監視データの損失または書込監視データの損失、書込監視データの損失またはバッファデータの損失、書込監視データの損失またはメタデータの損失、および、書込監視データの損失、読出監視データの損失、バッファデータの損失またはメタデータの損失から成る群から選択される項目１１４に記載のシステム。
（項目１１６）
前記オペコードが前記クリア制御情報を規定することは、読出監視、書込監視、バッファ済みコヒーレンシおよびメタデータのうちどれがコミット時にクリアされるかを前記オペコードが規定していることを含み、
前記コミットロジックが前記コミット命令の前記オペコードに規定される前記クリア制御情報に基づいてトランザクション情報をクリアすることは、前記コミットロジックが前記読出監視、前記書込監視、前記バッファ済みコヒーレンシ、および、前記メタデータのうち前記オペコードでクリアされるべきことが規定されているものをクリアすることを含む項目１１３に記載のシステム。
（項目１１７）
トランザクションイネーブルフィールド（ＴＥＦ）を有する格納要素と、
リングレベル遷移イベントに応じて少なくとも前記ＴＥＦの状態を格納構造に保存し、リターンイベントに応じて少なくとも前記ＴＥＦの状態を前記格納構造から前記格納要素へと戻すロジックと
を備え、
前記ＴＥＦは、アクティブ値を保持している場合には対応付けられているトランザクションがアクティブでイネーブルされている旨を示し、非アクティブ値を保持している場合には対応付けられているトランザクションが一時停止している旨を示す、装置。
（項目１１８）
前記リングレベル遷移イベントは、割り込み、例外、システム呼び出し、仮想マシン開始、仮想マシン終了から成る群から選択されるイベントを含む項目１１７に記載の装置。
（項目１１９）
前記リターンイベントは、割り込みリターン（ＩＲＥＴ）、システムリターン（ＳＹＳＲＥＴ）、仮想マシン（ＶＭ）開始、および、仮想マシン（ＶＭ）終了から成る群から選択されるイベントを含む項目１１７に記載の装置。
（項目１２０）
前記格納要素はフラグレジスタを含み、
前記ＴＥＦはトランザクションイネーブルフラグを含む項目１１７に記載の装置。
（項目１２１）
前記格納構造はスタックを含み、前記ロジックが前記スタックに少なくとも前記ＴＥＦの状態を保存することは、プッシュロジックが少なくとも前記ＴＥＦの状態を前記スタックにプッシュすることを含み、前記ロジックが前記スタックから前記格納要素に少なくとも前記ＴＥＦの状態を戻すことは、ポップロジックが少なくとも前記ＴＥＦの状態を前記スタックからポップして前記ＴＥＦを前記格納要素に戻すことを含む項目１１７に記載の装置。
（項目１２２）
実行されるとリングレベル遷移イベントを発生させるコードを保持するメモリと、
レジスタおよびスタックロジックを有するプロセッサと
を備え、
前記レジスタは、対応付けられているトランザクションがアクティブである旨を示すアクティブ値を保持するためのトランザクションイネーブルフィールド（ＴＥＦ）を含み、
前記スタックロジックは、前記リングレベル遷移イベントに応じて前記レジスタの以前の状態をスタックにプッシュして、前記ＴＥＦを非アクティブ値にクリアして前記対応付けられているトランザクションが一時停止している旨を示し、リターンイベントに応じて、前記スタックから前記レジスタに前記レジスタの前記以前の状態を戻すシステム。
（項目１２３）
前記リングレベル遷移イベントは、割り込み、例外、システム呼び出し、仮想マシン開始、および、仮想マシン終了から成る群から選択されるイベントを含む項目１２２に記載のシステム。
（項目１２４）
前記リターンイベントは、割り込みリターン（ＩＲＥＴ）、システムリターン（ＳＹＳＲＥＴ）、仮想マシン（ＶＭ）開始、および、仮想マシン（ＶＭ）終了から成る群から選択されるイベントを含む項目１２２に記載のシステム。
（項目１２５）
前記レジスタはフラグレジスタを含み、前記ＴＥＦはトランザクションイネーブルフラグを含み、前記アクティブ値は前記トランザクションイネーブルフラグの論理Ｈｉｇｈ値を含み、前記非アクティブ値は前記トランザクションイネーブルフラグの論理Ｌｏｗ値を含む項目１２２に記載のシステム。
（項目１２６）
現在のリングレベルからのリングレベル遷移イベントを検出する段階と、
トランザクションイネーブルフィールドを含むレジスタの以前の状態を格納構造に保存する段階と、
前記トランザクションイネーブルフィールドをクリアして対応付けられているトランザクションが一時停止している旨を示す段階と、
前記現在のリングレベルへのリターンイベントを検出する段階と、
前記現在のリングレベルへの前記リターンイベントを検出する段階に応じて、前記格納構造から前記レジスタの前記以前の状態を戻す段階と
を備える方法。
（項目１２７）
前記格納構造はカーネルスタックを含み、前記レジスタの前記以前の状態を前記カーネルスタックに保存する段階は、前記レジスタの前記以前の状態を前記カーネルスタックにプッシュする段階を有し、前記カーネルスタックから前記レジスタの前記以前の状態を戻す段階は、前記カーネルスタックから前記レジスタの前記以前の状態をポップして、前記以前の状態を前記レジスタに戻す段階を有する項目１２６に記載の方法。
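Items 126 and 127 describe saving the register that holds the transaction enable field (TEF) on a ring level transition, clearing the TEF to suspend the transaction, and restoring the saved state on the return event. A minimal sketch follows. The `Cpu` class, the bit position of the TEF, and the Python list standing in for the kernel stack are all assumptions.

```python
TEF = 1 << 0  # assumed bit position of the transaction enable field


class Cpu:
    """Hypothetical CPU model with a flags register and a kernel stack."""

    def __init__(self, flags):
        self.flags = flags
        self.kernel_stack = []

    def ring_transition(self):
        # Push the previous register state, then clear TEF to suspend the transaction.
        self.kernel_stack.append(self.flags)
        self.flags &= ~TEF

    def return_event(self):
        # Pop the saved state; the transaction is active again if TEF was set before.
        self.flags = self.kernel_stack.pop()
```

Because the whole previous state is restored, handler code that runs between the two events never executes with the transaction enabled.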
（項目１２８）
前記現在のリングレベルは、ユーザリングレベルを含む項目１２６に記載の方法。
（項目１２９）
前記リングレベル遷移イベントは、割り込み、例外、システム呼び出し、および、仮想マシン開始から成る群から選択されるイベントを含む項目１２８に記載の方法。
（項目１３０）
前記現在の特権レベルへの前記リターンイベントは、割り込みリターン（ＩＲＥＴ）、システムリターン（ＳＹＳＲＥＴ）、および、仮想マシン（ＶＭ）終了から成る群から選択されるイベントを含む項目１２９に記載の方法。

In the foregoing description, the invention has been described in detail with reference to specific exemplary embodiments. However, it will be apparent that various modifications and changes may be made to the specific embodiments described above without departing from the broader spirit and scope of the invention as set forth in the claims. Accordingly, the specification and drawings are to be regarded as illustrative rather than restrictive. Also, the use of "embodiment" and other exemplary language above does not necessarily refer to the same embodiment or the same example; it may refer to different and distinct embodiments, or possibly to the same embodiment.
According to this embodiment, the following items are also disclosed.
(Item 1)
An apparatus comprising:
a plurality of processing elements; and
metaphysical logic,
wherein one processing element of the plurality of processing elements is associated with a plurality of software subsystems, and
the metaphysical logic associates a metadata access process, which references a data address and is associated with a current software subsystem of the plurality of software subsystems, with a metaphysical address space associated with the current software subsystem, based at least on a metadata identifier (MDID) associated with the current software subsystem and on the data address.
(Item 2)
The apparatus of item 1, wherein the metaphysical address space associated with the current software subsystem is orthogonal to at least one other metaphysical address space associated with a second software subsystem of the plurality of software subsystems, and to a data address space containing the data address.
(Item 3)
The apparatus of item 2, wherein each of the plurality of software subsystems is selected from the group consisting of a transactional runtime subsystem, a garbage collection runtime subsystem, a memory protection subsystem, a software conversion subsystem, an outer transaction of a group of nested transactions, and an inner transaction of the group of nested transactions.
(Item 4)
The apparatus of item 1, further comprising decoding logic for decoding the metadata access process,
wherein the metadata access process includes an operation code (opcode) recognized as one of a plurality of processes supported by the decoding logic.
(Item 5)
The apparatus of item 1, wherein the metaphysical logic includes metaphysical conversion logic that converts the data address to a metadata address in the metaphysical address space associated with the current software subsystem based on at least the MDID.
(Item 6)
The apparatus of item 5, wherein the metaphysical conversion logic further converts the data address to the metadata address in the metaphysical address space associated with the current software subsystem based on a processing element identifier (PEID) associated with the processing element.
(Item 7)
The metaphysical conversion logic further converts the data address into a metadata address in the metaphysical address space associated with the current software subsystem based on a data-to-metadata compression ratio. The device described in 1.
(Item 8)
The apparatus of item 6, further comprising a register modifiable by the current software subsystem,
wherein the register holds the MDID in response to a write from the current software subsystem indicating that the current software subsystem is currently executing on the processing element, and
the metaphysical conversion logic converting the data address to the metadata address in the metaphysical address space associated with the current software subsystem based on the PEID and the MDID includes the metaphysical conversion logic combining information representing the data address with the PEID and the MDID.
(Item 9)
The apparatus of item 8, wherein the metaphysical conversion logic combines the information representing the data address with the PEID and the MDID based on a combination algorithm selected from the group consisting of: an algorithm that adds the PEID and the MDID to the data address to form the metadata address; an algorithm that converts the data address to a converted data address using a normal data conversion table and adds the PEID and the MDID to the converted data address to form the metadata address; and an algorithm that converts the data address to a converted metadata address using a metaphysical conversion table separate from the normal data conversion table and adds the PEID and the MDID to the converted metadata address to form the metadata address.
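The first alternative in item 9, adding the PEID and the MDID to the data address, can be sketched as a simple bit-field combination. The field widths, the choice to place the identifiers above the address bits, and the function name `metadata_address` are assumptions for illustration; the items leave the exact placement open.

```python
def metadata_address(data_addr, peid, mdid, addr_bits=48, peid_bits=4, mdid_bits=4):
    """Form a metadata address by placing the MDID and PEID above the data address bits.

    All widths are assumed values, not taken from the disclosure.
    """
    if data_addr >> addr_bits or peid >> peid_bits or mdid >> mdid_bits:
        raise ValueError("field does not fit in its assumed width")
    return (mdid << (addr_bits + peid_bits)) | (peid << addr_bits) | data_addr
```

Two subsystems with different MDIDs that reference the same data address obtain different metadata addresses, which is the separation between metaphysical address spaces that the items rely on. A real design would also need some way to tell metadata entries apart from data entries, which this sketch omits.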
(Item 10)
A method comprising:
encountering a metadata process that references a data address that is in a data address space and is associated with a data item held in a data entry of a cache memory;
determining a metadata address in a metaphysical address space separate from the data address space, based on the data address, a processing element identifier (PEID) of a processing element associated with the metadata process, and a metadata identifier (MDID) of a software subsystem associated with the processing element; and
accessing a metadata entry in the cache memory based on the metadata address.
(Item 11)
The method of item 10, wherein the metaphysical address space is also separate from an additional metaphysical address space associated with an additional software subsystem that is also associated with the processing element.
(Item 12)
The method of item 10, wherein the software subsystem is selected from the group consisting of a transactional runtime subsystem, a garbage collection runtime subsystem, a memory protection subsystem, a software conversion subsystem, an outer transaction of a group of nested transactions, and an inner transaction of a group of nested transactions.
(Item 13)
The method of item 10, further comprising:
writing the MDID to a control register associated with the processing element, in response to encountering a write operation from the software subsystem to the control register, to indicate that the software subsystem is currently executing on the processing element; and
determining the MDID based on the control register.
(Item 14)
The method of item 13, further comprising determining the PEID based on a portion of an opcode of the metadata process.
(Item 15)
The method of item 13, wherein determining the metadata address from the data address, the PEID, and the MDID includes combining the data address with the PEID and the MDID based on an algorithm selected from the group consisting of: an algorithm that adds the PEID and the MDID to the data address to form the metadata address; an algorithm that converts the data address to a converted data address using a normal data conversion table and adds the PEID and the MDID to the converted data address to form the metadata address; and an algorithm that converts the data address to a converted metadata address using a metaphysical conversion table separate from the normal data conversion table and adds the PEID and the MDID to the converted metadata address to form the metadata address.
(Item 16)
An apparatus comprising:
decoding logic for decoding a metadata access instruction that references a data address of a data item; and
metadata logic for, in response to the decoding logic decoding the metadata access instruction, converting the data address into a separate metadata address transparently to software and accessing metadata referenced by the separate metadata address,
wherein the metadata access instruction includes an operation code (opcode) recognizable as part of an instruction set that the decoding logic is able to decode.
(Item 17)
The metadata access instruction is selected from an instruction group consisting of a metadata bit test and set (MDLT) instruction, a metadata store and set (MSS) instruction, and a metadata store and reset instruction (MDSR). The device described in 1.
(Item 18)
The apparatus of item 16, wherein the metadata access instruction is selected from an instruction group consisting of a compressed metadata test (CMDT) instruction, a compressed metadata store (CMS) instruction, and a compressed metadata clear (CMDCLR) instruction.
(Item 19)
The apparatus of item 16, wherein the metadata logic converting the data address into the separate metadata address transparently to software includes converting the data address based at least on a metadata identifier (MDID) specified in a control register by a software subsystem associated with the metadata access instruction.
(Item 20)
The apparatus of item 16, wherein the metadata access instruction further includes a reference to a destination register, and
the metadata logic accessing the metadata referenced by the separate metadata address includes the metadata logic loading the metadata at the referenced separate metadata address into the destination register.
(Item 21)
The apparatus of item 20, wherein the opcode includes a thread identifier field that identifies a thread that issued the metadata access instruction.
(Item 22)
The apparatus of item 20, wherein the metadata logic accessing the metadata referenced by the separate metadata address further includes the metadata logic setting the metadata at the referenced separate metadata address to a set value in response to the metadata loaded into the destination register being an unset value.
(Item 23)
The apparatus of item 22, wherein the set value and the unset value are determined by the metadata access instruction.
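Items 20 to 23 together describe test-and-set behavior: the metadata is loaded into a destination register and, if the loaded value is the unset value, the stored metadata is changed to the set value. A minimal sketch is given below. The dictionary used as metadata storage, the function name, and the default values 1 and 0 are assumptions.

```python
def metadata_test_and_set(metadata, md_addr, set_value=1, unset_value=0):
    """Load the metadata at md_addr and set it if it was unset.

    Returns the loaded value, that is, what the destination register would receive.
    """
    loaded = metadata.get(md_addr, unset_value)
    if loaded == unset_value:
        # The metadata was unset, so record the set value for later accesses.
        metadata[md_addr] = set_value
    return loaded
```

The first access to a location returns the unset value and marks it; later accesses return the set value, so software can tell whether it has already handled that location.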
(Item 24)
A machine-readable medium holding program code which, when executed by a machine, causes the machine to:
generate, in response to a data access process that references a data address, a metadata access process that references the data address at the data access process,
wherein, when the metadata access process is executed by the machine, the machine
converts the data address into a metadata address separate from the data address, and
accesses metadata of a data item at the data address based on the metadata address.
(Item 25)
The machine-readable medium of item 24, wherein the metadata access process is selected from an instruction group consisting of a metadata bit test and set (MDLT) instruction, a metadata store and set (MSS) instruction, and a metadata store and reset (MDSR) instruction.
(Item 26)
The machine-readable medium of item 24, wherein the metadata access process is selected from a compressed instruction group consisting of a compressed metadata test (CMDT) instruction, a compressed metadata store (CMS) instruction, and a compressed metadata clear (CMDCLR) instruction.
(Item 27)
The machine-readable medium of item 26, wherein the machine converting the data address into the metadata address when the metadata access process is executed by the machine includes the machine combining, based on a data-to-metadata compression ratio, the data address with a processing element identifier (PEID) associated with the metadata access process and a metadata identifier (MDID) associated with the metadata access process.
(Item 28)
The machine-readable medium of item 27, wherein the data address is also translatable by virtual-to-physical address translation logic of the machine in order to reference the data item.
(Item 29)
The machine-readable medium of item 24, wherein the metadata access process further references an operand register, and
the machine accessing the metadata of the data item when the metadata access process is executed by the machine includes the machine updating the metadata of the data item with a value held in the operand register.
(Item 30)
The machine-readable medium of item 24, wherein the program code includes compiler code,
the compiler code compiles application code including the data access process, and
generating the metadata access process at the data access process includes generating the metadata access process in a compiled version of the application code.
(Item 31)
A machine-readable medium holding program code which, when executed by a machine, causes the machine to:
convert a data address referenced by a metadata access instruction included in the program code into a metadata address, based on a metadata identifier (MDID) associated with a software subsystem that is currently active on a processing element associated with the metadata access instruction; and
access metadata based on the metadata address.
(Item 32)
32. The machine-readable medium of item 31, wherein the metadata access instruction is selected from an instruction group consisting of a metadata load instruction for loading the metadata, a metadata store instruction for storing to the metadata, and a metadata clear instruction for resetting the metadata.
(Item 33)
33. The machine readable medium of item 31, wherein the software subsystem is selected from the group consisting of a transactional runtime subsystem, a garbage collection runtime subsystem, a memory protection subsystem, a software conversion subsystem, an outer transaction of a group of nested transactions, and an inner transaction of the group of nested transactions.
(Item 34)
34. The machine readable medium of item 31, wherein converting the data address referenced by the metadata access instruction included in the program code into a metadata address, based on the metadata identifier (MDID) associated with the software subsystem currently active on the processing element associated with the metadata access instruction, comprises combining the MDID and the data address based on a combination algorithm selected from the group consisting of: an algorithm that adds the MDID to the data address to form the metadata address; an algorithm that converts the data address into a converted data address using a normal data conversion table and adds the MDID to the converted data address to form the metadata address; and an algorithm that converts the data address into a converted metadata address using a metaphysical conversion table different from the normal data conversion table and adds the MDID to the converted metadata address to form the metadata address.
(Item 35)
35. The machine readable medium of item 34, wherein adding the MDID comprises an algorithm for adding the MDID selected from the group consisting of an algorithm that adds the MDID at the MSB position, an algorithm that adds the MDID at the LSB position, and an algorithm that replaces address bits with the MDID.
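The three ways of adding the MDID listed in item 35 can be sketched as simple bit manipulations. The following is an illustrative model only: the address and MDID widths, the choice of which address bits are replaced, and the names `Placement` and `add_mdid` are assumptions that do not appear in the items.

```python
from enum import Enum

ADDR_BITS = 32  # assumed address width
MDID_BITS = 4   # assumed MDID width


class Placement(Enum):
    MSB = "msb"          # MDID is appended above the address bits
    LSB = "lsb"          # MDID is appended below the address bits
    REPLACE = "replace"  # MDID overwrites existing address bits


def add_mdid(addr: int, mdid: int, placement: Placement) -> int:
    """Combine an address with an MDID using one of the three placements."""
    mask = (1 << MDID_BITS) - 1
    if not 0 <= mdid <= mask:
        raise ValueError("MDID out of range")
    if not 0 <= addr < (1 << ADDR_BITS):
        raise ValueError("address out of range")
    if placement is Placement.MSB:
        return (mdid << ADDR_BITS) | addr
    if placement is Placement.LSB:
        return (addr << MDID_BITS) | mdid
    if placement is Placement.REPLACE:
        # Assumption: the top MDID_BITS of the address are the ones replaced.
        shift = ADDR_BITS - MDID_BITS
        return (addr & ~(mask << shift)) | (mdid << shift)
    raise ValueError("unknown placement")
```

The MSB and LSB variants widen the resulting address by the MDID width, whereas the replacement variant keeps the original width at the cost of discarding some address bits.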
(Item 36)
36. The machine-readable medium of item 35, wherein, when the program code is executed on the machine, the machine further determines the MDID based on a control register for the processing element that indicates the software subsystem currently active on the processing element.
(Item 37)
A system comprising:
a memory to hold program code including a metadata access instruction that refers to a data memory address associated with a data item; and
a processor associated with the memory,
wherein the processor has:
one processing element, of a plurality of processing elements, associated with execution of the metadata access instruction;
fetch logic to fetch the metadata access instruction from the memory;
decode logic to decode the metadata access instruction into at least one metadata access process;
a control register to hold a metadata identifier (MDID) associated with an active context on the processing element;
a data cache memory including a data entry to hold the data item; and
execution logic to execute the metadata access process,
wherein the execution logic executing the metadata access process includes metaphysical address conversion logic of the processor converting the data memory address into a metadata memory address based on the MDID held in the control register, and cache control logic coupled to the data cache memory performing the metadata access process on another entry in the data cache memory based on the metadata memory address.
(Item 38)
38. The system of item 37, wherein the metadata access instruction is selected from an instruction group consisting of a metadata load instruction for loading the metadata, a metadata store instruction for storing to the metadata, and a metadata clear instruction for resetting the metadata.
(Item 39)
39. The system of item 37, wherein the active context is selected from the group consisting of a transactional runtime subsystem, a garbage collection runtime subsystem, a memory protection subsystem, a software conversion subsystem, an outer transaction of a group of nested transactions, and an inner transaction of the group of nested transactions.
(Item 40)
40. The system of item 37, wherein the metaphysical address conversion logic of the processor further converts the data memory address into the metadata memory address based on a processing element identifier (PEID) for the processing element, and
wherein the metaphysical address conversion logic converting the data memory address into the metadata memory address based on the PEID and the MDID held in the control register comprises combining the PEID, the MDID, and the data memory address based on a combination algorithm selected from the group consisting of: an algorithm that adds the PEID and the MDID to the data memory address to form the metadata memory address; an algorithm that converts the data memory address into a converted data memory address using a normal data conversion table and adds the PEID and the MDID to the converted data memory address to form the metadata memory address; and an algorithm that converts the data memory address into a converted metadata memory address using a metaphysical conversion table different from the normal data conversion table and adds the PEID and the MDID to the converted metadata memory address to form the metadata memory address.
(Item 41)
A processor comprising:
an execution module to execute a metadata loading process that refers to an address; and
an enforcement module to provide, in response to the metadata loading process, a metadata value associated with the address when the processor is operating in a first mode, and a fixed value when the processor is operating in a second mode.
(Item 42)
42. The processor of item 41, wherein the first mode includes a strong atomic mode, and the second mode includes a weak atomic mode.
(Item 43)
43. The processor of item 42, further comprising a first register that holds the fixed value.
(Item 44)
44. The processor of item 43, further comprising a second register to hold a mode value,
wherein the mode value represents a first value to indicate that the processor is operating in the strong atomic mode, and
wherein the mode value represents a second value to indicate that the processor is operating in the weak atomic mode.
(Item 45)
45. The processor of item 44, wherein the first register and the second register are the same metadata control register.
(Item 46)
46. The processor of item 44, wherein the enforcement module providing a metadata value associated with the address when the processor is operating in the strong atomic mode, and providing a fixed value when the processor is operating in the weak atomic mode, comprises the enforcement module: loading the metadata value into a destination register defined by the metadata loading process when the mode value held in the second register represents the first value, indicating that the processor is operating in the strong atomic mode; and loading the fixed value from the first register into the destination register when the mode value held in the second register represents the second value, indicating that the processor is operating in the weak atomic mode.
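The mode-dependent behaviour of items 41 to 46 can be illustrated with a small software model. This is a sketch under stated assumptions, not the claimed hardware: the class names, the mode encodings `STRONG` and `WEAK`, and the dictionary-based register file are all hypothetical. Following item 45, the fixed value and the mode value are kept in a single metadata control register.

```python
from dataclasses import dataclass, field

STRONG = 0  # assumed encoding of the first (strong atomic) mode
WEAK = 1    # assumed encoding of the second (weak atomic) mode


@dataclass
class MetadataControlRegister:
    mode: int = STRONG      # mode value (the "second register" of item 44)
    fixed_value: int = 0    # fixed value (the "first register" of item 43)


@dataclass
class Processor:
    mcr: MetadataControlRegister = field(default_factory=MetadataControlRegister)
    metadata: dict = field(default_factory=dict)    # address -> metadata value
    registers: dict = field(default_factory=dict)   # register name -> value

    def metadata_load(self, dest: str, address: int) -> None:
        """Model of a metadata loading process with enforcement."""
        if self.mcr.mode == STRONG:
            # Strong atomic mode: return the real metadata for the address.
            value = self.metadata.get(address, 0)
        else:
            # Weak atomic mode: force the fixed value instead.
            value = self.mcr.fixed_value
        self.registers[dest] = value
```

One consequence of this arrangement is that the same instruction sequence can run unchanged in both modes: code that tests the loaded metadata simply always sees the fixed value when the weak atomic mode is selected.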
(Item 47)
A method comprising:
discovering a metadata access process that references an address;
determining an execution mode of a processor;
providing a metadata value associated with the address to the metadata access process when it is determined that the execution mode of the processor is a first execution mode; and
providing a fixed value from a register to the metadata access process when it is determined that the execution mode of the processor is a second execution mode.
(Item 48)
48. The method of item 47, wherein determining the execution mode of the processor comprises reading a mode flag from a first control register, and
wherein the mode flag holds a first value to indicate that the execution mode of the processor is the first execution mode, and holds a second value to indicate that the execution mode of the processor is the second execution mode.
(Item 49)
49. The method of item 47, wherein providing the metadata value associated with the address to the metadata access process comprises loading the metadata value from a memory location associated with the address into a destination register referenced by the metadata access process.
(Item 50)
50. The method of item 49, wherein for the metadata access process, providing a fixed value from a register comprises loading the fixed value from the register to the destination register.
(Item 51)
A system comprising:
a memory to hold a metadata loading process that references an address and a destination register; and
a processor associated with the memory,
wherein the processor has:
execution logic to execute the metadata loading process;
a metadata register to hold a forced value;
a cache memory to hold a metadata value associated with the address; and
forcing logic to provide, when the execution logic executes the metadata loading process, the metadata value to the destination register when the processor is operating in a first mode, and the forced value from the metadata register to the destination register when the processor is operating in a second mode.
(Item 52)
52. The system of item 51, wherein the forcing logic further determines whether the processor is operating in the first mode or the second mode.
(Item 53)
53. The system of item 52, wherein the first mode includes a strong atomic mode, and the second mode includes a weak atomic mode.
(Item 54)
54. The system of item 52, wherein the metadata register further holds a mode value,
wherein the mode value represents a first value when the processor is operating in the first mode, and a second value when the processor is operating in the second mode, and
wherein the forcing logic determining whether the processor is operating in the first mode or the second mode includes the forcing logic interpreting the mode value of the metadata register.
(Item 55)
55. The system of item 52, further comprising a control register to hold a mode value,
wherein the mode value represents a first value when the processor is operating in the first mode, and a second value when the processor is operating in the second mode, and
wherein the forcing logic determining whether the processor is operating in the first mode or the second mode includes the forcing logic interpreting the mode value of the control register.
(Item 56)
52. The system of item 51, wherein the memory is selected from the group consisting of dynamic random access memory (DRAM), static random access memory (SRAM), and non-volatile memory.
(Item 57)
An apparatus comprising:
a data cache array to hold a cache entry; and
cache control logic coupled to the data cache array,
wherein the cache control logic:
transitions the cache entry from an unmonitored state to a buffered coherency state and a read monitoring state in response to a buffered update to the cache entry; and
thereafter transitions the cache entry to the buffered coherency state and a write monitoring state before transitioning the cache entry to a modified state to commit the buffered update.
(Item 58)
58. The apparatus of item 57, wherein the buffered update to the cache entry comprises an update selected from the group consisting of a transactional memory access to a data address of a data item held in the cache entry, a metadata access to a data address associated with metadata held in the cache entry, and a local update to the cache entry.
(Item 59)
59. The apparatus of item 57, wherein the cache control logic transitioning the cache entry from the unmonitored state to the buffered coherency state and the read monitoring state comprises the cache control logic updating a coherency bit associated with the cache entry to a buffered value to represent the buffered coherency state, and updating a read monitoring attribute bit associated with the cache entry to a read monitoring value to represent the read monitoring state.
(Item 60)
60. The apparatus of item 59, wherein the cache control logic transitioning the cache entry to the buffered coherency state and the write monitoring state, before transitioning the cache entry to the modified state to commit the buffered update, comprises the cache control logic maintaining the buffered value of the coherency bit associated with the cache entry to represent the buffered coherency state, and updating a write monitoring attribute bit associated with the cache entry to a write monitoring value to represent the write monitoring state.
(Item 61)
61. The apparatus of item 60, wherein the cache control logic transitioning the cache entry to the modified state comprises the cache control logic updating the coherency bit associated with the cache entry to a modified value to represent the modified coherency state.
(Item 62)
62. The apparatus of item 57, further comprising execution logic to execute a commit process after executing the buffered update,
wherein the cache control logic transitions the cache entry to the buffered coherency state and the write monitoring state, before transitioning the cache entry to the modified state to commit the buffered update, in response to the execution logic executing the commit process.
(Item 63)
A method comprising:
discovering a buffered update to a block of a cache memory;
applying read monitoring to the block upon discovering the buffered update to the block of the cache memory; and
applying write monitoring to the block after applying the read monitoring and before committing the block.
(Item 64)
64. The method of item 63, wherein the buffered update to the block of the cache memory comprises a transactional write to the block of the cache memory.
(Item 65)
65. The method of item 63, further comprising performing the buffered update on the block of the cache memory simultaneously with applying the read monitoring,
wherein the block is held in a buffered coherency state after the buffered update is performed.
(Item 66)
66. The method of item 63, further comprising performing the buffered update on the block of the cache memory after applying the read monitoring,
wherein the block is held in a buffered coherency state after the buffered update is performed.
(Item 67)
67. The method of item 63, wherein applying read monitoring to the block upon discovering the buffered update to the block of the cache memory comprises:
generating a read request for the block directed to a plurality of processing elements outside a cache domain of the cache memory; and
updating a read monitoring attribute associated with the block of the cache memory to a read monitoring value, thereby applying the read monitoring to the block, in response to detecting no conflict from the plurality of processing elements outside the cache domain in response to the read request for the block.
(Item 68)
68. The method of item 67, wherein applying write monitoring to the block after applying the read monitoring and before committing the block comprises:
generating an ownership read request for the block directed to the plurality of processing elements outside the cache domain of the cache memory; and
updating a write monitoring attribute associated with the block of the cache memory to a write monitoring value, thereby applying the write monitoring to the block, in response to detecting no conflict from the plurality of processing elements outside the cache domain in response to the ownership read request for the block.
(Item 69)
69. The method of item 68, wherein committing the block includes transitioning the cache coherency state of the block from a buffered coherency state to a modified coherency state.
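The sequence of items 63 to 69 (read monitoring on the buffered update, write monitoring deferred until commit, then transition to the modified state) can be illustrated with the following sketch. It is a simplified model under stated assumptions: the state names, the `CacheBlock` class, and the callables standing in for the read request and the ownership read request are hypothetical, and a real coherency protocol has more states than shown here.

```python
from dataclasses import dataclass


class ConflictError(Exception):
    """Raised when a request to other processing elements reports a conflict."""


@dataclass
class CacheBlock:
    coherency: str = "invalid"   # simplified: "invalid", "buffered", "modified"
    read_monitor: bool = False
    write_monitor: bool = False
    data: int = 0


def buffered_update(block: CacheBlock, value: int, read_request) -> None:
    """Apply read monitoring, then buffer the update (items 63, 66, 67).

    read_request is a hypothetical callable that returns True when no
    processing element outside the cache domain reports a conflict.
    """
    if not read_request():
        raise ConflictError("conflict detected on read request")
    block.read_monitor = True
    block.data = value
    block.coherency = "buffered"


def commit(block: CacheBlock, ownership_read_request) -> None:
    """Apply write monitoring just before commit, then commit (items 68, 69)."""
    if block.coherency != "buffered":
        raise ValueError("no buffered update to commit")
    if not ownership_read_request():
        raise ConflictError("conflict detected on ownership read request")
    block.write_monitor = True      # buffered and write monitored
    block.coherency = "modified"    # buffered -> modified makes the update visible
```

Deferring the ownership read request to commit time means that, in this model, other processing elements are only asked to give up the block when the update is actually about to become visible.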
(Item 70)
A machine accessible medium holding program code, wherein, when the machine executes the program code, the machine:
applies read monitoring to a block of a cache memory when there is a buffered write to the block;
performs the buffered write on the block; and
applies write monitoring to the block after applying the read monitoring and before committing the block.
(Item 71)
71. The machine accessible medium of item 70, wherein applying read monitoring to the block when there is the buffered write to the block of the cache memory includes:
generating a read request for the block directed to a plurality of processing elements outside a cache domain of the cache memory; and
updating a read monitoring attribute associated with the block of the cache memory to a read monitoring value, thereby applying the read monitoring to the block, in response to detecting no conflict from the plurality of processing elements outside the cache domain in response to the read request for the block.
(Item 72)
72. The machine accessible medium of item 71, wherein applying write monitoring to the block after applying the read monitoring and before committing the block includes:
generating an ownership read request for the block directed to the plurality of processing elements outside the cache domain of the cache memory; and
updating a write monitoring attribute associated with the block of the cache memory to a write monitoring value, thereby applying the write monitoring to the block, in response to detecting no conflict from the plurality of processing elements outside the cache domain in response to the ownership read request for the block.
(Item 73)
73. The machine accessible medium of item 70, wherein the write monitoring is applied to the block, after applying the read monitoring and before committing the block, in response to discovering a commit process.
(Item 74)
74. The machine accessible medium of item 70, wherein committing the block includes transitioning the cache coherency state of the block to a modified coherency state.
(Item 75)
A system comprising:
a system memory to hold a transactional write that references a memory address, and a commit process; and
a processor having a cache memory and associated with the system memory,
wherein the cache memory included in the processor:
generates a read request for a cache line associated with the memory address in response to receiving the transactional write;
transitions the cache line to a buffered and read-monitored state in response to no conflict being detected based on the read request;
generates an ownership read request in response to receiving the commit process, and transitions the cache line to a buffered and write-monitored state in response to no conflict being detected based on the ownership read request; and
transitions the cache line to a modified state in response to transitioning the cache line to the buffered and write-monitored state.
(Item 76)
76. The system of item 75, wherein the cache memory transitioning the cache line to the buffered and read-monitored state comprises the cache memory updating a coherency bit associated with the cache line to a buffered value to represent the buffered portion of the buffered and read-monitored state, and updating a read monitoring attribute bit associated with the cache line to a read monitoring value to represent the read-monitored portion of the buffered and read-monitored state.
(Item 77)
77. The system of item 76, wherein the cache memory transitioning the cache line to the buffered and write-monitored state comprises the cache memory maintaining the buffered value of the coherency bit associated with the cache line to represent the buffered portion of the buffered and write-monitored state, and updating a write monitoring attribute bit associated with the cache line to a write monitoring value to represent the write-monitored portion of the buffered and write-monitored state.
(Item 78)
78. The system of item 77, wherein the cache memory transitioning the cache line to the modified state, in response to transitioning the cache line to the buffered and write-monitored state, comprises updating the coherency bit to a modified value to represent the modified state.
(Item 79)
76. The system of item 75, wherein the memory is selected from the group consisting of dynamic random access memory (DRAM), static random access memory (SRAM), and non-volatile memory.
(Item 80)
An apparatus comprising:
decode logic to decode a loss instruction and provide a decoded element;
a status storage element having a loss field to hold a loss value indicating that a loss event has been detected; and
jump logic coupled to the status storage element,
wherein the loss instruction refers to a label, includes an operation code (opcode), and is part of an instruction group recognizable by the decode logic, and
wherein the jump logic passes control to the label based on the decoded element and on the loss value indicating that the loss event has been detected.
(Item 81)
81. The apparatus of item 80, wherein the label includes a jump destination address, and
wherein the loss event is selected from the group consisting of a read monitoring conflict indicating that a write to a cache line being read monitored has occurred, a write monitoring conflict indicating that an access to a cache line being write monitored has occurred, and a loss of a buffered cache line.
(Item 82)
82. The apparatus of item 80, wherein the status storage element comprises a register, and
wherein the loss field to hold the loss value comprises a first bit that is set when a read monitoring conflict is detected, a second bit that is set when a write monitoring conflict is detected, a third bit that is set when a loss of buffered physical data is detected, and a fourth bit that is set when a loss of buffered metadata is detected.
(Item 83)
83. The apparatus of item 80, wherein the loss instruction includes a read monitoring loss instruction and the opcode defines a read monitoring loss event type, and
wherein the jump logic passing control to the label based on the decoded element and on the loss value indicating that the loss event has been detected comprises the jump logic jumping execution to the label in response to the loss field holding the loss value indicating that the type of the loss event that has occurred is the read monitoring loss event type defined by the opcode of the read monitoring loss instruction.
(Item 84)
84. The apparatus of item 80, wherein the loss instruction includes a write monitoring loss instruction and the opcode defines a write monitoring loss event type, and
wherein the jump logic passing control to the label based on the decoded element and on the loss field holding the loss value indicating that the loss event has been detected comprises the jump logic jumping execution to the label in response to the loss field holding the loss value indicating that the type of the loss event that has occurred is the write monitoring loss event type defined by the opcode of the write monitoring loss instruction.
(Item 85)
85. The apparatus of item 80, wherein the loss instruction includes a buffered loss instruction and the opcode defines a buffered loss event type, and
wherein the jump logic passing control to the label based on the decoded element and on the loss field holding the loss value indicating that the loss event has been detected comprises the jump logic jumping execution to the label in response to the loss field holding the loss value indicating that the type of the loss event that has occurred is the buffered loss event type defined by the opcode of the buffered loss instruction.
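The loss field of item 82 and the conditional jump of items 83 to 85 can be illustrated as a bit field tested by a jump helper. The sketch below is an assumption-laden model: the bit positions, the `Loss` flag names, and the `jloss` function are hypothetical, and the model returns the next program counter rather than transferring control.

```python
from enum import IntFlag


class Loss(IntFlag):
    """Assumed bit layout of the loss field in the status storage element."""
    READ_MONITOR = 1 << 0       # read monitoring conflict detected
    WRITE_MONITOR = 1 << 1      # write monitoring conflict detected
    BUFFERED_DATA = 1 << 2      # loss of buffered data detected
    BUFFERED_METADATA = 1 << 3  # loss of buffered metadata detected


def jloss(status: int, event: Loss, label: int, next_pc: int) -> int:
    """Return the label if the loss type selected by the opcode is set in
    the status field; otherwise fall through to the next instruction."""
    return label if status & event else next_pc
```

A single instruction of this kind replaces a load of the status register, a mask, and a conditional branch, which is the practical motivation for giving each loss event type its own opcode.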
(Item 86)
A machine accessible medium holding program code, wherein, when the machine executes the program code, the machine, in response to a loss instruction:
determines a status of a transaction, defined by the loss instruction, that is held in a transaction status register provided in the machine; and
directs execution to a label defined by the loss instruction in response to the status of the transaction indicating that a loss event associated with the loss instruction has been detected.
(Item 87)
87. The machine accessible medium of item 86, wherein the label includes a jump destination address, and
wherein the loss event is selected from the group consisting of a read monitoring conflict indicating that a write to a cache line being read monitored has occurred, a write monitoring conflict indicating that an access to a cache line being write monitored has occurred, and a loss of a buffered cache line.
(Item 88)
88. The machine accessible medium of item 86, wherein the loss instruction includes a read monitoring jump loss (JLOSS) instruction that defines the loss event as a read monitoring conflict,
wherein determining the status of the transaction held in the transaction status register includes determining a status of a read monitoring conflict bit held in the transaction status register, and
wherein directing execution to the label defined by the loss instruction, in response to the status of the transaction indicating that the loss event associated with the loss instruction has been detected, comprises directing execution to the label defined by the loss instruction in response to the status of the read monitoring conflict bit held in the transaction status register indicating that a read monitoring conflict has been detected.
(Item 89)
89. The machine accessible medium of item 86, wherein the loss instruction includes a write monitoring jump loss (JLOSS) instruction that defines the loss event as a write monitoring conflict,
wherein determining the status of the transaction held in the transaction status register includes determining a status of a write monitoring conflict bit held in the transaction status register, and
wherein directing execution to the label defined by the loss instruction, in response to the status of the transaction indicating that the loss event associated with the loss instruction has been detected, comprises directing execution to the label defined by the loss instruction in response to the status of the write monitoring conflict bit held in the transaction status register indicating that a write monitoring conflict has been detected.
(Item 90)
90. The machine accessible medium of item 86, wherein the loss instruction includes a buffered monitoring jump loss (JLOSS) instruction that defines the loss event as a buffered monitoring conflict,
wherein determining the status of the transaction held in the transaction status register includes determining a status of a buffered monitoring conflict bit held in the transaction status register, and
wherein directing execution to the label defined by the loss instruction, in response to the status of the transaction indicating that the loss event associated with the loss instruction has been detected, comprises directing execution to the label defined by the loss instruction in response to the status of the buffered monitoring conflict bit held in the transaction status register indicating that a buffered monitoring conflict has been detected.
(Item 91)
A method comprising:
discovering a loss instruction in a processor;
determining, in response to discovering the loss instruction, whether a loss event associated with the loss instruction has been detected in the processor; and
branching to a label referenced by the loss instruction in response to discovering the loss instruction and determining that the loss event associated with the loss instruction has been detected in the processor.
(Item 92)
92. The method of item 91, wherein the label includes a jump address.
(Item 93)
93. The method of item 91, wherein the loss instruction includes a read monitoring loss instruction, and
wherein the loss event associated with the read monitoring loss instruction includes a write to a cache line that is being read monitored.
(Item 94)
The loss instruction includes a write monitoring loss instruction;
92. The method of item 91, wherein the loss event associated with the write monitoring loss instruction includes access to a cache line that is being write monitored.
(Item 95)
The loss instruction includes a buffered loss instruction;
92. The method of item 91, wherein the loss event associated with the buffered loss instruction includes eviction of a buffered cache line.
(Item 96)
96. The method of item 95, wherein determining whether eviction of a buffered cache line has been detected in the processor comprises checking a buffered loss status bit in a transaction status register, and determining that eviction of a buffered cache line has been detected in response to the buffered loss status bit being set to a loss value.
(Item 97)
An apparatus comprising:
decode logic to decode a commit instruction for a transaction and provide a decoded element; and
commit logic to determine, according to the decoded element, whether a commit condition defined by the commit instruction is satisfied for the transaction,
wherein the commit instruction defines the commit condition, includes an operation code (opcode), and is part of an instruction group recognizable by the decode logic.
(Item 98)
98. The apparatus of item 97, wherein the commit condition includes any defined combination of no loss of read monitoring data, no loss of write monitoring data, no loss of buffered data, and no loss of metadata, and
wherein the commit logic determining that the commit condition is satisfied comprises the commit logic determining that the defined combination of no loss of read monitoring data, no loss of write monitoring data, no loss of buffered data, and no loss of metadata holds.
(Item 99)
99. The apparatus of item 97, wherein the commit instruction defining the commit condition comprises the commit instruction holding 4 bits, in which a first bit, when set, indicates that no loss of read monitoring data is a condition for commit, a second bit, when set, indicates that no loss of write monitoring data is a condition for commit, a third bit, when set, indicates that no loss of buffered data is a condition for commit, and a fourth bit, when set, indicates that no loss of metadata is a condition for commit.
(Item 100)
100. The apparatus of item 99, wherein the 4 bits are included in the opcode.
(Item 101)
101. The apparatus of item 99, wherein the commit logic determining whether the commit condition defined by the commit instruction is satisfied for the transaction comprises the commit logic checking the corresponding status bits of a transaction status register, and determining that the commit condition is satisfied when none of the checked status bits of the transaction status register is set to indicate a corresponding loss.
(Item 102)
102. The apparatus of item 97, wherein the commit instruction further defines a clear control indicating which combination of read monitoring data, write monitoring data, buffered data, and metadata is to be cleared at commit, and
wherein the commit logic, after committing the transaction in response to determining that the commit condition defined by the commit instruction is satisfied for the transaction, clears the combination of read monitoring data, write monitoring data, buffered data, and metadata indicated by the clear control.
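The conditional commit of items 97 to 102 can be illustrated as a mask test followed by a selective clear. The following is a sketch only: the bit assignments, the function name `try_commit`, and the dictionary of tracked sets are assumptions made for the example and are not defined by the items.

```python
# Assumed bit assignments shared by the status register, the commit
# condition bits, and the clear control.
RM, WM, BUF, MD = 1, 2, 4, 8  # read monitor, write monitor, buffered data, metadata

_TRACKED_NAMES = ((RM, "read_monitors"), (WM, "write_monitors"),
                  (BUF, "buffered"), (MD, "metadata"))


def try_commit(status: int, condition_mask: int, clear_mask: int, tracked: dict) -> bool:
    """Commit only if no loss selected by condition_mask is recorded in status.

    tracked is a hypothetical dict mapping each tracked category name to a
    set of cache-line addresses; categories selected by clear_mask are
    cleared after a successful commit.
    """
    if status & condition_mask:
        return False  # a loss that the instruction cares about was detected
    for bit, name in _TRACKED_NAMES:
        if clear_mask & bit:
            tracked[name].clear()
    return True
```

Because the condition bits are chosen per instruction, software can, for example, ignore a write monitoring loss for one commit while still requiring that no buffered data was lost.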
(Item 103)
A machine-readable medium holding program code, wherein, when the machine executes the program code, the machine:
discovers, for a transaction included in the program code, a commit instruction that defines at least one commit failure condition;
determines whether the at least one commit failure condition defined by the commit instruction was detected during execution of the transaction; and
provides, in response to determining that the at least one commit failure condition defined by the commit instruction was detected during execution of the transaction, a value indicating that the at least one commit failure condition was detected.
(Item 104)
104. The machine-readable medium of item 103, wherein the at least one commit failure condition is selected from the group consisting of loss of read monitoring data, loss of write monitoring data, loss of buffer data, and loss of metadata.
(Item 105)
105. The machine-readable medium of item 103, wherein providing the value indicating that the at least one commit failure condition defined by the commit instruction was detected during execution of the transaction comprises loading that value into a destination register.
(Item 106)
106. The machine-readable medium of item 103, wherein determining whether the at least one commit failure condition defined by the commit instruction was detected during execution of the transaction comprises:
checking a status bit of a transaction status register associated with the at least one commit failure condition;
determining that the at least one commit failure condition was detected during execution of the transaction, in response to the status bit being set to indicate that the condition was detected; and
determining that the at least one commit failure condition was not detected during execution of the transaction, in response to the status bit being reset to indicate that the condition was not detected.
(Item 107)
107. The machine-readable medium of item 106, further comprising committing the transaction in response to determining that the at least one commit failure condition defined by the commit instruction was not detected during execution of the transaction.
(Item 108)
In a transaction, finding a commit instruction including an operation code (opcode) defining a plurality of commit failure conditions for the transaction;
Determining that none of the plurality of commit failure conditions for the transaction defined in the opcode of the commit instruction was detected during execution of the transaction;
Committing the transaction in response to determining that none of the plurality of commit failure conditions for the transaction defined in the opcode of the commit instruction has been detected during execution of the transaction;
A method comprising:
(Item 109)
109. The method of item 108, wherein the opcode defining the plurality of commit failure conditions for the transaction comprises: a first bit of the opcode that, when set, defines loss of read monitoring data as a commit failure condition; a second bit of the opcode that, when set, defines loss of write monitoring data as a commit failure condition; a third bit of the opcode that, when set, defines loss of buffer data as a commit failure condition; and a fourth bit of the opcode that, when set, defines loss of metadata as a commit failure condition.
(Item 110)
110. The method of item 109, wherein determining that none of the plurality of commit failure conditions for the transaction defined in the opcode of the commit instruction was detected during execution of the transaction comprises:
determining, in response to the first bit of the opcode being set, that the read monitoring bit of the transaction status register is not set, indicating no loss of read monitoring data;
determining, in response to the second bit of the opcode being set, that the write monitoring bit of the transaction status register is not set, indicating no loss of write monitoring data;
determining, in response to the third bit of the opcode being set, that the buffered bit of the transaction status register is not set, indicating no loss of buffer data; and
determining, in response to the fourth bit of the opcode being set, that the metadata bit of the transaction status register is not set, indicating no loss of metadata.
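The check described in items 109 and 110 can be sketched as a bit mask test. This is only an illustration under stated assumptions: the items name a "first" through "fourth" opcode bit but give no encoding, so the bit positions below, the choice to give the transaction status register the same layout, and the function name `no_failure_detected` are all hypothetical.

```python
# Hypothetical bit positions: the items only say "first" to "fourth" bit.
COND_READ_MONITOR = 0b0001   # loss of read monitoring data
COND_WRITE_MONITOR = 0b0010  # loss of write monitoring data
COND_BUFFER = 0b0100         # loss of buffer data
COND_METADATA = 0b1000       # loss of metadata


def no_failure_detected(opcode_bits: int, status_register: int) -> bool:
    """Return True if none of the selected commit failure conditions occurred.

    Assumption: a set status bit means the corresponding loss was detected,
    and the status register uses the same bit layout as the opcode bits.
    Losses whose opcode bit is clear are ignored.
    """
    selected_losses = opcode_bits & status_register & 0b1111
    return selected_losses == 0


# Buffer data was lost, but the opcode only asks about monitor losses.
opcode = COND_READ_MONITOR | COND_WRITE_MONITOR
print(no_failure_detected(opcode, COND_BUFFER))
```

Under these assumptions a transaction would be allowed to commit only when the function returns True, which mirrors the "none of the plurality of commit failure conditions" wording of item 108.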
(Item 111)
111. The method of item 109, wherein the opcode further defines clear control, and the opcode defining the clear control comprises: a fifth bit of the opcode that, when set, specifies that read monitoring data is cleared upon commit; a sixth bit of the opcode that, when set, specifies that write monitoring data is cleared upon commit; a seventh bit of the opcode that, when set, specifies that buffer data is cleared upon commit; and an eighth bit of the opcode that, when set, specifies that metadata is cleared upon commit.
(Item 112)
112. The method of item 111, wherein committing the transaction comprises: clearing read monitoring data when the fifth bit is set; clearing write monitoring data when the sixth bit is set; clearing buffer data when the seventh bit is set; and clearing metadata when the eighth bit is set.
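The clear control of items 111 and 112 can be illustrated in the same way. The sketch below is an assumption-laden model, not the patented hardware: the bit positions, the `TxState` container, and the function name `apply_clear_control` are hypothetical, and plain Python sets and dicts stand in for the monitors, buffered data, and metadata.

```python
from dataclasses import dataclass, field

# Hypothetical positions for the "fifth" to "eighth" opcode bits.
CLEAR_READ_MONITOR = 1 << 4
CLEAR_WRITE_MONITOR = 1 << 5
CLEAR_BUFFER = 1 << 6
CLEAR_METADATA = 1 << 7


@dataclass
class TxState:
    """Stand-in for the transactional state a commit may clear."""
    read_monitors: set = field(default_factory=set)
    write_monitors: set = field(default_factory=set)
    buffered: dict = field(default_factory=dict)
    metadata: dict = field(default_factory=dict)


def apply_clear_control(opcode_bits: int, tx: TxState) -> None:
    """Clear only the state selected by the clear control bits."""
    if opcode_bits & CLEAR_READ_MONITOR:
        tx.read_monitors.clear()
    if opcode_bits & CLEAR_WRITE_MONITOR:
        tx.write_monitors.clear()
    if opcode_bits & CLEAR_BUFFER:
        tx.buffered.clear()
    if opcode_bits & CLEAR_METADATA:
        tx.metadata.clear()


tx = TxState({0x10}, {0x20}, {0x20: 7}, {0x10: 1})
apply_clear_control(CLEAR_READ_MONITOR | CLEAR_BUFFER, tx)
print(tx)
```

Keeping the clear bits independent of one another lets software retain, for example, write monitors and metadata across a commit while dropping the rest.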
(Item 113)
A system comprising:
a memory holding program code that includes a commit instruction for a transaction, the commit instruction including an operation code (opcode) that defines clear control information and a plurality of commit failure conditions for the transaction; and
a processor having decode logic for decoding the opcode of the commit instruction, and commit logic,
wherein the commit logic determines whether none of the plurality of commit failure conditions defined by the opcode was detected during execution of the transaction, and commits the transaction in response to determining that none of the plurality of commit failure conditions was detected during execution of the transaction, and
wherein the commit logic committing the transaction includes the commit logic clearing transaction information based on the clear control information defined by the opcode of the commit instruction.
(Item 114)
114. The system according to item 113, wherein the commit failure condition is determined by combining a loss of read monitoring data, a loss of write monitoring data, a loss of buffer data, and a loss of metadata.
(Item 115)
115. The system of item 114, wherein the commit failure condition is selected from the group consisting of: loss of write monitoring data; loss of read monitoring data or loss of write monitoring data; loss of write monitoring data or loss of buffer data; loss of write monitoring data or loss of metadata; and loss of write monitoring data, loss of read monitoring data, loss of buffer data, or loss of metadata.
(Item 116)
116. The system of item 113, wherein the opcode defining the clear control information comprises the opcode defining which of read monitoring, write monitoring, buffered coherency, and metadata are cleared upon commit, and
the commit logic clearing transaction information based on the clear control information defined in the opcode of the commit instruction comprises the commit logic clearing those of the read monitoring, the write monitoring, the buffered coherency, and the metadata that the opcode specifies to be cleared.
(Item 117)
An apparatus comprising:
a storage element having a transaction enable field (TEF); and
logic that stores at least the state of the TEF in a storage structure in response to a ring level transition event, and returns at least the state of the TEF from the storage structure to the storage element in response to a return event,
wherein the TEF, when holding an active value, indicates that an associated transaction is active and enabled, and, when holding an inactive value, indicates that the associated transaction is suspended.
(Item 118)
118. The apparatus of item 117, wherein the ring level transition event comprises an event selected from the group consisting of an interrupt, an exception, a system call, a virtual machine start, and a virtual machine end.
(Item 119)
119. The apparatus of item 117, wherein the return event includes an event selected from the group consisting of an interrupt return (IRET), a system return (SYSRET), a virtual machine (VM) start, and a virtual machine (VM) end.
(Item 120)
120. The apparatus of item 117, wherein the storage element includes a flag register and the TEF includes a transaction enable flag.
(Item 121)
121. The apparatus of item 117, wherein the storage structure includes a stack, the logic storing at least the state of the TEF in the stack comprises push logic pushing at least the state of the TEF onto the stack, and the logic returning at least the state of the TEF from the stack to the storage element comprises pop logic popping at least the state of the TEF from the stack and returning it to the storage element.
(Item 122)
A system comprising:
a memory that holds code that, when executed, generates a ring level transition event; and
a processor having a register and stack logic,
wherein the register includes a transaction enable field (TEF) for holding an active value indicating that an associated transaction is active, and
the stack logic, in response to the ring level transition event, pushes the previous state of the register onto a stack and clears the TEF to an inactive value to indicate that the associated transaction is suspended, and, in response to a return event, returns the previous state of the register from the stack to the register.
(Item 123)
123. The system of item 122, wherein the ring level transition event comprises an event selected from the group consisting of interrupt, exception, system call, virtual machine start, and virtual machine end.
(Item 124)
124. The system of item 122, wherein the return event includes an event selected from the group consisting of an interrupt return (IRET), a system return (SYSRET), a virtual machine (VM) start, and a virtual machine (VM) end.
(Item 125)
The register includes a flag register, the TEF includes a transaction enable flag, the active value includes a logic high value of the transaction enable flag, and the inactive value includes a logic low value of the transaction enable flag. The described system.
(Item 126)
Detecting a ring level transition event from the current ring level; and
Saving the previous state of the register containing the transaction enable field to a storage structure;
Clearing the transaction enable field to indicate that the associated transaction is suspended;
Detecting a return event to the current ring level;
Returning the previous state of the register from the storage structure in response to detecting the return event to the current ring level;
A method comprising:
(Item 127)
127. The method of item 126, wherein the storage structure includes a kernel stack, saving the previous state of the register to the kernel stack comprises pushing the previous state of the register onto the kernel stack, and returning the previous state of the register from the kernel stack comprises popping the previous state of the register from the kernel stack and restoring it to the register.
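The save, clear, and restore sequence of items 126 and 127 can be modeled in a few lines. This is a minimal sketch under assumptions: the class name `Cpu`, its method names, and the position of the transaction enable field within the flag register are hypothetical, and a Python list stands in for the kernel stack.

```python
TEF_MASK = 1 << 0  # hypothetical position of the transaction enable field


class Cpu:
    """Toy model of a flag register holding a TEF, plus a kernel stack."""

    def __init__(self, flags: int = 0) -> None:
        self.flags = flags
        self.kernel_stack: list[int] = []

    def transaction_enabled(self) -> bool:
        return bool(self.flags & TEF_MASK)

    def ring_transition(self) -> None:
        """Interrupt, exception, system call, and so on."""
        self.kernel_stack.append(self.flags)  # push previous register state
        self.flags &= ~TEF_MASK               # transaction is now suspended

    def return_event(self) -> None:
        """IRET, SYSRET, and so on: restore the saved register state."""
        self.flags = self.kernel_stack.pop()


cpu = Cpu(flags=TEF_MASK | 0b100)
cpu.ring_transition()
print(cpu.transaction_enabled())  # handler runs with the transaction suspended
cpu.return_event()
print(cpu.transaction_enabled())  # transaction is enabled again
```

Because the whole previous register state is pushed, nested transition events unwind in order: each return event restores the state that was current when the matching transition occurred.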
(Item 128)
128. The method of item 126, wherein the current ring level includes a user ring level.
(Item 129)
129. The method of item 128, wherein the ring level transition event comprises an event selected from the group consisting of an interrupt, an exception, a system call, and a virtual machine start.
(Item 130)
130. The method of item 129, wherein the return event to the current privilege level includes an event selected from the group consisting of an interrupt return (IRET), a system return (SYSRET), and a virtual machine (VM) termination.

Claims

An apparatus comprising:
a plurality of processing elements; and
logic,
wherein one processing element of the plurality of processing elements is associated with a plurality of software subsystems, and
the logic associates a metadata access process, which is associated with a first software subsystem among the plurality of software subsystems and refers to a data address, with a metaphysical address space associated with the first software subsystem, based at least on a metadata identifier (MDID) associated with the first software subsystem and on the data address.

The apparatus of claim 1, wherein the metaphysical address space associated with the first software subsystem is orthogonal to at least one other metaphysical address space associated with a second software subsystem of the plurality of software subsystems and to a data address space containing the data address.

The logic includes conversion logic for converting the data address to a metadata address in the metaphysical address space associated with the first software subsystem based on at least the MDID. The device described.

The apparatus of claim 3, wherein the conversion logic further converts the data address to a metadata address in the metaphysical address space associated with the first software subsystem based on a processing element identifier (PEID) associated with the processing element.

The apparatus of claim 4, wherein the conversion logic further converts the data address to a metadata address in the metaphysical address space associated with the first software subsystem based on a data-to-metadata compression ratio.

The apparatus of claim 4, further comprising a register modifiable by the first software subsystem,
wherein the register, in response to a write from the first software subsystem, holds the MDID to indicate that the first software subsystem is running on the processing element, and
the conversion logic converting the data address to the metadata address in the metaphysical address space associated with the first software subsystem based on the PEID and the MDID comprises the conversion logic combining information representing the data address with the PEID and the MDID.

The apparatus of claim 6, wherein the conversion logic combining the information representing the data address with the PEID and the MDID comprises at least one of: forming the metadata address by adding the PEID and the MDID to the data address; converting the data address to a converted data address using a normal data conversion table and forming the metadata address by adding the PEID and the MDID to the converted data address; and converting the data address to a converted metadata address using a metaphysical conversion table separate from the normal data conversion table and forming the metadata address by adding the PEID and the MDID to the converted metadata address.
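The first of these alternatives, forming the metadata address by adding the PEID and the MDID to the data address, can be sketched as simple bit concatenation. The claims do not give field widths or an ordering, so the 32-bit address, the 4-bit PEID and MDID fields, their placement above the address bits, and the function name `metadata_address` are all assumptions made for illustration.

```python
# Hypothetical field widths; the claims do not specify any of them.
ADDR_BITS = 32
MDID_BITS = 4
PEID_BITS = 4


def metadata_address(data_address: int, peid: int, mdid: int) -> int:
    """Form a metadata address by placing PEID and MDID above the data address."""
    if not 0 <= data_address < (1 << ADDR_BITS):
        raise ValueError("data address out of range")
    if not 0 <= mdid < (1 << MDID_BITS):
        raise ValueError("MDID out of range")
    if not 0 <= peid < (1 << PEID_BITS):
        raise ValueError("PEID out of range")
    return (peid << (MDID_BITS + ADDR_BITS)) | (mdid << ADDR_BITS) | data_address


# Two software subsystems on the same processing element, same data address:
a = metadata_address(0x1000, peid=1, mdid=2)
b = metadata_address(0x1000, peid=1, mdid=3)
print(hex(a), hex(b))
```

With this layout, the same data address yields different metadata addresses for different MDID or PEID values, which is one way to keep the metaphysical address spaces of different software subsystems apart. A real design would also need to keep metadata addresses distinct from ordinary data addresses, which this sketch does not model.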

A method comprising:
detecting metadata processing that refers to a data address that is in a data address space and that is associated with a data item held in a data entry in a cache memory;
determining a metadata address in a metaphysical address space separate from the data address space, based on the data address, a processing element identifier (PEID) of a processing element associated with the metadata processing, and a metadata identifier (MDID) of a software subsystem associated with the processing element; and
accessing a metadata entry in the cache memory based on the metadata address.

9. The method of claim 8, wherein the metaphysical address space is also distinct from an additional metaphysical address space that is also associated with an additional software subsystem that is also associated with the processing element.

The method of claim 8, further comprising:
writing the MDID to a control register associated with the processing element, in response to detecting a write operation from the software subsystem to the control register, to indicate that the software subsystem is currently executing on the processing element; and
determining the MDID based on the control register.

The method of claim 10, further comprising determining the PEID based on a portion of the metadata processing opcode.

The method of claim 10, wherein determining the metadata address from the data address, the PEID, and the MDID comprises combining the data address, the PEID, and the MDID by at least one of: forming the metadata address by adding the PEID and the MDID to the data address; converting the data address to a converted data address using a normal data conversion table and forming the metadata address by adding the PEID and the MDID to the converted data address; and converting the data address to a converted metadata address using a metaphysical conversion table separate from the normal data conversion table and forming the metadata address by adding the PEID and the MDID to the converted metadata address.

An apparatus comprising:
decode logic for decoding a metadata access instruction that refers to a data address of a data item; and
logic that, in response to the decode logic decoding the metadata access instruction, converts the data address into a separate metadata address in a manner transparent to software and accesses the metadata referenced by the separate metadata address,
wherein the metadata access instruction includes an operation code (opcode) that is recognizable as part of an instruction set decodable by the decode logic.

The apparatus of claim 13, wherein the metadata access instruction is selected from an instruction group consisting of a metadata bit test and set (MDLT) instruction, a metadata store and set (MSS) instruction, and a metadata store and reset (MDSR) instruction.

The apparatus of claim 13, wherein the metadata access instruction is selected from a group of instructions consisting of a compressed metadata test (CMDT) instruction, a compressed metadata store (CMS) instruction, and a compressed metadata clear (CMDCLR) instruction.

The apparatus of claim 13, wherein the logic converting the data address into a separate metadata address transparent to software comprises the logic converting the data address based at least on a metadata identifier (MDID) specified in a control register by a software subsystem associated with the metadata access instruction.

The apparatus of claim 13, wherein the metadata access instruction further includes a reference to a destination register, and
the logic accessing the metadata referenced by the separate metadata address comprises the logic loading the metadata at the separate metadata address into the destination register.

The apparatus of claim 17, wherein the opcode includes a thread identifier field that identifies a thread that issued the metadata access instruction.

A program for causing a machine to execute a step of generating, for a data access process that refers to a data address, a metadata access process that refers to the data address,
wherein the metadata access process, when executed, causes the machine to convert the data address to a metadata address different from the data address and to access, based on the metadata address, metadata of the data item at the data address.

The program according to claim 19, wherein the metadata access process is selected from an instruction group consisting of a metadata bit test and set (MDLT) instruction, a metadata store and set (MSS) instruction, and a metadata store and reset (MDSR) instruction.

The program according to claim 19, wherein the metadata access process is selected from a group of compressed instructions consisting of a compressed metadata test (CMDT) instruction, a compressed metadata store (CMS) instruction, and a compressed metadata clear (CMDCLR) instruction.

The program according to claim 21, wherein the converting comprises combining, based on a data-to-metadata compression ratio, the data address, a processing element identifier (PEID) associated with the metadata access process, and a metadata identifier (MDID) associated with the metadata access process.

The program according to claim 22, wherein the data address can be converted by a virtual-physical address conversion logic of the machine so as to refer to the data item.

The program according to claim 19, wherein the metadata access process further refers to an operand register, and
the accessing includes updating the metadata of the data item with a value held in the operand register.

The program according to claim 19, including compiler code for compiling application code that includes the data access process,
wherein the step of generating the metadata access process in the data access process includes generating the metadata access process in a compiled version of the application code.