KR101370314B1

KR101370314B1 - Optimizations for an unbounded transactional memory (UTM) system

Info

Publication number: KR101370314B1
Application number: KR1020117031098A
Authority: KR
Inventors: Gad Sheaffer; Jan Gray; Burton Smith; Ali-Reza Adl-Tabatabai; Robert Geva; Vadim Bassin; David Callahan; Yang Ni; Bratin Saha; Martin Taillefer; Shlomo Raikin; Koichi Yamada; Landy Wang; Arun Kishan
Original assignee: Intel Corporation
Priority date: 2009-06-26
Filing date: 2009-06-26
Publication date: 2014-03-05
Also published as: GB2484416A; CN102460376A; WO2010151267A1; JP2012530960A; CN102460376B; BRPI0925055A2; DE112009005006T5; GB2484416B; KR20130074726A; JP5608738B2; GB201119084D0

Abstract

Methods and apparatus for optimizing an unbounded transactional memory (UTM) system are disclosed herein. Hardware support is provided for monitors, buffering, and metadata, in which orthogonal abstract address spaces for metadata are associated with, and kept separate for, threads and/or software subsystems within threads. In addition, hardware may hold metadata in a compressed manner with respect to data, transparently to software. Furthermore, in response to metadata access instructions/operations, hardware may support forced metadata values that enable multiple modes of transactional execution. When monitors, buffered data, metadata, or other information is lost, or a conflict is detected, hardware provides variants of a loss instruction that poll a transaction status register for such loss or conflict and jump execution to a label upon detecting it. Similarly, multiple variants of a commit instruction are provided to allow software to define commit conditions and the information to clear upon commit. Finally, hardware provides support for suspending and resuming transactions upon ring-level transitions.

Description

OPTIMIZATIONS FOR AN UNBOUNDED TRANSACTIONAL MEMORY (UTM) SYSTEM

The present invention relates to the field of processor execution and, in particular, to the execution of groups of instructions.

Advances in semiconductor processing and logic design have permitted an increase in the amount of logic that may be present on integrated circuit devices. As a result, computer system configurations have evolved from a single or multiple integrated circuits in a system to multiple cores and multiple logical processors present on individual integrated circuits. A processor or integrated circuit typically comprises a single processor die, where the processor die may include any number of cores or logical processors.

The ever-increasing number of cores and logical processors on integrated circuits enables more software threads to be executed concurrently. However, the increase in the number of software threads that may execute simultaneously has created problems with synchronizing data shared among those threads. One common solution for accessing shared data in multi-core or multi-logical-processor systems is the use of locks to guarantee mutual exclusion across multiple accesses to shared data. However, the ever-increasing ability to execute multiple software threads potentially results in false contention and serialization of execution.

For example, consider a hash table holding shared data. With a lock system, a programmer may lock the entire hash table, allowing one thread to access the entire table. However, the throughput and performance of other threads is potentially adversely affected, because they are unable to access any entry in the hash table until the lock is released. Alternatively, each entry in the hash table may be locked. Either way, once this simple example is extrapolated to a large scalable program, the complexity of lock contention, serialization, fine-grained synchronization, and deadlock avoidance becomes an extremely cumbersome burden for programmers.
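
The two locking strategies described above can be sketched as follows. This is a minimal illustration only, not part of the disclosed apparatus; the class names `CoarseLockedTable` and `FineLockedTable`, the bucket count, and the use of Python's `threading.Lock` are assumptions chosen for the example.

```python
import threading


class CoarseLockedTable:
    """Hash table guarded by one lock: one thread at a time, for any bucket."""

    def __init__(self, num_buckets=8):
        self._lock = threading.Lock()
        self._buckets = [dict() for _ in range(num_buckets)]

    def put(self, key, value):
        with self._lock:  # serializes every access, even to unrelated buckets
            self._buckets[hash(key) % len(self._buckets)][key] = value

    def get(self, key, default=None):
        with self._lock:
            return self._buckets[hash(key) % len(self._buckets)].get(key, default)


class FineLockedTable:
    """Same table with one lock per bucket: unrelated buckets do not contend."""

    def __init__(self, num_buckets=8):
        self._locks = [threading.Lock() for _ in range(num_buckets)]
        self._buckets = [dict() for _ in range(num_buckets)]

    def _index(self, key):
        return hash(key) % len(self._buckets)

    def put(self, key, value):
        i = self._index(key)
        with self._locks[i]:  # only threads touching bucket i wait here
            self._buckets[i][key] = value

    def get(self, key, default=None):
        i = self._index(key)
        with self._locks[i]:
            return self._buckets[i].get(key, default)
```

The coarse version is simple but serializes all threads; the fine-grained version reduces contention at the cost of more locks to manage, which is where the deadlock-avoidance burden mentioned above comes from once an operation needs more than one bucket.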

Another recent data synchronization technique involves the use of transactional memory (TM). Transactional execution often includes executing a grouping of a plurality of micro-operations, operations, or instructions. In the example above, both threads execute within the hash table, and their memory accesses are monitored/tracked. If both threads access/alter the same entry, conflict resolution may be performed to ensure data validity. One type of transactional execution is software transactional memory (STM), where tracking of memory accesses, conflict resolution, abort tasks, and other transactional tasks are performed in software, often without hardware support.

Another type of transactional execution is hardware transactional memory (HTM), where hardware is included to support access tracking, conflict resolution, and other transactional tasks. Previously, actual memory data arrays were extended with additional bits to hold information, such as hardware attributes to track reads, writes, and buffering, and as a result this information travels with the data from the processor to memory. This information is often referred to as persistent: it is not lost upon a cache eviction, because it travels with the data throughout the memory hierarchy. Yet this persistency imposes more overhead throughout the memory hierarchy.

In addition, previous hardware transactional memory (HTM) systems have been fraught with a number of inefficiencies. As a first example, HTMs currently provide no efficient method for transitioning data from unbuffered, or buffered and unmonitored, states to a buffered and monitored state to ensure consistency before commit of a transaction. As another example, a number of inefficiencies exist in the interface between HTMs and software. Specifically, hardware provides no mechanism to properly accelerate software memory access barriers in a way that takes into account the various forms of strong and weak atomicity between transactional and non-transactional operations. Additionally, during an attempted commit of a transaction, hardware provides no facility for determining when the transaction is to abort or commit based on loss of monitoring, buffering, and/or other attribute information. Similarly, the instruction sets of these previous HTMs provide no commit instructions that define what information to retain or clear upon commit of a transaction. Other examples of inefficiency include the inability of current HTMs to handle ring-level (privilege-level) transitions during execution of transactions, and the lack of instructions to efficiently vector or jump execution upon detection of a conflict or loss of information.

The present invention is illustrated by way of example and is not intended to be limited by the figures of the accompanying drawings.
FIG. 1 illustrates an embodiment of a processor including multiple processing elements capable of executing multiple software threads concurrently.
FIG. 2 illustrates an embodiment of associating metadata with a data item.
FIG. 3 illustrates an embodiment of orthogonal abstract address spaces for separate software subsystems within a plurality of processing elements.
FIG. 4 illustrates an embodiment of compression of metadata with respect to data.
FIG. 5 illustrates an embodiment of a flow diagram for a method of accessing metadata.
FIG. 6 illustrates an embodiment of a metadata storage element to support acceleration of transactions within strong and weak atomicity environments.
FIG. 7 illustrates an embodiment of a flow diagram for accelerating non-transactional operations while maintaining atomicity in a transactional environment.
FIG. 8 illustrates an embodiment of a flow diagram for a method of efficiently transitioning a block of data to a buffered and monitored state before commit of a transaction.
FIG. 9 illustrates an embodiment of hardware to support a loss instruction that jumps to a destination label based on a status value in a transaction status register.
FIG. 10 illustrates an embodiment of a flow diagram for a method of executing a loss instruction that jumps to a destination label based on a conflict or loss of specific information.
FIG. 11 illustrates an embodiment of hardware to support the definition of commit conditions and clear controls in a commit instruction.
FIG. 12 illustrates an embodiment of a flow diagram for a method of executing a commit instruction that defines commit conditions and clear controls.
FIG. 13 illustrates an embodiment of hardware to support handling of privilege-level transitions during execution of transactions.

In the following description, numerous specific details are set forth, such as examples of specific hardware structures for transactional execution, specific types and implementations of access monitors, specific cache coherency models for detecting access conflicts, specific data granularities, and specific types of memory accesses and locations, in order to provide a thorough understanding of the present invention. It will be apparent to one skilled in the art, however, that these specific details need not be employed to practice the present invention. In other instances, well-known components or methods, such as coding of transactions in software, insertion of operations by a compiler to perform the recited functions, demarcation of transactions, specific and alternative multi-core and multi-threaded processor architectures, specific compiler methods/implementations, and specific operational details of microprocessors, have not been described in detail in order to avoid unnecessarily obscuring the present invention.

The method and apparatus described herein are for optimizing hardware and software for unbounded transactional memory (UTM) execution. Specifically, the optimizations are primarily discussed in reference to supporting a UTM system. However, the methods and apparatus described herein are not so limited; they may be utilized in any form of transactional memory system that differs in implementation from a UTM system, such as hardware that supports or accelerates a software transactional memory (STM) system, a pure hardware transactional memory (HTM) system, or a hybrid thereof.

Referring to FIG. 1, an embodiment of a processor capable of executing multiple threads concurrently is illustrated. Note that processor 100 may include hardware support for hardware transactional execution. Either in conjunction with hardware transactional execution or separately, processor 100 may also provide hardware support for hardware acceleration of a software transactional memory (STM), separate execution of an STM, or a combination thereof, such as a hybrid transactional memory (TM) system. Processor 100 includes any processor, such as a microprocessor, an embedded processor, a digital signal processor (DSP), a network processor, or another device to execute code. Processor 100, as illustrated, includes a plurality of processing elements.

In one embodiment, a processing element refers to a thread unit, a process unit, a context, a logical processor, a hardware thread, a core, and/or any other element capable of holding a state for a processor, such as an execution state or architectural state. In other words, a processing element, in one embodiment, refers to any hardware capable of being independently associated with code, such as a software thread, operating system, application, or other code. A physical processor typically refers to an integrated circuit, which potentially includes any number of other processing elements, such as cores or hardware threads.

A core often refers to logic located on an integrated circuit capable of maintaining an independent architectural state, where each independently maintained architectural state is associated with at least some dedicated execution resources. In contrast to cores, a hardware thread typically refers to any logic located on an integrated circuit capable of maintaining an independent architectural state, where the independently maintained architectural states share access to execution resources. As can be seen, when certain resources are shared and others are dedicated to an architectural state, the line between the nomenclature of a hardware thread and a core overlaps. Yet often, a core and a hardware thread are viewed by an operating system as individual logical processors, where the operating system is able to individually schedule operations on each logical processor.

Physical processor 100, as illustrated in FIG. 1, includes two cores, cores 101 and 102, which share access to higher-level cache 110. Although processor 100 may include asymmetric cores, i.e., cores with different configurations, functional units, and/or logic, symmetric cores are illustrated. As a result, core 102, which is illustrated as identical to core 101, is not discussed in detail to avoid repetitive discussion. In addition, core 101 includes two hardware threads 101a and 101b, while core 102 includes two hardware threads 102a and 102b. Therefore, software entities such as an operating system potentially view processor 100 as four separate processors, i.e., four logical processors or processing elements capable of executing four software threads concurrently.

Here, a first thread is associated with architecture state registers 101a, a second thread with architecture state registers 101b, a third thread with architecture state registers 102a, and a fourth thread with architecture state registers 102b. As illustrated, architecture state registers 101a are replicated in architecture state registers 101b, so individual architecture states/contexts can be stored for logical processor 101a and logical processor 101b. Other smaller resources, such as instruction pointers and renaming logic in rename allocator logic 130, may also be replicated for threads 101a and 101b. Some resources, such as reorder buffers in reorder/retirement unit 135, load/store buffers, and queues, may be shared through partitioning. Other resources, such as general-purpose internal registers, the page-table base register, the low-level data cache and data-TLB 115, execution unit(s) 140, and portions of out-of-order unit 135, are potentially fully shared.

Processor 100 often includes other resources, which may be fully shared, shared through partitioning, or dedicated by/to processing elements. FIG. 1 illustrates an embodiment of a purely exemplary processor with illustrative functional units/resources. Note that a processor may include or omit any of these functional units, as well as include any other known functional units, logic, or firmware not depicted.

As illustrated, processor 100 includes bus interface module 105 to communicate with devices external to processor 100, such as system memory 175, a chipset, a northbridge, or other integrated circuit. Memory 175 may be dedicated to processor 100 or shared with other devices in the system. Higher-level or further-out cache 110 is to cache recently fetched elements from higher-level cache 110. Note that higher-level or further-out refers to cache levels increasing or getting farther away from the execution unit(s). In one embodiment, higher-level cache 110 is a second-level data cache. However, higher-level cache 110 is not so limited, as it may be associated with or include an instruction cache. A trace cache, i.e., a type of instruction cache, may instead be coupled after decoder 125 to store recently decoded traces. Module 120 also potentially includes a branch target buffer to predict branches to be executed/taken and an instruction-translation buffer (I-TLB) to store address translation entries for instructions.

Decode module 125 is coupled to fetch unit 120 to decode fetched elements. In one embodiment, processor 100 is associated with an instruction set architecture (ISA), which defines/specifies the instructions executable on processor 100. Here, machine code instructions recognized by the ISA often include a portion of the instruction referred to as an opcode, which references/specifies an instruction or operation to be performed.

In one example, allocator and renamer block 130 includes an allocator to reserve resources, such as register files to store instruction processing results. However, threads 101a and 101b are potentially capable of out-of-order execution, in which case allocator and renamer block 130 also reserves other resources, such as reorder buffers to track instruction results. Unit 130 may also include a register renamer to rename program/instruction reference registers to other registers internal to processor 100. Reorder/retirement unit 135 includes components, such as the reorder buffers, load buffers, and store buffers mentioned above, to support out-of-order execution and later in-order retirement of instructions executed out of order.

In one embodiment, scheduler and execution unit(s) block 140 includes a scheduler unit to schedule instructions/operations on execution units. For example, a floating-point instruction is scheduled on a port of an execution unit that has an available floating-point execution unit. Register files associated with the execution units are also included to store instruction processing results. Exemplary execution units include a floating-point execution unit, an integer execution unit, a jump execution unit, a load execution unit, a store execution unit, and other known execution units.

Lower-level data cache and data translation buffer (D-TLB) 150 are coupled to execution unit(s) 140. The data cache is to store recently used/operated-on elements, such as data operands, which are potentially held in memory coherency states. The D-TLB is to store recent virtual/linear-to-physical address translations. As a specific example, a processor may include a page table structure to break physical memory into a plurality of virtual pages.
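
The role of a translation buffer in front of a page table can be illustrated with a small software model. This is a sketch under stated assumptions, not a description of D-TLB 150: the name `TinyTLB`, the 4096-byte page size, the capacity, and the least-recently-used replacement policy are all hypothetical choices made for the example.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


class TinyTLB:
    """Caches recent virtual-page -> physical-frame translations."""

    def __init__(self, page_table, capacity=4):
        self.page_table = page_table  # full mapping: virtual page -> frame
        self.capacity = capacity
        self.entries = {}             # insertion-ordered dict used as a small cache
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.entries:
            self.hits += 1
            frame = self.entries.pop(vpn)   # removed here, re-inserted below as newest
        else:
            self.misses += 1
            frame = self.page_table[vpn]    # models the page-table walk
            if len(self.entries) >= self.capacity:
                oldest = next(iter(self.entries))
                del self.entries[oldest]    # evict the least recently used entry
        self.entries[vpn] = frame
        return frame * PAGE_SIZE + offset
```

A repeated access to the same page is served from `entries` without consulting `page_table`, which is the saving that caching recent translations is meant to provide.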

In one embodiment, processor 100 is capable of hardware transactional execution, software transactional execution, or a combination or hybrid thereof. A transaction, which may also be referred to as a critical or atomic section of code, includes a grouping of instructions, operations, or micro-operations to be executed as an atomic group. For example, instructions or operations may be used to demarcate a transaction or a critical section. In one embodiment, described in more detail below, these instructions are part of a set of instructions, such as an instruction set architecture (ISA), which are recognizable by hardware of processor 100, such as the decoders described above. Often, these instructions, once compiled from a high-level language to hardware-recognizable assembly language, include operation codes (opcodes), or other portions of the instructions, that decoders recognize during a decode stage.

Typically, during execution of a transaction, updates to memory are not made globally visible until the transaction is committed. As an example, a transactional write to a location is potentially visible to the local thread; yet, in response to a read from another thread, the write data is not forwarded until the transaction including the transactional write is committed. While the transaction is still pending, data items/elements loaded from and written to memory are tracked, as discussed in more detail below. Once the transaction reaches a commit point, if no conflicts have been detected for the transaction, then the transaction is committed and the updates made during the transaction are made globally visible.

However, if the transaction is invalidated during its pendency, the transaction is aborted and potentially restarted without making the updates globally visible. As a result, pendency of a transaction, as used herein, refers to a transaction that has begun execution and has not been committed or aborted, i.e., is pending.
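
The commit and abort behavior described above can be modeled in a few lines of software. The sketch below is illustrative only and makes several assumptions that are not taken from this description: the names `Memory`, `Transaction`, and `ConflictError` are hypothetical, conflicts are detected by comparing per-location version numbers at commit time, and the model is single-threaded (a real system would have to make the validate-and-publish step atomic).

```python
class ConflictError(Exception):
    """Raised when a location read by the transaction changed before commit."""


class Memory:
    """Shared memory model: each address holds a (value, version) pair."""

    def __init__(self):
        self.cells = {}

    def read(self, addr):
        return self.cells.get(addr, (0, 0))


class Transaction:
    def __init__(self, memory):
        self.memory = memory
        self.read_versions = {}  # addr -> version observed (tracked loads)
        self.write_buffer = {}   # addr -> tentative value, not globally visible

    def load(self, addr):
        if addr in self.write_buffer:   # the local thread sees its own writes
            return self.write_buffer[addr]
        value, version = self.memory.read(addr)
        self.read_versions.setdefault(addr, version)
        return value

    def store(self, addr, value):
        self.write_buffer[addr] = value  # buffered until commit

    def commit(self):
        # Validate: every tracked location must be unchanged since it was read.
        for addr, seen in self.read_versions.items():
            if self.memory.read(addr)[1] != seen:
                self.abort()
                raise ConflictError(addr)
        # Publish: only now do the buffered updates become globally visible.
        for addr, value in self.write_buffer.items():
            _, version = self.memory.read(addr)
            self.memory.cells[addr] = (value, version + 1)
        self.write_buffer.clear()
        self.read_versions.clear()

    def abort(self):
        # Discard tentative updates; none of them was ever globally visible.
        self.write_buffer.clear()
        self.read_versions.clear()
```

In this model a transaction is pending from its first `load` or `store` until `commit` or `abort` returns; a caller that catches `ConflictError` may simply construct a new `Transaction` and retry.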

A software transactional memory (STM) system often refers to performing access tracking, conflict resolution, or other transactional memory tasks in, or at least partially in, software. In one embodiment, processor 100 is capable of executing a compiler to compile program code to support transactional execution. Here, the compiler may insert operations, calls, functions, and other code to enable execution of transactions.

A compiler often includes a program or set of programs to translate source text/code into target text/code. Usually, compilation of program/application code with a compiler is done in multiple phases and passes to transform high-level programming language code into low-level machine or assembly language code. Yet single-pass compilers may still be utilized for simple compilation. A compiler may utilize any known compilation techniques and perform any known compiler operations, such as lexical analysis, preprocessing, parsing, semantic analysis, code generation, code transformation, and code optimization.

Larger compilers often include multiple phases, but most often these phases fall within two general phases: (1) a front end, i.e., generally where syntactic processing, semantic processing, and some transformation/optimization may take place, and (2) a back end, i.e., generally where analysis, transformations, optimizations, and code generation take place. Some compilers refer to a middle end, which illustrates the blurring of delineation between a front end and back end of a compiler. As a result, reference to insertion, association, generation, or other operation of a compiler may take place in any of the aforementioned phases or passes, as well as in any other known phases or passes of a compiler. As an illustrative example, a compiler potentially inserts transactional operations, calls, functions, etc., in one or more phases of compilation, such as insertion of calls/operations in a front-end phase of compilation and then transformation of the calls/operations into lower-level code during a transactional memory transformation phase.

Nevertheless, despite the execution environment and the dynamic or static nature of a compiler, the compiler, in one embodiment, compiles program code to enable transactional execution. Therefore, reference to execution of program code, in one embodiment, refers to (1) execution of a compiler program, either dynamically or statically, to compile main program code, to maintain transactional structures, or to perform other transaction-related operations, (2) execution of main program code including transactional operations/calls, (3) execution of other program code, such as libraries, associated with the main program code, or (4) a combination thereof.

Often within software transactional memory (STM) systems, a compiler is utilized to insert some operations, calls, and other code inline with the application code to be compiled, while other operations, calls, functions, and code are provided separately within libraries. This potentially gives library distributors the ability to optimize and update the libraries without having to recompile the application code. As a specific example, a call to a commit function may be inserted inline within application code at a commit point of a transaction, while the commit function itself is separately provided in an updateable library. Additionally, the choice of where to place specific operations and calls potentially affects the efficiency of application code. For example, if a filter operation, which is discussed in more detail with respect to access barriers in reference to FIG. 6, is inserted inline with code, the filter operation may be performed before vectoring execution to a barrier, instead of inefficiently vectoring to the barrier and then performing the filter operation.

In one embodiment, processor 100 is capable of executing transactions utilizing hardware/logic, i.e., within a hardware transactional memory (HTM) system. Numerous specific implementation details exist, both from an architectural and a microarchitectural perspective, when implementing an HTM; most of them are not discussed herein to avoid unnecessarily obscuring the invention. However, some structures and implementations are disclosed for illustrative purposes. Note that these structures and implementations are not required and may be augmented and/or replaced with other structures having different implementation details.

As a combination, processor 100 may be capable of executing transactions within an unbounded transactional memory (UTM) system, which attempts to take advantage of the benefits of both STM and HTM systems. For example, an HTM is often fast and efficient for executing small transactions, because it does not rely on software to perform all of the access tracking, conflict detection, validation, and commit for transactions. However, HTMs are usually only able to handle smaller transactions, while STMs are able to handle transactions of unbounded size. Therefore, in one embodiment, a UTM system utilizes hardware to execute smaller transactions and software to execute transactions that are too big for the hardware. As can be seen from the discussion below, even when software is handling transactions, hardware may be utilized to assist and accelerate the software. Furthermore, it is important to note that the same hardware may also be utilized to support and accelerate a pure STM system.
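The division of labor described above can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions: `HW_CAPACITY`, `CapacityExceeded`, `run_in_hardware`, `run_in_software`, and `run_transaction` are hypothetical names, and the capacity check stands in for whatever limit a real hardware implementation would have.

```python
# Toy model of a UTM policy: try hardware first, fall back to software.
# All names and the capacity value are hypothetical, for illustration only.

HW_CAPACITY = 4  # assumed number of distinct locations the toy "hardware" can track


class CapacityExceeded(Exception):
    """Raised when a transaction touches more locations than the hardware tracks."""


def run_in_hardware(accesses):
    # The toy hardware only handles transactions that fit its tracking capacity.
    if len(set(accesses)) > HW_CAPACITY:
        raise CapacityExceeded
    return "hardware"


def run_in_software(accesses):
    # Software can track a transaction of any size, at higher cost.
    return "software"


def run_transaction(accesses):
    """Execute small transactions in hardware, larger ones in software."""
    try:
        return run_in_hardware(accesses)
    except CapacityExceeded:
        return run_in_software(accesses)
```

In this sketch a transaction touching three locations would be reported as executed by the toy hardware, while one touching ten locations would fall back to the software path.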

As stated above, transactions include transactional memory accesses to data items, both by local processing elements within processor 100 and potentially by other processing elements. Without safety mechanisms in a transactional memory system, some of these accesses would potentially result in invalid data and execution, i.e., a write to data invalidating a read, or a read of invalid data. As a result, processor 100 potentially includes logic to track or monitor memory accesses to and from data items for identification of potential conflicts, such as the read monitors and write monitors discussed below.

A data item or data element may include data at any granularity level, as defined by hardware, software, or a combination thereof. A non-exhaustive list of examples of data, data elements, data items, or references thereto includes a memory address, a data object, a class, a field of a type of dynamic language code, a type of dynamic language code, a variable, an operand, a data structure, and an indirect reference to a memory address. However, any known grouping of data may be referred to as a data element or data item. A few of the examples above, such as a field of a type of dynamic language code and a type of dynamic language code, refer to data structures of dynamic language code. To illustrate, dynamic language code, such as Java™ from Sun Microsystems, Inc., is a strongly typed language. Each variable has a type that is known at compile time. The types are divided into two categories: primitive types (boolean and numeric, e.g., int, float) and reference types (classes, interfaces, and arrays). The values of reference types are references to objects. In Java™, an object, which consists of fields, may be a class instance or an array. Given object a of class A, it is customary to use the notation A::x to refer to the field x of type A and a.x to refer to the field x of object a of class A. For example, an expression may be written as a.x = a.y + a.z. Here, field y and field z are loaded to be added, and the result is written to field x.

Therefore, monitoring/buffering memory accesses to data items may be performed at any data-level granularity. For example, in one embodiment, memory accesses to data are monitored at a type level. Here, a transactional write to field A::x and a non-transactional load of field A::y may be monitored as accesses to the same data item, i.e., type A. In another embodiment, memory access monitoring/buffering is performed at field-level granularity. Here, a transactional write to A::x and a non-transactional load of A::y are not monitored as accesses to the same data item, because they are references to separate fields. Note that other data structures or programming techniques may be taken into account in tracking memory accesses to data items. As an example, assume that fields x and y of an object of class A, i.e., A::x and A::y, point to objects of class B, are initialized to newly allocated objects, and are never written to after initialization. In one embodiment, a transactional write to field B::z of the object pointed to by A::x is not monitored as a memory access to the same data item as a non-transactional load of field B::z of the object pointed to by A::y. Extrapolating from these examples, it is possible to determine that monitors may perform monitoring/buffering at any data granularity level.
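The difference between type-level and field-level monitoring can be made concrete with a short Python sketch. The functions `data_item_key` and `same_data_item` and the granularity labels are hypothetical and serve only to illustrate how the choice of granularity decides whether two accesses count as touching the same data item.

```python
# Illustrative only: map an access (type name, field name) to the key under
# which it would be monitored, for two assumed granularity settings.

def data_item_key(type_name, field_name, granularity):
    if granularity == "type":
        # Every field of a type is monitored as one data item.
        return (type_name,)
    if granularity == "field":
        # Each field of a type is monitored as its own data item.
        return (type_name, field_name)
    raise ValueError(f"unknown granularity: {granularity}")


def same_data_item(access_a, access_b, granularity):
    """Return True if both accesses are monitored as the same data item."""
    return (data_item_key(*access_a, granularity)
            == data_item_key(*access_b, granularity))
```

With these definitions, a write to A::x and a load of A::y map to the same key at type granularity and to different keys at field granularity, which mirrors the two embodiments described above.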

In one embodiment, processor 100 includes monitors to detect or track accesses, and potential subsequent conflicts, associated with data items. As one example, hardware of processor 100 includes read monitors and write monitors to track loads and stores that are determined to be monitored. As an example, hardware read monitors and write monitors are to monitor data items at the granularity of the data items, regardless of the granularity of the underlying storage structures. In one embodiment, a data item is bounded by tracking mechanisms associated at the granularity of the storage structures, to ensure that at least the entire data item is monitored appropriately.

As a specific illustrative example, read and write monitors include attributes associated with cache locations, such as locations within lower-level data cache 150, to monitor loads from and stores to addresses associated with those locations. Here, a read attribute for a cache location of data cache 150 is set upon a read event to an address associated with the cache location, to monitor for potentially conflicting writes to the same address. In this case, write attributes operate in a similar manner for write events, to monitor for potentially conflicting reads and writes to the same address. To further this example, hardware is capable of detecting conflicts based on snoops for reads and writes to cache locations with read and/or write attributes set to indicate that the cache locations are monitored. Inversely, setting read and write monitors, or updating a cache location to a buffered state, in one embodiment, results in snoops, such as read requests or read-for-ownership requests, which allow conflicts with addresses monitored in other caches to be detected.

Therefore, based on the design, different combinations of cache coherency requests and monitored coherency states of cache lines result in potential conflicts, such as a cache line holding a data item in a shared, read-monitored state and a snoop indicating a write request to the data item. Inversely, a cache line holding a data item in a buffered write state and an external snoop indicating a read request to the data item may be considered potentially conflicting. In one embodiment, to detect such combinations of access requests and attribute states, snoop logic is coupled to conflict detection/reporting logic, such as monitors and/or logic for conflict detection/reporting, as well as status registers to report the conflicts.
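The combinations of line state and snoop type described above can be summarized in a small Python sketch. The `CacheLine` class and `snoop_conflicts` function are hypothetical; the rule that an external write to a buffered line is a conflict is an assumption of this sketch, since the text only names the buffered-line and external-read case explicitly.

```python
from dataclasses import dataclass


@dataclass
class CacheLine:
    """Toy cache line carrying monitor attributes (illustrative names)."""
    read_monitor: bool = False
    write_monitor: bool = False
    buffered: bool = False


def snoop_conflicts(line, snoop):
    """Return True if an external snoop conflicts with the line's state.

    snoop is "read" for an external read request, or "write" for an external
    write / read-for-ownership request.
    """
    if snoop == "write":
        # An external write conflicts with a read-monitored or write-monitored
        # line; treating a buffered line the same way is an assumption here.
        return line.read_monitor or line.write_monitor or line.buffered
    if snoop == "read":
        # An external read conflicts only with locally written data.
        return line.write_monitor or line.buffered
    raise ValueError(f"unknown snoop type: {snoop}")
```

In a fuller model, a True result would be recorded in a status register so that software could later poll for the conflict.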

However, any combination of conditions and scenarios may be considered invalidating for a transaction, which may be defined by an instruction, such as the commit instruction discussed in more detail below with reference to FIGS. 11-12. Examples of factors that may be considered for non-commit of a transaction include detecting a conflict to a transactionally accessed memory location, losing monitor information, losing buffered data, losing metadata associated with a transactionally accessed data item, and detecting another invalidating event, such as an interrupt, a ring transition, or an explicit user instruction.

In one embodiment, hardware of processor 100 is to hold transactional updates in a buffered manner. As stated above, transactional writes are not made globally visible until commit of a transaction. However, a local software thread associated with the transactional writes is capable of accessing the transactional updates for subsequent transactional accesses. As a first example, a separate buffer structure is provided in processor 100 to hold the buffered updates, which is capable of providing the updates to the local thread and not to other external threads. Yet, the inclusion of a separate buffer structure is potentially expensive and complex.

In contrast, as another example, a cache memory, such as data cache 150, is utilized to buffer the updates while providing the same transactional functionality. Here, cache 150 is capable of holding data items in a buffered coherency state; in one case, a new buffered coherency state is added to a cache coherency protocol, such as the Modified Exclusive Shared Invalid (MESI) protocol, to form a MESIB protocol. In response to local requests for a buffered data item, i.e., a data item held in the buffered coherency state, cache 150 provides the data item to the local processing element to ensure internal transactional sequential ordering. However, in response to external access requests, a miss response is provided to ensure that the transactionally updated data item is not made globally visible until commit. Furthermore, when a line of cache 150 is held in the buffered coherency state and selected for eviction, the buffered update is not written back to higher-level cache memories; the buffered update is not proliferated through the memory system, i.e., not made globally visible, until after commit. Upon commit, the buffered lines are transitioned to a modified state to make the data item globally visible.
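The behavior of the buffered state can be sketched in Python as follows. This is a toy software model, not a description of the actual cache hardware: the class `BufferedCache`, its method names, and the use of a dictionary as the "globally visible" memory are all assumptions made for illustration.

```python
class BufferedCache:
    """Toy cache with a buffered (B) state added to MESI-like states."""

    def __init__(self, memory):
        self.memory = memory  # dict: address -> globally visible value
        self.lines = {}       # dict: address -> (state, value)

    def tx_store(self, addr, value):
        # A transactional write is held in the buffered state.
        self.lines[addr] = ("B", value)

    def local_load(self, addr):
        # Local requests see the buffered value.
        if addr in self.lines:
            return self.lines[addr][1]
        return self.memory[addr]

    def external_load(self, addr):
        # External requests miss on buffered lines and see the old value.
        line = self.lines.get(addr)
        if line is None or line[0] == "B":
            return self.memory[addr]
        return line[1]

    def evict(self, addr):
        state, value = self.lines.pop(addr)
        if state == "M":
            self.memory[addr] = value  # modified data is written back
        # A buffered line is dropped without write-back.

    def commit(self):
        # Buffered lines become modified, hence globally visible.
        for addr, (state, value) in list(self.lines.items()):
            if state == "B":
                self.lines[addr] = ("M", value)
```

In this model, a value stored transactionally is visible to `local_load` immediately, is hidden from `external_load` until `commit` is called, and is lost if the buffered line is evicted before commit.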

Note that the terms internal and external are often relative to the perspective of a thread associated with execution of a transaction, or of the processing elements that share a cache. For example, a first processing element executing a software thread associated with execution of a transaction is referred to as a local thread. Therefore, in the discussion above, if a store to, or a load from, an address previously written by the first thread is received, where that earlier write resulted in the cache line for the address being held in a buffered coherency state, then the buffered version of the cache line is provided to the first thread, since it is the local thread. In contrast, a second thread may be executing on another processing element within the same processor but is not associated with execution of the transaction responsible for the cache line being held in the buffered state, i.e., it is an external thread; therefore, a load or store from the second thread to the address misses the buffered version of the cache line, and normal cache replacement is utilized to retrieve the unbuffered version of the cache line from higher-level memory.

Here, the internal/local and external/remote threads are executed on the same processor and, in some embodiments, may be executed on separate processing elements within the same core of a processor sharing access to the cache. However, the use of these terms is not so limited. As stated above, local may refer to multiple threads sharing access to a cache, instead of being specific to a single thread associated with execution of the transaction, while external or remote may refer to threads not sharing access to the cache.

As stated above in the initial reference to FIG. 1, the architecture of processor 100 is purely illustrative for purposes of discussion. Similarly, the specific examples of translating data addresses for referencing metadata are also exemplary, as any method of associating data with metadata in separate entries of the same memory may be utilized.

Abstract address space for metadata

Metadata

Turning to FIG. 2, an embodiment of holding metadata for a data item in a processor is illustrated. As depicted, metadata 217 for data item 216 is held locally in memory 215. Metadata includes any property or attribute associated with data item 216, such as transactional information relating to data item 216. Some illustrative examples of metadata are included below; yet, the disclosed examples are purely illustrative and do not form an exhaustive list. In addition, metadata location 217 may hold any combination of the examples discussed below and other attributes for data item 216 that are not specifically discussed.

As a first example, metadata 217 includes a reference to a backup or buffer location for transactionally written data item 216, if data item 216 has been previously accessed, buffered, and/or backed up within a transaction. Here, in some implementations, a backup copy of a previous version of data item 216 is held in a different location, and as a result, metadata 217 includes an address, or other reference, to the backup location. Alternatively, metadata 217 itself may act as a backup or buffer location for data item 216.

As another example, metadata 217 includes a filter value to accelerate repeat transactional accesses to data item 216. Often, during execution of a transaction utilizing software, access barriers are performed at transactional memory accesses to ensure consistency and data validity. For example, before a transactional load operation, a read barrier is executed to perform read barrier operations, such as testing whether data item 216 is unlocked, determining whether the current read set of the transaction is still valid, updating a filter value, and logging version values in the read set for the transaction to enable later validation. However, if a read of that location has already been performed during execution of the transaction, then the same read barrier operations are potentially unnecessary.

As a result, one solution includes utilizing a read filter to hold a first default value, to indicate that data item 216, or the address therefor, has not been read during execution of the transaction, and a second accessed value, to indicate that data item 216, or the address therefor, has already been accessed during the pendency of the transaction. Essentially, the second accessed value indicates whether the read barrier should be accelerated. In this instance, if a transactional load operation is received and the read filter value in metadata location 217 indicates that data item 216 has already been read, then, in one embodiment, the read barrier is elided, i.e., not executed, to accelerate the transactional execution by not performing unnecessary, redundant read barrier operations. Note that a write filter value may operate in the same manner with regard to write operations. However, individual filter values are purely illustrative, as, in one embodiment, a single filter value is utilized to indicate whether an address has already been accessed, whether written or read. Here, metadata access operations that check metadata 217 for data item 216 for both loads and stores utilize the single filter value, in contrast to the examples above where metadata 217 includes a separate read filter value and write filter value. As a specific illustrative embodiment, four bits of metadata 217 are allocated to a read filter to indicate whether a read barrier is to be accelerated for the associated data item, a write filter to indicate whether a write barrier is to be accelerated for the associated data item, an undo filter to indicate that undo operations are to be accelerated, and a miscellaneous filter to be utilized in any manner by software as a filter value.
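The read-filter idea can be sketched in Python as follows. The bit assignments for the four filters, the `Transaction` class, and its methods are assumptions of this sketch; the text only states that four bits of metadata are allocated to read, write, undo, and miscellaneous filters, not how they are ordered or accessed.

```python
# Assumed bit positions for the four filter bits (illustrative only).
READ_FILTER = 0b0001
WRITE_FILTER = 0b0010
UNDO_FILTER = 0b0100
MISC_FILTER = 0b1000


class Transaction:
    """Toy software transaction that elides repeated read barriers."""

    def __init__(self):
        self.metadata = {}     # address -> 4-bit filter value (default 0)
        self.read_set = []     # addresses logged by the read barrier
        self.barriers_run = 0  # counts full read-barrier executions

    def read_barrier(self, addr):
        # Stand-in for lock test, validation, and read-set logging.
        self.barriers_run += 1
        self.read_set.append(addr)
        # Mark the address as already read in this transaction.
        self.metadata[addr] = self.metadata.get(addr, 0) | READ_FILTER

    def tx_load(self, addr, memory):
        # Run the barrier only if the read filter bit is not yet set.
        if not self.metadata.get(addr, 0) & READ_FILTER:
            self.read_barrier(addr)
        return memory[addr]
```

In this sketch, loading the same address twice within one transaction runs the barrier once; the second load finds the read filter bit set and skips it.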

A few other examples of metadata include an indication of, representation of, or reference to an address for a handler, either generic or specific to a transaction associated with data item 216; an irrevocable/obstinate nature of a transaction associated with data item 216; a loss of data item 216; a loss of monitoring information for data item 216; a conflict being detected for data item 216; an address of a read set, or of a read entry within a read set, associated with data item 216; a previously logged version of data item 216; a current version of data item 216; a lock for allowing access to data item 216; a version value for data item 216; a transaction descriptor for the transaction associated with data item 216; and other known transaction-related descriptive information. Furthermore, as described above, use of metadata is not limited to transactional information. As a corollary, metadata 217 may also include information, properties, attributes, or states associated with data item 216 that are not related to a transaction.

Continuing the discussion of metadata examples, the hardware monitors and buffered coherency states described above are also considered metadata in some embodiments. The monitors indicate whether a location is to be monitored for external read requests or external read-for-ownership requests, while the buffered coherency state indicates whether an associated data cache line holding a data item is buffered. Yet, in the examples above, monitors are maintained as attribute bits that are appended to, or otherwise directly associated with, cache lines, while the buffered coherency state is added to the cache line coherency state bits. As a result, in that case, hardware monitors and buffered coherency states are part of the cache line structure, not held in a separate abstract address space like the illustrated metadata 217. However, in other embodiments, monitors may be held as metadata 217 in a memory location separate from data item 216, and similarly, metadata 217 may include a reference to indicate that data item 216 is a buffered data item. Alternatively, instead of an update-in-place architecture, where data item 216 is updated and held in a buffered state as described above, metadata 217 may hold the buffered data item, while the globally visible version of data item 216 is maintained in its original location. Here, upon commit, the buffered update held in metadata 217 replaces data item 216.

Lossy metadata

Similar to the discussion above with reference to buffered cache coherency states, metadata 217, in one embodiment, is lossy, i.e., local information that is not provided to external requests outside the domain of memory 215. Assuming, in one embodiment, that memory 215 is a shared cache memory, a miss in response to a metadata access operation is not serviced outside the domain of cache memory 215. Essentially, since lossy metadata 217 is only held locally within the cache domain and does not exist as persistent data throughout the memory subsystem, there is no reason to forward the miss externally to be serviced from higher-level memory. As a result, misses to lossy metadata are potentially serviced quickly and efficiently; an immediate allocation of memory within the processor may be made without waiting for an external request for the metadata to be generated or serviced.
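The notion of lossy, locally serviced metadata can be illustrated with a toy Python model. The class `LossyMetadataStore`, its fixed capacity, the default value of 0, and the oldest-first eviction order are all assumptions of this sketch rather than details taken from the description.

```python
from collections import OrderedDict


class LossyMetadataStore:
    """Toy local store whose entries may be silently lost on eviction."""

    def __init__(self, capacity, default=0):
        self.capacity = capacity
        self.default = default
        self.entries = OrderedDict()  # metadata address -> value

    def store(self, addr, value):
        self.entries[addr] = value
        self.entries.move_to_end(addr)
        # When full, drop the oldest entry; nothing is written back anywhere.
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)

    def load(self, addr):
        # A miss is serviced locally: allocate an entry holding the default
        # value at once instead of forwarding a request to higher-level memory.
        if addr not in self.entries:
            self.store(addr, self.default)
        return self.entries[addr]
```

Because a lost entry reads back as the default value, software using such a store would need to treat the default as "no information", for example by re-running a barrier.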

Abstract address space

As the illustrated embodiment depicts, metadata 217 is held in a memory location, i.e., at a distinct address, separate from data item 216, which results in a separate abstract address space for metadata. This abstract address space is orthogonal to the data address space, i.e., a metadata access operation to the abstract address space does not hit or modify a physical data entry. However, in an embodiment where metadata is held in the same memory, such as memory 215, the abstract address space potentially affects the data address space through competition for allocation within memory 215. As an example, data item 216 is cached in one entry of memory 215, while metadata 217 for data item 216 is held in another entry of the cache. Here, a subsequent metadata operation may result in the memory location for data item 216 being selected for eviction and replacement with metadata for a different data item. As a result, operations associated with the address of metadata 217 do not hit data item 216; however, a metadata address for a metadata element may replace physical data, such as data item 216, within memory 215.

Even though, in this example, metadata potentially competes with data for space in the cache memory, the ability to hold metadata locally can yield efficient support for metadata without the expensive cost of proliferating persistent metadata throughout the memory hierarchy. As implied by the assumption of this example, the metadata is held in the same memory, memory 215; however, in an alternative embodiment, metadata 217 for, or associated with, data item 216 is held in a separate memory structure. Here, the addresses for metadata and data may be the same, while the abstract portion of the metadata address indexes into the separate metadata storage structure instead of the data storage structure.

With a one-to-one ratio of metadata to data, the abstract address space shadows the data address space but, as discussed above, remains orthogonal to it. In contrast, as discussed below, metadata may be compressed relative to physical data. In that case, the abstract address space for the metadata no longer matches the data address space in size, but it still remains orthogonal.

Abstract address translation

Continuing the discussion of the abstract address space, any method of translating a data address, such as the address of data item 216 in the data address space, to an abstract address, such as the metadata address of metadata 217 in the abstract address space, may be used. In one embodiment, abstract translation logic 210 is used to translate an address, such as data address 200, to a metadata address. Address 200, as depicted, includes an address that is associated with, or references, data item 216. Normal data translation, such as translation between a virtual address and a linear or physical address, may be used to index to data item 216 within memory 215. In addition, associating metadata 217 with data item 216 involves a similar translation of address 200, which references data item 216, to another, distinct address that references metadata 217. Therefore, translating address 200 to a data address using data translation logic 205 and to a distinct abstract address using abstract translation 210 results in separate accesses that do not interfere with each other, which creates the orthogonal nature of the two address spaces. As discussed in more detail below, in one embodiment the use of data translation 205 or abstract translation 210 is based on the type of operation that accesses address 200: a normal data access to data item 216 uses data translation 205, while a metadata access operation to metadata 217 uses abstract translation 210. The type of operation may be identified through a portion of the operation code (opcode) of the instruction/operation.

In another embodiment, an instruction identified by its opcode may access both the data and the metadata for a given metadata address, and thereby perform a compound operation such as a conditional store to data based on metadata. As an example, an instruction is decoded into a test-and-set metadata operation, which tests the metadata and sets it to a value, as well as an additional operation that sets the data to a value if the test of the metadata succeeds. As another example, a data item may be moved to a matching metadata address based on a read of the data from data memory.

Examples of translating data address 200 to a metadata address for metadata 217 are included immediately below. As a first example, translating a data address to a metadata address includes using a physical address or a virtual address (after normal data translation 205) and, in addition, adding an abstract value by means of abstract translation logic 210 to separate data addresses from metadata addresses. Where a virtual address is used without translation, abstract translation logic 210 includes logic to associate the virtual address with an abstract value. However, where normal virtual-to-physical address translation is used, normal data translation 205 obtains a translated address from address 200, and abstract translation logic 210 includes logic to associate that translated address with an abstract value to form a metadata address. As another example, data address 200 may be translated using separate translation structures, tables, and/or logic within abstract translation 210 to obtain a distinct metadata address. Here, abstract translation logic 210 may mirror data translation logic 205 or include separate logic (the logic that associates an abstract value with address 200), but abstract translation logic 210 includes page table information to translate address 200 to another, distinct metadata address. Whether the metadata address is obtained by adding information, extending with appended information, replacing information, or translating the data address, the resulting distinct metadata address is associated with the data item through the addition, extension, replacement, or translation algorithm, while remaining orthogonal, so that a metadata access does not incorrectly update or read the data item.

A few specific illustrative examples of translating a data address to a metadata address, or in other words determining a metadata address from or based on a data address, are described here: (1) translating a first data address to a second data address using normal virtual-to-physical address translation, and adding an abstract value to, or including it in, the second data address to form a metadata address; (2) not performing a virtual-to-physical translation of the data address, and adding an abstract value to, or including it in, the data address to form a metadata address; (3) translating the data address to a translated metadata address using abstract translation table logic, which may, but need not, include adding, appending, or including an abstract value in the translated metadata address to form a metadata address. Furthermore, any of the translation methods above may incorporate, that is, be based on, a compression ratio of data to metadata, so that metadata is stored separately for each compression ratio.
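
Methods (1) and (2) above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `ABSTRACT_BIT`, `PAGE_TABLE`, `data_translate`, and `metadata_address`, the 4 KiB page size, and the choice of a single high-order tag bit as the "abstract value" are all hypothetical and are not taken from the patent.

```python
# Hypothetical "abstract value": one high-order bit that tags metadata addresses.
ABSTRACT_BIT = 1 << 63

PAGE_SHIFT = 12                 # assumed 4 KiB pages
PAGE_TABLE = {0x1: 0x80}        # toy mapping: virtual page -> physical page


def data_translate(vaddr):
    """Normal virtual-to-physical translation (stand-in for logic 205)."""
    vpage = vaddr >> PAGE_SHIFT
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    return (PAGE_TABLE[vpage] << PAGE_SHIFT) | offset


def metadata_address(vaddr, use_data_translation=True):
    """Form a metadata address by including the abstract value.

    Method (1): translate first, then tag the translated address.
    Method (2): skip translation and tag the address as given.
    """
    base = data_translate(vaddr) if use_data_translation else vaddr
    return base | ABSTRACT_BIT


# The data address and its metadata address never coincide, so a metadata
# access cannot hit the data entry.
print(hex(data_translate(0x1234)), hex(metadata_address(0x1234)))
```

Because the tag bit is set only on metadata addresses, the two address spaces stay disjoint in this sketch regardless of which of the two methods is used.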

Here, an address may be modified for translation and/or compression by ignoring specific bits of the address, removing specific bits of the address, changing the range of address bits used to select different granularities of data, translating specific bits, adding specific bits of metadata-related information, or replacing specific bits with such information. Compression is discussed in more detail below with reference to FIG. 4.

Multiple abstract address spaces

Turning to FIG. 3, an embodiment that supports multiple abstract address spaces is illustrated. In one embodiment, each processing element is associated with an abstract address space, so that each processing element is able to maintain independent metadata. Four processing elements 301-304 are depicted. As discussed above, a processing element may be any of the elements described above with reference to FIG. 1. As a first example, the processing elements are cores of a processor. However, as an illustrative example to be discussed further below, processing elements 301-304 will be discussed as hardware threads (threads) within a processor; each hardware thread executes a software thread and potentially multiple software subsystems.

Therefore, it is potentially advantageous to allow individual threads among threads 301-304 to maintain separate metadata. In one embodiment, abstract translation logic 310 associates accesses from the different threads 301-304 with their appropriate abstract address spaces. As an example, a thread identifier (ID), used in conjunction with the address referenced by a metadata access operation, indexes into the correct abstract address space.

To illustrate, assume that a metadata access operation associated with thread 302 and referencing data address 300 of data item 316 is received. Any translation method, as described above, may be used to translate the data address of data item 316 to a metadata address. However, the translation additionally includes an association with thread ID 302, which may be obtained, for example, from a control register of thread 302 or from the opcode of an instruction received from thread 302. The association may include appending thread ID 302 to the address, replacing bits in the address, or any other known method of associating a thread ID with an address. As a result, abstract translation logic 310 is able to select, or index into, the abstract address space of processing element 302 that is associated with data item 316.

Extrapolating from the example, by using the thread IDs of threads 301-304 as part of the translation to an abstract address, each processing element 301-304 is able to maintain independent metadata for data item 316. Yet the programmer does not have to manage the abstract address spaces separately, since hardware is able to keep them separate, in a manner transparent to software, through use of the thread ID. Furthermore, the abstract address spaces are orthogonal: a metadata access from one thread does not access the metadata of another thread, because each metadata access is associated with a separate set of addresses that includes a reference to a unique thread ID.

As discussed below, however, instructions/operations that access metadata may support access from one thread to the metadata of another thread. That is, in some implementations, access across PEIDs and/or MDIDs (discussed below) may be advantageous. For example, to determine whether hardware has detected a conflict, to check a metadata monitor from another thread, to determine whether an associated data item is monitored by another thread, to clear the metadata of another thread, or to determine a commit condition, a thread may check, modify, or clear the metadata of another thread associated with data item 316.

Here, a specific opcode for an operation that accesses another thread's metadata is recognized, and as a result abstract translation logic 310 translates address 300 to all of the metadata addresses whose metadata is to be accessed. As a purely illustrative example, four bits are appended to address 300, where each bit represents one of processing elements 301-304; when a metadata access operation, such as a clear operation, is to clear all metadata for data item 316, abstract translation logic 310 sets each of the four bits so that all metadata 317 is accessed. Here, lookup logic for memory 315 may be designed so that a single access with all four bits set accesses all metadata 317, or abstract translation logic 310 may generate four separate accesses, each with a different one of the four thread ID bits set, to access all metadata 317. As an illustrative example, a mask may be applied to an address value to allow one thread to hit the metadata of another thread.
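
The second option above (four separate accesses, one per thread ID bit) might look like the following Python sketch. It is an assumption-laden illustration: `ADDR_BITS`, `metadata_address`, `clear_all_metadata`, and the use of a dictionary as the metadata store are hypothetical, and the one-hot bit layout is just one way to read "four bits appended to the address".

```python
NUM_THREADS = 4    # processing elements 301-304 in the example
ADDR_BITS = 48     # hypothetical width of the data address field


def metadata_address(data_addr, thread_index):
    """Append a one-hot thread bit above the data address bits."""
    return (1 << (ADDR_BITS + thread_index)) | data_addr


def clear_all_metadata(store, data_addr):
    """Clear every thread's metadata for one data item.

    Issues one access per thread ID bit, as in the 'four separate
    accesses' variant; entries that were never allocated are skipped.
    """
    for thread_index in range(NUM_THREADS):
        store.pop(metadata_address(data_addr, thread_index), None)


# Toy metadata store: two threads hold metadata for address 0x40,
# a third thread holds metadata for an unrelated address 0x80.
store = {
    metadata_address(0x40, 0): "read-monitored",
    metadata_address(0x40, 2): "write-monitored",
    metadata_address(0x80, 1): "read-monitored",
}
clear_all_metadata(store, 0x40)
print(len(store))  # only the entry for 0x80 is left
```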

In addition, as illustrated, each processing element 301-304 is associated with multiple abstract address spaces, so that multiple contexts or software subsystems within a single thread may be interleaved across multiple metadata address spaces. For example, in some situations it may be desirable to allow multiple software subsystems within a single processing element to maintain independent sets of metadata. Therefore, in one example, orthogonal metadata address spaces may be provided at several processing element levels, such as the core level, the hardware thread level, and/or the software subsystem level. In the illustration, each processing element 301-304 is associated with two abstract address spaces, each of which is to be associated with a software subsystem that executes on that processing element.

A software subsystem includes any task or code to be executed on a processing element that may use a separate abstract address space. As an illustrative example, four subsystems that may be associated with separate abstract address spaces are a transactional runtime subsystem, a garbage collection runtime subsystem, a memory protection subsystem, and a software translation subsystem, all of which may execute on a single processing element. Here, each software subsystem may have control of the processing element at different times. As another example, a software subsystem includes the individual transactions executing within a single processing element. In fact, it may be desirable for nested transactions executing on the same thread to be associated with separate abstract address spaces. To illustrate, a filter test for a data item access in an outer transaction may fail, so it may be desirable to provide a second, separate filter for the same data item access in an inner nested transaction, which can then proceed independently and accelerate accesses within the inner transaction. Furthermore, each nested transaction (subsystem) is associated with a separate metadata space, so that when a nested transaction aborts, the metadata of the outer transaction can be retained; clearing the metadata of the inner nested transaction does not affect the metadata of the outer transaction. However, a software subsystem is not so limited, as it may be any task or code capable of managing metadata.

In one embodiment, to provide orthogonal abstract address spaces at the software subsystem level, an address is associated with a processing element ID (PEID), as discussed above, and additionally with a metadata ID (MDID) or context ID. Each piece of metadata is therefore uniquely identified with a subsystem within a processing element. Using the example above, assume that processing elements 301-304 are hardware threads and that thread 302 executes an outer transaction and an inner transaction nested within it. For the outer transaction, metadata 317c is associated with data item 316 through abstract translation 310, which translates data address 300 of data item 316 to the address plus a thread ID (TID) and a metadata ID (MDID) for the outer transaction, and the result references metadata 317c.

Purely as an illustrative example, metadata 317c includes four filter values (a read filter value, a write filter value, an undo filter value, and a miscellaneous filter value), a pointer or other reference to a backup location for data item 316, a monitoring value that indicates whether monitors on data item 316 have been lost, a transaction descriptor value, and a version of data item 316. Similarly, the inner transaction is associated with metadata 317d, which includes the same metadata fields for data item 316 as metadata 317c. As above, abstract translation 310 translates data address 300 of data item 316 to an address associated with the thread ID and the metadata ID for the inner transaction, which references metadata 317d.

Here, the only difference between the metadata address that references metadata 317c and the one that references metadata 317d may be the metadata ID for the outer and the inner transaction. This difference in address keeps the address spaces disjoint and orthogonal: an access to metadata from the inner transaction will not affect the metadata of the outer transaction, because the MDID for an access from the inner transaction differs from that of the outer transaction. As mentioned above, this may be desirable for rolling back nested transactions or for holding different metadata values for different levels of transactions. Specifically, if the inner transaction aborts, the backup data for data item 316 held in inner transaction metadata 317d may be cleared, or used to roll data item 316 back to the entry point before the inner transaction, without clearing or affecting the backup data of the outer transaction held in metadata 317c.
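
The effect of distinct MDIDs on a nested abort can be sketched as follows. This is a toy model, not the hardware mechanism: the tuple key `(data_addr, tid, mdid)`, the dictionary store, the backup values, and the helper name `abort_context` are all hypothetical stand-ins for the translated metadata addresses described above.

```python
def metadata_key(data_addr, tid, mdid):
    """Toy stand-in for a metadata address: data address plus TID and MDID."""
    return (data_addr, tid, mdid)


def abort_context(store, tid, mdid):
    """Clear only the metadata that belongs to one (thread, MDID) context."""
    for key in [k for k in store if k[1] == tid and k[2] == mdid]:
        del store[key]


DATA_ADDR = 0x300          # hypothetical address of the data item
TID = 2                    # hypothetical ID for the executing thread
OUTER, INNER = 0, 1        # hypothetical MDIDs for outer/inner transactions

store = {
    metadata_key(DATA_ADDR, TID, OUTER): {"backup": 10},
    metadata_key(DATA_ADDR, TID, INNER): {"backup": 20},
}

# Aborting the inner transaction clears its metadata only; the outer
# transaction's backup value is untouched.
abort_context(store, TID, INNER)
print(store)
```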

Note that the metadata ID (MDID) used to distinguish between the abstract address spaces of software subsystems may be of any size and may come from numerous sources. As an oversimplified illustrative example, with four processing elements (PEs) 301-304, the PEID may come from a combination of two bits: 00, 01, 10, 11. Similarly, if four abstract address spaces are supported for each, a two-bit MDID (00, 01, 10, 11) may likewise distinguish between four subsystems. To illustrate, the value representing processing element 302 and the second subsystem within PE 302 is 0101 (the first two bits are 01 for PE 302, and the second two bits are 01 for the second subsystem). In this example, the abstract translation logic associates this value with data address 300, or a translation of it, to reference PE 302 MDID 01, which includes metadata location 317d.
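
The two-bit PEID and two-bit MDID of this example can be packed into a four-bit identifier as in the following sketch. The function names and the 48-bit address field (`ADDR_BITS`) are assumptions made for illustration; the patent does not prescribe where in the address the identifier is placed.

```python
ADDR_BITS = 48  # hypothetical width of the data address field


def pack_space_id(peid, mdid):
    """Combine a 2-bit PEID and a 2-bit MDID into a 4-bit space identifier."""
    if not (0 <= peid < 4 and 0 <= mdid < 4):
        raise ValueError("PEID and MDID must each fit in two bits")
    return (peid << 2) | mdid


def unpack_space_id(space_id):
    """Recover (peid, mdid) from a 4-bit space identifier."""
    return (space_id >> 2) & 0b11, space_id & 0b11


def metadata_address(data_addr, peid, mdid):
    """Place the space identifier above the data address bits."""
    return (pack_space_id(peid, mdid) << ADDR_BITS) | data_addr


# PE 302 is encoded as 01 and its second subsystem as 01, as in the text.
print(format(pack_space_id(0b01, 0b01), "04b"))  # 0101
```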

However, both thread IDs and MDIDs may be more complex. For example, assume that threads 301-302 share access to memory 315, while threads 303-304 are remote processing elements that do not share access to memory 315. Also assume that threads 301-302 each support two subsystems, for a total of four orthogonal address spaces for threads 301-302: the PE 301 MD0, PE 301 MD1, PE 302 MD0, and PE 302 MD1 address spaces. In this case, the value that combines the thread ID and MDID and is used to obtain a metadata address may come from an opcode, a control register, or a combination of the two. To illustrate, assume that an opcode provides one bit for the context/MDID, a control register provides one bit for the processing element ID (PEID), there being only two processing elements, and a metadata control register, such as MDCR 320, provides four bits that identify a specific software subsystem/context at a finer granularity. Therefore, when a metadata access operation referencing address 300 of data item 316 is received from the second thread (PE 302), a first bit from the opcode, holding a 1 to represent the second context, and a second bit from the control register for processing element 302, holding a 1 to indicate processing element 302, are associated with the MDID from the metadata control register (MDCR) 320 associated with the second thread. The MDCR has previously been updated with the MDID of the subsystem currently controlling the second thread (0010), so as to identify the proper subsystem for the received operation. The abstract translation logic takes the combined value, such as 110010, and further associates it with the referenced data address 300, or a translation of it, to obtain a metadata address. Since the 110010 portion of the metadata address is unique to the subsystem from which the access operation originated, the access will hit or modify only metadata address 317d within memory 315, without hitting or affecting metadata addresses 317a, b, c, e, f, g, h, that is, the orthogonal abstract address spaces of the other subsystems both within the second thread and in other threads.
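
The composition of the value 110010 from its three sources can be reproduced with the following sketch. The function name, argument names, and bit ordering (opcode bit first, PEID bit second, MDCR bits last) are assumptions chosen so that the result matches the value quoted in the example.

```python
def compose_space_id(opcode_context_bit, peid_bit, mdcr_mdid):
    """Concatenate 1 opcode bit, 1 PEID bit, and a 4-bit MDID from the MDCR.

    Assumed layout, most significant bit first:
    [opcode context bit][PEID bit][4-bit MDID from MDCR]
    """
    if opcode_context_bit not in (0, 1) or peid_bit not in (0, 1):
        raise ValueError("opcode and PEID contributions are single bits")
    if not 0 <= mdcr_mdid < 16:
        raise ValueError("MDCR MDID must fit in four bits")
    return (opcode_context_bit << 5) | (peid_bit << 4) | mdcr_mdid


# Second context (1), processing element 302 (1), current subsystem 0010.
space_id = compose_space_id(1, 1, 0b0010)
print(format(space_id, "06b"))  # 110010
```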

As a specific illustrative example, a discussion of one particular form of MDCR is included. In some embodiments, the ISA may be extended with a per-thread metadata identifier register (MDID register), which supplies the MDID to MDID-sensitive metadata load/store/test/set instructions. In some embodiments, it is convenient to have several such registers. For example, MDCR, the metadata control register, is a 32-bit read-write register that contains the current metadata context IDs (MDIDs). It may be updated by CR MOV. A typical bit field definition is as follows:

MDID 0 and MDID 1 are the metadata IDs that can be accessed simultaneously by the instruction set. The number of bits actually used in these fields is MDID_size, which in one embodiment is read-only at any privilege level, as specified by the processor design. However, in other embodiments, various privilege levels may modify the size. There may be no hardware check to ensure that an MDID fits the bit allocation within that size. In one embodiment, MDID 0 and MDID 1 may be written and read at any privilege level. It may also be possible to use special MDID values to designate special metadata spaces that always read as 0 or 1. Software could use this to force all metadata tests in a block to be true or false, in a manner similar to the register for forcing metadata values discussed with reference to FIGS. 6 and 7.

In yet another example, as mentioned above, abstract translation logic 310, in conjunction with decoders (not shown), may recognize metadata access operations from thread 302 that are intended to access metadata in the metadata address space of thread 301, and may permit those specific instructions/operations to read or modify the metadata of thread 301.

Compression of metadata relative to data

A one-to-one mapping of data to metadata (uncompressed metadata) is discussed above; in some circumstances, however, it is more efficient to use a smaller amount of metadata than data, that is, to compress the metadata so that it is smaller than the data it describes. The abstract address translation logic 210 and 310 from FIGS. 2-3 may take compression into account when translating and modifying addresses, so that compressed metadata is referenced accordingly. Referring to FIG. 4, an embodiment of modifying an address to achieve compression of metadata is illustrated; specifically, an embodiment with a data-to-metadata compression ratio of 8 is depicted. Control logic, such as abstract address translation logic 210 and 310 from FIGS. 2-3, receives data address 400 referenced by a metadata access operation. As an example, compression includes shifting or removing log2(N) bits within or from address 400, where N is the compression ratio of data to metadata. In the illustrated example, for a compression ratio of 8, three bits are shifted down and removed to form metadata address 405. In essence, address 400, which contains 64 bits to reference a specific data byte in memory, is truncated by three bits to form metadata byte address 405, which references metadata in memory at byte granularity; within that byte, a bit of metadata is selected using the three bits that were removed from the address to form the metadata byte address.
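The 8:1 address modification described above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the function name `metadata_location` is hypothetical, and the upper bits vacated by the shift are simply left as zeros.

```python
COMPRESSION_RATIO = 8                        # N, data-to-metadata ratio from the text
SHIFT = COMPRESSION_RATIO.bit_length() - 1   # log2(8) = 3 bits


def metadata_location(data_addr):
    """Return (metadata byte address, bit index inside that byte).

    Hypothetical helper: the low three bits are shifted out of the data
    address to form the metadata byte address, and those same three bits
    then select one bit inside the metadata byte.
    """
    md_byte_addr = data_addr >> SHIFT                 # vacated upper bits become zeros
    bit_index = data_addr & (COMPRESSION_RATIO - 1)   # the three removed bits
    return md_byte_addr, bit_index


print(metadata_location(0x1003))
```

Eight consecutive data bytes therefore share one metadata byte, with one metadata bit per data byte.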

In one embodiment, the shifted/removed bits are replaced with other bits. As illustrated, the high-order bits are replaced with zeros after address 400 is shifted. However, the removed/shifted bits may instead be replaced with other data or information, such as a processing element ID, a context identifier (ID), and/or a metadata ID (MDID) associated with the metadata access operation. Although the lowest-numbered bits are removed in this example, bits at any position may be removed and replaced based on any number of factors, such as cache organization, cache circuit timing, locality of metadata to data, and minimizing conflicts between data and metadata.

For example, the data address may not be shifted by log2(N) at all; instead, address bits 0:2 are zeroed. As a result, the bits that are the same in the physical address and the virtual address are not shifted as in the example above, which allows pre-selection of a set and bank using unmodified bits, such as bits 11:3.

Note that the discussion of translation may be combined with compression. That is, a compression ratio may be an input to the abstract address translation logic 210 from FIGS. 2-3, and the translation logic uses the compression ratio together with a PEID, CID, MDID, abstract value, or other information to translate a data address into a metadata address. The metadata address is then used to access the memory holding the metadata. As discussed above, because metadata is a lossy local construct, misses to the memory based on the metadata address can be serviced quickly and efficiently: a memory location is allocated without generating an external miss service request and without waiting for an external request to be serviced. Here, an entry is allocated for the metadata in the normal fashion. For example, an entry, such as entry 217 from FIG. 2, is selected, allocated, and initialized to the metadata default value based on metadata address 405 and a cache replacement algorithm, such as a Least Recently Used (LRU) algorithm. As a result, metadata potentially competes with normal data for space, but remains compressed and disjoint from other software subsystems/processing elements.

Note that a compression ratio of 8 is purely illustrative, and any compression ratio may be used. As another example, a compression ratio of 512:1 is used, so that one bit of metadata represents 64 bytes of data. As above, the data address is translated/modified into a metadata address by shifting the data address down by log2(512) bits, i.e. 9 bits. Here, bits 6:8 rather than bits 0:2 are used to select the bit, which effectively produces the compression through selection at a granularity of 512 bits. Because the data address has been shifted by 9 bits, the high-order portion of the address has 9 open bit positions available to hold information. In one embodiment, those 9 bits hold identifiers, such as a context ID, thread ID, and/or MDID. Abstract space values may also be held in these bits, or the address may be extended by the abstract value. In one embodiment, multiple concurrent compression ratios are supported by hardware.
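As a rough illustration of the 512:1 case, the sketch below shifts a 64-bit data address down by 9 bits, selects the metadata bit with data address bits 6:8, and places an identifier in the 9 vacated high-order bits. The function name, the `ident` parameter, and the choice to store the identifier in the top 9 bits are assumptions made for this example only.

```python
ADDR_BITS = 64   # width of the data address, as in the text
SHIFT = 9        # log2(512)


def metadata_address_512(data_addr, ident=0):
    """Hypothetical 512:1 mapping: one metadata bit per 64 data bytes.

    `ident` stands in for a 9-bit identifier (context ID, thread ID or
    MDID) placed in the high-order bits opened up by the shift.
    """
    if not 0 <= ident < (1 << SHIFT):
        raise ValueError("identifier must fit in 9 bits")
    md_byte_addr = (data_addr >> SHIFT) | (ident << (ADDR_BITS - SHIFT))
    bit_index = (data_addr >> 6) & 0x7   # data address bits 6:8 select the bit
    return md_byte_addr, bit_index


print(metadata_address_512(0x240, ident=3))
```

Two data addresses inside the same 64-byte block map to the same metadata bit, while a different identifier yields a disjoint metadata address.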

Here, a representation of the compression ratio is held as part of the abstract value associated with a data address to obtain the metadata address. As a result, the compression ratio is taken into account during a memory lookup using the data address, so that the lookup does not match addresses of other compression ratios. Software may also rely on hardware not to forward store information to loads of a different compression ratio.

In one embodiment, hardware is implemented using a single compression ratio but includes additional hardware support to present multiple compression ratios to software. As an example, assume cache hardware is implemented using an 8:1 compression ratio, as illustrated in FIG. 4. A metadata access operation that accesses metadata at a different granularity is decoded to include a micro-operation that reads a default amount of metadata and a test micro-operation that tests the appropriate part of the metadata read. As an example, the default amount of metadata read is 32 bits. The test operation for a granularity/compression other than 8:1 then tests the appropriate bits of the 32 bits of metadata read, which may be chosen based on a certain number of bits of an address, such as a number of LSBs of the metadata address, and/or a context ID.

As an illustration, in a scheme that supports metadata for unaligned data with one bit of metadata per byte of data, a single bit is selected from the least significant eight bits of the 32 metadata bits read, based on the three LSBs of the metadata address. For a word of data, two consecutive metadata bits are selected from the least significant 16 bits of the 32 metadata bits read, based on the three LSBs of the address, and so on up to 16 bits for a 128-bit metadata size.
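The bit selection just described might look like the following sketch. It assumes one metadata bit per data byte, a 32-bit default read, and that the three LSBs give the starting bit position; the function name and the `operand_bytes` parameter are hypothetical, and which address supplies the LSBs is left to the caller.

```python
def select_metadata_bits(md_read32, addr, operand_bytes):
    """Pick the metadata bits covering an operand from a 32-bit read.

    Assumed scheme: one metadata bit per data byte, so an operand of
    `operand_bytes` bytes owns that many consecutive bits, starting at
    the position given by the three LSBs of `addr`.
    """
    if operand_bytes not in (1, 2, 4, 8, 16):
        raise ValueError("unsupported operand size")
    start = addr & 0x7                    # three LSBs choose the first bit
    mask = (1 << operand_bytes) - 1       # one bit per byte of the operand
    return (md_read32 >> start) & mask


print(bin(select_metadata_bits(0xA0, 6, 2)))
```

Even for the largest case, a 16-byte operand starting at offset 7, the selected bits end at bit 22, so a single 32-bit read always suffices.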

Metadata Access Instructions/Operations

Turning to FIG. 5, a flow diagram of a method of accessing metadata associated with data is illustrated. Although the flows of FIG. 5 are illustrated in a substantially serial fashion, they may be performed at least partially in parallel and potentially in a different order.

In step 505, a metadata operation referencing a data address for a given data item is encountered. As noted in the discussion above, metadata instructions/operations may be supported in hardware to read, modify, and/or clear metadata. That is, the instructions may be supported in the Instruction Set Architecture (ISA) of a processor, such that decoders of the processor recognize the operation codes (opcodes) of the instructions to access metadata, and logic performs the access accordingly. Note that the term instruction may also refer to an operation. Some processors use the concept of a macro-instruction that can be decoded into multiple micro-operations performing individual tasks; for example, a test-and-set metadata macro-instruction is decoded into a metadata test operation/micro-operation that tests the metadata and a set operation that updates the metadata to a specified value if the test yields the correct Boolean value.

However, metadata access operations are not limited to explicit instructions to access metadata; they may also include implicit micro-operations decoded as part of a larger, more complex instruction that includes an access to a data item associated with the metadata. Here, a data access instruction may be decoded into multiple operations, such as an access to the data and an implicit update of the associated metadata.

As discussed above, in one embodiment the physical mapping of metadata to data in hardware is not directly visible to software. As a result, metadata access operations in this example reference data addresses and rely on the hardware to perform the correct translation, i.e. mapping, to access the metadata appropriately. Metadata access operations may individually reference separate abstract address spaces depending on the thread, context, and/or software subsystem from which they originate. Thus, a memory may hold metadata for data items in a manner that is transparent to software. When the hardware detects an access operation to metadata, either through an explicit operation code (the opcode of an instruction) or through decoding of an instruction into one or more metadata access micro-operations, the hardware performs the requisite translation of the data address referenced by the access operation to access the metadata accordingly.

As this example shows, a program may include separate operations, such as a data access operation and a metadata access operation, that reference the same address of a data item, such as data items 216 and 316 from FIGS. 2-3, and the hardware may map those accesses to different address spaces, such as a physical address space and an abstract address space. In some embodiments, the ISA may be extended with instructions to load/store/test/set metadata for a given virtual address, MDID, compression ratio, and operand width. Any of these parameters may be explicit instruction operands, may be encoded in the opcode, or may be obtained from a separate control register. Instructions may combine a metadata load/store operation with other operations, such as loading some data, testing some of its bits, and setting a condition code for a subsequent conditional jump. Instructions may also flush all metadata, or only the metadata for a specific MDID. A number of illustrative metadata access instructions are listed below. Some of the example instructions are specific to a 64:1 compression ratio, but similar instructions may be used for other compression ratios and for uncompressed metadata even though they are not specifically discussed.

Metadata Bit Test and Set (MDLT)

The metadata load and test instruction (MDLT) has two arguments: the data address with which the metadata is associated, as a source operand, and a register (the destination operand) into which the metadata, of a byte, word, dword, or other size including a bit, is written. The value of the tested metadata bit is written into the register. The programmer should not assume any knowledge of the data stored in the destination register of the MDLT instruction and should not manipulate this register. The register should be used solely as the source operand of a metadata store and set instruction (MDSS) to the same address. In one embodiment, the MDLT instruction combines test and set operations, but squashes the set operation if the test succeeds.

Metadata Store and Set (MDSS)

The metadata store and set instruction (MDSS) has two arguments: the data address with which the metadata is associated, and a register (the source operand) from which the metadata, of a byte, word, dword, or other size including a bit, is stored to memory. The MDSS instruction sets the correct bit to the value from its source operand.

Metadata Store and Reset (MDSR)

The MDSR instruction has two source arguments: the data address with which the metadata is associated, as a source operand, and a register holding the metadata, of a byte, word, dword, or other size including a bit, that is to be reset. The MDSR instruction resets the correct bit according to the value from its source operand.

The metadata address is determined from the referenced data address. Examples of determining the metadata address are included in the abstract address translation and multiple abstract address space sections above. Note, however, that the translation may incorporate, i.e. be based on, the compression ratio of data to metadata, so that metadata is stored separately for each compression ratio.

Test Metadata (CMDT)

The CMDT instruction translates a memory data address into a memory metadata address using an implementation-dependent compressed mapping function, and tests whether the metadata bit corresponding to the memory metadata address is set. As an example, the compression ratio CR is 1 bit per 8 bytes. The metadata address calculation incorporates one of the context IDs from the MDCR register, providing a unique set of MD for each distinct context ID addressing MDBLK[CR][MDCR.MDID[MDID number]].META. The instruction forces alignment by aligning the address 'mem' to the specified data size. The instruction tests whether the metadata is set.

Example pseudo code associated with CMDT is included below (the ZF flag is set to 0 to indicate a zero metadata value; all other flags are cleared):

if (TxCR.FORCE == 0) {
    cr := 64 // only 1 bit per 8 bytes is supported
    mdid := MDCR[MDID_0] // bits 0:14
    ZF := !GetMetaDataBit(addr, cr, mdid)
}

64MDT1 addr

FLAGS := 0
if (TxCR.FORCE == 0) {
    cr := 64 // only 1 bit per 8 bytes is supported
    mdid := MDCR[MDID_1] // bits 15:27
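For readers who want to experiment with the CMDT flow, here is a small software model in Python. It is a sketch, not the hardware definition: the dictionary-based metadata store, the function names, and the `which` selector are assumptions, and ZF is computed as the complement of the metadata bit exactly as the pseudo code line `ZF := !GetMetaDataBit(...)` is written.

```python
MDID0_MASK = 0x7FFF                     # MDCR bits 0:14
MDID1_SHIFT, MDID1_MASK = 15, 0x1FFF    # MDCR bits 15:27


def mdid_from_mdcr(mdcr, which):
    """Extract MDID_0 or MDID_1 from a modelled MDCR register value."""
    if which == 0:
        return mdcr & MDID0_MASK
    return (mdcr >> MDID1_SHIFT) & MDID1_MASK


def cmdt(store, addr, mdcr, which=0, force=0):
    """Model of the CMDT test; returns the resulting ZF value.

    `store` is a hypothetical dict mapping (cr, mdid, block) to a bit;
    missing entries read as the default value 0.
    """
    zf = 0                               # FLAGS := 0
    if force == 0:                       # TxCR.FORCE == 0
        cr = 64                          # one metadata bit per 8 data bytes
        mdid = mdid_from_mdcr(mdcr, which)
        block = addr // (cr // 8)        # 8-byte block containing addr
        zf = int(not store.get((cr, mdid, block), 0))
    return zf


store = {(64, 5, 0x1000 // 8): 1}
mdcr = 5 | (9 << 15)                     # MDID_0 = 5, MDID_1 = 9
print(cmdt(store, 0x1000, mdcr), cmdt(store, 0x1008, mdcr))
```

With the force bit set, the model skips the lookup entirely and leaves ZF at 0, mirroring the guarded block in the pseudo code.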

Compressed Metadata Store (CMDS)

The CMDS instruction translates a memory data address into a memory metadata address using an implementation-dependent compressed mapping function. The compression ratio is 1 bit per 8 bytes of data. The encoding of the imm8 value is as follows: bit 0 is MD_Value, the value to be stored into the MD; bits 7:1 are reserved and not used.

Example pseudo code associated with CMDS is included below:

Write MDBLK[64][MDCR.MDID[MDID number]](addr).META = MD_value

Operation

C64MDS0 addr
    mdid := MDCR[MDID_0] // bits 0:14
    StoreMetadataBit(addr, cr, mdid, imm8[0])

C64MDS1 addr
    mdid := MDCR[MDID_1] // bits 15:27
    StoreMetadataBit(addr, cr, mdid, imm8[0])

Implementation note: the instruction performs a read, set-bit, write operation on the metadata.

Flags affected: None

Protected mode and compatibility mode exceptions

#UD if CR4.OSTM [bit 15] = 0

No #PF

64-bit mode exceptions

#GP(0) if the memory address is in a non-canonical form
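The read, set-bit, write sequence mentioned in the implementation note can be modelled at byte level as follows. This is an assumption-laden sketch: the metadata for one MDID is represented as a plain `bytearray`, the function name is hypothetical, and only bit 0 of imm8 is used, matching the encoding given above.

```python
def cmds_store_bit(md_memory, addr, imm8, cr=64):
    """Store imm8[0] into the metadata bit covering `addr`.

    `md_memory` is a hypothetical bytearray holding the metadata of a
    single MDID; with cr=64, each metadata bit covers 8 data bytes.
    """
    md_value = imm8 & 1                       # imm8 bits 7:1 are reserved
    bit_number = addr // (cr // 8)            # which metadata bit
    byte_index, bit_index = divmod(bit_number, 8)
    old = md_memory[byte_index]               # read
    new = (old & ~(1 << bit_index)) | (md_value << bit_index)   # set or clear the bit
    md_memory[byte_index] = new & 0xFF        # write


md = bytearray(4)
cmds_store_bit(md, 0x18, 1)     # data bytes 0x18..0x1F -> metadata bit 3
cmds_store_bit(md, 0x40, 0xFF)  # only imm8 bit 0 is used -> stores 1
print(md.hex())
```

Because the whole metadata byte is read and rewritten, neighbouring bits in the same byte are preserved.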

Compressed Metadata Clear (CMDCLR)

The CMDCLR instruction resets all MDBLK[CR][MDCR.MDID[MDID number]].META corresponding to any data in the range spanned by MBLK(mem).

Example pseudo code associated with CMDCLR is included below:

Operation

C64MDCLR0
    mblk := floor(addr, MBLK_SIZE)
    mdblkStart := mblk
    mdblkEnd := floor(mblk + MBLK_SIZE - 1, MDBLK_SIZE)
    mdid := MDCR[MDID_0] // bits 0:14
    FOR all mdblk in mblk DO StoreMetadataBit(addr, cr, mdid, 0)

C64MDCLR1
    mblk := floor(addr, MBLK_SIZE)
    mdblkStart := mblk
    mdblkEnd := floor(mblk + MBLK_SIZE - 1, MDBLK_SIZE)
    mdid := MDCR[MDID_1] // MDCR[27:15]

Implementation note: in the first implementation this will be a 1-byte clear for the supported 64:1 CR.

Flags affected: None

#UD if CR4.OSTM [bit 15] = 0

64-bit mode exceptions: #GP(0) if the memory address is in a non-canonical form
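The block-clear loop can be sketched as below. The value `MBLK_SIZE = 64` is an assumption chosen for illustration (the text does not state it); with CR = 64 it yields eight metadata bits, i.e. one metadata byte, per memory block, which is consistent with the 1-byte clear in the implementation note. The store layout and function names are likewise hypothetical.

```python
MBLK_SIZE = 64   # assumed memory block size in bytes (not given in the text)


def floor_to(addr, size):
    """Round addr down to a multiple of size, like floor(addr, size)."""
    return addr - (addr % size)


def cmdclr(store, addr, mdid, cr=64):
    """Reset every metadata bit of (cr, mdid) covering the MBLK holding addr.

    `store` is a hypothetical dict mapping (cr, mdid, block) to a bit;
    removing an entry models resetting it to the default value 0.
    """
    mdblk_size = cr // 8                              # data bytes per metadata bit
    mblk = floor_to(addr, MBLK_SIZE)
    mdblk_start = mblk
    mdblk_end = floor_to(mblk + MBLK_SIZE - 1, mdblk_size)
    for a in range(mdblk_start, mdblk_end + 1, mdblk_size):
        store.pop((cr, mdid, a // mdblk_size), None)


store = {(64, 1, 0x200): 1, (64, 1, 0x207): 1, (64, 1, 0x208): 1, (64, 2, 0x200): 1}
cmdclr(store, 0x1010, mdid=1)
print(sorted(store))
```

Only the metadata of the selected MDID inside the addressed block is cleared; metadata of other MDIDs and of neighbouring blocks is untouched.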

Next, in step 510, a metadata address is determined from the data address referenced by the metadata access operation, based on a compression ratio, processing element ID, context ID, MDID, abstract value, operand size, and/or other value associated with abstract address space translation. Any of the methods described above may be used to obtain the appropriate metadata address, such as combining ID values with no translation of the data address, with normal translation of the data address, or with separate abstract address translation of the data address.

In addition, as described above, in some cases a version of the test, set, clear, or other instructions is provided that allows one thread or metadata context to test, set, or clear the metadata of another thread or metadata context. As a result, translation to a metadata address may include a modification of the address, such as application of a mask, that allows an access from one thread or context ID to reach the metadata of another thread or context ID.

In step 515, the metadata referenced by the metadata address is accessed. In the normal case, the disjoint metadata location associated with the local requesting thread or context ID is accessed, and the appropriate operation, such as a test, set, or clear, is performed. In other cases, however, metadata for another thread or context ID may also be accessed in this step, as described above.

Abstractions

An embodiment of abstractions for software is included here. A given CR is a power of two indicating how many data bits are mapped to one bit of metadata. It is implementation defined which values of CR, if any, may be used. CR > 1 denotes compressed metadata. CR = 1 denotes uncompressed metadata.

MDBLK[CR][*] are ceil(CR/8) bytes in size and naturally aligned. MDBLKs are associated with physical data, not with their linear virtual addresses. All valid physical addresses A with the same value floor(A/MDBLK[CR][*]_SIZE) designate the same sets of MDBLKs.

For a given CR, there may be a number of distinct MDIDs, each designating a unique instance of metadata. The metadata for a given CR and MDID is distinct from the metadata for any other CR or MDID. For example, for Thd #0, assuming addr is QWORD aligned, the metadata block referenced by MDBLK[CR=64][MDID=3](addr) is the same as MDBLK[CR=64][MDID=3](addr+7), but is distinct from MDBLK[CR=64][MDID=4](addr) and from MDBLK[CR=512][MDID=3](addr).
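The identity rules in this example can be checked with a tiny model in which a metadata block is named by the tuple (CR, MDID, block index). The tuple representation and the function name `mdblk_key` are assumptions for illustration; the block size ceil(CR/8) and the floor-based grouping come from the text, and the per-thread dimension (Thd) is omitted for brevity.

```python
import math


def mdblk_key(cr, mdid, addr):
    """Name the metadata block covering addr for a given CR and MDID.

    Hypothetical representation: blocks are ceil(CR/8) bytes in size and
    all addresses with the same floor(addr / size) share one block.
    """
    size = math.ceil(cr / 8)
    return (cr, mdid, addr // size)


addr = 0x1000   # QWORD aligned
print(mdblk_key(64, 3, addr) == mdblk_key(64, 3, addr + 7))
```

Changing either the MDID or the CR produces a different key, which is what keeps the metadata instances disjoint.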

A given implementation may support multiple concurrent contexts, where the number of contexts depends on the CR and on certain configuration information associated with the specific system of which the processor is a part. For uncompressed metadata, there is a QWORD of metadata for each QWORD of physical data.

Metadata is interpreted by software only. Software may set, reset, or test META for a specific MDBLK[CR][MDID], reset META for all of Thd's MDBLK[*][*], or reset META for all of Thd's MDBLK[CR][MDID] that may intersect a given MBLK(addr).

Metadata Loss. Any META property of Thd may spontaneously reset to 0, generating a Metadata Loss event.

Forced Metadata Value

Referring to FIG. 6, an embodiment of providing hardware support for forced metadata values is illustrated. STMs typically use access barriers to ensure consistency between memory access operations. For example, before a memory access to a data item, a metadata location or lock location associated with the data item is checked to determine whether the data item is available. Other potential barrier operations include acquiring a lock, such as a read lock, write lock, or other lock, on the data item in the metadata or lock location; logging/storing a version of the data item in a read or write set for the transaction; determining whether the read set of the transaction is still valid up to that point; buffering or backing up a value of the data item; setting monitors; updating a filter value; and any other transactional operation.

However, subsequent accesses to the same data item within a transaction typically incur the overhead of executing the associated transactional barrier each time the data item is accessed. As an example, suppose three writes to address A are performed within a transaction; in this scenario the write barrier is executed three separate times to acquire a write lock for address A. Because the lock for address A was already acquired by the write barrier executed for the first transactional write, the two subsequent write barrier executions before the last two transactional writes are redundant: the lock for address A does not need to be acquired again.

Therefore, in one embodiment, hardware holds filter values to accelerate the execution associated with these barriers. A filter value may be held in the cache as annotation bits, like read and write monitors, or held in a metadata location in an abstract address space as described above. Using the example above, when the first write barrier is encountered, it updates the write filter value from an unaccessed value to an accessed value to indicate that a write barrier for address A has already been encountered within the transaction. Then, for the two subsequent transactional write operations within the transaction, the write filter value for address A is checked before vectoring to the write barrier. Here, the filter value holds the accessed value, indicating that the write barrier does not need to be executed because it has already been executed within the transaction. As a result, execution is not vectored to the write barrier for the last two write operations. In other words, the filter value accelerates transactional execution: compared with the earlier example without a filter, execution of the write barrier is elided for the last two accesses.
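The three-writes-to-address-A scenario can be mimicked in software to show the effect of a write filter. The class below is purely illustrative: the filter is modelled as a Python set rather than hardware metadata bits, and the barrier only counts its own invocations instead of acquiring a real lock.

```python
class Transaction:
    """Toy model of a transaction whose write barrier is guarded by a filter."""

    def __init__(self):
        self.write_filter = set()   # addresses whose filter holds the accessed value
        self.barrier_runs = 0       # how many times the write barrier executed
        self.memory = {}

    def write_barrier(self, addr):
        # Stand-in for the real barrier work (e.g. acquiring the write lock).
        self.barrier_runs += 1
        self.write_filter.add(addr)   # unaccessed value -> accessed value

    def tx_write(self, addr, value):
        # Test the filter first; vector to the barrier only when it is unset.
        if addr not in self.write_filter:
            self.write_barrier(addr)
        self.memory[addr] = value


tx = Transaction()
for value in (1, 2, 3):
    tx.tx_write(0xA, value)
print(tx.barrier_runs)
```

The barrier runs once for the first write and is skipped for the remaining two, which is the acceleration the filter is meant to provide.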

Read filters for loads/reads, undo filters for undo operations, and miscellaneous filters for generic filter operations may be used in the same manner as the write filter is used for write/store operations.

Also related to transactional barriers is the concept of strong and weak atomicity, which concerns isolation between transactional operations and non-transactional operations. Just as transactional writes to a memory location that was transactionally loaded are potential conflicts, a transactional write to a memory location that was non-transactionally loaded may be a conflict that results in invalid data being used by the non-transactional load operation. In weak atomicity systems, no barriers or only minimal barriers are inserted at non-transactional operations, so weak atomicity systems run the risk of invalid execution. In contrast, in strong atomicity systems, transactional barriers are inserted even at non-transactional operations; this provides protection and isolation between transactional and non-transactional operations, but at the cost of executing a transactional barrier at every non-transactional operation.

Therefore, in one embodiment, the filters described above may be combined with strong-atomicity barriers at non-transactional operations to support multiple modes of strong and weak atomicity. A simplified illustrative embodiment is shown in FIG. 6. Here, metadata 610 is held in hardware for data 605, as discussed above. A metadata access 600 to access metadata 610 is received. In one embodiment, the metadata access includes a test metadata operation to test a filter, such as a read filter, write filter, undo filter, or other filter.

A test metadata operation that tests a filter may originate from a transactional or a non-transactional access operation. In one embodiment, a compiler, when compiling application code, inserts the test filter operation inline at transactional and non-transactional accesses as a condition for executing the call to the transactional barrier. Thus, within a transaction, the filter operation executes before the call to the barrier; if it returns success, the call to the barrier is not executed, which provides the acceleration described above.

For non-transactional operations, in one embodiment, the hardware may operate in a weak atomicity mode, in which transactional barriers at non-transactional operations are not executed, or in a strong atomicity mode, in which those transactional barriers are executed.

The operation mode, or control 625, may be set in a metadata control register (MDCR) 615, which may be combined with the version of the MDCR described above that holds MDIDs or may be a separate control register. In another embodiment, control 625 for the operation mode is held in a general transaction control register or status register. Here, a first execution mode is a strong atomicity mode, in which transactional barriers are to be executed at non-transactional operations. In this case, control 625 holds a first value, such as 00, to indicate the strong atomicity, non-transactional operation mode. In response, logic 620, illustrated as a multiplexer, selects the metadata value from hardware-maintained metadata 610 associated with data address A to be provided to destination register 650 of metadata access 600. Essentially, in strong atomicity mode barriers are accelerated based on the actual hardware-maintained metadata. Alternatively, during a second execution mode, such as a weak atomicity, non-transactional mode indicated by control 625 holding a second value such as 01, a fixed or forced value from the MDCR is provided to destination register 650 in response to metadata access 600, instead of hardware-maintained metadata 610.

Essentially, in weak atomicity mode the forced value is provided to destination register 650 in response to test filter operation 600 so that the test of the filter value always succeeds and the call to the transactional barrier is not executed before the non-transactional memory access. This description assumes that the test filter operation returns a Boolean value indicating whether the filter test succeeded (the barrier is not executed) or failed (the barrier is to be executed). As a result, the software filter construct, which accelerates transactions by eliding barriers based on a filter value, is leveraged to provide one operation mode, weak atomicity, in which all barriers are elided at non-transactional operations, and a second operation mode, strong atomicity, in which barriers at non-transactional operations are executed or accelerated based on hardware-maintained metadata. In another embodiment, different forced values may be provided for each mode. Here, in strong atomicity mode the forced value makes the test filter operation fail so that the barrier is always executed, and in weak atomicity mode the forced value makes the test filter operation succeed so that the barrier is never executed.

Although providing a forced or fixed value from a control register such as MDCR 615, based on control information such as control 625, has been described in connection with selecting between a fixed/forced value and a metadata value according to the operation mode, forced or fixed values may be used for any general metadata purpose, such as general monitoring and debugging of memory accesses, where data-invariant aspects can be performed on demand.

Turning to FIG. 7, an embodiment of a flow diagram for accelerating non-transactional operations while maintaining atomicity in a transactional environment is depicted. In step 705, a metadata (MD) access operation referencing a data address is encountered. As a specific illustrative example, the MD access operation includes a test operation, previously inserted inline with application code by a compiler, that elides the transactional barrier at a non-transactional memory access if the test returns one value (success) and executes the barrier if the test returns a second value (failure). However, the test MD operation is not so limited and may include any test operation that returns a Boolean success or failure value.

In step 710 the operation mode is determined. Examples of operation modes include transactional or non-transactional, combined with strong or weak atomicity. Accordingly, one register, or two separate registers, may hold a first bit indicating a transactional or non-transactional operation mode and a second bit indicating a strong or weak atomicity mode.

If the operation mode is transactional, or is non-transactional and strongly atomic, the hardware-maintained metadata value is provided to the metadata access operation; that is, the hardware-maintained value is placed in the destination register specified by the MD access operation. In contrast, if the operation mode is non-transactional and weakly atomic, the forced, fixed MDCR value is provided to the MD access operation instead of the hardware-maintained MD value. As a result, in strong atomicity mode barriers are either accelerated or not depending on the hardware-maintained MD value, while in weak atomicity mode barriers are always elided based on the forced MDCR value.
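The selection made in the flow of FIG. 7 can be modeled in a few lines. This is a sketch under stated assumptions: the `Mdcr` structure, its field names, and the convention that a nonzero value means "filter test succeeded" are hypothetical and are not defined by the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of the mode bits and forced value in the MDCR. */
typedef struct {
    bool    transactional;  /* first mode bit: transactional or not */
    bool    strong_atomic;  /* second mode bit: strong or weak atomicity */
    uint8_t forced_value;   /* value supplied in non-transactional weak mode */
} Mdcr;

/* Models the multiplexer: returns the value written to the destination
 * register of the metadata test operation. */
uint8_t md_test(const Mdcr *mdcr, uint8_t hw_metadata) {
    assert(mdcr);
    if (!mdcr->transactional && !mdcr->strong_atomic)
        return mdcr->forced_value;   /* weak atomicity: forced value */
    return hw_metadata;              /* otherwise: hardware-held metadata */
}

/* Compiler-inserted pattern: call the barrier only if the test fails.
 * Assumed convention: nonzero means the filter test succeeded. */
bool barrier_needed(const Mdcr *mdcr, uint8_t hw_metadata) {
    return md_test(mdcr, hw_metadata) == 0;
}
```

With `forced_value` set to 1, a non-transactional access in weak mode never calls the barrier, while the same access in strong mode calls it whenever the hardware filter value is still 0.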

Efficient Transition to a Buffered and Monitored State

Turning to FIG. 8, an embodiment of a flow diagram for a method of efficiently transitioning a block of data to a buffered and monitored state before transaction commit is illustrated. As described above, blocks of memory, such as cache lines holding data items or metadata, may be buffered and/or monitored. For example, coherency bits for a cache line include a representation of a buffered state, and attribute bits for the cache line indicate whether the line is unmonitored, read monitored, or write monitored.

In some embodiments, a cache line is buffered but not monitored, which means that the data held in the line is lossy and that conflicts on the line are not detected, because no monitoring is applied. For example, data that is local to a transaction but is not to be committed, such as metadata, may be held in a buffered but unmonitored state.

When conflicts between the buffered data and writes to the same addresses are to be detected, read monitoring is applied to the data. The cache line is then moved to a buffered and read-monitored state. To reach that state, however, a read request is sent to external processing elements, which forces all other copies into a shared state. Such external read requests may cause a conflict with another processing element that holds a write monitor on the same block/cache line.

Similarly, when conflicts between the buffered data and reads of the same memory blocks are to be detected, write monitoring is applied to the cache line. The cache line is then moved to a buffered and write-monitored state, which is achieved by sending a read-for-ownership request to other processing elements, forcing all other copies into an invalid state. Likewise, a conflict is detected with any processing element that holds a read or write monitor on the same memory block.

To minimize transactional conflicts, a memory block that a transaction needs to update, but ultimately need not commit, may be held in a buffered but unmonitored state, as described above. However, if it is determined that a block held in the buffered but unmonitored state is to be committed, one embodiment provides an efficient path from the buffered, unmonitored state to a committable state, as shown in FIG. 8.

As an example, a buffered update to a memory block (a cache line holding the block) is received in step 805. Before, or concurrently with, the buffered update, read monitoring is applied to the block. For example, the read attribute for the cache line is set to a read monitor value to indicate that the block is read monitored. To apply read monitoring, however, a read request is first sent to other processing elements in step 815. On receiving the read request, in step 820 the other processing elements either detect a conflict, because they already hold the line in a write-monitored state, or transition their copies to a shared state. In step 825, if there is no conflict, the cache line transitions to a buffered and read-monitored state: the cache line coherency bits are updated to the buffered coherency state and the read monitor attribute is set.

In step 830, conflicting writes to the cache line are detected based on the read monitoring. In one embodiment, the read attributes are associated with snoop logic, so that an external read-for-ownership request for the cache line is detected as a conflict with the read monitor set on the cache line.

Later, when the block is to be committed as part of the transaction's state in step 835, write monitoring is applied in step 840. Here, a read-for-ownership request is sent to other processing elements in step 845; in step 850 those processing elements either detect a conflict, because they hold the cache line in a read-monitored or write-monitored state, or transition their copies to an invalid state. As a result, conflict detection on the read-for-ownership request accounts for any conflicts that are to be detected at the point where the line is placed in a committable state.

Consequently, it may be advantageous to transition a buffered and unmonitored block to a committable state in two stages, steps 810 and 840. Deferring the acquisition of ownership through the staged acquisition of read and write monitors allows multiple concurrent transactions to update the same block while reducing conflicts between those transactions. If a transaction does not reach the commit stage for some reason, its update of the block in a buffered and read-monitored manner will not cause another transaction that does reach the commit stage to abort unnecessarily. In addition, deferring the acquisition of exclusive ownership of a block until the commit stage allows higher concurrency between threads without sacrificing data validity.
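The two-stage acquisition can be sketched as a small state model of one cache line seen by two processing elements. This is an illustration only: the `Line` structure and function names are hypothetical, the `conflict` flag simply records that a remote request hit a monitor, and how a conflict is resolved (which transaction aborts) is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* One processing element's view of a single cache line (hypothetical model). */
typedef struct {
    bool read_mon;   /* read monitor attribute */
    bool write_mon;  /* write monitor attribute */
    bool buffered;   /* buffered coherency state */
    bool conflict;   /* set when a remote request hits this element's monitors */
} Line;

/* Stage 1: buffered update with read monitoring. The remote element sees a
 * plain read request, which conflicts only with a write monitor. Returns
 * true if no conflict was detected and the line made the transition. */
bool stage1_buffer_and_read_monitor(Line *self, Line *remote) {
    assert(self && remote);
    if (remote->write_mon) {
        remote->conflict = true;
        return false;
    }
    self->buffered = true;
    self->read_mon = true;
    return true;
}

/* Stage 2: at commit, apply write monitoring. The remote element sees a
 * read-for-ownership request, which conflicts with either monitor.
 * Returns true if no conflict was detected at the remote element. */
bool stage2_write_monitor(Line *self, Line *remote) {
    assert(self && remote);
    bool clean = !(remote->read_mon || remote->write_mon);
    if (!clean)
        remote->conflict = true;
    self->write_mon = true;
    return clean;
}
```

In this model two elements can both complete stage 1 on the same line without any conflict being flagged; a conflict appears only when one of them reaches stage 2, and it is flagged at the element that did not commit.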

Table E below depicts an embodiment of conflicting states between two processing elements, P0 and P1. For example, a line held by P1 in a buffered and read-monitored state, as shown in the R-B column, conflicts with any state in which P0 holds the cache line write monitored, shown as -W-, RW-, WB, and RWB, as indicated by an x in the intersecting cells.

In addition, Table F below illustrates the loss of associated attributes in processing element P1 in response to the operations listed under P0. For example, if P1 holds a line in a buffered and read-monitored state, as shown by the R-B column, and a store or a set-write-monitor operation occurs on P0, then P1 loses both read monitoring and buffering of the line, as indicated by the x-x at the intersection of the store/set WM rows and the R-B column.

Branch Instruction for Conflicts or Loss of Transactional Data (JLOSS)

Turning to FIG. 9, an embodiment of hardware that supports a loss instruction, which jumps to a destination label based on a status value in a transaction status register, is illustrated. In one embodiment, hardware provides an accelerated way to check the consistency of a transaction. As examples, hardware may support consistency checking by providing mechanisms that track the loss of monitored or buffered data from the cache (eviction of buffered or monitored lines) or that track potentially conflicting accesses to such data (monitors that detect conflicting snoops, such as a read-for-ownership request to a monitored line).

In addition, in one embodiment hardware provides architectural interfaces that let software access these mechanisms based on the status of monitored or buffered data. Two such interfaces are: (1) instructions that read or write the status register, allowing software to poll the register explicitly during execution; and (2) an interface that lets software set up a handler that is invoked whenever the status register indicates a potential loss of consistency.

In another embodiment, hardware supports a new instruction, referred to as JLOSS, that performs a conditional branch based on the status of hardware-monitored or buffered data. The JLOSS instruction branches to a label if hardware has detected a potential loss of any monitored or buffered data from the cache, or a potential conflict on any such data. The label is any destination, such as the address of a handler or other code to be executed as a result of the loss or conflict detection.

As an illustrative embodiment, FIG. 9 depicts decoders 910, which recognize JLOSS as part of the processor's ISA and decode the instruction so that logic of the processor performs a conditional branch based on the status of a transaction. As an example, the status of the transaction is held in transaction status register (TSR) 915. The transaction status register may represent the status of a transaction, such as when hardware detects a conflict or a loss of data, referred to here as a loss event. To illustrate, a conflict flag in TSR 915 is set when a monitor indicates that an address is monitored and a snoop to that address is received; the conflict flag in TSR 915 indicates that a conflict was detected. Similarly, a loss flag is set on a loss of data, such as the eviction of a line holding transactional data or metadata.

Here, JLOSS, when decoded and executed, tests the status register; if a loss event (a loss and/or a conflict) has occurred, logic 925 provides the label referenced by JLOSS to execution resources 930 as the jump destination address. As a result, with one instruction, software can determine the status of the transaction and, based on that status, direct execution to the label specified by that same instruction. Because JLOSS checks consistency, reporting false conflicts is acceptable: JLOSS may conservatively report that a conflict has occurred.
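The behavior just described can be modeled in software. The sketch below is an assumption-laden illustration, not the hardware design: the `Tsr` structure, its two flags, and the use of integer program-counter values for the fall-through address and the label are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the transaction status register (TSR). */
typedef struct {
    bool conflict;  /* a snoop hit a monitored address */
    bool loss;      /* a monitored or buffered line was evicted */
} Tsr;

/* Hardware side: record a snoop; it matters only if the line is monitored. */
void tsr_on_snoop(Tsr *tsr, bool line_is_monitored) {
    assert(tsr);
    if (line_is_monitored)
        tsr->conflict = true;
}

/* Hardware side: record the eviction of a transactional line. */
void tsr_on_evict(Tsr *tsr, bool line_is_transactional) {
    assert(tsr);
    if (line_is_transactional)
        tsr->loss = true;
}

/* JLOSS semantics: choose the next instruction address. The label is taken
 * on any loss event; otherwise execution falls through. */
int jloss_next_pc(const Tsr *tsr, int fallthrough_pc, int label_pc) {
    assert(tsr);
    return (tsr->conflict || tsr->loss) ? label_pc : fallthrough_pc;
}
```

Note that the model needs no destination register for the status value, which is the property the text relies on when it compares JLOSS with an explicit read of the status register.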

In one embodiment, software such as a compiler inserts JLOSS instructions into program code to poll for consistency. Although JLOSS may be used inline with main application code, JLOSS instructions are often used within read and write barriers, which are usually provided in libraries, to determine consistency on demand. Execution of program code may therefore include a compiler inserting JLOSS into the code, execution of JLOSS from program code, or any other form of inserting or executing the instruction. Polling with JLOSS is expected to be much faster than an explicit read of the status register, because the JLOSS instruction requires no additional registers: no destination register is needed to receive the status information. Several embodiments of the instruction exist, in which the condition to check is provided explicitly in the instruction or implicitly in another control register.

As an example, transaction status register 915, or another storage element, holds specific conflict and loss status information, such as whether a read-monitored location has been written by another agent (a read conflict), whether a write-monitored location has been written by another agent (a write conflict), loss of physical transactional data, or loss of metadata. Accordingly, different versions of the JLOSS instruction may be used. For example, a JLOSS.rm <label> instruction branches to the label if any read-monitored location has been written by another agent. A hardware-accelerated STM (HASTM) may use JLOSS.rm to accelerate consistency checking: whenever read-set consistency must be ensured, for example after every transactional load in a native code TM system, JLOSS.rm quickly checks for conflicting updates to the read set. In this case, the read set can be validated with JLOSS in the read barrier, so the JLOSS instruction is inserted either within the barrier in a library or after the load operation inline with main application code. Similar to the JLOSS.rm instruction, which detects writes to read-monitored locations, a JLOSS.wm instruction may be used to detect any reads or writes to write-monitored locations. As another example, in a processor capable of buffering locations, a JLOSS.buf instruction may be used to determine whether buffered data has been lost and, if so, to jump to a specified label.

Pseudo Code A below shows a native code STM read barrier that provides read-set consistency and uses JLOSS. The function setrm(void* address) sets a read monitor on the given address, and jloss_rm() is an intrinsic function for the JLOSS instruction that returns true if any conflicting access to a read-monitored location has occurred. This pseudo code monitors the loaded data, but it is also possible to monitor the transaction records (ownership records) instead. An instruction that combines setting the read monitor with loading the data, for example a movxm instruction that loads and monitors the data, may be used. The barrier may also be used in read barriers that perform filtering in addition to monitoring, and in STM systems that rely solely on hardware monitoring for read-set validation, that is, STM systems that perform no software read logging and no software validation.

Pseudo Code A: In-place update STM, optimistic reads, native code read barrier

    Type tmRd<Type>(TxnDesc* txnDesc, Type* addr) {
        setrm(addr);                  /* set a read monitor on the loaded address */
        TxnRec* txnRecPtr = getTxnRecPtr(addr);
        TxnRec txnRec = *txnRecPtr;
        Type val = *addr;
        if (txnRec != txnDesc) {
            while (!validateAndLog(txnDesc, txnRecPtr, txnRec)) {
                /* retry */
                txnRec = *txnRecPtr;
                val = *addr;
            }
        }
        return val;
    }

    bool validateAndLog(TxnDesc* txnDesc, TxnRec* txnRecPtr, TxnRec txnRec) {
        if (isWriteLocked(txnRec) ||
            !checkReadConsistency(txnDesc, txnRecPtr, txnRec)) {
            handleContention(...);
            return false;
        }
        logRead(txnDesc, txnRecPtr);
        return true;
    }

    bool checkReadConsistency(TxnDesc* txnDesc, TxnRec* txnRecPtr, TxnRec txnRec)
    {
        if (txnRec > txnDesc->timestamp) {
            TxnRec timestamp = GlobalTimestamp;
            jloss_rm Lost;            /* branch to Lost on a read-monitor conflict */
            txnDesc->timestamp = timestamp;
            return true;
    Lost:
            if (validateReadSet(txnDesc) == false)
                abort();
            txnDesc->timestamp = timestamp;
        }
        TSR.status_bits = 0;
        return (txnRec == *txnRecPtr);   /* check whether txnRec has changed */
    }

Similarly, an STM system that does not maintain read-set consistency, such as an STM for managed code, may avoid infinite loops or other incorrect control flow (exceptions) caused by inconsistency by inserting JLOSS.rm instructions at loop back edges and at other critical control-flow points, such as instructions that may raise exceptions.

Pseudo Code B below shows another native code read barrier that provides consistency. This version of the TM system uses cache-resident write sets, with buffered updates for writes within a transaction. A read from a location that was previously buffered and then lost would cause an inconsistency, so to maintain consistency this read barrier avoids reading from any location that was buffered and then lost. The COMMIT_LOCKING flag is true if the STM uses commit-time locking for buffered locations. When commit-time locking is not used, the jloss_buf() check is used for reads from previously locked locations; otherwise it is used for all reads.

Pseudo Code B: In-place update, native code STM read barrier

    Type tmRd<Type>(TxnDesc* txnDesc, Type* addr) {
        setrm(addr);                  /* set a read monitor on the loaded address */
        TxnRec* txnRecPtr = getTxnRecPtr(addr);
        TxnRec txnRec = *txnRecPtr;
        Type val = *addr;
        if (txnRec != txnDesc) {
            while (!validateAndLog(txnDesc, txnRecPtr, txnRec)) {
                /* retry */
                txnRec = *txnRecPtr;
                val = *addr;
            }
        } else if (!COMMIT_LOCKING)
            jloss_buf Abort;          /* previously locked location: check for lost buffered data */
        return val;
    Abort:
        abort();
    }

    bool checkReadConsistency(TxnDesc* txnDesc, TxnRec* txnRecPtr, TxnRec txnRec) {
        if (COMMIT_LOCKING && jloss_buf())
            abort();                  /* abort if buffered data has been lost */
        if (txnRec > txnDesc->timestamp) {
            TxnRec timestamp = GlobalTimestamp;
            jloss_rm Lost;
            txnDesc->timestamp = timestamp;
            return true;
    Lost:
            if (validateReadSet(txnDesc) == false)
                abort();
            txnDesc->timestamp = timestamp;
        }
        /* remainder as in Pseudo Code A */
        TSR.status_bits = 0;
        return (txnRec == *txnRecPtr);
    }

TM systems may combine read monitoring with buffering and write monitoring, as discussed above, and may therefore need conflict checks on monitored or buffered lines to maintain consistency. To accommodate such systems, various embodiments may provide JLOSS flavors that branch on logical combinations of different monitoring and buffering events, such as JLOSS.rm.buf (conflict on read-monitored or buffered locations), JLOSS.rm.wm (conflict on read- or write-monitored locations), or JLOSS.* (conflict on read-monitored, write-monitored, or buffered locations).

In another alternative embodiment, the architectural interface decouples the JLOSS instruction from the conditions on which it branches by allowing software to set those conditions (conflicts on read- or write-monitored lines, or on buffered lines) in a separate control register. Such an embodiment requires only a single JLOSS instruction encoding and can support future extensions to the set of events on which JLOSS should branch.

Turning to FIG. 10, an embodiment of a flow diagram for a method of executing a loss instruction that jumps to a destination label based on a conflict or loss of specific information is illustrated. In one embodiment, a JLOSS instruction is received in step 1005. As mentioned above, the JLOSS instruction may be inserted by a programmer or a compiler either in main code, for example after a load operation to ensure read-set consistency, or within a barrier such as a read or write barrier. The JLOSS instruction and its variants discussed above may, in one embodiment, be recognized as part of the processor's ISA. Here, decoders are able to decode the operation codes of JLOSS instructions.

In step 1010, it is determined whether a conflict or loss of information has occurred. In one embodiment, the type of conflict or loss depends on the variant of the JLOSS instruction. For example, if the received instruction is a JLOSS.rm instruction, it is determined whether a read-monitored line has been accessed in a conflicting manner by an external write. However, as mentioned above, any variant of JLOSS may be received, including a JLOSS instruction that allows the user to specify the conditions in a control register.

Therefore, once the conditions are established, from either the control register or the type of JLOSS instruction, it is determined whether those conditions are met. As a first example, information in a transaction status register, such as TSR 915, is used to determine whether the conditions are satisfied. Here, TSR 915 may include a read monitor status flag that is set to a no-conflict value by default and updated to a conflict value to indicate that a conflict has occurred. However, a status register is not the only way to determine whether a conflict has occurred; any known method of determining loss or conflict may be used.

When no conflict is detected, as when the read monitor conflict flag in TSR 915 is still set to its default value, a false value is returned in step 1025 and execution continues normally. However, if a conflict or loss is detected, as when the read monitor conflict flag is set, JLOSS returns true in step 1015 and, in step 1020, execution jumps to the label defined by the received JLOSS instruction.
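The flow of FIG. 10 can be approximated in software as a poll of sticky loss flags followed by a conditional jump. The following C fragment is only an illustrative model, not the hardware design: the bit layout, the `CpuModel` structure, and the function names `record_loss`, `jloss`, `jloss_generic`, and `checked_load` are all assumptions introduced here. It covers both a per-variant JLOSS (the events are part of the instruction) and the single-encoding variant whose events come from a control register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit layout for the loss flags of a transaction status register. */
enum {
    LOSS_RM  = 1u << 0,  /* a read-monitored line was lost or conflicted  */
    LOSS_WM  = 1u << 1,  /* a write-monitored line was lost or conflicted */
    LOSS_BUF = 1u << 2   /* a buffered line was lost                      */
};

typedef struct {
    uint32_t tsr;         /* sticky loss flags; 0 is the default no-conflict value */
    uint32_t jloss_mask;  /* control register: events a generic JLOSS branches on  */
} CpuModel;

/* Hardware side of the model: record a conflicting access or a lost line. */
static void record_loss(CpuModel *cpu, uint32_t events) {
    cpu->tsr |= events;
}

/* Models JLOSS.<events>: true means "take the branch to the label". */
static bool jloss(const CpuModel *cpu, uint32_t events) {
    return (cpu->tsr & events) != 0;
}

/* Models the single-encoding variant: the events come from the control register. */
static bool jloss_generic(const CpuModel *cpu) {
    return jloss(cpu, cpu->jloss_mask);
}

/* Tail of a read barrier: returns 0 if the load may be used, -1 if the
 * caller has to revalidate or abort because a read monitor was lost. */
static int checked_load(const CpuModel *cpu, const int *addr, int *out) {
    assert(addr != NULL && out != NULL);
    *out = *addr;
    if (jloss(cpu, LOSS_RM))
        goto Lost;       /* steps 1015 and 1020: jump to the label */
    return 0;            /* step 1025: fall through                */
Lost:
    return -1;
}
```

Keeping the event set in `jloss_mask` rather than in the function argument mirrors the trade-off described for the control register embodiment: one encoding, with room to add new event bits later.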

Hardware Support for Transactional Memory Commit

As discussed above, hardware-supported transactions can accelerate software version management by buffering transactional writes in the cache without making them globally visible. In this case, a simple commit instruction can be used that makes the buffered values visible to all processors but fails if any buffered lines have been lost. Hardware that also retains metadata that software can use for acceleration, for example to filter out redundant barriers, may want the commit instruction to fail if the hardware has detected any conflict. In addition, it may be desirable at commit to clear various combinations of the transactional information held in hardware, such as metadata, monitors, and buffered lines.

Therefore, in one embodiment, hardware supports multiple forms of the commit instruction so that a commit instruction can specify both the commit conditions and the information to clear on commit. Referring to FIG. 11, an embodiment of the general case of hardware that supports the definition of commit conditions and clear controls is illustrated.

As illustrated, commit instruction 1105 includes operation code 1110, which may be recognized as part of the processor ISA; that is, decoders 1115 are able to decode operation code 1110. In the example shown, operation code 1110 includes two portions: commit conditions 1111 and clear control 1112. Commit conditions 1111 specify the conditions under which the transaction is to commit, and commit clear control 1112 specifies the information to clear when the transaction commits.

In one embodiment, both portions include four values: read monitoring (RM), write monitoring (WM), buffering (Buf), and metadata (MD). Essentially, if any of the four values in portion 1111 is set, that is, holds a value indicating that the associated attribute is a commit condition, then that attribute is a condition for commit. In other words, if the first bit of conditions 1111, corresponding to read monitor information, is set, then the loss of any read monitoring data from monitors 1135 associated with the transaction results in an abort; no commit takes place because a condition specified by the commit instruction has failed. Similarly, if a value in 1112 is set, the corresponding attribute is cleared on commit. Continuing the example, if RM in portion 1112 is set, the read monitor information in monitors 1135 for the transaction is cleared when the transaction commits. In this example, then, the four commit-condition bits combined with the four clear-control bits yield 256 (16 × 16) possible variants of the commit instruction. In one embodiment, by allowing the commit conditions to be specified within the operation code, hardware is able to support all of these variants. However, a few variants are discussed below to give a better understanding of the different styles of commit instruction and how they may be used.
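To make the count of variants concrete, the two four-bit portions can be packed into a single byte. The C fragment below is a hedged sketch under assumed conventions: the bit assignments, the `TxnHwState` structure, and the names `encode_commit` and `commit` are hypothetical and are not taken from any real ISA. It also assumes that a failed commit leaves the held information untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignment for the four attributes. */
enum { ATTR_RM = 1u << 0, ATTR_WM = 1u << 1, ATTR_BUF = 1u << 2, ATTR_MD = 1u << 3 };

typedef struct {
    uint8_t lost;  /* attributes for which information has been lost   */
    uint8_t held;  /* attributes currently held for the transaction    */
} TxnHwState;

/* Packs a variant: low nibble = commit conditions, high nibble = clear
 * controls. 16 condition sets x 16 clear sets = 256 variants. */
static uint8_t encode_commit(uint8_t conditions, uint8_t clears) {
    assert((conditions & ~0xFu) == 0 && (clears & ~0xFu) == 0);
    return (uint8_t)(conditions | (clears << 4));
}

/* Executes one commit variant against the modeled hardware state.
 * Returns false (no commit) if any attribute named as a condition was lost. */
static bool commit(TxnHwState *hw, uint8_t variant) {
    uint8_t conditions = variant & 0xF;
    uint8_t clears = (uint8_t)(variant >> 4);
    if (hw->lost & conditions)
        return false;               /* a commit condition failed */
    hw->held &= (uint8_t)~clears;   /* clear what the variant asks for */
    return true;
}
```

Under this encoding, an instruction in the style of Txcomwm would correspond roughly to `encode_commit(ATTR_WM, ATTR_BUF)`: it fails only on lost write monitors and clears only buffering.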

TXCOMWM

As a first example, the Txcomwm instruction is discussed. This instruction ends the transaction and, if no write-monitored data has been lost (the success case), makes all buffered, write-monitored data globally visible; otherwise, if write-monitored data has been lost, it fails. Txcomwm sets (or resets) a flag to indicate success (or failure). On success, Txcomwm clears the buffered state of all write-monitored data. Txcomwm does not affect read or write monitoring state, which allows software to reuse that state in subsequent transactions; nor does it affect the state of locations that are buffered but not write monitored, which allows software to keep information in such locations. The pseudo code labeled Pseudo Code C below illustrates the algorithmic description of Txcomwm. When TSR.LOSS_WM is 0, the BF attribute of all buffered, write-monitored BBLKs is atomically cleared and all data so buffered becomes visible to other agents. TCR.IN_TX is cleared. Buffered blocks that lack WM are unaffected and remain buffered. The CF flag is set on completion. When TSR.LOSS_WM is 1, the CF flag is cleared and TCR.IN_TX is cleared. The CF flag is set to 1 if the operation succeeded and to 0 on failure. The OF, SF, ZF, AF, and PF flags are set to 0.

Pseudo Code C: Embodiment of an algorithm for the Txcomwm operation

atomically
{
    if (TSR.LOSS_WM == 1)
    {
        CF := 0;
    }
    else
    {
        for (all mblk)
        {
            CommitAllInMblk(mblk)
        }
        CF := 1;
    }
}
TCR.IN_TX := 0;
OF := 0;
SF := 0;
ZF := 0;
AF := 0;
PF := 0;
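The effect of Txcomwm on individual blocks can be made concrete with a small software model. The C fragment below is a sketch under stated assumptions: the `Block` and `TxRegs` structures and the `visible` field are hypothetical, and the loop runs sequentially, whereas the instruction performs the update atomically. It shows that only blocks that are both buffered and write monitored are published, and that buffered blocks without WM stay buffered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool bf;       /* block holds buffered (speculative) data      */
    bool wm;       /* block is write monitored                     */
    bool visible;  /* buffered data has been made globally visible */
} Block;

typedef struct {
    bool loss_wm;  /* models TSR.LOSS_WM                   */
    bool in_tx;    /* models TCR.IN_TX                     */
    bool cf;       /* models CF: 1 on success, 0 on failure */
} TxRegs;

/* Software model of Txcomwm. Returns the resulting CF value. */
static bool txcomwm(TxRegs *regs, Block *blocks, size_t n) {
    assert(regs != NULL && (blocks != NULL || n == 0));
    if (regs->loss_wm) {
        regs->cf = false;                 /* write-monitored data was lost */
    } else {
        for (size_t i = 0; i < n; i++) {
            if (blocks[i].bf && blocks[i].wm) {
                blocks[i].visible = true; /* publish the buffered data */
                blocks[i].bf = false;     /* clear the BF attribute    */
            }
            /* WM is left as it was; buffered blocks without WM stay buffered */
        }
        regs->cf = true;
    }
    regs->in_tx = false;                  /* the transaction ends either way */
    return regs->cf;
}
```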

The pseudo code labeled Pseudo Code D below shows how a HASTM system can use the Txcomwm instruction to commit a transaction that uses hardware write buffering to avoid undo logging in an in-place update STM. The CACHE_RESIDENT_WRITES flag indicates this execution mode.

Pseudo Code D: Embodiment of pseudo code for use of the Txcomwm instruction

void tmCommitUtm(TxnDesc* txnDesc) {
    if (CACHE_RESIDENT_WRITES) {
        if (LAZY_LOCKING) {
            if (EAGER_MONITORING == false) {
                /* lazy locking & lazy monitoring */
                setWriteMonitors(txnDesc);
                /* abort if any buffered lines were lost while setting write monitors */
                if (EJECTOR_ENABLED == false && checkTsrLoss(LOSS_BF))
                    abort(txnDesc);
            }
            /* end the transaction to disable the ejector while acquiring locks */
            if (EJECTOR_ENABLED)
                tx();
            acquireWriteLocks(txnDesc);
        }
        /* commit write-monitored lines and end the transaction */
        if (txcomwm() == false) {
            txa(); /* clear buffering & monitoring */
            abort(txnDesc);
        }
    } else {
        /* unbounded writes */
        tx(); /* end the transaction */
    }
    TxnRec myCommitTimestamp = lockedIncrement(&GlobalTimestamp);
    if (myCommitTimestamp == txnDesc->timestamp-1 &&
        validateReadSet(txnDesc) == false)
        tmRollbackAndAbort(txnDesc, myCommitTimestamp);
    releaseWriteLocks(txnDesc, myCommitTimestamp);
    quiesce(txnDesc);
}

TXCOMWMRM

One variant, txcomwmrm, extends the Txcomwm instruction so that it also fails if any read-monitored locations have been lost. This variant is useful for transactions that rely solely on hardware to detect read-set conflicts. The pseudo code labeled Pseudo Code E below illustrates the algorithmic description of Txcomwmrm. When TSR.LOSS_WM and TSR.LOSS_RM are both 0, the BF attribute of all buffered, write-monitored BBLKs is atomically cleared and all data so buffered becomes visible to other agents. TCR.IN_TX is cleared. Buffered blocks that lack WM are unaffected and remain buffered. The CF flag is set on completion. When TSR.LOSS_WM or TSR.LOSS_RM is 1, the CF flag is cleared and TCR.IN_TX is cleared. The CF flag is set to 1 if the operation succeeded and cleared to 0 on failure. The OF, SF, ZF, AF, and PF flags are set to 0.

Pseudo Code E: Embodiment of the algorithmic description of Txcomwmrm

atomically
{
    if ((TSR.LOSS_RM == 1) || (TSR.LOSS_WM == 1))
    {
        CF := 0;
    }
    else
    {
        for (all mblk)
        {
            CommitAllInMblk(mblk)
        }
        CF := 1;
    }
}
TCR.IN_TX := 0;
OF := 0;
SF := 0;
ZF := 0;
AF := 0;
PF := 0;

The next pseudo code, Pseudo Code F, shows a commit algorithm that uses txcomwmrm in an STM system that relies on hardware both to buffer transactional writes and to detect read-set conflicts. The HW_READ_MONITORING flag indicates whether the algorithm uses only hardware for read-set conflict detection.

Pseudo Code F: Embodiment of pseudo code utilizing the txcomwmrm instruction

        if (LAZY_LOCKING) {
            if (EAGER_MONITORING == false) {
                /* lazy locking & lazy monitoring */
                setWriteMonitors(txnDesc);
                if (EJECTOR_ENABLED == false && checkTsrLoss(LOSS_BF))
                    abort(txnDesc);
            }
            if (EJECTOR_ENABLED)
                tx();
            acquireWriteLocks(txnDesc);
        }
        if (HW_READ_MONITORING) {
            if (txcomwmrm() == false) {
                abort(txnDesc);
            }
        } else if (txcomwm() == false) {
            txa(); /* clear buffering & monitoring */
            abort(txnDesc);
        }
    } else {
        /* unbounded writes */
        tx(); /* end the transaction */
    }
    if (HW_READ_MONITORING == false &&
        myCommitTimestamp == txnDesc->timestamp-1 &&
        validateReadSetUtm(txnDesc) == false)
        tmRollbackAndAbort(txnDesc, myCommitTimestamp);
    releaseWriteLocks(txnDesc, myCommitTimestamp);
    quiesce(txnDesc);
}

TXCOMWMIRMC

A third variant is illustrated below, with its algorithmic description given in the pseudo code labeled Pseudo Code F. When TSR.LOSS_WM and TSR.LOSS_IRM are both 0, the BF attribute of all buffered, write-monitored BBLKs is atomically cleared and all data so buffered becomes visible to other agents. RM, WM, and IRM are cleared, as is TCR.IN_TX. Buffered blocks that lack WM are unaffected and remain buffered. The CF flag is set on completion. When TSR.LOSS_WM or TSR.LOSS_IRM is 1, the CF flag is cleared and TCR.IN_TX is cleared. The CF flag is set to 1 if the operation succeeded and cleared to 0 on failure. The OF, SF, ZF, AF, and PF flags are set to 0.

Pseudo Code F: Embodiment of the algorithmic description of Txcomwmirmc

atomically
{
    if ((TSR.LOSS_IRM == 1) || (TSR.LOSS_WM == 1))
    {
        CF := 0;
    }
    else
    {
        for (all mblk)
        {
            CommitAllInMblk(mblk)
            mblk.RM := 0;
            mblk.WM := 0;
            mblk.IRM := 0;
        }
        CF := 1;
    }
}
TCR.IN_TX := 0;
OF := 0;
SF := 0;
ZF := 0;
AF := 0;
PF := 0;

Referring to FIG. 12, an embodiment of a flow diagram for a method of executing a commit instruction that defines commit conditions and clear controls is illustrated. In step 1205, a commit instruction is received. As stated above, a compiler may insert the commit instruction into program code. As a specific illustrative example, a call to a commit function is inserted in main code, and the commit function, such as those in the pseudo code above, is provided in a library; the compiler may likewise insert the commit instruction into the commit function within the library.

After the commit instruction is received, decoders decode it. From the decoded information, the conditions specified by the operation code of the commit instruction are determined in step 1210. As stated above, the operation code may set some flags and reset others to indicate which conditions are to be used for commit. If the conditions are not satisfied, false is returned and the transaction may be aborted separately. However, if the conditions for commit are satisfied, such as any combination of no loss of read monitors, write monitors, metadata, and/or buffering, then the clear conditions/controls are determined in step 1215. As an example, any combination of read monitors, write monitors, metadata, and/or buffering for the transaction is determined to be cleared. As a result, the information determined to be cleared is cleared in step 1225.

Optimized Memory Management for UTM

As discussed above, the unbounded transactional memory (UTM) architecture and its hardware implementations extend the processor architecture by introducing the following properties: monitoring, buffering, and metadata. In combination, these provide software with the means needed to implement a variety of sophisticated algorithms, including a wide range of transactional memory designs. Each property may be implemented in hardware either by extending existing cache protocols in a cache implementation or by allocating independent new hardware resources.

With the UTM properties implemented in hardware, the UTM architecture and its hardware implementations can deliver a performance improvement over software-only transactional memory (STM) solutions, provided they can effectively avoid or minimize events such as UTM transaction aborts and the subsequent transaction retry operations. One of the main causes of hardware transaction aborts has been frequent ring transitions caused by external interrupts, system calls, and page faults.

A suspend mechanism based on the current privilege level (CPL) keeps a hardware transaction active (executing a hardware-accelerated transaction using UTM properties such as buffering, monitoring, and the ejection mechanism) while the processor operates at privilege level 3 (user mode). Any ring transition away from ring 3 automatically suspends the currently active transaction (stopping the generation of UTM properties and disabling the ejection mechanism). Similarly, any ring transition back to ring 3 automatically resumes the previously suspended hardware transaction if it was active. A potential drawback of this approach is that it largely precludes the use of hardware transactional memory resources in kernel code or at any level other than ring 3.

Another approach is to introduce duplicated TM control resources, such as a ring 0 transaction control register (TxCR), so that hardware transactions can still be used for ring 0 code with its own TM resources. However, this approach is likely to lack an efficient way to handle nested interrupts and exceptions during ring 0 transactional operations.

Accordingly, FIG. 13 illustrates an embodiment of hardware that supports handling privilege level transitions during transaction execution. It enables ring 0 transactions on top of user mode (ring 3) transactions, while providing the OS and a hypervisor, such as a virtual machine monitor (VMM), with a way to handle unbounded levels of nested interrupts and NMI cases in the presence of ring 0 transactions.

A storage element, such as EFLAGS register 1310, includes a transaction enable field (TEF) 1311. When TEF 1311 holds an active value, it indicates that a transaction is currently active and enabled; when TEF 1311 holds an inactive value, it indicates that the transaction is suspended.

In one embodiment, a begin-transaction operation, or another operation at the start of a transaction, sets TEF field 1311 to the active value. When a ring level transition event, such as an interrupt, exception, system call, virtual machine exit, or virtual machine entry, occurs in step 1300, the state of EFLAGS register 1310 is pushed onto the kernel stack in step 1301. In step 1302, TEF field 1311 is cleared/updated to the inactive value to suspend the transaction. The ring level transition event is handled or serviced while the transaction is suspended. Upon detecting a return event in step 1303, the state of EFLAGS register 1310 that was pushed onto the stack in step 1301 is popped in step 1304, restoring EFLAGS 1310 to its previous state. Restoring the previous state returns TEF 1311 to the active value and resumes the transaction as active and enabled.
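Because the enable field travels with the saved flags, nesting falls out of ordinary stack discipline. The following C fragment is a minimal sketch of that push, clear, and pop sequence; the `Cpu` structure, the fixed stack depth, the position chosen for `TEF_BIT`, and the function names are assumptions made for illustration and do not describe real EFLAGS bit assignments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TEF_BIT (1u << 31)   /* hypothetical position of the transaction enable field */
#define KSTACK_DEPTH 16      /* arbitrary depth for this model */

typedef struct {
    uint32_t eflags;
    uint32_t kstack[KSTACK_DEPTH];  /* stands in for the kernel stack */
    int sp;
} Cpu;

static bool txn_active(const Cpu *cpu) {
    return (cpu->eflags & TEF_BIT) != 0;
}

/* Interrupt, exception, or system call: push the flags, then suspend. */
static void ring_enter(Cpu *cpu) {
    assert(cpu->sp < KSTACK_DEPTH);
    cpu->kstack[cpu->sp++] = cpu->eflags;  /* step 1301: save the whole register */
    cpu->eflags &= ~TEF_BIT;               /* step 1302: transaction suspended   */
}

/* Return from the handler: pop the flags, which restores the saved TEF. */
static void ring_return(Cpu *cpu) {
    assert(cpu->sp > 0);
    cpu->eflags = cpu->kstack[--cpu->sp];  /* step 1304: previous state restored */
}
```

In this model a handler that starts its own transaction and is itself interrupted simply pushes a second frame, so each level resumes with the enable state it had when it was interrupted.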

Specific process examples for illustrative ring level transition events are listed below. On an interrupt or exception, the processor pushes the EFLAGS register onto the kernel stack and, if the "Transaction Enable" bit is set, clears it, thereby suspending the previously enabled transaction. On IRET, the processor restores the entire EFLAGS register state of the interrupted thread, including the "Transaction Enable" bit, from the kernel stack, thereby resuming the previously enabled transaction.

On SYSCALL, the processor pushes the EFLAGS register and, if the "Transaction Enable" bit is set, clears it, thereby suspending the previously enabled transaction. On SYSRET, the processor restores the entire EFLAGS register state of the interrupted thread, including the "Transaction Enable" bit, from the kernel stack, thereby resuming the previously enabled transaction.

On VM-Exit, the processor saves the guest's EFLAGS register, including the "Transaction Enable" bit, in the Virtual Machine Control Structure (VMCS) and loads the host's EFLAGS register state with the "Transaction Enable" bit cleared, thereby suspending the guest's previously enabled transaction, if any.

On VM-Enter, the processor restores the guest's EFLAGS register, including the "Transaction Enable" bit, from the VMCS, thereby resuming the guest's previously enabled transaction, if any.

This enables kernel mode (ring 0) hardware-accelerated UTM transactions on top of user mode (ring 3) hardware-accelerated UTM transactions, and also provides both the OS and the VMM with a way to handle unbounded levels of nested interrupts and NMI cases in the presence of ring 0 transactions. None of the prior art has provided such a mechanism.

A module as used herein refers to any hardware, software, firmware, or a combination thereof. Module boundaries that are illustrated as separate often vary in practice and may overlap. For example, a first and a second module may share hardware, software, firmware, or a combination thereof, while potentially retaining some independent hardware, software, or firmware. In one embodiment, use of the term logic includes hardware such as transistors and registers, or other hardware such as programmable logic devices. In another embodiment, however, logic also includes software or code integrated with hardware, such as firmware or microcode.

A value, as used herein, includes any known representation of a number, a state, a logical state, or a binary logical state. Logic levels, logic values, or logical values are often referred to simply as 1s and 0s, which represent binary logic states. For example, a 1 refers to a high logic level and a 0 refers to a low logic level. In one embodiment, a storage cell, such as a transistor or flash cell, may hold a single logical value or multiple logical values. However, other representations of values have been used in computer systems. For example, the decimal number 10 may also be represented as the binary value 1010 and the hexadecimal value A. A value therefore includes any representation of information that can be held in a computer system.
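As a brief illustration of the equivalent representations mentioned above, the following sketch checks that decimal 10, binary 1010, and hexadecimal A denote the same value. The variable names are illustrative only.

```python
# The same value expressed in three notations.
decimal_value = 10
binary_value = int("1010", 2)   # parse a binary string
hex_value = int("A", 16)        # parse a hexadecimal string

same = decimal_value == binary_value == hex_value

# Convert the decimal value back to the other two notations.
binary_text = format(decimal_value, "b")
hex_text = format(decimal_value, "X")
```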

Moreover, states may be represented by values or portions of values. As an example, a first value, such as a logical one, may represent a default or initial state, while a second value, such as a logical zero, may represent a non-default state. In addition, in one embodiment the terms reset and set refer to a default and an updated value or state, respectively. For example, a default value may include a high logical value (reset), while an updated value may include a low logical value (set). Any combination of values may be used to represent any number of states.

The embodiments of methods, hardware, software, firmware, or code set forth above may be implemented via instructions or code that are stored on a machine-accessible or machine-readable medium and are executable by a processing element. A machine-accessible/readable medium includes any mechanism that provides (i.e., stores and/or transmits) information in a form readable by a machine, such as a computer or electronic system. For example, a machine-accessible medium includes random-access memory (RAM), such as static RAM (SRAM) or dynamic RAM (DRAM); ROM; magnetic or optical storage media; flash memory devices; and electrical, optical, or acoustical storage devices, or other forms of storage devices for propagated signals (e.g., carrier waves, infrared signals, digital signals). For example, a machine may access a storage device by receiving a propagated signal, such as a carrier wave, from a medium capable of holding the information transmitted on that signal.

Reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification do not necessarily all refer to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

In the foregoing specification, a detailed description has been given with reference to specific exemplary embodiments. It will, however, be evident that various modifications and changes may be made without departing from the broader spirit and scope of the invention as set forth in the appended claims. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. Furthermore, the foregoing use of "embodiment" and other exemplary language does not necessarily refer to the same embodiment or the same example, but may refer to different and distinct embodiments, as well as potentially the same embodiment.

Claims

1. An apparatus comprising:
a plurality of processing elements, wherein a processing element of the plurality of processing elements is to be associated with a plurality of software subsystems; and
abstract logic to associate a metadata access operation, which is associated with a current software subsystem of the plurality of software subsystems and references a data address, with an abstract address space associated with the current software subsystem, based at least on the data address and a metadata identifier (MDID) associated with the current software subsystem.

2. The apparatus of claim 1, wherein the abstract address space associated with the current software subsystem is to be orthogonal to a data address space that includes the data address and to at least one other abstract address space associated with a second software subsystem of the plurality of software subsystems.

3. The apparatus of claim 2, wherein each of the plurality of software subsystems is individually selected from a group consisting of a transaction runtime subsystem, a garbage collection runtime subsystem, a memory protection subsystem, a software translation subsystem, an outer transaction of a nested group of transactions, and an inner transaction of a nested group of transactions.

4. The apparatus of claim 1, further comprising decoding logic to decode the metadata access operation, wherein the metadata access operation includes an operation code (opcode) that the decoding logic recognizes as one of a plurality of supported operations.

5. The apparatus of claim 1, wherein the abstract logic includes abstract translation logic to translate the data address into a metadata address within the abstract address space associated with the current software subsystem based at least on the MDID.

6. The apparatus of claim 5, wherein the abstract translation logic is to translate the data address into the metadata address within the abstract address space associated with the current software subsystem further based on a processing element identifier (PEID) associated with the processing element.

7. The apparatus of claim 6, wherein the abstract translation logic is to translate the data address into the metadata address within the abstract address space associated with the current software subsystem further based on a compression ratio of data to metadata.

8. The apparatus of claim 6, further comprising a register modifiable by the current software subsystem, wherein the register is to hold the MDID in response to a write from the current software subsystem indicating that the current software subsystem is currently executing on the processing element, and wherein the abstract translation logic to translate the data address into the metadata address within the abstract address space associated with the current software subsystem based on the PEID and the MDID comprises the abstract translation logic to combine a representation of the data address with the PEID and the MDID.

9. The apparatus of claim 8, wherein the abstract translation logic to combine the representation of the data address with the PEID and the MDID is based on a combination algorithm selected from a group consisting of:
an algorithm to append the PEID and the MDID to the data address to form the metadata address;
an algorithm to translate the data address into a translated data address using normal data translation tables and to append the PEID and the MDID to the translated data address to form the metadata address; and
an algorithm to translate the data address into a translated metadata address using abstract translation tables separate from the normal data translation tables and to append the PEID and the MDID to the translated metadata address to form the metadata address.

10. A method comprising:
encountering a metadata operation that references a data address associated with a data item, the data item being in a data address space and held in a data entry of a cache memory;
determining a metadata address in an abstract address space that is disjoint from the data address space, based on the data address, a processing element identifier (PEID) of a processing element associated with the metadata operation, and a metadata identifier (MDID) of a software subsystem associated with the processing element; and
accessing a metadata entry of the cache memory based on the metadata address.

11. The method of claim 10, wherein the abstract address space is also disjoint from an additional abstract address space associated with an additional software subsystem that is also associated with the processing element.

12. The method of claim 10, wherein the software subsystem is selected from a group consisting of a transaction runtime subsystem, a garbage collection runtime subsystem, a memory protection subsystem, a software translation subsystem, an outer transaction of a nested group of transactions, and an inner transaction of a nested group of transactions.

13. The method of claim 10, further comprising:
writing the MDID to a control register associated with the processing element in response to encountering a write operation to the control register from the software subsystem, the write operation indicating that the software subsystem is currently executing on the processing element; and
determining the MDID from the control register.

14. The method of claim 13, further comprising determining the PEID from a portion of an operation code for the metadata operation.

15. The method of claim 13, wherein determining the metadata address based on the data address, the PEID, and the MDID comprises combining the data address, the PEID, and the MDID using an algorithm selected from a group consisting of:
an algorithm that appends the PEID and the MDID to the data address to form the metadata address;
an algorithm that translates the data address into a translated data address using normal data translation tables and appends the PEID and the MDID to the translated data address to form the metadata address; and
an algorithm that translates the data address into a translated metadata address using abstract translation tables separate from the normal data translation tables and appends the PEID and the MDID to the translated metadata address to form the metadata address.

16. An apparatus comprising:
decoding logic to decode a metadata access instruction that is to reference a data address of a data item, wherein the metadata access instruction includes an operation code (opcode) recognizable as part of an instruction set decodable by the decoding logic; and
metadata logic to translate the data address into a separate metadata address transparently to software, and to access metadata referenced by the separate metadata address, in response to the decoding logic decoding the metadata access instruction.

17. The apparatus of claim 16, wherein the metadata access instruction is selected from a group of instructions consisting of a metadata bit test and set (MDLT) instruction, a metadata store and set (MSS) instruction, and a metadata store and reset (MDSR) instruction.

18. The apparatus of claim 16, wherein the metadata access instruction is selected from a group of instructions consisting of a compressed metadata test (CMDT) instruction, a compressed metadata store (CMS) instruction, and a compressed metadata clear (CMDCLR) instruction.

19. The apparatus of claim 16, wherein the metadata logic to translate the data address into the separate metadata address transparently to software comprises the metadata logic to translate the data address based at least on a metadata identifier (MDID) specified in a control register by a software subsystem associated with the metadata access instruction.

20. The apparatus of claim 16, wherein the metadata access instruction further includes a reference to a destination register, and wherein the metadata logic to access the metadata referenced by the separate metadata address comprises the metadata logic to load the metadata at the referenced separate metadata address into the destination register.

21. The apparatus of claim 20, wherein the operation code includes a thread identifier field to identify a thread from which the metadata access instruction was issued.

22. The apparatus of claim 20, wherein the metadata logic to access the metadata referenced by the separate metadata address further comprises the metadata logic to set the metadata at the referenced separate metadata address to a predetermined set value in response to the metadata loaded into the destination register being an unset value.

23. The apparatus of claim 22, wherein the set value and the unset value are specified in the metadata access instruction.

24. A machine-readable medium holding program code that, when executed by a machine, causes the machine to generate, in response to a data access operation referencing a data address, a metadata access operation at the data access operation to reference the data address,
wherein the metadata access operation, when executed by the machine, causes the machine to:
translate the data address into a metadata address that is separate from the data address; and
access metadata for a data item at the data address based on the metadata address.

25. The machine-readable medium of claim 24, wherein the metadata access operation is selected from a group of instructions consisting of a metadata bit test and set (MDLT) instruction, a metadata store and set (MSS) instruction, and a metadata store and reset (MDSR) instruction.

26. The machine-readable medium of claim 24, wherein the metadata access operation is selected from a group of compressed instructions consisting of a compressed metadata test (CMDT) instruction, a compressed metadata store (CMS) instruction, and a compressed metadata clear (CMDCLR) instruction.

27. The machine-readable medium of claim 26, wherein the metadata access operation that, when executed by the machine, causes the machine to translate the data address into the metadata address comprises a metadata access operation that causes the machine to combine the data address, based on a compression ratio, with a metadata identifier (MDID) associated with the metadata access operation and a processing element identifier (PEID) associated with the metadata access operation.

28. The machine-readable medium of claim 27, wherein the data address is also to be translated by virtual-to-physical address translation logic within the machine to reference the data item.

29. The machine-readable medium of claim 24, wherein the metadata access operation further references an operand register, and wherein the metadata access operation that causes the machine to access the metadata for the data item comprises a metadata access operation that, when executed by the machine, causes the machine to update the metadata for the data item with a value held in the operand register.

30. The machine-readable medium of claim 24, wherein the program code includes compiler code to compile application code that includes the data access operation, and wherein generating the metadata access operation at the data access operation comprises generating the metadata access operation in a compiled version of the application code.

31. A machine-readable medium including program code that, when executed by a machine, causes the machine to perform operations comprising:
translating a data address referenced by a metadata access instruction in the program code into a metadata address based on a metadata identifier (MDID) associated with a software subsystem that is currently active on a processing element associated with the metadata access instruction; and
accessing metadata based on the metadata address.

32. The machine-readable medium of claim 31, wherein the metadata access instruction is selected from a group of instructions consisting of a metadata load instruction to load the metadata, a metadata store instruction to store the metadata, and a metadata clear instruction to reset the metadata.

33. The machine-readable medium of claim 31, wherein the software subsystem is selected from a group consisting of a transaction runtime subsystem, a garbage collection runtime subsystem, a memory protection subsystem, a software translation subsystem, an outer transaction of a nested group of transactions, and an inner transaction of a nested group of transactions.

34. The machine-readable medium of claim 31, wherein translating the data address into the metadata address based on the MDID comprises combining the data address with the MDID based on a combination algorithm selected from a group consisting of:
an algorithm to append the MDID to the data address to form the metadata address;
an algorithm to translate the data address into a translated data address using normal data translation tables and to append the MDID to the translated data address to form the metadata address; and
an algorithm to translate the data address into a translated metadata address using abstract translation tables separate from the normal data translation tables and to append the MDID to the translated metadata address to form the metadata address.

35. The machine-readable medium of claim 34, wherein appending the MDID comprises an algorithm selected from a group consisting of an algorithm to append the MDID at a most significant bit (MSB) position, an algorithm to append the MDID at a least significant bit (LSB) position, and an algorithm to replace address bits with the MDID.
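For illustration only, the three ways of combining an MDID with an address named in the claim above (at the MSB position, at the LSB position, or by replacing address bits) might be modeled as follows. The field widths, function names, and the choice of which bits get replaced are assumptions made for this sketch; the claim does not fix any of them.

```python
# Hypothetical sketch of three MDID/address combination schemes.
# ADDR_BITS, MDID_BITS, and all function names are assumptions for illustration.
ADDR_BITS = 32   # assumed width of the data address
MDID_BITS = 4    # assumed width of the metadata identifier


def _check(addr, mdid):
    if not 0 <= addr < (1 << ADDR_BITS):
        raise ValueError("address out of range")
    if not 0 <= mdid < (1 << MDID_BITS):
        raise ValueError("MDID out of range")


def append_msb(addr, mdid):
    """Place the MDID above the most significant address bit."""
    _check(addr, mdid)
    return (mdid << ADDR_BITS) | addr


def append_lsb(addr, mdid):
    """Place the MDID below the least significant address bit."""
    _check(addr, mdid)
    return (addr << MDID_BITS) | mdid


def replace_bits(addr, mdid, shift=ADDR_BITS - MDID_BITS):
    """Overwrite MDID_BITS address bits, starting at bit `shift`, with the MDID."""
    _check(addr, mdid)
    mask = ((1 << MDID_BITS) - 1) << shift
    return (addr & ~mask) | (mdid << shift)


data_addr = 0x1000
msb_form = append_msb(data_addr, 0x3)
lsb_form = append_lsb(data_addr, 0x3)
replaced_form = replace_bits(data_addr, 0x3)
```

Under any of the three schemes, two subsystems with different MDIDs obtain different metadata addresses for the same data address, which is what keeps their metadata apart in this model.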

36. The machine-readable medium of claim 34, wherein the program code, when executed by the machine, further causes the machine to determine the MDID from a control register of the processing element, the control register indicating that the software subsystem is currently active on the processing element.

37. A system comprising:
a memory to hold program code including a metadata access instruction that references a data memory address associated with a data item; and
a processor coupled to the memory, the processor including: a processing element, of a plurality of processing elements, to be associated with execution of the metadata access instruction; fetch logic to fetch the metadata access instruction from the memory; decoding logic to decode the metadata access instruction into at least a metadata access operation; a control register to hold a metadata identifier (MDID) associated with an active context on the processing element; a data cache memory including a data entry to hold the data item; and an execution unit to execute the metadata access operation,
wherein the execution unit includes abstract address translation logic in the processor to translate the data memory address into a metadata memory address based on the MDID held in the control register, and cache control logic, coupled to the data cache memory, to perform the metadata access operation on a separate entry in the data cache memory based on the metadata memory address.

38. The system of claim 37, wherein the metadata access instruction is selected from a group of instructions consisting of a metadata load instruction to load the metadata, a metadata store instruction to store the metadata, and a metadata clear instruction to reset the metadata.

39. The system of claim 37, wherein the active context is selected from a group consisting of a transaction runtime subsystem, a garbage collection runtime subsystem, a memory protection subsystem, a software translation subsystem, an outer transaction of a nested group of transactions, and an inner transaction of a nested group of transactions.

40. The system of claim 37, wherein the abstract address translation logic in the processor to translate the data memory address into the metadata memory address is further based on a processing element identifier (PEID) of the processing element, and wherein the abstract address translation logic to translate the data memory address into the metadata memory address based on the MDID held in the control register and the PEID comprises the abstract address translation logic to combine the data memory address with the MDID and the PEID based on a combination algorithm selected from a group consisting of:
an algorithm to append the MDID and the PEID to the data memory address to form the metadata memory address;
an algorithm to translate the data memory address into a translated data address using normal data translation tables and to append the PEID and the MDID to the translated data address to form the metadata memory address; and
an algorithm to translate the data memory address into a translated metadata address using abstract translation tables separate from the normal data translation tables and to append the PEID and the MDID to the translated metadata address to form the metadata memory address.

41. A processor comprising:
an execution module to execute a metadata load operation that references an address; and
a force module to provide, in response to the metadata load operation, a metadata value associated with the address when the processor is operating in a first mode and a fixed value when the processor is operating in a second mode.

42. The processor of claim 41, wherein the first mode includes a strong atomicity mode and the second mode includes a weak atomicity mode.

43. The processor of claim 42, further comprising a first register to hold the fixed value.

44. The processor of claim 43, further comprising a second register to hold a mode value, wherein the mode value is to represent a first value to indicate that the processor is operating in the strong atomicity mode and a second value to indicate that the processor is operating in the weak atomicity mode.

45. The processor of claim 44, wherein the first register and the second register are the same metadata control register.

46. The processor of claim 44, wherein the force module to provide the metadata value associated with the address when the processor is operating in the strong atomicity mode and the fixed value when the processor is operating in the weak atomicity mode comprises the force module to:
load the metadata value into a destination register specified by the metadata load operation in response to the mode value held in the second register representing the first value, which indicates the strong atomicity mode; and
load the fixed value from the first register into the destination register in response to the mode value held in the second register representing the second value, which indicates the weak atomicity mode.

47. A method comprising:
encountering a metadata access operation that references an address;
determining a mode of processor execution;
in response to determining that the mode of processor execution is a first execution mode, providing a metadata value associated with the address for the metadata access operation; and
in response to determining that the mode of processor execution is a second execution mode, providing a fixed value from a register for the metadata access operation.

48. The method of claim 47, wherein determining the mode of processor execution includes reading a mode flag from a first control register, the mode flag holding a first value to indicate that the mode of processor execution is the first execution mode and a second value to indicate that the mode of processor execution is the second execution mode.

49. The method of claim 47, wherein providing the metadata value associated with the address for the metadata access operation includes loading the metadata value from a memory location associated with the address into a destination register referenced by the metadata access operation.

50. The method of claim 49, wherein providing the fixed value from the register for the metadata access operation includes loading the fixed value from the register into the destination register.
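A minimal software sketch of the mode-dependent metadata load recited in the method claims above follows. It is a model under stated assumptions, not the claimed hardware: the mode encodings, the class and attribute names, and the default of 0 for an address with no stored metadata are all hypothetical.

```python
# Hypothetical model of a metadata load that is forced to a fixed value in one mode.
# Mode encodings, names, and the default metadata value of 0 are assumptions.
FIRST_MODE = 0    # metadata value is read from the location tied to the address
SECOND_MODE = 1   # fixed value from a register is returned instead


class Processor:
    def __init__(self, mode, fixed_value):
        self.mode_flag = mode            # stands in for the mode flag in a control register
        self.fixed_register = fixed_value
        self.metadata = {}               # address -> metadata value
        self.registers = {}              # destination registers by name

    def metadata_load(self, dest, address):
        """Load into `dest` either the real metadata or the fixed value, by mode."""
        if self.mode_flag == FIRST_MODE:
            value = self.metadata.get(address, 0)
        else:
            value = self.fixed_register
        self.registers[dest] = value
        return value


cpu = Processor(mode=FIRST_MODE, fixed_value=0xFF)
cpu.metadata[0x40] = 1
first_mode_result = cpu.metadata_load("r1", 0x40)

cpu.mode_flag = SECOND_MODE
second_mode_result = cpu.metadata_load("r1", 0x40)
```

The same load instruction therefore yields different results purely as a function of the mode flag, without the software that issues it changing.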

51. A system comprising:
a memory to hold a metadata load operation that references an address and a destination register; and
a processor coupled to the memory, the processor including: execution logic to execute the metadata load operation; a metadata register to hold a forced value; a cache memory to hold a metadata value associated with the address; and force logic to, in response to the execution logic executing the metadata load operation, provide the metadata value to the destination register when the processor is operating in a first mode and provide the forced value from the metadata register to the destination register when the processor is operating in a second mode.

52. The system of claim 51, wherein the force logic is further to determine whether the processor is operating in the first mode or the second mode.

53. The system of claim 52, wherein the first mode includes a strong atomicity mode and the second mode includes a weak atomicity mode.

54. The system of claim 52, wherein the metadata register is further to hold a mode value, the mode value representing a first value when the processor is operating in the first mode and a second value when the processor is operating in the second mode, and wherein the force logic to determine whether the processor is operating in the first mode or the second mode comprises the force logic to interpret the mode value from the metadata register.

55. The system of claim 52, further comprising a control register to hold a mode value, the mode value representing a first value when the processor is operating in the first mode and a second value when the processor is operating in the second mode, wherein the force logic to determine whether the processor is operating in the first mode or the second mode comprises the force logic to interpret the mode value from the control register.

56. The system of claim 51, wherein the memory is selected from a group consisting of dynamic random access memory (DRAM), static random access memory (SRAM), and nonvolatile memory.

57. An apparatus comprising:
a data cache array to hold a cache entry; and
cache control logic, coupled to the data cache array, to transition the cache entry from an unmonitored state to a buffered coherency and read monitored state upon a buffered update to the cache entry, and subsequently to transition the cache entry to a buffered coherency and write monitored state before transitioning the cache entry to a modified state to commit the buffered update.

58. The apparatus of claim 57, wherein the buffered update to the cache entry includes an update selected from a group consisting of a transactional memory access to a data address for a data item to be held in the cache entry, a metadata access to a data address associated with metadata to be held in the cache entry, and a local update to the cache entry.

59. The apparatus of claim 57, wherein the cache control logic to transition the cache entry from the unmonitored state to the buffered coherency and read monitored state comprises the cache control logic to update coherency bits associated with the cache entry to a buffered value to indicate the buffered coherency state, and to update a read monitor attribute bit associated with the cache entry to a read monitored value to indicate the read monitored state.

60. The apparatus of claim 59, wherein the cache control logic to transition the cache entry to the buffered coherency and write monitored state before transitioning the cache entry to the modified state to commit the buffered update comprises the cache control logic to maintain the coherency bits associated with the cache entry at the buffered value to indicate the buffered coherency state, and to update a write monitor attribute bit associated with the cache entry to a write monitored value to indicate the write monitored state.

61. The apparatus of claim 60, wherein the cache control logic to transition the cache entry to the modified state comprises the cache control logic to update the coherency bits associated with the cache entry to a modified value to indicate the modified coherency state.

62. The apparatus of claim 57, further comprising execution logic to execute the buffered update and subsequently a commit operation, wherein the cache control logic is to transition the cache entry to the buffered coherency and write monitored state, before transitioning the cache entry to the modified state to commit the buffered update, in response to the execution logic executing the commit operation.

63. A method comprising:
encountering a buffered update to a block of cache memory;
applying read monitoring to the block upon encountering the buffered update to the block of cache memory; and
subsequently applying write monitoring to the block before committing the block.

64. The method of claim 63, wherein the buffered update to the block of cache memory includes a transactional write to the block of cache memory.

65. The method of claim 63, further comprising performing the buffered update to the block of cache memory concurrently with applying read monitoring, wherein the block is held in a buffered coherency state after the buffered update is performed.

66. The method of claim 63, further comprising performing the buffered update to the block of cache memory after applying read monitoring, wherein the block is held in a buffered coherency state after the buffered update is performed.

67. The method of claim 63, wherein applying read monitoring to the block upon encountering the buffered update to the block of cache memory comprises:
generating a read request for the block to processing elements outside a cache domain of the cache memory; and
updating a read monitor attribute associated with the block of the cache memory to a read monitored value, to apply read monitoring to the block, in response to no conflict being detected from the processing elements outside the cache domain in response to the read request for the block.

68. The method of claim 67, wherein applying write monitoring to the block before committing the block comprises:
generating a read-for-ownership request for the block to the processing elements outside the cache domain of the cache memory; and
updating a write monitor attribute associated with the block of the cache memory to a write monitored value, to apply write monitoring to the block, in response to no conflict being detected from the processing elements outside the cache domain in response to the read-for-ownership request for the block.

69. The method of claim 68, wherein committing the block includes transitioning a cache coherency state of the block from a buffered coherency state to a modified coherency state.
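The ordering recited in the method claims above (read monitoring on the buffered update, write monitoring only just before commit, then the transition to the modified state) can be sketched as a small state model. The class, the state strings, and the boolean conflict flags that stand in for snoop responses are assumptions made for illustration.

```python
# Hypothetical model of one cache block moving through the claimed sequence.
# State names and the conflict flags standing in for snoop responses are assumptions.
class Block:
    def __init__(self):
        self.coherency = "invalid"
        self.read_monitored = False
        self.write_monitored = False
        self.history = []

    def _record(self):
        self.history.append((self.coherency, self.read_monitored, self.write_monitored))


def buffered_update(block, read_conflict=False):
    """Issue a read request; if no conflict, apply read monitoring and buffer the update."""
    if read_conflict:
        return False
    block.read_monitored = True
    block.coherency = "buffered"
    block._record()
    return True


def commit(block, ownership_conflict=False):
    """Issue a read-for-ownership request; if no conflict, write monitor, then commit."""
    if block.coherency != "buffered" or not block.read_monitored:
        return False
    if ownership_conflict:
        return False
    block.write_monitored = True
    block._record()                 # buffered and write monitored
    block.coherency = "modified"
    block._record()                 # committed
    return True


block = Block()
updated = buffered_update(block)
committed = commit(block)
```

Deferring the ownership request to commit time means that, in this model, a block that is updated but never committed only ever requires read access.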

70. A machine-readable medium holding program code that, when executed by a machine, causes the machine to perform operations comprising:
applying read monitoring to a block of cache memory upon a buffered write to the block;
performing the buffered write to the block; and
applying write monitoring to the block after applying the read monitoring and before committing the block.

71. The machine-readable medium of claim 70, wherein applying read monitoring to the block upon the buffered write to the block of cache memory comprises:
generating a read request for the block to processing elements outside a cache domain of the cache memory; and
updating a read monitor attribute associated with the block of the cache memory to a read monitored value, to apply read monitoring to the block, in response to no conflict being detected from the processing elements outside the cache domain in response to the read request for the block.

72. The machine-readable medium of claim 71,
Applying write monitoring to the block after applying the read monitoring and before committing the block includes:
Generating a read-for-ownership request for the block to the processing elements outside the cache domain of the cache memory; and
In response to no conflict being detected from the processing elements outside the cache domain in response to the read-for-ownership request for the block, updating a write monitor attribute associated with the block of the cache memory to a write-monitored value to apply write monitoring to the block.
Machine-readable medium.

71. The machine-readable medium of claim 70,
Applying write monitoring to the block after applying the read monitoring and before committing the block is performed in response to encountering a commit operation.
Machine-readable medium.

71. The machine-readable medium of claim 70,
Committing the block includes transitioning the cache coherency state of the block to a modified coherency state.
Machine-readable medium.

System memory to hold a transactional write that references a memory address and a commit operation; and
A processor, coupled to the system memory, including a cache memory to: generate a read request for a cache line associated with the memory address upon receiving the transactional write; transition the cache line to a buffered and read-monitored state in response to no conflict being detected based on the read request; generate a read-for-ownership request upon receiving the commit operation; transition the cache line to a buffered and write-monitored state in response to no conflict being detected based on the read-for-ownership request; and transition the cache line to a modified state in response to the cache line transitioning to the buffered and write-monitored state.
System.

78. The system of claim 75,
The cache memory to transition the cache line to the buffered and read-monitored state includes a cache memory to update coherency bits associated with the cache line to a buffered value to indicate the buffered portion of the buffered and read-monitored state, and to update a read monitor attribute bit associated with the cache line to a read-monitored value to indicate the read-monitored portion of the buffered and read-monitored state.
System.

80. The system of claim 76,
The cache memory to transition the cache line to the buffered and write-monitored state includes a cache memory to maintain the coherency bits associated with the cache line at the buffered value to indicate the buffered portion of the buffered and write-monitored state, and to update a write monitor attribute bit associated with the cache line to a write-monitored value to indicate the write-monitored portion of the buffered and write-monitored state.
System.

78. The system of claim 77,
The cache memory to transition the cache line to the modified state in response to the cache line transitioning to the buffered and write-monitored state includes a cache memory to update the coherency bits to a modified value to indicate the modified state.
System.

78. The system of claim 75,
The system memory is selected from the group consisting of dynamic random access memory (DRAM), static random access memory (SRAM), and nonvolatile memory.
System.

Decoding logic to decode a loss instruction to provide a decoded element, the loss instruction comprising an operation code referring to a label and being part of an instruction set recognizable by the decoding logic;
A state storage element comprising a loss field to hold a loss value indicating that a loss event has been detected; and
Jump logic, coupled to the state storage element, to transfer control to the label based on the decoded element and on the loss value indicating that the loss event was detected.
Device.

79. The device of claim 80,
The label includes a jump destination address, and the loss event is selected from the group consisting of a read monitor conflict indicating that a write may have occurred to a read-monitored cache line, a write monitor conflict indicating that an access may have occurred to a write-monitored cache line, and a loss of a buffered cache line.
Device.

79. The device of claim 80,
The state storage element includes a register, and the loss field holding the loss value includes a first bit set when a read monitor conflict is detected, a second bit set when a write monitor conflict is detected, a third bit set when a loss of buffered physical data is detected, and a fourth bit set when a loss of buffered metadata is detected.
Device.
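
As a non-limiting sketch of the loss field and the jump logic recited above, a four-bit field and a jump-on-loss helper might be modeled as follows. The bit positions, the `Loss` and `TransactionStatusRegister` names, and the `jloss` function signature are hypothetical; the claims recite only a first through fourth bit and a transfer of control to a label.

```python
from enum import IntFlag


class Loss(IntFlag):
    # Bit positions are assumptions; the claim says only "first" .. "fourth" bit.
    READ_MONITOR = 1 << 0       # read monitor conflict detected
    WRITE_MONITOR = 1 << 1      # write monitor conflict detected
    BUFFERED_DATA = 1 << 2      # loss of buffered physical data detected
    BUFFERED_METADATA = 1 << 3  # loss of buffered metadata detected


class TransactionStatusRegister:
    def __init__(self):
        self.loss_field = Loss(0)

    def record(self, event):
        # Hardware would set the bit when the event is detected.
        self.loss_field |= event


def jloss(tsr, mask, label, fallthrough):
    """Return the next program location: the label if a selected loss bit is
    set in the status register, otherwise the fall-through location."""
    return label if tsr.loss_field & mask else fallthrough


tsr = TransactionStatusRegister()
no_loss_target = jloss(tsr, Loss.READ_MONITOR, label=0x400, fallthrough=0x104)
tsr.record(Loss.WRITE_MONITOR)
other_type_target = jloss(tsr, Loss.READ_MONITOR, label=0x400, fallthrough=0x104)
matching_target = jloss(tsr, Loss.WRITE_MONITOR, label=0x400, fallthrough=0x104)
```

The `mask` argument plays the role of the loss event type carried in the operation code, so a read monitor loss instruction does not jump when only a write monitor conflict has been recorded.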

79. The device of claim 80,
The loss instruction includes a read monitor loss instruction whose operation code specifies a read monitor loss event type, and the jump logic to transfer control to the label based on the decoded element and the loss value indicating that the loss event was detected includes jump logic to jump execution to the label in response to the loss field holding a loss value indicating that the loss event that occurred was of the read monitor loss event type specified by the operation code of the read monitor loss instruction.
Device.

79. The device of claim 80,
The loss instruction includes a write monitor loss instruction whose operation code specifies a write monitor loss event type, and the jump logic to transfer control to the label based on the decoded element and the loss value indicating that the loss event was detected includes jump logic to jump execution to the label in response to the loss field holding a loss value indicating that the loss event that occurred was of the write monitor loss event type specified by the operation code of the write monitor loss instruction.
Device.

79. The device of claim 80,
The loss instruction includes a buffered loss instruction whose operation code specifies a buffered loss event type, and the jump logic to transfer control to the label based on the decoded element and the loss value indicating that the loss event was detected includes jump logic to jump execution to the label in response to the loss field holding a loss value indicating that the loss event that occurred was of the buffered loss event type specified by the operation code of the buffered loss instruction.
Device.

Holding program code that, when executed by a machine, causes the machine, in response to a loss instruction, to perform the operations of:
Determining the status of a transaction specified by the loss instruction and held in a transaction status register residing in the machine; and
Vectoring execution to a label specified by the loss instruction in response to the status of the transaction indicating that a loss event associated with the loss instruction has been detected.
Machine-readable medium.

88. The machine-readable medium of claim 86,
The label includes a jump destination address, and the loss event is selected from the group consisting of a read monitor conflict indicating that a write may have occurred to a read-monitored cache line, a write monitor conflict indicating that an access may have occurred to a write-monitored cache line, and a loss of a buffered cache line.
Machine-readable medium.

88. The machine-readable medium of claim 86,
The loss instruction includes a read monitor jump loss (JLOSS) instruction specifying that the loss event is a read monitor conflict; the operation of determining the status of the transaction held in the transaction status register includes an operation of determining the state of a read monitor conflict bit held in the transaction status register; and the operation of vectoring execution to the label specified by the loss instruction in response to the status of the transaction indicating that a loss event associated with the loss instruction has been detected includes an operation of vectoring execution to the label specified by the loss instruction in response to the state of the read monitor conflict bit held in the transaction status register indicating that a read monitor conflict has been detected.
Machine-readable medium.

88. The machine-readable medium of claim 86,
The loss instruction includes a write monitor jump loss (JLOSS) instruction specifying that the loss event is a write monitor conflict; the operation of determining the status of the transaction held in the transaction status register includes an operation of determining the state of a write monitor conflict bit held in the transaction status register; and the operation of vectoring execution to the label specified by the loss instruction in response to the status of the transaction indicating that a loss event associated with the loss instruction has been detected includes an operation of vectoring execution to the label specified by the loss instruction in response to the state of the write monitor conflict bit held in the transaction status register indicating that a write monitor conflict has been detected.
Machine-readable medium.

88. The machine-readable medium of claim 86,
The loss instruction includes a buffered monitor jump loss (JLOSS) instruction specifying that the loss event is a buffered monitor conflict; the operation of determining the status of the transaction held in the transaction status register includes an operation of determining the state of a buffered monitor conflict bit held in the transaction status register; and the operation of vectoring execution to the label specified by the loss instruction in response to the status of the transaction indicating that a loss event associated with the loss instruction has been detected includes an operation of vectoring execution to the label specified by the loss instruction in response to the state of the buffered monitor conflict bit held in the transaction status register indicating that a buffered monitor conflict has been detected.
Machine-readable medium.

Encountering a loss instruction at a processor;
Determining, in response to encountering the loss instruction, whether a loss event associated with the loss instruction has been detected by the processor; and
Branching to a label referenced by the loss instruction in response to encountering the loss instruction and determining that the loss event associated with the loss instruction has been detected by the processor.
Method.

92. The method of claim 91,
The label includes a jump address.
Method.

92. The method of claim 91,
The loss instruction includes a read monitor loss instruction, and the loss event associated with the read monitor loss instruction includes a write to a read-monitored cache line.
Method.

92. The method of claim 91,
The loss instruction includes a write monitor loss instruction, and the loss event associated with the write monitor loss instruction includes an access to a write-monitored cache line.
Method.

92. The method of claim 91,
The loss instruction includes a buffered loss instruction, and the loss event associated with the buffered loss instruction includes eviction of a buffered cache line.
Method.

95. The method of claim 95,
Determining whether eviction of a buffered cache line has been detected by the processor includes checking a buffered loss status bit in a transaction status register and, in response to the buffered loss status bit being set to a loss value, determining that eviction of the buffered cache line has been detected.
Method.
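
For illustration, the method above (encounter a loss instruction, check whether the associated loss event was detected, and branch to the referenced label if so) can be mimicked by a toy interpreter. The instruction names, the tuple encoding of the program, and the `buffered_loss` status key are invented for this sketch and do not correspond to any instruction encoding in the disclosure.

```python
def run(program, labels, status_bits):
    """Execute a list of (op, *args) tuples and return the ops executed."""
    trace = []
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        trace.append(op)
        if op == "jloss_buffered":
            (label,) = args
            # Check the buffered loss status bit; branch only if it is set.
            if status_bits.get("buffered_loss", False):
                pc = labels[label]
                continue
        elif op == "halt":
            break
        pc += 1
    return trace


program = [
    ("buffered_store",),          # 0: transactional work
    ("jloss_buffered", "abort"),  # 1: poll for eviction of a buffered line
    ("commit",),                  # 2: reached only when no loss was detected
    ("halt",),                    # 3
    ("abort_handler",),           # 4: target of label "abort"
    ("halt",),                    # 5
]
labels = {"abort": 4}

clean_trace = run(program, labels, {"buffered_loss": False})
lossy_trace = run(program, labels, {"buffered_loss": True})
```

When the status bit is clear the loss instruction falls through to the commit; when it is set, control moves to the handler at the label instead.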

Decoding logic to decode a commit instruction for a transaction to provide a decoded element, wherein the commit instruction specifies a commit condition and includes an operation code (opcode) that is part of an instruction set recognizable by the decoding logic; and
Commit logic to determine, in response to the decoded element, whether the commit condition specified by the commit instruction is satisfied for the transaction.
Device.

98. The device of claim 97,
The commit condition includes any specified combination of no loss of read-monitored data, no loss of write-monitored data, no loss of buffered data, and no loss of metadata, and the commit logic to determine that the commit condition is satisfied includes commit logic to determine that the specified combination of no loss of read-monitored data, no loss of write-monitored data, no loss of buffered data, and no loss of metadata has occurred.
Device.

98. The device of claim 97,
The commit instruction specifying the commit condition includes a commit instruction holding four bits: a first bit that, when set, indicates that any loss of read-monitored data is a condition on the commit; a second bit that, when set, indicates that any loss of write-monitored data is a condition on the commit; a third bit that, when set, indicates that any loss of buffered data is a condition on the commit; and a fourth bit that, when set, indicates that any loss of metadata is a condition on the commit.
Device.

The device of claim 99,
The four bits are included in the operation code.
Device.

The device of claim 99,
The commit logic to determine whether the commit condition specified by the commit instruction is satisfied for the transaction includes commit logic to check, for each of the four bits that is set in the commit instruction, the corresponding status bit in a transaction status register, and to determine that the commit condition is satisfied if none of the corresponding status bits in the transaction status register is set to indicate an associated loss.
Device.
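
The check recited above, in which each condition bit set in the commit instruction selects a status bit and the commit condition is satisfied only if none of the selected status bits indicates a loss, reduces to a single mask test. The following is a minimal sketch; the bit assignments and the function name are assumptions, and the status register is modeled as a plain integer.

```python
# Hypothetical bit assignments; the claims only say "first" through "fourth" bit.
READ_MONITORED = 0b0001
WRITE_MONITORED = 0b0010
BUFFERED = 0b0100
METADATA = 0b1000


def commit_condition_satisfied(condition_bits, status_bits):
    """True when no loss status bit selected by the commit instruction is set."""
    return (condition_bits & status_bits) == 0


# Suppose only a loss of buffered data was recorded during the transaction.
status = BUFFERED
ignores_buffered_loss = commit_condition_satisfied(
    READ_MONITORED | WRITE_MONITORED, status
)
requires_buffered = commit_condition_satisfied(WRITE_MONITORED | BUFFERED, status)
```

Because the instruction selects which losses matter, the same recorded loss can allow one commit variant to proceed while causing another to fail.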

98. The device of claim 97,
The commit instruction further specifies a clear control indicating which combination of read-monitored data, write-monitored data, buffered data, and metadata is to be cleared at commit, and the commit logic is to clear the specified combination of read-monitored data, write-monitored data, buffered data, and metadata after committing the transaction in response to determining that the commit condition specified by the commit instruction is satisfied for the transaction.
Device.

A machine-readable medium carrying program code,
The program code, when executed by a machine, causes the machine to perform the operations of: encountering, from the program code, a commit instruction specifying at least one commit failure condition for a transaction;
Determining whether the at least one commit failure condition specified by the commit instruction was detected during the pendency of the transaction; and
In response to determining that the at least one commit failure condition specified by the commit instruction was detected during the pendency of the transaction, providing a value indicating that the at least one commit failure condition specified by the commit instruction was detected during the pendency of the transaction.
Machine-readable medium.

103. The machine-readable medium of claim 103,
The at least one commit failure condition is selected from the group consisting of loss of read-monitored data, loss of write-monitored data, loss of buffered data, and loss of metadata.
Machine-readable medium.

103. The machine-readable medium of claim 103,
The operation of providing a value indicating that the at least one commit failure condition specified by the commit instruction was detected during the pendency of the transaction includes an operation of loading the value into a destination register to indicate that the at least one commit failure condition specified by the commit instruction was detected during the pendency of the transaction.
Machine-readable medium.

103. The machine-readable medium of claim 103,
The operation of determining whether the at least one commit failure condition specified by the commit instruction was detected during the pendency of the transaction includes: an operation of checking a status bit, in a transaction status register, associated with the at least one commit failure condition; an operation of determining that the at least one commit failure condition specified by the commit instruction was detected during the pendency of the transaction in response to the status bit being set to indicate that the at least one commit failure condition was detected during the pendency of the transaction; and an operation of determining that the at least one commit failure condition specified by the commit instruction was not detected during the pendency of the transaction in response to the status bit being reset to indicate that the at least one commit failure condition was not detected during the pendency of the transaction.
Machine-readable medium.

107. The machine-readable medium of claim 106,
The operations further include committing the transaction in response to determining that the at least one commit failure condition specified by the commit instruction was not detected during the pendency of the transaction.
Machine-readable medium.

Encountering, within a transaction, a commit instruction that includes an operation code (opcode) specifying a commit failure condition of the transaction;
Determining that the commit failure condition of the transaction specified in the operation code of the commit instruction was not detected during the pendency of the transaction; and
Committing the transaction in response to determining that the commit failure condition of the transaction specified in the operation code of the commit instruction was not detected during the pendency of the transaction.
Method.

108. The method of claim 108,
The operation code specifying the commit failure condition of the transaction includes: a first bit of the operation code that, when set, specifies that loss of read-monitored data is a commit failure condition; a second bit of the operation code that, when set, specifies that loss of write-monitored data is a commit failure condition; a third bit of the operation code that, when set, specifies that loss of buffered data is a commit failure condition; and a fourth bit of the operation code that, when set, specifies that loss of metadata is a commit failure condition.
Method.

108. The method of claim 109,
Determining that the commit failure condition of the transaction specified in the operation code of the commit instruction was not detected during the pendency of the transaction includes: determining that a read monitor bit of a transaction status register is not set to indicate a loss of read-monitored data when the first bit of the operation code is set; determining that a write monitor bit of the transaction status register is not set to indicate a loss of write-monitored data when the second bit of the operation code is set; determining that a buffered bit of the transaction status register is not set to indicate a loss of buffered data when the third bit of the operation code is set; and determining that a metadata bit of the transaction status register is not set to indicate a loss of metadata when the fourth bit of the operation code is set.
Method.

108. The method of claim 109,
The operation code further specifies a clear control, and the operation code specifying the clear control includes: a fifth bit of the operation code that, when set, specifies that the read-monitored data is to be cleared at commit; a sixth bit of the operation code that, when set, specifies that the write-monitored data is to be cleared at commit; a seventh bit of the operation code that, when set, specifies that the buffered data is to be cleared at commit; and an eighth bit of the operation code that, when set, specifies that the metadata is to be cleared at commit.
Method.

111. The method of claim 111,
Committing the transaction includes clearing the read-monitored data if the fifth bit is set, clearing the write-monitored data if the sixth bit is set, clearing the buffered data if the seventh bit is set, and clearing the metadata if the eighth bit is set.
Method.
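
One possible reading of the eight control bits described above, with four commit failure condition bits followed by four clear control bits, is sketched below. The packing of the bits into a single integer (low nibble for failure conditions, high nibble for clear controls), the `Transaction` model, and all identifiers are assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Order follows the claims: read-monitored, write-monitored, buffered, metadata.
KINDS = ("read_monitor", "write_monitor", "buffered", "metadata")


def decode(control):
    """Split the control bits into (failure conditions, items to clear)."""
    fail_on = {k for i, k in enumerate(KINDS) if (control >> i) & 1}
    clear = {k for i, k in enumerate(KINDS) if (control >> (i + 4)) & 1}
    return fail_on, clear


@dataclass
class Transaction:
    lost: set = field(default_factory=set)                   # losses recorded so far
    live: set = field(default_factory=lambda: set(KINDS))    # state still held


def commit_transaction(txn, control):
    fail_on, clear = decode(control)
    if fail_on & txn.lost:
        return False          # a specified commit failure condition was detected
    txn.live -= clear         # clear only what the clear control selects
    return True


# Fail on read/write monitor loss; clear read monitors and buffered data.
CONTROL = 0b0101_0011

txn_ok = Transaction(lost={"metadata"})
committed = commit_transaction(txn_ok, CONTROL)

txn_bad = Transaction(lost={"read_monitor"})
rejected = commit_transaction(txn_bad, CONTROL)
```

Leaving some items uncleared at commit lets software carry monitors or metadata forward, which is one reason the clear control is expressed separately from the failure conditions.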

A memory holding program code including a commit instruction for a transaction, the commit instruction comprising an operation code specifying commit failure conditions and clear control information for the transaction;
Decoding logic to decode the operation code of the commit instruction; and
A processor including commit logic to determine whether any of the commit failure conditions specified in the operation code was detected during pendency of the transaction and to commit the transaction in response to determining that none of the commit failure conditions was detected during pendency of the transaction,
The commit logic to commit the transaction includes commit logic to clear transaction information based on the clear control information specified in the operation code of the commit instruction.
System.

112. The system of claim 113,
The commit failure condition is based on a combination of loss of read-monitored data, loss of write-monitored data, loss of buffered data, and loss of metadata.
System.

115. The system of claim 114,
The commit failure condition includes one of: loss of write-monitored data; loss of read-monitored data or loss of write-monitored data; loss of write-monitored data or loss of buffered data; loss of write-monitored data or loss of metadata; and loss of write-monitored data, loss of read-monitored data, loss of buffered data, or loss of metadata.
System.

112. The system of claim 113,
The operation code specifying the clear control information includes an operation code specifying which of read monitors, write monitors, buffered coherency, and metadata are to be cleared at commit, and the commit logic to clear transaction information based on the clear control information specified in the operation code of the commit instruction includes commit logic to clear those of the read monitors, write monitors, buffered coherency, and metadata that the operation code specifies to be cleared.
System.

A storage element containing a transaction enable field (TEF) to indicate, when holding an active value, that an associated transaction is active and enabled and, when holding an inactive value, that the associated transaction is suspended; and
Logic to save at least the state of the TEF in a storage structure in response to a ring level transition event and to restore at least the state of the TEF from the storage structure to the storage element in response to a return event.
Device.

118. The device of claim 117, wherein
The ring level transition event includes an event selected from the group consisting of an interrupt, an exception, a system call, a virtual machine enter, and a virtual machine exit.
Device.

118. The device of claim 117, wherein
The return event includes an event selected from the group consisting of an interrupt return (IRET), a system return (SYSRET), a virtual machine (VM) enter, and a virtual machine (VM) exit.
Device.

118. The device of claim 117, wherein
The storage element includes a flag register and the TEF includes a transaction enable flag.
Device.

118. The device of claim 117, wherein
The storage structure includes a stack; the logic to save at least the state of the TEF in the stack includes push logic to push at least the state of the TEF onto the stack; and the logic to restore at least the state of the TEF from the stack to the storage element includes pop logic to pop at least the state of the TEF from the stack to restore the TEF to the storage element.
Device.

Memory to hold code that, when executed, causes a ring level transition event; and
A processor including a register comprising a transaction enable field (TEF) to hold an active value indicating that an associated transaction is active, and stack logic to push a previous state of the register onto a stack and clear the TEF to an inactive value indicating that the associated transaction is suspended in response to the ring level transition event, and to restore the previous state of the register from the stack to the register in response to a return event.
System.

123. The system of claim 122, wherein
The ring level transition event includes an event selected from the group consisting of an interrupt, an exception, a system call, a virtual machine enter, and a virtual machine exit.
System.

123. The system of claim 122, wherein
The return event includes an event selected from the group consisting of an interrupt return (IRET), a system return (SYSRET), a virtual machine (VM) enter, and a virtual machine (VM) exit.
System.

123. The system of claim 122, wherein
The register includes a flag register, the TEF includes a transaction enable flag, the active value includes a high logic value of the flag, and the inactive value includes a low logic value of the flag.
System.

Detecting a ring level transition event from a current ring level;
Storing in a storage structure a previous state of a register containing a transaction enable field;
Clearing the transaction enable field to indicate that an associated transaction is suspended;
Detecting an event that returns to the current ring level; and
In response to detecting the event that returns to the current ring level, restoring the previous state of the register from the storage structure.
Method.

126. The method of claim 126,
The storage structure includes a kernel stack; storing the previous state of the register in the kernel stack includes pushing the previous state of the register onto the kernel stack; and restoring the previous state of the register from the kernel stack includes popping the previous state of the register from the kernel stack to restore the previous state to the register.
Method.
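
The save, clear, and restore steps of the method above may be pictured with the following sketch, offered only as an illustration. The position of the transaction enable field within the flag register, the `Core` class, and the use of a Python list as the kernel stack are assumptions; the claims do not fix any of these details.

```python
TEF = 1 << 0  # hypothetical bit position of the transaction enable field


class Core:
    def __init__(self, flags):
        self.flags = flags
        self.kernel_stack = []

    @property
    def transaction_enabled(self):
        return bool(self.flags & TEF)

    def ring_transition(self):
        """Interrupt, exception, system call, or VM transition."""
        self.kernel_stack.append(self.flags)  # push the previous register state
        self.flags &= ~TEF                    # clear TEF: transaction suspended

    def ring_return(self):
        """IRET, SYSRET, or the matching VM transition."""
        self.flags = self.kernel_stack.pop()  # pop to restore the previous state


core = Core(flags=TEF | 0b100)   # transaction active, one unrelated flag set
core.ring_transition()
suspended_flags = core.flags
suspended_enabled = core.transaction_enabled
core.ring_return()
restored_flags = core.flags
```

Only the transaction enable field is cleared on the transition; the other flag in the example survives in the register and the full previous state comes back on return.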

126. The method of claim 126, wherein
The current ring level includes a user ring level.
Method.

127. The method of claim 128,
The ring level transition event includes an event selected from the group consisting of an interrupt, an exception, a system call, and a virtual machine enter.
Method.

129. The method of claim 129,
The event that returns to the current ring level includes an event selected from the group consisting of an interrupt return (IRET), a system return (SYSRET), and a virtual machine (VM) exit.
Method.