KR102514351B1

KR102514351B1 - Techniques for metadata processing

Info

Publication number: KR102514351B1
Application number: KR1020217036219A
Authority: KR
Inventors: André DeHon; Catalin Hritcu; Udit Dhawan
Original assignee: The Charles Stark Draper Laboratory, Inc.; The National Institute for Research in Data Processing and Automation; The Trustees of the University of Pennsylvania
Priority date: 2015-12-17
Filing date: 2016-12-12
Publication date: 2023-03-29
Also published as: KR20210138121A

Abstract

Techniques are described for metadata processing that can be used to encode an arbitrary number of security policies for code running on a processor. Metadata may be added to every word in the system, and a metadata processing unit that operates in parallel with the data flow may be used to enforce an arbitrary set of policies. In one aspect, the metadata may be characterized as unbounded and software programmable, so that it is applicable to a wide range of metadata processing policies. The techniques and policies have a wide range of uses including, for example, safety, security, and synchronization. Additionally, aspects and techniques are described in connection with metadata processing in an embodiment based on the RISC-V architecture.

Description

TECHNIQUES FOR METADATA PROCESSING

CROSS REFERENCES TO RELATED APPLICATIONS

This application claims priority to U.S. Provisional Application No. 62/268,639, SOFTWARE DEFINED METADATA PROCESSING, filed on December 17, 2015, and to U.S. Provisional Application No. 15/168,689, SOFTWARE DEFINED METADATA PROCESSING, filed on May 31, 2016, both of which are incorporated herein by reference in their entirety.

BACKGROUND

This application relates generally to data processing and, more particularly, to a programmable unit for metadata processing.

Today's computer systems are notoriously difficult to secure. For example, conventional processor architectures permit various behaviors, such as buffer overflows and pointer forging, that violate higher-level abstractions. Closing the gap between the programming language and the hardware is left to software, where the cost of enforcing airtight abstractions is often deemed too high.

Some recent efforts have demonstrated the value of propagating metadata during execution to enforce policies that catch safety violations and malicious attacks as they occur. These policies can be enforced in software, but typically with high, undesirable overheads (for example, in performance and/or cost) that discourage their deployment or motivate coarse approximations providing less protection. Hardware support for fixed policies can reduce the overhead to acceptable levels and prevent a large fraction of undesired code violations, such as those performed by malicious code or malware attacks. For example, Intel recently announced hardware for bounds checking and isolation. While these mitigate many of today's attacks, fully securing systems will require more than memory safety and isolation. Attacks evolve rapidly to exploit any remaining form of vulnerability.

Thus, there is a need for a flexible security architecture that can be quickly adapted to this ever-changing landscape. It is desirable that such an architecture support software-defined metadata processing with minimal overhead. It is also desirable that such an architecture be extensible to support and enforce any number and type of policies, without placing a visible, hard bound on the number of bits allocated to metadata. The metadata may be propagated during execution to enforce policies and catch violations of those policies, for example by malicious code or malware attacks.

According to one aspect of the techniques described herein, a method of processing instructions comprises: receiving, for metadata processing, a current instruction having an associated metadata tag, the metadata processing being performed in a metadata processing domain isolated from a code execution domain that includes the current instruction; determining, in the metadata processing domain and in accordance with the metadata tag and the current instruction, whether a rule exists in a rule cache for the current instruction, the rule cache including rules on metadata used by the metadata processing to define allowed operations; and, responsive to determining that no rule exists in the rule cache for the current instruction, performing rule cache miss processing in the metadata processing domain. The rule cache miss processing comprises: determining whether execution of the current instruction is allowed; responsive to determining that the current instruction is allowed to execute in the code execution domain, generating a new rule for the current instruction; writing to a register; and, responsive to writing to the register, inserting the new rule into the rule cache. First metadata used to select the rule for the current instruction may be stored in a first portion of a plurality of control status registers used by the metadata processing. The first portion of the plurality of control status registers may be used to communicate a plurality of metadata tags for the current instruction to the metadata processing domain, and the plurality of metadata tags may be used as data in the metadata processing domain.
The register may be a first control status register of the plurality of control status registers used by the metadata processing, and the first portion of the plurality of control status registers may be used to communicate the plurality of metadata tags from the metadata processing domain to the rule cache. The plurality of metadata tags may be for the current instruction. The new rule may be inserted into the rule cache responsive to writing another metadata tag to the first control status register, and the other metadata tag may be placed on a result of the current instruction, the result being either a destination register or a memory location. The plurality of control status registers may include any one or more of: a bootstrap tag control status register containing an initial metadata tag from which all other generated metadata tags are derived; a default tag control status register specifying a default metadata tag; a public untrusted control status register specifying a public untrusted metadata tag used to tag instructions and data classified as public and untrusted; an opgroup value control status register containing data written to a table that includes information on opgroups and care information for different opcodes; an opgroup address control status register specifying a location in the table to which the data of the opgroup value control status register is written; and a pumpflush control status register, where a write to the pumpflush control status register triggers flushing of the rule cache. The plurality of control status registers may include a tag mode control status register indicating a current mode of the metadata processing. The tag mode control status register may indicate when the metadata processing is disengaged, in which case rules of one or more defined policies are not enforced by the metadata processing. The tag mode control status register may be set to one of a defined set of allowed states denoting the current mode of the metadata processing.
The allowed states may include any of: an off state; a state in which the metadata processing writes a default tag on all results; and a state indicating that the metadata processing is engaged and operational when instructions execute in the code domain at one or more specified privilege levels. The rule cache miss processing may be performed in a first state of the defined set of allowed states, in which the metadata processing is disengaged. The allowed states may include: a first state indicating that the metadata processing is engaged only when instructions execute in the code domain at a user privilege level; a second state indicating that the metadata processing is engaged only when instructions execute in the code domain at the user or supervisor privilege level; a third state indicating that the metadata processing is engaged only when instructions execute in the code domain at the user, supervisor, or hypervisor privilege level; and a fourth state indicating that the metadata processing is engaged when instructions execute in the code domain at the user, supervisor, hypervisor, or machine privilege level. Whether the metadata processing is engaged or disengaged may be determined in accordance with the current tag mode of the tag mode control status register in combination with the current privilege level of the code executing in the code domain. Rules of one or more defined policies may not be enforced when the metadata processing is disengaged, and the rules may be enforced when the metadata processing is engaged. The table may include information mapping opcodes of an instruction set to corresponding opgroups and bit vector information. An opgroup may denote a group of associated opcodes that are treated similarly by the metadata processing domain. The bit vector information may indicate which particular inputs and outputs of the metadata processing domain are used in connection with processing an opcode.
The table may be indexed using a first portion of opcode bits that is smaller than a maximum allowable number of opcode bits, the maximum denoting an upper bound on the number of bits of an opcode in the instruction set. The first portion of the plurality of control status registers may include an extended opcode control status register containing additional opcode bits, if any, for the current instruction. The current instruction may be included in an instruction set having variable-length opcodes, where each opcode of the instruction set may optionally include additional opcode bits. For each opcode mapped using the table, there may be a result bit vector corresponding to that opcode, which indicates what portion, if any, of the additional opcode bits in the extended opcode control status register is used with that opcode for metadata processing. The current instruction may be one of multiple instructions stored in a single word of memory associated with a single metadata tag, so that the single metadata tag is associated with the multiple instructions in the single word. The plurality of control status registers may include a subinstruction control status register indicating which of the multiple instructions stored in the single word is the current instruction. The single metadata tag may be a first pointer to a first memory location that includes a different metadata tag for each of the multiple instructions in the single word. At least a first metadata tag, stored in the first memory location for a first instruction of the multiple instructions, may include a second pointer to a second memory location that includes metadata tag information for the first instruction. The metadata tag information for the first instruction may include a complex structure. The complex structure may include at least one scalar data field and at least one pointer field to a third memory location.
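
The rule cache lookup and miss handling summarized in this aspect can be sketched in software as follows. This is a minimal illustration only, not the claimed hardware: the class and function names, the three-field rule key, and the sample policy are all hypothetical, and the control status register write is modeled as an ordinary method call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

# Hypothetical rule key: (opgroup, instruction tag, operand tag).
RuleKey = Tuple[str, str, str]
# A policy returns the result tag, or None when the instruction is not allowed.
PolicyFn = Callable[[RuleKey], Optional[str]]


class PolicyViolation(Exception):
    """Raised when the policy does not allow the current instruction."""


@dataclass
class MetadataProcessor:
    policy: PolicyFn
    rule_cache: Dict[RuleKey, str] = field(default_factory=dict)
    misses: int = 0

    def check(self, key: RuleKey) -> str:
        """Return the result tag, consulting the rule cache first."""
        if key in self.rule_cache:
            return self.rule_cache[key]
        return self._miss(key)

    def _miss(self, key: RuleKey) -> str:
        """Rule cache miss processing: decide, then install a new rule."""
        self.misses += 1
        result_tag = self.policy(key)
        if result_tag is None:
            raise PolicyViolation(key)
        self._write_result_register(key, result_tag)
        return result_tag

    def _write_result_register(self, key: RuleKey, result_tag: str) -> None:
        # Models "writing to a register": the write installs the rule.
        self.rule_cache[key] = result_tag

    def flush(self) -> None:
        # Models a write to the pumpflush control status register.
        self.rule_cache.clear()


def example_policy(key: RuleKey) -> Optional[str]:
    """Hypothetical policy: only words tagged 'code' may execute."""
    _opgroup, insn_tag, operand_tag = key
    if insn_tag != "code":
        return None
    return operand_tag  # propagate the operand tag to the result
```

In this sketch a denied instruction never produces a cached rule, so only allowed combinations of tags are served from the cache on later lookups.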

According to another aspect of the techniques herein, a non-transitory computer readable medium includes code stored thereon that, when executed, performs a method of processing instructions comprising: receiving, for metadata processing, a current instruction having an associated metadata tag, the metadata processing being performed in a metadata processing domain isolated from a code execution domain that includes the current instruction; determining, in the metadata processing domain and in accordance with the metadata tag and the current instruction, whether a rule exists in a rule cache for the current instruction, the rule cache including rules on metadata used by the metadata processing to define allowed operations; and, responsive to determining that no rule exists in the rule cache for the current instruction, performing rule cache miss processing in the metadata processing domain. The rule cache miss processing comprises: determining whether execution of the current instruction is allowed; responsive to determining that the current instruction is allowed to execute in the code execution domain, generating a new rule for the current instruction; writing to a register; and, responsive to writing to the register, inserting the new rule into the rule cache.

According to another aspect of the techniques herein, a system comprises: a processor; and a memory including code stored thereon that, when executed by the processor, performs a method of processing instructions comprising: receiving, for metadata processing, a current instruction having an associated metadata tag, the metadata processing being performed in a metadata processing domain isolated from a code execution domain that includes the current instruction; determining, in the metadata processing domain and in accordance with the metadata tag and the current instruction, whether a rule exists in a rule cache for the current instruction, the rule cache including rules on metadata used by the metadata processing to define allowed operations; and, responsive to determining that no rule exists in the rule cache for the current instruction, performing rule cache miss processing in the metadata processing domain. The rule cache miss processing comprises: determining whether execution of the current instruction is allowed; responsive to determining that the current instruction is allowed to execute in the code execution domain, generating a new rule for the current instruction; writing to a register; and, responsive to writing to the register, inserting the new rule into the rule cache. The processor may be a pipelined processor of a reduced instruction set computing architecture.

According to another aspect of the techniques herein, a method of processing instructions comprises: receiving a current instruction for metadata processing performed in a metadata processing domain isolated from a code execution domain that includes the current instruction; and determining, by the metadata processing domain in connection with metadata for the current instruction, whether to allow execution of the current instruction in accordance with a set of one or more policies. The current instruction accesses a first location of a stack frame of a first routine, and the current instruction and the location of the stack frame have associated metadata tags. The set of one or more policies includes a stack protection policy that provides stack protection and prevents improper access to stack storage locations, including storage locations of the stack frame of the first routine. The stack protection policy may include a first rule used in metadata processing of the current instruction accessing the first location of the stack frame of the first routine.
The first rule may allow execution of the current instruction if the first location has metadata indicating that the first location is a stack location of the first routine and the current instruction is included in the first routine. The current instruction may be used by a particular invocation instance of the first routine, and the stack protection policy may include a first rule used in metadata processing of the current instruction. The first rule may allow execution of the current instruction if the current instruction is included in the first routine and is also used by the particular invocation instance of the first routine. The first rule may include examining metadata that is associated with the program counter and denotes any of an authority and a capability, to determine whether to allow execution of the current instruction by the particular invocation instance of the first routine. The stack protection policy may provide any of: object-level protection, in which different objects within a single stack frame have different color metadata tags; and hierarchical object protection for a hierarchical object that includes multiple sub-objects, in which each of the multiple sub-objects in a single stack frame has different metadata. The method may include creating a new stack frame for a new routine invocation, and tagging or coloring memory locations of the new stack frame in accordance with strict object initialization or lazy object coloring. Strict object initialization includes performing initialization processing that executes one or more instructions triggering metadata processing of one or more rules that initially tag each memory location of the new stack frame before information is stored in the new stack frame. Lazy object coloring tags a particular memory location of the new stack frame in connection with metadata processing of a rule triggered responsive to an instruction that stores data to that particular memory location.
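
The difference between strict object initialization and lazy object coloring can be sketched with a small tag map. This is a hedged illustration under stated assumptions: the `StackTags` class, the `UNCOLORED` marker, and the idea that each access carries a single color string identifying the routine invocation are all hypothetical simplifications, not the tag encoding of the described embodiments.

```python
from typing import Dict

UNCOLORED = "uncolored"  # hypothetical marker for a frame slot not yet written


class StackTags:
    """Toy model of per-location metadata tags for stack frames."""

    def __init__(self) -> None:
        self.tags: Dict[int, str] = {}

    def new_frame_strict(self, base: int, size: int, color: str) -> None:
        """Strict initialization: tag every slot before any data is stored."""
        for addr in range(base, base + size):
            self.tags[addr] = color

    def new_frame_lazy(self, base: int, size: int) -> None:
        """Lazy coloring: slots stay uncolored until their first store."""
        for addr in range(base, base + size):
            self.tags[addr] = UNCOLORED

    def store(self, addr: int, color: str) -> None:
        """Store rule: color on first write, otherwise colors must match."""
        current = self.tags.get(addr)
        if current is None:
            raise PermissionError("not a stack frame location")
        if current == UNCOLORED:
            self.tags[addr] = color
        elif current != color:
            raise PermissionError("color mismatch on store")

    def load(self, addr: int, color: str) -> None:
        """Load rule: the slot must already carry the accessor's color."""
        if self.tags.get(addr) != color:
            raise PermissionError("color mismatch on load")
```

The strict variant pays the tagging cost up front for the whole frame, while the lazy variant defers it to the first store to each slot.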
The one or more policies may include a set of rules for enforcement of a dynamic control flow integrity policy that ensures a return to a particular return location is valid only when made following a particular call. A first location may include a call instruction that transfers control to a called routine including a return instruction, and a second location may include a second instruction, the second location denoting a return target location to which control is transferred as a result of executing the return instruction of the called routine. The method may include: tagging the first location, which includes the call instruction, with a first code tag; tagging the second location, which denotes the return target location, with a second code tag; performing metadata processing of a first rule of the set for the call instruction tagged with the first code tag, where the metadata processing of the first rule includes tagging a return address register with a valid return address tag denoting that the return address register includes a valid return address for the second location, so that execution of the call instruction updates the tag on the return address register to denote a capability to return to the second location; performing metadata processing of a second rule of the set for the return instruction of the called routine, where the second rule allows execution of the return instruction, which transfers control to the return address stored in the return address register, if the return address register is tagged with the valid return address capability tag, and the second rule propagates the valid return address capability tag of the return address register to the program counter tag used for the next instruction following runtime execution of the return instruction; and performing metadata processing of a third rule of the set for the second instruction following runtime execution of the return instruction, where the third rule allows execution of the second instruction if the second instruction has the same code tag as the second code tag and the program counter tag is the valid return address capability tag, and the third rule clears the program counter tag used for the next instruction following runtime execution of the second instruction.
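
The three rules of the dynamic control flow integrity policy can be traced with a small state machine over the program counter tag and the return address register tag. The sketch below is illustrative only. The tag names and the `step` function are hypothetical, and two behaviors are assumptions added for the sketch rather than statements from the text: the capability is cleared from the return address register once a return uses it, and a return that lands on a location without the second code tag is rejected.

```python
from dataclasses import dataclass
from typing import Optional

CALL_TAG = "call-site"         # hypothetical name for the first code tag
TARGET_TAG = "return-target"   # hypothetical name for the second code tag
RET_CAP = "valid-return-addr"  # hypothetical name for the capability tag


class CfiViolation(Exception):
    """Raised when no rule allows the instruction."""


@dataclass
class TagState:
    pc_tag: Optional[str] = None  # tag on the program counter
    ra_tag: Optional[str] = None  # tag on the return address register


def step(state: TagState, opcode: str, code_tag: Optional[str]) -> None:
    """Apply the rules to one instruction, updating tags or raising."""
    # Rule 3 check: a return target runs only right after a valid return.
    if code_tag == TARGET_TAG and state.pc_tag != RET_CAP:
        raise CfiViolation("return target reached without a valid return")
    # Assumption: a valid return must land on a tagged return target.
    if code_tag != TARGET_TAG and state.pc_tag == RET_CAP:
        raise CfiViolation("return landed outside a return target")
    # Rule 2 check: a return needs the capability on the return address.
    if opcode == "ret" and state.ra_tag != RET_CAP:
        raise CfiViolation("return without a valid return address tag")

    next_pc_tag = None  # rule 3 effect: the PC tag is cleared by default
    if opcode == "call":
        # Rule 1: a tagged call grants the capability to return.
        state.ra_tag = RET_CAP if code_tag == CALL_TAG else None
    elif opcode == "ret":
        # Rule 2: propagate the capability to the next PC tag.
        state.ra_tag = None  # assumption: the capability is used once
        next_pc_tag = RET_CAP
    state.pc_tag = next_pc_tag
```

A legitimate call, return, and landing sequence leaves both tags cleared, whereas a return whose return address register lacks the capability tag is stopped by the second rule.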

According to another aspect of the techniques herein, a method of processing instructions comprises: receiving a current instruction for metadata processing performed in a metadata processing domain isolated from a code execution domain that includes the current instruction; and determining, by the metadata processing domain in connection with metadata for the current instruction, whether to allow execution of the current instruction in accordance with a set of one or more policies, where the one or more policies include a set of rules that enforce execution of a complete sequence of instructions in a specified order, from a first instruction of the complete sequence to a last instruction of the complete sequence. The method may include mapping a first shared physical page into a first virtual address space of a first process, and mapping the first shared physical page into a second virtual address space of a second process, where the first shared physical page includes a plurality of memory locations, each associated with one of a plurality of global metadata tags used in connection with rule processing in the metadata processing domain. The plurality of global metadata tags may denote a set of metadata tags shared by multiple processes including at least the first process and the second process, and the same policy may be enforced by the metadata processing domain for both the first process and the second process. Enforcement of the same policy by the metadata processing domain may use metadata that allows the first process to perform an operation that is not otherwise allowed by the same policy for the second process. A program counter may have an associated program counter tag, and different values of the associated program counter tag may be used by rules of the same policy to allow the first process to perform an operation not otherwise allowed by the same policy for the second process. The method may further include performing first processing, by an allocation routine of an application, that generates a next color for the application using a current color for the application, where the current color denotes a current state of an application-specific color sequence for the application, the next color denotes a next state of that sequence, and the current color is stored in a first metadata tag on a first atom. The first processing may include executing a first one or more instructions that trigger metadata processing by the metadata processing domain using one or more rules. That metadata processing uses the current color to generate the next color, and updates the current state of the application-specific color sequence by storing the next color in the first metadata tag of the first atom. The first one or more instructions may be included in the allocation routine of the application, and the first atom may be either a register or a memory location. The application-specific color sequence may be an unbounded sequence of different colors available for use by the application. The next color may be stored as a tag value for each of one or more memory locations used by the application, and the one or more memory locations may be allocated by the allocation routine.
The set of rules may include a first rule and a second rule, and the complete command sequence may include a first command and a second command, the second command being executed immediately following the first command. The method comprises performing metadata processing of a first rule for a first instruction, wherein metadata processing of the first rule converts a program counter tag of a program counter used in a next instruction following runtime execution of the first instruction into a special tag. Including the step of setting to a value -; and performing metadata processing of a second rule for the second instruction, wherein the metadata processing of the second rule performs processing of the second rule only when the program counter tag of the program counter for the second instruction is equal to the special tag. and ensuring that the execution of the instruction is permitted.
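The two-rule sequencing mechanism described above can be illustrated with a minimal sketch. The tag names (`seq-first`, `seq-second`, `pc-in-sequence`, `pc-default`) and the function names are hypothetical, chosen only for illustration, and the behavior for instructions outside the sequence (resetting the program counter tag) is an assumption, not something the text specifies.

```python
# Hypothetical tag values; the text only says that a "special tag" is used.
DEFAULT_PC_TAG = "pc-default"
IN_SEQUENCE_TAG = "pc-in-sequence"


class PolicyViolation(Exception):
    """Raised when no rule allows the current instruction."""


def check_instruction(instr_tag, pc_tag):
    """Apply the rules to one instruction and return the next PC tag."""
    if instr_tag == "seq-first":
        # First rule: allow, and mark the PC used for the next instruction.
        return IN_SEQUENCE_TAG
    if instr_tag == "seq-second":
        # Second rule: allow only when the PC carries the special tag,
        # i.e. the previous instruction was the first of the sequence.
        if pc_tag != IN_SEQUENCE_TAG:
            raise PolicyViolation("second instruction reached out of sequence")
        return DEFAULT_PC_TAG
    # Assumption: any other instruction clears the special tag.
    return DEFAULT_PC_TAG


def run(instr_tags):
    """Feed a stream of instruction tags through the rules."""
    pc_tag = DEFAULT_PC_TAG
    for instr_tag in instr_tags:
        pc_tag = check_instruction(instr_tag, pc_tag)
    return pc_tag
```

In this sketch, `run(["other", "seq-first", "seq-second"])` completes, whereas jumping directly to the second instruction, as in `run(["other", "seq-second"])`, raises `PolicyViolation`.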

According to another aspect of the techniques herein, a non-transitory computer-readable medium includes code stored thereon that, when executed, performs a method of processing instructions, the method comprising: receiving, for metadata processing, a current instruction, the metadata processing being performed in a metadata processing domain that is isolated from a code execution domain including the current instruction; and determining, by the metadata processing domain and in connection with metadata for the current instruction, whether to allow execution of the current instruction in accordance with a set of one or more policies, wherein the current instruction accesses a first location of a stack frame of a first routine, the current instruction and the location of the stack frame have associated metadata tags, and the set of one or more policies includes a stack protection policy that provides stack protection and prevents improper access to stack storage locations, including storage locations of the stack frame of the first routine.

According to another aspect of the techniques herein, a system comprises: a processor; and a memory including code stored thereon that, when executed by the processor, performs a method of processing instructions, the method comprising: receiving, for metadata processing, a current instruction, the metadata processing being performed in a metadata processing domain that is isolated from a code execution domain including the current instruction; and determining, by the metadata processing domain and in connection with metadata for the current instruction, whether to allow execution of the current instruction in accordance with a set of one or more policies, wherein the current instruction accesses a first location of a stack frame of a first routine, the current instruction and the location of the stack frame have associated metadata tags, and the set of one or more policies includes a stack protection policy that provides stack protection and prevents improper access to stack storage locations, including storage locations of the stack frame of the first routine.
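As a rough illustration of a stack protection policy of this kind, the sketch below tags each stack location with the routine that owns its frame and tags each instruction with the routine it belongs to, then allows an access only when the two tags match. This matching rule is an assumption made for illustration; the text above states only that both the instruction and the stack location carry metadata tags. The names `TaggedStack`, `push_frame`, and `StackPolicyViolation` are hypothetical.

```python
class StackPolicyViolation(Exception):
    """Raised when an instruction touches a frame it does not own."""


def stack_access_allowed(instr_tag, location_tag):
    # Assumed rule: the instruction's routine must own the stack location.
    return instr_tag == location_tag


class TaggedStack:
    """Stack memory in which every word carries a metadata tag."""

    def __init__(self):
        self.words = {}  # address -> (value, owner tag)

    def push_frame(self, routine, addresses):
        # Tag each location of the new frame with the owning routine.
        for address in addresses:
            self.words[address] = (0, routine)

    def store(self, instr_tag, address, value):
        _, location_tag = self.words[address]
        if not stack_access_allowed(instr_tag, location_tag):
            raise StackPolicyViolation(f"{instr_tag} may not write {address:#x}")
        self.words[address] = (value, location_tag)

    def load(self, instr_tag, address):
        value, location_tag = self.words[address]
        if not stack_access_allowed(instr_tag, location_tag):
            raise StackPolicyViolation(f"{instr_tag} may not read {address:#x}")
        return value
```

With frames pushed for routines `f` and `g`, an instruction tagged `g` that attempts to write into a location of `f`'s frame is rejected, while `f`'s own accesses succeed.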

According to another aspect of the techniques herein, a non-transitory computer-readable medium includes code stored thereon that, when executed, performs a method of processing instructions, the method comprising: receiving, for metadata processing, a current instruction, the metadata processing being performed in a metadata processing domain that is isolated from a code execution domain including the current instruction; and determining, by the metadata processing domain and in connection with metadata for the current instruction, whether to allow execution of the current instruction in accordance with a set of one or more policies, wherein the one or more policies include a set of rules that enforce execution of a complete sequence of instructions in a specified order, from a first instruction of the complete sequence to a last instruction of the complete sequence.

According to another aspect of the techniques herein, a system comprises: a processor; and a memory including code stored thereon that, when executed by the processor, performs a method of processing instructions, the method comprising: receiving, for metadata processing, a current instruction, the metadata processing being performed in a metadata processing domain that is isolated from a code execution domain including the current instruction; and determining, by the metadata processing domain and in connection with metadata for the current instruction, whether to allow execution of the current instruction in accordance with a set of one or more policies, wherein the one or more policies include a set of rules that enforce execution of a complete sequence of instructions in a specified order, from a first instruction of the complete sequence to a last instruction of the complete sequence.

According to another aspect of the techniques herein, a method of generating and using metadata tags comprises: storing a bootstrap tag in a first specified register of a plurality of specified registers used in a metadata processing domain that is isolated from a code execution domain; and performing first processing that derives one or more additional metadata tags from the bootstrap tag, wherein the first processing includes executing one or more instructions in the code execution domain that trigger metadata processing of one or more rules in the metadata processing domain. The bootstrap tag may be used as an initial seed tag from which all other metadata tags used by the metadata processing domain are derived. The bootstrap tag may be hardwired or stored in a portion of read-only memory. The storing and the first processing may be included in processing performed by executing a first code portion of a bootstrap program when booting a system that includes the metadata processing domain and the code execution domain. The method may include deriving a default tag from the bootstrap tag stored in the first specified register; storing the default tag in a second specified register of the plurality of specified registers; and executing an instruction sequence that triggers metadata processing of a rule in the metadata processing domain, the rule writing the default tag from the second specified register as the metadata tag for each of a plurality of memory locations used by the code execution domain. The first processing may include generating an initial set of metadata tags derived from the bootstrap tag, wherein each metadata tag of the initial set may be generated by executing a current instruction in the code execution domain that triggers rule cache miss processing in the metadata processing domain, the rule cache miss processing occurring when no rule for the current instruction is present in a rule cache, and the rule cache containing rules on metadata that are used by the metadata processing domain and define allowed operations. The rule cache miss processing may include computing, by a rule cache miss handler executing in the metadata processing domain, a new rule for the current instruction, wherein the new rule includes a result metadata tag that is a metadata tag of the initial set. Each metadata tag of the initial set may be a tag generator that can be further used to derive other metadata tags. Execution of a first set of one or more specified instructions may trigger rule and rule cache miss processing in the metadata processing domain that generates metadata tags each denoted as a tag generator used to generate a sequence of one or more other metadata tags, and execution of a second set of one or more specified instructions may trigger rule and rule cache miss processing in the metadata processing domain that generates metadata tags each denoted as a non-generating tag that cannot be used to generate further metadata tags. The bootstrap program may further include instructions that trigger rules, processed in the metadata processing domain, that write one or more special metadata code tags on one or more instructions of specified code portions in order to provide extended privileges, capabilities or authority to the tagged instructions. The specified code portions may include one or more of kernel code and loader code. The one or more special metadata code tags may be derived from a first metadata tag of the initial set of metadata tags, the first metadata tag being a special instruction tag generator. The initial set of metadata tags may include any one or more of: an initial instruction metadata tag, which is a tag generator used to generate a sequence of one or more code tags used to tag instructions; an initial malloc metadata tag, which is a tag generator used to generate a sequence of one or more other malloc tag generators, wherein each of the other malloc tag generators is used to generate a sequence of one or more other metadata tags for a different application in connection with coloring any of allocated memory cells and pointers to allocated memory cells used by that application; an initial control flow integrity tag, which is a tag generator used to generate a sequence of one or more other control flow integrity tag generators, wherein each of the other control flow integrity tag generators is used to generate a sequence of one or more other metadata tags for a different application in connection with tagging control transfer targets of that application; and an initial taint tag, which is a tag generator used to generate a sequence of one or more other taint tag generators, wherein each of the other taint tag generators is used to generate a sequence of one or more other metadata taint tags for a different application in connection with tagging data items used by that application with a metadata taint tag based on the code that created or modified the data item. A sequence of metadata tags may be generated by executing instructions that trigger other processing of rules in the metadata processing domain. The other processing may include generating a next metadata tag in the sequence using a current metadata tag in the sequence, wherein the current metadata tag denotes a current state of the sequence and is stored as the metadata tag associated with an atom, the atom being either a register or a memory location; and updating the current state of the sequence by storing the next metadata tag as the metadata tag associated with the atom.
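The derivation of tag generators from a bootstrap tag, and the use of an atom's tag to hold the state of a tag sequence, can be sketched as follows. The encoding of a tag as a lineage tuple, and the names `Tag`, `derive`, `next_in_sequence`, and `Atom`, are assumptions made for illustration; the text does not prescribe any particular tag representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    path: tuple      # lineage back to the bootstrap tag (hypothetical encoding)
    generator: bool  # True if further tags may be derived from this tag


# The initial seed tag from which all other tags are derived.
BOOTSTRAP = Tag(path=("bootstrap",), generator=True)


def derive(parent, label, generator):
    """Derive a child tag; only tag generators may have children."""
    if not parent.generator:
        raise ValueError("a non-generating tag cannot derive further tags")
    return Tag(parent.path + (label,), generator)


def next_in_sequence(current):
    """Compute the next tag of a sequence from the current one."""
    *prefix, index = current.path
    return Tag(tuple(prefix) + (index + 1,), current.generator)


class Atom:
    """A register or memory location whose tag holds the sequence state."""

    def __init__(self, first_tag):
        self.tag = first_tag

    def take(self):
        current = self.tag
        self.tag = next_in_sequence(current)  # update the sequence state
        return current


# Bootstrap tag -> initial malloc generator -> per-application generator.
malloc_root = derive(BOOTSTRAP, "malloc", generator=True)
app_generator = derive(malloc_root, "app-A", generator=True)
# First color of the application-specific sequence, kept on an atom.
color_atom = Atom(derive(app_generator, 0, generator=False))
```

Each call to `color_atom.take()` returns the current color and stores the next one back on the atom, mirroring how an allocation routine might obtain a fresh color per allocation. In the described system this state update would be performed by rules in the metadata processing domain rather than by application code.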

According to another aspect of the techniques herein, a method of obtaining control flow information for an application comprises: executing a loader that loads the application for execution by a processor, wherein executing the loader includes executing a first code portion comprising one or more instructions that trigger metadata processing of a first set of one or more rules in a metadata processing domain, and the metadata processing of the first set of one or more rules includes collecting control flow information for the application and storing it as application metadata that is accessible to the metadata processing domain and inaccessible to a code execution domain; and executing instructions of the application in the code execution domain, wherein executing the instructions of the application triggers metadata processing of a second set of rules of a control flow policy that uses at least a portion of the control flow information to determine whether to allow a transfer of control of the application from a first source location to a first target location. The first target location may have a set of one or more allowable source locations that are allowed to transfer control to the first target location. Collecting the control flow information for the application and storing it as application metadata may further include the metadata processing domain performing other processing. The other processing may include tagging the first target location with first metadata that identifies the set of one or more allowable source locations, wherein the first metadata is stored as part of the control flow information of the application metadata. A first instruction of the application may transfer control from the first source location to the first target location, and the first instruction may trigger metadata processing of one or more rules of the control flow policy that determine whether to allow execution of the first instruction by using the first metadata to determine whether the first source location is included in the set of one or more allowable source locations allowed to transfer control to the first target location. The other processing may also include tagging each allowable source location of the set with a unique source metadata tag. Each unique source metadata tag of each allowable source location may be included in a first sequence of source metadata tags for the application, and the first sequence may be a unique sequence of source metadata tags generated from a control flow generator tag. The control flow generator tag may be generated from an initial control flow generator tag derived from an initial bootstrap tag. The initial control flow generator tag may be used to generate a plurality of additional control flow generator tags, and each of the additional control flow generator tags may be used to generate a sequence of unique source metadata tags for a different application.
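The loader-time collection and run-time checking of control flow information described above might look like the following sketch. Integer source tags stand in for the unique source metadata tags, and the class and method names (`CfiMetadata`, `record_edge`, `check_transfer`) are hypothetical. In the described system this state lives in the metadata processing domain and is not accessible to application code.

```python
class ControlFlowViolation(Exception):
    """Raised when a control transfer is not permitted by the policy."""


class CfiMetadata:
    """Control flow information held as application metadata."""

    def __init__(self):
        self._next_source_tag = 0
        self.source_tag = {}  # source address -> unique source tag
        self.allowed = {}     # target address -> set of allowed source tags

    def record_edge(self, source, target):
        """Loader time: tag the source and add it to the target's set."""
        if source not in self.source_tag:
            self.source_tag[source] = self._next_source_tag
            self._next_source_tag += 1
        self.allowed.setdefault(target, set()).add(self.source_tag[source])

    def check_transfer(self, source, target):
        """Run time: allow the transfer only from an allowable source."""
        tag = self.source_tag.get(source)
        if tag is None or tag not in self.allowed.get(target, set()):
            raise ControlFlowViolation(f"{source:#x} -> {target:#x} not allowed")
```

For example, after `record_edge(0x100, 0x200)`, the transfer `check_transfer(0x100, 0x200)` passes, while a transfer to `0x200` from an unrecorded source, or from `0x100` to an unrecorded target, raises `ControlFlowViolation`.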

According to another aspect of the techniques herein, a non-transitory computer-readable medium includes code stored thereon that, when executed, performs a method of generating and using metadata tags, the method comprising: storing a bootstrap tag in a first specified register of a plurality of specified registers used in a metadata processing domain that is isolated from a code execution domain; and performing first processing that derives one or more additional metadata tags from the bootstrap tag, wherein the first processing includes executing, in the code execution domain, one or more instructions that trigger metadata processing of one or more rules in the metadata processing domain.

According to another aspect of the techniques herein, a system comprises: a processor; and a memory including code stored thereon that, when executed, performs a method of generating and using metadata tags, the method comprising: storing a bootstrap tag in a first specified register of a plurality of specified registers used in a metadata processing domain that is isolated from a code execution domain; and performing first processing that derives one or more additional metadata tags from the bootstrap tag, wherein the first processing includes executing, in the code execution domain, one or more instructions that trigger metadata processing of one or more rules in the metadata processing domain.

According to another aspect of the techniques herein, a non-transitory computer-readable medium includes code stored thereon that, when executed, performs a method of obtaining control flow information for an application, the method comprising: executing a loader that loads the application for execution by a processor, wherein executing the loader includes executing a first code portion comprising one or more instructions that trigger metadata processing of a first set of one or more rules in a metadata processing domain, and the metadata processing of the first set of one or more rules includes collecting control flow information for the application and storing it as application metadata that is accessible to the metadata processing domain and inaccessible to a code execution domain; and executing instructions of the application in the code execution domain, wherein executing the instructions of the application triggers metadata processing of a second set of rules of a control flow policy that uses at least a portion of the control flow information to determine whether to allow a transfer of control of the application from a first source location to a first target location.

According to another aspect of the techniques herein, a system comprises: a processor; and a memory including code stored thereon that, when executed, performs a method of obtaining control flow information for an application, the method comprising: executing a loader that loads the application for execution by the processor, wherein executing the loader includes executing a first code portion comprising one or more instructions that trigger metadata processing of a first set of one or more rules in a metadata processing domain, and the metadata processing of the first set of one or more rules includes collecting control flow information for the application and storing it as application metadata that is accessible to the metadata processing domain and inaccessible to a code execution domain; and executing instructions of the application in the code execution domain, wherein executing the instructions of the application triggers metadata processing of a second set of rules of a control flow policy that uses at least a portion of the control flow information to determine whether to allow a transfer of control of the application from a first source location to a first target location.

본 명세서에서의 기술의 다른 양태에 따르면, 태깅된 데이터 소스와 태깅되지 않은 데이터 소스 사이에서 프로세서-중재된 데이터 이전을 수행하는 방법은: 프로세서상에서, 태깅되지 않은 데이터 소스로부터 제 1 데이터를 로드하는 제 1 명령어를 실행하는 단계 - 태깅되지 않은 데이터 소스는 연관된 메타데이터 태그를 갖지 않는 메모리 위치를 포함함 -; 제 1 하드웨어에 의해, 제 1 데이터가 신뢰성이 없고 공개적 데이터 소스(public data source)로부터 온 것임을 나타내는 제 1 메타데이터 태그로 제 1 데이터를 태깅하는 단계 - 제 1 메타데이터 태그를 갖는 제 1 데이터는 제 1 버퍼에 저장됨 -; 및 프로세서상에서, 제 1 하나 이상의 규칙을 사용하는 메타데이터 처리를 트리거하는 제 1 코드를 실행하는 단계를 포함하고, 제 1 하나 이상의 규칙을 사용하는 메타데이터 처리는 제 1 데이터가 신뢰성 있음을 나타내는 제 2 메타데이터 태그를 갖도록 제 1 데이터를 재태깅하는 재태깅을 수행한다. 제 2 메타데이터 태그는 제 1 데이터가 공개적 소스로부터 온 것임을 추가로 나타낼 수 있다. 제 2 메타데이터 태그를 갖는 제 1 데이터는 연관된 메타데이터 태그를 각각 갖는 메모리 위치를 포함하는 태깅된 데이터 소스인 메모리에 저장될 수 있다. 메모리는 하나 이상의 신뢰성 있는 데이터 소수로부터의 데이터를 포함하는 신뢰성 있는 메모리일 수 있다. 메타데이터 처리는 제 1 코드를 포함하는 코드 실행 도메인으로부터 격리된 메타데이터 처리 도메인에서 수행될 수 있다. 제 1 하나 이상의 규칙은 허용된 동작을 정의하는 메타데이터 처리에 의해 사용되는 메타데이터에 관한 규칙일 수 있다. 제 1 코드는 하나 이상의 명령어를 포함할 수 있으며 하나 이상의 명령어 각각은 상기 각각의 명령어가 제 1 데이터를 재태깅하여 제 2 메타데이터 태그를 갖게 하는 하나 이상의 규칙을 호출하는 인가를 갖는 것을 나타내는 특수 명령어 태그를 가질 수 있다. 제 1 메타데이터 태그를 갖는 제 1 데이터는 암호화될 수 있으며, 방법은 프로세서상에서 하나 이상의 명령어를 실행함으로써, 제 1 메타데이터 태그를 갖는 제 1 데이터를 암호 해독하고, 제 1 메타데이터 태그를 갖는 제 1 데이터의 암호 해독된 형태를 발생하는 단계; 및 프로세서상에서 하나 이상의 추가 명령어를 실행함으로써 입증 처리(validation processing)를 수행하는 단계를 포함할 수 있고, 상기 입증 처리는 디지털 서명을 사용하여 제 1 데이터의 암호 해독된 형태가 유효하다는 것을 보장하고, 상기 재태깅은 제 1 데이터의 성공적인 입증 처리 이후에 수행된다. 제 2 메타데이터 태그를 갖는 제 1 데이터는 태깅된 메모리의 제 1 메모리 위치에 암호 해독된 형태로 저장될 수 있으며, 방법은 제 1 데이터를 암호화하여 제 1 데이터를 암호화된 형태로 생성하고 제 1 데이터에 따라 디지털 서명을 발생하는 단계 - 상기 암호화 및 발생 단계는 프로세서상에서 추가 코드를 실행함으로써 수행됨 -; 및 프로세서상에서, 제 1 데이터의 암호화된 형태를 태깅된 메모리의 제 1 메모리 위치로부터 태깅되지 않은 메모리의 목적지 위치에 저장하는 제 2 명령어를 실행하는 단계를 포함할 수 있고, 제 1 데이터의 암호화된 형태는 연관된 메타데이터 태그 없이 목적지 위치에 저장되며, 제 2 메타데이터 태그는 제 1 데이터의 암호화된 형태를 목적지 위치에 저장하기 전에 제 2 하드웨어에 의해 제거된다. 제 1 시점에서, 제 1 데이터는 태깅되지 않은 메모리 부분의 제 1 위치에 저장될 수 있으며, 제 2 시점에서, 제 1 데이터가 신뢰성 없고 공개적 데이터 소스로부터 온 것임을 나타내는 제 1 메타데이터 태그를 갖는 제 1 데이터는 태깅된 메모리 부분의 제 2 위치에 저장될 수 있다. 
According to another aspect of the techniques herein, a method of performing processor-mediated data transfer between a tagged data source and an untagged data source includes: executing, on a processor, a first instruction that loads first data from an untagged data source, wherein the untagged data source includes memory locations that have no associated metadata tags; tagging, by first hardware, the first data with a first metadata tag indicating that the first data is untrusted and is from a public data source, wherein the first data having the first metadata tag is stored in a first buffer; and executing, on the processor, first code that triggers metadata processing using a first one or more rules, wherein the metadata processing using the first one or more rules performs retagging that retags the first data to have a second metadata tag indicating that the first data is trusted. The second metadata tag may further indicate that the first data is from a public source. The first data having the second metadata tag may be stored in a memory that is a tagged data source including memory locations each having an associated metadata tag. The memory may be a trusted memory containing data from one or more trusted data sources. The metadata processing may be performed in a metadata processing domain isolated from a code execution domain that includes the first code. The first one or more rules may be rules on metadata that are used by the metadata processing to define allowed operations. The first code may include one or more instructions, each of which may have a special instruction tag indicating that the instruction has authority to invoke the one or more rules that retag the first data to have the second metadata tag. The first data having the first metadata tag may be encrypted, and the method may include: executing one or more instructions on the processor to decrypt the first data having the first metadata tag and generate a decrypted form of the first data; and executing one or more additional instructions on the processor to perform validation processing using a digital signature to ensure that the decrypted form of the first data is valid, wherein the retagging is performed after successful validation processing of the first data. The first data having the second metadata tag may be stored in decrypted form in a first memory location of a tagged memory, and the method may include: encrypting the first data to generate an encrypted form of the first data and generating a digital signature in accordance with the first data, wherein the encrypting and the generating are performed by executing additional code on the processor; and executing, on the processor, a second instruction that stores the encrypted form of the first data from the first memory location in the tagged memory to a destination location in an untagged memory, wherein the encrypted form of the first data is stored at the destination location without an associated metadata tag, and the second metadata tag is removed by second hardware before the encrypted form of the first data is stored at the destination location. At a first point in time, the first data may be stored at a first location in an untagged memory portion, and at a second point in time, the first data having the first metadata tag, which indicates that the first data is untrusted and is from a public data source, may be stored at a second location in a tagged memory portion. The untagged memory portion and the tagged memory portion may be included in the same memory serviced by the same memory controller; a second metadata processing rule may permit the processor to perform only operations that write, to the untagged memory portion, data whose associated metadata tag indicates that the data is public; and direct memory operations from an external untagged source operating on untagged data may be allowed to access only the untagged memory portion of the same memory. At least some of the second metadata processing rules may permit the processor to perform only operations that write, to the untagged memory portion, data whose associated metadata tag indicates that the data is public and additionally untrusted. The untagged data source may be connected to a first interconnect fabric that includes only untagged data sources, and the first data having the second metadata tag may be stored at a location of a memory connected to a second interconnect fabric that includes only tagged data sources. A second processor may be connected to the first interconnect fabric and may execute other instructions using untagged data from the untagged data sources. The other instructions may be executed without performing metadata processing and without using rules on metadata to enforce allowed operations, and execution of the other instructions by the second processor may include performing one or more operations including any of reading data from an untagged data source of the first interconnect fabric and writing data to an untagged data source of the first interconnect fabric.
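By way of illustration only, the load, retag and store sequence of this aspect may be sketched in Python as follows. The tag encodings (sets of strings), the `"may_retag"` instruction tag, and the function names `ingress`, `retag_rule` and `egress` are assumptions introduced for this sketch; the description above states only what the tags indicate, not how they are represented, and decryption and validation are omitted here.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical tag encodings (assumption): sets of property names.
UNTRUSTED_PUBLIC = frozenset({"public", "untrusted"})
TRUSTED_PUBLIC = frozenset({"public", "trusted"})


@dataclass(frozen=True)
class Word:
    value: int
    tag: Optional[frozenset]  # None models an untagged memory location


def ingress(raw: int) -> Word:
    """First hardware: tag data loaded from an untagged source as untrusted and public."""
    return Word(raw, UNTRUSTED_PUBLIC)


def retag_rule(instr_tag: str, word: Word) -> Word:
    """Rule: only a specially tagged instruction may retag untrusted data as trusted."""
    if instr_tag != "may_retag":
        raise PermissionError("instruction has no authority to retag")
    if word.tag != UNTRUSTED_PUBLIC:
        raise PermissionError("unexpected input tag")
    return Word(word.value, TRUSTED_PUBLIC)


def egress(word: Word) -> int:
    """Second hardware: allow only public data out, and strip its tag."""
    if word.tag is None or "public" not in word.tag:
        raise PermissionError("only public data may be written to untagged memory")
    return word.value


buffered = ingress(0x2A)                      # tagged copy held in the first buffer
trusted = retag_rule("may_retag", buffered)   # retagged by the privileged first code
stored = egress(trusted)                      # written back without any tag
```

In this sketch an ordinary instruction that attempts the retagging raises `PermissionError`, which loosely mirrors a rule that disallows the operation.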

According to another aspect of the techniques herein, a system includes: a processor; one or more tagged memories, wherein each memory location of the one or more tagged memories has an associated metadata tag; one or more untagged memories including a first untagged memory, wherein memory locations of the one or more untagged memories have no associated metadata tags; a rule cache including rules on metadata that are used to perform metadata processing and that define allowed operations with respect to instructions, wherein, before the processor executes a current instruction, metadata processing using one or more rules of the rule cache is performed to determine whether execution of the current instruction is allowed; a first instruction that, when executed by the processor, loads first data from the first untagged memory into a data cache used by the processor, wherein the first data stored in the data cache has an associated first metadata tag; a second instruction that, when executed by the processor, stores second data from the data cache to the first untagged memory, wherein the second data stored in the data cache has an associated second metadata tag; a first hardware component that converts untagged data into tagged data used in the system by the processor, wherein, in response to execution of the first instruction, the first hardware component receives, from the first untagged memory, the first data without any associated metadata tag and outputs the first data having the associated first metadata tag; and a second hardware component that converts tagged data into untagged data, wherein, in response to execution of the second instruction, the second hardware component receives the second data having the associated second metadata tag and outputs the second data without any associated metadata tag. The first data without any associated metadata tag may be encrypted, and the first hardware component may convert the first data into decrypted form, perform validation processing of the first data using a digital signature, and, upon successful validation processing, tag the first data with the associated first metadata tag indicating that the first data is trusted. The second data having the associated second metadata tag may be in decrypted form, and the second hardware component may convert the second data into encrypted form and generate a digital signature in accordance with the second data. The first hardware component may tag the first data with the associated first metadata tag indicating that the first data is trusted and also identifying that the first data is from a public source. One or more sets of cryptographic keys may be either encoded in hardware or stored in memory. The one or more sets of cryptographic keys may be used by the first hardware component in connection with performing decryption and validation processing, and by the second hardware component in connection with performing encryption and generating digital signatures. The first data may identify a particular one of the sets of cryptographic keys that the first hardware component uses to decrypt the first data, and the associated metadata tag of the second data may identify a particular one of the sets of cryptographic keys that the second hardware component uses to encrypt and sign the second data.
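As a non-limiting sketch, the two converting hardware components may be modeled in software as a pair of functions that share key sets selected by a key identifier. Everything named below is an assumption made for illustration: the `KEY_SETS` table, the tag layout `("trusted", "public", key_id)`, the XOR keystream used as a toy cipher (not secure), and the use of an HMAC as a stand-in for the digital signature.

```python
import hashlib
import hmac
from typing import NamedTuple, Tuple

# Hypothetical key sets (assumption): key id -> (encryption key, signing key).
KEY_SETS = {
    0: (b"enc-key-0", b"sig-key-0"),
    1: (b"enc-key-1", b"sig-key-1"),
}


class Tagged(NamedTuple):
    data: bytes
    tag: Tuple[str, str, int]  # assumed layout: (trust, visibility, key id)


def _toy_cipher(key: bytes, data: bytes) -> bytes:
    """Illustrative XOR keystream only; NOT a secure cipher. Self-inverse."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


def _sign(key: bytes, data: bytes) -> bytes:
    """HMAC used here as a stand-in for a digital signature."""
    return hmac.new(key, data, hashlib.sha256).digest()


def untagged_to_tagged(key_id: int, ciphertext: bytes, signature: bytes) -> Tagged:
    """First component: decrypt, validate, then tag as trusted and public."""
    enc_key, sig_key = KEY_SETS[key_id]  # the incoming data identifies its key set
    plaintext = _toy_cipher(enc_key, ciphertext)
    if not hmac.compare_digest(_sign(sig_key, plaintext), signature):
        raise ValueError("validation failed; data is not tagged as trusted")
    return Tagged(plaintext, ("trusted", "public", key_id))


def tagged_to_untagged(word: Tagged) -> Tuple[int, bytes, bytes]:
    """Second component: the tag selects the key set; encrypt, sign, drop the tag."""
    key_id = word.tag[2]
    enc_key, sig_key = KEY_SETS[key_id]
    return key_id, _toy_cipher(enc_key, word.data), _sign(sig_key, word.data)
```

A round trip through `tagged_to_untagged` and then `untagged_to_tagged` recovers the original bytes with a trusted tag, while a modified ciphertext fails validation and is never tagged as trusted.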

According to another aspect of the techniques herein, a method of processing a current instruction includes: receiving the current instruction for metadata processing; and performing the metadata processing for the current instruction in a metadata processing domain isolated from a code execution domain that includes the current instruction, wherein the current instruction references a first memory location having a first metadata tag used in the metadata processing, and the metadata processing for the current instruction includes: performing processing to retrieve the first metadata tag from memory; determining, before the first metadata tag for the first memory location is received from memory, a predicted value of the first metadata tag of the first memory location; determining, using the predicted value of the first metadata tag of the first memory location, a first result metadata tag for a result operand of the current instruction; receiving the first metadata tag from memory; determining whether the first metadata tag matches the predicted value of the first metadata tag; and, in response to determining that the first metadata tag matches the predicted value of the first metadata tag, using the first result metadata tag as a final result metadata tag for the result operand. The metadata processing for the current instruction may include: determining a first rule for the current instruction in accordance with the current instruction and a set of input metadata tags for the current instruction, wherein the first rule includes the predicted value of the first metadata tag of the first memory location and includes the first result metadata tag, and the first rule is included in a rule cache used for metadata processing in the metadata processing domain; and, in response to determining that the first metadata tag does not match the predicted value of the first metadata tag, performing rule cache miss processing in the metadata processing domain for the current instruction. The rule cache miss processing in the metadata processing domain for the current instruction may include: determining whether execution of the current instruction in the code execution domain is allowed; in response to determining that execution of the current instruction in the code execution domain is allowed, generating a new rule for the current instruction, wherein the new rule is generated in accordance with the current instruction, the set of input metadata tags, and the first metadata tag; and inserting the new rule into the rule cache used for metadata processing in the metadata processing domain. The set of other input metadata tags may include a plurality of other metadata tags for the current instruction, including a metadata tag for any of: a program counter, the current instruction, and an input operand of the current instruction. The result operand may be a destination memory location or a destination register that stores the result of executing the current instruction. Instructions may be processed in a plurality of stages including a first stage and a second stage, where the first stage precedes the second stage. The predicted value of the first metadata tag of the first memory location may be determined in the first stage. The second stage may include performing the determination of whether the first metadata tag matches the predicted value of the first metadata tag, and may also include performing the rule cache miss processing in the metadata processing domain for the current instruction in response to determining that the first metadata tag does not match the predicted value of the first metadata tag. The rule cache may be configurable to operate in a prediction mode or a normal processing mode in accordance with a prediction selector mode. The rule cache may be configured to operate in the prediction mode when performing metadata processing for the current instruction. When the rule cache is configured to operate in the prediction mode, the rule cache may generate a first output in accordance with the first rule. The first output may include a metadata tag for the program counter of the next instruction, the first result metadata tag for the result operand of the current instruction, and the predicted value of the metadata tag as an output of the first stage. When the rule cache is configured to operate in the normal processing mode, the rule cache may generate a second output in accordance with a second rule different from the first rule; the second output may not include the predicted value of the first metadata tag, and may include metadata tags for the result operand of the current instruction and for the program counter of the next instruction. The rule cache may use a first version of the rules of a first policy when operating in the prediction mode and may otherwise use a second version of the rules of the first policy when operating in the normal processing mode; the first rule may be included in the first version of the rules, and the second rule may be included in the second version of the rules.
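The prediction and check steps of this aspect may be pictured with the following minimal Python sketch. The cache key `(opcode, pc tag, instruction tag)`, the toy taint-style `policy` function standing in for the miss handling, and the class name `PredictingRuleCache` are assumptions introduced here; replacing the mismatching rule under the same key is likewise a simplification of inserting a new rule.

```python
from typing import Callable, Dict, NamedTuple, Tuple


class Rule(NamedTuple):
    predicted_mem_tag: str  # predicted value of the memory location's tag
    result_tag: str         # result tag computed from the predicted value
    next_pc_tag: str        # tag for the program counter of the next instruction


Key = Tuple[str, str, str]  # assumed key: (opcode, pc tag, instruction tag)


def policy(opcode: str, pc_tag: str, ci_tag: str, mem_tag: str) -> Rule:
    """Hypothetical policy used on a miss: a loaded value inherits the memory tag."""
    if ci_tag != "code":
        raise PermissionError("execution of the current instruction is not allowed")
    return Rule(mem_tag, mem_tag, pc_tag)


class PredictingRuleCache:
    def __init__(self) -> None:
        self.rules: Dict[Key, Rule] = {}
        self.misses = 0

    def process(self, opcode: str, pc_tag: str, ci_tag: str,
                fetch_mem_tag: Callable[[], str]) -> str:
        key = (opcode, pc_tag, ci_tag)
        rule = self.rules.get(key)   # first stage: predict before the tag arrives
        actual = fetch_mem_tag()     # the first metadata tag arrives from memory
        if rule is not None and rule.predicted_mem_tag == actual:
            return rule.result_tag   # second stage: prediction confirmed
        self.misses += 1             # mismatch or no rule: rule cache miss processing
        rule = policy(opcode, pc_tag, ci_tag, actual)
        self.rules[key] = rule       # insert the newly generated rule
        return rule.result_tag


cache = PredictingRuleCache()
first = cache.process("load", "pc", "code", lambda: "clean")     # cold miss
second = cache.process("load", "pc", "code", lambda: "clean")    # prediction holds
third = cache.process("load", "pc", "code", lambda: "tainted")   # misprediction
```

Under these assumptions the second call returns its result tag without invoking the policy, and only the first and third calls count as misses.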

According to another aspect of the techniques herein, a system includes: a pipelined processor including a plurality of pipeline stages, the plurality of stages including a memory stage and a writeback stage; an integrated programmable unit for metadata processing (PUMP) that operates before completion of the memory stage, wherein the PUMP performs metadata processing for a current instruction that references a first memory location having a first metadata tag used in the metadata processing, the PUMP receives a first input including a first metadata tag for the current instruction, and the PUMP generates a first output that is provided as an input to the writeback stage, the first output including a predicted value of the first metadata tag of the first memory location and a first result metadata tag for a result operand of the current instruction, the first result metadata tag being determined by the PUMP in accordance with the predicted value of the first metadata tag for the first memory location; and a hardware component of the writeback stage that determines whether the first metadata tag for the first memory location matches the predicted value of the first metadata tag, and that uses the first result metadata tag as a final result metadata tag for the result operand when the first metadata tag matches the predicted value of the first metadata tag. The PUMP may be a first PUMP that operates concurrently with the memory stage, further operates in a prediction mode, and determines the predicted value of the first metadata tag of the first memory location, and the system may include a second PUMP that operates in a normal, non-prediction mode and does not determine any predicted value for the first metadata tag of the first memory location. The second PUMP may be integrated as another stage between the memory stage and the writeback stage. The first PUMP may use a first version of the rules of a first policy for use when operating in the prediction mode, and the second PUMP may use a second version of the rules of the first policy for use when operating in the normal, non-prediction mode. The first PUMP may determine the first output in accordance with a first rule from the first version, and the second PUMP may determine a second output in accordance with a second rule from the second version. The second output may include a second result metadata tag for the first memory location, and the second output may be provided as an input to the writeback stage. The hardware component of the writeback stage may further use the second result metadata tag as the final result metadata tag for the result operand when the first metadata tag does not match the predicted value.
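The selection performed at the writeback stage in the two-PUMP arrangement may be sketched as follows. The rule tables `RULES_V1` and `RULES_V2`, their keys, and the function names are hypothetical and serve only to show the first (predicting) PUMP's result being used on a match and the second (normal) PUMP's result being used otherwise.

```python
from typing import Dict, Tuple

# Hypothetical first version of a policy's rules (prediction mode):
# opcode -> (predicted memory tag, result tag).
RULES_V1: Dict[str, Tuple[str, str]] = {"load": ("clean", "clean")}

# Hypothetical second version of the same policy's rules (normal mode):
# (opcode, actual memory tag) -> result tag.
RULES_V2: Dict[Tuple[str, str], str] = {
    ("load", "clean"): "clean",
    ("load", "tainted"): "tainted",
}


def first_pump(opcode: str) -> Tuple[str, str]:
    """Predicting PUMP: runs alongside the memory stage, before the tag is known."""
    return RULES_V1[opcode]


def second_pump(opcode: str, mem_tag: str) -> str:
    """Normal PUMP: runs after the memory stage, using the actual tag."""
    return RULES_V2[(opcode, mem_tag)]


def writeback(opcode: str, actual_mem_tag: str) -> Tuple[str, str]:
    """Writeback-stage selection between the two PUMP outputs."""
    predicted, first_result = first_pump(opcode)
    second_result = second_pump(opcode, actual_mem_tag)
    if actual_mem_tag == predicted:
        return first_result, "first"    # prediction matched
    return second_result, "second"      # fall back to the non-predicted result
```

For example, `writeback("load", "tainted")` selects the second PUMP's output in this sketch, because the tag read from memory differs from the predicted `"clean"` tag.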

According to another aspect of the techniques herein, a non-transitory computer-readable medium includes code stored thereon that, when executed, performs a method of processor-mediated data transfer between a tagged data source and an untagged data source, the method including: executing, on a processor, a first instruction that loads first data from an untagged data source, wherein the untagged data source includes memory locations that have no associated metadata tags; tagging, by first hardware, the first data with a first metadata tag indicating that the first data is untrusted and is from a public data source, wherein the first data having the first metadata tag is stored in a first buffer; and executing, on the processor, first code that triggers metadata processing using a first one or more rules, wherein the metadata processing using the first one or more rules performs retagging that retags the first data to have a second metadata tag indicating that the first data is trusted.

According to another aspect of the techniques herein, a non-transitory computer-readable medium includes code stored thereon that, when executed, performs a method of processing a current instruction, the method including: receiving the current instruction for metadata processing; and performing the metadata processing for the current instruction in a metadata processing domain isolated from a code execution domain that includes the current instruction, wherein the current instruction references a first memory location having a first metadata tag used in the metadata processing, and the metadata processing for the current instruction includes: performing processing to retrieve the first metadata tag from memory; determining, before the first metadata tag for the first memory location is received from memory, a predicted value of the first metadata tag of the first memory location; determining, using the predicted value of the first metadata tag of the first memory location, a first result metadata tag for a result operand of the current instruction; receiving the first metadata tag from memory; determining whether the first metadata tag matches the predicted value of the first metadata tag; and, in response to determining that the first metadata tag matches the predicted value of the first metadata tag, using the first result metadata tag as a final result metadata tag for the result operand.

본 명세서에서 본 기술의 특징 및 장점은 첨부된 도면과 함께 설명된 이하의 실시예에 관한 상세한 설명으로부터 더욱 명백해질 것이다.
도 1은 프로세서 파이프라인에서 파이프라인 스테이지로서 통합된 PUMP 캐시의 예를 도시하는 개략도이다.
도 2는 PUMP 평가 프레임워크(PUMP Evaluation Framework)를 도시하는 개략도이다.
도 3a는 도 2에 도시된 평가 프레임워크를 사용하여 간단히 실시한 단일 런타임 정책에 대한 성능 결과를 도시하는 그래프이다.
도 3b는 간단한 구현에 따른 단일 에너지 정책의 성능 결과를 도시하는 그래프이다.
도 4a는 64b 태그(Tag)를 갖는 간단한 구현의 복합 정책 런타임 오버헤드(composite policy runtime overhead)를 도시하는 일련의 막대 그래프이고, 여기서 복합 정책은 다음과 같은 정책 (i) 공간적 및 시간적 메모리 안전, (ii) 테인트 추적(taint tracking), (iii) 제어 흐름 무결성(control-flow integrity) 및 (iv) 코드 및 데이터 분리를 동시에 시행한다.
도 4b는 64b 태그를 갖는 간단한 구현의 복합 정책 에너지 오버헤드를 도시하는 일련의 막대 그래프이다.
도 4c는 베이스라인에 비해 간단한 구현에 따른 전력 천장(power ceiling)을 도시하는 일련의 막대 그래프이다.
도 5a는 opgroup 최적화를 하지 않는 그리고 opgroup 최적화를 하는 PUMP 규칙의 수를 비교한 막대 그래프이다.
도 5b는 PUMP 용량에 기초한 다양한 opgroup 최적화의 미스 레이트(miss rate)의 영향을 도시하는 일련의 그래프이다.
도 6a는 복합 정책에 따른 gcc 벤치마크에 대한 각 DRAM 이전에 대한 고유 태그의 분포를 도시하는 그래프로, 대부분의 워드가 동일한 태그를 갖는 것을 보여준다.
도 6b는 메인 메모리 태그 압축을 도시하는 다이어그램이다.
도 7a는 16b L2 태그 및 12b L1 태그 간의 변환을 도시하는 개략도이다.
도 7b는 12b L1 태그 및 16b L2 태그 간의 변환을 도시하는 개략도이다.
도 8a는 L1 PUMP 플러시(flush)에 미치는 L1 태그 길이의 영향을 도시하는 개략적인 그래프이다.
도 8b는 L1 PUMP 미스 레이트에 미치는 L1 태그 길이의 영향을 도시하는 개략적인 그래프이다.
도 9a는 상이한 정책에 대한 미스 레이트를 도시하는 일련의 막대 그래프이다.
도 9b는 네 개의 예시적인 마이크로아키텍처 최적화에 대한 캐시 히트 레이트(cache hit rate)를 도시하는 라인 그래프이다.
도 9c는 미스 서비스 성능(miss service performance)을 나타내는 라인 그래프이다.
도 9d는 용량에 기초한 미스 핸들러 히트 레이트(miss handler hit rate)를 도시하는 라인 그래프이다.
도 9e는 복합 정책에 대한 최적화의 영향을 도시하는 일련의 막대 그래프이다.
도 10a는 최적화된 구현의 런타임 오버헤드를 도시하는 일련의 그래프이다.
도 10b는 최적화된 구현의 에너지 오버헤드를 도시하는 일련의 막대 그래프이다.
도 10c는 베이스라인에 비교하여 최적화된 구현의 절대 전력을 도시하는 일련의 막대 그래프이다.
도 11a는 상이한 대표적인 벤치마크에 대한 태그 비트 길이 및 UCP-캐시($) 용량의 런타임 오버헤드 영향을 도시하는 일련의 음영 처리된 그래프이다.
도 11b는 상이한 대표적인 벤치마크에 대한 태그 비트 길이 및 UCP-$ 용량의 에너지 오버헤드 영향을 도시하는 일련의 음영 그래프이다.
도 12a는 대표적인 벤치마크에 미치는 최적화의 런타임 영향을 도시하는 일련의 그래프로서, A: 간단; B: A + Op그룹화; C: B + DRAM 압축; D: C + (10b L1, 14b, L2) 짧은 태그; E: D + (2048-UCP; 512-CTAG)이다.
도 12b는 대표적인 벤치마크에 미치는 최적화의 에너지 영향을 도시하는 일련의 그래프로서, A: 간단; B: A + Op그룹화; C: B + DRAM 압축; D: C + (10b L1, 14b, L2) 짧은 태그; E: D + (2048-UCP; 512-CTAG)이다.
도 13a는 대표적인 벤치마크에 대한 복합에서의 런타임 정책 영향을 도시하는 일련의 그래프이다.
도 13b는 복합에서의 에너지 정책 영향을 도시하는 일련의 그래프이다.
도 14는 조사된 정책의 요약을 제공하는 "테이블 1"로 표기된 제 1 테이블이다.
도 15는 태깅 체계의 분류의 요약을 제공하는 "테이블 2"로 표기된 제 2 테이블이다.
도 16은 베이스라인 및 간단한 PUMP-확장된 프로세서(PUMP-extended processor)에 대한 메모리 자원 추정의 요약을 제공하는 "테이블 3"으로 표기된 제 3 테이블이다.
도 17은 실험에서 사용된 PUMP 파라미터 범위의 요약을 제공하는 "테이블 4"로 표기된 제 4 테이블이다.
도 18은 PUMP-최적화된 프로세서(PUMP-optimized processor)에 대한 메모리 자원 추정의 요약을 제공하는 "테이블 5"로 표시된 제 5 테이블이다.
도 19는 테인트 추적 미스 핸들러의 요약을 제공하는 "알고리즘 1"로 표기된 제 1 알고리즘이다.
도 20은 N-정책 미스 핸들러의 요약을 제공하는 "알고리즘 2"로 표기된 제 2 알고리즘이다.
도 21은 HW 지원되는 N-정책 미스 핸들러의 요약을 제공하는 "알고리즘 3"으로 표기된 제 3 알고리즘이다.
도 22는 PUMP 규칙 캐시 데이터 흐름 및 마이크로아키텍처의 개략도이다.
도 23은 PUMP 마이크로아키텍처의 개략도이다.
도 24는 도 1과 유사하게, 프로세서 파이프라인 및 그 opgroup 변환, UCP 및 CTAG 캐시들에서 파이프라인 스테이지로서 통합된 예시적인 PUMP 캐시를 도시하는 개략도이다.
도 25는 본 명세서에서의 기술에 따른 실시예에서 제어 상태 레지스터(control status register)(CSR)의 예이다.
도 26은 본 명세서에서의 기술에 따른 실시예에서 태그모드(tagmode)의 예이다.
도 27은 본 명세서에서의 기술에 따른 실시예에서 별도의 프로세서를 갖는 별도의 메타데이터 처리 서브시스템/도메인을 도시하는 예이다.
도 28은 본 명세서에서의 기술에 따른 실시예에서 PUMP 입력 및 출력을 도시한다.
도 29는 본 명세서에서의 기술에 따른 실시예에서 opgroup 테이블과 관련하여 입력 및 출력을 도시한다.
도 30은 본 명세서에서의 기술에 따른 실시예에서 PUMP에 의해 수행된 처리를 도시한다.
도 31 및 도 32는 본 명세서에서의 기술에 따른 실시예에서 PUMP 입력 및 출력의 제어 및 선택에 관한 추가적인 세부 사항을 제공한다.
도 33은 본 명세서에서의 기술에 따른 실시예에서 6 스테이지 처리 파이프라인을 나타내는 예이다.
도 34 내지 도 38은 실시예에서 서브 명령어 및 연관된 기술을 도시하는 예이다.
도 36 내지 도 38은 실시예에서 서브 명령어 및 연관된 기술을 도시하는 예이다.
도 39 내지 도 42는 실시예에서 바이트 레벨 태깅 및 연관된 기술을 예시하는 예이다.
도 43은 본 명세서에서의 기술에 따른 실시예에서 가변 길이 opcode를 도시하는 예이다.
도 44는 본 명세서에서의 기술에 따른 실시예에서 opcode 매핑 테이블을 도시하는 예이다.
도 45는 본 명세서에서의 기술에 따른 실시예에서 공유된 페이지를 도시하는 예이다.
도 46은 본 명세서에서의 기술에 따른 실시예에서 제어 포인트의 이전을 도시하는 예이다.
도 47은 본 명세서에서의 기술에 따른 실시예에서 호출 스택을 도시하는 예이다.
도 48 내지 도 49는 본 명세서에서의 기술에 따른 실시예에서 메모리 위치 태깅 또는 컬러화(coloring)를 도시하는 예이다.
도 50은 본 명세서에서의 기술에 따른 실시예에서 setjmp 및 longjmp를 도시하는 예이다.
도 51, 도 52 및 도 53은 본 명세서에서의 기술에 따른 실시예에서 보호 행위를 구현하는데 사용되는 상이한 런타임 거동 및 연관된 보호 행위 및 메커니즘의 테이블이다.
도 54, 도 55 및 도 56은 본 명세서에서의 기술에 따른 실시예에서 정책 규칙을 학습하거나 결정하기 위해 수행될 수 있는 처리를 도시하는 예이다.
도 57, 도 58, 도 59 및 도 60은 데이터의 외부 버전과 내부 태깅된 버전 사이의 변환과 관련하여 실시예에서의 컴포넌트를 도시하는 예이다.
도 61, 도 62 및 도 63은 본 명세서에서의 기술에 따른 실시예에서 태그 예측을 수행하는 양태를 도시하는 예이다.
도 64 및 도 65는 실시예에서 메모리가 할당된 본 출원의 컬러화 메모리 위치 검출 기술의 사용을 도시한다.
도 66 및 도 67은 본 명세서에서의 기술에 따른 실시예에서 하드웨어 규칙 지원을 제공하는 상이한 컴포넌트를 도시한다.
도 68 내지 도 70은 PUMP가 값을 리턴하는 실시예에서 본 명세서에서의 기술의 사용을 도시하는 예이다.
도 71은 명령어 시퀀스를 갖는 실시예에서 본 명세서에서의 기술의 사용을 도시하는 예이다.
도 72는 본 명세서에서의 기술에 따른 실시예에서 시스템을 부팅하는 것과 관련하여 수행될 수 있는 처리 단계의 흐름도이다.
도 73은 본 명세서에서의 기술에 따른 실시예에서 태그 발생과 관련된 트리 태그 계층(tree tag hierarchy)의 예이다.
도 74, 도 75, 도 76 및 도 77은 본 명세서에서의 기술에 따른 실시예에서 I/O PUMP와 관련한 양태 및 특징을 도시하는 예이다.
도 78, 도 79, 도 80, 도 81 및 도 82는 본 명세서에서의 기술에 따른 실시예에서 태그 값을 저장하고 결정하는 것과 관련하여 사용되는 계층을 도시하는 예이다.
도 83 및 도 84는 본 명세서에서의 기술에 따른 실시예에서 제어 흐름 무결성 및 연관된 처리를 도시하는 예이다.The features and advantages of the technology disclosed herein will become more apparent from the following detailed description of embodiments, taken in conjunction with the accompanying drawings.
1 is a schematic diagram illustrating an example of a PUMP cache integrated as a pipeline stage in a processor pipeline.
2 is a schematic diagram illustrating the PUMP Evaluation Framework.
FIG. 3A is a graph showing performance results for a single runtime policy implemented simply using the evaluation framework shown in FIG. 2 .
3B is a graph showing the performance results of a single energy policy according to a simple implementation.
4A is a series of histograms showing the composite policy runtime overhead of a simple implementation with a 64b Tag, where the composite policy is the following policies (i) spatial and temporal memory safety; It simultaneously enforces (ii) taint tracking, (iii) control-flow integrity, and (iv) code and data separation.
4B is a series of histograms illustrating the complex policy energy overhead of a simple implementation with a 64b tag.
4C is a series of histograms illustrating the power ceiling for a simpler implementation compared to the baseline.
5A is a bar graph comparing the number of PUMP rules without opgroup optimization and with opgroup optimization.
5B is a series of graphs illustrating the impact of miss rate of various opgroup optimizations based on PUMP capacity.
6A is a graph showing the distribution of unique tags for each DRAM transfer for the gcc benchmark according to the composite policy, showing that most words have the same tag.
6B is a diagram illustrating main memory tag compression.
7A is a schematic diagram showing conversion between a 16b L2 tag and a 12b L1 tag.
7B is a schematic diagram showing conversion between a 12b L1 tag and a 16b L2 tag.
8A is a schematic graph showing the effect of L1 tag length on L1 PUMP flush.
8B is a schematic graph showing the effect of L1 tag length on L1 PUMP miss rate.
9A is a series of bar graphs showing miss rates for different policies.
9B is a line graph depicting cache hit rates for four exemplary microarchitectural optimizations.
9C is a line graph showing miss service performance.
9D is a line graph showing miss handler hit rate based on capacity.
9E is a series of bar graphs illustrating the impact of optimization on a composite policy.
10A is a series of graphs illustrating the runtime overhead of an optimized implementation.
10B is a series of histograms illustrating the energy overhead of an optimized implementation.
10C is a series of bar graphs depicting the absolute power of an optimized implementation compared to a baseline.
11A is a series of shaded graphs illustrating the runtime overhead impact of tag bit length and UCP-cache ($) capacity for different representative benchmarks.
11B is a series of shaded graphs illustrating the energy overhead impact of tag bit length and UCP-$ capacity for different representative benchmarks.
12A is a series of graphs depicting the runtime impact of optimization on representative benchmarks: A: simple; B: A + Op grouping; C: B + DRAM compression; D: C + (10b L1, 14b, L2) short tag; E: D + (2048-UCP; 512-CTAG).
12B is a series of graphs depicting the energy impact of optimization on representative benchmarks: A: simple; B: A + Op grouping; C: B + DRAM compression; D: C + (10b L1, 14b, L2) short tag; E: D + (2048-UCP; 512-CTAG).
13A is a series of graphs illustrating runtime policy impact in composites for representative benchmarks.
13B is a series of graphs illustrating energy policy impacts in the complex.
FIG. 14 is a first table, labeled "Table 1", which provides a summary of the policies examined.
FIG. 15 is a second table, labeled "Table 2", which provides a summary of the classification of tagging schemes.
FIG. 16 is a third table, labeled "Table 3", which provides a summary of memory resource estimates for the baseline and the simple PUMP-extended processor.
FIG. 17 is a fourth table, labeled "Table 4", which provides a summary of the PUMP parameter ranges used in the experiments.
FIG. 18 is a fifth table, labeled "Table 5", which provides a summary of memory resource estimates for the PUMP-optimized processor.
FIG. 19 is a first algorithm, labeled "Algorithm 1", which provides a summary of the taint tracking miss handler.
FIG. 20 is a second algorithm, labeled "Algorithm 2", which provides a summary of the N-policy miss handler.
FIG. 21 is a third algorithm, labeled "Algorithm 3", which provides a summary of the N-policy miss handler with hardware support.
FIG. 22 is a schematic diagram of the PUMP rule cache data flow and microarchitecture.
FIG. 23 is a schematic diagram of the PUMP microarchitecture.
FIG. 24 is a schematic diagram similar to FIG. 1, illustrating an example PUMP cache integrated as a pipeline stage in a processor pipeline, together with its opgroup translation, UCP and CTAG caches.
FIG. 25 is an example of control status registers (CSRs) in an embodiment in accordance with the techniques herein.
FIG. 26 is an example of tagmodes in an embodiment in accordance with the techniques herein.
FIG. 27 is an example illustrating a separate metadata processing subsystem/domain having a separate processor in an embodiment in accordance with the techniques herein.
FIG. 28 illustrates PUMP inputs and outputs in an embodiment in accordance with the techniques herein.
FIG. 29 illustrates inputs and outputs in relation to an opgroup table in an embodiment in accordance with the techniques herein.
FIG. 30 illustrates processing performed by the PUMP in an embodiment in accordance with the techniques herein.
FIGS. 31 and 32 provide additional details regarding control and selection of PUMP inputs and outputs in embodiments in accordance with the techniques herein.
FIG. 33 is an example illustrating a six-stage processing pipeline in an embodiment in accordance with the techniques herein.
FIGS. 34 to 38 are examples illustrating sub-instructions and associated techniques in embodiments.
FIGS. 36 to 38 are examples illustrating sub-instructions and associated techniques in embodiments.
FIGS. 39 to 42 are examples illustrating byte-level tagging and associated techniques in an embodiment.
FIG. 43 is an example illustrating a variable-length opcode in an embodiment in accordance with the techniques herein.
FIG. 44 is an example illustrating an opcode mapping table in an embodiment in accordance with the techniques herein.
FIG. 45 is an example illustrating a shared page in an embodiment in accordance with the techniques herein.
FIG. 46 is an example illustrating transfer of control in an embodiment in accordance with the techniques herein.
FIG. 47 is an example illustrating a call stack in an embodiment in accordance with the techniques herein.
FIGS. 48 and 49 are examples illustrating memory location tagging or coloring in embodiments in accordance with the techniques herein.
FIG. 50 is an example illustrating setjmp and longjmp in an embodiment in accordance with the techniques herein.
FIGS. 51, 52 and 53 are tables of different runtime behaviors, associated protective actions, and the mechanisms used to implement those protective actions in embodiments in accordance with the techniques herein.
FIGS. 54, 55 and 56 are examples illustrating processing that may be performed to learn or determine policy rules in embodiments in accordance with the techniques herein.
FIGS. 57, 58, 59 and 60 are examples illustrating components in an embodiment with respect to conversion between external and internal tagged versions of data.
FIGS. 61, 62 and 63 are examples illustrating aspects of performing tag prediction in an embodiment in accordance with the techniques herein.
FIGS. 64 and 65 illustrate the use of the colorized memory location detection technique of the present application where memory is allocated in an embodiment.
FIGS. 66 and 67 illustrate different components for providing hardware rule support in embodiments in accordance with the techniques herein.
FIGS. 68 to 70 are examples illustrating the use of the techniques herein in embodiments where the PUMP returns values.
FIG. 71 is an example illustrating the use of the techniques herein in an embodiment having a sequence of instructions.
FIG. 72 is a flow diagram of processing steps that may be performed in connection with booting a system in an embodiment in accordance with the techniques herein.
FIG. 73 is an example of a tree tag hierarchy associated with tag occurrence in an embodiment in accordance with the techniques herein.
FIGS. 74, 75, 76 and 77 are examples illustrating aspects and characteristics related to an I/O PUMP in an embodiment in accordance with the techniques herein.
FIGS. 78, 79, 80, 81 and 82 are examples illustrating layers used in relation to storing and determining a tag value in an embodiment in accordance with the techniques herein.
FIGS. 83 and 84 are examples illustrating control flow integrity and associated processing in embodiments in accordance with the techniques herein.

The following paragraphs describe various embodiments and aspects of a Programmable Unit for Metadata Processing (PUMP) that indivisibly associates a metadata tag with every word in the system's main memory, caches, and registers. To support unbounded metadata, the tag is large enough to indirect to a data structure in memory. On every instruction, the tags of the inputs are used to determine whether the operation is allowed and, if so, to compute the tags of the results. In some embodiments, the tag checking and propagation rules are defined in software; however, to minimize performance impact, these rules are cached in the PUMP rule cache, a hardware structure that operates in parallel with the arithmetic logic unit (ALU) portion of the processor. In some embodiments, a miss handler, which may be implemented using, for example, software and/or hardware, may be used to service cache misses based on the policy currently in effect.

The performance impact of the PUMP may be measured in at least one embodiment using a composition of four different policies that stress the PUMP in different ways and illustrate a range of security properties (see FIG. 14): (1) a Non-Executable Data and Non-Writable Code (NXD+NWC) policy that uses tags to distinguish code from data in memory and provides protection against simple code injection attacks; (2) a Memory Safety policy that detects all spatial and temporal violations in heap-allocated memory, extending to an effectively unlimited (2^60) number of colors ("taint marks"); (3) a Control-Flow Integrity (CFI) policy that restricts indirect control transfers to only the allowed edges in a program's control flow graph, preventing return-oriented-programming-style attacks (fine-grained CFI is enforced, not coarse-grained approximations that are potentially vulnerable to attack); and (4) a fine-grained Taint Tracking policy (a generalization) in which each word can potentially be tainted by multiple sources (libraries and I/O streams) simultaneously.

The foregoing are examples of well-known policies that may be used in an embodiment in accordance with the techniques herein. For such well-known policies, whose protective capabilities have been established in the literature, the techniques herein may be used to enforce the policies while also reducing the performance impact of enforcing them with the PUMP. Except for NXD+NWC, each of these policies must distinguish an essentially unlimited number of unique items; in contrast, solutions with a limited number of metadata bits can, at best, support only grossly simplified approximations.

As shown and described elsewhere herein, an embodiment in accordance with the techniques herein may use a simple, direct implementation of the PUMP in which pointer-sized (64b or byte) tags on 64b words at least double the size and energy usage of all the memories in the system. Rule caches add area and energy on top of this. In this particular embodiment, an area overhead of 190% (see FIG. 16) and a geometric-mean energy overhead of about 220% were measured; moreover, runtime overhead may exceed 300% for some applications. Such high overheads would discourage adoption if they were the best that could be done.

However, as described in more detail below, most policies exhibit spatial and temporal locality for both tags and the rules defined over them. Accordingly, an embodiment in accordance with the techniques herein may significantly reduce the number of unique rules by defining rules over groups of similar (or even identical) instructions, which reduces compulsory misses and increases the effective capacity of the rule caches. Off-chip memory traffic may be reduced by exploiting the spatial locality of tags. On-chip area and energy overhead may be minimized by using a small number of bits to represent the subset of pointer-sized tags in use at a time. The runtime cost of composite policy miss handlers may be decreased by providing hardware support for caching component policies. An embodiment in accordance with the techniques herein may therefore include such optimizations, allowing the PUMP to achieve lower overheads without compromising its rich policy model.

An embodiment in accordance with the techniques herein may augment memory words and internal processor state with metadata that can be used to encode any number of security policies, which may be enforced separately or simultaneously. An embodiment may achieve this by adding, to a "conventional" processor (e.g., RISC-CPU, GPU, vector processor, etc.), a metadata processing unit (the PUMP) that works in parallel with the data flow; in particular, the techniques of the present disclosure make the metadata unbounded and software programmable, so that the techniques herein may be adapted and applied to a wide range of metadata processing policies. For example, the PUMP may be integrated as a new, separate pipeline stage of a conventional (RISC) processor, or as a stand-alone piece of hardware working in parallel with the "host" processor. For the former case, there may be an instruction-level simulator, elaborated policies, implementation optimizations and resource estimates, and extensive simulations to characterize the design.

Existing solutions that attempt to enforce policies at a fine (i.e., instruction) granularity level cannot enforce an arbitrary set of policies. Commonly, only a small number of fixed policies can be enforced at the instruction level. Enforcing policies at a higher granularity level (i.e., thread) cannot prevent certain classes of Return Oriented Programming attacks, which limits the usefulness of that type of enforcement. In contrast, an embodiment in accordance with the techniques herein allows the expression of an unlimited number of policies that may be enforced at the instruction level, singly or simultaneously (the only limit is the size of the address space, since the metadata is expressed in terms of address pointers that can point to any arbitrary data structure).

It should be noted that the various figures described in the following paragraphs illustrate various examples, methods, and other example embodiments of various aspects of the techniques described herein. It will be appreciated that in such figures, the illustrated element boundaries (e.g., boxes, groups of boxes, or other shapes) generally represent one example of the boundaries. One of ordinary skill in the art will appreciate that in some examples one element may be designed as multiple elements, or multiple elements may be designed as one element. In some examples, an element shown as an internal component of another element may be implemented as an external component, and vice versa. Furthermore, elements may not be drawn to scale.

Referring to FIG. 1, the Programmable Unit for Metadata Processing (PUMP) 10 may be integrated into a conventional Reduced Instruction Set Computing or Computer (RISC) processor having an in-order implementation and a 5-stage pipeline suitable for energy-conscious applications, which effectively becomes a 6-stage pipeline with the addition of the PUMP 10. The first stage is a fetch stage 14, the second stage is a decode stage 16, the third stage is an execute stage 18, the fourth stage is a memory stage 20, and the fifth stage is a writeback stage 22. The PUMP 10 is interposed between the memory stage 20 and the writeback stage 22.

Various embodiments may implement the PUMP 10 using electronic logic that serves as the mechanism providing policy enforcement and metadata propagation. Embodiments of the PUMP 10 may be characterized by: (i) an empirical evaluation of the runtime, energy, power ceiling, and area impacts of a simple implementation of the PUMP 10 on a standard set of benchmarks under four diverse policies and their combination; (ii) a set of micro-architectural optimizations; and (iii) typical runtime overhead under 10%, a power ceiling impact of 10%, and typical energy overhead under 60%, obtained by using 110% additional area for on-chip memory structures.

In computing, benchmarking may be characterized as the act of running a computer program, a set of programs, or other operations in order to assess the relative performance of an object, normally by running a number of standard tests and trials against it. The term "benchmark" as used herein refers to the benchmarking programs themselves. The benchmark programs used throughout this application and the figures are GemsFDTD, astar, bwaves, bzip2, cactusADM, calculix, dealII, gamess, gcc, gobmk, gromacs, h264ref, hmmer, lbm, leslie3d, libquantum, mcf, milc, namd, omnetpp, perlbench, sjeng, specrand, sphinx3, wrf, zeusmp, and mean. See, for example, FIGS. 10A, 10B and 10C.

"Logic", as used herein, includes but is not limited to hardware, firmware, software, and/or combinations of each to perform a function(s) or an action(s), and/or to cause a function or action from another logic, method, and/or system. For example, based on a desired application or need, logic may include a software-controlled microprocessor, discrete logic such as a processor (e.g., a microprocessor), an application specific integrated circuit (ASIC), a programmed logic device, a memory device containing instructions, an electric device having a memory, or the like. Logic may include one or more gates, combinations of gates, or other circuit components. Logic may also be fully embodied as software. Where multiple logics are described, it may be possible to incorporate the multiple logics into one physical logic. Similarly, where a single logic is described, it may be possible to distribute that single logic among multiple physical logics.

In at least one embodiment in accordance with the techniques herein, the PUMP 10 may be characterized as an extension to a conventional RISC processor 12. The following paragraphs provide additional details of the instruction set architecture (ISA)-level extensions that constitute the hardware interface layer of the PUMP 10, the basic micro-architectural changes, and the accompanying low-level software that may be used in an embodiment in accordance with the techniques herein.

In an embodiment in accordance with the techniques herein, each word in a PUMP-enriched system may be associated with a pointer-sized tag. These tags are uninterpreted at the hardware level. At the software level, a tag may represent metadata of unbounded size and complexity, as defined by the policy. Simpler policies that need only a few bits of metadata may store the metadata directly in the tag; if more bits are required, indirection is used to store the metadata as a data structure in memory, with the address of this structure used as the tag. Notably, these pointer-sized tags are one exemplary aspect of the present disclosure and are not to be considered limiting. The basic addressable memory word is indivisibly extended with a tag, making all value slots, including memory, caches, and registers, suitably wider. The program counter (PC) is also tagged. This notion of software-defined metadata and its representation as a pointer-sized tag extends previous tagging approaches, in which only a few bits are used for tags and/or the tags are hardwired to fixed interpretations. Some exemplary classifications of tagging schemes are presented in Table 2, reproduced in FIG. 15.
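The direct-versus-indirect use of a tag described above can be sketched in software. The following is a minimal illustration only, not the encoding used by any embodiment: the low-bit convention for telling inline metadata from a pointer, the `INLINE_LIMIT` threshold, the `TagSpace` class, and the simulated heap addresses are all assumptions made for this example. The hardware would not interpret such bits; a convention like this would live entirely in policy software.

```python
from dataclasses import dataclass, field

INLINE_LIMIT = 8  # hypothetical: metadata values below this fit directly in the tag


@dataclass
class TagSpace:
    """Toy model of pointer-sized tags (all conventions here are assumptions)."""
    heap: dict = field(default_factory=dict)  # simulated memory holding metadata structures
    next_addr: int = 0x1000                   # illustrative, 8-byte-aligned allocation cursor

    def encode(self, metadata):
        """Return a tag for the metadata, inline when small, by indirection otherwise."""
        if isinstance(metadata, int) and 0 <= metadata < INLINE_LIMIT:
            return (metadata << 1) | 1        # low bit 1 marks an inline value
        addr = self.next_addr                 # aligned address, so its low bit is 0
        self.next_addr += 8
        self.heap[addr] = metadata
        return addr                           # the address itself serves as the tag

    def decode(self, tag):
        """Recover the metadata that a tag stands for."""
        if tag & 1:
            return tag >> 1
        return self.heap[tag]
```

A policy that needs only a handful of values would stay on the inline path, while a policy with rich metadata (for example, a set of taint sources) would take the indirect path.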

Metadata tags are not addressable by user programs. Rather, the metadata tags are addressed by policy handlers that are invoked on rule cache misses, as detailed below. All updates to tags are realized through PUMP 10 rules.

Besides unbounded metadata, another feature of an embodiment of the PUMP 10 in accordance with the techniques herein is hardware support for single-cycle common-case computation on metadata. These computations are defined in terms of rules of the form opcode: (PC, CI, OP1, OP2, MR) => (PC_new, R), which should be read: "If the current opcode is opcode, the current tag on the program counter is PC, the tag on the current instruction is CI, the tags on its input operands (if any) are OP1 and OP2, and the tag on the memory location (in the case of a load or store) is MR, then the tag on the program counter in the next machine state should be PC_new and the tag on the instruction's result (a destination register or a memory location, if any) should be R". This rule format, which allows two output tags to be computed from up to five input tags, is markedly more flexible than those considered in prior work, which typically computes one output from up to two inputs (see Table 2 in FIG. 15). Beyond previous solutions that only track data tags (OP1, OP2, MR, R), the present disclosure provides a current instruction tag (CI) that can be used to track and enforce the provenance, integrity, and usage of code blocks, as well as a PC tag that can be used to record execution history, ambient authority, and "control state" including implicit information flows. The CFI policy exploits the PC tag for recording the sources of indirect jumps and the CI tag for identifying jump targets; NXD+NWC leverages the CI to enforce that data is not executable; and Taint Tracking uses the CI to taint data based on the code that produced it.
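To make the rule shape concrete, the following minimal sketch models a rule as a function from the five input tags to the two output tags. It is an illustration under stated assumptions, not the taint policy of FIG. 19: the names `RuleInput`, `RuleOutput`, and `taint_rule`, the use of Python frozensets as tags, and the specific propagation choices (for instance, including the address operand's taint on a load) are all hypothetical.

```python
from typing import FrozenSet, NamedTuple

Tag = FrozenSet[str]          # assumption: a tag is modeled as a set of taint-source names
EMPTY: Tag = frozenset()


class RuleInput(NamedTuple):
    opcode: str
    pc: Tag    # tag on the program counter
    ci: Tag    # tag on the current instruction
    op1: Tag   # tag on the first input operand
    op2: Tag   # tag on the second input operand
    mr: Tag    # tag on the memory location (loads and stores)


class RuleOutput(NamedTuple):
    pc_new: Tag  # tag on the program counter in the next machine state
    r: Tag       # tag on the instruction's result


def taint_rule(inp: RuleInput) -> RuleOutput:
    """Illustrative taint propagation: the result carries the union of its sources."""
    if inp.opcode == "add":
        # The CI tag contributes, so data is tainted by the code that produced it.
        return RuleOutput(inp.pc, inp.ci | inp.op1 | inp.op2)
    if inp.opcode == "load":
        # op1 is taken to be the address operand; mr is the tag of the word read.
        return RuleOutput(inp.pc, inp.ci | inp.op1 | inp.mr)
    raise ValueError(f"no rule defined for opcode {inp.opcode!r}")
```

Each distinct `(opcode, PC, CI, OP1, OP2, MR)` combination together with its computed `(PC_new, R)` corresponds to one entry that a rule cache could hold.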

To resolve rules in a single cycle in the common case, an embodiment in accordance with the techniques herein may use a hardware cache of the most recently used rules. Depending on the instruction and the policy, one or more of the input slots in a given rule may be unused. To avoid polluting the cache with rules for all possible values of the unused slots, the rule-cache lookup logic refers to a bit vector containing a "don't-care" bit (see FIG. 1) for each input slot-opcode pair, which determines whether the corresponding tag is actually used in the rule cache lookup. To handle these "don't-care" inputs efficiently, they are masked out before the inputs are presented to the PUMP 10. The don't-care bit vectors are set by a privileged instruction as part of the miss handler installation.
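The effect of don't-care masking on cache occupancy can be sketched as follows. This is a software analogy only: the `CARE` table contents, the `DONT_CARE` sentinel, and the dictionary-backed `RuleCache` are assumptions for illustration and do not describe the hardware structure.

```python
DONT_CARE = None  # sentinel substituted for masked slots (tags are modeled as ints here)

# Hypothetical per-opcode care vectors over the slots (PC, CI, OP1, OP2, MR);
# True means the slot's tag takes part in the rule cache lookup.
CARE = {
    "add":  (True, True, True, True, False),   # no memory operand, so MR is ignored
    "load": (True, True, True, False, True),   # single register operand, so OP2 is ignored
}


def mask(opcode, tags):
    """Build the lookup key, replacing don't-care slots with a fixed sentinel."""
    care = CARE[opcode]
    return (opcode,) + tuple(t if c else DONT_CARE for t, c in zip(tags, care))


class RuleCache:
    def __init__(self):
        self.rules = {}

    def install(self, opcode, tags, result):
        self.rules[mask(opcode, tags)] = result

    def lookup(self, opcode, tags):
        """Return the cached (PC_new, R) pair, or None on a miss."""
        return self.rules.get(mask(opcode, tags))
```

Because masked slots collapse to one value, two `add` instructions that differ only in a stale MR tag share a single cache entry instead of occupying two.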

FIG. 1 generally illustrates an embodiment in accordance with the techniques herein having a revised 5-stage processor 12 pipeline that incorporates the PUMP 10 hardware. The rule cache lookup is added as an additional stage, and tags and data are bypassed independently, so that the PUMP 10 stage does not create additional stalls in the processor pipeline.

Placing the PUMP 10 as a separate stage (between the memory stage 20 and the writeback stage 22) is motivated by the need to provide, as an input to the PUMP 10, the tag on the word read from memory (load) or the tag on the word about to be overwritten in memory (store). Since rules are allowed that depend on the existing tag of the memory location being written, write operations become read-modify-write operations. The existing tag is read during the memory stage 20 like a read rule, the read rule is checked in the PUMP 10 stage, and the write is performed during the commit stage, which may also be referred to as the writeback stage 22. As with any caching scheme, multiple levels of caches may be used for the PUMP 10. As described in more detail below, an embodiment in accordance with the techniques herein may use two levels of caches. The extension to multiple levels of caches is readily apparent to one of ordinary skill in the art.

In one non-limiting example, when a last-level miss occurs in the rule cache in the writeback stage 22, it is handled as follows: (i) the current opcode and tags are saved in a (new) set of processor registers used only for this purpose, and (ii) control is transferred to the policy miss handler (described in more detail below), which (iii) decides whether the operation is allowed and, if so, generates an appropriate rule. When the miss handler returns, the hardware (iv) installs this rule into the PUMP 10 rule cache and (v) re-issues the faulting instruction. To provide isolation between the privileged miss handler and the rest of the system software and user code, a miss-handler operational mode is added to the processor, controlled by a bit in the processor state that is set on a rule cache miss and reset when the miss handler returns. To avoid the need to save and restore registers on every rule cache miss, the integer register file may be expanded with 16 additional registers that are available only to the miss handler.
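Steps (i) through (v) above can be traced in a small software model. This is a sketch under assumptions, not the hardware mechanism: the `Pump` class, the `PolicyViolation` exception, the string tags, and the simplified NXD+NWC-style handler are hypothetical names and behaviors chosen for illustration.

```python
class PolicyViolation(Exception):
    """Raised by a miss handler when the policy disallows the operation."""


def nxd_nwc_handler(opcode, tags):
    """Simplified stand-in policy: only CODE may execute; CODE may not be overwritten."""
    pc, ci, op1, op2, mr = tags
    if ci != "CODE":
        raise PolicyViolation("attempt to execute a word not tagged CODE")
    if opcode == "store" and mr == "CODE":
        raise PolicyViolation("attempt to overwrite a word tagged CODE")
    return (pc, "DATA")  # (PC_new, R): results are always plain data


class Pump:
    def __init__(self, miss_handler):
        self.rule_cache = {}
        self.miss_handler = miss_handler
        self.in_miss_handler_mode = False  # models the mode bit in the processor state
        self.misses = 0

    def check(self, opcode, tags):
        key = (opcode, tags)               # (i) the opcode and tags saved for the handler
        while key not in self.rule_cache:
            self.misses += 1
            self.in_miss_handler_mode = True                  # (ii) enter the miss handler
            try:
                result = self.miss_handler(opcode, tags)      # (iii) decide and build the rule
            finally:
                self.in_miss_handler_mode = False             # mode bit reset on return
            self.rule_cache[key] = result                     # (iv) install the rule
            # (v) the loop condition re-issues the lookup, which now hits
        return self.rule_cache[key]
```

A disallowed operation never produces a cache entry in this model, so every repeat of the violation reaches the handler again rather than being silently allowed.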

Additionally, the rule inputs and outputs appear as registers while in miss-handler mode (e.g., as with register windows), allowing the miss handler (but nothing else) to manipulate the tags as ordinary values. Again, these are all non-limiting examples of the writeback stage 22.

A new miss-handler-return instruction is added to install the rule into the PUMP 10 rule cache and return to user code. In this particular non-limiting embodiment, this instruction can only be issued while in miss-handler mode. While in miss-handler mode, the rule cache is ignored and the PUMP 10 instead applies a single hardwired rule: all instructions and data touched by the miss handler must be tagged with a predefined MISSHANDLER tag, and all instruction results are given the same tag. In this way, the PUMP 10 architecture prevents user code from undermining the protection provided by the policy. Alternatively, the PUMP may be used to enforce flexible rules on miss-handler access. Tags are not divisible, addressable, or replaceable by user code; metadata data structures and miss handler code cannot be touched by user code; and user code cannot directly insert rules into the rule cache.

Referring to FIG. 19, Algorithm 1 illustrates the operation of the miss handler for a taint-tracking policy. To minimize the number of distinct tags (and hence rules), the miss handler "canonicalizes" any new data structure that it builds, so that logically equivalent metadata always uses a single tag.
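Canonicalization can be illustrated with a small interning table. The sketch below is not Algorithm 1 of FIG. 19, which is not reproduced in this text; it assumes, for illustration only, that a taint tag denotes a set of taint labels and that a result carries the union of its inputs' taints. `TaintTags` and `taint_result_tag` are hypothetical names.

```python
from typing import Dict, FrozenSet, List


class TaintTags:
    """Interns taint sets so that equal sets always share one tag."""

    def __init__(self) -> None:
        self._tag_of: Dict[FrozenSet[str], int] = {}
        self._set_of: List[FrozenSet[str]] = []

    def canonicalize(self, taints) -> int:
        key = frozenset(taints)
        if key not in self._tag_of:          # first time this set is seen
            self._tag_of[key] = len(self._set_of)
            self._set_of.append(key)
        return self._tag_of[key]

    def taints(self, tag: int) -> FrozenSet[str]:
        return self._set_of[tag]


def taint_result_tag(tags: TaintTags, *input_tags: int) -> int:
    """Assumed rule: the result carries the union of all input taints."""
    merged = frozenset().union(*(tags.taints(t) for t in input_tags))
    return tags.canonicalize(merged)


tags = TaintTags()
clean = tags.canonicalize([])
net = tags.canonicalize(["network"])
file_ = tags.canonicalize(["file"])
a = taint_result_tag(tags, net, file_)
b = taint_result_tag(tags, file_, net, clean)  # same set, built in another order
```

Because both results denote the same set, they receive the same tag, so a rule cached for one also matches the other.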

Rather than forcing users to choose a single policy, multiple policies may be enforced simultaneously and new ones added later. An exemplary advantage of these "unbounded" tags is that they can enforce any number of policies at the same time. This can be achieved by letting a tag be a pointer to a tuple of tags from several component policies. For example, to combine the NXD+NWC policy with the taint-tracking policy, each tag can be a pointer to a tuple (s, t), where s is an NXD+NWC tag (either DATA or CODE) and t is a taint tag (a pointer to a set of taints). The rule cache lookup is similar; when a miss occurs, however, the two component policies are evaluated separately: the operation is allowed only if both policies allow it, and the resulting tags are pairs of the results from the two component policies. In other embodiments, however, it may be possible to express how the policies are to be combined (rather than simply as an AND among all constituent components).

Referring to FIG. 20, Algorithm 2 illustrates the general behavior of the composite miss handler for any N policies. Depending on how correlated the tags in the tuple are, this can result in a large increase in the number of tags and hence rules. To demonstrate the ability to support multiple policies simultaneously and to measure the effect on working-set size, a composite policy ("Composite") was implemented experimentally; it comprises all four policies described above. The composite policy represents the kind of policy workload that is supported and is described in further detail below. As shown in FIGS. 4A and 20, the composite policy simultaneously enforces the following policies: (i) spatial and temporal memory safety, (ii) taint tracking, (iii) control-flow integrity, and (iv) code and data separation.
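The composite behavior can be sketched as follows. This is an illustrative model rather than Algorithm 2 of FIG. 20 itself: each composite tag is an index into a table of N-tuples (standing in for a pointer), every component policy is consulted on a miss, and the operation is allowed only if all components allow it. The two component policies shown are simplified, hypothetical stand-ins.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# A component policy maps (opcode, component input tags) to a result tag,
# or returns None when it disallows the operation.
Policy = Callable[[str, Sequence[object]], Optional[object]]


class TupleTags:
    """Interns N-tuples of component tags; the index plays the role of the pointer."""

    def __init__(self) -> None:
        self._tag_of: Dict[Tuple[object, ...], int] = {}
        self._tuple_of: List[Tuple[object, ...]] = []

    def intern(self, parts: Tuple[object, ...]) -> int:
        if parts not in self._tag_of:
            self._tag_of[parts] = len(self._tuple_of)
            self._tuple_of.append(parts)
        return self._tag_of[parts]

    def parts(self, tag: int) -> Tuple[object, ...]:
        return self._tuple_of[tag]


def composite_miss_handler(policies: Sequence[Policy], table: TupleTags,
                           opcode: str, input_tags: Sequence[int]) -> Optional[int]:
    unpacked = [table.parts(t) for t in input_tags]
    results = []
    for i, policy in enumerate(policies):
        result = policy(opcode, [parts[i] for parts in unpacked])
        if result is None:      # one component rejects, so the operation is disallowed
            return None
        results.append(result)
    return table.intern(tuple(results))  # canonical tag for the tuple of results


def nxd_nwc(opcode, tags):
    """Hypothetical NXD+NWC check; inputs are (instruction tag, memory tag)."""
    instr, mem = tags
    if instr != "CODE" or (opcode == "store" and mem == "CODE"):
        return None
    return "DATA"


def taint(opcode, tags):
    """Hypothetical taint rule: union of the input taint sets."""
    return frozenset().union(*tags)


table = TupleTags()
instr = table.intern(("CODE", frozenset()))
tainted = table.intern(("DATA", frozenset({"net"})))
code_word = table.intern(("CODE", frozenset()))
ok = composite_miss_handler([nxd_nwc, taint], table, "store", [instr, tainted])
bad = composite_miss_handler([nxd_nwc, taint], table, "store", [instr, code_word])
```

Interning the result tuple keeps the number of distinct composite tags down to the combinations that actually occur, which is the quantity the working-set measurements in this section depend on.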

Most policies dispatch on the opcode to select the appropriate logic. Some policies, such as NXD+NWC, simply check whether the operation is allowed. Others may consult a data structure (e.g., the CFI policy consults the graph of allowed indirect call and return ids). Memory safety checks equality between address colors (i.e., pointer colors) and memory region colors. Taint tracking computes fresh result tags by combining the input tags (Algorithm 1). Policies that must access large data structures (CFI) or canonicalize across large aggregates (taint tracking, composite) can make many memory accesses that miss in the on-chip caches and go to DRAM. Averaged across all benchmarks, servicing a miss required 30 cycles for NXD+NWC, 60 cycles for memory safety, 85 cycles for CFI, 500 cycles for taint tracking, and 800 cycles for composite.

If the policy miss handler determines that the operation is not allowed, it invokes a suitable security fault handler. What this fault handler does is up to the runtime system and the policy; typically, it terminates the offending process, but in some cases it may return a suitable "safe value" instead. For incremental deployment with UNIX-style operating systems, policies are assumed to be applied per process, allowing each process to have a different set of policies. The recitation of per-process application is non-limiting and merely exemplary, as one of ordinary skill in the art will recognize. Per-process application also allows tags, rules, and miss-handling support to be placed in the address space of the process, avoiding the need for an OS-level context switch. In the longer term, PUMP policies may perhaps be used to protect the OS as well.

The following detailed evaluation methodology for measuring runtime, energy, area, and power overheads is applied to a simple implementation of the PUMP hardware and software, using 128b words (64b payload and 64b tag) and the modified pipelined processor 12 shown in FIG. 1. Although the optimized implementation is the version whose overheads (relative to the baseline processor) are ultimately of interest, it is useful to describe and measure the simple PUMP implementation first. Both are described because the simple implementation details the basic version of the key mechanisms before the more sophisticated version is reached.

To evaluate the physical resource impact of the PUMP, the focus was primarily on memory costs, since memories are the dominant area and energy consumers in a simple RISC processor and in the PUMP hardware extensions. A 32 nm Low Operating Power (LOP) process is considered for the L1 memories (see FIG. 1) and Low Standby Power (LSTP) for the L2 memories, and CACTI 6.5 is used to model the area, access time, energy per access, and static (leakage) power of the main memory and the processor on-chip memories.

The baseline processor (no PUMP) has separate 64KB L1 caches for data and instructions and a unified 512KB L2 cache. Delay-optimized L1 caches and an energy-optimized L2 cache were used. All caches use a writeback discipline. The baseline L1 cache has a latency of about 880 ps; it is assumed that it can return a result in one cycle, and its clock is set to 1 ns, giving a 1 GHz cycle target comparable to modern embedded and cell phone processors. The parameters of this processor are presented in Table 3 of FIG. 16.

One embodiment of a PUMP rule cache 10 hardware implementation may include two parts: extending all architectural state in stages 14, 16, 20 with tags, and adding the PUMP rule caches to processor 12. Extending each 64b word in the on-chip memories with a 64b tag increases their area and energy per access and worsens their access latency. This is potentially tolerable for the L2 cache, which already has a multi-cycle access latency and is not used every cycle. Adding an extra cycle of latency to access the L1 caches (see FIG. 1), however, can lead to stalls in the pipeline. To avoid this, in this simple implementation the effective capacity of the L1 caches is reduced to half that of the baseline design and tags are then added; this gives the same single-cycle access to the L1 caches but can degrade performance due to increased misses.

In an embodiment according to the techniques herein, the PUMP rule cache 10 uses a long match key (5 pointer-sized tags plus an instruction opcode, or 328b), compared to a traditional cache address key (less than the address width), and returns a 128b result. In one embodiment, a fully associative L1 rule cache may be used, but it would lead to high energy and delay (see Table 3 of FIG. 16). As an alternative, an embodiment according to the techniques herein may use a multi-hash cache scheme with four hash functions, as shown in FIG. 22. The L1 rule cache is designed to produce a result in a single cycle, checking for a false hit in the second cycle, while the L2 rule cache is designed for low energy, giving a multi-cycle access latency. Again, Table 3 of FIG. 16 shows the parameters of the 1024-entry L1 and 4096-entry L2 rule caches used in the simple implementation. When these caches reach capacity, a simple first-in-first-out (FIFO) replacement policy is used, which appears to work well in practice for the current workloads (FIFO is within 6% of LRU here).

Referring to FIG. 2, the performance impact of the PUMP is evaluated using a combination of ISA, PUMP, and address-trace simulators. The gem5 simulator 24 generates instruction traces for the SPEC CPU2006 programs (omitting xalancbmk and tonto, on which gem5 fails) on a 64-bit Alpha baseline ISA. Each program is simulated for each of the four policies listed above and for the composite policy, with a warm-up period of 1B instructions followed by an evaluation of the next 500M instructions. In the gem5 simulator 24, each benchmark is run on the baseline processor with no tags or policies. The resulting instruction trace 26 is then run through the PUMP simulator 28, which performs the metadata computation for each instruction. This "phased" simulation strategy is accurate for fail-stop policies, in which the PUMP's results cannot cause a program's control flow to diverge from its baseline execution. While address-trace simulations can be inaccurate for highly pipelined and out-of-order processors, they are quite accurate for simple, in-order, 5- and 6-stage pipelines. On the baseline configuration, gem5 instruction simulation and address trace generation 30, followed by custom address-trace simulation and accounting in address simulator 32, were within 1.2% of gem5's cycle-accurate simulations.

The PUMP simulator 28 includes miss-handler code (written in C) implementing each policy, and metadata tags are assigned on the initial memory depending on the policy. The PUMP simulator 28 captures the access patterns in the PUMP 10 rule caches and estimates the associated runtime and energy costs, accounting for the longer wait cycles required to access the L2 rule cache. Since the miss-handler code of the PUMP simulator 28 also runs on the processor, separate simulations of the miss handlers are performed on gem5 to capture their dynamic behavior. Since the miss-handler code potentially impacts the data and instruction caches, a merged address trace is created that includes properly interleaved memory accesses from both user and miss-handler code; this trace is used for the final address-trace simulation to estimate the performance impact on the memory system.

The following paragraphs present an evaluation of the simple PUMP implementation in comparison with the no-PUMP baseline.

As one point of evaluation, it should be noted that the overall area overhead of the PUMP 10 on top of the baseline processor is 190% (see Table 3 of FIG. 16). The dominant portion of this area overhead (110%) comes from the PUMP 10 rule caches. The unified L2 cache contributes most of the remaining area overhead. The L1 D/I caches stay roughly the same, since their effective capacity is halved. This high memory area overhead roughly triples the static power, contributing 24% of the energy overhead.

Another point of evaluation relates to runtime overhead. For all single policies on most benchmarks, the average runtime overhead of even this simple implementation is only 10% (see FIGS. 3A and 3B). To read the boxplots: the bar is the median, the box covers one quartile above and below (the middle 50% of cases), dots represent each individual data point, and whiskers denote the full range except for outliers (more than 1.5x the respective quartile). The dominant overhead comes from the additional DRAM traffic required to transfer tag bits to and from the processor. For the memory safety policy (FIGS. 3A and 3B), a few benchmarks exhibit high miss-handler overhead, pushing their total overhead up to 40-50% owing to compulsory misses on newly allocated memory blocks. For the composite policy runtime (labeled "CPI" or "CPI overhead" in the figures), five of the benchmarks suffer very high overhead in the miss handler (see FIG. 4A), with the worst case close to 780% for GemsFDTD and a geomean reaching 50%. For the composite policy energy shown in FIG. 4B (labeled "EPF" or "EPI overhead" in the figures), three of the benchmarks (e.g., GemsFDTD, astar, omnetpp) suffer very high overhead in the miss handler, with worst cases close to 1600% for GemsFDTD, 600% for astar, and 520% for omnetpp.

Two factors contribute to this overhead: (1) the large number of cycles required to resolve a last-level rule cache miss (since every component miss handler must be consulted), and (2) an explosion in the number of rules, which expands the working-set size and increases the rule cache miss rate. In the worst case, the number of unique composite tags could be the product of the numbers of unique tags in each component policy. However, the total number of rules increases by only a factor of 3x-5x over the largest single policy, memory safety.

Another point of evaluation is energy overhead. Moving more bits, due to wider words, and executing more instructions, due to miss-handler code, both contribute to energy overhead, impacting both the single and composite policies (FIGS. 3B and 4B). The CFI and memory safety policies, and hence the composite policy, access large data structures that often require energy-expensive DRAM accesses. The worst-case energy overhead is close to 400% for single policies and about 1600% for the composite policy, with a geomean overhead of about 220%.

For many platform designs, the worst-case power, or equivalently energy per cycle, is the limiter. This power ceiling may be driven by the maximum current the platform can draw from a battery, or by the maximum sustained operating temperature in a mobile or wired device with ambient cooling. FIG. 4C shows that the simple implementation raises the maximum power ceiling by 76%, with lbm driving the maximum power in both the baseline and the simple PUMP implementations. Note that this increase in the power ceiling is lower than the worst-case energy overhead, in part because some benchmarks slow down by more than the extra energy they consume, and in part because the benchmarks with high energy overhead are the ones that consume the least absolute energy per cycle in the baseline design. Typically, the data working sets of these energy-efficient programs fit in the on-chip caches, so they seldom pay the cost of DRAM accesses.

While embodiments including the implementation described above achieve reasonable performance on most benchmarks, the runtime overhead for the composite policy on some of them, and the energy and power overheads on all policies and benchmarks, appear unacceptably high. To address these overheads, a series of targeted microarchitecture optimizations may be introduced and incorporated into an embodiment according to the techniques herein. In Table 4 of FIG. 17, these optimizations are examined for the impact of the architectural parameters associated with the PUMP components on the overall costs. Grouping of opcodes with identical rules is used to increase the effective capacity of the PUMP rule caches, tag compression to reduce the delay and energy of DRAM transfers, short tags to reduce the area and energy of on-chip memories, and Unified Component Policy (UCP) and Composition Tag (CTAG) caches to decrease the overheads of the miss handlers.

What will now be described are "opgroups" that may be used in an embodiment according to the techniques herein. In practical policies, it is common to define similar rules for several opcodes. For example, in the taint-tracking policy, the rules for the Add and Sub instructions are identical (see Algorithm 1 of FIG. 19). In the simple implementation, however, these rules occupy separate entries in the rule caches. Based on this observation, instruction operation codes ("opcodes") with the same rules are grouped into "opgroups", reducing the number of rules needed. Which opcodes can be grouped together depends on the policy; therefore, the "don't-care" SRAM in execute stage 18 (FIG. 1) is extended to also translate opcodes to opgroups prior to the rule cache lookup. For the composite policy, over 300 Alpha opcodes are reduced to 14 opgroups, and the total number of rules is reduced by a factor of 1.1x-6x, with an average of 1.5x (FIG. 5A measures this effect across all SPEC benchmarks). This effectively increases the rule cache capacity for a given investment in silicon area. Opgroups also reduce the number of compulsory misses, since a miss on a single instruction in the group installs a rule that applies to every instruction opcode in the group. FIG. 5B summarizes the miss rate across all the SPEC benchmarks for different L1 rule cache sizes for the composite policy, with and without opgrouping. FIG. 5B shows that both the range and the mean of the miss rates are reduced by opgrouping. In particular, a 1024-entry rule cache after the opgroup optimization has a lower miss rate than a 4096-entry rule cache without it. A lower miss rate naturally reduces the time and energy spent in miss handlers (see FIGS. 12A and 12B), and smaller rule caches directly reduce area and energy.
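The effect of opgrouping on compulsory misses can be illustrated with a small model. The translation table, the opgroup names, and the handler below are hypothetical; the only point carried over from the description is that the opcode is translated to its opgroup before the rule cache lookup, so one installed rule serves every opcode in the group.

```python
from typing import Callable, Dict, Tuple

Tags = Tuple[int, ...]

# Hypothetical, policy-specific translation table (the role played by the
# extended SRAM in the execute stage).
OPGROUP_OF: Dict[str, str] = {"add": "arith", "sub": "arith", "ld": "load", "st": "store"}


class OpgroupRuleCache:
    """Rule cache keyed by (opgroup, tags) instead of (opcode, tags)."""

    def __init__(self, handler: Callable[[str, Tags], int]) -> None:
        self.rules: Dict[Tuple[str, Tags], int] = {}
        self.handler = handler
        self.misses = 0

    def check(self, opcode: str, tags: Tags) -> int:
        key = (OPGROUP_OF[opcode], tags)   # translate before the lookup
        if key not in self.rules:
            self.misses += 1
            self.rules[key] = self.handler(key[0], tags)
        return self.rules[key]


def union_handler(opgroup: str, tags: Tags) -> int:
    """Hypothetical rule: the result tag is the bitwise OR of the input tags."""
    result = 0
    for tag in tags:
        result |= tag
    return result


cache = OpgroupRuleCache(union_handler)
r_add = cache.check("add", (0b01, 0b10))  # compulsory miss installs the "arith" rule
r_sub = cache.check("sub", (0b01, 0b10))  # hits the rule installed for "add"
```

Keyed by opcode instead, the same two instructions would have taken two misses and occupied two cache entries.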

An embodiment according to the techniques herein may use main memory tag compression, which will now be described. Using 64b tags on 64b words doubles the off-chip memory traffic and therefore approximately doubles the associated energy. Typically, however, tags exhibit spatial locality: many adjacent words have the same tag. For example, FIG. 6A plots the distribution of unique tags for each DRAM transfer for the gcc benchmark with the composite policy, showing that most words have the same tag; on average there are only about 1.14 unique tags per DRAM transfer of an 8-word cache line. This spatial tag locality is exploited to compress the tag bits that must be transferred to and from the off-chip memory. Since data is transferred in cache lines, the cache line is used as the basis for this compression. To keep addressing simple, 128B per cache line is allocated in the main memory.

However, rather than storing 128b tagged words directly, eight 64b words (payloads) are stored, followed by eight 4b indexes and then up to eight 60b tags, as shown in FIG. 6B. The index identifies which of the 60b tags goes with the associated word. The tag is trimmed to 60b to accommodate the indexes, but this does not compromise the use of tags as pointers: assuming byte addressing and 16B (two 64b words) aligned metadata structures, the low 4b of the 64b pointer can be filled in as zeros. As a result, after transferring the 4B of indexes, all that remains is to transfer the unique 7.5B tags in the cache line. For instance, if the same tag is used by all the words in the cache line, then 64B+4B = 68B is transferred in a first read and 8B in a second read, for a total of 76B instead of 128B. The 4b index can be either a direct index or a special value. A special index value is defined to represent a default tag, so that no tag needs to be transferred in that case. By compressing tags in this manner, the average energy overhead per DRAM transfer is reduced from 110% to 15%.
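The layout arithmetic above can be checked with a short sketch. It models only the index and tag bookkeeping, not the DRAM interface. Two details are assumptions made for illustration: the special index value for the default tag is encoded as `0xF`, and the transfer size is rounded up to whole bytes, which reproduces the 76B figure in the example.

```python
import math
from typing import List, Sequence, Tuple

WORDS_PER_LINE = 8
DEFAULT_INDEX = 0xF  # hypothetical encoding of the special "default tag" index


def compress_line_tags(tags: Sequence[int], default_tag: int) -> Tuple[List[int], List[int]]:
    """Return (eight 4b indexes, list of unique 60b tags) for one cache line."""
    if len(tags) != WORDS_PER_LINE:
        raise ValueError("a cache line holds eight tagged words")
    indexes: List[int] = []
    unique: List[int] = []
    for tag in tags:
        if tag & 0xF:
            raise ValueError("tag pointers must be 16B aligned")
        if tag == default_tag:
            indexes.append(DEFAULT_INDEX)   # nothing to transfer for this word
            continue
        short = tag >> 4                    # drop the four always-zero low bits
        if short not in unique:
            unique.append(short)
        indexes.append(unique.index(short))
    return indexes, unique


def decompress_line_tags(indexes: Sequence[int], unique: Sequence[int],
                         default_tag: int) -> List[int]:
    return [default_tag if i == DEFAULT_INDEX else unique[i] << 4 for i in indexes]


def transferred_bytes(num_unique_tags: int) -> int:
    """Payload + indexes + unique tags, rounded up to whole bytes (an assumption)."""
    bits = WORDS_PER_LINE * 64 + WORDS_PER_LINE * 4 + num_unique_tags * 60
    return math.ceil(bits / 8)


line = [0x1000] * WORDS_PER_LINE
indexes, unique = compress_line_tags(line, default_tag=0)
restored = decompress_line_tags(indexes, unique, default_tag=0)
```

With eight distinct tags the same formula gives 64B + 4B + 60B = 128B, so the compressed form never exceeds the 128B allocated per cache line.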

The compression scheme presented above may be used in an embodiment according to the techniques herein due, for example, to its combination of simplicity and effectiveness at reducing off-chip memory energy. One of ordinary skill in the art will clearly recognize that additional, alternative, and more sophisticated schemes for fine-grained memory tagging exist, including multi-level tag page tables, variable-grained TLB-like structures, and range caches, and that these may also be used in an embodiment according to the techniques herein to reduce the DRAM footprint.

How tag translation may be performed in an embodiment according to the techniques herein will now be described. Referring again to FIG. 1, the simple PUMP rule caches are large (adding 110% area) because each cached rule is 456b wide. Supporting the PUMP 10 also required extending the baseline on-chip memories (RFs and L1/L2 caches) with 64b tags. Using a full 64b (or 60b) tag for each 64b word here incurs heavy area and energy overheads. However, a 64KB L1-D$ holds only 8192 words and hence at most 8192 unique tags. Along with a 64KB L1-I$, there can be at most 16384 unique tags in the L1 memory subsystem; these can be represented with just 14b tags, reducing the delay, area, energy, and power in the system. The caches (L1, L2) exist to exploit temporal locality, and this observation suggests that locality can be leveraged to reduce area and energy. If the tag bits are reduced to 14b, the PUMP rule cache match key is reduced from 328b to 78b.
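The widths quoted above follow from simple arithmetic, shown here as a short check. The 8b opcode field is an inference (328b minus five 64b tags), not a figure stated in the text, and the helper names are hypothetical.

```python
from typing import Sequence

KB = 1024


def unique_tag_bound(cache_bytes: Sequence[int], word_bytes: int = 8) -> int:
    """Upper bound on distinct tags: one per 64b word held in the caches."""
    return sum(size // word_bytes for size in cache_bytes)


def bits_for(count: int) -> int:
    """Bits needed to give each of `count` items a distinct identifier."""
    return max(1, (count - 1).bit_length())


def match_key_bits(tag_bits: int, num_tags: int = 5, opcode_bits: int = 8) -> int:
    """Match key = five tags plus the opcode (opcode width is an assumption)."""
    return num_tags * tag_bits + opcode_bits


l1_d_tags = unique_tag_bound([64 * KB])            # L1-D$ alone
l1_tags = unique_tag_bound([64 * KB, 64 * KB])     # L1-D$ plus L1-I$
short_tag_bits = bits_for(l1_tags)
full_key = match_key_bits(64)
short_key = match_key_bits(short_tag_bits)
rule_width = full_key + 128                        # key plus the 128b result
```

The last line recovers the 456b rule width as the 328b match key plus the 128b result.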

To obtain these savings without losing the flexibility of full, pointer-sized tags, different-width tags may be used for different on-chip memory subsystems and translated between as needed. For example, 12b tags might be used in the L1 memories and 16b tags in the L2 memories. FIG. 7A details how tag translation may be performed between the L1 and L2 memory subsystems. Moving a word from the L2 cache 34 to the L1 cache 36 requires translating its 16b tag to the corresponding 12b tag, creating a new association if needed. A simple SRAM 38 for the L2-tag-to-L1-tag translation has an extra bit indicating whether an L1 mapping exists for the L2 tag. FIG. 7B details the translation from an L1 tag 40 to an L2 tag 42 (on a writeback or an L2 lookup), performed with an SRAM 39 lookup using the L1 tag as the address. A similar translation occurs between the 60b main memory tags and the 16b L2 tags.

When a long tag is not in the long-to-short translation table, a new short tag is allocated, potentially reclaiming a previously allocated short tag that is no longer in use. There is a rich design space for determining when a short tag can be reclaimed, including garbage collection and tag-usage counting. For simplicity, short tags may be allocated sequentially, and all caches above a given level (instruction, data, and PUMP) flushed when the short tag space is exhausted, avoiding the need to track when a specific short tag can be reclaimed. Caches can be designed with suitable techniques to make cache flushes inexpensive. For example, in an embodiment in accordance with techniques herein, all caches may be designed with a lightweight gang clear, as known in the art and described, for example, in K. Mai, R. Ho, E. Alon, D. Liu, Y. Kim, D. Patil, M. Horowitz, "Architecture and Circuit Techniques for a 1.1GHz 16-kb Reconfigurable Memory in 0.18um-CMOS," IEEE J. Solid-State Circuits, 40(1):261-275, January 2005, which is incorporated by reference herein.
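The sequential-allocation scheme described above can be sketched in software as follows. This is a minimal model, not the hardware design: the class name, the `flush_caches` callback, and the use of a dictionary in place of the translation SRAM and its valid bit are all assumptions made for illustration.

```python
class ShortTagTranslator:
    """Long-to-short tag translation with sequential allocation and flush-on-exhaustion."""

    def __init__(self, short_bits, flush_caches):
        self.capacity = 1 << short_bits
        self.flush_caches = flush_caches  # hypothetical hook: clears caches above this level
        self.long_to_short = {}           # stands in for the long-to-short table and valid bit
        self.short_to_long = []           # stands in for the SRAM addressed by short tag

    def to_short(self, long_tag):
        short = self.long_to_short.get(long_tag)
        if short is not None:
            return short                  # mapping already exists
        if len(self.short_to_long) == self.capacity:
            # Short tag space exhausted: flush, then restart allocation from zero.
            self.flush_caches()
            self.long_to_short.clear()
            self.short_to_long.clear()
        short = len(self.short_to_long)   # next short tag in sequence
        self.short_to_long.append(long_tag)
        self.long_to_short[long_tag] = short
        return short

    def to_long(self, short_tag):
        """Reverse translation, e.g. on writeback, using the short tag as the address."""
        return self.short_to_long[short_tag]


flushes = []
translator = ShortTagTranslator(short_bits=2, flush_caches=lambda: flushes.append("flush"))
shorts = [translator.to_short(t) for t in (0xA000, 0xB000, 0xA000, 0xC000, 0xD000)]
after_overflow = translator.to_short(0xE000)  # fifth distinct tag forces a flush
```

Flushing on exhaustion trades occasional refill cost for not having to track which short tags are still live, which is the simplification the text describes.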

Compared with Table 3 (reproduced in FIG. 16), where each L1 rule cache access costs 51pJ, the techniques herein reduce this cost to 10pJ with 8b L1 tags or 18pJ with 16b L1 tags, with the energy scaling linearly with tag length between these points. The energy impact on the L1 instruction and data caches is small.
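Assuming the linear scaling stated above holds exactly between the two quoted endpoints, the access energy at an intermediate tag length can be estimated by interpolation. The function name is hypothetical, and values for widths other than 8b and 16b are estimates derived from that assumption, not measurements.

```python
def l1_rule_cache_energy_pj(tag_bits: int) -> float:
    """Linear interpolation between the quoted points (8b, 10pJ) and (16b, 18pJ)."""
    low_bits, low_pj = 8, 10.0
    high_bits, high_pj = 16, 18.0
    if not low_bits <= tag_bits <= high_bits:
        raise ValueError("only defined between the two quoted tag lengths")
    slope = (high_pj - low_pj) / (high_bits - low_bits)  # pJ per tag bit
    return low_pj + slope * (tag_bits - low_bits)


estimate_10b = l1_rule_cache_energy_pj(10)
estimate_12b = l1_rule_cache_energy_pj(12)
```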

Similarly, with 16b L2 tags, an L2 PUMP access costs 120pJ, down from 173pJ with 64b tags. Slimming the L1 tags also allows the capacity of the L1 caches to be restored. With 12b tags, the full-capacity (76KB, effective 64KB) cache meets single-cycle timing requirements, reducing the performance penalty that the simple implementation incurred from reduced L1 cache capacity. As a result, the L1 tag length exploration is limited to 12 bits or less. Even shorter tags reduce energy further, but they also increase the frequency of flushes.

FIGS. 8A and 8B show how flushes decrease with increasing L1 tag length, as well as the impact on the L1 rule cache miss rate.

Various techniques that may be used in connection with miss handler acceleration will now be described. An embodiment in accordance with techniques herein may combine four policies into a single composite policy. Referring to FIG. 20, in Algorithm 2, each invocation of the N-policy miss handler must take apart a tuple of tags, and the rules needed for the composite policy increase the rule cache miss rate, as identified in FIG. 9A. Even though the taint tracking and CFI policies individually have low miss rates, the higher miss rate from the memory safety policy drives the miss rate of the composite policy high as well. The lower miss rates of the individual policies suggest that their results may be cacheable even when the composite rules are not.

In connection with various aspects of the PUMP microarchitecture, as shown in FIG. 23, hardware structures may be used to optimize composite policy miss handling. An embodiment in accordance with techniques herein may use a Unified Component Policy (UCP; see Algorithm 3 in FIG. 21) cache (UCP $) in which the most recent component policy results are cached. In such an embodiment, the general miss handler for composite policies is modified to perform lookups in this cache while resolving component policies (see, e.g., line 3 of Algorithm 3 in FIG. 21). When this cache misses for a component policy, that policy's computation is performed in software (and the result is inserted into this cache).

As also shown in FIG. 24, the UCP cache may be implemented with the same hardware organization as the regular PUMP rule cache, with an additional policy identifier field. A FIFO replacement policy may be used for this cache, but better results may be achievable by prioritizing space using a metric such as the re-computation cost of the component policies. With modest capacity, this cache filters out most policy re-computations (FIG. 9B; the hit rate is lower for memory safety because of the compulsory misses associated with new memory allocations). As a result, the average number of miss handler cycles is reduced by a factor of 5 for the most challenging benchmarks (FIG. 9E). It is possible for every policy to hit in the UCP cache even when there is a miss in the L2 PUMP, since the needed composite rule may be a product of a small number of component policy rules. For GemsFDTD, three or more component policies hit about 96% of the time.

As also shown in FIGS. 23 and 24, a cache may be added to translate a tuple of result tags into its canonical composite result tag. This cache may be referred to as a Composite Tag (CTAG) cache (CTAG $) and is effective (FIG. 9D) because it is common for several component policy rules to return the same tuple of result tags. For example, in many cases the PC_tag will be the same even though the result tag is different. Furthermore, many different rule inputs can lead to the same output. For example, in taint tracking, set unions are performed, and many different unions will have the same result; e.g., (Blue, {A, B, C}) is the composite answer for writing the result of both {A} ∪ {B, C} and {A, B} ∪ {B, C} (taint tracking) into a Blue slot (memory safety). A FIFO replacement policy is used for this cache. The CTAG cache reduces the average miss handler cycles by another factor of 2 (see FIG. 9E).
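The example above, where two different unions yield the same composite answer, can be reproduced with a small table that maps each tuple of result tags to one canonical composite tag. This is a sketch under stated assumptions: the class name is hypothetical, composite tags are modeled as small integers, and an unbounded dictionary stands in for the mapping that a bounded FIFO CTAG cache would hold a recent subset of.

```python
class CompositeTagTable:
    """Maps a tuple of component result tags to a single canonical composite tag."""

    def __init__(self):
        self.canonical = {}

    def composite_tag(self, result_tuple):
        tag = self.canonical.get(result_tuple)
        if tag is None:
            tag = len(self.canonical)  # ASSUMPTION: composite tags are sequential integers
            self.canonical[result_tuple] = tag
        return tag


table = CompositeTagTable()
# Two different taint unions written into a Blue slot give the same tuple of results.
first = table.composite_tag(("Blue", frozenset("A") | frozenset("BC")))
second = table.composite_tag(("Blue", frozenset("AB") | frozenset("BC")))
other = table.composite_tag(("Red", frozenset("ABC")))
```

Because equal result tuples always receive the same composite tag, downstream structures see fewer distinct tags than there are distinct rule inputs.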

Taken together, a 2048-entry UCP cache and a 512-entry CTAG cache reduce the average time spent on each L2 rule cache miss from 800 cycles to 80 cycles.

An embodiment in accordance with techniques herein may also improve performance by prefetching one or more rules into the one or more caches that hold rules. The compulsory miss rate may thus also be reduced by precomputing rules that might be needed in the near future. An illustrative instance has high value for memory safety rules. For example, when a new memory tag is allocated, new rules will be needed for that tag (initialization (1), adding an offset to a pointer and moving (3), scalar load (1), scalar store (2)). Consequently, all of these rules can be added to the UCP cache at once. For the single-policy memory safety case, the rules may be added directly to the rule caches. This reduces the number of memory safety miss handler invocations by a factor of 2.

With reference to FIG. 11A in connection with the overall evaluation, the architecture parameters monotonically impact a particular cost, providing tradeoffs among energy, delay, and area, but not defining a minimum within a single cost criterion. There is a threshold effect: once the tag bits are small enough, the L1 D/I caches can be restored to the capacity of the baseline, so that point is adopted as the upper bound for exploring L1 tag length. Beyond that point, decreasing tag length reduces energy with small impact on performance.

FIG. 11B shows that reducing tag length is the dominant energy effect for most benchmark programs (e.g., leslie3d, mcf), with a few programs showing equal or larger benefits from increasing UCP cache capacity (e.g., GemsFDTD, gcc). Ignoring other cost concerns, large miss handler caches and few tag bits are selected to reduce energy. Runtime overhead (see FIG. 11A) is also minimized with larger miss handler caches, but it benefits from more rather than fewer tag bits (e.g., GemsFDTD, gcc).

The magnitude of the benefits varies across benchmarks and policies. Across all benchmarks, the benefit beyond 10b L1 tags is small for the SPEC CPU2006 benchmarks, so 10b is used as the compromise between energy and delay, together with a 2048-entry UCP cache and a 512-entry CTAG cache, to reduce overhead while coming close to the minimum energy level within the space of architecture parameters explored.

FIGS. 12A and 12B show the overall impact on runtime and energy overheads of applying the optimizations. Every optimization is dominant for some benchmark (e.g., opgroups for astar, DRAM tag compression for lbm, short tags for h264ref, miss handler acceleration for GemsFDTD), with each optimization successively removing one bottleneck and exposing the next. The different behavior across the benchmarks follows their baseline characteristics, as detailed below.

Applications with low locality have baseline energy and performance driven by DRAM due to high main memory traffic. The overhead in such benchmarks (e.g., lbm) trends to the DRAM overhead, so reductions in DRAM overhead directly impact runtime and energy overhead. Applications with more locality are faster in the baseline configuration, consume less energy, and suffer less from DRAM overheads; as a result, these benchmarks are more heavily impacted by the reduced L1 capacity and the tag energy in the L1 D/I and rule caches. DRAM optimization has less effect on these applications, but using short tags has a large effect on energy and removes the L1 D/I cache capacity penalty (e.g., h264ref).

Benchmarks with heavy dynamic memory allocation have higher L2 rule cache miss rates due to compulsory misses, as newly created tags must be installed in the cache. This drove the high overheads for several benchmarks (GemsFDTD, omnetpp) in the simple implementation. The miss handler optimizations described herein reduce the common-case cost of such misses, and the opgroup optimization reduces the capacity miss rate. For the simple implementation, GemsFDTD took an L2 rule cache miss every 200 instructions and took 800 cycles to service each miss, driving a large part of its 780% runtime overhead (see FIG. 4A). With the optimizations, the GemsFDTD benchmark services an L2 rule cache miss every 400 instructions and takes only 140 cycles on average per miss, reducing its runtime overhead to about 85% (see FIG. 10A).
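The GemsFDTD figures above translate directly into miss-handling cycles added per executed instruction. The helper below is a back-of-the-envelope sketch with a hypothetical name; it accounts only for L2 rule cache miss servicing and ignores every other overhead source, so it does not by itself reproduce the 780% or 85% totals, which also depend on a baseline CPI that the text does not give.

```python
def miss_cycles_per_instruction(cycles_per_miss: float, instructions_per_miss: float) -> float:
    """Average miss-handler cycles charged to each executed instruction."""
    return cycles_per_miss / instructions_per_miss


simple = miss_cycles_per_instruction(800, 200)     # simple implementation
optimized = miss_cycles_per_instruction(140, 400)  # with the optimizations
reduction = simple / optimized                     # how many times cheaper
print(simple, optimized, round(reduction, 1))
```

Halving the miss frequency and cutting the per-miss cost compound, so the miss-handling cost per instruction falls by roughly an order of magnitude under these figures.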

Overall, these optimizations bring runtime overhead below 10% for all benchmarks except GemsFDTD and omnetpp, which are heavy on memory allocation (see FIG. 10A). The mean energy overhead is close to 60%, with only four benchmarks exceeding 80% (see FIG. 10B).

By way of illustration, the performance impact of the PUMP may be measured using a composition of four different policies that stress the PUMP in different ways and illustrate a range of security properties (see Table 1 in FIG. 14): (1) a Non-Executable Data and Non-Writable Code (NXD+NWC) policy that uses tags to distinguish code from data in memory and provides protection against simple code injection attacks; (2) a memory safety policy that detects all spatial and temporal violations in heap-allocated memory, extending to an effectively unlimited (2^60) number of colors ("taint marks"); (3) a Control-Flow Integrity (CFI) policy that restricts indirect control transfers to only the allowed edges in a program's control flow graph, preventing return-oriented-programming-style attacks (fine-grained CFI is enforced, not coarse-grained approximations that are potentially vulnerable to attack); and (4) a generalized, fine-grained taint tracking policy in which each word can potentially be tainted by multiple sources (libraries and I/O streams) simultaneously. As noted elsewhere herein, these are well-known policies whose protective capabilities have been established in the literature, and the description herein focuses on measuring and reducing the performance impact of enforcing them using the PUMP. Except for NXD+NWC, each of these policies distinguishes an essentially unlimited number of unique items; by contrast, solutions with a limited number of metadata bits can, at best, support only grossly simplified approximations. As also noted above, a simple, direct implementation of the PUMP can be rather expensive. For example, adding pointer-sized (64b) tags to 64b words at least doubles the size and energy usage of all the memories in the system; rule caches add area and energy on top of this. For this simple implementation, the measured area overhead is about 190% and the geomean energy overhead is about 220%; moreover, runtime overhead is disappointing (over 300%) for some applications. Such high overheads would discourage adoption if they were the best that could be done.

Micro-architecture optimizations such as those described herein may be included in an embodiment in accordance with techniques herein to reduce the impact on power ceilings to 10% (see FIG. 10C), suggesting that the optimized PUMP will have little impact on the operating envelope of the platform. DRAM compression reduces the energy overhead for lbm to 20%; since lbm also slows down by 9%, its power requirement increases by only about 10% (1.20 / 1.09 ≈ 1.10).

The area overhead of the optimized design is about 110% (see, e.g., Table 5 in FIG. 18), compared with 190% for the simple design (see, e.g., Table 3 in FIG. 16). Short tags significantly reduce the area of the L1 and L2 caches (now adding only 5% over the baseline) and of the rule caches (adding only 26%). Conversely, the optimized design spends some area to reduce runtime and energy overhead. The UCP and CTAG caches add 33% area overhead, while the translation memories for short tags (both L1 and L2) add another 46%. While these additional hardware structures add area, they provide a net reduction in energy, since they are accessed infrequently and the UCP and CTAG caches substantially reduce miss handler cycles.

One goal of the model and optimizations described herein is to make it relatively simple to add further policies that are enforced simultaneously. Composite policies in the simple PUMP design incurred more than incremental costs for several benchmarks due to large increases in miss handler runtime, but these costs are reduced by the miss handler optimizations.

FIG. 13A (CPI overhead) and FIG. 13B (EPI overhead) illustrate how incremental addition of policies affects runtime overhead by first showing the overhead of each single policy and then showing composites that add policies to memory safety, the most complex single policy. This progression makes clear what overhead comes simply from adding any policy as opposed to adding a higher-overhead policy. To get a sense of scaling beyond the four policies here, the CFI policy (returns and computed jumps/calls) and the taint tracking policy (code tainting and I/O tainting) are each broken into two parts. The runtime overhead of additional policies is shown to track incrementally above the first complex policy (memory safety), with no appreciable runtime impact on the non-outliers (the worst-case non-outlier rises from 9% to 10% overhead) and a larger increase (20 to 40%) in the two outliers as each new kind of policy is added, due mostly to increased miss handler resolution complexity. Energy follows a similar trend, with modest impact on the non-outlier benchmarks, which account for everything except GemsFDTD (geomean rises from 60% to 70%).

A brief summary of related work is given in Table 2, reproduced in FIG. 15.

In a policy programming model in accordance with techniques herein, a PUMP policy includes a set of tag values together with a collection of rules that manipulate these tags to implement some desired tag propagation and enforcement mechanism. Rules come in two forms: symbolic rules, used at the software level of the system, and concrete rules, used at the hardware level.

For example, to illustrate the operation of the PUMP, consider a simple example policy for restricting return points during program execution. The motivation for this policy comes from a class of attacks known as return-oriented programming (ROP), in which the attacker identifies a set of "gadgets" in the binary executable of the program under attack and uses these to assemble complex malicious behavior by constructing appropriate sequences of stack frames, each containing a return address pointing to some gadget; a buffer overflow or other vulnerability is then exploited to overwrite the top of the stack with the desired sequence, causing the snippets to be executed in order. One simple way of limiting ROP attacks is to constrain the targets of return instructions to well-defined return points. This is accomplished using the PUMP by tagging instructions that are valid return points with a metadata tag target. Whenever a return instruction is executed, the metadata tag on the PC is set to check, indicating that a return has just occurred. On the next instruction, the PC tag is check, so it is verified that the tag on the current instruction is target, and a security violation is signaled if it is not. By making the metadata richer, it is possible to control precisely which return instructions can return to which return points. By making it richer still, full CFI checking can be implemented.
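The return-point policy just described amounts to a two-state rule on the PC tag. The following is a minimal sketch of that rule in software; the function signature, the exception type, and the use of `None` for an untagged PC or instruction are illustrative assumptions, not the rule syntax of the PUMP.

```python
class PolicyViolation(Exception):
    """Signals a security violation detected by the policy."""


TARGET = "target"  # tag on instructions that are valid return points
CHECK = "check"    # tag placed on the PC immediately after a return


def return_point_rule(pc_tag, instr_tag, is_return):
    """Check the current instruction and compute the PC tag for the next one."""
    if pc_tag == CHECK and instr_tag != TARGET:
        # A return has just occurred, but it landed somewhere not tagged target.
        raise PolicyViolation("return to an instruction that is not a valid return point")
    # After a return the next instruction must be checked; otherwise the PC is untagged.
    return CHECK if is_return else None


pc = return_point_rule(None, None, is_return=True)    # executing a return
pc = return_point_rule(pc, TARGET, is_return=False)   # landing on a valid return point
```

Richer variants would carry more than one bit of state in the tags, for example an identifier of the returning function, which is the direction the text points to for full CFI.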

From the point of view of the policy designer and the software parts of the PUMP 10, policies may be compactly described using symbolic rules written in a small domain-specific language. Example symbolic rules and their programming language are described, for example, in the section titled "PROGRAMMING THE PUMP, Hardware-Assisted Micro-Policies for Security."

Symbolic rules can compactly encode a great variety of metadata tracking mechanisms. At the hardware level, however, rules need a representation tuned for efficient interpretation to avoid slowing down the primary computation. To this end, a lower-level rule format called concrete rules may be introduced. Intuitively, each symbolic rule for a given policy can be expanded into an equivalent set of concrete rules. However, since a single symbolic rule might in general generate an unbounded number of concrete rules, this elaboration is performed lazily, generating concrete rules as needed while the system executes.

For policies with richer metadata tags (e.g., richer than ROP), the translation from symbolic to concrete rules follows the same general lines, but the details become a bit more intricate. For example, the taint-tracking policy takes tags to be pointers to memory data structures, each describing an arbitrarily sized set of taints (representing data sources or system components that may have contributed to a given piece of data). The symbolic rule for the load opgroup says that the taint on the loaded value should be the union of the taints on the instruction itself, the target address for the load, and the memory at that address. Symbolic rules and their programming language are described in the previously identified paper titled "PROGRAMMING THE PUMP, Hardware-Assisted Micro-Policies for Security," which is incorporated by reference and publicly available.
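The load rule for taint tracking, together with the on-demand generation of concrete rules, can be sketched as follows. Python frozensets stand in for the pointer-to-set tags, and the names `load_rule`, `SYMBOLIC_RULES`, and `lookup` are hypothetical; the symbolic rules themselves are written in the domain-specific language referenced above, which is not reproduced here.

```python
def load_rule(instr_taint, addr_taint, mem_taint):
    """Symbolic rule for the load opgroup: union of the three input taints."""
    return instr_taint | addr_taint | mem_taint


SYMBOLIC_RULES = {"load": load_rule}  # hypothetical one-rule policy
concrete_rules = {}                   # concrete rules generated so far


def lookup(opgroup, *input_taints):
    """Return the result tag, generating the concrete rule only on first use."""
    key = (opgroup, *input_taints)
    if key not in concrete_rules:
        # Miss: evaluate the symbolic rule once and install the concrete rule.
        concrete_rules[key] = SYMBOLIC_RULES[opgroup](*input_taints)
    return concrete_rules[key]


result = lookup("load", frozenset(), frozenset({"net"}), frozenset({"libc"}))
again = lookup("load", frozenset(), frozenset({"net"}), frozenset({"libc"}))
```

Only input combinations that actually occur at run time ever get a concrete rule, which is why an unbounded symbolic rule remains practical.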

To reduce the number of distinct tags (and hence the pressure on the rule cache), metadata structures may be stored internally in canonical form, and since tags are immutable, sharing is fully exploited (e.g., set elements are given a canonical order so that sets can be compactly represented by sharing common prefix subsets). When no longer needed, these structures can be reclaimed (e.g., by garbage collection).
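One simple way to model canonical, shared metadata structures is interning: every taint set is normalized to a sorted tuple and stored once, so equal sets are represented by the same object. The class below is an illustrative assumption and does not implement the prefix sharing mentioned in the text; it shows only the canonical-order and single-copy aspects.

```python
class TagInterner:
    """Stores each distinct taint set once, in a canonical (sorted) order."""

    def __init__(self):
        self._by_value = {}

    def intern(self, taints):
        canonical = tuple(sorted(set(taints)))  # canonical order, duplicates removed
        # setdefault returns the stored object if an equal one already exists.
        return self._by_value.setdefault(canonical, canonical)

    def __len__(self):
        return len(self._by_value)


interner = TagInterner()
a = interner.intern(["B", "A"])
b = interner.intern({"A", "B", "A"})
c = interner.intern(["C"])
```

Because equal sets share one representation, tag equality reduces to an identity comparison, and the rule cache sees one tag per distinct set rather than one per construction order.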

An embodiment may utilize composite policies. Multiple orthogonal policies may be enforced simultaneously by letting tags be pointers to tuples of tags from several component policies. (In general, multiple policies may not be orthogonal.) For example, to compose the ROP policy with the taint-tracking policy, let each tag be a pointer to a representation of a tuple (r; t), where r is an ROP tag (a code location identifier) and t is a taint tag (a pointer to a set of taints). The cache lookup process is exactly the same, but when a miss occurs, the miss handler extracts the components of the tuple and dispatches to routines that evaluate both sets of symbolic rules. The operation is allowed only if both policies have a rule that applies; in this case, the resulting tag is a pointer to a pair containing the results from the two sub-policies.
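The composition step can be sketched as a miss handler that projects each composite tag onto its components, evaluates every sub-policy, and allows the operation only if all of them have an applicable rule. The names, the `NO_RULE` marker, and the two toy sub-policies below are assumptions for illustration and are not the handler of Algorithm 2.

```python
class PolicyViolation(Exception):
    """Raised when some component policy has no rule that applies."""


NO_RULE = object()  # hypothetical marker: this sub-policy has no applicable rule


def composite_miss_handler(opgroup, input_tags, sub_policies):
    """input_tags: one tuple per operand, with one component per sub-policy."""
    results = []
    for index, policy in enumerate(sub_policies):
        # Extract this sub-policy's component from every composite input tag.
        component_inputs = tuple(tag[index] for tag in input_tags)
        result = policy(opgroup, component_inputs)
        if result is NO_RULE:
            raise PolicyViolation(f"sub-policy {index} has no rule for {opgroup}")
        results.append(result)
    return tuple(results)  # the pair (or tuple) of sub-policy results


def location_policy(opgroup, tags):
    """Toy stand-in for the r component: propagate the first code-location tag."""
    return tags[0]


def taint_policy(opgroup, tags):
    """Toy taint component: union of all input taint sets."""
    return frozenset().union(*tags)


inputs = [("r1", frozenset("A")), ("r2", frozenset("B"))]
combined = composite_miss_handler("load", inputs, [location_policy, taint_policy])
```

Adding a third policy only lengthens the tuples and the list of sub-policies; the handler itself is unchanged.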

In connection with the policy system and protection, the policy system exists as a separate region of memory within each user process. The policy system may include, for example, the code for the miss handler, the policy rules, and the data structures representing the policy's metadata tags. Placing the policy system in the process is minimally invasive with respect to the existing Unix process model and facilitates lightweight switching between the policy system and user code. The policy system is isolated from user code using mechanisms described next.

Clearly, the protection offered by the PUMP is useless if the attacker can rewrite metadata tags or change their interpretation. The techniques described herein are designed to prevent such attacks. The kernel, the loader, and (for some policies) the compiler are trusted. In particular, the compiler is relied on to assign initial tags to words and, where needed, to communicate rules to the policy system. The loader preserves the tags provided by the compiler, and the path from the compiler to the loader is protected from tampering, for example, using cryptographic signatures.

An embodiment in accordance with the techniques herein may use a standard Unix-style kernel that sets up the initial memory image for each process. (It may be possible to use micro-policies to eliminate some of these assumptions and further reduce the size of the trusted computing base (TCB).) In such an embodiment, it is further assumed that the rule-cache-miss-handling software is correctly implemented. This software is small and is therefore a good target for formal verification. One concern is to prevent user code running in a process from undermining the protection provided by the process's policy. User code should not be able to (i) manipulate tags directly, since all tag changes should be performed in accordance with the policy rules currently in force; (ii) manipulate the data structures and code used by the miss handler; or (iii) insert rules directly into the hardware rule cache.

With respect to addressing, to prevent direct manipulation of tags by user code, the tag attached to every 64-bit word is not itself separately addressable. In particular, it is not possible to specify an address that corresponds only to a tag, or to a portion of a tag, in order to read or write it. All user-accessible instructions operate on (data, tag) pairs as atomic units: the standard ALU operates on the value portion and the PUMP operates on the tag portion.

With respect to the miss handler architecture of an embodiment in accordance with the techniques herein, the policy system may be activated only on misses to the PUMP cache. To provide isolation between the policy system and user code, a miss-handler operational mode is added to the processor. The integer register file is expanded with 16 additional registers that are available only to the miss handler, to avoid saving and restoring registers. Note that the use of 16 additional registers is illustrative, and an embodiment may expand the integer register file by fewer or more registers. The PC of the faulting instruction, the rule inputs (opgroup and tags) and the rule outputs appear as registers while in miss-handler mode. A miss-handler-return instruction is added, which finishes installing the concrete rule into the cache and returns to user code.

In an embodiment in accordance with the techniques herein, the normal behavior of the PUMP 10 is disengaged while the processor 12 is in miss-handler mode. Instead, a single hardwired rule is applied: all instructions and data touched by the miss handler must be tagged with a predefined miss-handler tag that is distinct from the tags used by any policy. This ensures isolation between miss-handler code and data and user code in the same address space. User code cannot touch or execute policy system data or code, and the miss handler cannot accidentally touch user data and code. The miss-handler-return instruction can be issued only in miss-handler mode, which prevents user code from inserting arbitrary rules into the PUMP.
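The isolation property described above may be illustrated with a small sketch. The tag value, the function name and the treatment of user mode are assumptions for illustration: in user mode the actual check would come from the installed policies having no rule for the reserved tag, which is modeled here as a simple refusal.

```python
MISS_HANDLER_TAG = "miss-handler"  # hypothetical reserved tag value

def access_allowed(in_miss_handler_mode, instr_tag, data_tags):
    """Return True if the instruction and the data it touches may be used."""
    touched = [instr_tag, *data_tags]
    if in_miss_handler_mode:
        # Hardwired rule: everything touched must carry the reserved tag.
        return all(tag == MISS_HANDLER_TAG for tag in touched)
    # Simplification: no policy has a rule for the reserved tag, so user
    # code that touches a word carrying it is refused.
    return all(tag != MISS_HANDLER_TAG for tag in touched)
```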

Prior work has used clever schemes to compactly represent or approximate safety and security policies, but this is often a compromise on the intended policy, and it trades complexity for compactness. As described herein, it is possible to include richer metadata that captures the needs of security policies more completely and more naturally, with little or no additional runtime overhead. Rather than imposing a fixed bound on the metadata representation and policy complexity, the PUMP 10 provides a graceful degradation in performance. This allows policies to use more data where needed without impacting the performance and size of the common case. It also enables the incremental refinement and performance tuning of policies, since even complex policies can be easily represented and executed.

With evidence mounting for the value of metadata-based policy enforcement, the present disclosure defines an architecture for software-defined metadata processing and identifies accelerators to remove most of the runtime overhead. An architecture is introduced and described herein with no bounds on the number of metadata bits or the number of policies simultaneously supported (i.e., without any limit), together with four microarchitecture optimizations (opgroups, tag compression, tag translation and miss-handler acceleration) that achieve performance comparable to dedicated hardware metadata propagation solutions. The software-defined metadata policy model and its acceleration are applicable to a wide range of policies beyond those described herein, including sound information-flow control, fine-grained access control, integrity, synchronization, race detection, debugging, application-specific policies, and controlled generation and execution of dynamic code.

Some non-limiting advantages of the various aspects and embodiments described herein are that they provide (i) a programming model and supporting interface model for compactly and precisely describing the policies supported by this architecture; (ii) detailed examples of policy encoding and composition using four diverse classes of well-studied policies; and (iii) a quantification of the requirements, complexity and performance of these policies.

The programming model of embodiments described herein can encode a host of other policies. Information-flow control is richer than the simple taint-tracking model here, and tracking implicit flows may be supported either with RIFLE-style binary translation or by using the PC tag along with some support from the compiler. Micro-policies can support lightweight access control and compartmentalization. Tags can be used to distinguish unforgeable resources. Unique, generated tokens can act as keys for sealing and endorsing data, which in turn can be used for strong abstraction, guaranteeing that data is created and destroyed only by authorized code components. Micro-policy rules can enforce data invariants such as immutability and linearity. Micro-policies can support parallelism, either as out-of-band metadata for synchronization primitives, such as full/empty bits on data or futures, or as state to detect race conditions on locks. A system designer can apply specific micro-policies to existing code without auditing or rewriting every line.

The PUMP 10 design described herein offers an attractive combination of flexibility and performance, supporting a diverse collection of low-level, fine-grained security policies with single-policy performance comparable to dedicated mechanisms in most cases, while supporting richer and composite policies with mostly graceful performance degradation as rule complexity grows. Further, the mechanisms provided by the PUMP can be used to protect its own software structures. An embodiment in accordance with the techniques herein may replace the special miss-handler operational mode by implementing a "compartmentalization" micro-policy using the PUMP 10 and using it to protect the miss-handler code. Finally, as described herein, an orthogonal set of policies may be combined, where the protections provided by each policy are completely independent of the others. But policies often interact: for example, an information-flow policy may need to place tags on fresh regions being allocated by a memory safety policy. Policy composition requires analysis with respect to both expressiveness and efficient hardware support.

What will now be described is another example illustrating an implementation of a memory safety policy in an embodiment in accordance with the techniques herein, which identifies all temporal and spatial violations in heap-allocated memory. In at least one embodiment, for each new allocation, processing may be performed to make up a fresh color id, c, and write c as the tag on each memory location in the newly created memory block (e.g., via memset). The pointer to the new block is also tagged c. Later, when processing is performed to dereference a pointer, the processing may include checking that the pointer's tag is the same as the tag on the memory cell it references or points to. When a block is freed, the tags on all of its cells may be changed to a constant F representing free memory. The heap may initially be tagged F. A special non-pointer tag may be used for non-pointers. Thus, in general, an embodiment may write t for a tag on a memory location that is either a color c or the special non-pointer tag.
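The coloring scheme just described may be sketched as below. This is a simplified model, not the implementation of any embodiment: the class and method names are hypothetical, tags are held in a Python list indexed by address, and the caller supplies the base address of each allocation rather than an allocator choosing it.

```python
import itertools

FREE = "F"  # tag for unallocated heap cells

class TaggedHeap:
    def __init__(self, size):
        self.tags = [FREE] * size          # the heap is initially tagged F
        self._colors = itertools.count(1)  # source of fresh color ids

    def malloc(self, base, length):
        c = next(self._colors)             # fresh color for this allocation
        for addr in range(base, base + length):
            self.tags[addr] = c            # tag every cell in the new block
        return (base, c)                   # pointer value plus its tag c

    def check_access(self, ptr, offset=0):
        addr, c = ptr                      # address arithmetic keeps the tag
        target = addr + offset
        # Valid only if the pointer's color equals the cell's color.
        return 0 <= target < len(self.tags) and self.tags[target] == c

    def free(self, ptr, length):
        base, _ = ptr
        for addr in range(base, base + length):
            self.tags[addr] = FREE         # retag the block as free memory
```

An out-of-bounds offset lands on a cell with a different color (a spatial violation), and a stale pointer finds its block retagged F (a temporal violation); both cases fail the same equality check.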

Because memory cells may contain pointers, in general each word in memory may be associated with two tags. In such an embodiment, the tag on each memory cell is a pointer to a pair (c, t), where c is the id of the memory block in which this cell was allocated and t is the tag on the word stored in the cell. An embodiment may use a domain-specific language, based on the rule function described elsewhere herein, to specify policies in terms of symbolic rules. The rules for load and store take care of packing and unpacking these pairs, along with checking that each memory access is valid (i.e., that the accessed cell is within the block to which the pointer points):

In the foregoing and other rules, the checks performed appear as the conditions under which the symbolic rule is valid (e.g., c₂ = c₃ in the store rule above). The "-" symbol denotes a don't-care field in a rule.
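One plausible reading of the load and store checks, with the (c, t) pair on each memory cell, is sketched below. The rule figures themselves are not reproduced in this text, so the function names, argument order and result tags here are assumptions for illustration only.

```python
class PolicyViolation(Exception):
    """Raised when the validity condition of a rule does not hold."""

def load_rule(ptr_tag, mem_tag):
    block_color, word_tag = mem_tag      # unpack the (c, t) pair on the cell
    if ptr_tag != block_color:           # pointer color must match block color
        raise PolicyViolation("load outside the pointer's block")
    return word_tag                      # the loaded value carries the word's tag

def store_rule(value_tag, ptr_tag, mem_tag):
    block_color, _old_word_tag = mem_tag
    if ptr_tag != block_color:           # same validity condition as for load
        raise PolicyViolation("store outside the pointer's block")
    return (block_color, value_tag)      # repack: same block, new word tag
```

The equality test plays the role of the rule's validity condition: when it fails, no result tag is produced.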

Address arithmetic operations preserve the pointer tag:

To maintain the invariant that tags on pointers originate only from allocation, operations that create data from scratch (e.g., loading a constant) set the tag of that data to the special non-pointer tag.

In an embodiment implementing the memory safety policy, operations such as malloc and free may be modified to tag memory regions using, for example, tagged instructions and transient rules (e.g., rules that may be deleted from the cache once used). In connection with malloc, processing may generate a fresh tag for the pointer to the new region via a transient rule. For example, the rule for a move may be a transient rule such as:

An arrow with a superscript 1 may denote a transient rule. Then, prior to returning the tagged pointer, the newly tagged pointer may be used to write zeros to every word in the allocated region using a special store rule:

At a later point, before returning the region to the free list, free may use a modified store instruction to retag the region as unallocated:

In such an embodiment using the memory safety policy, opgroups may be used to describe the rule set as follows:

The symbolic rules used above for the policy specification may be written using variables, allowing a few symbolic rules to describe a policy over an unbounded universe of distinct values. The concrete rules stored in the rule cache, however, refer to specific, concrete tag values. For example, if 23 and 24 are valid memory block colors, an embodiment may use concrete rules in the PUMP rule cache that are concrete instances of symbolic rule (3) above for c = 23 and c = 24. For example, assuming the special non-pointer tag is encoded as 0 and don't-care fields are denoted by 0, the concrete rules for symbolic rule (3) above are:

Consistent with discussion elsewhere herein, in at least one embodiment, to insert a rule into the PUMP rule cache, the miss handler may obtain the concrete input tags and execute code compiled from the symbolic rules to produce the associated concrete output tags. When the symbolic rules identify a violation, control is transferred to an error handler and no new concrete rule is inserted into the PUMP rule cache.
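The relationship between symbolic rules and cached concrete rules may be sketched as follows. This is an assumption-laden model: the symbolic rule shown is a toy color-equality check rather than any numbered rule of the disclosure, the cache is a Python dictionary, and the error handler is modeled as an exception.

```python
class PolicyViolation(Exception):
    """Stands in for transferring control to the error handler."""

def symbolic_load_rule(ptr_color, cell_color):
    # Toy symbolic rule over tag variables: valid only if the colors match.
    if ptr_color != cell_color:
        raise PolicyViolation((ptr_color, cell_color))
    return cell_color  # hypothetical result tag

class RuleCache:
    def __init__(self, symbolic_rule):
        self.symbolic_rule = symbolic_rule
        self.concrete = {}  # concrete input tags -> concrete output tag
        self.misses = 0

    def lookup(self, *input_tags):
        if input_tags not in self.concrete:
            self.misses += 1
            # Miss handler: evaluate the symbolic rule on the concrete tags.
            # If it raises, nothing is inserted into the cache.
            output = self.symbolic_rule(*input_tags)
            self.concrete[input_tags] = output
        return self.concrete[input_tags]
```

Looking up the colors 23 and 24 separately installs two concrete rules from the one symbolic rule, and a mismatched pair raises without leaving an entry behind.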

What will now be described is an embodiment in accordance with the techniques herein based on the RISC-V architecture further extended with metadata tags and the PUMP to support software defined metadata processing (SDMP), consistent with discussion herein. RISC-V may be characterized as an open source implementation of a reduced instruction set computing (RISC) instruction set architecture (ISA). In such an embodiment, metadata tags are placed on both instructions and data, for each word. In the RISC-V architecture, a word is 64 bits. The RISC-V architecture provides different word-size variants: RV64 has a 64-bit word size and RV32 has a 32-bit word size. The width or size of the registers and of the user address space may vary with the word size. The tag size or width may be independent of the word size or width, although in an embodiment the two may more typically be the same. As known in the art, the RISC-V architecture has 32-bit instructions, so an embodiment supporting and operating with the 64-bit word size may store two instructions in a single tagged word. The foregoing and other aspects of the RISC-V architecture are discussed elsewhere herein in connection with the use of different techniques and features for extending the RISC-V architecture for use with metadata tags, the PUMP and SDMP.
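Packing two 32-bit instructions into one 64-bit tagged word may be pictured with the short sketch below. The function name and the convention that index 0 selects the low-order half are assumptions for illustration; an embodiment may order the halves differently.

```python
def select_subinstr(word64, subinstr):
    """Extract one 32-bit instruction from a 64-bit word.

    subinstr is 0 or 1; index 0 is assumed to be the low-order half.
    """
    if subinstr not in (0, 1):
        raise ValueError("subinstr must be 0 or 1")
    return (word64 >> (32 * subinstr)) & 0xFFFFFFFF

# Two 32-bit instruction encodings sharing one word (and hence one tag).
word = (0x00000013 << 32) | 0x00A00093
```

Because both instructions share the tag on the word, an input that identifies which half is executing lets rules still distinguish the two.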

The RISC-V architecture includes user-level instructions as described, for example, in "The RISC-V Instruction Set Manual Vol. I, User-Level ISA, Version 2.0", May 6, 2014, Waterman, Andrew, et al. (also referred to as the "RISC-V user level ISA"), which is incorporated by reference herein and is publicly available, for example, through the RISCV.ORG website and from the University of California at Berkeley as Technical Report UCB/EECS-2014-54. The RISC-V architecture also incorporates a privileged architecture, including privileged instructions and additional functionality needed for running operating systems, attaching external devices and the like, as described, for example, in "The RISC-V Instruction Set Manual Volume II: Privileged Architecture, Version 1.7", May 9, 2015 (also referred to as the "RISC-V privileged ISA"), which is incorporated by reference herein and is publicly available, for example, through the RISCV.ORG website and from the University of California at Berkeley as Technical Report UCB/EECS-2015-49.

An embodiment of the RISC-V architecture may have four RISC-V privilege levels as follows: level 0 for the user/application (U) privilege level, level 1 for the supervisor (S) privilege level, level 2 for the hypervisor (H) privilege level, and level 3 for the machine (M) privilege level. In the foregoing, the RISC-V privilege levels may be ranked from lowest to highest from 0 to 3, where level 0 denotes the lowest or least privilege level and level 3 denotes the highest or greatest privilege level. Such privilege levels may be used to provide protection between different components, and an attempt to execute code that performs an operation not permitted by the current privilege level or mode causes an exception, such as a trap, into the underlying execution environment. The machine level has the highest privileges and is the only mandatory privilege level for a RISC-V hardware platform. Code running in machine mode (M-mode) is inherently trusted, as it has low-level access to the machine implementation. User mode (U-mode) and supervisor mode (S-mode) are intended for conventional application and operating system usage, respectively, while hypervisor mode (H-mode) is intended to support virtual machine monitors. Each privilege level has a core set of privileged ISA extensions with optional extensions and variants. It should be noted that implementations of the RISC-V architecture must support at least M-mode, and most implementations support at least U-mode and M-mode. S-mode may be added to provide additional isolation between the code of a supervisor-level operating system and other more privileged code executing in M-mode. User or application code may typically execute in U-mode until a trap (e.g., a supervisor call or page fault) or an interrupt forces a transfer of control to a trap handler running in one of the supported higher privilege modes or levels (e.g., H, S or M mode). The code of the trap handler is then executed, after which control may be returned to the original user code or application that caused the trap. Execution of such user code or application may resume in U-mode at or after the original trapped instruction that triggered the trap handler invocation. The various combinations of modes supported in a RISC-V implementation may include: only the single M mode; two modes, M and U; three modes, M, S and U; or all four modes, M, H, S and U. In at least one embodiment described herein, all four of the foregoing privilege levels may be supported. At a minimum, an embodiment in accordance with the techniques herein may support the M and U modes.

The RISC-V architecture has control and status registers (CSRs) that may be atomically read and modified at one or more associated privilege levels. Generally, a CSR is accessible at a first of the four privilege levels and at any other of the four privilege levels that is higher than the first. For example, assume a program is executing in U-mode (level 0) and a trap occurs, such as a rule cache miss, whereby control is transferred to a trap handler, such as rule cache miss handler code, executing at a higher privilege level or mode (e.g., any of levels 1 to 3). When the trap occurs, information may be placed in CSRs that are accessible to the trap handler running, for example, in M-mode and that are not otherwise accessible to any other code executing at a lower privilege level (e.g., not accessible to code in H, S or U mode). In at least one embodiment, the rule cache miss handler may execute at a privilege level at or above the PUMP protection level (e.g., in H-mode, S-mode or M-mode). In such an embodiment, as described elsewhere herein, tag definitions and policies at the rule cache miss handler level may be global across the operating system (e.g., per virtual machine), whereby the same tag definitions and policies apply across all executing code. In at least one embodiment, per-application or per-process policies may be supported where such policies are installed globally, and the PC (the program counter identifying the current instruction) and/or code may be tagged to distinguish process-specific or application-specific rules. In an embodiment where virtual machines (VMs) do not share memory, policies may be defined on a per-VM basis.
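The access rule for CSRs (accessible at one privilege level and at any higher level) may be modeled as below. The class names and the CSR name are hypothetical, the numbering follows the U = 0 through M = 3 assignment of the RISC-V privileged ISA, and an inaccessible CSR is modeled with a Python PermissionError rather than a hardware exception.

```python
from enum import IntEnum

class Priv(IntEnum):
    U = 0  # user/application
    S = 1  # supervisor
    H = 2  # hypervisor
    M = 3  # machine, the most privileged level

class CSR:
    def __init__(self, name, min_priv, value=0):
        self.name = name
        self.min_priv = min_priv  # lowest level allowed to access this CSR
        self.value = value

    def _check(self, current):
        if current < self.min_priv:
            raise PermissionError(f"{self.name} not accessible at {current.name}")

    def read(self, current):
        self._check(current)
        return self.value

    def write(self, current, value):
        self._check(current)
        self.value = value
```

A CSR created with `min_priv=Priv.M` can then carry an operand tag to an M-mode miss handler while remaining invisible to code in U, S or H mode.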

Consistent with discussion elsewhere herein, the PUMP may be characterized as a rule cache for SDMP. There may be a mapping between the set of tags on an instruction and its inputs and the tags on the result of the operation. Tag processing is independent of, and parallel to, the normal operation of the instruction. In at least one embodiment, the PUMP runs in parallel with normal RISC-V operation, supplying the tags for the results of the operation. Since the PUMP is a cache, a rule cache miss occurs the first time the PUMP receives a particular instruction with a particular corresponding set of PUMP inputs (e.g., a compulsory miss), or when the PUMP was unable to retain the rule in the cache (e.g., the capacity of the cache was exceeded so the rule was evicted from the rule cache, or possibly a conflict occurred). A rule cache miss causes a miss trap that is handled by code of the miss trap system (e.g., a rule cache miss handler). Inputs may be communicated to the miss handler through PUMP CSRs, and rule insertions may be provided back to the PUMP, also through CSRs. This is discussed in more detail below. A first embodiment is discussed elsewhere herein in which there are five PUMP input tags. As a variation, an embodiment may include a different number of tags and other PUMP inputs. The particular number of PUMP tag inputs may vary with the instruction set and operands. For example, the following may be included as PUMP inputs in one embodiment based on the RISC-V architecture:

1. Opgrp - denotes the particular opgroup that includes the current instruction. In general, opgroups are an abstraction of a group of instructions and are discussed elsewhere herein.

2. PCtag - tag on the PC

3. CItag - tag on the instruction

4. OP1tag - tag on the RS1 input to the instruction

5. OP2tag - tag on the RS2 input to the instruction (or tag on the CSR when a CSR instruction)

6. OP3tag - tag on the RS3 input to the instruction

7. Mtag - tag on the memory input to the instruction or the memory target of the instruction

8. funct12 (funct7) - extended opcode bits that occur in some instructions, as described elsewhere herein.

9. subinstr - when multiple instructions are packed into a word, this input identifies which instruction in the word is the current instruction being operated on by the PUMP.

The following may be included as PUMP outputs in one embodiment based on the RISC-V architecture:

1. Rtag - tag on the result: destination register, memory or CSR

2. newPCtag - tag on the PC after this operation (e.g., also sometimes referred to as the PCnew tag).
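The PUMP inputs and outputs listed above may be pictured as the key and value of a rule cache. The sketch below is illustrative only: the field names simply mirror the list, tags are modeled as plain integers, and a Python dictionary stands in for the hardware cache.

```python
from typing import NamedTuple, Optional

class PumpInput(NamedTuple):
    opgrp: int
    pctag: int
    citag: int
    op1tag: int
    op2tag: int
    op3tag: int
    mtag: int
    funct12: int
    subinstr: int

class PumpOutput(NamedTuple):
    rtag: int       # tag on the result
    newpctag: int   # tag on the PC after the operation

rule_cache = {}     # maps PumpInput -> PumpOutput

def pump_lookup(inputs: PumpInput) -> Optional[PumpOutput]:
    # A hit returns the cached output tags; None models a rule cache miss,
    # which would trap to the miss handler.
    return rule_cache.get(inputs)
```

Because a NamedTuple is hashable, the full set of inputs serves directly as the lookup key, which matches the idea that a concrete rule is identified by its complete input vector.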

Information may be passed through CSRs, for example, from user code executing in U-mode to a trap handler, such as a rule cache miss handler executing in M-mode, when a trap occurs. In a similar manner, information may be passed through CSRs from the trap handler in M-mode to U-mode when resuming program execution in U-mode, where information in a CSR may be placed in a corresponding register accessible in U-mode. In this manner, there may be a mapping between CSRs at one privilege level and registers at another privilege level. For example, in an embodiment in accordance with the techniques herein, CSRs accessible to the M-mode handler and the PUMP may be defined, where particular instruction operand tags are written to the CSRs upon the occurrence of a trap in order to pass the tags as inputs to the PUMP and the rule cache miss handler. In a similar manner, CSRs may be used to pass information from the trap handler and/or the PUMP (operating at a privilege level higher than U-mode) to other code executing in U-mode, for example, when resuming program execution after a rule cache miss (e.g., where the rule cache miss occurs when no matching rule is found in the PUMP rule cache for the current instruction). For example, CSRs may be used to output or propagate the PUMP output tags for PCnew and RD. Additionally, CSRs may be defined where different actions may occur in response to writing to particular CSRs. For example, the rule cache miss handler code may write or insert a new rule into the PUMP rule cache by writing to a particular CSR. The particular CSRs defined may vary with the embodiment.

Referring to FIG. 25, shown is an example of CSRs that may be defined and used in an embodiment in accordance with techniques herein. The table 900 includes a first column 902 with the CSR address in hexadecimal, a second column 904 of privilege, a third column 906 denoting the CSR name, and a fourth column 908 with a description of the CSR. Each row of the table 900 may identify information for a different defined CSR. The different CSRs in the table 900 are also described in more detail elsewhere herein in connection with additional features that may be included in an embodiment.

Rows 901a through 901c identify CSRs holding special tag values used by the PUMP to tag code and/or instructions. In at least one embodiment, the sboottag CSR defined by entry 901a may contain the first initial or starting tag value used in the system. This starting tag value may be referred to as the bootstrap tag value. In one aspect, the bootstrap tag value may be characterized as a "seed" from which all other tags may be derived or on which they may be based. Thus, the bootstrap tag may be used in one embodiment as the starting point for generating all other tags. In a manner similar to the initial loading of the starting location of bootstrap code in an operating system, hardware may be used to initialize the CSR 901a to a particular predefined tag value used as the bootstrap tag. Once the bootstrap tag has been read as part of booting the system in accordance with techniques herein, the sboottag CSR may be cleared. For example, a privileged portion of the operating system code may include instructions that invoke rules which perform the initial tag propagation using the bootstrap tag value. The use of the bootstrap tag, and tag generation and propagation, are further described elsewhere herein. Row 901b identifies a CSR containing the tag value used to tag data from a public untrusted source, as described elsewhere herein. Row 901c identifies a CSR containing a default tag value that may be used when tagging data and/or instructions.

Rows 901d and 901e denote, respectively, the address and the data used to write to the opgroup/care table (e.g., also referred to elsewhere herein as the mapping or translation table that holds the opgroup and the care/don't care bits for an opcode). A write to the CSR denoted by row 901e triggers a write to the opgroup/care table. Row 901f identifies a CSR that may be written to in order to flush the PUMP rule cache. Rows 901g through 901m identify CSRs that provide the tag inputs for the current instruction to the PUMP and the rule cache miss handler. Rows 901j through 901m each denote a different operand tag for an operand of the current instruction being processed, that is, the instruction that caused the rule cache miss. The instruction may include up to four such operands: three of the four operands are registers (CSRs 901j through 901l), and the fourth operand is a memory location whose tag is stored in the CSR denoted by row 901m. Row 901n identifies a CSR holding the extended opcode bits when the opcode of the current instruction uses the extended func12 field, as described elsewhere herein. Row 901o denotes a CSR indicating which sub-instruction of a word is the instruction currently being referenced. As discussed elsewhere herein, a single tagged word may be 64 bits and each instruction may be 32 bits, so that two instructions may be included in a single tagged word. The CSR denoted by row 901o identifies which of the two instructions is being processed by the PUMP. Rows 901p and 901q identify CSRs containing, respectively, the PUMP output tag for the new PC (e.g., the new PC tag for the next instruction) and the PUMP output tag for RD (the destination register, or address, for the result of the current instruction). A write to the CSR denoted by row 901q causes a rule (e.g., the rule matching the current instruction that triggered the PUMP rule cache miss) to be written to the PUMP rule cache. Row 901r identifies the tag mode for PUMP operation. The tag mode is described in more detail elsewhere herein.

In at least one embodiment, the one or more tables used to store the opgroup and care/don't care bits (e.g., the opgroup/care table) may be populated by writing to the sopgrpvalue CSR denoted by 901e, whereupon the contents of the CSR 901e are written to the address stored in the sopgrpaddr CSR denoted by 901d. A rule may be written to, or installed in, the PUMP rule cache in response to a write to the srtag CSR defined by entry 901q. The rule written is the rule specifying the opcode (or, more specifically, the opgroup for the opcode) and the tag values for the current instruction that were provided as inputs to the PUMP via the PUMP CSRs (e.g., based on the PUMP CSR inputs 901g through 901o).
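The address/value pairing described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the class `PumpCsrModel`, its `write_csr` method, and the dictionary used for the table are hypothetical names chosen for illustration, and only the CSR names `sopgrpaddr` and `sopgrpvalue` come from the description. The point it shows is that writing the value CSR has a side effect, while writing the address CSR alone does not.

```python
class PumpCsrModel:
    """Toy model of a write-triggered CSR pair (hypothetical, for illustration only)."""

    def __init__(self):
        self.csr = {"sopgrpaddr": 0, "sopgrpvalue": 0}
        # Models the opgroup/care table: table address -> stored word.
        self.opgroup_table = {}

    def write_csr(self, name, value):
        if name not in self.csr:
            raise KeyError(f"unknown CSR in this model: {name}")
        self.csr[name] = value
        if name == "sopgrpvalue":
            # Side effect: the value just written is stored at the table
            # address currently held in sopgrpaddr.
            self.opgroup_table[self.csr["sopgrpaddr"]] = value


model = PumpCsrModel()
model.write_csr("sopgrpaddr", 0x33)    # select a table entry; no table write yet
model.write_csr("sopgrpvalue", 0x5A)   # this write populates the selected entry
```

In this model, populating several entries means alternating an address write with a value write, which mirrors the order implied by the text.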

To allow tagging and tag protection for CSR operations, the dataflow allows the CSR tag to be input to, and output from, the PUMP. In accordance with the RISC-V architecture, there are read and write instructions that, respectively, read from and write to the CSRs. For CSR instructions with the PUMP, the R2tag input to the PUMP is the current CSR tag. The CSR read/write instructions (e.g., csrrc, csrrci, csrrs, csrrsi, csrrw, csrrwi) write two values: (1) RD and (2) the CSR referenced by the instruction. In this case, the PUMP output R tag (or RD tag for the destination) specifies the CSR tag output by the PUMP, and the CSRtag is copied directly to the register destination tag.

With respect to the privileges denoted by column 904, the CSR mtagmode defined by row 901r is accessible for read/write by code executing at the machine or M-mode level. The remaining CSRs defined by rows 901a through 901q are accessible for read/write by code executing at least at the supervisor or S-mode level. Thus, the privilege denoted in column 904 for each CSR indicates the minimum RISC-V privilege level at which code must be executing in order to access that particular CSR. An embodiment may assign RISC-V privilege levels to the CSRs it uses that differ from those illustrated in the example 900.

An embodiment in accordance with techniques herein may define multiple tag modes that affect the tag propagation performed by the PUMP. The current tag mode is identified by the value stored at the current point in time in the CSR mtagmode defined by row 901r. In at least one embodiment, the tag mode may be used in combination with the RISC-V defined privileges (e.g., the M, H, S and U modes noted above) to define the CSR protection model used in connection with the PUMP.

To allow configurable placement of the rule cache miss handler, a protection model that further extends the RISC-V privileges may be used. Rather than defining access to the PUMP CSRs entirely by privilege level, CSR access may be further defined with respect to the current tag mode in combination with the RISC-V privilege level. Thus, in at least one embodiment in accordance with techniques herein, whether executing code is allowed to access a CSR may depend on the minimum RISC-V privilege level of the CSR, the current tag mode, and the current RISC-V privilege level of the executing code. Tag modes are discussed in more detail below.

Referring to FIG. 26, shown is an example of tag modes that may be used in an embodiment in accordance with techniques herein. The table 910 includes the following columns: mtagmode bit encoding 912, operation 914, and tag result 916. Each row of the table 910 denotes information for a different possible tag mode. When the tag mode is 000, as denoted by 911a, the PUMP is off and not in use, and produces no tag results. When the tag mode is 010, as denoted by 911b, the PUMP writes the default tag to all results (e.g., the Rtag for the destination or result register or memory location).

Rows 911c through 911f denote different tag modes that may be specified to engage or disengage the PUMP when code executes at different RISC-V privilege levels. When the PUMP is engaged, it may be characterized as active, operational, and providing protection for executing code, in that the rules of its policies are enforced during code execution. In contrast, when the PUMP is disengaged, it may be characterized as inactive, non-operational, and not providing protection for executing code, in that the rules of its policies are not enforced during code execution. When the PUMP is disengaged, tags may be propagated using one or more default tag propagation rules rather than on the basis of evaluating a rule whose tag values match those of the current instruction. Whether the PUMP is engaged or disengaged may vary with the particular level of trust presumed for code executing at the different RISC-V privilege levels and with the level of protection desired.

With respect to the tag modes 911c through 911f, all PUMP CSRs of the example 900, except the mtagmode CSR denoted by 901r, may be accessible only when the PUMP is disengaged. That is, the PUMP CSRs of the example 900, other than the mtagmode CSR denoted by 901r, are accessible only to code executing at a current RISC-V privilege or mode that is more privileged than the highest PUMP-engaged privilege denoted by the tag mode (e.g., the highest such privilege denoted by 911c is U mode, the highest denoted by 911d is S mode, the highest denoted by 911e is H mode, and the highest denoted by 911f is M mode).

When the tag mode is 100, as denoted by 911c, the PUMP is disengaged and non-operational when the RISC-V privilege level is higher, or more elevated, than U-mode. Thus, the tag mode 911c indicates that the PUMP, and its rules providing protection, are engaged and enforced only when code executes in U-mode, thereby indicating that code executing at a privilege level higher than U-mode (e.g., in S, H or M mode) is trusted. When the tag mode is 100, as denoted by 911c, and the RISC-V privilege level of the executing code is S, H or M mode, the PUMP is disengaged and its CSRs are accessible only to code executing in S, H or M mode (e.g., the CSRs are not accessible to code executing in U-mode).

When the tag mode is 101, as denoted by 911d, the PUMP is disengaged and non-operational when the RISC-V privilege level is higher, or more elevated, than S-mode. Thus, the tag mode 911d indicates that the PUMP, and its rules providing protection, are engaged and enforced only when code executes in S-mode or U-mode, thereby indicating that code executing at a privilege level higher than S-mode (e.g., in H or M mode) is trusted. When the tag mode is 101, as denoted by 911d, and the RISC-V privilege level of the executing code is H or M mode, the PUMP is disengaged and its CSRs are accessible only to code executing in H or M mode (e.g., the CSRs are not accessible to code executing in S or U mode).

When the tag mode is 110, as denoted by 911e, the PUMP is disengaged and non-operational when the RISC-V privilege level is higher, or more elevated, than H-mode. Thus, the tag mode 911e indicates that the PUMP, and its rules providing protection, are engaged and enforced only when code executes in H-mode, S-mode or U-mode, thereby indicating that code executing at a privilege level higher than H-mode (e.g., in M mode) is trusted. When the tag mode is 110, as denoted by 911e, and the RISC-V privilege level of the executing code is M mode, the PUMP is disengaged and its CSRs are accessible only to code executing in M mode (e.g., the CSRs are not accessible to code executing in U, S or H mode).

When the tag mode is 111, as denoted by 911f, the PUMP is always engaged and operational for all RISC-V privilege levels M, H, S and U. Thus, the tag mode 911f indicates that the PUMP, and its rules providing protection, are engaged and enforced when code executes in any of M-mode, H-mode, S-mode and U-mode, thereby indicating that no code is inherently trusted. For the tag mode 111, as denoted by 911f, the PUMP is never disengaged and its CSRs are not accessible to any executing code.

With respect to the tag modes denoted by rows 911c through 911f, when the current RISC-V privilege level of the executing code is higher than the highest PUMP-engaged level denoted by the tag mode, the PUMP may be disengaged and tags may be propagated using one or more default tag propagation rules.
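The engagement behavior described for the tag modes 100, 101, 110 and 111 reduces to a comparison between the current privilege level and the highest engaged level for the mode. The following is a minimal sketch of that comparison, not a definitive implementation. The names `Priv`, `HIGHEST_ENGAGED` and `pump_engaged` are hypothetical, the numeric values assigned to the privilege levels are an assumption (only the ordering U < S < H < M matters here), and treating the encodings not described in the text as disengaged is likewise an assumption.

```python
from enum import IntEnum


class Priv(IntEnum):
    # Assumed numeric values; only the ordering U < S < H < M is relied upon.
    U = 0
    S = 1
    H = 2
    M = 3


# mtagmode encoding -> highest privilege level at which the PUMP stays engaged.
HIGHEST_ENGAGED = {
    0b100: Priv.U,
    0b101: Priv.S,
    0b110: Priv.H,
    0b111: Priv.M,
}


def pump_engaged(tagmode: int, priv: Priv) -> bool:
    """Return True when policy rules are enforced for code at `priv`."""
    top = HIGHEST_ENGAGED.get(tagmode)
    if top is None:
        # 000 (off) and 010 (write default) enforce no rules; other
        # encodings are treated the same way in this sketch (assumption).
        return False
    return priv <= top
```

For example, under tag mode 101 this sketch reports the PUMP as engaged for U-mode and S-mode code and disengaged for H-mode and M-mode code, in which case the default tag propagation rules would apply.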

When the tag mode has the encoding 000, as denoted by row 911a (indicating that the PUMP is off), or the encoding 010, as denoted by row 911b (indicating the write default mode), all CSRs of the table 900 are accessible only to code executing in M mode.

Thus, in at least one embodiment in accordance with techniques herein, whether executing code is allowed to access a CSR may depend on the minimum RISC-V privilege level of the CSR (such as specified in column 904 of the table 900), the current tag mode, and the current RISC-V privilege level of the executing code. For example, in a RISC-V architecture without consideration of tag mode, code executing in U mode cannot access any of the CSRs defined in 900 because of the minimum privilege level denoted by 904 for all such CSRs. However, without consideration of tag mode, code executing with at least H-mode privilege can access all CSRs of 900 except 901r, and code executing in M mode can access all CSRs of 900. Now consider determining access to the CSRs of 900 in accordance with both the minimum RISC-V privilege of 904 and the tag mode. For example, consider code portion A executing at the H level. Code portion A can access the CSRs 901a through 901q (of the table 900) when the tag mode is 100, as denoted by 911c, or when the tag mode is 101, as denoted by 911d. However, code portion B executing in S mode may be unable to access the CSRs 901a through 901q because it does not have the minimum privilege level specified by the CSR privilege level defined for such CSRs. Thus, for example, code portion A may be the cache miss handler in an embodiment in which the handler executes at the H level using the CSRs defined in the table 900. As a second example, assume that the minimum RISC-V privilege defined for the CSRs 901a through 901q is SRW (denoting S mode as the minimum privilege level for accessing such CSRs). Code portion A executing in H mode can access the CSRs 901a through 901q when the tag mode is 100, as in 911c, and when the tag mode is 101, as in 911d, and code portion B executing in S mode can access the CSRs 901a through 901q when the tag mode is 100, as in 911c. Thus, either code portion A or code portion B may be the code of the cache miss handler.
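The combined access decision in the two examples above can be sketched as a single predicate over the CSR's minimum privilege, the current tag mode, and the current privilege of the executing code. This is an illustrative model only: the names `Priv`, `HIGHEST_ENGAGED` and `csr_accessible` are hypothetical, the numeric privilege values are an assumption (only their ordering is used), and denying access for encodings the text does not describe is also an assumption.

```python
from enum import IntEnum


class Priv(IntEnum):
    # Assumed numeric values; only the ordering U < S < H < M is relied upon.
    U = 0
    S = 1
    H = 2
    M = 3


# mtagmode encoding -> highest privilege level at which the PUMP stays engaged.
HIGHEST_ENGAGED = {0b100: Priv.U, 0b101: Priv.S, 0b110: Priv.H, 0b111: Priv.M}


def csr_accessible(tagmode: int, priv: Priv, csr_min_priv: Priv,
                   is_mtagmode: bool = False) -> bool:
    """Model of PUMP CSR access gated by privilege level and tag mode."""
    if priv < csr_min_priv:
        return False              # ordinary RISC-V minimum-privilege check
    if is_mtagmode:
        return True               # mtagmode is exempt from tag-mode gating
    if tagmode in (0b000, 0b010):
        return priv == Priv.M     # PUMP off / write default: M mode only
    top = HIGHEST_ENGAGED.get(tagmode)
    if top is None:
        return False              # undescribed encoding: deny (assumption)
    return priv > top             # accessible only where the PUMP is disengaged


# Second example from the text: minimum privilege S for CSRs 901a through 901q.
code_a = Priv.H
code_b = Priv.S
a_modes = [m for m in (0b100, 0b101) if csr_accessible(m, code_a, Priv.S)]
b_modes = [m for m in (0b100, 0b101) if csr_accessible(m, code_b, Priv.S)]
```

With a minimum privilege of S, `a_modes` contains both 100 and 101 while `b_modes` contains only 100, matching the second example. Passing `Priv.H` as the minimum instead reproduces the first example, in which code portion B is denied in every mode.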

In at least one embodiment, the off tag mode of 911a may be the current tag mode when the PUMP is off, such as during suitable portions of the boot up process. The default tag mode of 911b may be the current tag mode when initializing memory locations to have the same default tag (e.g., as denoted by the CSR 901c). Generally, although four privilege modes are specified in the RISC-V architecture, an embodiment may alternatively use a different number of privilege modes, in which a first privilege level denotes a user or unprivileged mode and a second privilege level denotes an elevated or privileged execution mode (e.g., similar to kernel mode in a UNIX-based operating system). In such an embodiment, the PUMP may be engaged and enforcing policies when executing code in the user or unprivileged mode, and the PUMP may be disengaged (e.g., PUMP protection off, or rules not enforced) when executing code in the second, elevated privilege mode. In this manner, an embodiment may disengage the PUMP when executing trusted or elevated-privilege code, such as the miss handler, in order to store new rules in the PUMP rule cache.

As noted above, an embodiment may use default propagation rules to determine the PUMP outputs newPCtag and Rtag, for example, when the PUMP is disengaged and/or when a rule specifies don't care for the PUMP outputs newPCtag and Rtag (e.g., such don't care values may be indicated by the care vector for the particular opcode of the current instruction). In one embodiment, the following may denote the logic embodied in the default propagation rules used.

* newPCtag is the PCtag for default propagation.

* Rtag is the RS1tag for CSR read and write operations; the RDtag is assigned the RS2tag (CSRtag).

- Allows tags to swap along with the data values

- RDtag <-- RS2tag <-- original CSRtag

- CSRtag <-- Rtag <-- original RS1tag

** Rtag is the RS2tag (CSRtag) for CSRR?I, CSRRS and CSRRC.

- The CSRtag is unchanged

- RDtag <-- RS2tag <-- original CSRtag

- CSRtag <-- Rtag <-- original RS2tag <-- original CSRtag

* Rtag is the PCtag for JAL and JALR instructions (this is for the return address). Rtag is also the PCtag for AUIPC instructions. In RISC-V, the AUIPC (add upper immediate to PC) instruction is used to build PC-relative addresses and uses the U-type format. AUIPC forms a 32-bit offset from the 20-bit U-immediate, filling in the lowest 12 bits with zeros, adds this offset to the PC, and then places the result in register rd.

* Rtag is the CItag for LUI instructions. In RISC-V, the LUI (load upper immediate) instruction is used to build 32-bit constants and uses the U-type format. LUI places the U-immediate value in the top 20 bits of the destination register RD, filling in the lowest 12 bits with zeros.

* Rtag is the RS1tag for non-memory, non-CSR, non-JAL(R)/AUIPC/LUI operations.

** Rtag is the RS2tag for memory write operations.

* Rtag is the Mtag for memory load operations.
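The default propagation logic listed above can be summarized as a small dispatch on the kind of instruction. The following is a sketch under stated assumptions rather than the embodiment's actual logic. The names `InputTags`, `default_rtag`, `default_propagate` and the instruction-kind strings are hypothetical. In particular, `"csr_swap"` stands for the case in which the tag swaps with the data value (assumed here to correspond to a csrrw-style operation), and `"csr_other"` stands for the CSRR?I, CSRRS and CSRRC case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputTags:
    pc: str    # PCtag
    ci: str    # CItag, the tag on the current instruction
    rs1: str   # RS1tag
    rs2: str   # RS2tag; for CSR instructions this carries the current CSR tag
    mem: str   # Mtag, the tag on the referenced memory location


def default_rtag(kind: str, t: InputTags) -> str:
    """Select the default Rtag for an instruction kind (hypothetical names)."""
    if kind == "csr_swap":
        return t.rs1                    # tag swaps along with the data value
    if kind == "csr_other":
        return t.rs2                    # CSR tag is carried through unchanged
    if kind in ("jal", "jalr", "auipc"):
        return t.pc                     # return address / PC-relative result
    if kind == "lui":
        return t.ci
    if kind == "store":
        return t.rs2
    if kind == "load":
        return t.mem
    return t.rs1                        # all remaining operations


def default_propagate(kind: str, t: InputTags) -> dict:
    out = {"newPCtag": t.pc, "Rtag": default_rtag(kind, t)}
    if kind in ("csr_swap", "csr_other"):
        out["RDtag"] = t.rs2            # RD receives the original CSR tag
        out["CSRtag"] = out["Rtag"]     # the CSR receives the Rtag
    return out


tags = InputTags(pc="P", ci="I", rs1="A", rs2="B", mem="M")
swap_result = default_propagate("csr_swap", tags)
```

Here `swap_result` shows the swap: the destination register receives the original CSR tag `"B"` and the CSR receives the original RS1 tag `"A"`. For `"csr_other"`, the CSR tag stays `"B"`.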

In at least one embodiment of techniques herein based on the RISC-V architecture, a new PUMP miss trap may be defined for the occurrence of a rule cache miss. The PUMP miss trap may have a lower priority than a virtual memory fault or an illegal instruction.

In at least one embodiment in accordance with techniques herein using the RISC-V architecture, a strict separation and isolation between data and metadata may be maintained, whereby there is a separation and isolation between tag metadata processing and normal instruction processing. Thus, separate execution domains may be maintained for metadata rule processing and for normal or typical program instruction execution. Metadata processing performed using the PUMP may be performed on the tags associated with the instructions and data of executing code. A PUMP rule cache miss results in a trap that transfers control to the rule cache miss handler, which generates or retrieves a rule matching the current instruction and stores that rule in the PUMP rule cache. Information may be communicated between the foregoing execution domains using CSRs. When switching from the instruction execution domain of an executing program to the metadata rule processing domain (such as when the rule cache miss handler is triggered via a rule cache miss trap), the tags and other information relevant to the instruction that caused the trap are provided as inputs to the PUMP and are also provided as inputs to the miss handler using CSRs. In a similar manner, when transferring control from the metadata rule processing domain to the instruction execution domain of the executing program (such as when returning from the rule cache miss handler after handling a rule cache miss trap), the PUMP outputs may be communicated using CSRs, and the contents of the CSRs are then stored in corresponding mapped registers of the instruction execution domain. Consistent with discussion herein, an instruction that does not map to a rule (e.g., no rule matching the instruction is located in the cache, and the cache miss handler determines that no such matching rule exists for the current instruction) is not allowed to execute, and a trap or other event is triggered accordingly. For example, the processor may stop execution of the current program code.
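The control flow described above (look up a rule, trap to the miss handler on a miss, install the resulting rule, and stop the instruction if no rule exists) can be sketched in software. This is a simplified model under stated assumptions, not the hardware mechanism: the class `Pump`, the `policy` callback standing in for the miss handler, the rule key layout, and the use of `PermissionError` to represent the "not allowed" outcome are all hypothetical choices made for illustration.

```python
class Pump:
    """Toy model of a rule cache backed by a software miss handler."""

    def __init__(self, policy):
        self.rule_cache = {}    # rule inputs -> (newPCtag, Rtag)
        self.policy = policy    # stands in for the rule cache miss handler
        self.misses = 0

    def check(self, opgroup, pc_tag, ci_tag, operand_tags):
        key = (opgroup, pc_tag, ci_tag, tuple(operand_tags))
        if key not in self.rule_cache:
            # Rule cache miss: models the trap into the miss handler, with
            # the instruction's tags handed over as inputs (via CSRs in the text).
            self.misses += 1
            result = self.policy(key)
            if result is None:
                # No matching rule exists: the instruction may not execute.
                raise PermissionError(f"no rule permits {key}")
            # Models the rule insertion performed by the handler.
            self.rule_cache[key] = result
        return self.rule_cache[key]


def toy_policy(key):
    """Hypothetical policy: allow 'add' only when both operands are tagged 'int'."""
    opgroup, pc_tag, _ci_tag, operand_tags = key
    if opgroup == "add" and operand_tags == ("int", "int"):
        return (pc_tag, "int")  # (newPCtag, Rtag)
    return None


pump = Pump(toy_policy)
first = pump.check("add", "pc0", "code", ["int", "int"])    # miss, rule installed
second = pump.check("add", "pc0", "code", ["int", "int"])   # hit, no handler call
```

After these two calls, `pump.misses` is 1, since the second lookup is served from the cache. A call such as `pump.check("add", "pc0", "code", ["int", "ptr"])` raises `PermissionError` in this model, standing in for the trap taken when no rule exists.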

In this manner, there may be a strict separation between the foregoing domains and their associated data paths, even though the same RISC-V processor and memory may be used in both domains. Using techniques herein, no instruction of executing code can read or write metadata tags or rules. All metadata transformations, including tagging of instructions and data, may be performed through the PUMP. Similarly, insertion of rules into the PUMP cache may be performed only by the rule cache miss handler of the metadata subsystem. With respect to processing performed by the metadata subsystem or processing system, the metadata tags of the executing code are placed in the PUMP CSRs and become "data" inputs that the metadata system operates on (e.g., pointers into the metadata memory space). The metadata subsystem reads the PUMP inputs for processing in accordance with the rules via the PUMP input CSRs. If an instruction is allowed to proceed by the rules, the PUMP writes the tag results (e.g., for PCnew and Rtag) to the defined PUMP output CSRs. Insertion of a rule into the rule cache may be triggered in response to a write to a particular CSR (e.g., such as the srtag CSR of 901q). In this manner, all tag updates are performed via the rules of the PUMP and are controlled by the metadata subsystem. Only the metadata subsystem can insert rules into the PUMP cache, via the cache miss handler that is invoked upon the occurrence of a rule cache miss. Additionally, in at least one embodiment described herein using the RISC-V architecture, the foregoing separation between metadata processing and normal instruction processing may be maintained without adding any new instructions beyond those of the "RISC-V user-level ISA" and the "RISC-V privileged ISA". Consistent with discussion elsewhere herein, an embodiment in accordance with techniques herein maintains a strict separation and isolation between data and metadata, whereby there may be a separation between tag-based metadata processing and normal instruction processing. In at least one embodiment, such separation may be maintained by having a separate physical metadata processing subsystem with its own processor and its own memory. Thus, a first processor and first memory may be used when processing the instructions of an executing program, and a second processor and second memory may be included in the metadata processing subsystem for use in connection with performing metadata processing, such as when executing the code of the rule cache miss handler.

Referring to FIG. 27, shown is an example 1000 of components that may be included in an embodiment in accordance with the techniques herein. Example 1000 includes a first subsystem or processor 1002, used in connection with normal processing of an executing program, and a metadata processing subsystem or processor 1004. The first subsystem 1002 may be characterized as the program execution subsystem used in connection with normal program execution. Subsystem 1002 is a processor that includes components used in connection with executing program code and using data, where such code and data include tags, as described elsewhere herein, for use with the metadata processing subsystem 1004. Subsystem 1002 includes memory 1008a, instruction store or I-store 1008b, ALU (arithmetic and logic unit) 1008d and program counter (PC) 1008e. It should be noted that PUMP 1003 may be used in connection with the execution of code in subsystem 1002 but may be considered part of the metadata processing subsystem 1004. All code and data in subsystem 1002 may be tagged, as indicated generally by tag 1002a associated with data 1002b, where 1002a and 1002b may be stored in memory 1008a. Similarly, element 1001a denotes the tag of the instruction of PC 1008e, 1001b denotes the tag of instruction 1008b, 1001c denotes the tag of memory location 1008a, and 1001d denotes the tag of register 1008c.

The metadata processing subsystem 1004 is a processor (also referred to as the metadata processor) that includes components used in connection with metadata rule processing, using the tags of the current instruction and associated data provided as input to PUMP 1003. PUMP 1003 may be as described elsewhere herein and includes a rules cache. For example, in at least one embodiment, PUMP 1003 may include the components shown in FIG. 22. The components of PUMP 1003, the associated PUMP CSRs used for PUMP inputs and outputs, and the associated logic that may be included in at least one embodiment in accordance with the techniques herein are described in more detail below and elsewhere herein. Subsystem 1004 is a separate processor used for metadata processing and includes components similar to those of subsystem 1002. Subsystem 1004 includes memory 1006a, I-store 1006b, register file 1006b and ALU 1006d. Memory 1006a may contain metadata structures used in connection with metadata rule processing. For example, memory 1006a may contain the structures or data pointed to by tags that are pointers. Examples of pointer tags and the structures or data they point to are described herein, for example, in connection with CFI policies. I-store 1006b and memory 1006a may contain instructions or code, such as the miss handler, that perform metadata processing. Because the metadata processor 1004 performs only metadata processing (e.g., based on tags and rules), it does not need access to other components of 1002, such as data memory 1008a used in connection with program execution. Subsystem 1004 includes its own components, such as the separate memory 1006a, so metadata processing code and data need not be stored in subsystem 1002. Rather, all information that may be used by PUMP 1003, such as the tags of the current instruction, is provided as input to the metadata processing subsystem 1004 (e.g., PUMP input 1007).

Example 1000 illustrates an alternative embodiment that has a separate metadata processing subsystem 1004, rather than performing metadata processing on the same subsystem used for normal program execution as described elsewhere herein. For example, instead of having a separate metadata processor or subsystem 1004, an embodiment may include only PUMP 1003 and subsystem 1002. In such a single-processor embodiment, CSRs may be used, as described herein, to pass information between metadata processing and the normal processing mode that executes the user program, thereby providing isolation and separation. In such an embodiment, the code of the miss handler may be stored in the single memory in a manner that keeps the code protected. For example, without a separate metadata processor or subsystem, the miss handler code may be protected using tags that restrict access, as described elsewhere herein, or may be mapped to a portion of memory that is not addressable by user code.

Further details regarding PUMP I/O (input/output) are now described. It should be noted that the PUMP I/O described below applies both to embodiments of the PUMP that use the same processor or subsystem as normal code execution and to embodiments that use a separate processor or subsystem, as in 1000. Additionally, the PUMP I/O described below may be used with embodiments based on the RISC-V architecture and may be generalized for use with other processor architectures.

Referring to FIG. 28, shown is an example 1010 summarizing PUMP I/O in an embodiment in accordance with the techniques herein. As described elsewhere herein, such as in connection with FIGS. 1 and 24, the PUMP operates in stages 5 and 6. In connection with normal PUMP validation (e.g., verifying that the current instruction is allowed under the policy rules), the PUMP inputs are used to look up a matching rule, if any, for the current instruction in the PUMP rules cache. Normal PUMP validation may occur for every instruction, for example as part of stage 5 in the six-stage pipeline described elsewhere herein. Additionally, PUMP inputs may be used in connection with controlling rule insertion into the rules cache, which may occur, for example, in stage 6 of the six-stage pipeline. PUMP I/O associated with normal PUMP validation is shown by the inputs and outputs running vertically from top (inputs 1012) to bottom (outputs 1014). PUMP I/O associated with controlling rule insertion into the PUMP rules cache is shown in example 1010 by the inputs and outputs running horizontally from left (inputs 1016) to right (outputs 1018). Also, as described in more detail elsewhere, element 1012 denotes additional inputs that are also used in connection with rule insertion.

First, consider the PUMP I/O associated with normal PUMP validation processing. The PUMP inputs 1012 may include the PC tag, the CI tag, instruction operand tags (e.g., OP1 tag, OP2 tag or CSR tag (for CSR-based instructions in RISC-V), and OP3 tag), the M tag (for the memory location of memory instructions; note that the M tag may also be referred to herein as the MR tag for memory instructions), opcode information (e.g., the op group denoted by the Opgrp input, the funct12 (funct7) input for RISC-V extended opcodes, and the subinstr input indicating which instruction is the current instruction in an instruction word that contains multiple instructions, as in examples 200 and 220), and the care input bits. Opgrp may be the opgroup for the current instruction and, as described elsewhere herein, may be an output of a previous stage (e.g., stage 3 or stage 4). The funct12 (funct7) PUMP input may be the additional opcode bits, if any, for those RISC-V opcodes that use additional bits of the instruction word (e.g., example 400). The PUMP outputs 1014 may include the Rtag (e.g., the tag for the instruction result register or destination memory location), the PC new tag (denoting the propagated tag placed on the PC used for the next instruction), and an indicator 1014a denoting whether there was a PUMP rules cache miss, which results in a trap to the miss handler in stage 6.

The care bits 1012a may indicate which PUMP inputs 1012 and which PUMP outputs 1014 are cared about or not cared about (i.e., ignored) for a particular instruction. The care bits for the PUMP inputs may include one care bit for funct12 and a second care bit for funct7. As described elsewhere herein, both of these care bits indicate whether the particular opcode of the current instruction includes any bits of the extended 12-bit opcode portion for RISC-V instructions (e.g., 404a of example 400). If both the funct12 and funct7 care bits indicate "don't care", all 12 bits of the extended 12-bit opcode portion are masked out. If funct7 indicates "care", the lower 5 bits of the extended 12-bit opcode portion are masked out. If funct12 indicates "care", no masking is applied to the extended 12-bit opcode portion.
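The funct12/funct7 masking rule above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `mask_extended_opcode` and the constants are hypothetical, and the choice to let the funct12 care bit take precedence when both care bits are set is an assumption that the text does not address.

```python
FUNCT12_MASK = 0xFFF  # all 12 bits of the extended opcode portion
FUNCT7_MASK = 0xFE0   # keeps the upper 7 bits, clears the lower 5 bits


def mask_extended_opcode(funct12: int, care_funct12: bool, care_funct7: bool) -> int:
    """Apply the funct12/funct7 care bits to the extended 12-bit opcode portion.

    Hypothetical helper; assumes funct12 "care" wins if both bits are set.
    """
    funct12 &= FUNCT12_MASK
    if care_funct12:
        return funct12                # no masking
    if care_funct7:
        return funct12 & FUNCT7_MASK  # lower 5 bits masked out
    return 0                          # both "don't care": all 12 bits masked out
```

For instance, with the 12-bit value `0xABC`, a funct7-only care setting keeps just the upper seven bits, so the lower five bits no longer influence rule matching.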

Now consider the PUMP I/O associated with controlling rule insertion into the PUMP rules cache. The PUMP inputs 1016 may be used in combination with inputs 1012 in connection with PUMP cache rule insertion. The PUMP inputs 1016 may include Op1 data (an output from the metadata processor or subsystem), the instruction (from stage 6), the tag mode (an output from the metadata processor or subsystem), and the privilege (priv, denoting the RISC-V privilege). The tag mode and priv inputs of 1016 are used by the metadata processor or subsystem to determine whether code executing on the metadata processor, such as the miss handler, has sufficient privilege to access the CSRs, described below and elsewhere herein, that provide the various inputs (e.g., inputs 1012) to the metadata processor. Rdata 1018 is an input to the metadata processor or subsystem for use in stage 6 (e.g., as input to cache miss handler processing). Op1 data, R data and other items of example 1010 are described in more detail in the following paragraphs and figures.

Thus, generally, in example 1010, element 1012 denotes inputs to the PUMP and the processor executing user code (e.g., a non-metadata processor or subsystem such as 1002), element 1014 denotes outputs generated by the metadata processor, element 1016 denotes outputs generated by the metadata processor that are inputs to the PUMP, and element 1018 denotes inputs to the metadata processor.

Referring to FIG. 29, shown is an example 1020 summarizing I/O in connection with the opgroup/care table (e.g., element 422 of example 420) in an embodiment in accordance with the techniques herein. As described elsewhere herein, the opgroup/care table may be used, for each instruction, to look up and output the opgroup and care bits for the opcode of the current instruction. This first I/O flow is shown in 1020 by the inputs and outputs running vertically from top (input 1022) to bottom (output 1024). As described elsewhere herein, the input 1022 may be an opcode, or a portion thereof, used as an index into the opgroup/care table (e.g., as described in connection with the opcode portion of example 420). The input 1022 may come from stage 3. The output 1024 may be the opgroup (opgrp) and care bits for the particular opcode. The output 1024 is an input to stage 5 (e.g., the two PUMP inputs opgrp and care included in 1012).
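As a rough illustration of the opgroup/care lookup just described, the following Python sketch indexes a small table by opcode and returns an opgroup together with care bits. The table contents, the opgroup numbers, the care-bit field names and the helper `lookup_opgroup` are all hypothetical; indexing by the 7-bit RISC-V major opcode is likewise an assumption, since the text only says that an opcode or a portion of it is used as the index.

```python
from typing import Dict, NamedTuple, Optional


class OpgroupEntry(NamedTuple):
    opgrp: int              # opgroup identifier (values here are made up)
    care: Dict[str, bool]   # care bits per PUMP input/output (names are made up)


# Hypothetical table contents, keyed by the 7-bit RISC-V major opcode.
OPGROUP_CARE_TABLE: Dict[int, OpgroupEntry] = {
    0x33: OpgroupEntry(1, {"op1_tag": True, "op2_tag": True, "m_tag": False, "rtag": True}),   # OP
    0x03: OpgroupEntry(2, {"op1_tag": True, "op2_tag": False, "m_tag": True, "rtag": True}),   # LOAD
    0x23: OpgroupEntry(3, {"op1_tag": True, "op2_tag": True, "m_tag": True, "rtag": True}),    # STORE
}


def lookup_opgroup(instruction_word: int) -> Optional[OpgroupEntry]:
    """Return the opgroup and care bits for an instruction, or None if unmapped."""
    return OPGROUP_CARE_TABLE.get(instruction_word & 0x7F)
```

In this sketch the returned `opgrp` and `care` values play the role of output 1024, which feeds the PUMP in stage 5.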

The second I/O flow is shown in 1020 by the inputs and outputs running horizontally from left (inputs 1026) to right (output 1028). The second I/O flow in 1020 illustrates processing performed in connection with controlling selection of the PUMP output Rdata 1028, which is input to the metadata processor or stage 6. The inputs 1026 are as described above in connection with 1016. The output 1028 is as described above in connection with 1018.

Referring to FIG. 30, shown is an example 1030 that abstractly represents the processing performed by the PUMP in an embodiment in accordance with the techniques herein. Example 1030 includes PUMP control 1031, which corresponds to the PUMP control for rule insertion described above in connection with the horizontal PUMP I/O flow of example 1010 (e.g., elements 1012, 1016 and 1018). Example 1030 also includes masking 1032, hash 1034, rules cache lookup 1036 and output tag selection 1038, which correspond to the normal PUMP validation path performed for each instruction, as described above in connection with the vertical PUMP I/O flow of example 1010 (e.g., elements 1012 and 1014). Masking 1032 denotes applying the care bits of 1012 to mask out unused PUMP inputs of 1012. Hash 1034 denotes computing the hash used during the rules cache lookup denoted by 1036. Components that may be used in one embodiment to implement the logic denoted by 1032, 1034 and 1036 are shown and described in connection with FIG. 22. Output tag selection 1038 denotes selecting the PUMP outputs included in 1014 (Rtag and PC new tag) based on the care vector bits (the care bits included in inputs 1012) and the htagmode CSR (which indicates the current tag mode).
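The masking, hashing and lookup stages (1032, 1034 and 1036) can be approximated in software as follows. This is a minimal sketch under stated assumptions: a Python dictionary stands in for the hardware hash and rules cache of FIG. 22, masked-out inputs are replaced by zero, and the field names and the `RuleCache` class are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical names for the PUMP inputs that take part in rule matching.
INPUT_FIELDS = ("opgrp", "pc_tag", "ci_tag", "op1_tag", "op2_tag", "op3_tag", "m_tag")


def mask_inputs(inputs: Dict[str, int], care: Dict[str, bool]) -> Tuple[int, ...]:
    """Masking 1032: zero out every input whose care bit is not set."""
    return tuple(inputs[f] if care.get(f, False) else 0 for f in INPUT_FIELDS)


class RuleCache:
    """Software stand-in for hash 1034 and rules cache lookup 1036."""

    def __init__(self) -> None:
        self._rules: Dict[Tuple[int, ...], Tuple[int, int]] = {}

    def insert(self, inputs: Dict[str, int], care: Dict[str, bool],
               rtag: int, pc_new_tag: int) -> None:
        self._rules[mask_inputs(inputs, care)] = (rtag, pc_new_tag)

    def lookup(self, inputs: Dict[str, int],
               care: Dict[str, bool]) -> Optional[Tuple[int, int]]:
        """Return (rtag, pc_new_tag) on a hit, or None on a rules cache miss."""
        return self._rules.get(mask_inputs(inputs, care))
```

Because don't-care inputs are masked before the lookup, a single cached rule matches every instruction that differs only in those ignored inputs; a `None` result corresponds to the cache miss indicator 1014a.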

Referring to FIG. 31, shown is an example 1040 of components that may be used to implement the logic of output tag selection 1038 of the PUMP in one embodiment in accordance with the techniques herein. Example 1040 includes multiplexers (MUXes) 1043a-1043b. Generally, MUX 1043a may be used to select the final tag value for PCnew tag 1043 as output by the PUMP (e.g., the PCnew tag of 1014), and MUX 1043b may be used to select the final tag value for Rtag 1047 as output by the PUMP (e.g., the Rtag of 1014). Element 1042 denotes the input used as the selector for MUX 1043a. Input 1042 is used to select either 1041a or 1041b as PCnew tag 1043. Input 1042 may include the PCnew tag care bit (e.g., from the care bits of 1012) logically ANDed (&&) with engaged (a Boolean indicating whether the PUMP is engaged). Element 1043 denotes the input used as the selector for MUX 1043b. Input 1043 is used to select one of the inputs denoted by 1045a-1045b as Rtag 1047. Input 1043 may include the Rtag care bit (e.g., from the care bits of 1012) logically ANDed (&&) with engaged. Thus, generally, the care bits included in the PUMP inputs 1012 identify which PUMP inputs are don't care (masked out) and which PUMP outputs (Rtag and PCnew tag) are don't care (masked out). Additionally, outputs 1043 and 1047 are treated as "don't care" values when the PUMP is disengaged because the processor is executing at a higher privilege level than the level the current tagmode specifies as the threshold for PUMP operation.
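The two output multiplexers can be modeled as a pair of conditional selections. The sketch below is illustrative only: the function and parameter names are hypothetical, `engaged` is taken as a precomputed Boolean because its defining expression (element 1049) is not reproduced in the text, and the polarity (rules cache output when the selector is true, pass-through value otherwise) is an assumption.

```python
from typing import Tuple


def select_output_tags(cache_pc_new_tag: int, cache_rtag: int,
                       pc_tag: int, fallback_rtag: int,
                       care_pc_new: bool, care_rtag: bool,
                       engaged: bool) -> Tuple[int, int]:
    """Model of the PCnew tag MUX and the Rtag MUX.

    Each selector is (care bit && engaged). When a selector is true the value
    from the rules cache lookup is used; otherwise a pass-through value is used
    (the incoming PC tag, or an instruction-dependent fallback for Rtag).
    """
    pc_new_tag = cache_pc_new_tag if (care_pc_new and engaged) else pc_tag
    rtag = cache_rtag if (care_rtag and engaged) else fallback_rtag
    return pc_new_tag, rtag
```

With `engaged` false, both outputs fall back to their pass-through values regardless of the care bits, which mirrors the statement that the outputs are treated as don't care while the PUMP is disengaged.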

Element 1049 shows how the Boolean engaged is determined as a function of the current RISC-V privilege and the current tagmode. Element 1049 contains a logical expression written in standard notation known in the art, in which "A == B" denotes a logical test for equality between A and B, "A && B" denotes the logical AND of A and B, and "A || B" denotes the logical (inclusive) OR of A and B.

Elements 1041a and 1045a denote inputs to 1043a that are outputs from the rules cache lookup 1036. PC tag 1041b is the PC tag included in the PUMP inputs 1012. The other inputs 1041b are generally a number of other inputs that may be selected as the final Rtag 1047 output by the PUMP. For example, in one embodiment, the other inputs 1041b may include the M tag, PC tag, CI tag, OP1 tag, OP2 tag, OP3 tag, and possibly others, depending on the instruction. The particular Rtag output 1047 may vary with the particular RISC-V instruction or opcode.

The following summarizes particular values for Rtag 1047 and PCnew tag 1043 generated as PUMP output values in one embodiment. It should be noted that the following lists particular R tag output values for different RISC-V instructions. Thus, the particular R tag value output as the final PUMP R tag value may vary with the instruction that uses such PUMP output in connection with subsequent metadata processing.

1. PCtag is unchanged when the output care bit is off for PC new tag.

2. Rtag is Op1tag for CSRRW operations.

3. Rtag is Op2tag (CSRtag) for CSRR?I, CSRRS and CSRRC operations.

4. Rtag is PCtag for JAL and JALR instructions.

5. Rtag is PCtag for AUIPC instructions.

6. Rtag is CItag for LUI instructions.

7. Rtag is Op1tag for non-memory, non-CSR, non-JAL(R)/AUIPC/LUI operations when the output care bit is off (indicating care for Rtag).

8. Rtag is Op2tag for memory write operations when the output care bit is off.

9. Rtag is Mtag for memory load operations when the output care bit is off.
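Items 2 through 9 of the list above amount to a per-instruction choice of which input tag is passed through as Rtag. A minimal Python sketch of that choice follows; the function name `passthrough_rtag`, the dictionary keys and the instruction class labels `"LOAD"` and `"STORE"` are hypothetical, and the care-bit condition attached to items 7 through 9 is assumed to be checked by the caller.

```python
from typing import Dict

# "CSRR?I" in item 3 is read here as the immediate forms of the CSR instructions.
CSR_OP2_OPS = {"CSRRWI", "CSRRSI", "CSRRCI", "CSRRS", "CSRRC"}


def passthrough_rtag(op: str, tags: Dict[str, int]) -> int:
    """Pick the input tag that is passed through as Rtag for a given operation.

    `tags` maps the hypothetical keys "pc", "ci", "op1", "op2" and "m" to the
    corresponding PUMP input tags.
    """
    if op == "CSRRW":
        return tags["op1"]              # item 2
    if op in CSR_OP2_OPS:
        return tags["op2"]              # item 3 (Op2tag is the CSR tag)
    if op in ("JAL", "JALR", "AUIPC"):
        return tags["pc"]               # items 4 and 5
    if op == "LUI":
        return tags["ci"]               # item 6
    if op == "STORE":
        return tags["op2"]              # item 8: memory write
    if op == "LOAD":
        return tags["m"]                # item 9: memory load
    return tags["op1"]                  # item 7: everything else
```

A value returned by this helper could serve as the pass-through input of the Rtag multiplexer for the case where the rules cache output is not selected.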

Referring to FIG. 32, shown is an example 1050 of components that may be used to control PUMP I/O in an embodiment in accordance with the techniques herein. Generally, with reference back to example 1030, the components of 1050 may logically form another layer on top of 1032 (e.g., interfacing with the components of FIG. 22). Elements M1 to M14 denote multiplexers used to select among the various inputs. Element 1052 generally denotes the inputs from 1012 for the current instruction: opcode, PC tag, CI tag, Op1 tag, Op2 tag, Op3 tag and M tag. Element 1056 generally refers to the row of registers used to store the selected outputs of multiplexers M1 to M7. In one embodiment based on the RISC-V architecture, each box in row 1056 may be a register and, in particular, may be a CSR containing a particular value as described elsewhere herein (e.g., example 900, which shows the CSRs that may be used in one embodiment).

It should be noted that element 1052 of example 1050 does not include all of the inputs of 1012. For example, the funct12 (funct7) and subinstr inputs of 1012 are omitted from example 1050 for simplicity. However, one of ordinary skill in the art will appreciate that the funct12 (funct7) and subinstr inputs from 1012 may also be included in 1052. More generally, the inputs 1052 may be adapted to the particular inputs for metadata rule processing that are used in an embodiment.

When the PUMP performs processing for normal PUMP validation (e.g., verifying that the current instruction is allowed under the policy rules), the inputs 1052 may simply pass through as the outputs 1054. In this case, the outputs 1054 pass to the PUMP (or, more generally, to the metadata processor or subsystem) as inputs, such as inputs to the components of FIG. 22 for metadata processing. In accordance with normal PUMP validation, the PUMP may generate the outputs 1014 (e.g., Rtag and PC new tag if a matching rule for the current instruction is found in the rules cache, and otherwise the cache miss indication 1014a).

When a rules cache miss occurs, as a first step, the current values from 1052 for the current instruction are loaded into registers G1 to G7 of 1056. Thus, G1 to G7 hold a snapshot of the opcode and tag values for the current instruction that caused the rules cache miss, and those values may then be used in subsequent processing by the cache miss handler, which reads one or more of G1 to G7 as needed.

Thus, in a second step, the cache miss handler executes, reads G1 to G7 as its input values, and generates a new rule for the current instruction. Multiplexer M16 may be used to control selection among the various possible inputs from G1 to G7, where the selected output from M10 is denoted R data 1053 and is processed by the cache miss handler (which may execute, for example, on the same processor that executes the program code or on a separate metadata processor as in example 1000). Given the inputs G1 to G7 for the current instruction that caused the rules cache miss, the cache miss handler performs processing to determine the new rule to be inserted into the cache. The cache miss handler generates the outputs for the newly determined rule, R tag and PC new tag, and writes the Rtag to the Rtag CSR G8 and the PC new tag to the PC new CSR G9. In example 1050, Op1 data 1051 denotes outputs generated by the metadata processor for the new rule, such as Rtag and PC new tag, which are stored in CSRs G8 and G9 as described.

At this point, the values in CSRs G1 through G9 are the tag values of the new rule generated by the cache miss handler, and in a third step they may be inserted (written) into the rule cache as a new rule. In at least one embodiment using the techniques herein with a RISC-V architecture, a write to the R tag CSR denoted by G8 triggers writing the new rule (e.g., the contents of CSRs G1 through G9) into the rule cache. For rule insertion, CSRs G1 through G7 are provided as output 1052, and CSRs G8 and G9 are provided to the PUMP as output 1055 for storage in the rule cache. More specifically, in one embodiment, outputs 1052 and 1055 may be provided to the components of FIG. 22 for rule insertion.
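The three steps of miss handling described above can be sketched in software as follows. This is a minimal illustrative model, not the hardware design: the class `PumpCsrs`, the function `handle_miss`, and the `policy` callback are hypothetical names, and writing G9 before G8 is an assumption made here only because the text says the write to G8 triggers the insertion.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# Names of the input CSRs G1..G7, in the order listed in the description.
INPUT_CSRS = ("opcode", "PCtag", "CItag", "Op1tag", "Op2tag", "Op3tag", "Mtag")

RuleKey = Tuple[int, ...]      # values of G1..G7
RuleResult = Tuple[int, int]   # (R tag, PC new tag), i.e. G8 and G9


@dataclass
class PumpCsrs:
    """Hypothetical software model of CSRs G1..G9 plus the rule cache."""
    inputs: Dict[str, int] = field(default_factory=dict)   # G1..G7
    pc_new_tag: int = 0                                      # G9
    rule_cache: Dict[RuleKey, RuleResult] = field(default_factory=dict)

    def snapshot(self, current: Dict[str, int]) -> None:
        # Step 1: latch the missing instruction's opcode and tags into G1..G7.
        self.inputs = {name: current[name] for name in INPUT_CSRS}

    def write_pc_new_tag(self, tag: int) -> None:
        self.pc_new_tag = tag

    def write_rtag(self, rtag: int) -> None:
        # Step 3: the write to G8 triggers insertion of G1..G9 as a new rule.
        key = tuple(self.inputs[name] for name in INPUT_CSRS)
        self.rule_cache[key] = (rtag, self.pc_new_tag)


def handle_miss(csrs: PumpCsrs,
                current: Dict[str, int],
                policy: Callable[[Dict[str, int]], RuleResult]) -> RuleResult:
    csrs.snapshot(current)
    # Step 2: the miss handler reads G1..G7 and computes the rule outputs.
    rtag, pc_new = policy(dict(csrs.inputs))
    csrs.write_pc_new_tag(pc_new)   # assumed order: G9 first
    csrs.write_rtag(rtag)           # then G8, which triggers the insert
    return rtag, pc_new
```

In this sketch the policy itself is just a function from the seven inputs to the two output tags; a real miss handler would evaluate the installed software policies.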

In a simple case, an embodiment may insert a single new rule that satisfies the current rule miss by writing the contents of CSRs G1 through G9 for the new rule into the PUMP rule cache as just described (e.g., via outputs 1052 and 1055). In such an embodiment, multiplexers M1 through M7 are not needed, because the Op1 data 1051 output by the metadata rule processor executing the cache miss handler contains only the R tag and PC new tag of the new rule. However, an embodiment may also allow multiple rules to be prefetched or inserted into the rule cache. For example, upon a rule cache miss, the cache miss handler may determine multiple rules to be written to the rule cache rather than just the one new rule for the current instruction. In this case, Op1 data 1051 may include new values for Rtag and PC new tag (written to CSRs G8 and G9) as well as additional new values for the opcode, PCtag, CItag, Op1tag, Op2tag, Op3tag and Mtag (written to CSRs G1 through G7). Multiplexers M1 through M7 may then be used to select those new values from Op1 data 1051 as the inputs to CSRs G1 through G7, respectively.

In general, Op1 data 1051 denotes output from the metadata processor to the PUMP, and R data 1053 denotes output from the PUMP to the metadata processor. Element 1052 denotes the inputs to the PUMP from the processor executing user code (e.g., as part of normal instruction processing); when normal PUMP validation is performed (e.g., checking whether the current instruction is allowed under the policy rules), the values of element 1054 are the same as those of 1052.

Referring to FIG. 33, shown is an example 1060 illustrating PUMP processing stages combined with a 6-stage processor pipeline in an embodiment in accordance with the techniques herein having a RISC-V architecture with branch prediction. Stage 1 fetches the next instruction to be executed (e.g., an instruction held in I cache 1063a) and performs branch prediction. Stage 2 is the instruction decode stage. Stage 3 obtains values from registers (e.g., register read) and performs branch resolution for the current instruction. Stage 4 executes the instruction (e.g., executing fast ALU operations and starting multi-stage operations such as floating point (FP) operations and integer multiply and divide). Stage 5 receives the responses to the multi-stage operations. Stage 6 commits the instruction (e.g., storing the result to the destination and data cache 1063b, as denoted by 1069) and handles exceptions, traps and interrupts.

Example 1060 also shows the PUMP processing stages. Element 1062 indicates that the opgrp/care table lookup may be performed in stage 3, with its output 1062a provided as an input to PUMP hash 1064 in stage 4. The other inputs to PUMP hash 1064 are Mtag 1061 (e.g., the tag of the memory location that is an operand of the current instruction) and other tag values 1062b. Inputs 1061 and 1062a-1062b are used to determine output 1064a, which denotes the cache address or location in PUMP rule cache 1066. Examples of the other tag values 1062b, such as the tags of the instruction operands, the PC and the current instruction, are described elsewhere herein and may be used in determining the location in rule cache 1066 for the current instruction (e.g., FIG. 22). Element 1068 denotes rule cache miss detection based on output 1066a of the PUMP processing from stage 5. Output 1066a may include an indicator of whether there was a rule cache miss for the current instruction. If 1066a reports a potential hit, 1068 determines whether the hit is a true hit or a false hit and converts a false hit into a miss. Element 1066b denotes the PUMP output to stage 6 when there was no rule cache miss and the cache contains a rule matching the current instruction. Output 1066b may include the PC new tag and the R tag. Note that the PUMP stages of example 1060 may vary. For example, the opgroup/care lookup 1062 may be performed in stage 4 rather than stage 3, with both the determination of the PUMP rule cache location and the lookup performed in stage 5 (e.g., depending on the particular PUMP rule cache implementation).
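The hash, lookup, and false-hit handling described for elements 1064, 1066 and 1068 can be illustrated with a small software model. This is a sketch under stated assumptions: the class `RuleCache`, its slot count, and the toy hash (sum of the key fields modulo the slot count) are hypothetical and chosen only to make collisions easy to demonstrate; the description does not specify the actual hash function.

```python
from typing import Dict, Optional, Tuple

RuleKey = Tuple[int, ...]      # opgroup/care output, Mtag and the other tags
RuleResult = Tuple[int, int]   # (PC new tag, R tag)


class RuleCache:
    """Hypothetical direct-mapped model of the PUMP rule cache."""

    def __init__(self, num_slots: int = 8) -> None:
        self.num_slots = num_slots
        self.slots: Dict[int, Tuple[RuleKey, RuleResult]] = {}

    def location(self, key: RuleKey) -> int:
        # Stand-in for PUMP hash 1064: maps the inputs to a cache location.
        return sum(key) % self.num_slots

    def insert(self, key: RuleKey, result: RuleResult) -> None:
        self.slots[self.location(key)] = (key, result)

    def lookup(self, key: RuleKey) -> Optional[RuleResult]:
        entry = self.slots.get(self.location(key))
        if entry is None:
            return None          # plain miss: nothing stored at this location
        stored_key, result = entry
        if stored_key != key:
            return None          # false hit, converted into a miss
        return result            # true hit: PC new tag and R tag go to commit
```

A lookup that returns `None` corresponds to the case where the miss handler must run; any other return value corresponds to output 1066b.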

For non-memory operations, Mtag is not needed as an input to the PUMP stages, and the PUMP can proceed without it. For memory operation instructions, the PUMP stalls until the Mtag is retrieved from memory. Alternatively, an embodiment may perform Mtag prediction as described elsewhere herein. Consistent with discussion elsewhere herein, the PC new tag must be provided back to stage 1, as shown and described in connection with FIG. 1. As long as the instruction commits, the PC new tag is the appropriate PC tag for the next instruction. If the current instruction does not commit (e.g., there is no rule cache hit), the PC new tag that is passed back to stage 1 is determined by the rule cache miss handler. When a trap handler is started or a context switch is performed (e.g., the PC is restored), the tag is taken from the saved PC.

As described herein, an embodiment may associate a single tag with each word. In at least one embodiment, the size of the word associated with each tag may be 64 bits. The content of a tagged word may be, for example, an instruction or data. In such an embodiment, the size of a single instruction may also be 64 bits. However, an embodiment may also support instructions of sizes other than 64 bits. For example, an embodiment may be based on the RISC-V architecture, an open source instruction set architecture (ISA) based on established reduced instruction set computing (RISC) principles, as described in more detail elsewhere herein. An embodiment using the RISC-V architecture may include instructions of several different sizes, for example 32-bit instructions as well as 64-bit instructions. In such a case, an embodiment in accordance with the techniques herein may associate a single tag with a single 64-bit word, so that a single word may contain either one 64-bit instruction or two 32-bit instructions.

Referring to FIG. 34, shown is an example 200 of tags that may be associated with instructions in an embodiment in accordance with the techniques herein. Element 201 illustrates the case noted above in which a single tag 202a is associated with a single instruction 204a. In at least one embodiment, each of 202a and 204a may be a 64-bit word in size. Element 203 illustrates the alternative, also noted above, in which a single tag 202b is associated with two instructions 204b and 204c. In at least one embodiment, 202b may be a 64-bit word in size, and instructions 204b and 204c may each be a 32-bit instruction contained in the same 64-bit instruction word 205 associated with tag 202b. More generally, a single tagged instruction word may contain more than two instructions, depending on the instruction size(s) used in an embodiment. As illustrated by element 203, when the granularity of tagging does not match the granularity of instructions, multiple instructions are associated with a single tag. In some cases, the same tag 202b may be used for each of the instructions 204b and 204c. In other cases, it may not. In the following paragraphs, each of the multiple instructions, such as 204b and 204c, contained in a single instruction word associated with a single tag may be referred to as a subinstruction.

Accordingly, techniques will now be described that may be used in an embodiment having multiple subinstructions in the same instruction word, where a different tag may be used for each of the subinstructions.

Referring to FIG. 35, shown is an example illustrating instructions and tags that may be used in an embodiment in accordance with the techniques herein. Example 220 includes a single 64-bit instruction word 205 containing two 32-bit subinstructions 204b and 204c. Tag 202b may be the tag on instruction word 205, as described above for example 200. In at least one embodiment in accordance with the techniques herein, tag 202b of instruction word 205 may be a pointer 221 to another memory location 222 that holds a pair of tags, one for each of the subinstructions 204b-204c of instruction word 205. In example 220, tag pair 222 includes 222a, denoting tag1, the first tag, for subinstruction 1 204b, and 222b, denoting tag2, the second tag, for subinstruction 2 204c. In at least one embodiment, each tag 222a-222b of pair 222 may be a non-pointer tag (e.g., a scalar), may be a pointer tag pointing to yet another memory location containing information used by the PUMP for processing as described herein, or may be a more complex structure containing one or more pointer fields and/or one or more non-pointer fields. For example, tag1 222a may be a pointer tag for subinstruction 1 204b, and tag2 222b may be a pointer tag for subinstruction 2 204c. As shown in 220, element 223a denotes tag1 222a pointing to or identifying another memory location 224a containing information used by the PUMP to process subinstruction 1 204b, and element 223b denotes tag2 222b pointing to or identifying another memory location 224b containing information used by the PUMP to process subinstruction 2 204c. Depending on the embodiment and the subinstruction, each of 224a and 224b may be a non-pointer, may be a further pointer to a memory location, or may be a complex structure containing some combination of one or more pointers and one or more non-pointers.

In an embodiment having multiple subinstructions in the same instruction word 205, an additional input may be provided to the PUMP indicating which of the subinstructions in instruction word 205 is being executed at any given time. For example, where instruction word 205 contains two subinstructions 204b-204c, the additional PUMP input may be 0 or 1, indicating whether subinstruction 1 204b or subinstruction 2 204c is executing at that point in time. In at least one embodiment based on the RISC-V architecture and consistent with discussion elsewhere herein, a CSR (such as the ssuninstr CSR described elsewhere herein) may be defined to record or store this additional PUMP input indicating which subinstruction is executing. In at least one embodiment, the PUMP normally receives this additional input from the data path (e.g., from the code execution domain) without using the CSR. On a rule miss, however, the additional input is recorded in the CSR so that the metadata processing domain, in which the rule miss handler executes, can obtain it (e.g., the CSR value for the additional input is provided to the PUMP on rule insertion).
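The way the additional 0/1 input selects both a subinstruction and its tag can be sketched as follows. This is an illustration only: the function names and the `tag_memory` dictionary are hypothetical, and placing subinstruction index 0 in the low 32 bits of the word is an assumption, since the description does not state the bit layout.

```python
from typing import Dict, Tuple

# Hypothetical tag memory: address of a tag pair -> (tag1, tag2).
TagMemory = Dict[int, Tuple[int, int]]


def sub_instruction(word: int, index: int) -> int:
    """Return the 32-bit subinstruction selected by the extra PUMP input."""
    if index not in (0, 1):
        raise ValueError("index must be 0 or 1")
    # Assumption: index 0 occupies the low half of the 64-bit word.
    return (word >> (32 * index)) & 0xFFFFFFFF


def sub_instruction_tag(word_tag: int, index: int, tag_memory: TagMemory) -> int:
    """Follow the word's pointer tag to the tag pair, then pick one tag."""
    if index not in (0, 1):
        raise ValueError("index must be 0 or 1")
    return tag_memory[word_tag][index]
```

For instance, with a tag pair stored at a hypothetical address `0x1000`, the word tag `0x1000` and index `1` select the second tag of the pair.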

To further illustrate, an embodiment may include subinstructions that transfer control between two locations in a program. Examples of such subinstructions are jumps, branches, returns or, more generally, any transfer of control from a source location in code to a target (e.g., sink or destination) location in code. In connection with CFI, or control flow integrity, described elsewhere herein, it may be desirable for the PUMP to implement rules of a CFI policy that restrict or control transfers between locations to only those supported by the program. For example, consider a transfer of control from a source location in code having tag T1 to a target location in code having tag T2. The information used by the PUMP in enforcing the CFI policy may be a list of the valid source locations that are allowed to transfer control to T2. In an embodiment of the CFI policy, two rules may be used to provide two checks, one for each of two instructions or opcodes, when control is transferred from a source to a target location. Consider the pseudo-code representation of a transfer or call shown in example 230 of FIG. 36. In example 230, a call may be made that transfers control 231a from a source location in routine foo 231 to a target location in routine bar 233. Specifically, control may be transferred 231a from source location X1 232 having tag T1 to target location X2 234 having tag T2. Target location X2 may be the first instruction in the body 233a of the code of routine bar. The rules of the CFI policy may be used to check whether the transfer from 232 to 234 is allowed or valid. In at least one embodiment, two rules of the CFI policy may be used, each performing a check to ensure that the transfer of control from 232 to 234 is valid. The instruction at source location X1 is the branch point, or source point, from which control is transferred to the target. At the source (e.g., prior to executing the instruction at source location X1 232), a first rule may be used to mark or set the tag of the PC to denote the source location. For example, the first rule may mark or set the tag of the PC to the address X1 to denote the source location. Subsequently (prior to executing the instruction at target location X2 234), a second rule may be used to check whether source location X1 is a valid source location from which control is allowed to transfer to target location X2.

In at least one embodiment, the check of the second rule may be performed by determining whether the marked tag of the PC (set by the first rule), which identifies source location 232 (e.g., denotes the source location address X1), identifies a valid source location from which control may be transferred to target location 234. In such an embodiment, the second rule may be provided with a defined list of all valid source locations that are allowed to transfer control to target location 234. In at least one embodiment, the defined list may identify the valid source locations by address, such as X1 noted above.
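The two-rule CFI check described above can be sketched in a few lines. This is a simplified illustration, not the policy as implemented: the function names are hypothetical, tags are modeled as plain integers, and the concrete addresses used with it are made up for the example.

```python
from typing import FrozenSet, Optional


def rule_at_source(source_address: int) -> int:
    """First rule: mark the PC tag with the address of the source location."""
    return source_address


def rule_at_target(pc_tag: Optional[int], allowed_sources: FrozenSet[int]) -> bool:
    """Second rule: allow the target to execute only if the PC tag set by the
    first rule names a source that is on the target's list of valid sources."""
    return pc_tag is not None and pc_tag in allowed_sources


def transfer_is_valid(source_address: int, allowed_sources: FrozenSet[int]) -> bool:
    """Run both rules in sequence for one control transfer."""
    pc_tag = rule_at_source(source_address)
    return rule_at_target(pc_tag, allowed_sources)
```

With a hypothetical source X1 at address `0x400` and a target whose allowed list is `frozenset({0x400})`, the transfer from `0x400` is accepted and a transfer from any other address is rejected.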

Referring to FIG. 37, shown is an example 240 illustrating tags that may be used in connection with subinstructions at source and target locations in an embodiment in accordance with the techniques herein. Example 240 includes element 203, denoting a single tag 202b specified for the two subinstructions 204b-204c of a single instruction word, as described above. Tag 202b on the instruction word may point to tag pair 242, which holds two tags 242a-242b, one for each of the two subinstructions 204b-204c. In general, each of the two tags 242a-242b may be a pointer to information used by the PUMP rules for CFI validation with respect to a source or target location, depending on the particular subinstruction associated with that tag.

Example 240 illustrates the structures in an embodiment in which both subinstructions 204b-204c are target locations. Subinstruction tag 242a points 243a to the location of structure 245, which includes a source id field 245a and an allowed source set field 245b. Source id field 245a may be null when subinstruction 204b is not a source location, as is the case here, where subinstruction 204b is a target location. Source set field 245b may be a pointer to a location containing list structure 247, which identifies the one or more valid source locations allowed to transfer control to the particular target location containing subinstruction 204b. In at least one embodiment, list structure 247 may include a first element denoting the size, or number, of valid source locations. Thus, a size 247a of "n" (n being an integer greater than 0) denotes the number of source locations represented by elements 247b through 247n in list 247. Each of the elements 247b through 247n may identify a different valid source location that may transfer control to the target location containing subinstruction 204b. In at least one embodiment, each of the allowed sources 247b through 247n may be a scalar or non-pointer, for example the address of one of the valid source locations.

In example 240, elements 243b, 246 and 248 used with subinstruction 2 204c are similar, respectively, to elements 243a, 245 and 247 used with subinstruction 1 204b. In general, in an embodiment using the structures of 240, any item that does not exist may be assigned a null or zero value. If instruction word 205 contains a pair of subinstructions 204b-204c neither of which is a source or target location, tag 202b may be null (or may otherwise identify a non-pointer, or some other pointer that does not point to structure 242). If one of the subinstructions 204b-204c is neither a source location nor a target location of a transfer, its associated tag in 242 is null. For example, if subinstruction 204b is neither a source nor a target location but subinstruction 204c is a target, then 242a may be null and 242b may be as shown in example 240. If a subinstruction 204b-204c is not a source location, its source id is null (e.g., because 204b-204c in example 240 are target locations, 245a and 246a are both null). If a subinstruction 204b-204c is not a target location, its allowed source set field pointer is null. For example, if subinstruction 204b is a source location rather than a target location, source id 245a identifies the address of the source location instruction and 245b is null.

To further illustrate, reference is made to FIG. 38, showing another example 250 that uses the structures described for example 240. The difference is that the first subinstruction 251a is neither a source nor a target location, and the second subinstruction 251b is a target location to which control may be transferred from any of three valid source locations. Elements 251a-251b may denote two 32-bit subinstructions contained in a single tagged word having tag 251. Tag 251 may be a pointer 1228 identifying the location in memory of structure 252, which holds the tag pair for subinstructions 251a-251b. Element 252 may be similar to 242 of example 240. Element 252a may be the tag pointer for subinstruction 251a, and element 252b may be the tag pointer for subinstruction 251b. Because subinstruction 251a is neither a source nor a target location, 252a is null, as denoted by 0. Because subinstruction 251b is a target location, 252b is a pointer 1238 to structure 254. Element 254 may be similar to 246 of example 240. Element 254a is the source id field (like 246a), and element 254b is the allowed source set field (like 246b), holding a pointer (address 1248) to allowed source set structure 256 (similar to 248 of example 240). Because subinstruction 251b is only a target location and not a source, source id 254a is null. Element 256a may be a size field (like 248a) denoting the number of valid source locations. Elements 256b through 256d may denote the valid source ids, which may be, for example, the addresses of the valid source location instructions. In this example, 256a indicates that there are three valid source locations, with addresses 50bc, 5078 and 5100 stored in entries 256b through 256d, respectively. Note that, in general, an instruction can be both a target and a source, so being a target does not mean that the source id is always null. For example, if an instruction is both a target and a source, its source id is non-null and its tag also includes the list of allowed sources.
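The pointer chain of example 250 can be modeled as a small table and walked in code. This is an illustrative sketch: the dictionary `EXAMPLE_250` and the function `allowed_sources` are hypothetical names, memory is modeled as a mapping from an address to a list of words, and the pointer values 1228, 1238 and 1248 are treated as hexadecimal addresses purely as an assumption for the example.

```python
from typing import Dict, List

NULL = 0  # null tags and null fields are represented by zero, as in the text

# Hypothetical memory image for example 250: address -> words at that address.
EXAMPLE_250: Dict[int, List[int]] = {
    0x1228: [NULL, 0x1238],                 # structure 252: tags 252a, 252b
    0x1238: [NULL, 0x1248],                 # structure 254: source id, set pointer
    0x1248: [3, 0x50BC, 0x5078, 0x5100],    # structure 256: size, then entries
}


def allowed_sources(memory: Dict[int, List[int]],
                    word_tag: int,
                    sub_index: int) -> List[int]:
    """Return the valid source addresses for one subinstruction, or an empty
    list if the subinstruction is not a target location."""
    if word_tag == NULL:
        return []                       # neither subinstruction is source/target
    sub_tag = memory[word_tag][sub_index]
    if sub_tag == NULL:
        return []                       # this subinstruction has no CFI tag
    _source_id, set_pointer = memory[sub_tag]
    if set_pointer == NULL:
        return []                       # a source only, not a target
    size, *entries = memory[set_pointer]
    return entries[:size]
```

Looking up subinstruction index 1 (251b) through word tag `0x1228` yields the three source addresses, while index 0 (251a) yields an empty list because 252a is null.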

Note that the address of a source location, such as in entries 256b through 256d and, more generally, in any allowed source of an allowed source set (e.g., any of 248b through 248n of example 240), may be of byte-level address granularity.

In a manner similar to that just described for multiple instructions (also referred to as subinstructions) contained in a single tagged word, an embodiment may allow access to portions of data smaller than a single tagged data word. For example, an embodiment may include instructions that access data at the byte level, and it may be desirable to provide byte-level tagging so that each byte can have its own associated tag, much as a different tag is provided for each of the multiple subinstructions contained in a single tagged word. The following examples refer to byte-level tagging, in which each of the 8 bytes of a 64-bit word may have its own associated tag. More generally, however, the techniques herein may be used to provide subword tagging for any number of multiple data items contained in a single tagged word. In such a case, the tag associated with the tagged data word may be a pointer to a structure identifying the byte-level tags for the bytes of that data word.

Referring to FIG. 39, shown is an example 260 of byte-level tagging that may be used in an embodiment in accordance with techniques herein. Element 262 represents the tag 262a associated with the tagged 64-bit word 265, where word 265 contains 8 bytes denoted B1 through B8. Tag 262a may be a pointer that points (261) to the memory location of structure 266, which contains the tags for each of bytes B1 through B8 of data word 265. Structure 266 includes a first field 265a, a size field indicating the number of entries that follow in the structure. Each subsequent entry in the structure may contain a tag value and may identify the one or more bytes of word 265 having that particular tag value. In this example, size 265a is 8, since each of bytes B1 through B8 of 265 has a different tag value. Elements 266a through 266h represent the tag values for bytes B1 through B8 of word 265, respectively.

Referring to FIG. 40, shown is a second example 267 of byte-level tagging that may be used in an embodiment in accordance with techniques herein. Element 262 represents the tag 262a associated with the tagged 64-bit word 265, where word 265 contains 8 bytes denoted B1 through B8. Tag 262a may be a pointer that points (268a) to the memory location of structure 268b, which contains the tags for each of bytes B1 through B8 of data word 265. Structure 268b may include a first field 265b, a size field indicating the number of entries that follow in the structure. Thus, 265b is similar to 265a of FIG. 39. Each subsequent entry in structure 268b may contain a tag value and may identify the one or more bytes of word 265 having that particular tag value. In this example, size 265b is 7, denoting the 7 subsequent entries 266a through 266f and 268c. Elements 266a through 266f are as described in connection with example 260 of FIG. 39. Element 268c indicates that tag 7 is the tag for both bytes B7 and B8. Thus, in example 267, structure 268b contains one fewer entry than structure 266 of FIG. 39, because bytes B7 and B8 have the same tag value, tag 7. In this manner, the structure (e.g., 268b) pointed to by the tag of a data word (e.g., 262a) may have a varying number of entries, as needed, depending on the particular byte-level tags.
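The size-prefixed structures of FIGS. 39 and 40 can be sketched as follows. This is a simplified model under stated assumptions: the structure is represented as a Python list whose first element is the size field, each later entry is a (tag, bytes) pair, and the function names and the integer tag values 1 through 8 are hypothetical.

```python
def build_byte_tag_struct(byte_tags):
    """Build [size, (tag, bytes), ...] from the per-byte tags of B1..B8.

    Bytes that share a tag value are merged into one entry, so the number
    of entries varies with the particular byte-level tags.
    """
    groups = {}
    for byte_no, tag in enumerate(byte_tags, start=1):
        groups.setdefault(tag, []).append(byte_no)
    entries = [(tag, tuple(members)) for tag, members in groups.items()]
    return [len(entries)] + entries   # first field is the size field


def lookup_byte_tag(struct, byte_no):
    """Return the tag of byte B<byte_no> by scanning the entries after the size field."""
    size = struct[0]
    for tag, members in struct[1:1 + size]:
        if byte_no in members:
            return tag
    raise KeyError(byte_no)


fig39 = build_byte_tag_struct([1, 2, 3, 4, 5, 6, 7, 8])   # every byte differs
fig40 = build_byte_tag_struct([1, 2, 3, 4, 5, 6, 7, 7])   # B7 and B8 share tag 7
```

With these inputs the size field is 8 for the first structure and 7 for the second, matching the one-entry difference described above.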

It should be noted that the particular level of data access granularity may vary with the particular architecture and instruction set of an embodiment. The foregoing may be used to provide byte-level tagging in an embodiment that allows byte-level data access. As a variation, an embodiment may support data access at a different level of granularity, and the techniques herein may be readily extended to sub-word tagging at any level of granularity.

Similarly, examples 260 and 267 illustrate one example of a data structure that may be used to maintain byte-level or other sub-word data tagging. As a variation, an embodiment may use a tree or other hierarchical structure to specify the byte-level tags for the bytes of a single tagged data word. The tree or other hierarchical structure representing the byte-level tags may be similar, for example, to the hierarchical structures described herein for storing word-level tags, such as in connection with elements 100, 120, 130, and 140 of FIGS. 78 through 81, respectively, described elsewhere herein.

To further illustrate, an embodiment may use a tree structure to represent the byte-level tags, as in example 270 of FIG. 41. In example 270, element 262 may represent the tag 262a associated with the tagged word 265 that includes bytes B1 through B8. Tag 262a may be a pointer to, or address of, a tree structure representing the byte-level tags for B1 through B8 of 265. For example, tag 262a may point to the location of the root node 272 of the tree structure. The tree structure in this example may include root node 272 at level 1, nodes 274a and 274b at level 2, nodes 276a through 276d at level 3, and nodes 278a through 278h at level 4. Each node of the tree may be associated with a byte range of one or more bytes. The leaves of the tree may represent the byte-level tags for bytes B1 through B8. Thus, a non-leaf node of the tree does not specify a tag value; rather, it indicates that one or more descendant nodes at one or more lower levels must be consulted to determine the byte-level tags for the byte range associated with the non-leaf node. A leaf node may represent a homogeneous, or identical, tag value for a range of multiple bytes of 265. Each non-leaf node may include a left pointer to the left child node of the non-leaf node and a right pointer to the right child node of the non-leaf node. Each of the child nodes of a parent node may represent a partition of the byte range associated with the parent node.

Example 270 illustrates a tree structure in which there are no homogeneous byte-level tags and each of bytes B1 through B8 of 265 has a different tag value. In a manner consistent with discussion elsewhere herein (e.g., elements 100, 120, 130, and 140 of FIGS. 78 through 81), an embodiment may omit descendant nodes from a subtree if the subtree has, as its root, a first node that represents a homogeneous tag value for the byte range associated with the first node. To further illustrate, reference is made to FIG. 42. In example 280, element 262 may represent the tag 262a associated with the tagged word 265 that includes bytes B1 through B8, as described above. Tag 262a may be a pointer to, or address of, a tree structure representing the byte-level tags for B1 through B8 of 265. In this example 280, each of bytes B1 through B8 has the same tag T1, so the tree structure need include only root node 281. Since the byte-level tags for bytes B1 through B8 may be modified or changed over time, the tree structure or other structure pointed to by tag 262a may be updated accordingly to reflect such byte-level tag modifications.
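The tree representation of FIGS. 41 and 42 can be sketched as a binary tree in which any byte range with a single homogeneous tag collapses into a leaf. The class and function names below are hypothetical, and halving the byte range at each level is an assumption of this sketch; the embodiments only require that the children partition the parent's range.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    tag: str                      # homogeneous tag for the node's whole byte range


@dataclass
class Node:
    left: Union["Node", Leaf]     # covers the lower half of the byte range
    right: Union["Node", Leaf]    # covers the upper half of the byte range


def build_tree(tags):
    """Build a tree over per-byte tags, omitting descendants of homogeneous ranges."""
    if all(t == tags[0] for t in tags):
        return Leaf(tags[0])
    mid = len(tags) // 2
    return Node(build_tree(tags[:mid]), build_tree(tags[mid:]))


def lookup(tree, index, lo=0, hi=8):
    """Return the tag for the byte at 0-based index (B1 is index 0)."""
    while isinstance(tree, Node):
        mid = (lo + hi) // 2
        if index < mid:
            tree, hi = tree.left, mid
        else:
            tree, lo = tree.right, mid
    return tree.tag


def count_nodes(tree):
    """Count all nodes, leaf and non-leaf, in the tree."""
    if isinstance(tree, Leaf):
        return 1
    return 1 + count_nodes(tree.left) + count_nodes(tree.right)


all_different = build_tree(["T1", "T2", "T3", "T4", "T5", "T6", "T7", "T8"])
all_same = build_tree(["T1"] * 8)
```

In this model, eight distinct tags produce the full four-level tree of 15 nodes, while eight identical tags produce a single root node.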

In an embodiment that provides byte-level tagging, or more generally sub-word tagging, within the same data word 265, an additional input may be provided to the PUMP indicating which one or more byte-level tags (corresponding to one or more of the bytes contained in word 265) are being referenced. For example, with byte-level tagging in which there are 8 bytes B1 through B8 within a single tagged data word 265, the additional input to the PUMP may be an 8-bit bitmask, where each of the 8 bits is associated with a different one of bytes B1 through B8 and indicates whether the byte-level tag for that particular byte of word 265 is to be used. As a variation, an embodiment may identify the one or more bytes by specifying a byte range, such as a starting byte and a length or size (e.g., bytes B4 through B8 may be identified by specifying starting byte B4 and a size or length of 5). In at least one embodiment based on the RISC-V architecture, consistent with discussion elsewhere herein, a CSR may be defined that records or stores the additional input indicating which one or more byte-level tags, for one or more of bytes B1 through B8, are to be used by the PUMP. The additional input may be, for example, a bitmask or another suitable representation identifying the particular byte-level tags used by the PUMP. In at least one embodiment, the PUMP may normally receive the foregoing additional input, indicating which one or more bytes are to be used, as an input from the data path (e.g., from the code execution domain) without using the CSR. However, on a rule miss, the additional input may be written to the CSR so that the metadata processing domain, in which the rule miss handler executes, can obtain it (e.g., on rule insertion, the CSR value for the additional input is provided to the PUMP).
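The two representations of the additional PUMP input described above, an 8-bit bitmask and a starting byte with a length, can be related by a short sketch. The function names are hypothetical, and the bit assignment (bit 0 for B1 through bit 7 for B8) is an assumption; the embodiments do not fix a bit order.

```python
def range_to_mask(start_byte: int, length: int) -> int:
    """Convert a (start byte, length) range over B1..B8 into an 8-bit bitmask.

    Assumption: bit (i - 1) of the mask corresponds to byte Bi.
    """
    if not (1 <= start_byte <= 8 and length >= 1 and start_byte + length - 1 <= 8):
        raise ValueError("byte range must lie within B1..B8")
    return ((1 << length) - 1) << (start_byte - 1)


def mask_to_bytes(mask: int):
    """List the byte numbers (1..8) whose byte-level tags the mask selects."""
    return [i + 1 for i in range(8) if (mask >> i) & 1]


# Starting byte B4 with a length of 5 selects bytes B4 through B8.
mask_b4_to_b8 = range_to_mask(4, 5)
```

Either form carries the same information for a contiguous range; the bitmask can additionally express non-contiguous selections.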

As discussed elsewhere herein, many instructions may be treated in a similar manner at the policy level. For example, the add and subtract instruction operation codes, or opcodes, may typically treat their metadata identically, so that both opcodes behave similarly at the rule level for a particular policy, considering the same tag inputs to the PUMP and the same tag outputs propagated by the PUMP. In such a case, the add and subtract opcodes may be grouped together in a single operation group, or "opgroup", so that the same set of rules may be used for all opcodes in that opgroup. How opcodes are grouped together is policy dependent and may therefore vary by policy. In one embodiment, a translation or mapping table may be used that maps each particular opcode to its associated opgroup on a per-policy basis. In other words, since the mapping may vary by policy, a different mapping table may be created for each policy (or for each group of multiple policies having the same opcode-to-opgroup mapping).

For a particular opcode, the translation or mapping table may determine the opgroup as noted above and may also determine additional information for the particular opcode. Such additional information may include a care/don't care bit vector, as also discussed elsewhere herein, which may indicate which PUMP inputs and PUMP outputs (e.g., input tags and propagated output tags) are actually used as inputs to rule processing and propagated as relevant outputs of rule processing, respectively, for the particular opcode. In an embodiment, the don't care bit vector may be determined with respect to any of the PUMP inputs and outputs. In one embodiment, the don't care bit vector may indicate which input tags and output tags are relevant and may also indicate which particular opcode bits are actually used by the particular opcode. This is described in more detail below with respect to the RISC-V architecture and instruction formats, but may also be used more generally in connection with other suitable instruction formats of other architectures. The foregoing translation or mapping table, including the opgroup and care/don't care bits for a particular opcode (e.g., element 422 of example 420 discussed below), may also be referred to elsewhere herein as the opgroup/care table.
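As an illustration of an opgroup/care table, the sketch below maps opcodes to an opgroup and a care vector for one policy, and masks don't-care input tags before forming a rule lookup key. Everything named here is hypothetical: the table contents, the opgroup names, the five-element tag tuple, and the use of None for a masked input are assumptions made only for this example.

```python
from typing import NamedTuple, Optional, Tuple


class OpInfo(NamedTuple):
    opgroup: str               # hypothetical opgroup name for this policy
    care: Tuple[bool, ...]     # True where the corresponding input tag is used


# Hypothetical table for one policy: add and sub share an opgroup.
OPGROUP_CARE_TABLE = {
    "add":  OpInfo("arith",    (True, True, True, True, False)),
    "sub":  OpInfo("arith",    (True, True, True, True, False)),
    "load": OpInfo("mem_read", (True, True, True, False, True)),
}


def rule_key(opcode: str, input_tags: Tuple[Optional[str], ...]):
    """Form the rule lookup key: the opgroup plus input tags with don't-cares masked."""
    info = OPGROUP_CARE_TABLE[opcode]
    masked = tuple(tag if used else None
                   for tag, used in zip(input_tags, info.care))
    return (info.opgroup, masked)


tags = ("pc", "ci", "a", "b", "ignored")
```

Because add and sub map to the same opgroup and care vector, they produce the same key for the same input tags, so one set of rules serves both.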

RISC-V has different instruction formats, each of which uses a different set of instruction bits for the opcode. Referring to example 400 of FIG. 43, shown are bits of an instruction that may be included in the bit encodings of different opcodes in an embodiment using instructions of the RISC-V architecture. Generally, the RISC-V architecture includes multiple instruction formats in which different bits of the instruction may be used as part of the opcode encoding. For a 32-bit instruction, up to a total of 22 bits may be used to represent the opcode encoding. Element 404 represents the portions of an instruction in the RISC-V architecture that may be used to represent the bit encoding for a particular opcode, depending on the instruction format. Element 404 includes three fields of bits, 404a through 404c, that may be used in encoding a particular opcode. Element 404c represents a first opcode field, opcode A, of 7 bits. Element 404b represents a second opcode field, funct3, of 3 bits. Element 404a represents a third opcode field, funct12, of 12 bits. Depending on the instruction (e.g., a system call), the opcode encoding may include up to all 22 of the bits denoted by 404a through 404c. More specifically, in RISC-V, an opcode may be encoded using only the 7 bits of 404c, using only the 10 bits of 404b and 404c (excluding 404a), or using all 22 bits of 404a through 404c. As yet another variation, an instruction in the RISC-V architecture may have an opcode encoding that uses the fields denoted by 402. Element 402 includes the two fields of bits 404b and 404c as described above. Additionally, rather than using all 12 bits of funct12 404a in the opcode encoding, the instruction may use only 7 of the 12 bits, as denoted by funct7 402a. Thus, as a further possibility, an opcode may have an encoding that uses fields 402a, 404b, and 404c, as illustrated by element 402.
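The opcode-related fields discussed above can be extracted from a 32-bit RISC-V instruction word with shifts and masks. The sketch below uses the bit positions of the standard RISC-V base encoding (opcode in bits 6:0, funct3 in bits 14:12, funct7 in bits 31:25, and the 12-bit field in bits 31:20); the function name and the dictionary layout are hypothetical.

```python
def opcode_fields(instr: int) -> dict:
    """Split a 32-bit RISC-V instruction word into its opcode-related fields."""
    return {
        "opcode":  instr & 0x7F,            # bits 6:0, 7 bits
        "funct3":  (instr >> 12) & 0x7,     # bits 14:12, 3 bits
        "funct7":  (instr >> 25) & 0x7F,    # bits 31:25, upper 7 bits of funct12
        "funct12": (instr >> 20) & 0xFFF,   # bits 31:20, 12 bits
    }


SUB_X3_X1_X2 = 0x402081B3   # sub x3, x1, x2 (R-type: opcode, funct3, funct7)
EBREAK = 0x00100073         # ebreak (system instruction: all 22 bits significant)

sub_fields = opcode_fields(SUB_X3_X1_X2)
ebreak_fields = opcode_fields(EBREAK)
```

For the sub instruction, only opcode, funct3, and funct7 identify the operation, and the low 5 bits of the 12-bit field hold the second source register. For ebreak, the full 12-bit field distinguishes it from ecall.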

Referring to FIG. 44, shown is an example 420 illustrating a mapping or translation table that may be used in an embodiment in accordance with techniques herein. As discussed above, an opcode 421 may be provided as an input or index into the opcode mapping table 422 to look up or determine the mapped output 424 for opcode 421. The mapped output 424 may include the opgroup and the care/don't care bit vector for the PUMP inputs and outputs for the particular opcode 421. In an embodiment based on the RISC-V architecture and instruction formats, an opcode may potentially have an encoding of up to 22 bits. However, using such a large 22-bit opcode as an index into the table is unreasonable because of the large number of entries required to accommodate a 22-bit opcode (e.g., the table would include an entry for each opcode, indicating its associated opgroup and care/don't care bit vector information, resulting in millions of entries for a 22-bit opcode). To reduce the size of table 422 in such an embodiment, table 422 may be indexed using only a portion of the 22-bit opcode field. For example, in at least one embodiment, the opcode 421 input may be the 10 bits of the opcode represented by elements 404b and 404c of example 400. Thus, table 422 may be indexed using the opcode bits of 404b and 404c to determine the opgroup and associated care/don't care bit vector for the opcode.
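The size reduction from indexing with 10 bits rather than 22 can be made concrete with a small sketch. The function name is hypothetical, and concatenating funct3 above the 7-bit opcode to form the index is an assumption; the embodiments state only that these 10 bits are used.

```python
TABLE_ENTRIES_10_BIT = 1 << 10   # 1,024 entries when indexed by funct3 and opcode
TABLE_ENTRIES_22_BIT = 1 << 22   # 4,194,304 entries if all 22 bits were the index


def table_index(instr: int) -> int:
    """Form a 10-bit table index from funct3 (bits 14:12) and opcode (bits 6:0)."""
    opcode = instr & 0x7F
    funct3 = (instr >> 12) & 0x7
    return (funct3 << 7) | opcode   # assumed ordering: funct3 in the high bits


ADD_X3_X1_X2 = 0x002081B3   # add x3, x1, x2
SUB_X3_X1_X2 = 0x402081B3   # sub x3, x1, x2
SRL_X3_X1_X2 = 0x0020D1B3   # srl x3, x1, x2
```

Here add and sub share a table index because they differ only in the remaining 12 bits, which is why those bits are supplied to the PUMP separately.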

In such an embodiment, the remaining 12 opcode bits, funct12 404a, of the instruction may be provided as an input to the PUMP, where the appropriate portions of 404a are masked for the particular opcode. Information regarding which particular bits of funct12 404a should or should not be masked for the particular opcode may be included in the care/don't care bit vector information output from the mapping table 422 lookup for the opcode. In at least one embodiment based on the RISC-V architecture, the care/don't care bit vector information may indicate one of the following regarding the 12 opcode bits of funct12 404a for the opcode:

1. None of the bits of 404a are used, so all 12 bits may be masked.

2. Only 7 of the 12 bits are used, as denoted by 402a, so the least significant 5 bits of 404a (e.g., bits 20 through 24) are masked; or

3. All 12 bits of 404a are used, so none of the bits of 404a are masked.
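The three masking cases above can be sketched as three care masks applied to the 12-bit field taken from instruction bits 31:20. The constant and function names are hypothetical; the mask values follow from the standard RISC-V layout, in which the upper 7 bits of the field are funct7 (bits 31:25) and the lower 5 bits are bits 24:20.

```python
CARE_NONE = 0x000     # case 1: no funct12 bits used, all 12 masked
CARE_FUNCT7 = 0xFE0   # case 2: upper 7 bits used, lower 5 bits masked
CARE_ALL = 0xFFF      # case 3: all 12 bits used, nothing masked


def masked_funct12(instr: int, care_mask: int) -> int:
    """Return the funct12 bits (31:20) of instr with don't-care bits cleared."""
    return (instr >> 20) & 0xFFF & care_mask


SUB_X3_X1_X2 = 0x402081B3   # sub x3, x1, x2
SUB_X3_X1_X5 = 0x405081B3   # sub x3, x1, x5 (differs only in the rs2 field)
ECALL = 0x00000073
EBREAK = 0x00100073
```

Under the case 2 mask, the two sub encodings yield the same value, so the choice of source register does not affect rule matching. Under the case 3 mask, ecall and ebreak remain distinguishable.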

Additionally, in such an embodiment, the 12 opcode bits of funct12 404a may be written to or stored in a CSR, such as the sfunct12 CSR described elsewhere herein, that is provided as a PUMP input in connection with performing an insertion into the PUMP. In at least one embodiment, the PUMP may normally receive the foregoing opcode bits from the data path (e.g., from the code execution domain) without using the CSR. However, on a rule miss, the foregoing bits are written to the CSR so that the metadata processing domain, in which the rule miss handler executes, can obtain them as input (e.g., on rule insertion, the CSR value is provided as an input to the PUMP).

In at least one embodiment in accordance with techniques herein, multiple user processes may execute using a virtual memory environment in which physical pages are mapped into the user process address spaces. Techniques herein may be used to allow physical pages of memory to be shared among multiple user processes, where the same set of one or more physical pages may be mapped into the address spaces of multiple user processes at the same time. In at least one embodiment, the tags used by such processes that are allowed to share may be characterized as global tags, having the same value and the same meaning or interpretation across the user process address spaces.

Referring to FIG. 45, shown is an example 430 illustrating sharing of a physical page between processes in an embodiment in accordance with techniques herein. Example 430 includes process P1 having address space 434 and process P2 having address space 436. Element 434 may represent a virtual memory process address space, or range, of 0 through MAX, where MAX denotes the maximum virtual memory address used by P1 and 0 denotes the minimum virtual address used by P1. As known in the art, physical pages of memory 432 may be mapped into a virtual address space such as 434, whereby the contents of a mapped physical page may be accessed by P1 using the virtual addresses to which that physical page is mapped. For example, physical page A 432a may be mapped into sub-range X1 of P1's virtual address space. Process P1 may, for example, read a data item or instruction from a location in page A 432a by referencing a particular virtual address in sub-range X1.

Similarly, physical pages of memory 432 may be mapped into virtual address space 436, whereby the contents of a mapped physical page may be accessed by P2 using the virtual addresses to which that physical page is mapped. For example, physical page A 432a may be mapped into sub-range X2 of P2's virtual address space. Process P2 may, for example, read a data item or instruction from a location in page A 432a by referencing a particular virtual address in sub-range X2.

Tags 431 may represent the tags on the memory locations of page A 432a, where such tags may be used by the PUMP in connection with rule processing as described herein. Since page A 432a is shared by both P1 and P2 through the mappings as illustrated, the same set of tags 431 is also used by the PUMP in connection with executing instructions of both P1 and P2. In such an embodiment, tags 431 may be characterized as global tags shared by both P1 and P2. Additionally, in at least one embodiment, the global tags 431 shared by the multiple processes P1 and P2 are interpreted in the same manner, such as by using the same rules and policies. For example, a first tag having a value of 100 may be associated with a first memory location in 432a. The first tag may represent a coloring of the first memory location used in connection with rules of a policy that determine whether a particular executing instruction is allowed to perform an operation referencing the first memory location or its contents. The first tag is to be interpreted by the rules as the same color in connection with executing instructions of both P1 and P2; that is, the tag value of 100 must be interpreted as the same color by the rules in connection with both P1 and P2. Furthermore, the same set or instance of policies and rules may be used by the PUMP for both P1 and P2.

In such an embodiment using global tags on shared memory as described above, it may also be desirable to further differentiate, or allow, different access, authorizations, or operations on a per-process basis. For example, assume page A 432a contains data shared by both P1 and P2. Even though global tags are used to tag shared page A 432a, it may be desirable to allow different operations on, or access to, the shared data of 432a on a per-process basis. For example, process P1 may have write access to page 432a, while process P2 may have read-only access to page 432a, even though 432a is a shared memory page tagged with global tags. In such an embodiment having global tags on the shared page, the same set of policies and rules may be used in connection with both P1 and P2, with the different read and write access capabilities of each process distinguished using different tag values on the PC. For example, process P1 may include a first instruction that performs a write to a memory location in 432a while the current PC tag has a value of X. A rule of an access policy may perform the following logic:

If PCtag = X, allow write

If PCtag = Y, allow read only

In this case, the PC tag has the value X, which the rule interprets as allowing write access for process P1, and P1 is thus allowed to execute the first instruction. Process P2 may include a second instruction that also performs a write to a memory location in 432a while the current PC tag has a value of Y. In this case, the PC tag has the value Y, which the rule interprets as not allowing write access and instead allowing read-only access for process P2, and P2 is thus not allowed to execute the second instruction.
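The per-process access rule just described can be sketched as a single function keyed on the PC tag. The function name and the string encodings of tags and operations are hypothetical. Two behaviors are assumptions of this sketch and are not stated above: that write access under tag X also permits reads, and that any other PC tag is denied.

```python
def access_allowed(pc_tag: str, operation: str) -> bool:
    """Decide whether an access to the shared page is allowed for a given PC tag."""
    if pc_tag == "X":
        # Assumption: a process with write access may also read.
        return operation in ("read", "write")
    if pc_tag == "Y":
        return operation == "read"
    # Assumption: unknown PC tags are denied by default.
    return False


p1_write = access_allowed("X", "write")   # P1 executing the first instruction
p2_write = access_allowed("Y", "write")   # P2 executing the second instruction
```

The same rule serves both processes; only the PC tag value presented to it differs.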

Thus, in at least one embodiment, the PC tag may be used to encode rights, access, or authorizations that may vary on a per-process basis, whereby particular allowed rights, access, or authorizations may be represented by different PC tag values.

An embodiment may specify the particular PC tag value to be used for each process in any suitable manner. For example, privileged code may execute as part of operating system startup or initialization to initially specify the PC tag value to be used for a particular process. As a variation, an embodiment may perform a mapping operation as part of mapping shared page A 432a into a process address space. Rules applied by the operating system when performing the mapping may propagate or produce, as an output, a particular PC tag denoting the desired access, rights, or authorizations based on the particular process.

In this manner, the same set of rules may be used with a shared page having a global tag, where the rules encode the logic for differences in access, privileges, or authority based on the PC tag. It should be noted that the PC tag may also be a pointer to a memory location, whereby the pointer tag points to a structure that includes different tag values for different policies, in the manner described herein for other tags. In this manner, the same set of PC tag values may be used to denote different capabilities of a process that may vary with the policy. For example, the PC tag value X described above for P1 may have the first use described above, in which a memory safety policy or data access policy governs the shared region. The same PC tag value X may have a second use and meaning given by the rules of a second, different policy, such as control flow integrity (CFI).

Described herein are aspects of a CFI policy that may be used to restrict control transfers based on a static definition of allowable calls, jumps, return points, and the like. However, an additional aspect or dimension that may be included in a CFI policy relates to the enforcement of dynamic or runtime call information, which further refines the conditions under which a return transfer of control may be made. To further illustrate, reference is made to the example (500) of FIG. 46, which includes routines foo (502), bar (504), and baz (506). Routine foo (502) may include a call instruction at address X1 that calls routine bar, resulting in a runtime transfer (501a) of control to bar (504). Routine bar (504) then includes a return instruction that returns (501b) control to routine foo at address X2. Thus, X2 denotes the return point address or location of the instruction in routine foo following the call to routine bar at X1. Routine foo (502) includes a second call instruction at address Y1 that calls routine baz (506), resulting in a runtime transfer (501a) of control to baz (506). Routine baz (506) may then include a return instruction that returns control to routine foo at address Y2. Thus, Y2 denotes the return point address or location of the instruction in routine foo following the call to routine baz at Y1.

A static CFI policy may, for example, allow all potential control flows between any two transfer points, without further restricting control flows or transfers based on the current runtime stack or call chain that reflects the dynamic runtime control flow. For example, if foo (502) can call bar (504) as illustrated in (500), there is a statically allowed control flow from bar back to address X2, the instruction following the call to bar at X1 in foo. However, if foo has not yet been called, or has made another call that must return before bar is called, it should not be possible to exercise the return link and return to X2. As another example with runtime execution as illustrated in the example (500), a call to bar (504) via (501a) should not be able to return via (501d) to foo (502) at Y2.

Techniques will now be described that may be used to extend the rules of the CFI policy to enforce a dynamic CFI return policy governing return flow paths. For a dynamic CFI return policy that ensures that a return to a particular return location, such as X2, is valid only when it follows a particular call or invocation, such as the call to bar at X1, the policy may store information, for example in one or more tags, when the call is made in order to rule out invalid returns. As known in the art, when a call is made using, for example, the JAL (jump and link) instruction of the RISC-V instruction set, the return address is stored in the return address register (RA). The RISC-V instruction set also includes the JALR (jump and link register) instruction, which is an example of a return instruction. In one aspect, the return address stored in the RA register by the JAL may be characterized as a "capability" to return to that point. In at least one embodiment, the JAL instruction may be tagged with a tag that causes a rule to push a suitable tag capability onto the resulting return address. For example, where RA is used as the return address register, the rule may place a tag on the RA register denoting that the RA register contains a valid or suitable return address and that the address in the RA register may later be used as a return point to which control may be transferred. In other words, the tag on the RA register provides the authority for the address in RA to be loaded into the PC and used as the return address for a return transfer of control. When the RA address is loaded into the PC, the RA tag may also be stored as the PC tag by the rules of the CFI policy.

To further illustrate techniques that may be used to restrict control flow on return, an embodiment may code tag each return point (e.g., X2, Y2) with a dynamic CFI tag such as expect-A. Additionally, code tagging each JAL instruction (or call instruction) causes the rule evaluated for the JAL instruction to tag the return address in the RA register (where the return address is computed by the JAL) with a suitable dynamic-CFI-return-to-A tag. For each return, such as each JALR instruction using an RA register tagged dynamic-CFI-return-to-A, the PUMP rules may propagate the tag (dynamic-CFI-return-to-A) to the PC, as may be done in connection with other static CFI policy rules. Rules of the CFI policy may implement logic that checks the RA register used in a return instruction. If the RA register used in the return is not tagged dynamic-CFI-return-to-A, the register is known not to contain a valid return address allowed for use with the JALR instruction. At the return points (e.g., X2 and Y2), the rules may implement logic that, upon encountering the expect-A code tag (e.g., as the tag on the instruction at X2), checks that the PC is tagged dynamic-CFI-return-to-A and clears the dynamic-CFI-return-to-A tag from the PC.
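The three rule steps described for the dynamic CFI return policy (tag RA on the call, move the tag to the PC on the return, check and clear it at the return point) can be sketched as follows. This is a simplified model under stated assumptions, not the PUMP rule format: tags are plain strings, a cleared tag is modeled as `None`, and the function names `rule_jal`, `rule_jalr`, and `rule_return_point` are hypothetical.

```python
class PolicyViolation(Exception):
    """Raised when a rule would reject the instruction."""


RETURN_TO_A = "dynamic-CFI-return-to-A"
EXPECT_A = "expect-A"


def rule_jal():
    """Rule for a code-tagged JAL: tag the return address written to RA."""
    return RETURN_TO_A  # new tag on the RA register


def rule_jalr(ra_tag):
    """Rule for a return via JALR: RA must carry the capability tag,
    which is then propagated to the PC."""
    if ra_tag != RETURN_TO_A:
        raise PolicyViolation("RA does not hold a valid return address")
    return RETURN_TO_A  # new tag on the PC


def rule_return_point(code_tag, pc_tag):
    """Rule at a return point: on an expect-A instruction, check that the
    PC is tagged dynamic-CFI-return-to-A and clear that tag."""
    if code_tag == EXPECT_A:
        if pc_tag != RETURN_TO_A:
            raise PolicyViolation("return point reached without a valid return")
        return None  # PC tag cleared
    return pc_tag
```

In this sketch, a JALR whose RA register lacks the tag (for example, because the address was forged or copied without the capability) raises `PolicyViolation` before control can reach the return point.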

As a result of the foregoing, code is prevented from returning directly to an arbitrary return address. Additionally, if a return address is copied to another location, such as another register, the rules may prevent the copied value from retaining the return authority capability; this prevents code from making copies of the return address in registers that could be used to return multiple times for the same call. As another result of the foregoing, if a valid (properly tagged) return address on the stack is overwritten with a new (not properly tagged) address and a return to the new address is then attempted, the return is prevented.

An embodiment may also include rules that prevent or further restrict the ability to use the dynamic-CFI-return-to-A tag more than once. As a first implementation, an embodiment may use rules that restrict where the return address (stored in the RA register tagged dynamic-CFI-return-to-A) may be written or copied. For example, an embodiment may use rules that allow the return address in a properly tagged RA register to be written only to the stack, and only within properly code-tagged function code. As a second, alternative implementation, an embodiment may include rules that make the return address linear (e.g., so that it follows or occurs in response to a call) using PC state (e.g., the PC tag) and atomic memory operations. For example, performing a call sets the PC tag to denote valid-return-address. A rule may allow a return only if the PC tag is set to valid-return-address. When a return address is written to memory, an additional rule may be used that sets the PC tag to no-return-address. When a return address is copied to a target register, a rule may be used under which the PC tag is set to no-return-address and the target register is not tagged valid-return-address. When an arithmetic operation is performed using the return address from the RA register, a rule may be used under which the result is not tagged valid-return-address. A rule may be used that only allows the return address to be restored from memory using an atomic swap operation with a non-return-address (e.g., where the PC tag is set to valid-return-address).
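The second, linear implementation can be read as a small state machine over the PC tag. The sketch below is one possible interpretation, labeled as such: it models a single stack slot, assumes that a successful return consumes the valid-return-address state, and assumes that the atomic swap leaves a non-return-address in memory while setting the PC tag back to valid-return-address. The class `LinearReturn` and its method names are hypothetical.

```python
class PolicyViolation(Exception):
    """Raised when a rule would reject the operation."""


VALID = "valid-return-address"
NO_RETURN = "no-return-address"


class LinearReturn:
    """Tracks the PC tag and the tag on one stack slot (illustrative model)."""

    def __init__(self):
        self.pc_tag = NO_RETURN
        self.slot_tag = NO_RETURN

    def call(self):
        # Making a call sets the PC tag to valid-return-address.
        self.pc_tag = VALID

    def spill_to_memory(self):
        # Writing the return address to memory sets the PC tag to
        # no-return-address; the capability now lives in the slot.
        if self.pc_tag != VALID:
            raise PolicyViolation("no valid return address to save")
        self.slot_tag, self.pc_tag = VALID, NO_RETURN

    def restore_by_swap(self):
        # Atomic swap: take the return address out of memory and leave a
        # non-return-address behind, so it cannot be restored twice.
        if self.slot_tag != VALID:
            raise PolicyViolation("slot does not hold a valid return address")
        self.slot_tag, self.pc_tag = NO_RETURN, VALID

    def ret(self):
        # A return is allowed only while the PC tag is valid-return-address.
        # Assumption: the return consumes the capability.
        if self.pc_tag != VALID:
            raise PolicyViolation("return without a valid return address")
        self.pc_tag = NO_RETURN
```

With this model, a caller that spills its return address, makes a nested call, and restores the address by swap can return exactly once; a second return or a second restore is rejected.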

An embodiment may further define rules providing a stack protection policy. In one aspect, the stack protection policy may be viewed in part as an extension of one or more other policies, such as memory safety, in that the policy may use tags on both instructions and data for enforcement. It should be noted that in the following discussion and elsewhere herein, terms such as routine and procedure may be used interchangeably and refer more generally to a callable unit of code that, when called, results in the creation of a new stack frame on the call stack. Other names that may be used for a callable unit of code include function, subroutine, subprogram, method, and the like.

Referring to FIG. 47, shown is an example (520) illustrating a call stack of frames for runtime calls in an embodiment in accordance with the techniques herein. In (520), assume that routine foo (502) makes a call to G1, which in turn calls G2. Thus, at one point during execution, routine foo has made a first call to routine G1, and G1 has made a call to routine G2. Element (522) may represent a first call stack frame for routine foo. Element (524) may represent a second call stack frame for routine G1. Element (526) may represent a third call stack frame for routine G2.

Information stored in a stack frame (e.g., 522, 524, 526) for a runtime call instance or invocation may include, for example, a return address, data for registers used by that call instance, variables or data items, and the like. Elements (522a) and (524a) may denote the return addresses included in frame (522) for foo and frame (524) for G1, respectively. One common attack, such as may be performed by malicious code, is to modify a return address stored in the stack (520), such as (522a) or (524a). Using techniques such as those described herein for the dynamic CFI return policy (e.g., as described in connection with FIG. 46 in the example (500)) may prevent an improper or invalid return, for example one that uses a return address from an improperly modified stack location. However, it may be more desirable to also enforce additional rules that provide stack protection and prevent improper modification of stack storage locations such as return addresses. Thus, such additional rules for a stack frame protection policy may prevent the modification of (522a) or (524a) in the first place, rather than allowing the improper modification and then stopping the return that uses the improperly modified return address.

As described in more detail below, different levels of stack protection may be provided. In one aspect, stack protection may be determined based on the static procedure (also referred to as the static authority protection model described elsewhere herein), or based on both the procedure and the particular call instance of the procedure (also referred to as the instance authority protection model described elsewhere herein). With the static authority protection model, rules of the stack protection policy may provide stack protection based on the particular procedure or routine that creates the frame. For example, rather than the stack including only a single frame for a single instance of foo as in (520), there may be multiple call instances of foo in the current call chain at a point in time, and thus multiple call stack frames in the stack for routine foo (e.g., due to recursive calls to foo). With protection based on the static routine or procedure, any instance of foo may modify or access information in the call stack frame of any instance of foo. For example, foo instance 1 may have call stack frame 1 and foo instance 2 may have call stack frame 2. With stack protection based on the static routine or procedure, the code of foo instance 1 may access stack frames 1 and 2, and the code of foo instance 2 may also access stack frames 1 and 2. In such an embodiment, the call stack frames for all instances of the same procedure or routine foo may be colored with the same tag. For example, frame 1 for foo instance 1 and frame 2 for foo instance 2 may both be colored with tag T1, so that rules of the memory safety policy allow the stack frame accesses noted above across different instances of the same routine or procedure.

As a finer granularity of stack protection, an embodiment may use rules of a stack protection policy that further restrict access to the stack based not only on the static routine or procedure but also on the particular runtime instance of the routine or procedure (e.g., the instance authority protection model). For example, as noted above, foo instance 1 may have call stack frame 1 and foo instance 2 may have call stack frame 2. With stack protection based on the static routine or procedure and also on the call instance, the code of foo instance 1 may access stack frame 1 but not stack frame 2, and the code of foo instance 2 may access stack frame 2 but not stack frame 1. In such an embodiment, the call stack frame for each call instance of a procedure or routine may be colored with a different tag. For example, frame 1 for foo instance 1 may be colored with tag T1 and frame 2 for foo instance 2 may be colored with tag T2, so that rules of the memory safety policy allow the stack frame accesses noted above based on each particular invocation and routine or procedure.
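The difference between the static and instance authority models comes down to how a color is chosen when a frame is created. The following sketch illustrates that choice under simplifying assumptions: colors are strings, access is allowed only when pointer and frame colors match, and the helper names `color_new_frame` and `may_access` as well as the tag naming scheme are hypothetical.

```python
from itertools import count

_fresh_ids = count(1)  # source of fresh colors for the instance model


def color_new_frame(routine: str, model: str, static_colors: dict) -> str:
    """Choose the color for a newly created frame of `routine`."""
    if model == "static":
        # One color per routine, shared by every call instance of it.
        return static_colors.setdefault(routine, f"T-{routine}")
    if model == "instance":
        # A fresh color for each call instance.
        return f"T-{routine}-{next(_fresh_ids)}"
    raise ValueError(f"unknown model: {model}")


def may_access(pointer_color: str, frame_color: str) -> bool:
    """Memory safety style check: pointer and frame colors must match."""
    return pointer_color == frame_color
```

Under the static model, two recursive instances of foo receive the same color and can reach each other's frames; under the instance model, each receives a distinct color and cross-frame access fails the check.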

An embodiment may further provide an even finer level of granularity for different regions or portions of the stack for a single procedure call instance, such as by coloring different objects or data items within a stack frame each with a different color (also referred to as the object protection model described elsewhere herein). As described elsewhere herein, a stack frame may include storage for data items or objects used in a particular invocation of a routine or procedure, where each such data item or object may be tagged with a different color. For example, referring to FIG. 48, shown is an example (530) illustrating data items (540) having storage allocated by routine or procedure foo and the associated tagged memory in stack frame (531). Element (540) denotes variables (540a-540c) having storage allocated for routine foo, and element (531) denotes the call stack frame for this particular call instance of routine foo in the call stack. Element (531) includes memory region (532) for the variable array (540a), memory region (534) for the variable line (540b), and memory region (536) for the variable password (540c). Additionally, frame (531) includes memory region (538) for the stored return address. Each of the different regions (532, 534, 536, and 538) may be tagged or colored with a different tag as denoted by (533). Each word in region (532) may be tagged Red1. Each word in region (534) may be tagged Red2. Each word in region (536) may be tagged Red3. Each word in region (538) may be tagged Red4.
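A minimal sketch of per-object coloring within one frame follows. It assumes word-granularity tags and a rule that permits a store only when the pointer color equals the color of the addressed word; the region sizes in `LAYOUT` are invented for illustration, and the function names are hypothetical.

```python
# (name, color, size in words); the sizes are assumptions, the colors
# follow the Red1..Red4 example in the text.
LAYOUT = [
    ("array", "Red1", 4),
    ("line", "Red2", 2),
    ("password", "Red3", 2),
    ("return_address", "Red4", 1),
]


def build_frame_tags(layout):
    """Return one tag per word of the frame, in layout order."""
    tags = []
    for _name, color, nwords in layout:
        tags.extend([color] * nwords)
    return tags


def store_allowed(frame_tags, pointer_color, word_index):
    """Allow the store only if the word exists and its color matches."""
    in_frame = 0 <= word_index < len(frame_tags)
    return in_frame and frame_tags[word_index] == pointer_color
```

With this layout, a Red1 pointer may write words 0 through 3 of the array, but a write one word past the end lands on a Red2 word and is rejected, and a write aimed at the Red4 return address is rejected as well.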

As yet another variation, an embodiment may define different trust regions or boundaries for sets of code (e.g., routines, procedures, and the like) and provide different levels of protection. For example, not all called routines may have the same level of trust. A developer may have a first set of routines that the developer wrote, and may have a high level of trust that the first set of code contains no malicious code. However, the first set of routines may make calls into a library provided by a third party or obtained from the internet. The library may be untrusted. Thus, an embodiment may vary the level of protection based on the different bodies of code and the particular data items used by each body. For example, with reference to the example (550) of FIG. 49, assume that routine foo in the trusted user code calls routine evil in the library and passes to evil, as a parameter, a pointer to region (534) (a pointer to the data item line (540b)). In this case, rather than coloring or tagging each region of (531) with a different color, regions (532, 536, and 538) may all be colored with the same color, such as Red5, and region (534) may be tagged with a different color, such as Red6. This may be used to further ensure, as a level of memory safety, that the memory region (534) accessed by routine evil is tagged with a different color than the other regions of (531), since routine evil is considered untrusted code. Additionally, the pointer to region (534) that is passed to evil may be colored or tagged with the same color Red6 as region (534). In this manner, memory safety policy rules may restrict the memory accessible to evil to memory tagged Red6.

Whether a particular routine, library, or body of code has a particular level of trust may be determined based on analysis using one or more criteria and inputs. For example, the level of trust may be determined based on runtime analysis and use of the library code. If a library calls other unknown or untrusted external or third-party libraries, the level of trust may be relatively low. The level of trust for a body of code may be based on the source or location from which the code was obtained. For example, code from a library obtained from the internet may be considered untrusted. In contrast, code developed by a particular developer that does not call any untrusted code may have a high level of trust.

The foregoing and other aspects of stack frames and stack protection are described in more detail below.

In connection with stack frames, and with reference again to the example (530), a compiler may create a new stack pointer by adding an integer (the size of the frame) to the existing stack pointer. The old stack pointer may be pushed onto the stack (into the frame) and later restored by reading it back from the stack. The amount added to the stack pointer may denote the total size of a frame that includes many independent objects, as described above for (531) with data items (540a-540c). The stack needs space for these three data items (540a-540c), and the compiler may determine the total space needed for them. In standard usage, the compiler accesses the storage (532, 534, and 536) for these data items (540a-540c), respectively, by computing addresses from the stack pointer (or from a frame pointer derived from the stack pointer). Thus, in an embodiment, the compiler, runtime, and calling convention may create and use pointers to different regions of a stack call frame by performing simple pointer arithmetic.

The static authority protection model denotes that the authority over an object belongs to the static code block, such as the routine or procedure, that creates the frame. Thus, as discussed elsewhere herein, a procedure foo that creates a frame has the authority to create pointers to things within that frame. In the simplest case, the same authority allows foo to access every frame it has created, even if the frame is earlier or later on the stack. Static authority means that the tags (e.g., colors of memory cells, colored pointers, and the code tags that create colored pointers, also referred to as instruction tags or tags on instructions) may be preallocated at load time. Instance authority protection provides authority based on the depth of the function call on the stack. Object protection denotes protection at the level of the objects allocated on the stack, not just the stack frame. Thus, object protection makes it possible to detect and prevent overflow from one object (e.g., an array or buffer) in a frame into another object in the same frame, which is not achieved using simple stack-frame-granularity PUMP rules with the static authority protection model or the instance protection model. Object protection may be applied with both the static authority protection model and the instance protection model. As a variation of object protection, an embodiment may also use hierarchical object protection for hierarchical objects, such as a structure that includes multiple different data item sub-objects such as integers and arrays. In at least one embodiment in which a first object is a hierarchical object including one or more levels, each of one or more sub-objects, a first tag may be generated for the first object, and additional sub-object tags may then be generated based on the first tag. Each sub-object tag may be used to tag a different sub-object. A sub-object tag may be a value denoting the particular location of the sub-object in the hierarchy. For example, tag T1 may be generated for use with a structure that includes 2 arrays as sub-objects 2 and 3. A different sub-object tag for each of the 2 arrays is generated from T1 and used to tag the 2 array sub-objects.
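One way to picture sub-object tags that are derived from a first tag and that denote a position in the hierarchy is to represent a tag as a path. This representation is an assumption made for illustration only; the text does not prescribe how sub-object tags are encoded, and the names `sub_object_tag` and `is_within` are hypothetical.

```python
def sub_object_tag(parent_tag: tuple, position: int) -> tuple:
    """Derive the tag for the sub-object at `position` under `parent_tag`."""
    return parent_tag + (position,)


def is_within(tag: tuple, ancestor: tuple) -> bool:
    """True if `tag` names `ancestor` itself or one of its sub-objects."""
    return tag[: len(ancestor)] == ancestor


# Tag T1 for a structure whose sub-objects 2 and 3 are arrays,
# following the example in the text.
T1 = ("T1",)
first_array_tag = sub_object_tag(T1, 2)
second_array_tag = sub_object_tag(T1, 3)
```

Here the two arrays receive distinct tags, so an overflow from one array into the other would hit a differently tagged word, while both tags can still be recognized as belonging to the structure tagged T1.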

Processing that may be performed on stack memory for different stack operations in an embodiment in accordance with the techniques herein will now be described. At startup, the stack memory may have all memory cells marked or tagged with a free-stack-frame tag. Consistent with discussion and techniques elsewhere herein, such tagging may be performed by invoking PUMP rules. It should be noted that the initial tagging of stack memory cells with the free-stack-frame tag need not be performed for the entire stack at once, but may instead be performed incrementally in the kernel page fault handler that extends the stack.

In connection with allocating a new stack frame, such as by a compiler, a new frame tag may be created for the newly allocated frame. A pointer to the new frame may be tagged with the new frame tag. For example, an embodiment may tag the instruction that creates the new frame pointer (e.g., an add instruction that performs pointer arithmetic, such as by adding to the stack pointer), in which case the tag on the instruction triggers a policy rule that creates the new frame tag. Using rules and tag propagation, a special tag may be created for the stack pointer and used to tag the stack pointer. Then, for each frame pointer, a unique frame pointer tag may be derived from the stack pointer's special tag, and the frame pointer may be tagged with its own unique frame pointer tag. In such an embodiment, the frame pointer tag may be created from a tagged copy of the stack pointer (e.g., add or 0).

When a new stack frame is allocated, such as for a new invocation of a routine or procedure, the memory cells of the newly allocated stack frame may be tagged or colored using, for example, a first technique referred to as strict object initialization or a second technique that may be referred to as lazy object coloring.

Using the first technique, strict object initialization, all free stack frame cells of a newly allocated frame are initially colored or tagged with one or more colors, e.g., based on the static objects in the frame. This initial coloring may be performed, for example, as part of the initialization processing of the newly allocated frame, before the frame is later used to store information for the associated call. An embodiment may add code that triggers rules to color or tag the free stack frame cells with the intended one or more colors, e.g., based on the static objects in the frame. Code tags on instructions may be used to authorize and define the associated memory cell coloring. A subsequent store to or read from a colored memory cell of the frame may be allowed or disallowed based on the color of the frame memory cell, e.g., according to memory safety policy rules (for example, for a memory cell tagged with color C1, a rule may allow a memory operation that accesses the contents of the colored memory cell using a pointer that also has a tag of the same color C1, but may disallow the memory operation if the pointer has a different color C2). Code tags on instructions may also provide authorization to perform memory operations within a procedure.
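Strict object initialization and the subsequent color check can be sketched as follows. This is a software model under stated assumptions, not the hardware rule format: colors are plain integers, the value 0 stands for the free stack frame tag, and the names frame_object, strict_init_frame, and access_allowed are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t color_t;
#define FREE_STACK_FRAME ((color_t)0)  /* assumed encoding of the free tag */

/* Hypothetical description of one static object in a frame. */
typedef struct {
    size_t  offset;  /* first cell of the object within the frame */
    size_t  length;  /* number of cells the object occupies */
    color_t color;   /* color assigned to the object */
} frame_object;

/* Color every cell of every static object when the frame is created.
   Cells outside the frame bounds are never touched. */
static void strict_init_frame(color_t *cell_tags, size_t n_cells,
                              const frame_object *objs, size_t n_objs)
{
    for (size_t k = 0; k < n_objs; k++) {
        for (size_t i = 0; i < objs[k].length; i++) {
            size_t cell = objs[k].offset + i;
            if (cell < n_cells)
                cell_tags[cell] = objs[k].color;
        }
    }
}

/* Memory safety check: the pointer color must match the cell color. */
static int access_allowed(color_t ptr_color, color_t cell_color)
{
    return cell_color != FREE_STACK_FRAME && ptr_color == cell_color;
}
```

For a six-cell frame holding a four-cell array of color C1 and a two-cell object of color C2, a pointer of color C1 may access the first four cells but is refused on the last two.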

Using the second technique, lazy object coloring, there is no initial coloring of all stack objects as there is with strict object initialization. Rather, with lazy object coloring, a store to a stack memory location tagged as a free stack frame cell triggers a rule that allows the store and also changes the color of the memory location based on the writer. A read of a stack memory location tagged as a free stack frame cell is an uninitialized memory read and may be allowed or disallowed depending on whether the policy permits uninitialized memory reads. With lazy object coloring, no initial block of code is executed that invokes rules to tag all memory cells of the frame at creation. Rather, memory cells are tagged by the rules invoked in connection with store operations.
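The store and read behavior of lazy object coloring (the second technique) can be sketched as a pair of rule functions. The encoding is an assumption for illustration: tags are integers, 0 stands for the free stack frame tag, the writer is identified by the color on its pointer, and the names store_rule and load_rule are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t tag_t;
#define FREE_STACK_FRAME_TAG ((tag_t)0)  /* assumed encoding of a free cell */

typedef struct {
    bool  allowed;  /* whether the memory operation may proceed */
    tag_t mem_tag;  /* tag on the cell after the operation */
} rule_result;

/* Store rule: a store to a free cell is allowed and colors the cell after
   the writer; a store to a colored cell needs a pointer of the same color.
   ptr_tag is assumed to be a nonzero color. */
static rule_result store_rule(tag_t ptr_tag, tag_t mem_tag)
{
    if (mem_tag == FREE_STACK_FRAME_TAG)
        return (rule_result){ true, ptr_tag };
    return (rule_result){ ptr_tag == mem_tag, mem_tag };
}

/* Load rule: a read of a free cell is an uninitialized read, which the
   policy either permits or refuses; other reads need matching colors. */
static rule_result load_rule(tag_t ptr_tag, tag_t mem_tag, bool allow_uninit)
{
    if (mem_tag == FREE_STACK_FRAME_TAG)
        return (rule_result){ allow_uninit, mem_tag };
    return (rule_result){ ptr_tag == mem_tag, mem_tag };
}
```

No initialization loop appears anywhere in this sketch: the first store through a pointer of color 7 is what turns a free cell into a cell of color 7.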

In at least one embodiment, whether to use strict object initialization or lazy object coloring may depend on the desired level of protection and on the occurrence of vulnerabilities that cannot otherwise be defended against.

Code within a routine or procedure that directly accesses data from the stack/frame pointer is code tagged so that it is allowed to do so. With lazy object coloring, a store to a memory cell colors the memory cell based on the writer, as described above. For example, referring back to example (530), a store instruction of routine foo, which has frame (531), may write a value to a memory location in array (532). In practice, under the current stack protection policy, the store instruction may need to have the tag Red1 in order to write to a location in array (532) of the call frame for foo. A first rule of the policy may be triggered to perform this check for the store instruction. Thus, an embodiment may have the compiler generate a code sequence that triggers a rule to tag the store instruction as Red1. (As a variation of the foregoing, the tag on a memory cell, such as Red1, may be related to, but not identical to, the tag on the store instruction or other instruction. For example, a "Red1code" CI tag indicates that an instruction with this tag may access Red1-tagged memory cells and may create Red1-tagged memory cells.) When the store instruction is the current instruction, the first rule described above may be triggered, checking the instruction tag to ensure that the instruction is Red1. As an output, the rule may tag the memory location in array (532) with the Red1 tag.

Code within a procedure that creates a pointer to a particular object is tagged to taint or set the pointer for that object. The pointer may be for the procedure's own use in subsequent instructions and/or may be passed as an argument to another procedure.

Saving register values to a frame or restoring register values from a frame may be based on frame authorization. The memory location(s) of the stack frame that store register values may be treated as distinct objects in the stack frame. Instruction tagging provides authorization for such tagged store and load instructions. With lazy object coloring, a store instruction tagged with authorization to store data in a memory cell also provides authorization to tag the memory cell based on the writer (e.g., the procedure containing the store instruction).

Procedure arguments passed on the stack may be marked with a tag that makes them accessible to both the caller and the callee. It should be noted that the return address may be specially tagged (e.g., as in the dynamic CFI return policy described herein, such as in connection with FIG. 46). Thus, if the return address is stored on the stack (e.g., in connection with nested or recursive calls), the tagging of the return address means that a store will not be allowed to overwrite the return address on the stack. When a stack-derived pointer is passed to another frame in connection with a call to another procedure, memory accesses performed using the pointer trigger rules of the memory safety policy as described elsewhere herein. The instruction that creates a pointer to a memory location may be tagged based on the tag of the particular memory location. The instruction tag may indicate authorization to access the memory location. The instruction may trigger a rule that tags the pointer to indicate authorization to access the memory location. For example, the rule may assign the pointer the same tag as the instruction or a variation based on the instruction tag. Thus, in one aspect, the instruction that creates the pointer also creates the capability to access the memory location through the pointer, and shares that capability through the pointer passed as an argument to the called procedure. With lazy object coloring, the pointer would need a tag that provides authorization to tag free stack frame cells, which may not be allowed for heap memory safety pointers.

In connection with a return or other operation that results in removing a frame from the stack (e.g., upon completion of a called routine), tagged code may clear the frame. The tag on such code provides authorization to change all frame object tags associated with this frame to the free stack frame cell tag.

Code of a program executing in an embodiment of a computer system in accordance with the techniques herein may include code that performs exception handling. As known in the art, exception handling is processing performed in response to an exception, which denotes the occurrence of an unusual or exceptional condition requiring special processing performed by an exception handler. Thus, when an exception occurs at a first point in the program, the normal flow of program execution is interrupted and control is transferred to the exception handler. Before control is transferred to the handler, the current execution state may be saved in a predetermined location. If program execution can be resumed after the exception has been handled by the handler, execution of the program may resume (e.g., control may then be transferred back to the point following the first point in the program). For example, a divide-by-zero operation may result in an exception after which program execution can resume once the handler has handled the exception. In connection with implementing exception handlers, an embodiment may use library routines such as setjump and longjump. For example, setjump and longjump may be the standard C library routines setjmp and longjmp, respectively, defined as follows.

Here, setjmp sets up the local jmp_buf buffer and initializes it for the jump. setjmp saves the program's calling environment in the environment buffer specified by the env argument for later use by longjmp. If the return is from a direct invocation, setjmp returns 0. If the return is from a call to longjmp, setjmp returns a nonzero value.

Here, longjmp restores the context of the environment buffer env that was saved by an invocation of the setjmp routine in the same invocation of the program. Invoking longjmp from a nested signal handler is undefined. The value specified by value is passed from longjmp to setjmp. After longjmp completes, program execution continues as if the corresponding invocation of setjmp had just returned. If the value passed to longjmp is 0, setjmp behaves as if it had returned 1; otherwise, it behaves as if it had returned value.

Thus, setjmp may be used to save the current state of the program. The state of a program depends, for example, on the contents of memory (i.e., the code, globals, heap, and stack) and on the contents of its registers. The contents of the registers include the stack pointer, the frame pointer, and the program counter. setjmp saves the current state of the program so that longjmp can restore it, returning the program's execution state to what it was when setjmp was called. In other words, longjmp() does not return. Rather, when longjmp is called, execution returns to, or resumes at, the particular point indicated by the previously saved program state (as saved by setjmp). Thus, without using the standard call or return conventions, longjmp() may be used to transfer control from a signal handler back to a saved execution point in the program.

For example, reference is made to FIG. 50. In example (560), routine main (562) may call routine first (563), and routine first (563) may call routine second (564). As shown, main (562) may include a call to setjmp at point X1 before calling routine first. First, setjmp is called at point X1 and returns 0, and then routine first is called. Routine second (564) includes a call to longjmp at point X2, which transfers control to main at location X1, where setjmp was called. After longjmp executes, setjmp returns 1. Because setjmp has now returned a second time, with value 1, first is not called and control proceeds to NEXT.
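The control flow of the main-first-second example can be reproduced with the standard C routines, which are declared in <setjmp.h> as int setjmp(jmp_buf env) and void longjmp(jmp_buf env, int value). The sketch below is illustrative only: the function run_example stands in for routine main, and the counter first_calls is a hypothetical addition used to observe that first runs exactly once.

```c
#include <setjmp.h>
#include <stdio.h>

static jmp_buf env;          /* environment saved by setjmp at point X1 */
static int first_calls = 0;  /* hypothetical counter: times first() ran */

static void second(void)
{
    longjmp(env, 1);         /* point X2: resume at X1; never returns */
}

static void first(void)
{
    first_calls++;
    second();
    puts("not reached");     /* skipped because second() jumps away */
}

/* Plays the role of routine main in the example. Returns 1 when control
   arrives at NEXT through longjmp, and 0 if first() returned normally. */
static int run_example(void)
{
    if (setjmp(env) == 0) {  /* point X1: the direct call returns 0 */
        first();
        return 0;            /* not reached in this example */
    }
    /* NEXT: setjmp returned a second time, with a nonzero value */
    return 1;
}
```

Calling run_example() once returns 1 and leaves first_calls at 1. Note that setjmp appears directly in the controlling expression of the if statement, which is one of the contexts in which the C standard permits it.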

In connection with a stack protection policy, it may be desirable to clear the stack before resuming execution at point X1 as previously saved by setjmp. For example, based on the call chain main-first-second above, three stack frames may exist on the call stack, and processing may be performed to clear the stack memory associated with the calls in the call chain between the longjmp call and the setjmp call. In particular, the code of longjmp may include code that, in this example, clears the stack frames for first (563) and second (564). Techniques that may be used in connection with performing such stack clearing operations in accordance with a stack protection policy will now be described.

In connection with a stack protection policy when performing setjmp, which saves the program state to stack memory, an embodiment tags the current stack pointer memory cell with a distinguished tag component so that, in connection with a subsequent longjmp, rules can check that the stack has not changed since the setjmp. Data representing the current program state may be saved in the setjmp data structure, jmpbuf. The saved data includes the stack pointer, the program counter, a first pointer (tagged as a pointer that is allowed to point to a memory location tagged with the distinguished tag component, e.g., pointing to the current stack pointer memory cell), and a second pointer (tagged as a longjmp-clearing-authority-pointer, which provides authorization to perform longjmp processing). In at least one embodiment, the longjmp-clearing-authority-pointer may provide only the authorization to clear tags associated with frames in the set of procedures that could be called recursively from this procedure.

In connection with a stack protection policy when performing longjmp, code may check that the current stack pointer denotes a stack location deeper than the saved stack pointer in the set jump structure (e.g., the setjmp data structure, jmpbuf). A rule may be triggered that checks that the memory cell containing the saved stack pointer (as saved by setjmp) has a tag compatible with the tagged first pointer of the set jump structure. Code may then be executed that clears all stack memory locations between the current stack pointer and the saved stack pointer (as previously saved in the set jump structure by setjmp). Such code may perform the clearing using the above-mentioned second pointer, which is tagged as a longjmp-clearing-authority-pointer and provides the stack clearing authorization (e.g., the second pointer is used to point to the locations being cleared). A rule may be triggered by the code that performs the clearing, in which case the rule checks that the second pointer is tagged as a longjmp-clearing-authority-pointer. The instructions of longjmp are uniquely tagged so that the invoked rules allow these uniquely tagged instructions to use the pointer tagged as a longjmp-clearing-authority-pointer. Other code that is not in longjmp cannot use a pointer tagged as a longjmp-clearing-authority-pointer (e.g., the other code is not tagged to allow use of the longjmp-clearing-authority-pointer).
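The clearing performed by longjmp can be sketched as a loop over stack cells in which every clearing store is subject to a rule. The model below is an assumption for illustration: the stack is an array of cell tags that grows toward lower indices, the tag names are hypothetical stand-ins for the tags in the description, and the check of the first pointer against the saved stack pointer cell is omitted for brevity.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tag values for cells, code, and pointers. */
enum tag {
    FREE_STACK_FRAME,                /* cleared cell */
    FRAME_OBJECT,                    /* cell belonging to a live frame */
    LONGJMP_CODE,                    /* unique tag on longjmp instructions */
    OTHER_CODE,                      /* any code outside longjmp */
    LONGJMP_CLEARING_AUTHORITY_PTR,  /* tag on the second pointer */
    PLAIN_PTR                        /* pointer without clearing authority */
};

/* Rule for one clearing store: only longjmp-tagged code using the
   authority pointer may retag a cell as a free stack frame cell. */
static bool clear_rule(enum tag code_tag, enum tag ptr_tag, enum tag *cell_tag)
{
    if (code_tag != LONGJMP_CODE || ptr_tag != LONGJMP_CLEARING_AUTHORITY_PTR)
        return false;
    *cell_tag = FREE_STACK_FRAME;
    return true;
}

/* Clear cells [current_sp, saved_sp). A deeper stack pointer has a smaller
   index in this model, so current_sp must be below saved_sp. */
static bool longjmp_clear(enum tag *stack_tags, size_t current_sp,
                          size_t saved_sp, enum tag code_tag, enum tag ptr_tag)
{
    if (current_sp >= saved_sp)
        return false;  /* current stack is not deeper than the saved one */
    for (size_t i = current_sp; i < saved_sp; i++) {
        if (!clear_rule(code_tag, ptr_tag, &stack_tags[i]))
            return false;  /* policy violation: stop clearing */
    }
    return true;
}
```

Code tagged OTHER_CODE fails at the first cell and leaves the stack tags untouched, which mirrors the statement that code outside longjmp cannot use the authority pointer.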

In at least one embodiment, tagging of instructions may be performed by having the compiler generate instruction sequences that invoke rules to perform the desired instruction tagging and/or memory location tagging. For example, for tagging stack memory locations, the compiler may generate an instruction sequence with store instructions that trigger rules to initialize or reset the tags of the stack locations. For tagging instructions, the compiler may generate an instruction sequence with store instructions that trigger rules to tag the instructions, where the tag for an instruction may be based on the color associated with the tagged memory location accessed by the instruction. In connection with a return from a call that has an associated stack frame, code may be added that clears the frame from the stack. When strict object initialization is used and a new frame is created in response to a call, code may be added that appropriately tags or colors the objects of the new frame.

Referring now to FIGS. 51 to 53, examples will be described of different unauthorized or unintended modifications that may be made to the stack, such as those made by malicious code (e.g., a "stack attack," denoting an attack via stack modification), or unintended stack modifications made by non-malicious code (e.g., accidental overwriting or buffer overflow).

FIGS. 51 and 52 illustrate measures that may be taken to prevent stack attacks in connection with stack modifications made by a code module, such as third-party code (e.g., a called library routine), under what may be characterized as an arbitrary attacker model. Thus, the cases of (570) and (575) may occur, for example, as a result of a called third-party library routine containing code that performs unauthorized or unintended stack modifications. The stack modifications may also be made by yet another routine that is further called by the code of the called library routine. Each row of (570) and (575) contains three columns of information. In each of rows (572a) through (572h), column 1 identifies an item to prevent, denoting undesirable runtime execution behavior; column 2 identifies preventive actions that may be taken to avoid the undesirable behavior of column 1; and column 3 identifies one or more mechanisms that may be used to implement or enforce the preventive actions of column 2. Generally, column 3 lists alternative mechanisms that may be implemented independently and separately depending on the particular system. For example, a conventional system may use separate processes as a first mechanism, while a second system may alternatively use capabilities, and a third system may alternatively use coloring or tagging of particular stack locations.

To elaborate further, consistent with discussion elsewhere herein, code such as prolog code that executes when a call is made writes the return address and registers to the stack. The prolog code may tag the stack locations with a special tag by invoking rules that restrict which code can modify or, more generally, access the stack locations. For example, the prolog code may perform memory writes/stores to save the return address, registers, and the like in memory cells of the stack frame. These write/store instructions of the prolog code may invoke rules that tag the memory cells of the stack frame with the special tag STACK FRAME TAG, marking the memory locations as special and restricting which code can modify the memory cells. The write/store instructions of the prolog code may also be tagged with the tag PROLOG STACK TAG, which restricts which instructions can perform such tagging. The following is an example of the logic enforced by the rules invoked by the write/store instructions of the prolog code, which tag the memory cells of the stack frame with the special tag STACK FRAME TAG to mark the memory locations as special and restrict which code can modify the memory cells:

In the rule logic described above, the output tag refers to the tag placed on the stack location.
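The behavior described for prolog write/store instructions can be sketched as a rule function in C. This is a hedged illustration rather than the rule logic of the description itself: the enum values model the tags PROLOG STACK TAG and STACK FRAME TAG, NO_TAG and the names store_result and prolog_store_rule are hypothetical, and the treatment of ordinary stores to ordinary cells is an assumption.

```c
#include <stdbool.h>

/* Hypothetical encodings of the tags named in the description. */
typedef enum { NO_TAG, PROLOG_STACK_TAG, STACK_FRAME_TAG } stack_tag;

typedef struct {
    bool      allowed;     /* whether the store may proceed */
    stack_tag output_tag;  /* tag placed on the stack location */
} store_result;

/* Store rule: ci_tag is the tag on the store instruction, mem_tag the
   current tag on the destination stack cell. */
static store_result prolog_store_rule(stack_tag ci_tag, stack_tag mem_tag)
{
    /* Prolog stores are allowed and mark the cell as special. */
    if (ci_tag == PROLOG_STACK_TAG)
        return (store_result){ true, STACK_FRAME_TAG };

    /* Any other store to a specially marked cell is refused. */
    if (mem_tag == STACK_FRAME_TAG)
        return (store_result){ false, mem_tag };

    /* Assumption: stores to unmarked cells are outside this rule's
       concern and leave the tag unchanged. */
    return (store_result){ true, mem_tag };
}
```

Under this sketch, once the prolog has saved the return address, a later store by untagged code to that cell is refused and the cell keeps its STACK FRAME TAG marking.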

In a similar manner, other code, such as epilog code invoked when performing a return, may be allowed to clear the stack or a portion thereof. The epilog code may be tagged with a special tag (e.g., a CI tag), EPILOG STACK TAG, and may be given authorization through access to a pointer tagged with the special tag STACK FRAME TAG. The epilog code may perform the stack clearing described above using write/store operations that use the pointer specially tagged with STACK FRAME TAG. To further restrict which code may perform stack clearing, the epilog code may be tagged as noted above. In such an embodiment, the write/store instructions may invoke rules implementing the following logic to enforce a policy under which stack clearing can be performed only by the epilog code using the specially tagged pointer (tagged with STACK FRAME TAG):
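The epilog clearing policy can likewise be sketched as a rule that checks both the instruction tag and the pointer tag. This is an illustrative sketch, not the logic of the description itself: the enum values model EPILOG STACK TAG and STACK FRAME TAG, and the choice of a free stack frame tag as the output for a cleared cell is an assumption.

```c
#include <stdbool.h>

/* Hypothetical encodings of the tags named in the description. */
typedef enum {
    UNTAGGED,
    EPILOG_STACK_TAG,     /* CI tag on epilog write/store instructions */
    STACK_FRAME_TAG,      /* special tag on stack cells and on the pointer */
    FREE_STACK_FRAME_TAG  /* assumed tag for a cleared cell */
} epilog_tag;

typedef struct {
    bool       allowed;     /* whether the clearing store may proceed */
    epilog_tag output_tag;  /* tag on the cell after the store */
} clear_result;

/* Clearing rule: both conditions must hold before the cell is retagged:
   the store is epilog code, and it goes through the tagged pointer. */
static clear_result epilog_clear_rule(epilog_tag ci_tag, epilog_tag ptr_tag,
                                      epilog_tag mem_tag)
{
    if (ci_tag == EPILOG_STACK_TAG && ptr_tag == STACK_FRAME_TAG)
        return (clear_result){ true, FREE_STACK_FRAME_TAG };
    return (clear_result){ false, mem_tag };  /* cell tag is preserved */
}
```

Requiring both tags means that neither epilog code holding an ordinary pointer nor ordinary code holding the special pointer can clear the frame.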

Code intended to restore the return address and registers from the stack may be given authorization to read such specially tagged memory cells of the stack. Such authorization may be given, for example, by any of the following: tagging the code to indicate that the code is allowed to access specially tagged memory cells of the stack (a CI tag); tagging the PC to indicate that the code has the authorization; or tagging a pointer used by the code, where the pointer points to the specially tagged memory cell and the tag on the pointer indicates the access authorization. For example, a read/load instruction may be given authorization to read a stack memory cell tagged with STACK FRAME TAG. In one embodiment, authorization to read from stack memory locations may be given by allowing only read/load instructions that use a specially tagged pointer (tagged with STACK FRAME TAG). Rule logic that allows only read/load instructions using a specially tagged pointer (tagged with STACK FRAME POINTER) to read from a specially tagged stack memory location (tagged with STACK FRAME TAG) may be as follows:

As a variation of the foregoing, a read/load instruction may be given authorization by tagging the pointer used by the read/load instruction with the special tag STACK FRAME TAG.

Rule logic that allows only specially tagged read/load instructions (tagged with STACK FRAME INSTRUCTION) to read from stack memory locations may be as follows:
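The two ways of authorizing a read/load of a specially tagged stack cell, through the pointer tag or through the instruction tag, can be contrasted in a short sketch. It is a hedged illustration, not the rule logic of the description itself: the enum values model STACK FRAME TAG, STACK FRAME POINTER, and STACK FRAME INSTRUCTION, PLAIN is hypothetical, and leaving reads of unmarked cells unrestricted is an assumption.

```c
#include <stdbool.h>

/* Hypothetical encodings of the tags named in the description. */
typedef enum {
    PLAIN,                   /* no special tag */
    STACK_FRAME_TAG,         /* special tag on the stack memory cell */
    STACK_FRAME_POINTER,     /* special tag on the pointer */
    STACK_FRAME_INSTRUCTION  /* special CI tag on the load instruction */
} sf_tag;

/* Variant 1: the authorization is carried by the pointer used to read. */
static bool load_allowed_by_pointer(sf_tag ptr_tag, sf_tag mem_tag)
{
    if (mem_tag != STACK_FRAME_TAG)
        return true;  /* assumption: unmarked cells are not restricted here */
    return ptr_tag == STACK_FRAME_POINTER;
}

/* Variant 2: the authorization is carried by the instruction's CI tag. */
static bool load_allowed_by_instruction(sf_tag ci_tag, sf_tag mem_tag)
{
    if (mem_tag != STACK_FRAME_TAG)
        return true;  /* assumption: unmarked cells are not restricted here */
    return ci_tag == STACK_FRAME_INSTRUCTION;
}
```

The pointer variant lets authorization travel with data handed to the restoring code, whereas the instruction variant fixes it to particular instructions in the code image.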

Examples of the mechanisms are described in more detail below and elsewhere herein.

Element (572a) identifies the undesirable runtime behavior of a called routine (callee) that never returns to the calling routine (caller). To prevent this behavior, the action taken is to associate a time limit with each call, so that a maximum amount of time is allowed for the called routine to complete. After the maximum amount of time has elapsed, runtime execution of the called routine is terminated. Mechanisms for implementing the timeout may include making the call to the routine of the third-party code from a separate thread that enforces the timeout, or directly limiting the amount of time given to the called routine using time-limited or instruction-limited calls.

Element 572b identifies the undesirable runtime behavior of resource exhaustion, in which the called routine may consume available resources such as memory. To prevent this behavior, the action taken may be to limit the resources made available to the called routine. Mechanisms for implementing this may include making the call to the third-party routine from a separate thread that enforces a maximum resource limit, or using a special instruction-limited call to directly limit the amount of resources available to the called routine.

Element 572c identifies the undesirable runtime behavior of a called routine exercising unexpected authority, such as by making an unexpected call to another routine. To prevent this behavior, the action taken may be to limit the authority of the called routine to the minimum acceptable authority. Mechanisms to implement this may include tagging the PC with the authority and capabilities of the callee, restricting the portion of the file system or other resources accessible to the called routine, and restricting the allowable system calls that the called routine may make.

Element 572d identifies the undesirable runtime behavior of a called routine reading items left in registers by another routine that it subsequently calls (e.g., mycode calls P1 in a library, P1 calls a routine named evil, and P1 can read data left in registers by evil). To prevent this behavior, the action taken may be to clear non-input and non-return registers. Mechanisms to implement this may include performing explicit register clears, coloring portions of the stack, including the non-return and non-input registers, so that they cannot be read by the called routine, and having a separate process call the called routine.
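Explicit register clearing can be sketched as follows, assuming the register file is modeled as a dictionary and that the set of registers allowed to carry values across the call boundary (inputs on a call, return values on a return) is known. The register names and the helper name are hypothetical.

```python
def clear_scratch_registers(registers, keep):
    """Return a copy of the register file in which every register
    not named in keep has been zeroed."""
    return {name: (value if name in keep else 0)
            for name, value in registers.items()}


# On return, only the return-value register a0 survives in this example.
after_return = clear_scratch_registers({"a0": 7, "a1": 9, "t0": 1234}, keep={"a0"})
```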

Element 572e identifies the undesirable runtime behavior of a called routine reading items left on the stack by another routine that it subsequently calls (e.g., mycode calls P1 in a library, P1 calls a routine named evil, and P1 can read data left on the stack by evil). To prevent this behavior, the action taken may be to make the callee's stack inaccessible (e.g., the stack region used by a further called routine such as evil is inaccessible to the first called routine such as P1). Mechanisms to implement this may include using separate stacks (e.g., for the first called routine P1 and for further called routines), capabilities (e.g., limiting the capability or authorization of code allowed to read a stack region, by tagging the PC or by using specially tagged pointers that are permitted to access the particular stack region), coloring (e.g., tagging data regions of the stack to limit what code may access them), and having a separate process call the called routine.

Element 572f identifies the undesirable runtime behavior of a called routine writing over items in the stack prefix (e.g., overwriting the return address region that identifies the return address). The stack prefix may be the region of the stack containing the information needed to return to some earlier caller. To prevent this behavior, the action taken is to make the stack prefix inaccessible or non-writable to the called routine. Mechanisms to implement this may include having the called routine and the user code that calls it use separate stacks, using capabilities (e.g., allowing access only through code that has been given authorization via specially tagged code, a PC tag, or a specially tagged pointer), using coloring (e.g., tagging data items of the stack prefix with a special tag so that the called routine cannot access them), and having a separate process call the called routine.
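The coloring mechanism can be sketched as a toy model in which every stack cell carries a color recording the routine that wrote it, and the currently executing code carries a color on its PC. This is only an illustration under those assumptions; the class `TaggedStack` and the color strings are not from the patent.

```python
class TaggedStack:
    """Toy stack whose cells each carry an owner color (hypothetical model)."""

    def __init__(self, size):
        self.cells = [0] * size
        self.colors = [None] * size   # None means the cell is not yet owned

    def write(self, index, value, pc_color):
        owner = self.colors[index]
        if owner is not None and owner != pc_color:
            raise PermissionError(f"cell {index} is owned by {owner}")
        self.cells[index] = value
        self.colors[index] = pc_color

    def read(self, index, pc_color):
        if self.colors[index] != pc_color:
            raise PermissionError(f"cell {index} is not readable by {pc_color}")
        return self.cells[index]
```

In this model a return address written by the caller cannot be read or overwritten by the callee, which covers both the write case (572f) and the read case (572g) for the stack prefix.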

Element 572g identifies the undesirable runtime behavior of a called routine reading data in the stack prefix. To prevent this behavior, the action taken is to make the stack prefix inaccessible to the called routine, using mechanisms similar to those described for 572f.

Element 572h identifies the undesirable runtime behavior of a called routine redirecting control flow in the stack prefix, such as by overwriting a pointer to a return address where the pointer is stored in the stack prefix. To prevent this behavior, the action taken may be to protect the return addresses stored in the stack prefix. In one aspect, element 572h identifies a particular instance of 572f, so the mechanisms for 572h are similar to those for 572f. Mechanisms to implement this may include having the called routine and the user code that calls it use separate stacks, using capabilities (e.g., allowing access only through code that has been given authorization via specially tagged code, a PC tag, or a specially tagged return pointer whose tag carries the access authorization), using coloring (e.g., tagging the memory locations of the stack prefix that contain return addresses with a special tag so that the called routine cannot access them), and having a separate process call the called routine.

FIG. 53 illustrates actions that may be taken to prevent stack attacks under an arbitrary-input attacker model.

Element 581a identifies the undesirable runtime behavior of executing code writing over unintended items within the current frame of the executing routine. To prevent this behavior, the action taken may be to maintain object integrity. Mechanisms to implement this may include using per-object capabilities (e.g., allowing access only through code that has been given authorization via specially tagged code, a PC tag, or a specially tagged return pointer whose tag carries the access authorization) or using per-object colors (e.g., tagging the memory locations of an object).

Element 581b identifies the undesirable runtime behavior of executing code reading unintended items within the current frame of the executing routine. To prevent this behavior, the action taken may be to maintain object integrity. Mechanisms to implement this may include using per-object capabilities (e.g., allowing access only through code that has been given authorization via specially tagged code, a PC tag, or a specially tagged return pointer whose tag carries the access authorization) or using per-object colors (e.g., tagging the memory addresses of an object with an object-specific tag).

Element 581c identifies the undesirable runtime behavior of executing code (having a current frame) writing over unintended items in a preceding frame (e.g., the frame of another routine that called the executing code). To prevent this behavior, the action taken may be to isolate or separate stack frames. Mechanisms to implement this may include using per-frame capabilities (e.g., allowing access only through code that has been given authorization via specially tagged code, a PC tag, or a specially tagged return pointer whose tag carries the access authorization) or using per-frame colors (e.g., tagging the memory locations of an object).

Element 581d identifies the undesirable runtime behavior of executing code (having a current frame) reading items in a preceding frame (e.g., the frame of another routine that called the executing code). To prevent this behavior, the action taken may be to isolate or separate stack frames. Mechanisms to implement this may include using per-frame capabilities or per-frame colors, as described for element 581c.

Element 581e identifies the undesirable runtime behavior of executing code (having a current frame) reading items left on the stack by another routine called by the currently executing code. The preventive action is to make the stack of the called routine inaccessible to the currently executing code. Mechanisms to implement this may include using separate processes, separate stacks, capabilities, and coloring, in a manner similar to that described for 572g.

Element 581f identifies the undesirable runtime behavior of executing code (having a current frame) modifying the return pointer (e.g., the location in the stack containing the return address within the routine that called the executing code). The preventive action is to protect the return pointer, or the location in the stack containing the return pointer. Mechanisms to implement this may include using capabilities and coloring, in a manner similar to that described for 572g.

An embodiment in accordance with the techniques herein may use the PUMP rule metadata processing system as part of another hybrid system to learn and validate new rule sets. For example, the PUMP rule metadata processing system may be used to learn the allowed control flow (e.g., through logging) and thereby determine the rules, and the set of valid control transfers, for an executing program. Those rules and valid control transfers may then be used as the rules, and the set of valid control transfers, of the CFI policy enforced for the program.

To further illustrate learning the rules and control transfers for a program's CFI policy, a first training or learning phase may be performed. In this first phase, the program is executed using a training version of the CFI policy in which all control points (e.g., the sources and targets of branches or transfers) are tagged and there are no rules for control transfer instructions. Thus, each time there is a transfer of control, such as a branch or jump instruction, there is a PUMP rule cache miss that transfers control to the cache miss handler of the PUMP rule metadata system. The cache miss handler may perform processing that logs information about the control transfer. The logged information may include, for example, the source location of the transfer and the target location of the transfer. Other information may also be logged, for example the calling procedure or routine from which the transfer is made (e.g., the one containing the source location) and the called procedure or routine to which control is transferred (e.g., the one containing the target location). More specifically, in the learning or training phase, the first time a particular control transfer occurs, the cache miss handler computes a new rule of the learned rule set for that particular transfer from the source to the target. Subsequent runtime transfers of control from the same source to the same target use the rule so computed. In this manner, if the program is assumed to be bug-free and not under attack (not running malicious code), and all control paths are exercised during execution, the logged set of control transfers, as represented by the learned rule set at the end of program execution, represents all valid or allowable control transfers for this particular program. The learned rule set may thus represent the initial or first rule set of the CFI policy for the program.
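The learning phase described above can be sketched as follows, assuming a rule is modeled as a (source, target) pair and the rule cache is modeled as a set. The class name `LearningMissHandler` and the location labels are illustrative.

```python
class LearningMissHandler:
    """Toy model of the training-mode miss handler (hypothetical names)."""

    def __init__(self):
        self.learned_rules = set()   # stands in for installed rules
        self.log = []                # information logged on each miss

    def transfer(self, source, target):
        """Model one control transfer; return True if it missed in the rule cache."""
        rule = (source, target)
        if rule in self.learned_rules:
            return False             # later transfers reuse the computed rule
        # First occurrence: log the transfer and compute a new rule for it.
        self.log.append({"source": source, "target": target})
        self.learned_rules.add(rule)
        return True


handler = LearningMissHandler()
misses = [handler.transfer(s, t)
          for s, t in [("A1", "A2"), ("A2", "A3"), ("A1", "A2")]]
```

Only the first occurrence of each distinct transfer reaches the handler, so at the end of the run `learned_rules` holds one entry per distinct transfer observed.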

Processing may be performed to validate the learned rule set representing the CFI policy for the program. Validation may include ensuring that none of the rules allows an invalid control transfer. Validation of the learned rule set may be performed in any suitable manner. For example, an embodiment may run an analysis tool that validates each rule. The tool may examine, for example, the binary or object code, the symbol table, and the original source code to verify that each rule corresponds to an allowed transfer. To further illustrate, validation may examine binary code in which all control points (e.g., the sources and targets of branches or transfers) are tagged. In this manner, the tagged binary or source code denotes the valid set of all potential source and target locations, and thereby provides the set of potential sources and targets that may actually be used in a runtime transfer of control. Any logged runtime transfer of control should be from a source to a target only where the source and the target are each included in the valid set. For example, the tagged binary or source code may include locations A1, A2, A3 and A4. Any logged transfer of control should have a source that is one of A1, A2, A3 or A4 and a target that is one of A1, A2, A3 or A4. If the logged runtime control transfer represented by a first rule is from A1 to B7, the first rule may be invalid because B7 should not be the target of a control transfer (e.g., B7 is not included in the statically determined set of possible control points, which consists of the tagged locations A1, A2, A3 and A4). In one aspect, the learned rule set may be characterized as a candidate rule set that may be further reduced through rule removal as a result of validation processing.
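The A1 to B7 example can be sketched directly, assuming candidate rules are (source, target) pairs and the statically tagged control points are available as a set. The function name `validate_rules` is hypothetical.

```python
def validate_rules(candidate_rules, control_points):
    """Keep only rules whose source and target are both tagged control points."""
    return {(source, target)
            for (source, target) in candidate_rules
            if source in control_points and target in control_points}


control_points = {"A1", "A2", "A3", "A4"}
candidates = {("A1", "A2"), ("A3", "A4"), ("A1", "B7")}
validated = validate_rules(candidates, control_points)
```

Here the rule from A1 to B7 is removed because B7 is not among the tagged control points. This membership test is only one possible check; a real validation tool may apply further analysis.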

All rules of the initial or learned set that have been validated as required by the CFI policy for the program may be used as the validated rule set included in the CFI policy enforced for the program.

Referring to FIG. 54, shown is an example summarizing the processing just described that may be performed by an embodiment in accordance with the techniques herein to learn, validate, and use policy rules. At 602, the program may initially be executed without any CFI policy rules, so that each new transfer of control causes a rule cache miss and triggers the cache miss handler to generate a new rule for the control transfer encountered at runtime. The new rule may identify the transfer of control from a source to a target and may be included in the first set of learned rules 604. At the end of program execution, the first set of learned rules 604 includes a rule for each different control transfer that occurred at runtime. The first set of learned rules 604 may then be validated in the processing of 606 to ensure that each rule represents a valid control transfer. The processing of 606 may use a tool as described above for automated rule validation and may also include other processing. For example, the validation processing of 606 may include presenting the rules validated by the tool to a user for further confirmation that the control transfers are valid. A second set of validated rules 608 may be generated as a result of the rule validation processing 606. The second set of validated rules 608 may then be used by the PUMP system as the CFI policy enforced when the program is executed at a second point in time, at 610.

Thus, as described above, the first program execution at 602 may be used to determine a set of valid control transfers for the program. However, it may not be reasonable to assume that this single program execution exercises all control paths, and so the control transfers identified in 608 may be fewer than all possible valid control transfers. In that case, processing as described above for 610 may be performed using the validated CFI policy rule set. During runtime, if a control transfer occurs that causes a rule cache miss (e.g., indicating an unexpected control transfer that has no rule in 608), an additional check may be performed at runtime to validate the control transfer as described above (e.g., using the set of possible control points tagged in the binary code or annotated in the source program). If the control transfer is determined to be invalid, a fault or exception may be triggered.

Alternatively, when a control transfer causes a rule cache miss and thereby indicates an unexpected runtime control transfer, the cache miss handler may record a rule for the unexpected transfer for later validation and may allow the unexpected transfer to proceed with additional or different policies in effect. For example, if the control transfer has not been validated, the transfer may be considered untrusted, and the policies may be modified to reflect a higher level of protection because of the untrusted nature of the transfer. For example, the unexpected transfer may transfer control to a library routine. The library routine may be executed using policies that reflect a higher level of protection and less trust than those in effect before the unexpected transfer. In such a case, a first stack protection policy may be in effect at a first point in time before the unexpected control transfer, and a second stack protection policy may be in effect after the unexpected control transfer. The first stack protection policy may enforce static procedure authorization. The first protection policy may not include any coloring at the object level, as described elsewhere herein in connection with the object protection model. After the unexpected control transfer, the second stack protection policy in effect may provide stack protection in accordance with the object protection model described elsewhere herein, with strict object coloring. Thus, once an unexpected control transfer occurs, the executing code may be subject to the more restrictive second stack protection policy, which provides stricter and finer-grained stack protection. In addition, once an unexpected control transfer has occurred, program execution may continue at a lowered priority.

Referring to FIGS. 55 and 56, shown are flowcharts 620, 630 of processing steps that may be performed in an embodiment in accordance with the techniques herein using a validated rule set, such as the rules of the CFI policy for a program described above. Flowchart 620 describes a first set of processing steps that may be performed in connection with an unexpected control transfer that has no rule in the CFI policy of the executing program. Flowchart 631 describes a second set of processing steps that may be performed in connection with an unexpected control transfer that has no rule in the CFI policy of the executing program.

Referring to flowchart 620, at step 622 the program may be executed using a set of validated rules. At step 624, during program execution, a runtime transfer of control is performed. At step 626, a determination is made as to whether there is a rule cache miss, which would indicate that the transfer is unexpected. In particular, if a rule for the runtime transfer of control exists in the second set of validated rules, the control transfer is expected; in that case step 626 evaluates to no and processing continues with step 628, where the control transfer is performed and the program continues executing.

If step 626 evaluates to yes (e.g., a cache miss indicating an unexpected control transfer), processing continues with step 632, where runtime validation processing is performed for the unexpected control transfer. In particular, the miss handler may perform processing that attempts to validate the unexpected transfer. An example of rule validation processing is determining whether the runtime source and target locations are included in the set of potential control transfer points described above, which may be determined using the tagged binary code, the original source program, the symbol table, and the like. At step 634, it is determined from the validation processing of step 632 whether the unexpected control transfer is valid. If step 634 evaluates to yes, processing continues with step 636, where a new rule is added to the second set used as the CFI policy for the program, and the program continues executing. If step 634 evaluates to no, program execution may be terminated, for example by causing a trap.
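The decision sequence of flowchart 620 can be sketched as follows, assuming the validated rule set and the tagged control points are both modeled as sets and that runtime validation is the same membership check described earlier. `PolicyTrap` and `on_transfer` are hypothetical names; the step numbers in the comments refer to the flowchart.

```python
class PolicyTrap(Exception):
    """Stands in for the trap that terminates the program (hypothetical)."""


def on_transfer(source, target, validated_rules, control_points):
    """Handle one runtime control transfer; return what happened."""
    rule = (source, target)
    if rule in validated_rules:            # step 626 evaluates to no
        return "expected"                  # step 628: perform the transfer
    # Step 632: runtime validation of the unexpected transfer.
    if source in control_points and target in control_points:   # step 634
        validated_rules.add(rule)          # step 636: add the new rule
        return "validated"
    raise PolicyTrap(f"invalid control transfer {source} -> {target}")
```

After a transfer is validated once, later occurrences of the same transfer are treated as expected because the rule is now part of the set.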

Referring to flowchart 631, steps 622, 624, 626 and 628 are as described in connection with flowchart 620. If step 626 evaluates to yes, control proceeds to step 639, where the unexpected control transfer may be recorded (e.g., a candidate rule for the unexpected control transfer is recorded). At step 639, the program may continue executing even though the transfer of control is unexpected. However, at step 639, program execution continues using, for example, one or more more restrictive policy sets, a reduced execution priority, and the like, as noted above.
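The record-and-continue alternative of flowchart 631 can be sketched in the same style. The sketch assumes a single boolean flag is enough to represent the switch to a more restrictive policy set; the class and attribute names are illustrative.

```python
class RestrictedExecution:
    """Toy model of flowchart 631 (hypothetical names)."""

    def __init__(self, validated_rules):
        self.validated_rules = set(validated_rules)
        self.pending = []        # candidate rules recorded for later validation
        self.restricted = False  # True once stricter policies are in effect

    def on_transfer(self, source, target):
        rule = (source, target)
        if rule not in self.validated_rules:   # step 626 evaluates to yes
            self.pending.append(rule)          # step 639: record the candidate
            self.restricted = True             # continue under stricter policies


run = RestrictedExecution({("A1", "A2")})
run.on_transfer("A1", "A2")
state_after_expected = (run.restricted, list(run.pending))
run.on_transfer("A2", "A4")
```

Unlike the previous flow, no unexpected transfer terminates the program here; the recorded candidates can be validated offline.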

The processing described above may similarly be performed in connection with other policies, such as taint tracking. For example, for taint tracking, a first learning or training phase may be executed in which the cache miss handler logs each cache miss, thereby learning the rules of the policy through program execution. As described herein, taint tracking may include tagging data based on the code that creates or accesses it (e.g., using the CI as described elsewhere herein). One reason to taint data based on code or source is to ensure that the program is properly contained and does not perform unwanted or improper data accesses. For example, rules may be used to ensure that data tainted by a JPEG decoder never flows into a password database, or to ensure that credit card data, social security numbers, or other personal information is accessed only by a specified set of one or more restricted applications. When determining a taint tracking policy, processing for the learning or training phase may be performed by running the program on test data with no taint tracking rules, so that a cache miss occurs and the miss handler records a rule the first time it sees a particular data flow (e.g., which routines of the program access which data, which user inputs are written to which database, and the like). In a manner similar to that described above for the CFI policy, at the end of the test execution of the first learning phase there is a set of learned rules that may be applied to protect the program during operation. Validation processing of the learned rule set may be performed using a tool as noted above for the CFI learned rule set, or by other suitable means. Such validation processing for taint tracking may include ensuring that each data flow or access is appropriate.
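The JPEG decoder example can be sketched under the assumption that a taint tag is a set of source labels, that a computed value carries the union of its operands' taints, and that each sink lists the taints it must never receive. The labels and function names are hypothetical.

```python
def propagate(*operand_taints):
    """Taint of a computed value: the union of the taints of its operands."""
    return frozenset().union(*operand_taints)


def store_allowed(value_taint, sink, forbidden):
    """Return True if a value with value_taint may be written to sink.

    forbidden maps a sink name to the set of taints it must never receive.
    """
    return not (value_taint & forbidden.get(sink, frozenset()))


forbidden = {"password_db": frozenset({"jpeg_decoder"})}
pixel = frozenset({"jpeg_decoder"})
user_input = frozenset({"keyboard"})
mixed = propagate(pixel, user_input)
```

Because taint is propagated by union, a value derived even partly from decoder output is refused at the password database, while the same value may still be written to sinks that have no restriction.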

Also, in a manner similar to that described in connection with flowcharts 620 and 631, the validated rule set may be used with a PUMP system in which the cache miss handler handles any data access that has no corresponding rule in the validated set. Similar to the processing of flowchart 620, the cache miss handler may then perform runtime validation processing (e.g., similar to step 632) to determine whether a candidate rule for the data access or data flow is valid, and either allow program execution to continue (similar to steps 634 and 636) or not (e.g., similar to step 638). Alternatively, similar to the processing of flowchart 631, the cache miss handler may record a candidate rule for the unexpected data access or data flow, to be validated offline (i.e., not during runtime), and continue program execution, for example using a more restrictive policy, lowered priority, or the like (e.g., similar to step 639).

The examples above generally describe a binary learning process. Embodiments in accordance with the techniques herein may further support the use of statistics when deciding whether to allow an event (e.g., a control transfer or data access). In at least one embodiment, a counter may be added to each rule to count the number of times the rule is used during program execution. When a rule is evicted from the PUMP cache, processing may add the accumulated rule usage to a global software count, which may be used to provide additional statistics about rule usage. Counts may also be used to allow something to occur only a limited number of times. For example, for a taint tracking rule that tracks the flow of data from a source to a target, a limited threshold amount of data (e.g., X amount of data read from a particular database by a particular program) may be allowed for an unexpected data flow between the source and the target. Once that threshold amount has been transferred, no further data may be transferred between the source and the target until the corresponding candidate rule has been successfully validated. With such a limited-use threshold, the PUMP system (e.g., the miss handler) allows an instruction that has no rule to occur a small, limited number of times. The counting applied against the threshold may be performed in different ways. For example, consider unexpected control transfers. If not aggregated, the cache miss handler may disallow the same unexpected control transfer, for which no rule has been validated, from occurring more than 5 times. If aggregated, for example across all unexpected control transfers of a program, the program may be allowed at most 100 unexpected control transfers. This may be useful, for example, in cases where a single instance of an unexpected control transfer or unexpected data access is tolerable. For example, a single query that examines data from a particular source may be allowed. However, if more than a threshold number of queries are performed against a data source (e.g., a particular database), the program should be flagged or stopped.
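The per-rule and aggregated thresholds described above can be sketched as follows. The class name `LimitedUseHandler` and its parameters are assumptions made for illustration; the limits of 5 and 100 are taken from the example in the text.

```python
from collections import Counter


class LimitedUseHandler:
    """Hypothetical miss handler that tolerates a bounded number of unvalidated events."""

    def __init__(self, per_rule_limit=5, total_limit=100, aggregate=False):
        self.per_rule_limit = per_rule_limit   # limit for one repeated candidate rule
        self.total_limit = total_limit         # limit across all candidate rules
        self.aggregate = aggregate             # which of the two limits applies
        self.validated = set()                 # rules that passed validation
        self.uses = Counter()                  # uses of each unvalidated candidate

    def allow(self, candidate) -> bool:
        if candidate in self.validated:
            return True                        # validated rules are never limited
        if self.aggregate:
            used, limit = sum(self.uses.values()), self.total_limit
        else:
            used, limit = self.uses[candidate], self.per_rule_limit
        if used >= limit:
            return False                       # threshold reached: block until validated
        self.uses[candidate] += 1
        return True
```

With the default settings, the sixth occurrence of the same unexpected transfer is refused until the candidate rule is added to `validated`.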

A more general statistical approach may be used to learn the range of normal behavior. For example, the program may be run in a learning phase to determine the relative usage of the different rules of a policy (e.g., the fraction of uses attributable to each rule). For example, the relative usage of each rule invoked for runtime control transfers may be recorded. Ideally, such runs are performed for the program using many different data sets, to learn what may be considered average or normal program behavior. Rule learning and validation may then produce a set of validated rules for control transfers (as described above) and, additionally, a ratio indicating the relative usage of each validated rule. Both the validated rules and the associated usage ratios may be used as the enforced policy during subsequent processing. When the policy is enforced during subsequent program execution, the PUMP system may check whether current rule usage deviates from the expected ratios. An embodiment may include, for example, a range or maximum expected usage for a rule, whereby control transfers that invoke the rule more than the maximum may be flagged. For example, a program that invokes a particular control transfer rule more than the maximum may be flagged for further inspection or analysis. Using such a mechanism, program runtime behavior may be monitored in a manner similar to the way network behavior is monitored to generate firewall rules. Statistical learning algorithms may be used to learn normal versus attack behavior by capturing rule usage and possibly other standard runtime characteristics, such as main memory traffic and cache miss rate. In an embodiment that applies limited-use thresholds as described above, if a program exhibits otherwise abnormal runtime behavior, or may otherwise be considered untrusted, the usage limit may be greatly reduced or set to zero. Alternatively, if the program exhibits normal runtime behavior, or may otherwise be considered trusted, the usage limit may be set or raised much higher than in the untrusted scenario.
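One simple way to picture the usage-ratio check is shown below. It is a sketch only: the function names, the rule labels, and the choice of a per-rule maximum fraction are illustrative assumptions, and an actual embodiment might use a statistical learning algorithm instead.

```python
def usage_fractions(counts):
    """Convert per-rule use counts into fractions of total rule uses."""
    total = sum(counts.values())
    return {rule: n / total for rule, n in counts.items()} if total else {}


def flag_anomalies(expected_max, observed_counts):
    """Return the rules whose observed usage fraction exceeds the learned maximum.

    A rule with no learned maximum is treated as having a maximum of zero,
    so any use of it is flagged.
    """
    observed = usage_fractions(observed_counts)
    return sorted(
        rule for rule, frac in observed.items()
        if frac > expected_max.get(rule, 0.0)
    )


# Maximum fractions learned during training runs (hypothetical values).
EXPECTED_MAX = {"call->f": 0.6, "call->g": 0.5}
```

A flagged rule would then feed whatever response the embodiment chooses, such as further inspection or a reduced usage limit.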

The above techniques may be used to determine a policy's set of valid rules, such as the valid control transfers reflected in the rules of a CFI policy, without requiring the compiler to output any additional information. Accordingly, an embodiment in accordance with the techniques herein may have two versions of each policy: one used for the learning phase and another used for subsequent enforcement. The learning phase may be used as an automatic diagnostic mode to discover allowable data accesses or flows for taint tracking, control transfers for the CFI policy, and so on.

An example of an architecture that may be used in an embodiment in accordance with the techniques herein using a RISC-V processor will now be described. Also described below are techniques that may be used in connection with performing processor-based mediated data transfer between untagged data sources and the tagged data sources used by the processor. Such techniques provide for tagging external, untrusted data that may be brought into the system for use by the processor, and for removing tags from tagged data used within the system to produce untagged data for use outside the system.

Referring to FIG. 57, shown is an example of components that may be used in an embodiment in accordance with the techniques herein for mediating between tagged and untagged data. The example 700 includes a RISC-V CPU 702, a PUMP 704, an L1 data cache 706, an L1 instruction cache 708, an interconnect fabric 710 used internally within the system for tagged data transfers, a boot ROM 712a, a DRAM controller (ctrl) 712b, and an external DRAM 712c that stores tagged data. Also included are the hardware components add tag 714a and validate drop tag 714b, and an interconnect fabric 715 used to transfer external untagged data from untagged memory 716 for use by the processor 702 and to transfer untagged data to the untagged memory 716. It should be noted that other sources 701 of external untagged data, besides the untagged memory 716, may be connected to the untagged fabric 715. For example, element 701 may include untagged data stored in flash memory, untagged data accessible over a network, and the like. The DRAM controller (ctrl) 712b is the controller used to read data from and write data to the DRAM 712c. The boot ROM 712a may include boot code used when booting the system.

The example 700 shows a separate tagged fabric 710 and untagged fabric 715, with the processor 702 used to move data between the two. Add tag 714a takes untagged data as input and tags it with a tag indicating that the data is public (it may be used outside the system described herein) and untrusted (because its source may be unknown or otherwise not a known trusted source). In at least one embodiment, untagged data of the untagged memory 716 may be received by 714a. The untagged data received from 716 may be encrypted, so that add tag 714a adds the untrusted tag to the received encrypted data. The received data may be encrypted using asymmetric encryption, such as public key encryption using a public-private key pair, or another suitable encryption technique known in the art. The received data may be stored in its encrypted form. As known in the art, in an owner's public-private key pair, the private key is known only to the owner, whereas the public key is published and used by others. A third party may use the owner's public key to encrypt information sent to the owner. The owner may then decrypt the received encrypted information using the owner's private key (which is not shared with others). In a similar manner, the owner may encrypt information using the owner's private key, and the encrypted information is sent to a third party, who decrypts it using the owner's public key.

Validate drop tag 714b may receive tagged encrypted data and remove the tag, producing untagged encrypted data that is exported to the untagged memory 716. Such untagged encrypted data stored in the memory 716 may be used, for example, by other systems and processors that do not use tags and do not use the associated metadata rule processing performed using the PUMP as described herein.

In at least one embodiment, the untagged data received at 714a may be encrypted as noted above and may also be signed to provide data integrity. Additionally, the signature may be used to validate a received data item, providing authentication and data integrity (e.g., ensuring that the data has not been modified since it was sent by the original signing sender, and that the data was sent by the sender who signed it). For example, an owner may hash a message to produce a hash value or "digest", and then encrypt the digest with the owner's private key to produce a digital signature. The owner may send the message and the signature to a third party. The third party may use the signature to validate the received data. First, the third party may decrypt the message using the owner's public key. The signature may then be verified by computing the hash or digest of the decrypted message, decrypting the signature with the owner's/signer's public key to obtain the expected digest or hash, and comparing the computed digest with the decrypted expected digest. A matching digest confirms that the message signed by the owner has not been modified.
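The hash-then-sign flow can be made concrete with a deliberately tiny example. This is a toy sketch, not a secure implementation: it uses textbook RSA with the small primes 61 and 53 and reduces the SHA-256 digest modulo the key modulus, both of which are assumptions chosen only to keep the arithmetic visible. The message encryption step is omitted.

```python
import hashlib

# Toy textbook-RSA parameters (p = 61, q = 53); far too small for real use.
N, E, D = 3233, 17, 2753   # modulus, public exponent, private exponent


def digest(message: bytes) -> int:
    """Hash the message and reduce it so it fits under the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Owner: transform the digest with the private key to form the signature."""
    return pow(digest(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Recipient: recover the expected digest with the public key and compare."""
    return pow(signature, E, N) == digest(message)


signature = sign(b"data item")
```

A production system would instead rely on a vetted cryptographic library with a standard padding scheme and a key of adequate length.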

In operation, an instruction such as a load instruction may reference data stored in the untagged memory 716, which is then transferred to the data cache 706 for use in executing the instruction. For such a load instruction, the data may be transferred from 716 over 715 for processing by 714a, which outputs tagged data (tagged as untrusted and public). The tagged data output by 714a is stored in the L1 data cache 706 for processing. In a similar manner, a store instruction may store data from the data cache 706 to a location in the untagged memory 716. For such a store instruction, the data may be transferred from 706 over 710 to validate drop tag 714b, which outputs untagged data.

Code may be executed on the processor 702 to bring untagged data from 716 into the system and store it, for example, in the DRAM 712c. The logic of the code that imports untagged data may be expressed as follows:

1. The tagged data output by add tag 714a may be stored in an untrusted buffer (tagged as public, untrusted).

2. Decrypt the tagged data stored in the untrusted buffer and store the result in a decode buffer. The decode buffer thus contains decrypted data tagged as public, untrusted.

3. Perform validation processing to ensure that the decode buffer contains valid, uncorrupted data. Such validation processing may use digital signatures, as described elsewhere herein and known in the art.

4. If the decode buffer contains validated data, a second portion of code, which is trusted, may be executed to convert the decode buffer data tagged as public, untrusted into data tagged as trusted. The trusted code portion may include one or more instructions that, when executed, invoke rules that retag the decode buffer data as trusted, public. The retagged data, now tagged as trusted, public, may be stored in a trusted buffer located in the external DRAM 712c.

The trusted code may include memory instructions tagged with special instruction tags that confer the authority to invoke, when executed, rules that retag the referenced memory locations. For example, the trusted code may include a specially tagged store instruction that stores data tagged as public, untrusted (in the untrusted buffer) to a destination memory location (the trusted buffer) with a new tag of public, trusted. Such trusted store instructions may be specially tagged, for example, by the loader.

The logic of the trusted code that retags data from public, untrusted to public, trusted may be expressed as follows:

*

*

where N is the length of the untrusted buffer and temp is a temporary buffer used in performing the retagging. The first instruction, temp = *untrusted buffer [i], may result in a load instruction that loads an element of the untrusted buffer (initially the first element) from the untagged memory 716 into the temporary buffer. The second instruction, trusted buffer [i] = temp, may be a store instruction that stores the data in the temporary buffer, tagged as public, untrusted, to trusted buffer [i] with a new tag of public, trusted. The second instruction is thus a specially tagged instruction, as noted above, having the authority to retag the data from untrusted to trusted.
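The retagging loop described above can be modeled in software as follows. This is a minimal sketch, assuming a loop over indices 0 to N-1 and a hypothetical store rule; the names `Word`, `RETAG_AUTHORITY`, and `store_rule` are illustrative and do not appear in the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    value: int
    tag: str   # for example "public,untrusted" or "public,trusted"


# Hypothetical instruction tag that the loader places on the trusted store.
RETAG_AUTHORITY = "retag-authority"


def store_rule(instr_tag: str, data_tag: str) -> str:
    """Hypothetical store rule: compute the tag written to the destination."""
    if instr_tag == RETAG_AUTHORITY and data_tag == "public,untrusted":
        return "public,trusted"
    return data_tag   # ordinary stores leave the data tag unchanged


def retag(untrusted_buffer, instr_tag=RETAG_AUTHORITY):
    trusted_buffer = []
    for i in range(len(untrusted_buffer)):   # i = 0 .. N-1
        temp = untrusted_buffer[i]           # load:  temp = untrusted_buffer[i]
        new_tag = store_rule(instr_tag, temp.tag)
        trusted_buffer.append(Word(temp.value, new_tag))   # store: trusted_buffer[i] = temp
    return trusted_buffer
```

Running the same loop with an ordinary instruction tag leaves the data tagged as untrusted, which is the point of restricting the authority to specially tagged stores.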

In a similar manner, when tagged data of 712c is exported or stored to the untagged memory 716 (or any untagged memory source 701), code executed by the processor 702 encrypts the data item and generates a signature. The encrypted data item and signature may be sent to 714b, where the tag is removed, and then transferred over 715 for storage in 716.

As a variation of the example 700, the memories 716 and 712c may be unified, and the interconnect fabrics 710 and 715 may also be unified. In such an embodiment, the address range that the untagged memory sources 701 are allowed to access may be restricted. For example, reference is made to the example 720 of FIG. 58. The example 720 includes components similar to the like-numbered components of 700, with the differences that components 714a-714b, 715, and 716 are removed and that the memory 712c includes a portion U 722 denoting the region of the memory 712c used to store untrusted, public data. Untagged memory sources 701, such as untrusted DMA and I/O subsystems, may be restricted to using the lower 16 or 256 MB of memory 722. In one embodiment, data stored in U 722 may not be explicitly tagged; rather, all data stored in U at an address within this restricted range may be implicitly tagged and treated as public and untrusted. As a variation, an embodiment may pre-tag the portion U 722 with permanent tags denoting untrusted, public data, where these permanent metadata tags cannot be modified. Rules may prevent the processor from storing other data to the region U 722. For example, untrusted DMA operations performed by a DMA included in 701 may be restricted to writing to the region U 722.
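The address-range restriction and implicit tagging can be sketched as a pair of checks. The base address, the 16 MB size, the tag strings, and the function names below are assumptions made for illustration.

```python
# Assumed layout: region U occupies the lowest 16 MB of the unified memory.
U_BASE = 0x0
U_SIZE = 16 * 1024 * 1024


def in_region_u(addr: int) -> bool:
    return U_BASE <= addr < U_BASE + U_SIZE


def implicit_tag(addr: int, stored_tags: dict) -> str:
    """Data in region U is always treated as public, untrusted."""
    if in_region_u(addr):
        return "public,untrusted"
    return stored_tags.get(addr, "default")


def dma_write_allowed(addr: int, length: int) -> bool:
    """An untrusted DMA write must fall entirely inside region U."""
    return length > 0 and in_region_u(addr) and in_region_u(addr + length - 1)
```

Checking both the first and the last byte of the transfer prevents a write that starts inside U from spilling into tagged memory.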

An embodiment that needs to run unported I/O processing code may run it on a dedicated I/O processor on the untrusted side of the components. For example, reference is made to the example 730 of FIG. 59. The example 730 includes components similar to the like-numbered components of 700, with the difference that components 732, 732a, and 732b are added. Element 732 is an additional RISC-V processor that runs without a PUMP and without metadata rule processing. Element 732a denotes the data cache for the second processor 732, and element 732b denotes the instruction cache for the second processor 732. The data cache 732a may be connected to the untagged interconnect fabric 715.

As described in more detail elsewhere herein, a separate I/O PUMP may be used as yet another alternative for mediating between untagged data sources (e.g., 701, 716) and the tagged memory 712c used by the processor 702.

Referring to FIG. 60, shown is another embodiment of components that may be included in a system used in connection with the techniques herein to mediate between untagged data sources (e.g., 701, 716) and the tagged memory 712c used by the processor 702. The example 740 includes components similar to the example 700, with the difference that components 714a-714b are removed and replaced with intern 742 and extern 744. In this embodiment, intern 742 and extern 744 may be hardware components that perform the processing described above. In particular, intern 742 may include hardware that processes received untagged data and outputs validated data items tagged as trusted, public. The data items tagged as trusted, public may be transferred over the fabric 710 for storage in the data cache 706 used by the processor 702 in connection with executing instructions. Intern 742 may include hardware that performs validation processing on untagged encrypted data and, assuming successful validation, further tags the received untagged data as trusted, public. Extern 744 may include hardware that processes tagged unencrypted data and outputs signed, encrypted data items. If the signed, encrypted data item is to be used by another processor that does not perform metadata rule processing as described herein, extern may remove the tag before encrypting.

In the simplest case, the hardware of intern 742 and extern 744 may host a single public-private key set, with signing and encryption performed using that single key set. The key set may be encoded in the hardware used by 742 and 744. In another variation, the hardware of intern 742 and extern 744 may host multiple public-private key sets, with signing and encryption performed using one of the multiple key sets (each set including a different public-private key pair). The multiple key sets may be encoded in the hardware used by 742 and 744. Clear (unencrypted) data included with the incoming untagged data tells the intern unit 742 which key set to use. Intern 742 may thus perform a lookup in a hardware data store containing the multiple key sets (e.g., an associative memory) to select the desired key set. Each of the multiple key sets may be associated with a different tag, so that the particular key set indicated by the clear data also dictates the particular tag that the tagged data will carry. In this manner, the tag of a tagged data item output by 742 indicates that the data item is public and also that the data item was encrypted/decrypted using a particular one of the multiple key sets. In an embodiment with multiple key sets, extern 744 may examine the tag to determine which particular one of the multiple key sets is used in connection with encrypting and signing the data item. The intern unit 742 thus provides an isolated hardware component that validates and tags received untagged data, so that no portion of code, such as the trusted code portion mentioned above, needs to have the ability to tag data.
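The association between key sets and tags can be pictured as two lookups, one in each direction. The table contents, tag strings, and function names here are hypothetical placeholders; in the described embodiment the table would live in a hardware data store such as an associative memory.

```python
# Hypothetical key-set table: key-set id -> (key material placeholder, tag).
KEY_SETS = {
    0: ("keypair-0", "public,keyset-0"),
    1: ("keypair-1", "public,keyset-1"),
}

# Reverse mapping used on the way out: tag -> key-set id.
TAG_TO_KEY_SET = {tag: ks_id for ks_id, (_, tag) in KEY_SETS.items()}


def intern_select(clear_key_set_id: int):
    """Intern: the clear data names the key set, which also fixes the tag."""
    key, tag = KEY_SETS[clear_key_set_id]
    return key, tag


def extern_select(tag: str):
    """Extern: the tag identifies which key set to encrypt and sign with."""
    return KEY_SETS[TAG_TO_KEY_SET[tag]][0]
```

Because the tag alone determines the key set, data imported under one key set is exported under the same one without any software involvement.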

Referring back to FIGS. 1 and 24, the inputs to the PUMP 10 at stage 5 include tags as described elsewhere herein. When an instruction includes a memory location as an operand, obtaining the memory input and its associated tag, the MR tag (also sometimes referred to herein as Mtag), introduces extra pipeline stalls, since the PUMP 10 at stage 5 cannot proceed until it has all of its inputs, including the MR tag. Rather than wait to retrieve the actual MR tag value read from memory, processing may be performed in accordance with the techniques herein to determine an expected or predicted MR tag, which may be used to determine the R tag, i.e., the tag value for the result of the instruction (e.g., the destination register or memory location, if any). In such an embodiment, a final check may be performed at stage 6, the writeback or commit stage (e.g., see element 22 of FIG. 1 and the last stage 6 of FIG. 24), to determine whether the predicted MR tag matches the actual MR tag retrieved from memory for the instruction's operand. The foregoing selection and use of a predicted MR tag to determine the Rtag for an instruction having a memory location as an operand may be referred to as the Rtag prediction accelerator optimization.

Referring to FIG. 61, shown is an example 800 illustrating components of an embodiment in accordance with the techniques herein for the Rtag prediction accelerator optimization. The example 800 includes a PUMP 802, which corresponds to the PUMP 10 at stage 5 as described elsewhere herein (e.g., FIGS. 1 and 24), with additional features for performing the Rtag prediction accelerator optimization. The PUMP 802 takes as inputs the MR tag 804a as well as the other PUMP inputs 804 described elsewhere herein. The PUMP 802 also has another input, the prediction mode selector 804b, which indicates whether the PUMP 802 runs in normal processing mode (a non-prediction mode in which MR tag prediction is not performed) or in prediction mode (in which MR tag prediction is performed). In at least one embodiment, the prediction mode selector 804b may be 0, denoting the normal processing mode in which no predicted MR tag value is determined, or 1, denoting the prediction mode in which a predicted MR tag value is determined. When the prediction mode selector is 1, the PUMP 802 may run in prediction mode, in which the MR tag 804a input may be masked or ignored, and the PUMP 802 generates the predicted MR tag 805c as an output. When the prediction mode selector is 0, the PUMP 802 may run in normal processing mode as described elsewhere herein, in which the MR tag 804a is an input to the PUMP 802 and the output 805c is not generated.

As illustrated in the example 800, the additional outputs of the PUMP 802 at stage 5 include the R tag 805a and the PC new tag 805b. When a predicted MR tag is used, a rule for the predicted MR tag may be determined, and that rule specifies the associated tag for the R tag. When operating in prediction mode, the predicted MR tag 805c is an additional input to stage 6 808 of the pipeline. Element 808 may represent the commit or writeback stage as described elsewhere herein (e.g., FIGS. 1 and 24). Element 808a may generally represent the stage 6 inputs other than 805a through 805c, as described elsewhere herein.

At stage 6 808, additional processing 808b may be performed when the PUMP 802 operates in prediction mode. Element 808b denotes a check, performed at stage 6 808, that compares the predicted MR tag with the actual MR tag obtained from memory for the instruction's operand. In other words, 808b evaluates whether the PUMP 802 correctly predicted the MR tag value by determining whether the predicted MR tag matches the MR tag obtained from memory. If the predicted MR tag does not match the MR tag obtained from memory, then the wrong rule was triggered and used by the PUMP 802, with the mispredicted MR tag, to determine the R tag 805a. The correct rule must now be selected (in accordance with the actual MR tag) and used to determine a corrected R tag. Thus, if the predicted MR tag does not match the actual MR tag, a rule cache miss is determined and cache miss processing is performed. Consistent with discussion elsewhere herein, the cache miss processing may include using the actual MR tag to select and evaluate the correct rule.
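The stage 5 prediction and the stage 6 check 808b described above can be sketched in Python. This is only an illustration under stated assumptions, not the hardware of FIG. 61: the rule table `RULES`, the tag names, and the functions `stage5_predict` and `stage6_check` are hypothetical, and a misprediction is modeled simply as a second lookup using the actual MR tag.

```python
# Hypothetical rule table: five input tags -> (R tag, PC new tag).
# Key order: (PC tag, CI tag, OP1 tag, OP2 tag, MR tag).
RULES = {
    ("pc", "load", "ptr", "none", "blue"): ("r_blue", "pc"),
    ("pc", "load", "ptr", "none", "green"): ("r_green", "pc"),
}


def stage5_predict(other_tags, predicted_mr):
    """Stage 5: evaluate the rule selected with the predicted MR tag."""
    r_tag, pc_new = RULES[other_tags + (predicted_mr,)]
    return r_tag, pc_new, predicted_mr


def stage6_check(other_tags, stage5_out, actual_mr):
    """Stage 6 check: keep the stage 5 result only if the prediction held."""
    r_tag, pc_new, predicted_mr = stage5_out
    if predicted_mr == actual_mr:
        return r_tag, pc_new, "hit"
    # Misprediction: treated like a rule cache miss, so the rule is
    # selected again using the actual MR tag.
    r_tag, pc_new = RULES[other_tags + (actual_mr,)]
    return r_tag, pc_new, "miss"
```

In this sketch a correct prediction lets stage 6 reuse the stage 5 result unchanged, while a wrong one costs a second lookup, which mirrors the trade-off the text describes.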

Load (read) and store (write) instructions are examples of instructions that may include a memory location as an operand and that may therefore benefit from the use of a predicted MR tag in an embodiment. The other inputs 804 to the PUMP comprise the set of remaining input tags other than the MR tag 804a. For example, one embodiment, as illustrated in connection with FIG. 23, may have 5 input tags (PC tag, CI tag, OP1 tag, OP2 tag, and MR tag) and 2 output tags (PC new tag and R tag). The set of remaining input tags (excluding the MR tag) thus comprises 4 tags: PC tag, CI tag, OP1 tag, and OP2 tag. Determining the predicted MR tag for an instruction may include determining the set of one or more rules whose tag values match the instruction's 4 tags (e.g., for PC tag, CI tag, OP1 tag, and OP2 tag). In some cases, only a single rule may include matching tag values for the 4 input tags. In that case, the single matching rule also specifies a value for the MR tag, which may be used as the predicted MR tag 805c. The rule may further be evaluated using the 4 input tags and the predicted MR tag to determine the R tag 805a.

For example, consider a memory safety policy with typical load and store operations. A load operation may use a pointer to load data from a source memory location, and a first rule indicates that the tag or color on the source memory location must match the tag or color of the pointer. A store operation may use a pointer to store data to a target memory location, and a second rule indicates that the tag or color on the target memory location must match the tag or color of the pointer. For a load instruction, the first rule may be the only rule having tag values matching the 4 input tags (PC tag, CI tag, OP1 tag, and OP2 tag) of the load instruction. The MR tag of the first rule may be used as the predicted MR tag 805c, and the R tag of the first rule may be determined using the set of 4 input tags and the predicted MR tag. Similarly, for a store instruction, the second rule may be the only rule having tag values matching the 4 input tags (PC tag, CI tag, OP1 tag, and OP2 tag) of the store instruction. The MR tag of the second rule may be used as the predicted MR tag 805c, and the R tag of the second rule may be determined using the set of 4 input tags and the predicted MR tag.

In other cases, the rule set of the policy may include multiple rules whose tags match the instruction's input tags for PC tag, CI tag, OP1 tag, and OP2 tag, with each matching rule identifying a different acceptable, or candidate, MR tag that may be used as the predicted MR tag 805c. An embodiment may use any suitable technique to select one of the multiple acceptable MR tags for use as the predicted MR tag. For example, an embodiment may select the MR tag in the set of acceptable MR tags that is most common or most likely to occur. The most likely MR tag may be determined from prior observation or rule profiling. As an alternative, an embodiment may set the predicted MR tag to the previous, or most recent, MR tag. In the worst case, if the predicted MR tag does not match the actual MR tag once received, cache miss processing may be performed as described herein to determine the correct rule using the actual MR tag along with the instruction's other input tags.
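The two selection strategies mentioned above (most common candidate versus most recent candidate) can be sketched as follows. This is an illustrative assumption about how such a choice might be made in software; the function names, the `observed` history list, and the fallback to the first candidate are hypothetical and are not specified by the text.

```python
from collections import Counter


def predict_most_common(candidates, observed):
    """Pick the candidate MR tag seen most often in prior observations."""
    counts = Counter(tag for tag in observed if tag in candidates)
    if not counts:
        return candidates[0]  # assumption: fall back to the first candidate
    return counts.most_common(1)[0][0]


def predict_most_recent(candidates, observed):
    """Pick the most recently observed MR tag that is an allowed candidate."""
    for tag in reversed(observed):
        if tag in candidates:
            return tag
    return candidates[0]  # assumption: fall back to the first candidate
```

With candidates `["blue", "green"]` and a history of `["blue", "blue", "green"]`, the first strategy predicts `"blue"` and the second predicts `"green"`, so the two policies can disagree on the same history.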

In at least one embodiment, a class of rules may be created for memory operations for use when the PUMP operates in prediction mode. This class may be referred to as the class of "predict memory tag" rules. For a "predict memory tag" rule, the MR tag 804a is not used as an input to the PUMP 802 and, accordingly, is not used in connection with the various lookups performed by the PUMP. For example, the care/don't care bit vector for a "predict memory tag" rule may treat the MR tag as a don't care. Further, a "predict memory tag" rule may omit the MR tag as an input and instead specify the predicted MR tag as an output. As noted above, if there are multiple matching normal rules that match a particular set of input tags for PC tag, CI tag, OP1 tag, and OP2 tag, a single "predict memory tag" rule corresponding to that set of matching rules may specify, as its predicted MR tag output, the most common or expected MR tag. In one embodiment, the single "predict memory tag" rule corresponding to the set of matching rules may instead specify, as the predicted MR tag, the last or previous MR tag received by the PUMP 802.
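A "predict memory tag" lookup that treats the MR tag as a don't care can be sketched as below. The field names, the dictionary-based care vector `PREDICT_CARE`, the table `PREDICT_RULES`, and the tag values are hypothetical and chosen only for illustration; a real PUMP would apply the mask in hardware.

```python
FIELDS = ("pc", "ci", "op1", "op2", "mr")

# Hypothetical care vector for "predict memory tag" rules: MR is a don't care.
PREDICT_CARE = {"pc": 1, "ci": 1, "op1": 1, "op2": 1, "mr": 0}

# Masked inputs -> (predicted MR tag, R tag, PC new tag).
PREDICT_RULES = {
    ("pc", "load", "ptr", "none", None): ("blue", "r_blue", "pc"),
}


def mask_inputs(tags, care):
    """Replace every don't-care field with None before the rule lookup."""
    return tuple(tags[f] if care[f] else None for f in FIELDS)


def lookup_predict_rule(tags):
    """Look up a rule using only the fields marked as care."""
    return PREDICT_RULES[mask_inputs(tags, PREDICT_CARE)]
```

Because the MR field is masked out, the lookup returns the same rule, and hence the same predicted MR tag, whatever MR tag value happens to be present on the input.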

Policy logic may determine whether to insert or use "predict memory tag" rules. An embodiment may maintain 2 versions of each policy: a first version, containing the policy's "predict memory tag" rules, for use when operating in prediction mode, and a second version, containing the normal or non-prediction policy rules, for use when operating in normal processing (non-prediction) mode. If, when "predict memory tag" rules are used, the check performed at 808b in stage 6 fails for a given instruction, cache miss processing may determine the matching rule for the instruction using the normal rule set (e.g., the second version of the rules noted above).

In an embodiment using a RISC-V processor and architecture, the prediction mode selector 804b may have a corresponding PUMP CSR. The use of CSRs in an embodiment using the RISC-V architecture is described in more detail elsewhere herein.

Referring to FIG. 62, shown is a flowchart of processing steps that may be performed in an embodiment in accordance with techniques herein. The flowchart 840 summarizes processing as described above in connection with the example 800. As noted above, the PUMP 802 illustrated in the example 800 represents the PUMP at stage 5, which provides inputs to stage 6 of the processor pipeline. In at least one embodiment, steps 842, 844, 846, 848, and 852 of the flowchart 840 may represent processing performed at stage 5, embodied in the PUMP and the particular policy rules used as described above, and steps 854, 856, and 858 may be performed at stage 6 as described above.

At step 842, a determination is made as to whether prediction mode is on (enabled), indicating that the PUMP operates in prediction mode using "predict memory tag" rules. If step 842 evaluates to no, control proceeds to step 846, where the PUMP operates in normal (non-prediction) mode using the normal rules. If step 842 evaluates to yes, control proceeds to step 844, where a determination is made as to whether the current instruction is a memory input operation instruction. If step 844 evaluates to no, control proceeds to step 846. If step 844 evaluates to yes, control proceeds to step 848, where the PUMP operates in prediction mode using "predict memory tag" rules. At step 848, the matching "predict memory tag" rule for the instruction may be determined. At step 852, the R tag for the current instruction may be determined using the matching "predict memory tag" rule from step 848. At step 854, a determination is made as to whether the predicted MR tag matches the actual MR tag. If step 854 evaluates to no, control proceeds to step 856 to perform rule cache miss processing by invoking the rule miss handler. If step 854 evaluates to yes, control proceeds to step 858, where the R tag determined using the rule containing the predicted MR tag is used as the R tag PUMP output.
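The control flow of flowchart 840 can be summarized as a short Python function. This is a sketch under assumptions: `flow_840` and its callback parameters (`normal_rule`, `predict_rule`, `miss_handler`) are hypothetical names, and the returned path label exists only to make the taken branch visible.

```python
def flow_840(prediction_on, is_memory_op, tags, actual_mr,
             normal_rule, predict_rule, miss_handler):
    """Return (R tag, path) following the steps of flowchart 840."""
    # Steps 842 and 844: prediction applies only when the mode is on and
    # the current instruction has a memory input.
    if not prediction_on or not is_memory_op:
        return normal_rule(tags, actual_mr), "normal"      # step 846
    predicted_mr, r_tag = predict_rule(tags)               # steps 848, 852
    if predicted_mr != actual_mr:                          # step 854
        return miss_handler(tags, actual_mr), "miss"       # step 856
    return r_tag, "predicted"                              # step 858
```

Passing the rule lookups in as callables keeps the sketch independent of how the normal and "predict memory tag" rule sets are actually stored.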

As a variation of the example 800, reference is made to FIG. 63, which illustrates components of an embodiment including the PUMP 802 running in normal (non-prediction) mode and a second PUMP 822 running in prediction mode. In this example, the PUMP 822 operating in prediction mode may also be referred to as the MR tag prediction PUMP, for which the prediction mode selector 822b is always ON (e.g., 1). Similarly, for the PUMP 802, the prediction mode selector 804b may always be OFF (e.g., 0). The MR tag prediction PUMP 822 may use only the "predict memory tag" rules, and the PUMP 802 may use only the normal, or non-prediction, version of the policy rules. In such an embodiment, the PUMPs 802 and 822 may operate in parallel at stage 5. Element 828 may represent the stage 5 and 6 processing and components associated with the MR tag prediction PUMP 822. Element 829 may represent the stage 5 and 6 processing and components associated with the PUMP 802 operating in normal mode. In 829, the outputs of the PUMP 802 are as described in connection with the example 800, with the difference that the predicted MR tag 805c is no longer output by the PUMP 802. Additionally, stage 6 808 does not perform the check 808b. Element 828 may include components that perform processing in a manner similar to the example 800, with the difference that the MR tag prediction PUMP 822 uses only the "predict memory tag" rules as noted above.

Stage 6 808 is modified to take as inputs the PUMP outputs Rtag 805a and PCnew tag 805b from the MR tag prediction PUMP 822 and the outputs Rtag 805d and PCnew tag 805e from the PUMP 802. Additionally, at stage 6, a selection is made between Rtags 805a and 805d, and also between PCnew tags 805b and 805e, based on whether the predicted MR tag matches the actual MR tag (as denoted by 808a). If the predicted MRtag matches the actual MRtag (e.g., 808a evaluates to 1 or true), the tags from the prediction PUMP 822 (e.g., Rtag 805a and PCnew tag 805b) are used, and the tags from the non-prediction PUMP 802 (e.g., Rtag 805d and PCnew tag 805e) are discarded. If the predicted MRtag does not match the actual MRtag (e.g., 808a evaluates to 0 or false), the tags 805a through 805c from the prediction PUMP 822 are discarded, and the tags 805d and 805e from the non-prediction PUMP 802 are used. The non-prediction PUMP 802 provides its outputs 805d and 805e later than the prediction PUMP 822 provides its outputs 805a through 805c, so a stall is introduced in stage 6, which waits for those inputs when the PUMP outputs from stage 5 regarding the PCnew tag and MRtag are needed for processing as inputs to stage 6. The non-prediction PUMP 802 may also experience a PUMP rule cache miss when it is selected, in which case the miss is handled like a typical rule cache miss as described elsewhere in this disclosure.

Referring to stage 6 808, elements 850 and 852 represent multiplexers. Element 808a may represent the selector used to choose an input for each of the multiplexers 850 and 852 based on the logical result of whether the predicted MRtag matches the actual MRtag. If the two tag values match, Rtag 805a is selected as the input to 850 and is provided as the selected Rtag 850a, which represents the final Rtag output of stage 6; otherwise, if the two tag values do not match, Rtag 805d is selected as the input to 850 and is provided as the selected Rtag 850a. Likewise, if the two tag values match, PCnew tag 805b is selected as the input to 852 and is provided as the selected PCnew tag 852a, which represents the final PCnew tag output of stage 6; otherwise, if the two tag values do not match, PCnew tag 805e is selected as the input to 852 and is provided as the selected PCnew tag 852a.
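The selection performed by the two multiplexers can be expressed compactly in software. The sketch below is only a behavioral model; the function names `mux` and `stage6_select` are hypothetical, and the parameter names merely echo the reference numerals used in the text.

```python
def mux(select, when_match, when_mismatch):
    """Two-input multiplexer: choose an input based on a boolean selector."""
    return when_match if select else when_mismatch


def stage6_select(predicted_mr, actual_mr,
                  rtag_805a, pcnew_805b, rtag_805d, pcnew_805e):
    """Model of selector 808a driving multiplexers 850 and 852."""
    match = predicted_mr == actual_mr           # selector 808a
    rtag_850a = mux(match, rtag_805a, rtag_805d)      # multiplexer 850
    pcnew_852a = mux(match, pcnew_805b, pcnew_805e)   # multiplexer 852
    return rtag_850a, pcnew_852a
```

Both multiplexers share one selector, so the final Rtag and PCnew tag always come from the same PUMP.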

Techniques that use coloring of allocated memory, which may be used in an embodiment in accordance with techniques herein, will now be described.

A user program, such as one coded in the C programming language, may include calls to routines used in connection with memory allocation and deallocation. For example, malloc and free are routines in the C standard library that may be linked into the executable of a user program. Thus, malloc and free execute as routines in the user process address space along with the other user code that may call them. malloc is called for dynamic memory allocation to allocate a block of memory for use by the executing code. In at least one embodiment, malloc may take an input, specified in the call, denoting the size of the memory block to be allocated, and malloc returns a pointer to the allocated memory block. The program uses the pointer returned by malloc to access the allocated memory block. In at least one embodiment, free is called to free, or deallocate, memory previously allocated by malloc. When a memory block allocated using malloc is no longer needed, the pointer (returned by malloc) may be passed to free as an input argument, and free then deallocates the memory (located at the address denoted by the pointer) so that it may be used for other purposes. User code executing on a processor in an embodiment in accordance with techniques herein may perform such calls to malloc and free, or to other routines or functions that similarly perform memory allocation and deallocation. Routines such as malloc and free that perform dynamic memory allocation may use memory management metadata regarding the allocated memory. In the following paragraphs, such metadata used for memory management may be referred to as malloc metadata; it is distinct from, and in addition to, the tag-based metadata described herein, including tags and other metadata pointed to by pointer tags (e.g., the tag-based metadata is not accessible to executing user code and is processed by the metadata processor or subsystem, as described elsewhere herein in connection with the example 1000). The malloc metadata may include, for example, information about the allocated memory block, such as its size, and a pointer to the malloc metadata portion for the next allocated memory block.

Referring to FIG. 64, shown is an example illustrating memory allocation, such as in connection with malloc. In the example 1100, a program makes a first call to malloc to allocate a first memory block of a requested size. In response, malloc may allocate a memory block 1102b of the requested size and return a pointer P1 denoting the starting address of the memory block 1102b. The user program may then use the pointer P1, or another address based on an offset from P1, to store data in and read data from the allocated memory block 1102b. Additionally, for dynamic memory management, malloc may allocate storage for its own malloc metadata for each allocated memory block. Element 1102a represents the memory portion allocated and used by malloc to store the malloc metadata for the allocated memory block 1102b. Similarly, the user program may subsequently make a second call to malloc to allocate a second memory block. Element 1104a represents the memory portion allocated by malloc in response to this second call, where 1104a is used to store malloc metadata. Element 1104b represents the second memory block, where P2 is the pointer returned to the user program for accessing the second memory block. Similarly, the user program may subsequently make a third call to malloc to allocate a third memory block. Element 1106a represents the memory portion allocated by malloc in response to this third call, where 1106a is used to store malloc metadata. Element 1106b represents the allocated third memory block, where P3 is the pointer returned to the user program for accessing the third memory block.

After an allocated memory block, such as 1102b, is no longer needed by the executing code, the code may call free to release the memory block 1102b so that it is deallocated and may be used for other purposes. In such a call to free, the pointer P1 may be passed as the argument. Similarly, when the memory blocks 1104b and 1106b are no longer needed, calls to free may be made specifying the pointers P2 and P3, respectively.

Because the addresses of the memory portion 1102a holding malloc metadata are mapped into the address space of the executing code, user code may, accidentally or intentionally, access the malloc metadata by way of a pointer, such as P1, returned to it by malloc. For example, user code may assign the address of the memory portion 1102a to another pointer P4 (e.g., P4 = P1-2) and then read from or write to the memory location identified by P4. User code may thus, for example, overwrite or read the malloc metadata stored in 1102a. In this manner, a write to the memory location at the address identified by P4 may corrupt the malloc metadata portion 1102a. More generally, the foregoing may be performed by user code with respect to any of the malloc metadata portions 1102a, 1104a, and 1106a.
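The corruption scenario (P4 = P1-2) can be reproduced in a small simulation. The `ToyHeap` class below is a hypothetical bump allocator written only for illustration; it assumes each malloc metadata portion is 2 words placed immediately before the block it describes, which is an assumption of this sketch rather than a detail given in the text.

```python
HEADER_WORDS = 2  # assumption: each malloc metadata portion is 2 words


class ToyHeap:
    """Word-addressed toy heap with metadata stored just before each block."""

    def __init__(self, size):
        self.mem = [0] * size
        self.next_free = 0

    def malloc(self, nwords):
        header = self.next_free
        p = header + HEADER_WORDS
        self.mem[header] = nwords          # metadata word 0: block size
        self.mem[header + 1] = p + nwords  # metadata word 1: next header
        self.next_free = p + nwords
        return p                           # pointer to the user block

    def block_size(self, p):
        """Read the size field from the metadata preceding block p."""
        return self.mem[p - HEADER_WORDS]


heap = ToyHeap(32)
p1 = heap.malloc(4)
p4 = p1 - 2          # user code derives a pointer into the metadata
heap.mem[p4] = 999   # an unchecked user write corrupts the size field
```

After the write through `p4`, `heap.block_size(p1)` reports 999 instead of 4, so any later bookkeeping that trusts the metadata would operate on a corrupted size.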

In connection with a call to free, user code is expected to specify a pointer corresponding to the starting address of a memory block previously allocated using malloc, but it may instead specify a different pointer. For example, user code may call free specifying the above-noted pointer P4 as the argument rather than P1, P2, or P3. Assume, for example, that malloc allocates a block of X bytes (X being a non-zero integer) for each of the malloc metadata portions 1102a, 1104a, and 1106a in connection with a call to malloc. The routine free may perform its processing on the assumption that the memory locations from a first address, P4-X, to a second address, P4-1, denote the starting and ending addresses, respectively, spanning a malloc metadata portion such as 1102a. In this case, the processing performed by free may, for example, use a corrupted malloc metadata portion 1102a, resulting in unexpected runtime behavior and/or dynamic memory management errors.

An embodiment may use the techniques described herein to protect the malloc metadata portions 1102a, 1104a, and 1106a and thereby avoid corruption through overwrites performed by other executing code, such as user code. Such techniques may tag code and/or data with particular colors or tags and enforce rules that permit only the desired accesses and operations, as described elsewhere herein.

Referring to FIG. 65, in at least one embodiment, the memory portions used by malloc and free may be colored, or tagged, with a first tag used by metadata processing as described herein, and the other memory portions used by user code (as allocated by malloc) may be colored, or tagged, with a different second tag used by metadata processing as described herein. In the example 1110, the data portions used by malloc and free (including the malloc metadata) may be colored, or tagged, red, and the user data portions (the memory blocks allocated by malloc for use by user code) may be colored, or tagged, blue. An embodiment may have at least one tag or color reserved exclusively for coloring or tagging the memory locations used by malloc and free. In this example, red is the reserved color used to tag the memory locations used by malloc and free. As described elsewhere herein, an embodiment may also reserve one or more colors or tags for executing user code. In at least one embodiment, all memory allocated for use by a user program may be tagged with the same color. As a variation, an embodiment may use a different tag for each call to malloc, and thus a different color for each separately allocated memory block. In this example 1110, for simplicity of illustration, only the single color blue is used to tag all memory blocks allocated by malloc for the user program.

Element 1111 may represent the tags specified for the corresponding memory locations 1113. Elements 1112a, 1114a and 1116a denote the tags for malloc metadata portions 1102a, 1104a and 1106a, respectively. Elements 1112b, 1114b and 1116b denote the tags for memory blocks 1102b, 1104b and 1106b, respectively, which malloc allocated for use by user code through calls made to malloc as described above.

Elements 1112a, 1114a and 1116a indicate that each memory location in 1102a, 1104a and 1106a, respectively, is tagged red. Elements 1112b, 1114b and 1116b indicate that each memory location in 1102b, 1104b and 1106b, respectively, is tagged blue.
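The coloring shown in FIG. 65 can be pictured as a column of tags running parallel to memory. The following Python sketch is illustrative only: the two-word metadata size and the names `layout_tags`, `RED` and `BLUE` are assumptions made for the example, not details taken from this specification.

```python
# Illustrative model only: one tag per memory word, in the spirit of
# elements 1111 (tags) and 1113 (memory locations).
RED = "red"    # reserved for memory used by malloc and free
BLUE = "blue"  # used here for every block handed to user code

def layout_tags(block_sizes, md_words=2):
    """Build the tag column for a heap of [metadata | user block] pairs.

    md_words (metadata words per block) is an assumed value.
    """
    tags = []
    for size in block_sizes:
        tags.extend([RED] * md_words)  # malloc metadata portion, e.g. 1102a
        tags.extend([BLUE] * size)     # user block, e.g. 1102b
    return tags

# Three allocations of 3, 1 and 2 words.
heap_tags = layout_tags([3, 1, 2])
```

In this sketch every metadata portion shares one color and every user block shares another, which matches the simplified single-color case rather than the per-allocation variation.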

Generally, an embodiment may color the memory blocks of 1113 with the tags denoted by 1111, and may use instruction tagging, colored pointers, or a combination of the foregoing in connection with triggering rules that enforce a memory safety policy, whereby only malloc and free can access the malloc metadata regions 1102a, 1104a and 1106a and user code cannot.

In a first embodiment, the code of malloc and free may be tagged (e.g., instruction tagged) with a special instruction tag (e.g., a CI tag), such as by the loader. Both malloc and free may be tagged with the same unique or special instruction tag (e.g., malloc and free are both tagged with the same CI tag, tmem), or each may be tagged with its own unique or special instruction tag (e.g., malloc code is tagged tmalloc and free code is tagged tfree). The code of malloc may include store instructions that, when executed, trigger rules that perform the coloring as in the example 110. The code of free may include store instructions that, when executed, trigger rules that reinitialize or deallocate a malloc metadata portion (e.g., 1102a, 1104a and 1106a) or a previously malloced memory block (e.g., 1102b, 1104b and 1106b), such as by retagging each memory cell of the block or malloc metadata portion with an F tag denoting free memory. Also in the first embodiment, the memory safety policy may include rules triggered by the execution of particular instructions, such as load and store instructions, whereby the rules allow only instructions tagged with the special instruction tag(s) noted above to 1) access the malloc metadata portions 1102a, 1104a and 1106a and 2) perform the memory block coloring as in the example 110. Such rules generally check the CI tag to ensure that each instruction that colors or accesses a memory cell of any of the metadata portions 1102a, 1104a and 1106a has the special instruction tag denoting malloc or free.
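The CI-tag check of the first embodiment can be sketched as a small predicate. This is a hedged illustration: the function name `metadata_access_allowed`, the set `PRIVILEGED_CI_TAGS` and the placeholder tag `"tuser"` are hypothetical, and the sketch deliberately leaves all non-metadata cells to other rules.

```python
# Hypothetical rule check for the first embodiment (instruction tagging).
# An embodiment could instead use a single shared tag such as "tmem".
PRIVILEGED_CI_TAGS = {"tmalloc", "tfree"}

def metadata_access_allowed(ci_tag, mr_tag):
    """Allow a load/store on a red (malloc metadata) cell only for tagged code."""
    if mr_tag == "red":
        return ci_tag in PRIVILEGED_CI_TAGS
    # Cells of other colors are governed by other rules not modeled here.
    return True

# "tuser" is a placeholder for any instruction tag that is not tmalloc/tfree.
user_may_touch_metadata = metadata_access_allowed("tuser", "red")
```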

In a second embodiment, rather than using special instruction tags, an embodiment may use colored pointers with rules of the memory safety policy triggered by the execution of particular instructions, such as load and store instructions. The loader may tag the pointers of malloc and free that reference the malloc metadata portions 1102a, 1104a and 1106a, which are colored red. The code of malloc may include store instructions that, when executed, trigger rules that perform the coloring as in the example 110. The code of free may include store instructions that, when executed, trigger rules that reinitialize or deallocate a malloc metadata portion (e.g., 1102a, 1104a and 1106a) or a previously allocated memory block (e.g., 1102b, 1104b and 1106b), such as by retagging the memory cells with a tag F denoting free memory. The memory safety policy may include rules triggered by the execution of particular instructions, such as load and store instructions, whereby the rules allow access to the malloc metadata portions 1102a, 1104a and 1106a only by instructions that reference the memory cells using a red-colored pointer. Such rules generally check the MR tag to ensure that a memory instruction accessing a memory cell of any of the metadata portions 1102a, 1104a and 1106a uses a pointer having a first color that matches the second color of the memory cell.
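The pointer-color check of the second embodiment amounts to comparing two tags before a memory access proceeds. A minimal sketch follows, assuming a word-addressed memory modeled as a Python list; `PolicyViolation` and `checked_load` are names invented for this example.

```python
class PolicyViolation(Exception):
    """Raised when no rule permits the attempted access (illustrative)."""

def checked_load(memory, tags, addr, pointer_color):
    """Load memory[addr] only if the pointer color matches the cell color."""
    if tags[addr] != pointer_color:
        raise PolicyViolation(
            f"pointer colored {pointer_color!r} cannot access "
            f"cell colored {tags[addr]!r}"
        )
    return memory[addr]

memory = [10, 20]
tags = ["red", "blue"]  # word 0 is malloc metadata, word 1 is user data
value = checked_load(memory, tags, 1, "blue")
```

A user pointer colored blue can read word 1 but would raise `PolicyViolation` on word 0, which is the behavior the rule is meant to enforce for the red metadata portions.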

In a third embodiment, both special instruction tags and colored pointers as described above may be used in combination. The following are examples of instructions and rules that may be used in such a third embodiment. Consistent with other discussion herein, the following examples use rules based on five input tags for metadata processing, namely PC (program counter), CI (current instruction), OP1 (operand 1 of the current instruction), OP2 (operand 2 of the current instruction) and MR (the memory location, if any, referenced by the current instruction), and on two propagated or generated tags for the instruction, namely PCnew (the new PC tag for the PC of the next instruction) and R (the tag for the result of the current instruction, used to tag the destination register or memory location where that result is stored). Additionally, "-" denotes a don't care for a tag. In this embodiment, the loader may tag the instructions of malloc with the special tag tmalloc and the instructions of free with the special tag tfree. Colored pointers may be generated using the triggered rules noted below.

With respect to malloc, metadata rule processing triggered by executing the malloc code portion may generate the tag for a pointer to a newly allocated memory block, such as 1102b, through a first rule invoked as a result of a move instruction in the malloc code portion. For example, the malloc C code may be "P1 = next free", where next free is a pointer to the next free memory location in 1113, and the resulting move instruction may be "move R1, R2", where register R1 is the source register containing the address of next free and register R2 is the destination register that is the pointer P1. Register R1 may be OP1 (having the OP1 tag), and register R2 may be the result or destination register (having the R tag propagated or generated as a result of the fired rule). The malloc code portion may include instructions, such as the foregoing move instruction, that are tagged with the special tag tmalloc denoting that the instruction is included in the malloc code. In at least one embodiment, the loader may tag the instructions of malloc with the special code tag tmalloc. The first rule may tag the pointer P1, which points to the allocated memory block 1102b, with the tag blue. The first rule triggered as a result of the above move instruction in malloc may be:

The above rule fires only when the CI tag is tmalloc, and thus only for the tagged move instructions of malloc. Assuming the pointer used by malloc is P1, the above mv rule 1A tags the pointer P1 stored in register R2 with the tag or color blue, denoting that it is a pointer to a blue memory location (e.g., a memory location tagged with the blue tag).

The blue-tagged pointer P1 may be used with another, second store instruction of malloc that writes zero, or some other initial value, to each word of the allocated memory block 1102b. For example, "*P1=0" may be included in the malloc C code, resulting in "Store R3, (R2)", where R3 is the source register operand OP1 containing zero and R2 is the OP2 register containing the address P1. In this store instruction, "(R2)" is the operand MR and denotes the memory location that is the target or destination of the store instruction. The above store instruction of malloc may also be tagged tmalloc and may result in triggering a second special store rule, as follows:

This store is performed before the tagged pointer P1 is returned to the user code that called malloc.

The above store rule 2A fires only when the CI tag is tmalloc, when the pointer or address in R2 (denoting P1) is tagged blue, and when the memory location MR pointed to by P1 has the F tag. The foregoing memory location *P1 is assumed to be tagged "F" before it is colored blue. In this example, F denotes a free memory location. The resulting MR tag is the blue tag for the memory location.

Thus, malloc may include code that results in triggering the above-noted second rule for each memory location of the memory block being allocated.
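The prose description of mv rule 1A and store rule 2A can be mirrored in a short sketch. The rule tables themselves are not reproduced in this text, so the functions below are a reading of the description only: each returns the resulting tag when the rule fires and `None` when it does not, and the helper `color_block` and the placeholder tag `"tuser"` are hypothetical.

```python
def mv_rule_1a(ci):
    """move: fires only for malloc's instructions; the result pointer is blue."""
    return "blue" if ci == "tmalloc" else None

def store_rule_2a(ci, op2, mr):
    """store: malloc code, blue pointer in OP2, free ("F") target cell."""
    if ci == "tmalloc" and op2 == "blue" and mr == "F":
        return "blue"  # new tag for the memory location
    return None

def color_block(cell_tags):
    """Apply store rule 2A to every cell of a block being allocated."""
    pointer_tag = mv_rule_1a("tmalloc")  # P1 becomes a blue pointer
    colored = []
    for mr in cell_tags:
        new_tag = store_rule_2a("tmalloc", pointer_tag, mr)
        if new_tag is None:
            raise PermissionError("no rule permits this store")
        colored.append(new_tag)
    return colored

block = color_block(["F", "F", "F"])
```

The same store attempted by code without the tmalloc instruction tag matches no rule in this sketch, which is how the policy keeps user code from recoloring memory.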

malloc may also include code that triggers the additional rules described below (e.g., similar to the move (mv) rule 1A and store rule 2A above) for use in initializing the malloc metadata portions 1102a, 1104a and 1106a. For example, the malloc C code may be "(P1 - 2) = MD area", where MD area is a pointer to the malloc metadata area 1102a, and the move instruction may be "move R7, R8", where register R7 is the source register containing the address "P1 - 2" and register R8 is the destination register that is the pointer MD area. The rule triggered by the above move instruction may be:

This rule tags the MD area pointer red.

malloc may also include code that triggers the store rule 2B noted below (similar to the store rule 2A above), which tags each memory location of a malloc metadata portion, such as by storing tags 1112a, 1114a and 1116a for malloc metadata portions 1102a, 1104a and 1106a, respectively. For example, assume size is an integer denoting the size of the malloced memory block 1102b and that "*(P1-2)=size" is included in the malloc C code, resulting in "Store R6, (R7)", where R6 is the source register operand OP1 containing the size value and R7 is the OP2 register containing the address P1. In this store instruction, "(R7-2)" is the operand MR and denotes the memory location in 1102a that is the target or destination of the store instruction. Store rule 2B may be:

This rule performs the store if the store instruction is tagged tmalloc, if the R7 register containing the address P1 is tagged red, and if the MR operand is tagged F. An embodiment may also include store rule 2D, a variation of the above-noted store rule 2B, whereby store rule 2D may be used when it is desired to update a metadata value.

At a later point in time, such as when freeing or deallocating the blue-colored block 1102b, free may include code such as "*P=0" that results in triggering the store rule 3 noted below to retag the memory locations of a previously allocated memory block (e.g., a memory block allocated for use by user code). The loader may color or tag the instructions of free with tfree. The routine free may include the C code statement "*P=0", resulting in "Store R4, (R1)", where R4 is the source register operand OP1 containing zero, R1 is the OP2 register containing the address of the memory location to be initialized, and "(R1)" denotes the memory operand MR, with R1 containing the address of the memory location. Store rule 3 may be:

Thus, free may include code that results in triggering the above-noted third rule for each memory location of the memory block being deallocated, where the memory block was previously allocated using malloc for use by user code (e.g., used to store data other than malloc metadata). Store rule 3 checks that the CI tag is tfree and that both the memory location and the pointer to it have the same color, blue.

It should be noted that the MR tag of "blue" may, more generally, be any color previously used by malloc to color an allocated user memory block.

The code of free may also include code that triggers the move (mv) rule 1C and the store rule 4 described below in connection with retagging each memory location of a malloc metadata portion, such as 1112a. The code of free may include code that triggers the move (mv) rule 1C noted below, which is similar to the move (mv) rule 1B above. The move (mv) rule 1C may be:

This rule may tag a red pointer for use by free in connection with retagging using store rule 4.

The store rule 4 below (similar to the store rule 3 above) may be triggered to retag each memory location of a metadata portion, such as to retag 1112a, 1114a and 1116a for metadata portions 1102a, 1104a and 1106a, respectively. Store rule 4 may be:

This rule performs the store if the store instruction is tagged tfree and if the MR operand uses a pointer tagged red. The memory location is then tagged "F" to denote that it is now free.
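The retagging performed by free under store rules 3 and 4 can be sketched in the same style. This is an interpretation of the prose only: the rule tables are not reproduced here, rule 4 is modeled as checking just the instruction tag and the pointer color because the text does not state whether the cell's own tag is also checked, and `retag_on_free`, `md_words` and `"tuser"` are hypothetical.

```python
def store_rule_3(ci, op2, mr):
    """Free a user cell: free's code, blue pointer, blue cell -> tag F."""
    if ci == "tfree" and op2 == "blue" and mr == "blue":
        return "F"
    return None

def store_rule_4(ci, op2, mr):
    """Free a metadata cell: free's code using a red pointer -> tag F.

    Assumption: the cell's current tag (mr) is not inspected in this sketch.
    """
    if ci == "tfree" and op2 == "red":
        return "F"
    return None

def retag_on_free(tags, md_words):
    """Retag the metadata words (rule 4) and then the user words (rule 3)."""
    out = []
    for i, mr in enumerate(tags):
        if i < md_words:
            new_tag = store_rule_4("tfree", "red", mr)
        else:
            new_tag = store_rule_3("tfree", "blue", mr)
        if new_tag is None:
            raise PermissionError("no rule permits this store")
        out.append(new_tag)
    return out

freed = retag_on_free(["red", "red", "blue", "blue"], md_words=2)
```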

In a fourth embodiment, PC tagging may be used to provide malloc and free with privilege, access or authorization sufficient to read data from and write data to the malloc metadata portions 1102a, 1104a and 1106a, while excluding other code from accessing those metadata portions. PC tagging is described elsewhere herein, such as in connection with the example 430, where different PC tag values are used to provide different privileges, access or authorization on a per-process basis. In a similar manner, a special or unique PC tag value may be used to provide malloc and free with authorization to perform load and store operations with respect to the malloc metadata portions 1102a, 1104a and 1106a. To further illustrate, malloc may include instructions tagged tmalloc (e.g., CI tag=tmalloc when the instruction is executed). malloc may also include code that, when executed, triggers application of a rule that propagates or generates, as an output, a particular PC tag denoting the privilege or authorization to access the malloc metadata portions 1102a, 1104a and 1106a. malloc may include a first instruction INS1 as follows:

Here, R2 is the address of a malloc metadata portion, such as the address P6 in area 1102a, and (R2) denotes the memory location having address P6 in 1102a, which is colored red. The foregoing instruction INS1, when executed, may generate a PCnew having a tag value such as X1, where X1 denotes the privilege needed to access 1102a. In this case, the rule triggered for the above first instruction INS1 may be:

This rule colors R2 red and also sets the PC tag to X1, denoting read/write access to the memory location whose address is stored in R2 (e.g., address P6). malloc may then include a second instruction INS2, "store R3, (R2)", which stores the value from register R3 (e.g., OP1) to the memory location having address P6 (P6 being stored in R2). The rule triggered for the above second instruction INS2 may be:

Here, PCnew is cleared or reset to PCdefault, a default PC tag that does not denote privilege to access the malloc metadata portion 1102a. Thus, in this particular example, the first ADD instruction triggers a rule that grants malloc read/write access privilege or authorization to 1102a. After the above second malloc instruction, which performs the write, has executed, the propagated PC tag removes from malloc the privilege or authorization for read/write access to 1102a. As a variation, an embodiment may include a version of malloc having a prologue that includes an instruction triggering a rule that grants malloc read/write access to 1102a by generating a PCnew tag of X1 (e.g., the prologue includes the Add instruction INS1 that triggers the rule noted above). At the end of malloc, prior to returning, an epilogue may be executed that includes an instruction which, when executed, triggers a rule that removes malloc's read/write access by generating a PCnew tag of PCdefault (e.g., the epilogue includes the store instruction INS2 that triggers the rule noted above).
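The grant-then-revoke pattern of the fourth embodiment can be sketched as two rule functions that thread a PC tag through consecutive instructions. The instruction INS1 and both rule tables are not reproduced in this text, so the exact inputs each rule checks below are assumptions; each function returns `(pc_new, result_tag)` when it fires and `None` otherwise.

```python
PC_DEFAULT = "PCdefault"  # carries no privilege to access malloc metadata
X1 = "X1"                 # denotes the privilege needed to access 1102a

def rule_ins1(ci):
    """ADD in malloc: color the result pointer red and raise the PC tag to X1."""
    if ci == "tmalloc":
        return (X1, "red")
    return None

def rule_ins2(pc, ci, op2, mr):
    """STORE in malloc: allowed only while the PC tag is X1; then drop to default.

    Assumption: the rule also requires a red pointer and a red target cell.
    """
    if pc == X1 and ci == "tmalloc" and op2 == "red" and mr == "red":
        return (PC_DEFAULT, "red")
    return None

pc_tag, r2_tag = rule_ins1("tmalloc")                       # privilege granted
pc_after, _ = rule_ins2(pc_tag, "tmalloc", r2_tag, "red")   # write, then revoke
```

In the prologue/epilogue variation, the same two rules would bracket the whole body of malloc rather than a single write.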

In a similar manner, free may include instructions that invoke rules generating or propagating a PCnew tag value that gives free access to 1102a. An applied rule may propagate or generate, as an output, a particular PC tag denoting the desired access, privilege or authorization based on the particular process, whereby particular allowed privileges, access or authorization may be represented by different PC tag values.

It should be noted that the foregoing illustrates a single color, blue, for all malloced memory blocks and a single color, red, for all malloc metadata portions. More generally, as described elsewhere herein, malloc may be given the authority to generate an unbounded number of new colors as may be needed for coloring different portions of heap memory. As discussed elsewhere herein, malloc may, for example, be given an initial predetermined set of one or more colors or tags, and may then generate the needed tags from that initial predetermined set. For example, the initial predetermined set for malloc may include yellow, or Y, and red, or R. For an executing process, a new Y-based tag (e.g., Y1, Y2, Y3, ...) may be generated for each call to malloc to allocate a new memory block used by user code (e.g., other than for malloc metadata storage). Thus, a different Y-based tag may be used to color each of the allocated memory blocks 1102b, 1104b and 1106b (e.g., 1102b is colored Y1, 1104b is colored Y2, and 1106b is colored Y3). malloc may generate a new R-based tag (e.g., R1, R2, R3, ...) for each different malloc metadata portion created for each call to malloc. Thus, R-based tags may be used to color the malloc metadata portions 1102a, 1104a and 1106a, each with a different R-based tag (e.g., 1102a is colored R1, 1104a is colored R2, and 1106a is colored R3). The current or last R-based tag and the current or last Y-based tag used by malloc may be stored as state information through rules triggered when executing malloc instructions. For example, malloc may include an instruction that triggers a rule storing the last Y-based tag, Y9, as the tag of a first memory location. Y9 may be generated as the Rtag. A subsequent instruction may again reference the same first memory location tagged with Y9, the last stored Y-based tag, in which case the subsequent instruction triggers a rule that 1) generates a new tag Y10 based on the last Y-based tag Y9 and 2) stores the tag Y10 as the tag of the first memory location. Y10 may be generated as the Rtag. The rule triggered by the subsequent instruction may, for example, direct that the Rtag be determined as MRtag + 1, where the MRtag is Y9 for the subsequent instruction.
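The "Rtag = MRtag + 1" step can be illustrated with tags modeled as strings such as "Y9". The string encoding and the helper name `next_tag` are assumptions for this sketch; an actual embodiment may encode tags quite differently.

```python
def next_tag(mr_tag):
    """Derive the next tag in a family, e.g. Y9 -> Y10 or R1 -> R2.

    Assumes a tag is one family letter followed by a decimal counter.
    """
    family, counter = mr_tag[0], int(mr_tag[1:])
    return f"{family}{counter + 1}"

# The tag on the state-holding memory location remembers the last tag issued.
state_cell_tag = "Y9"
state_cell_tag = next_tag(state_cell_tag)  # rule output becomes the new MR tag
```

Keeping the last-issued tag on a memory location means the counter lives entirely in metadata and is updated only through rule outputs.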

What will now be described are techniques that may be used as an optimization in connection with metadata processing, using hardware-accelerated miss handling. Generally, some policies used in an embodiment herein may cause frequent rule cache misses, and the cache miss handlers for such policies may take many cycles to execute. For some policies, the relationship among the various rule inputs that logically determines the result or outcome may be fairly simple, and so can be hardwired and resolved quickly using dedicated hardware.

As a result, misses for such policies implemented using a hardware (HW) rule cache miss handler may be resolved in a much shorter amount of time than for policies that do not use such hardware acceleration. In such an embodiment, policy components, such as the cache miss handlers for one or more selected policies, may be implemented in dedicated hardware. Thus, an embodiment in accordance with the techniques herein may use such hardware-supported policies alone, or in combination with software-defined policy components that use software rule cache miss handlers.

As one example, consider a memory safety policy that uses memory safety coloring. In connection with a memory safety policy such as described elsewhere herein, memory cells and pointers may be colored, whereby rules invoked in connection with load and store operations may allow only memory references in which the pointer color matches the color of the memory cell. For example, a rule triggered for a load instruction may be used to enforce the policy that the pointer color (e.g., the tag of a register that is an operand, such as OP1) is the same as the memory cell color (e.g., the tag of the memory location, such as Mtag). The memory safety policy can strain rule cache capacity by filling the PUMP rule cache with many different concrete rules that merely capture this same equal-color relationship for many colors, thereby raising the capacity miss rate. In some embodiments described herein that do not preload the rule cache, a compulsory rule cache miss is needed to insert each and every one of these rules. Since memory safety policy rules are commonly triggered in connection with executing user code, the memory safety policy rules may be supported using a HW rule cache miss handler rather than a software rule cache miss handler.

In such an embodiment, the HW rule cache miss handler may generate or compute the new rule that is inserted into the cache when a rule cache miss occurs. For example, the miss handler for memory safety may be implemented in hardware as a HW rule cache miss handler that, for a load instruction, compares OP1tag with Mtag. If the two do not match, a rule violation has occurred and the instruction is not allowed (e.g., the processor is made to stop execution). If OP1tag equals Mtag, the handler may generate a new rule in which Rtag is assigned the value of Mtag. For example, if pointer PTR is red and the memory cell that PTR points to is red, the instruction invoking the rule is allowed and the result tag Rtag should be red. In that case the HW rule cache miss handler may produce, as the output of the hardware, a new rule comprising opcode=load, OP1tag=red, Mtag=red and Rtag=red (all other tag inputs and outputs of the rule may be don't care), and the generated rule may then be inserted into the rule cache.
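The load-rule check described above can be modeled in a few lines of C. This is only a software sketch of what the text describes as gate-level logic; the type names, the numeric tag and opcode encodings, and the function name load_miss_handler are assumptions made for illustration and do not come from the specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t tag_t;                 /* hypothetical tag encoding */
enum { OP_LOAD = 1 };                   /* hypothetical opcode number */
enum { TAG_RED = 1, TAG_BLUE = 2 };     /* hypothetical color values */

/* A rule: the inputs that are matched plus the output that is produced.
 * All other tag inputs and outputs are treated as don't-care here. */
typedef struct {
    int   opcode;
    tag_t op1tag;   /* pointer color */
    tag_t mtag;     /* memory-cell color */
    tag_t rtag;     /* result tag */
} rule_t;

/* Models the memory-safety load handler: returns true and fills *out
 * when the access is allowed, returns false on a policy violation. */
static bool load_miss_handler(tag_t op1tag, tag_t mtag, rule_t *out)
{
    if (op1tag != mtag)
        return false;                   /* colors differ: violation */
    out->opcode = OP_LOAD;
    out->op1tag = op1tag;
    out->mtag   = mtag;
    out->rtag   = mtag;                 /* Rtag is copied from Mtag */
    return true;
}
```

A caller would insert the returned rule into the rule cache on success and raise a violation otherwise.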

Referring to FIG. 66, shown is an example of a cache miss handler implemented in hardware in an embodiment in accordance with the techniques herein. The example 1300 includes 1301, which shows an input 1302a provided to the PUMP rule cache 1302 (e.g., FIG. 22) to perform a lookup that determines whether a rule matching the input 1302a is in the cache. If there is a match, the output 1302b is determined from the rule stored in the cache. Consistent with discussion elsewhere herein, the input 1302a may include an opcode and the input tags (PCtag, CItag, OP1tag, OP2tag, Mtag). The output 1302b may include the output tags of the rule, such as PCnew tag and Rtag. In an embodiment in accordance with the techniques herein that implements the cache miss handler in software, a rule cache miss may cause the software cache miss handler to be invoked, whereby the code of the miss handler executes to compute a new rule for the input 1302a that caused the current rule cache miss. The cache miss handler first determines whether the input is consistent with an allowable rule (e.g., for the memory safety load rule, whether OP1tag equals Mtag) and, if so, computes the output for the particular input 1302a (e.g., determines Rtag to be Mtag), thereby generating a rule for the input 1302a. The new rule (formed from the combination of the input 1302a and the output computed by the miss handler) is inserted into the rule cache. Consistent with discussion elsewhere herein, the new rule may include an opcode, the input tags (PCtag, CItag, OP1tag, OP2tag, Mtag) and the output tags (PCnew tag, Rtag).
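The lookup, miss and insert sequence just described can be sketched as follows. The direct-mapped organization, the cache size, the hash in slot_of and the pass-through of the PC tag are all assumptions chosen to keep the sketch small; the specification does not prescribe them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t tag_t;

typedef struct {                 /* models input 1302a */
    int   opcode;
    tag_t pctag, citag, op1tag, op2tag, mtag;
} rule_in_t;

typedef struct {                 /* models output 1302b */
    tag_t pcnewtag, rtag;
} rule_out_t;

typedef struct { bool valid; rule_in_t in; rule_out_t out; } entry_t;

#define CACHE_SLOTS 8            /* illustrative size only */
static entry_t cache[CACHE_SLOTS];

static bool same_input(const rule_in_t *a, const rule_in_t *b)
{
    return a->opcode == b->opcode && a->pctag == b->pctag &&
           a->citag == b->citag && a->op1tag == b->op1tag &&
           a->op2tag == b->op2tag && a->mtag == b->mtag;
}

/* Hypothetical index function for a direct-mapped cache. */
static unsigned slot_of(const rule_in_t *in)
{
    return ((unsigned)in->opcode ^ in->op1tag ^ in->mtag) % CACHE_SLOTS;
}

/* Policy-specific miss handler: fills *out, or returns false on violation. */
typedef bool (*miss_fn)(const rule_in_t *in, rule_out_t *out);

/* Returns true if the instruction is allowed; *hit reports a cache hit. */
static bool resolve(const rule_in_t *in, miss_fn handler,
                    rule_out_t *out, bool *hit)
{
    entry_t *e = &cache[slot_of(in)];
    *hit = e->valid && same_input(&e->in, in);
    if (*hit) {
        *out = e->out;                   /* rule found in the cache */
        return true;
    }
    if (!handler(in, out))
        return false;                    /* no allowable rule: violation */
    e->valid = true;                     /* insert the new rule */
    e->in = *in;
    e->out = *out;
    return true;
}

/* Example handler: the memory-safety load rule from the text. */
static bool mem_safety_load(const rule_in_t *in, rule_out_t *out)
{
    if (in->op1tag != in->mtag)
        return false;
    out->pcnewtag = in->pctag;           /* assumption: PC tag unchanged */
    out->rtag = in->mtag;
    return true;
}
```

The first resolution of a given input misses and inserts a rule; repeating the same input then hits without invoking the handler.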

Element 1303 shows a HW rule cache miss handler 1304 that may be used, in place of a software rule cache miss handler, in an embodiment in accordance with the techniques herein. In such an embodiment, the HW rule cache miss handler 1304 may be implemented using dedicated hardware including, for example, gate-level logic and other hardware components. The HW miss handler 1304 may take the same input 1302a as the PUMP rule cache 1302 and use its hardware to generate the same output 1302b, which is output to the PUMP rule cache. A new rule may then be formed by combining the opcode, input tags and output tags as noted above, and the new rule may be stored in the PUMP rule cache (e.g., FIG. 22).

In at least one embodiment, the HW rule cache miss handler for the memory safety policy may be implemented in hardware as described above (e.g., using gate-level logic) so that, to load a rule into the cache, it simply copies Mtag to Rtag and immediately performs a PUMP rule insertion. In this simple case, no memory reference and no manipulation of data structures in memory is needed.

Additionally, in at least one embodiment, memory safety may implement the tag on a memory cell as a pair of tags: (1) a memory-cell color tag and (2) a pointer color tag for the pointer held in the memory cell. Memory safety acceleration may include a dedicated cache that combines Mtag and OP2tag into a new Rtag on a store, and that extracts the pointer tag from the Mtag pair and places it in Rtag on a load. A simpler dedicated software handler may be used for misses in this cache. Although the foregoing is described for a single (non-composite) policy such as memory safety, the same general technique may also be applied to the components of a composite policy on the UCP.
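One way to picture the paired tag is as two colors packed into a single word. The sketch below assumes 16-bit colors packed into a 32-bit tag, and it assumes that a store keeps the cell color while replacing the pointer color with OP2tag; neither the widths nor that exact combination rule is stated in the text.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t color_t;   /* hypothetical 16-bit color */
typedef uint32_t mtag_t;    /* packed pair: cell color (high), pointer color (low) */

static mtag_t make_mtag(color_t cell, color_t ptr)
{
    return ((mtag_t)cell << 16) | ptr;
}

static color_t cell_color(mtag_t m)    { return (color_t)(m >> 16); }
static color_t pointer_color(mtag_t m) { return (color_t)(m & 0xFFFFu); }

/* Store: combine the existing Mtag with OP2tag (the color of the value
 * being stored) into the new Rtag written back to the cell.
 * Assumption: the cell color is preserved. */
static mtag_t store_rtag(mtag_t mtag, color_t op2tag)
{
    return make_mtag(cell_color(mtag), op2tag);
}

/* Load: the Rtag of the loaded value is the pointer color from the pair. */
static color_t load_rtag(mtag_t mtag)
{
    return pointer_color(mtag);
}
```

With this packing, both operations are pure bit manipulation, which is consistent with handling them in a small dedicated structure rather than in general miss handler code.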

An embodiment may also perform hardware acceleration using a HW rule cache miss handler for a limited, common subset of rules, such as those expected to be referenced most often. For example, in memory safety, the rules for load/store and for propagation during arithmetic are the most standard and stylized. Other, less common rules exist for initially coloring a memory region and for returning a memory region to free. Such uncommon rules may use the typical rule miss handlers described herein rather than being implemented with hardware support.

In at least one embodiment, the HW rule cache miss handler may implement the mapping function directly as gate-level logic. For example, such a gate-level logic circuit may map input tags to output tags for a rule, such as mapping Mtag to Rtag for the store instruction rule of the memory safety policy. As another example, a HW rule cache miss handler for a CFI (control flow integrity) policy may use gate-level logic in which the tag of a control flow target or destination is a pointer to the set of allowed callers (e.g., the source locations or addresses permitted to transfer control to that particular tagged target or destination), so that the CFI HW rule cache miss handler reads through that set looking for a match. As yet another example, a stack protection policy may encode the stack-frame-code tag and the associated stack-frame-memory-cell tag in a way that lets hardware derive one from the other (e.g., the two may differ in only a few bits, which can be arranged even when the tags are pointers by allocating the stack-frame-code tag pointer and the stack-frame-memory-cell tag pointer together). As a result, the HW rule cache miss handler enforcing the stack protection policy can determine the tag to generate (when creating a tag from a stack pointer) or the tag to require on memory references within such code (for a read or write).
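The remark that the two stack tags may differ in only a few bits can be illustrated with a single-bit encoding. The choice of bit 0 as the distinguishing bit and the function names are assumptions for illustration; the text only says that the encodings can be arranged so that hardware derives one tag from the other.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tag_t;

/* Hypothetical encoding: bit 0 clear = stack-frame-code tag,
 * bit 0 set = the matching stack-frame-memory-cell tag. */
#define FRAME_KIND_BIT 0x1u

static tag_t frame_cell_tag_from_code_tag(tag_t code_tag)
{
    return code_tag | FRAME_KIND_BIT;
}

static tag_t frame_code_tag_from_cell_tag(tag_t cell_tag)
{
    return cell_tag & ~FRAME_KIND_BIT;
}
```

If tags are pointers, allocating the two tag objects as an aligned adjacent pair would give their addresses the same kind of relationship.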

As a variation on using a HW rule cache miss handler to compute or determine new rules that are inserted into the PUMP rule cache, an embodiment may actually hardwire the logic of one or more policy rules, so that those rules are fully embodied and enforced in hardware and are therefore not stored in the PUMP rule cache. For example, rather than using a HW rule cache miss handler and the PUMP rule cache for a policy, an embodiment may use hardware (e.g., gate-level logic and circuitry) that encodes and enforces the rules of the policy. In an embodiment using both the PUMP rule cache and such HW-specific rules, a rule lookup may be performed in both. In that case, a miss handler (e.g., a HW rule cache miss handler or a software miss handler) may be invoked to determine or compute a new rule when no rule for the particular input is found in either the PUMP rule cache or the HW-specific rules.

Composite policies present additional challenges and opportunities. Consistent with discussion elsewhere herein, a composite policy includes multiple policies that are enforced simultaneously for an instruction. The challenge is that a composite policy must resolve several different policy components. The opportunity is that the overall resolution sequence for a composite policy can be supported by hardware, using HW rule cache miss handlers together with the data cache, the UCP caches (one UCP cache per policy component of the composite policy) and the CTAG cache, for all the different policy components of the composite policy. From prior experience, a common case is that newly allocating memory (e.g., using malloc), and thus tagging the new memory with a color, causes compulsory rule cache misses. In this case the memory safety policy component needs a new rule, but the rules for the other components are likely already in their UCP caches. With hardware acceleration through HW rule cache miss handlers for the top-level composite policy and for memory safety color matching, resolution of the memory rule can be accomplished with a small finite state machine that runs in hardware and consults the caches (e.g., the data cache, UCP caches and CTAG cache), instead of requiring the hundreds to thousands of cycles needed to resolve the rule by executing software-based miss handler code.

In at least one embodiment, the UCP cache may be decomposed by component policy, and all components may be resolved in parallel to produce the composite set of tag results that is fed into the CTAG cache. If every policy can be resolved either by its UCP cache or by simple hardware rules, such as those for memory safety, the total time for the UCP cache lookups is that of a single policy rather than proportional to the number of policies. This works perfectly if the number of component policies is fixed and matches the hardware provided. Nonetheless, a slight variation is simply to distribute the component policies across a fixed number of available UCP caches, so that the number of sequential UCP cache resolutions is only the ratio of the number of component tags to the number of physical UCP caches.
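As a rough worked example of that ratio, assume one UCP lookup per component policy, with lookups that share a physical UCP cache performed one after another. The symbols N (component policies), K (physical UCP caches) and T (sequential lookup rounds) and the rounding up for uneven divisions are assumptions introduced here, not notation from the text.

```latex
T_{\text{rounds}} = \left\lceil \frac{N}{K} \right\rceil ,
\qquad \text{e.g. } N = 6,\; K = 2 \;\Rightarrow\; T_{\text{rounds}} = 3 ,
\qquad N = 3,\; K = 3 \;\Rightarrow\; T_{\text{rounds}} = 1 .
```

The second case corresponds to the fully parallel arrangement, in which the lookup time equals that of a single policy.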

Referring to FIG. 67, shown is an example 1310 illustrating the use of HW rule cache miss handlers with a composite policy in an embodiment in accordance with the techniques herein. In this particular example, three policies make up the composite policy, so that all three policies are enforced simultaneously for the same instruction; more generally, a composite policy may include any number of policies and is not limited to three. Elements 1314a through 1314c are the HW rule cache miss handlers for the three policies making up the composite policy. The input 1312 may be provided to each of the HW rule cache miss handlers 1314a through 1314c, which respectively determine or compute the rule outputs 1316a through 1316c for their particular policies (e.g., HW rule cache miss handler 1314a determines the output 1316a, including Rtag and PCnew tag, for policy A; HW rule cache miss handler 1314b determines the output 1316b, including Rtag and PCnew tag, for policy B). The outputs 1316a through 1316c may then be combined into a single composite result 1318 representing the composite Rtag and PCnew tag for the three policies. Combining the outputs 1316a through 1316c to determine the composite result 1318 may also be implemented using hardware or software. A new rule may be inserted into the cache, where the new rule includes the input for the particular instruction that triggered the rule miss processing (e.g., the opcode and input tags) together with the composite result 1318 (e.g., the composite values for Rtag and PCnew tag).
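The fan-out to three handlers and the combination of their outputs can be sketched as follows. Several simplifications are assumptions: the handlers run in a loop here, whereas the hardware described would evaluate them in parallel; every handler sees the same input; the composite result is modeled as a plain tuple of per-policy outputs; and the placeholder bodies of policy_b and policy_c are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t tag_t;

typedef struct {                       /* models input 1312 */
    int   opcode;
    tag_t pctag, citag, op1tag, op2tag, mtag;
} rule_in_t;

typedef struct { tag_t pcnewtag, rtag; } rule_out_t;   /* one of 1316a..1316c */

#define NPOLICIES 3
typedef struct { rule_out_t part[NPOLICIES]; } composite_out_t;  /* models 1318 */

typedef bool (*handler_fn)(const rule_in_t *in, rule_out_t *out);

/* Policy A: memory-safety load check (pointer color equals cell color). */
static bool policy_a(const rule_in_t *in, rule_out_t *out)
{
    if (in->op1tag != in->mtag)
        return false;
    out->pcnewtag = in->pctag;
    out->rtag = in->mtag;
    return true;
}

/* Policies B and C: hypothetical placeholders that always allow. */
static bool policy_b(const rule_in_t *in, rule_out_t *out)
{
    out->pcnewtag = in->pctag;
    out->rtag = in->op1tag;
    return true;
}

static bool policy_c(const rule_in_t *in, rule_out_t *out)
{
    out->pcnewtag = in->pctag;
    out->rtag = 0;
    return true;
}

/* The instruction is allowed only if every component policy allows it. */
static bool resolve_composite(const rule_in_t *in,
                              const handler_fn h[NPOLICIES],
                              composite_out_t *out)
{
    for (int i = 0; i < NPOLICIES; i++)
        if (!h[i](in, &out->part[i]))
            return false;              /* any component violation rejects */
    return true;
}
```

On success, the input together with the composite output would form the new rule inserted into the cache.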

Additionally, although not shown in the example 1310, an embodiment in accordance with the techniques herein may use UCP caches and a CTAG cache in combination with the HW rule cache miss handlers 1314a through 1314c. As described elsewhere herein (e.g., in connection with FIGS. 21, 23 and 24), each of policies A, B and C may have its own UCP cache that caches the most recent policy result tags (e.g., the UCP cache for policy A stores the result tags, PCnew tag and Rtag, most recently computed by miss handler 1314a for a combination of instruction opcode and input tags). As also described elsewhere herein (e.g., in connection with FIGS. 21, 23 and 24), the CTAG cache may store the composite Rtag result for a particular combination of the individual Rtag values output by the multiple policies of the composite, such as policies A, B and C. The CTAG cache may likewise store the composite PCnew tag result for a particular combination of the individual PCnew tag values output by those policies. Thus, the hardware that generates the composite result 1318 from the outputs 1316a through 1316c may use information from the CTAG cache to determine the composite result 1318. The HW rule cache miss handlers 1314a through 1314c may also take information from the UCP caches for policies A, B and C as input.

As an alternative to having a HW rule cache miss handler for all three policies of the composite policy, as in the example 1310, an embodiment may selectively implement HW rule cache miss handlers for one or more, but fewer than all, of the policies making up the composite policy. In such an embodiment, part of the rule cache miss handling is implemented in hardware, and the remainder of the rule cache miss handling for the composite policy may be implemented in software as described elsewhere herein.

It should be noted that some policies described herein may allocate new tags, for example in connection with the memory safety policy. In at least one embodiment, the HW rule cache miss handler for a policy that can allocate new tags, such as memory safety, may be provided with a FIFO-based cache of new tag values that the HW-based handler can use (e.g., a cache of tags that may be used as newly allocated tag values are generated). If the allocated tags are pointers denoting addresses, the cache holds the addresses or pointers rather than tag values. In this way, the HW rule cache miss handler can perform an allocation simply by reading the top entry from the FIFO-based cache. Periodically, a software handler may run in the metadata processing domain to refill the FIFO-based cache with new tags available for allocation.
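A minimal model of such a FIFO of pre-allocated tags is a small ring buffer with a refill side and an allocate side. The depth, the type names and the behavior when the FIFO is empty (reporting failure so that a caller could fall back to software) are assumptions; the text does not say what happens when the FIFO runs dry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t tag_t;

#define FIFO_DEPTH 4                    /* illustrative depth only */

typedef struct {
    tag_t  slot[FIFO_DEPTH];
    size_t head;                        /* index of the oldest entry */
    size_t count;                       /* number of valid entries */
} tag_fifo_t;

/* Software handler side: append a fresh tag; returns false when full. */
static bool fifo_refill(tag_fifo_t *f, tag_t fresh)
{
    if (f->count == FIFO_DEPTH)
        return false;
    f->slot[(f->head + f->count) % FIFO_DEPTH] = fresh;
    f->count++;
    return true;
}

/* HW handler side: take the oldest tag; returns false when empty. */
static bool fifo_alloc(tag_fifo_t *f, tag_t *out)
{
    if (f->count == 0)
        return false;
    *out = f->slot[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return true;
}
```

Allocation is then a single read on the handler side, while the periodic software handler only needs to call the refill side until the FIFO is full.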

Embodiments are described herein in which there is complete and strict separation between the metadata processing domain and the "normal" code processing of the user code or execution domain. As a variation, an embodiment may take a more relaxed approach and extend this strict isolation model: it still does not allow user code or the execution domain to modify or write information in the metadata processing domain, but it does allow information or values to be returned by the metadata domain to the user code or execution domain.

What will now be described are techniques that may be included in at least one embodiment using this more relaxed approach, in which the metadata processing domain returns a value that may be used or referenced by code executing in the normal code processing or execution domain (e.g., metadata processing returns a value that is an input to the normal or user code execution domain). For example, as described elsewhere herein, an embodiment may use malloc and free routines whose code is tagged with instruction tags that confer unique capabilities: when the code of malloc and free executes, it triggers rules that allow malloc and free to access malloc metadata, generate new color tags, tag user data areas with those new color tags, and so on. These privileges or capabilities are granted uniquely to malloc and free, to the exclusion of other code such as user code. Now consider an embodiment in which malloc and free perform their processing and code tagging is used, the loader having specially tagged the malloc and free code with special code tag(s) that uniquely identify it as belonging to malloc and free, which have special execution privileges. In such an embodiment, user code making a call to free may, for example, supply a pointer PTR1 that is corrupted or otherwise does not point to the start of the previously allocated storage area now being deallocated. The free routine may assume that PTR1 points to the first location of a user data area previously allocated by malloc. It may also assume a predetermined structure for the user data area and associated memory locations of the memory heap, such as described in connection with FIGS. 64 and 65, in which the malloc metadata is stored at a predetermined location relative to the allocated user data area.

What will now be described is a technique that may be used in an embodiment to have the PUMP return a value to the code execution domain.

Referring to the example 1200 of FIG. 68, shown are elements 1111 and 1113 as described in connection with the example 1110 of FIG. 65, further annotated with pointers PTR1 and PTR2, which are discussed below. Assume that user code calls free with PTR1, intending to deallocate memory block 1102b. P1 may denote the pointer or address expected by free. In this example, however, PTR1 may be a corrupted or incorrect address, generally an address other than P1 (e.g., PTR1 may identify a location within memory block 1102b, or may not even point into the heap). Although PTR1 is corrupted or otherwise does not point to the correct memory location P1, free may perform processing that uses PTR1 to access malloc metadata by addressing relative to PTR1, on the assumption that the malloc metadata exists, in its predefined structure, format or layout, at that relative position. For example, the malloc metadata area used by free may be assumed to be located immediately before the allocated user data portion, as in FIGS. 64 and 65. In that case, the code of free determines, based on the predetermined layout, that the malloc metadata it uses to deallocate a particular memory block is located at a particular offset OFF1 before PTR1. For example, referring to FIG. 68, free may assume that PTR1=P1, where PTR1 is supplied by user code in the call to free. Using relative addressing as described above, based on the predefined data layout, free may expect the malloc metadata 1102a corresponding to the memory block 1102b being deallocated to begin at the memory location with address PTR2=PTR1-OFF1. In this example, PTR1 is not equal to P1; PTR1 actually points somewhere inside the allocated memory block 1102b, such that the computed address PTR2=PTR1-OFF1 also lies within the user-allocated memory block 1102b (PTR2 denoting the expected start of the associated malloc metadata used by free).

In such a case, where the PTR1 supplied by user code in the call to free does not point to the expected location P1, and PTR2 therefore denotes only the assumed start of the malloc metadata used by free, the code of free may incorrectly access data stored in memory block 1102b as though it were malloc metadata, causing a violation, interrupt or trap (which may be due to a rule violation detected by the PUMP or to another code execution error condition during execution of free). Execution of the code running in the user process space or domain may thus be halted because of such a violation occurring while routine free executes, as invoked by the call to free with PTR1 from user code. Rather than have routine free cause the user code to halt, it may be desirable to have the code of free query the PUMP or, more generally, to have metadata processing return a value. The return value may be, for example, a Boolean indicating whether the color associated with PTR2 (used by the free code to access malloc metadata) actually denotes a valid or expected malloc metadata area. Using such a returned PUMP value or metadata processing value, free can perform different conditional processing depending on whether the color associated with the memory location at address PTR2 denotes a valid malloc metadata color, such as red. If PTR2 identifies an invalid malloc metadata area, as determined from the color of PTR2, routine free may perform some recovery or other action. Such action may be preferable to halting the user code because of a rule violation, trap, interrupt or other execution error.

In at least one embodiment using the RISC-V instruction set, a new instruction gmd (get-metadata-info) may be added to the RISC-V instruction set, as follows, to implement the return of a metadata processing value:

gmd R1, R2, R3

where R1 contains the result value returned from the PUMP or metadata processing;

R2 contains the address PTR2, tagged with the color of the memory location having address PTR2; and

R3 is tagged with the color expected for a valid malloc metadata area.

Thus, R2 and R3 may be registers that are input or source operands, and R1 may be the register that holds the result or output. In this particular example, R3tag may be red, denoting the color of a valid malloc metadata area, and R2tag may be blue. The rule invoked by the new instruction may output, as its return value, a Boolean indicating whether R2tag=R3tag. That Boolean result is the return value output by metadata rule processing (e.g., the PUMP output) and is stored in register R1, which is accessible to free, whose code is included in the address space of the user-executed code. Consistent with discussion elsewhere herein, it should be noted that R1 may be tagged with Rtag as the result tag.
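The effect of the new instruction on its registers can be modeled in software as shown below. The reg_t structure pairing a value with a tag, the numeric color values and the way Rtag is passed in are assumptions made so the sketch is self-contained; on the hardware described, the comparison is performed by metadata rule processing rather than by ordinary code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t tag_t;
enum { TAG_RED = 1, TAG_BLUE = 2 };     /* hypothetical color values */

/* A register modeled as a data value plus its metadata tag. */
typedef struct {
    uintptr_t value;
    tag_t     tag;
} reg_t;

/* Models gmd R1, R2, R3: R1 receives the Boolean (R2tag == R3tag),
 * and R1 is tagged with the result tag Rtag chosen by the rule. */
static void gmd(reg_t *r1, const reg_t *r2, const reg_t *r3, tag_t rtag)
{
    r1->value = (r2->tag == r3->tag);
    r1->tag   = rtag;
}
```

With R2tag blue and R3tag red, as in the example above, R1 would receive 0 (false).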

The following describes, using a C-like pseudo code description with PTR1, PTR2 and OFF1 as described above, the logic processing that may be performed by the code of free:

In the above logic processing, IS_RED may check whether PTR2 has the color RED.

The recovery processing performed by the code of the else block noted above may, for example, attempt to locate the start of a valid malloc metadata area by searching backward or forward from PTR2. The code of the else block noted above may cause the user code to terminate in a more well-defined and predictable manner, such as with a runtime error message or condition indicating that PTR2 is an invalidly colored pointer.
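The conditional processing and recovery described above might be sketched as follows. This is an illustrative model only, not the pseudo code of the specification: memory colors are modeled as a dictionary from address to color, PTR2 is assumed to be derived as PTR1 minus OFF1, and the names `free_block`, `find_metadata_start` and `max_distance` are hypothetical.

```python
RED = "red"


def is_red(colors, ptr2):
    """Model of IS_RED: does the location at ptr2 carry the RED color?"""
    return colors.get(ptr2) == RED


def find_metadata_start(colors, ptr2, max_distance=16):
    """Search backward, then forward, from ptr2 for the nearest RED address."""
    for distance in range(1, max_distance + 1):
        for candidate in (ptr2 - distance, ptr2 + distance):
            if colors.get(candidate) == RED:
                return candidate
    return None


def free_block(colors, ptr1, off1):
    """Sketch of free: act on the Boolean that the PUMP would return."""
    ptr2 = ptr1 - off1  # assumption: metadata sits OFF1 before PTR1
    if is_red(colors, ptr2):
        return ("freed", ptr2)
    recovered = find_metadata_start(colors, ptr2)  # else block: try recovery
    if recovered is not None:
        return ("recovered", recovered)
    # Otherwise terminate in a well-defined manner.
    raise RuntimeError(f"invalid colored pointer PTR2={ptr2:#x}")
```

The point of the sketch is that free, rather than the hardware, decides how to react to an unexpected color.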

The new instruction Get metadata info R1, R2, R3 may be included in the instructions generated as a result of, for example, compiling and linking the code of routine free written in C to perform the logic processing noted above. An embodiment may wish to control or restrict which particular code portions are allowed to execute this new instruction. PUMP rules may be used to mediate or restrict when the new instruction is allowed to be executed by a given routine. For example, the code of free or malloc may be allowed to execute the new Get metadata info instruction, but user code may not. Any suitable technique, some of which are described herein, may be used to give routine free the privilege or authorization needed to execute the new instruction that returns a PUMP value. For example, the code of free may be tagged with a special instruction tag indicating that free is allowed to execute the new instruction. For example, the loader may tag occurrences of the new instruction in the code of free with a special tag NI. Rules may then be used to mediate or restrict the ability to invoke the new instruction to code having an instruction tag (CI tag) of NI.
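A minimal sketch of such a restriction follows, assuming the opcode and the CI tag are modeled as plain strings. The names `PolicyViolation` and `check_new_instruction` are hypothetical and stand in for whatever trap mechanism an embodiment uses.

```python
class PolicyViolation(Exception):
    """Stands in for the trap raised when a rule disallows an instruction."""


GMD = "gmd"  # opcode of the new instruction, modeled as a string
NI = "NI"    # special instruction tag placed by the loader


def check_new_instruction(opcode, ci_tag):
    """Allow the new instruction only when the instruction itself is tagged NI."""
    if opcode == GMD and ci_tag != NI:
        raise PolicyViolation("gmd executed by code lacking the NI tag")
    return True
```

Under this sketch, an occurrence of gmd in free that the loader tagged NI is allowed, while the same opcode in untagged user code traps.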

Referring to FIG. 69, shown is an example 1210 illustrating the inputs and outputs of metadata rule processing in an embodiment in accordance with techniques herein. Element 1212 may generally represent metadata processing as described herein. Inputs 1212a to metadata processing may include, for example, various tags and opcode information as described herein. Outputs 1214 generated by metadata processing 1212 may include Rtag 1214a and PCtag 1214b as described elsewhere herein. Additionally, metadata processing may generate a new output, return value 1214c. With the new instruction, return value 1214c may be placed in a register, such as R1 noted above, that is in the set of registers accessible to the user process space or executing code. Consistent with description elsewhere herein, 1214a and 1214b denote tags placed, respectively, on the result (e.g., a result register or memory location) and on the PC; thus 1214a and 1214b are not accessible to the user process space or executing code. It should be noted that whether metadata processing returns return value 1214c may be conditional on the particular instruction or opcode. For example, as described elsewhere herein, metadata processing outputs may be filtered, as described in connection with FIGS. 27-33, based on the opcode using a multiplexer that enables or disables output of return value 1214c. In this example, value 1214c denotes the logical result of whether R2tag=R3tag when the opcode is that of the new instruction. Otherwise, if the opcode does not denote the new instruction opcode, a default value may be conditionally returned by metadata processing as return value 1214c.
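The opcode-conditional selection of the return value can be modeled as a two-input multiplexer. The sketch below is illustrative only; `GMD_OPCODE`, `DEFAULT_RETURN` and `select_return_value` are hypothetical names, and the default value of 0 is an assumption.

```python
GMD_OPCODE = "gmd"   # opcode of the new instruction, modeled as a string
DEFAULT_RETURN = 0   # assumed default value; not specified in the text


def select_return_value(opcode, rule_result, default=DEFAULT_RETURN):
    """Multiplexer: pass the rule result only for the new instruction."""
    return rule_result if opcode == GMD_OPCODE else default
```

For any other opcode, the rule result never reaches the register visible to user code.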

Referring to FIG. 70, shown is an example 1220 illustrating components and processing that may be performed in an embodiment in accordance with techniques herein when metadata processing returns a value to the user execution domain, such as when executing the new instruction included in the code of free as described above. For simplicity of illustration, the example 1220 shows the logic and components of metadata processing used only for the destination or result register R1 for this new return value and the associated result register. Element 1222a may generally represent the PUMP inputs (e.g., tags such as R2tag and R3tag in this example, and the opcode) as described elsewhere herein for metadata processing. PUMP 1222 may include a rule for the new instruction that checks whether the code tag is NI and outputs a logical result indicating whether R2tag=R3tag (e.g., in this example, OP1 denotes the first input source operand R2 and OP2 denotes the second input source operand R3). The rule outputs this logical result 1221a. Element 1225 may represent a multiplexer having the opcode used as selector 1225a. When the opcode of the current instruction is the particular opcode of the new Get metadata info instruction, 1225a selects 1221a to be output as return value 1214c. Otherwise, if the opcode is not that of the new instruction, 1225a selects default return value 1222a as return value 1214c. Return value 1214c is the PUMP output stored in destination register RD 1228 (e.g., 1214c is stored in D1 1228b, which denotes the contents stored in register RD that are accessible to code executing in the user process address space). Because RD 1228 is a result register, the rule may also tag RD with Rtag (e.g., Rtag is stored in tag portion T1 1228a, where T1 is the tag word of the RD register). In at least one embodiment, Rtag may be a special tag SPEC1 indicating that RD contains the output of the new instruction. Based on the symbolic logic described elsewhere herein, in which the tag inputs to a rule are (PCtag, CItag, OP1tag, OP2tag, MR tag) and the rule outputs are (PCtag, Rtag) together with a third output, NEWOUT, denoting the new return value 1214c, the rule may be expressed as follows:

More generally, the foregoing use of a new instruction may be used, in an embodiment in accordance with techniques herein, to return any suitable and desired value through invoked metadata processing rules, where allowable occurrences of the new instruction may be specially tagged (e.g., with NI) to denote that they are allowed.

An alternative embodiment may avoid adding a new instruction. This may be accomplished by code tagging an existing instruction to control this behavior and, in that case, setting a care bit to select the output value. Another alternative may be to add a value-output-care bit that is also an output of the PUMP, so that a rule can determine when the value output should flow to the RD value result. This second alternative allows an opcode to behave normally when untagged and to exhibit the special behavior only when given the appropriate code tag.
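The second alternative might be sketched as follows, assuming the rule returns a pair (value, care) and the write-back stage chooses between the ordinary ALU result and the PUMP value. The function names and the choice of an existing add instruction are hypothetical.

```python
NI = "NI"  # special code tag enabling the special behavior


def rule_with_care(ci_tag, op1_tag, op2_tag):
    """Return (value, value_output_care) for an existing opcode."""
    if ci_tag == NI:
        return int(op1_tag == op2_tag), True   # PUMP value should reach RD
    return 0, False                            # untagged: normal behavior


def write_back(alu_result, pump_value, care):
    """RD receives the PUMP value only when the care bit is set."""
    return pump_value if care else alu_result


def execute_add(ci_tag, op1, op1_tag, op2, op2_tag):
    pump_value, care = rule_with_care(ci_tag, op1_tag, op2_tag)
    return write_back(op1 + op2, pump_value, care)
```

An untagged add still produces the arithmetic sum; the same opcode tagged NI yields the tag comparison result instead.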

What will now be described are techniques that may be used to guarantee that a particular sequence of instructions is performed atomically, as a single unit or complete sequence, in the specified order from the first instruction of the sequence to the last. Additionally, such techniques guarantee that control is not transferred into the sequence at any instruction other than the first instruction, and that control does not transfer out of or exit the sequence other than through the last specified instruction of the sequence. For example, consider the simple instruction sequence of FIG. 71.

In the example 1400, a sequence of two instructions 1402 and 1404 is shown. The first instruction 1402 reads, or loads into R1, the contents of a memory location whose address is stored in R2. The second instruction writes or stores zero (0) to the same memory location (the memory location having its address stored in R2). Such an instruction sequence may be provided to guarantee that the value read from the memory location (having the address specified in R2) is used only once, whereby the prior value is cleared from the memory location, or zeroed out, immediately after the value is read from memory. Thus, zeroing out the memory location is performed by the second instruction 1404 of the sequence and must be performed as the next instruction in the sequence after the first instruction 1402, which reads the value from the memory location.

In at least one embodiment in accordance with techniques herein using a RISC architecture, rules may be used to enforce the foregoing linearity of the data item and atomicity of the instruction sequence of 1400. In such an embodiment, the PC tag (PCnew tag) may be updated to carry the state of the sequence, namely the next instruction expected in the sequence. In at least one embodiment, one solution is to tag instruction 1402 with a CI tag denoting the instruction as a linear read instruction. Additionally, (R2), denoting the memory location whose address is stored in R2, may be typed and tagged as a linear variable with a unique metadata id X1 (e.g., X1 uniquely identifies this linear variable from all other linear variables). A first rule may be triggered as a result of the first instruction 1402. The first rule may indicate that only instructions tagged as linear reads are allowed to read from a linear variable. Additionally, the resulting PCnew tag may be clear-linear-variable-X1-next, indicating that the next instruction executed must clear linear variable X1. A second rule may be triggered as a result of executing the second instruction 1404, where the operand value in the second instruction, zero (written to the memory location), is tagged with a special EMPTY tag denoting a special value used to initialize or clear a memory location. Additionally, the memory location is required to be linear variable X1, the particular tagged linear variable from the immediately preceding instruction 1402. If the second instruction following 1402 does anything other than write an EMPTY value to linear variable X1, a trap is generated. Thus, the second rule enforces the desired sequentiality and atomicity of the instruction sequence of 1400.
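The two rules can be modeled in a few lines. The sketch below is a simplified illustration under stated assumptions: tags are plain Python values, a linear variable is tagged `("linear", id)`, the PC state is `("clear-next", id)`, and a single `rule` function stands in for the rule set. None of these encodings come from the specification.

```python
class Trap(Exception):
    """Models the trap raised on a policy violation."""


def is_linear(m_tag):
    return isinstance(m_tag, tuple) and m_tag[0] == "linear"


def rule(opcode, pc_tag, ci_tag, op1_tag, m_tag):
    """Return the new PC tag, or raise Trap if the instruction is disallowed."""
    # Second rule: the previous instruction demanded a clear of variable `var`.
    if isinstance(pc_tag, tuple) and pc_tag[0] == "clear-next":
        var = pc_tag[1]
        if opcode == "store" and op1_tag == "EMPTY" and m_tag == ("linear", var):
            return "normal"
        raise Trap("expected EMPTY store to linear variable " + var)
    # First rule: only linear-read instructions may read a linear variable.
    if opcode == "load" and is_linear(m_tag):
        if ci_tag != "linear-read":
            raise Trap("load from linear variable by untagged instruction")
        return ("clear-next", m_tag[1])
    return pc_tag
```

Because the obligation travels in the PC tag, any instruction other than the expected clearing store traps immediately after the linear read.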

More specifically, assume the first instruction 1402 is a load instruction that loads into R1 the contents of the memory location whose address is stored in R2. The load instruction may be as follows:

Also assume the second instruction 1404 is a move instruction that moves zero (0) to the memory location whose address is stored in R2. The move instruction may be as follows:


Consistent with conventions noted elsewhere herein, a rule may be defined in terms of the opcode, the input tags (PCtag, CItag, OP1tag, OP2tag, Mtag) and the output tags (PCtag, Rtag). Based on this rule convention, for the first load instruction, OP1 is R1, OP2 is R2, and (R2) is the memory location tagged with Mtag. The first rule triggered by the first load instruction may be as follows:

Based on the foregoing rule convention, for the second instruction (the move that stores zero), OP1 is 0, OP2 is R2, and (R2) is the memory location tagged with Mtag. The second rule triggered by the second move instruction may be as follows:

This example illustrates how tags and rules may be used to guarantee the indivisibility of a particular sequence of instructions. It will be readily appreciated by those of ordinary skill in the art that this general technique may be applied to many other scenarios where it is desirable to enforce that data be accessed only in a particular manner as part of a particular instruction sequence. Those of ordinary skill in the art will also readily appreciate that the technique may be adapted to any case requiring strict enforcement of an instruction sequence. As noted above, the general technique includes tagging the new PC (e.g., PCnew tag) resulting from instruction N of the sequence with a special tag that is then checked, as the PC tag, in the rule triggered by the next instruction N+1 of the glued sequence.
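The general technique can be sketched for a glued sequence of arbitrary length. In this illustrative model, each instruction of the sequence is assumed to carry a CI tag of the form (sequence id, position), and the PC tag carries (sequence id, next expected position). The encoding and the name `make_glue_rule` are hypothetical.

```python
class Trap(Exception):
    """Models the trap raised when the glued sequence is violated."""


def make_glue_rule(sequence_id, length):
    """Build a rule enforcing in-order, uninterrupted execution of a sequence."""
    def rule(pc_tag, ci_tag):
        in_sequence = isinstance(ci_tag, tuple) and ci_tag[0] == sequence_id
        mid_sequence = isinstance(pc_tag, tuple) and pc_tag[0] == sequence_id
        if mid_sequence:
            # Instruction N+1 must be exactly the one the PC tag expects.
            if not in_sequence or ci_tag[1] != pc_tag[1]:
                raise Trap("sequence left or reordered")
            position = pc_tag[1]
        else:
            if not in_sequence:
                return pc_tag              # ordinary code, nothing to enforce
            if ci_tag[1] != 0:
                raise Trap("sequence entered past its first instruction")
            position = 0
        following = position + 1
        return "normal" if following == length else (sequence_id, following)
    return rule
```

The PC tag returns to its ordinary value only after the last instruction of the sequence, which is what prevents early exit.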

What will now be described are techniques that may be performed as part of booting or starting up a system in an embodiment in accordance with techniques herein based on the RISC-V architecture. The following paragraphs may refer to various CSRs described elsewhere herein, such as in connection with the example 900 of CSRs that may be used in connection with the metadata processing domain.

As described elsewhere herein, the bootstrap tag may be a value that is hardwired or stored in a particular ROM location. As part of booting the system, a segment of bootstrapping code may be executed that generally performs initialization, including initializing different CSRs, memory, and the like. As part of initialization, such processing also initially tags memory locations with a default tag value derived from the initial bootstrap tag. In at least one embodiment, a CSR such as the boottag CSR (e.g., the sboottag CSR as in the example 900) may be initialized with the special bootstrap tag, which is used as the initial "seed" tag from which all other tags of the system are derived. Other code entities, such as the loader, may have their instructions specially tagged (e.g., the CI tag is set to a special instruction tag), designating the loader as having particular privileges or authorizations to perform operations that other code not having the special instruction tag is not allowed to perform. The foregoing may be enforced using rules, triggered by the code of the loader, that examine the CI tag to ensure it is the special tag before the triggered rule performs the desired tagging operation. Thus, for example, the special CI tag used to tag the instructions of the loader may be generated or derived from the bootstrap tag as a result of special rules triggered by executing code as part of the startup process. Generally, once portions of code or stored instructions are tagged, rules may be triggered by execution of such tagged code to generate further desired tags and also to place such generated tags on code and data. The foregoing and other aspects are described in more detail below.

At startup or booting of the system, the tag mode, such as stored in the tagmode CSR (e.g., 901r of the example 900), may initially be off (e.g., 911a of the example 910). The bootstrap ROM program may begin execution by first directly setting the defaulttag CSR (e.g., 901c of the example 900) to a special default tag value. The bootstrap program may then set the tagmode CSR to a mode in which the metadata processing domain writes the default tag stored in the defaulttag CSR to all results. In other words, while in the defaulttag tag mode (e.g., 911b of the example 910), the PUMP output Rtag is always the default tag value.
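The effect of the defaulttag mode on result tags can be sketched as below. The mode names are hypothetical string labels rather than the CSR encodings, and returning `None` for the off mode is an assumption made only so the sketch is complete.

```python
OFF, DEFAULTTAG, ENGAGED = "off", "defaulttag", "engaged"  # illustrative labels


def result_tag(tagmode, defaulttag_csr, rule_rtag):
    """Pick the tag written on a result for the current tag mode."""
    if tagmode == OFF:
        return None              # assumption: no tag is produced while off
    if tagmode == DEFAULTTAG:
        return defaulttag_csr    # Rtag is always the default tag
    return rule_rtag             # PUMP engaged: the rule decides


def init_memory(addresses, tagmode, defaulttag_csr):
    """Zero each location; its tag follows from the current tag mode."""
    return {addr: (0, result_tag(tagmode, defaulttag_csr, None))
            for addr in addresses}
```

Initializing memory while in the defaulttag mode therefore leaves every written location carrying the default tag.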

Subsequently, once memory locations have been initialized and tagged with the default tag, processing may be performed to generate an initial set of tags from which all other subsequent tags are further generated or derived (e.g., the initial set may be further used to derive one or more generations of tags in an unbounded manner). Such processing may include executing an instruction sequence or code segment that triggers rules which generate the initial set of tags. In this case, the tag mode may be set to a suitable tag mode level that engages the PUMP during execution of the code segment. For example, if the boot code is executing in hypervisor mode, the tag mode may be set to x110, as denoted by 911e, or x111, as denoted by 911f, to engage the PUMP during execution of the code segment, whereby rules are triggered and enforced as a result of the instructions of the code segment.

It should be noted that, prior to executing the code segment noted above, processing may be performed to verify or validate the code segment. For example, in at least one embodiment, the code segment may be stored in an encrypted form that, prior to execution, is decrypted and verified or validated (e.g., using a digital signature) to ensure that the code segment has not been altered or modified.

To further illustrate, the bootstrap program may include the following four instructions in the code segment noted above, which is executed while the PUMP is engaged, to generate the initial set of tags:

In instruction 1 above, R1 is a general-purpose register. Instruction 1 reads the boottag CSR, transferring both the value stored in the boottag CSR and the tag on the boottag CSR to R1. The boottag CSR was set to hold the particular tag either during processor reset or by a privileged-mode write of the CSR, including its tag. Reading from the boottag CSR may also clear the boottag CSR so that the bootstrap tag cannot be retrieved again after this initial read during boot.

In each of the add instructions forming instructions 2 through 4 above, Rn denotes the target or result register in which the result of the Add is stored, and Ry denotes a register that is a source operand. Instruction 2 of the foregoing code segment may trigger a second rule that generates a second tag from the first tag and places the second tag on the memory location pointed to by R2. Instruction 3 of the foregoing code segment may trigger a third rule that further generates a third tag from the second tag and places the third tag on the memory location pointed to by R3. Instruction 4 of the foregoing code segment may trigger a fourth rule that further generates a fourth tag from the third tag and places the fourth tag on the memory location pointed to by R4. In this manner, the foregoing code segment may be used to generate an initial set of four tags stored in registers as tag values. The foregoing general technique may be further extended in a similar manner to generate an initial set of any desired number of tags.
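The derivation chain can be illustrated with a small model. Here a tag is assumed to be a pair (name, generation), the boottag CSR is modeled as an object whose read clears it, and `derive_next` stands in for the rule triggered by each add instruction. All of these names and encodings are hypothetical.

```python
class BootTagCSR:
    """Models a boottag CSR whose first read also clears it."""

    def __init__(self, tag):
        self._tag = tag

    def read(self):
        tag, self._tag = self._tag, None
        return tag


def derive_next(tag):
    """Hypothetical rule output: a fresh tag derived from the previous one."""
    name, generation = tag
    return (name, generation + 1)


def generate_initial_tags(csr, count=4):
    """Instruction 1 reads the seed; each later add derives one more tag."""
    tags = [csr.read()]
    while len(tags) < count:
        tags.append(derive_next(tags[-1]))
    return tags
```

After the sequence runs, a further read of the modeled CSR yields nothing, mirroring the one-time retrieval described above.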

Generally, in at least one embodiment, when generating the initial set of tags, the number of particular tags in the initial set may be a predefined number. Each such special tag may be generated as a result of a different unique rule triggered when an instruction is executed. Each instruction, such as those of the code segment above, may result in a cache miss, whereby the cache miss handler executes and computes Rtag as part of the rule output for the particular instruction, where Rtag is one of the tags of the initial set. In a manner similar to the instructions of the code segment above, different code sequences may be executed at different points in time to further generate other tags using one of the tags of the initial set. Thus, each tag in the initial set may denote a tag generator used to further generate another sequence of tags. In the foregoing example, the Add instruction may be used to generate the next tag generator, which may in turn be used to generate another entire sequence of tags. As discussed below, the tag generators of the initial set (and further tag generators that are themselves used as starting points for generating other sequences) may be distinguished from regular or non-generating tags, which cannot be further used as generators of other tag sequences. Thus, a particular instruction such as ADD may be used to trigger rules and miss handling that generate a set or sequence of tag generators. This may be contrasted with another instruction, such as MOVE, which may trigger rules and miss handling that generate a non-generating tag of a sequence. In connection with code such as malloc, an ADD instruction may similarly be used to generate a new application tag color generator used to generate a sequence of different colors for a first application (e.g., the new application tag color generator may be APP1, used to generate the sequence of different colors RED-APP1, BLUE-APP1, GREEN-APP1, and so on, for the particular application). A tagged ADD instruction may then be used to obtain the next tag in the particular application-specific sequence, such as one of RED-APP1-gen, BLUE-APP1-gen or GREEN-APP1-gen. A tagged MOVE instruction may then be used to generate the actual color RED-APP1, BLUE-APP1 or GREEN-APP1 from RED-APP1-gen, BLUE-APP1-gen or GREEN-APP1-gen, respectively (where RED-APP1, BLUE-APP1 and GREEN-APP1 cannot be used to further generate additional tag sequences).
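The distinction between generator tags and non-generating tags might be modeled as follows. The `Tag` type, the fixed three-color list and the functions `add_rule` and `move_rule` are assumptions made for illustration; they mimic the ADD and MOVE behavior described above without reflecting any actual rule encoding.

```python
from dataclasses import dataclass

COLORS = ["RED", "BLUE", "GREEN"]  # illustrative color sequence


class Trap(Exception):
    """Models the trap raised when a non-generating tag is used as a generator."""


@dataclass(frozen=True)
class Tag:
    app: str
    index: int
    is_generator: bool

    @property
    def name(self):
        base = f"{COLORS[self.index]}-{self.app}"
        return base + "-gen" if self.is_generator else base


def add_rule(tag):
    """Tagged ADD: advance a generator to the next generator in the sequence."""
    if not tag.is_generator or tag.index + 1 >= len(COLORS):
        raise Trap("cannot generate from " + tag.name)
    return Tag(tag.app, tag.index + 1, True)


def move_rule(tag):
    """Tagged MOVE: turn a generator into its concrete, non-generating color."""
    if not tag.is_generator:
        raise Trap("not a generator: " + tag.name)
    return Tag(tag.app, tag.index, False)
```

Starting from a generator named RED-APP1-gen, ADD yields BLUE-APP1-gen while MOVE yields the concrete color RED-APP1, which can generate nothing further.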

The code segment of the bootstrap program executed while the PUMP is engaged may also include additional code that, when executed, triggers rules that tag kernel code or instructions and additionally tag other code modules or entities with any desired special instruction tags, thereby giving such specially tagged code the desired privileges or authorizations. For example, the code segment may include instructions that trigger rules which tag the loader code and tag the code of routines malloc and free with special instruction tags that extend privileges or authorizations, allowing such code to perform privileged tagging operations. The special code tags may be generated from the initial set of tags in a manner similar to that noted above, using predetermined code sequences or instruction sets that trigger rules to generate further desired tags and also to suitably tag additional code and/or data with such generated tags.

In at least one embodiment, additional measures or techniques may be taken in connection with portions of the code segment noted above. For example, the four instructions noted above used to generate the initial set of tags may be included in a first instruction sequence using rules of a "glue" policy that enforces sequentiality and atomicity, such as described elsewhere herein (e.g., the example 400).

After the above-noted code segment has executed to generate the initial set of tags and to further specially tag the kernel code and any other desired instructions, control may be transferred to additional boot code. In at least one embodiment based on the RISC-V architecture, the additional boot code may execute at the hypervisor privilege level. Such additional boot code may include, for example, instructions that trigger loading an initial set of rules into the PUMP. Once booting has completed, the PUMP tag mode, as denoted by the tagmode CSR, may be set to a level suitable for engaging the PUMP in connection with user code, such as code executing at the user privilege level (e.g., set to the tag mode denoting that the PUMP is engaged and operates only in U (user) mode or privilege level, as in 911c of example 910).

Referring to FIG. 72, shown is a flowchart of processing steps that may be performed in an embodiment in accordance with techniques herein. Flowchart 1600 summarizes the processing described above. At step 1602, the tag mode is set to an off state, whereby the tagmode CSR denotes the PUMP off state as described elsewhere herein in connection with 911a of example 910. At step 1604, the boottag CSR is initialized with the special bootstrap tag. At step 1606, execution of the bootstrap program commences. At step 1608, the bootstrap program may set the defaulttag CSR to the default tag. At step 1610, the tagmode CSR may be modified to the mode that writes the default tag on all results (e.g., each Rtag = default tag while in this tag mode). At step 1612, instructions may be executed that trigger rules that initialize memory locations and tag the memory locations with the default tag. At step 1614, the tagmode CSR may be changed to a mode that engages the PUMP during execution of the subsequent code segment of step 1616. At step 1616, the subsequent code segment is executed with the PUMP engaged. The code segment includes instructions that trigger rules that generate the initial set of tags, clear the boottag CSR, tag the kernel code, and tag additional code portions with special tags that give the code so tagged extended capabilities, authorizations, and privileges as desired. At step 1618, control may be transferred to additional boot code that is executed. When the boot process has completed, the system is ready to execute user code, with the PUMP engaged and in an operational state for user code execution.
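The ordering of steps 1602 through 1618 can be illustrated with a small software model. The following is only a sketch under stated assumptions: the class `BootModel`, the function `run_boot`, the tag mode strings, and the tag names are hypothetical placeholders and do not come from the specification, which defines the actual tagmode encodings in example 910.

```python
from dataclasses import dataclass, field


@dataclass
class BootModel:
    """Toy model of the CSRs and memory tags touched during boot (names are hypothetical)."""
    tagmode: str = "off"
    boottag: object = None
    defaulttag: object = None
    memory_tags: dict = field(default_factory=dict)
    initial_tags: list = field(default_factory=list)
    steps: list = field(default_factory=list)


def run_boot(model, n_words=4):
    """Walk the steps of flowchart 1600 in order, recording each step number."""
    model.tagmode = "off"                    # 1602: PUMP off
    model.steps.append(1602)
    model.boottag = "BOOTSTRAP"              # 1604: boottag CSR gets the bootstrap tag
    model.steps.append(1604)
    model.steps.append(1606)                 # 1606: bootstrap program starts
    model.defaulttag = "DEFAULT"             # 1608: defaulttag CSR set
    model.steps.append(1608)
    model.tagmode = "write-default"          # 1610: every result gets the default tag
    model.steps.append(1610)
    for addr in range(n_words):              # 1612: initialize and tag memory
        model.memory_tags[addr] = model.defaulttag
    model.steps.append(1612)
    model.tagmode = "engaged"                # 1614: PUMP engaged for the next segment
    model.steps.append(1614)
    # 1616: derive the initial tag set from the bootstrap tag, then clear boottag.
    model.initial_tags = [
        f"{model.boottag}/{name}"
        for name in ("special-instr", "malloc", "cfi", "taint")
    ]
    model.boottag = None
    model.steps.append(1616)
    model.steps.append(1618)                 # 1618: hand off to additional boot code
    model.tagmode = "engaged-user-only"      # after boot: PUMP engaged for user code
    return model
```

The model only captures ordering and the fact that the bootstrap tag is cleared once the initial tag set exists; it does not model rule evaluation.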

What will now be described in more detail is how tags are generated from the bootstrap tag. The tag generation processing that begins with the bootstrap tag may also be referred to as the tag tree or the tree of life. More generally, the tag generation process forms a hierarchical structure as illustrated in example 1620 of FIG. 73.

Example 1620 illustrates boottag 1621 as the origin of the tag generation process. Elements 1621a-1621d may denote the initial set of tags generated, for example, as described above. In this example, the initial set of tags 1621a-1621d may include an initial OS special instruction tag 1621a that is used to further generate a sequence 1622 of an infinite number of special instruction tags, which may then be applied 1623 to tag the instructions of different code portions or modules 1624. From the initial OS special instruction tag 1621a, additional tags 1622 may be generated for the different modules to be tagged. For example, a first OS special instruction tag 1622a is generated for malloc and applied 1623a to malloc, whereby the instructions of malloc are tagged 1624a with the special instruction tag 1622a. In this manner, the malloc code may be tagged with a special instruction tag that identifies malloc as a tag generator (e.g., denoting that the malloc code has the authority to further generate other new tags and to further use the newly generated tags to tag other memory cells).

In this example regarding malloc, 1621b may be an initial malloc tag that is used to further generate malloc tag generator application tags 1626, one per user application, since an instance of malloc is included in each user application. It is desired to give each such malloc instance in each user application the authority to generate the different colored tags included in 1625.

Generally, example 1620 illustrates the initial set of tags 1621a-1621d for the special instruction tag 1621a, Malloc 1621b, CFI 1621c, and Taint 1621d. Thus, each of the tags 1621a-1621d in the vertical arrangement of the first row of tags (other than boottag 1621) denotes a different initial tag used to generate an infinite tag sequence. For example, value 1621a is used to further derive or generate an infinite number of special instruction tags 1622. Value 1621b is used to further derive or generate an infinite number of values 1626. Each instance of 1626 may be further used as a generator of another infinite sequence of tags for each application. For example, 1626a denotes a generator value used to further generate another infinite sequence 1629 of different colors used for a single application, app1. In a similar manner, each different generator value of 1626 may be used to further generate an infinite number of colors for each application.

Value 1621c may be used as a generator to further generate an infinite number of values 1627. Element 1627 is similar to 1626 in that each occurrence of CFI tag generator N, for a particular application or app N, denotes the authority or capability to further generate another infinite sequence. For example, 1627a denotes a generator value used to further generate another infinite sequence 1630 of different colors used for a single application, app1. In a similar manner, each different generator value of 1627 may be used to further generate an infinite number of colors for each application.

Value 1621d may be used as a generator to further generate an infinite number of values 1628. For element 1628, each occurrence of tag generator N, for a particular application or app N, denotes the authority or capability to further generate another infinite sequence. For example, 1628a denotes a generator value used to further generate another infinite sequence 1631 of different colors used for a single application, app1. In a similar manner, each different generator value of 1628 may be used to further generate an infinite number of colors for each application.

As illustrated, the sequences or subtrees for CFI and Taint, originating respectively from 1621c and 1621d, are similar to the Malloc subtree originating from 1621b. In example 1620, nxtTag or TInxtTag is used to denote the next element of a generated infinite sequence, and getTag is used to extract the next tag from a sequence member. Generally, getTag may be used to denote extracting the tag itself for use, as opposed to a tag generator. When a usable tag is provided to a particular code portion that is intended to use it, it is not desired to also give that code portion the ability to generate tags. For example, it is desired to give each application the Malloc tag generator for that application (e.g., App1ColorTagX), but it is not desired to give the application the ability to generate Malloc tag generators for other applications. Thus, getTag changes the type from a generator to an instance. The difference between nxtTag and TInxtTag is that nxtTag may be used without a "tagged instruction", whereas TInxtTag may be used only by suitably tagged instructions.
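The distinction between a tag generator and a usable tag instance can be sketched with two types. This is a minimal illustration under assumptions: the classes `Generator` and `Instance`, the `family`/`index` representation, and the instruction-tag check in `ti_nxt_tag` are hypothetical modeling choices, not the encoding used by the described hardware or rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Generator:
    """A sequence member that can still produce further members."""
    family: str
    index: int


@dataclass(frozen=True)
class Instance:
    """A usable tag; it carries no authority to generate more tags."""
    family: str
    index: int


def nxt_tag(gen):
    """nxtTag: advance to the next element of the sequence (no special instruction tag needed)."""
    return Generator(gen.family, gen.index + 1)


def ti_nxt_tag(gen, instruction_tag, required_tag):
    """TInxtTag: like nxtTag, but only permitted for suitably tagged instructions."""
    if instruction_tag != required_tag:
        raise PermissionError("instruction is not tagged to advance this sequence")
    return nxt_tag(gen)


def get_tag(gen):
    """getTag: change the type from generator to instance."""
    return Instance(gen.family, gen.index)
```

Because `get_tag` returns an `Instance`, code that receives only the result cannot call `nxt_tag` on it in this model, which mirrors the intent that an application may use its colors without being able to mint generators for other applications.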

The Malloc Application Tag sequence 1626 allows the operating system or loader to generate a color tag generator for each application. For example, element 1626a denotes an application-specific color tag generator value used to generate the tags of application color sequence 1629. Within an application, the AppYColorTag sequence 1629 allows malloc to generate the authority for each color. That color authority may be used to color cells for allocated memory, to color the pointer for the allocation, and to free cells of that color (e.g., when free is called). The use of colors, such as with malloc and free, is described elsewhere herein.

In this manner, different tags may be reserved for different uses. From the initially tagged kernel instructions as noted above, kernel code may execute that further tags other code portions with different capabilities or authorizations. For example, kernel code of the operating system may further tag other code entities, such as the loader, with special privileges, such as giving the loader the ability to further tag other code and data, to generate additional tag generators, and the like. When loading a user program that includes malloc, the loader may further tag the malloc code with special instruction tag(s) that mark it as malloc code, thereby giving it the ability to further generate other tags used to color different memory regions. Thus, a particular instruction tag placed on the code of the loader gives the loader one set of privileges. Placing a second, different instruction tag on the malloc code gives the malloc code a different set of privileges. Generally, when performing tag generation for a sequence, the current tag in the sequence is saved as state information that is referenced and used in connection with generating the next tag in the sequence. As described herein, such state information regarding the current tag in the sequence may be stored and used in the metadata processing domain. The current tag, or more generally metadata processing state information, may be saved and restored as a result of rule processing and cache miss processing. The current tag in a sequence, such as the last color allocated for use by a particular application, may be stored as a tag on a memory location designated to hold the current state of the sequence. When a new next color for the application is to be allocated, code of malloc triggers rules that retrieve the last color allocated for the application and use that last allocated color to determine the next color in the application-specific color sequence. Generally, generating a unique sequence of tags may include executing instructions that trigger rules that perform the following:

1. Save/store the sequence state in the tag portion of an atom (e.g., a register or memory location);

2. Execute an instruction that triggers a rule that uses the saved/stored sequence state to generate the next tag in the sequence; and

3. Save/store the next tag in the sequence (generated in step 2) in the tag portion of the atom, whereby the next tag is now the updated current state of the sequence.
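The three steps above can be sketched as follows. This is a simplified software model, assuming a hypothetical `Atom` class whose `tag` field stands in for the tag portion of a register or memory word, and a hypothetical successor function `next_in_sequence` that simply increments an index; the actual rule that derives the next tag is policy-defined.

```python
class Atom:
    """A value paired with a tag; only the tag is used to hold sequence state here."""

    def __init__(self, value=0, tag=None):
        self.value = value
        self.tag = tag


def next_in_sequence(current):
    """Hypothetical successor function: (family, n) -> (family, n + 1)."""
    family, n = current
    return (family, n + 1)


def allocate_next(state_atom):
    """Generate the next tag of a sequence whose state lives in state_atom's tag."""
    current = state_atom.tag              # step 1: state was saved in the tag portion
    new_tag = next_in_sequence(current)   # step 2: rule derives the next tag
    state_atom.tag = new_tag              # step 3: next tag becomes the current state
    return new_tag
```

For example, if the loader stores the tag `("app1-color", 0)` on a designated location, each subsequent call to `allocate_next` on that location returns a fresh color for app1 and leaves the most recently allocated color as the saved state.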

Referring again to example 1620, the loader may allocate, for each application using malloc, a particular one of the malloc tag generator application tags of 1626. The loader may execute code that triggers a rule that generates the next malloc tag generator tag, such as 1626a, and may then save this tag as state information by tagging a memory location. Subsequently, on the first call to malloc by the application, malloc code may execute that triggers rules that retrieve the saved malloc tag generator tag, use the saved tag to generate a first color for the application, and then update the saved state to store the first color as the last or most recent color generated for the application. On a second call to malloc by the application, malloc code may execute that triggers rules that retrieve the previously saved first color, use the saved first color to generate a second color for the application, and then update the saved state information to now store the second color as the last or most recent color generated for the application. In a similar manner, other subsequent calls to malloc may trigger other rules that allocate additional colors based on the state information saved for the application (e.g., the most recently allocated color).

What will now be described are aspects of a direct memory access (DMA) architecture that may be included in an embodiment in accordance with techniques herein. Generally, the following paragraphs describe use of an I/O PUMP that mediates DMAs issued by a source, such as an untrusted device connected to a first interconnect fabric that uses untagged data, to access data stored in a memory of a second interconnect fabric that uses tagged data.

Referring to FIG. 74, shown is an example of components that may be included in an embodiment described herein. Example 1500 includes components numbered similarly to those of example 700 and of other examples (e.g., FIGS. 57-60) described elsewhere herein. Additionally, example 1500 includes I/O PUMP 1502 and additional actors, namely DMA request sources or initiators 1504a-1504c that may issue DMA requests to access data stored in memory 712c. Example 1500 includes Ethernet DMA device A 1504a, Ethernet DMA device B 1504b, and a UART (universal asynchronous receiver/transmitter) or serial communication device 1504c, each connected to untagged fabric 715. A DMA request to read or write data may originate from one of the devices 1504a-1504c. The request is forwarded to I/O PUMP 1502, which performs processing to determine whether the DMA request is allowed and, if so, allows the request to proceed. Thus, I/O PUMP 1502 may be characterized as mediating DMA requests received over untagged fabric 715, where the general assumption is that devices connected to 715 that issue such DMA requests may be untrusted.

In at least one embodiment, I/O PUMP 1502 may be an instantiation of the PUMP as described herein (e.g., FIG. 22), with the difference that the rules enforced are rules of a DMA policy that controls DMA access to memory 712c. The foregoing use of I/O PUMP 1502 is consistent with the general architecture of ensuring that all instructions, including memory operations, are mediated by rules. If autonomous DMA devices 1504a-1504c were allowed direct, unmediated access to memory, the DMA devices 1504a-1504c could undermine the invariants and safety properties that the rules enforce. Consequently, to allow DMA, an embodiment in accordance with techniques herein may also enforce rules regarding DMA access to memory 712c. Similar to the PUMP, which enforces rules on processor instructions, I/O PUMP 1502 enforces the rules required for memory loads and stores from DMA devices such as 1504a-1504c. Generally, the I/O PUMP mediates all loads and stores. In at least one embodiment described herein based on the RISC-V architecture, the I/O PUMP uses CSRs and performs rule cache miss processing in a manner similar to that described elsewhere herein in connection with the PUMP used with the RISC-V architecture. I/O PUMP 1502 has a set of CSRs similar to those of the PUMP, but they are accessed via memory-mapped addresses. Access to the I/O PUMP CSRs, such as described in following paragraphs in connection with example 1520, may also be tag protected using rules. A rule cache miss encountered when attempting to locate a rule in the I/O PUMP may trigger an interrupt to be serviced by the processor, RISC-V CPU 702. The I/O PUMP uses the same rule resolution process as processor 702, but there is a single DMA policy that includes only rules for DMA loads and stores attempting to access data in memory 712c. The I/O PUMP writes to memory 712c atomically (e.g., writes the tag and the value as a single atomic operation). However, in some embodiments, the complete process from reading the Mtag through writing the Mtag (e.g., the processing of performing or validating the tag check and then writing) may not be atomic with respect to standard stores.

I/O PUMP 1502 is a rule cache for the SDMP. The I/O PUMP provides the mapping between the set of tags involved in a DMA operation and the result of the operation. In at least one embodiment, the I/O PUMP runs independently of processor 702. Since I/O PUMP 1502 is a cache, it will miss when it has not previously seen a set of inputs (compulsory) or when it could not retain the rule (capacity or perhaps conflict). This results in a rule cache miss in connection with the I/O PUMP, in a manner similar to the rule cache misses described herein for the PUMP. A miss in I/O PUMP rule cache 1502 causes an interrupt that is handled in software by the rule cache miss handler system, the same one that services miss traps for processor 702. On a rule miss in I/O PUMP 1502, the inputs are communicated to the miss handler (e.g., executing as code on processor 702 in the metadata processing domain) through the I/O PUMP CSRs described below (e.g., example 1520), and the rule insertion is provided back to the I/O PUMP through the CSRs. An I/O PUMP miss causes the I/O PUMP to be disabled until it is serviced by processor 702. In at least one embodiment, the disabled state of the I/O PUMP means that all DMA transfers mediated by the I/O PUMP are stalled until the I/O PUMP miss has been serviced.

Consistent with discussion of the PUMP elsewhere herein, the I/O PUMP inputs include the opgroup (opgrp) and the tags for the DMA instruction and its operands (e.g., PCtag, CI tag, OP1 tag, OP2 tag, and Mtag (sometimes also referred to herein as MRtag)). The I/O PUMP outputs may include the Rtag and the PCnew tag (the tag for the PC of the next instruction) as described herein. In connection with the I/O PUMP, such inputs and outputs may have different meanings and values in one embodiment, as follows.

The following are the I/O PUMP inputs in one embodiment:

1. Opgrp - currently there are two: load and store

2. PCtag - state of the DMA I/O device (similar to the PCtag for code)

3. CItag - tag identifying the DMA I/O device (similar to an instruction tag for a designated code region)

4. OP1tag - always assumed to be "public, untrusted" (not physically represented in the I/O PUMP cache, but used in rules)

5. OP2tag - same as OP1tag

6. Mtag - tag on the memory input to the DMA operation

7. byteenable - which bytes are being read/written?

The following are the I/O PUMP outputs in one embodiment:

8. Rtag - tag on the memory result for a store

10. PCnew tag - state of the DMA I/O device after this operation
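The inputs and outputs listed above can be pictured as the key and value of a rule cache entry. The following is a minimal sketch, assuming hypothetical names (`IoPumpInput`, `IoPumpOutput`, `IoPumpRuleCache`) and string-valued tags; OP1tag and OP2tag are modeled as a constant rather than stored in the key, in line with the note that they are not physically represented in the cache.

```python
from typing import NamedTuple, Optional

# OP1tag/OP2tag are always assumed "public, untrusted", so they are not part of the key.
PUBLIC_UNTRUSTED = "public-untrusted"


class IoPumpInput(NamedTuple):
    opgrp: str        # "load" or "store"
    pctag: str        # state of the DMA I/O device
    citag: str        # identifies the DMA I/O device
    mtag: str         # tag on the memory word being accessed
    byteenable: int   # bit mask of the bytes being read/written


class IoPumpOutput(NamedTuple):
    rtag: Optional[str]  # tag on the memory result for a store (None for a load)
    pcnew: str           # device state after the operation


class IoPumpRuleCache:
    """Maps a set of input tags to the result of the DMA operation."""

    def __init__(self):
        self._rules = {}

    def insert(self, key, out):
        self._rules[key] = out

    def lookup(self, key):
        # Returns None on a miss; the caller would then raise the miss interrupt.
        return self._rules.get(key)
```

A lookup that returns `None` corresponds to the rule cache miss that is serviced in software, as described in the surrounding text.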

In the I/O PUMP, there may be no programmable opgroup mapping table (e.g., example 420). Rather, the opgroup used by the I/O PUMP to look up rules may be a fixed opcode denoting a single opgroup for DMA load and DMA store operations. In at least one embodiment, there is no care masking for the I/O PUMP.

When there is a rule cache miss in connection with the PUMP, such as described herein in FIG. 22, processor 702 may be expected to automatically reissue the instruction that caused the miss after the corresponding rule has been inserted into the PUMP rule cache. Consequently, rule insertion simply places the rule into the PUMP cache and expects the instruction to be reissued to obtain the tagged result. However, the behavior for DMA operations differs from the foregoing. DMA operations are not expected to be interrupted and to require retry. To support such DMA operations, rule insertion for the I/O PUMP may be handled differently. In particular, once the I/O PUMP faults due to a miss, processing holds the pending DMA operation and waits for processor 702 to supply the missing output tags for the rule (e.g., by performing rule miss processing to compute the output tags Rtag and PCnew tag for the new rule), assuming the operation is allowed. When the outputs are supplied, in addition to triggering the rule write into the I/O PUMP, the outputs are forwarded into the DMA pipeline (e.g., described below in connection with example 1540) as if they had come from the I/O PUMP, so that the operation can continue without forcing the operation to be reissued to the I/O PUMP. A rule violation may be handled by supplying a designated disabled-DMA-device tag as the updated PCtag, the PCnew tag, which signals that the operation is not allowed and that no further DMA operations will be allowed from that particular DMA device 1504a-1504c until its PCtag is reset. Generally, the device tags for a particular DMA device issuing a DMA operation or DMA request, such as one of 1504a-1504c, may be a particular value of the CI tag that uniquely identifies the issuing DMA device (e.g., the source of the DMA request) and a particular value of the PC tag that denotes the current state of the DMA device. In at least one embodiment, the PC tag may be set, at a point in time, to a particular value that disables further processing of DMA requests from the particular DMA device identified by the CI tag.
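The miss and violation handling described above can be sketched as a small model. The names here (`IoPump`, `request`, `reset_device`, the `policy` callback, the `"idle"` state, and the tag strings) are hypothetical, and the choice to refuse requests from a disabled device before consulting the cache is an assumption made for brevity, not something the text specifies.

```python
DISABLED = "disabled-DMA-device"


class IoPump:
    """Toy I/O PUMP: rule cache plus per-device PC tags (all names hypothetical)."""

    def __init__(self, policy):
        self.rules = {}          # (opgrp, pctag, citag, mtag) -> (rtag, pcnew)
        self.policy = policy     # software miss handler; returns (rtag, pcnew)
        self.device_pctag = {}   # citag -> current PC tag of that device
        self.misses = 0

    def reset_device(self, citag):
        self.device_pctag[citag] = "idle"

    def request(self, citag, opgrp, mtag):
        """Return True if the DMA operation is allowed to proceed."""
        pctag = self.device_pctag.get(citag, "idle")
        if pctag == DISABLED:
            return False         # assumption: refuse until the PC tag is reset
        key = (opgrp, pctag, citag, mtag)
        out = self.rules.get(key)
        if out is None:
            # Miss: the pending operation is held while software computes the outputs.
            self.misses += 1
            out = self.policy(key)
            self.rules[key] = out
            # The outputs are used directly below, so the operation is not reissued.
        rtag, pcnew = out
        self.device_pctag[citag] = pcnew
        return pcnew != DISABLED
```

A usage sketch: with a policy that allows device `"eth-A"` to store only to words tagged `"dma-buffer"`, the first store misses and is completed with the handler's outputs, the second store hits the cache, and a store to a word tagged `"kernel"` yields the disabled-device tag, after which further requests from `"eth-A"` are refused until `reset_device("eth-A")` is called.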

Referring to FIG. 75, shown is a table of CSRs that may be used by the I/O PUMP in an embodiment in accordance with the techniques herein. Table 1520 includes an address column 1524 (denoting the memory-mapped address of the CSR), a name column 1526, and a description column 1528. Each row of table 1520 corresponds to one of the defined CSRs used by the I/O PUMP. Row 1522a indicates that the transaction id CSR has address 0x00. A write to the transaction id CSR increments the stored current transaction id (e.g., for prefetching), and a read from the transaction id CSR returns the current transaction id stored in it. Row 1522b indicates that the opgrp CSR has address 0x01. The opgrp CSR contains the opgroup for the current DMA instruction and is used as an input to the rule miss handler on a rule miss. Row 1522c indicates that the byteenable CSR has address 0x02. The byteenable CSR indicates which bytes of the word the DMA operation affects and is used as an input to the rule miss handler on a rule miss. Consistent with other discussion herein, this allows policies to provide byte-level protection; a triggered rule may check whether the bytes of the DMA-requested data are allowed to be accessed by the particular DMA device initiating the request, for example by specially tagging the portions of memory that are accessible to different DMA devices. Row 1522d indicates that the pctag CSR has address 0x03. The pctag CSR contains the PC tag for the current DMA instruction and is used as an input to the rule miss handler on a rule miss. Row 1522e indicates that the citag CSR has address 0x04. The citag CSR contains the CI tag for the current DMA instruction and is used as an input to the rule miss handler on a rule miss. Row 1522f indicates that the mtag CSR has address 0x07. The mtag CSR contains the M tag for the current DMA instruction and is used as an input to the rule miss handler on a rule miss. Row 1522g indicates that the newpctag CSR has address 0x08. The newpctag CSR contains the PC new tag that is placed on the PC after completion of the current DMA instruction (e.g., the output of the PUMP and of cache miss handling). Row 1522h indicates that the rtag CSR has address 0x09. The rtag CSR contains the tag that is placed on the memory result of the current DMA instruction (e.g., the output of the PUMP and of cache miss handling). Row 1522i indicates that the commit CSR has address 0x0A. Writing to the commit CSR causes the value written to be compared with the current transaction id (stored in the transaction id CSR). If the two match, the match triggers a write of the rule into the I/O PUMP. The rule written includes the opcode for the current DMA instruction and the tag inputs and outputs (as determined by miss processing). Row 1522j indicates that the status CSR has address 0x0E. The status CSR contains a value indicating the state of the I/O PUMP. For example, in one embodiment as described herein, the status CSR may indicate whether the I/O PUMP is enabled or disabled. The I/O PUMP may be disabled upon an I/O PUMP rule cache miss, as described elsewhere herein. Row 1522k indicates that the flush CSR has address 0x0F. The flush CSR, when written, triggers a flush of the I/O PUMP (e.g., flushes or clears the rules from the I/O PUMP cache).
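By way of illustration only, the CSR map of table 1520 and the commit behavior described above may be sketched in software as follows. This is a toy model under stated assumptions, not the hardware of any embodiment: the class names, the use of a dictionary as the rule cache, and the choice of which CSRs form the rule key are hypothetical; only the CSR names and addresses come from the table.

```python
from enum import IntEnum


class IoPumpCsr(IntEnum):
    """Memory-mapped CSR addresses as listed for table 1520."""
    TRANSACTION_ID = 0x00
    OPGRP = 0x01
    BYTEENABLE = 0x02
    PCTAG = 0x03
    CITAG = 0x04
    MTAG = 0x07
    NEWPCTAG = 0x08
    RTAG = 0x09
    COMMIT = 0x0A
    STATUS = 0x0E
    FLUSH = 0x0F


# Assumption: these CSRs form the rule inputs and the rule outputs.
RULE_INPUTS = (IoPumpCsr.OPGRP, IoPumpCsr.BYTEENABLE, IoPumpCsr.PCTAG,
               IoPumpCsr.CITAG, IoPumpCsr.MTAG)
RULE_OUTPUTS = (IoPumpCsr.NEWPCTAG, IoPumpCsr.RTAG)


class IoPumpCsrModel:
    """Hypothetical software model of the I/O PUMP CSR block."""

    def __init__(self):
        self.regs = {csr: 0 for csr in IoPumpCsr}
        self.rule_cache = {}  # rule inputs -> rule outputs

    def read(self, csr):
        return self.regs[csr]

    def write(self, csr, value):
        if csr == IoPumpCsr.TRANSACTION_ID:
            # A write increments the stored id; the written value is ignored.
            self.regs[csr] += 1
        elif csr == IoPumpCsr.COMMIT:
            # The rule is installed only if the written value matches
            # the current transaction id.
            if value == self.regs[IoPumpCsr.TRANSACTION_ID]:
                key = tuple(self.regs[c] for c in RULE_INPUTS)
                self.rule_cache[key] = tuple(self.regs[c] for c in RULE_OUTPUTS)
        elif csr == IoPumpCsr.FLUSH:
            # Any write to flush clears the cached rules.
            self.rule_cache.clear()
        else:
            self.regs[csr] = value
```

In this sketch a miss handler would read the input CSRs, write newpctag and rtag, and then write the transaction id it observed to the commit CSR; a stale id leaves the rule cache unchanged.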

In at least one embodiment, if bit 0 of the status CSR is 1, the I/O PUMP is enabled; otherwise, if bit 0 has a value of 0, the I/O PUMP is disabled. An I/O PUMP miss disables the PUMP. Bit 1 of the status CSR indicates whether the I/O PUMP has faulted and is awaiting service (e.g., bit 1 = 1 means an I/O PUMP fault or cache miss has occurred and is awaiting service). Bit 2 of the status CSR indicates whether an I/O PUMP rule miss is currently being resolved by the rule cache miss handler, in which case, if the transaction id matches, the inserted result will be provided directly to the pending miss operation. All of the foregoing bits of the status CSR are reset by a commit operation, whether it succeeds or fails (e.g., bit 0 = enabled, bit 1 = not faulted, bit 2 = no pending miss). A write to the status CSR may also be performed to reset the foregoing bits, for example as needed to initially enable the I/O PUMP at startup. Resetting the status CSR after a failed write allows the DMA device to retry the operation and trigger the fault again.
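The status bits described above may be illustrated with the following minimal sketch. It assumes that bit 0 = 1 denotes an enabled I/O PUMP (consistent with the reset state "bit 0 = enabled"); the constant and function names are hypothetical and serve only to show how the three bits might be decoded and updated.

```python
# Assumed bit assignments for the status CSR (bit 0 polarity: 1 = enabled).
STATUS_ENABLED = 0b001       # bit 0: I/O PUMP enabled
STATUS_FAULTED = 0b010       # bit 1: faulted / missed, awaiting service
STATUS_MISS_PENDING = 0b100  # bit 2: miss being resolved by the miss handler

# State after a commit (success or failure) or an explicit status reset.
STATUS_RESET = STATUS_ENABLED


def decode_status(value):
    """Return the three status flags as booleans."""
    return {
        "enabled": bool(value & STATUS_ENABLED),
        "faulted": bool(value & STATUS_FAULTED),
        "miss_pending": bool(value & STATUS_MISS_PENDING),
    }


def on_rule_miss(value):
    """A miss disables the PUMP and marks it as awaiting service."""
    return (value & ~STATUS_ENABLED) | STATUS_FAULTED


def on_miss_handler_start(value):
    """The miss handler has begun resolving the pending miss."""
    return value | STATUS_MISS_PENDING


def on_commit(_value):
    """A commit, successful or not, resets all three bits."""
    return STATUS_RESET
```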

Load and store memory operations by processor 702 to the I/O PUMP CSRs must be tagged with an iopump CI tag. A policy rule should be in place that restricts such operations to instructions having the iopump CI tag. The individual I/O PUMP CSRs are not themselves tagged.

Each device 1504a through 1504c on the untagged or untrusted fabric 715 may be configured with its own tag, which is presented as the device tag when the processor performs a load or store to the device (e.g., see 1534b, described below, where the device tag is stored in a register file and is specified as the CI tag when the particular device performs a DMA load or store). This enables fine-grained control over which code and authorities may directly access which devices. The same tag is presented for all loads and stores to a device, and the tag is not changed by load and store operations. The specific device tag associated with and identifying each device 1504a through 1504c may be stored in a device register file, and the device tag specified for a device can be changed only by modifying the device register file. The device register file may indicate, for each device 1504a through 1504c, a unique target device id (used to identify the device on the untagged or untrusted fabric 715) and the device-specific tag for that target device id. In at least one embodiment, the device register file may itself be accessed as a device on the untrusted fabric 715 using its own device tag. To bootstrap use of the device register file, the register file's own tag (which is stored in the device register file) may be written to the file during startup before the PUMP is enabled. For example, the device tag register file's own tag may be written as part of boot processing while the PUMP is off (e.g., a tagmode as denoted by 911a of example 910). The CI tag of an instruction may identify the target ID of the DMA target device performing a load or store instruction, and the CI tag may be used in rules triggered by such load and store operations to restrict (e.g., allow or disallow) particular load or store operations by the specified DMA device. Additionally, if a particular DMA device performs a load and/or store operation that is not allowed, the state associated with that DMA device may be changed to disabled, whereby further requests (e.g., DMA loads and stores) from that device are ignored.
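As a rough illustration of the device register file just described, the following sketch keeps one entry per target device id holding that device's CI tag (its identity) and PC tag (its state). All names, including the tag strings "dma-ok" and "dma-disabled", are hypothetical placeholders chosen for this example.

```python
from dataclasses import dataclass

# Hypothetical PC tag values for the enabled and disabled device states.
PCTAG_OK = "dma-ok"
PCTAG_DISABLED = "dma-disabled"


@dataclass
class DeviceEntry:
    citag: str  # device tag that uniquely identifies the device
    pctag: str  # current state of the device


class DeviceRegisterFile:
    """Toy model: target device id -> (CI tag, PC tag)."""

    def __init__(self):
        self._entries = {}

    def configure(self, device_id, citag, pctag=PCTAG_OK):
        # Device tags change only by modifying the register file.
        self._entries[device_id] = DeviceEntry(citag, pctag)

    def tags_for(self, device_id):
        """Supply the CI tag and PC tag for a DMA load or store."""
        entry = self._entries[device_id]
        return entry.citag, entry.pctag

    def disable(self, device_id):
        # After a disallowed operation, further requests are ignored.
        self._entries[device_id].pctag = PCTAG_DISABLED

    def reset(self, device_id):
        self._entries[device_id].pctag = PCTAG_OK

    def is_disabled(self, device_id):
        return self._entries[device_id].pctag == PCTAG_DISABLED
```

A front end to the DMA pipeline could call `is_disabled` before accepting a request, so that a device in the disabled state is turned away until its PC tag is reset.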

As noted above, a DMA device that initiates, or is the source of, a DMA request or instruction may have an associated state denoted by the PCtag of the DMA device. In particular, a unique PCtag may be used to denote a disabled state in which no DMA operations are allowed from the DMA device (identified by the CI tag). A disabled initiator may have its DMA requests rejected at the start of the DMA or Trustbridge pipeline described below (e.g., examples 1530 and 1540).

It should be noted that an embodiment may have a single I/O PUMP that mediates all DMA traffic, one I/O PUMP per DMA engine, or multiple I/O PUMPs that mediate DMA traffic for multiple DMA engines. Example 1510 shows a single I/O PUMP for a single DMA engine (e.g., a single memory 712c). Use of a single I/O PUMP as in example 1500 may become a bottleneck, so an embodiment may choose to have multiple I/O PUMPs mediate I/O traffic. In such an embodiment with multiple I/O PUMPs, each may be independently enabled or disabled, so that even if a first portion of one or more of the I/O PUMPs is disabled (due to an I/O PUMP miss), the remaining second portion of the I/O PUMPs may remain enabled and continue servicing DMA requests.

In at least one embodiment, different DMA devices acting as initiators or sources of DMA operations may each be allowed to access only specified portions of memory 712c. The different portions of memory 712c accessible via DMA may each be tagged with a distinct tag. For example, device 1504a may access a first address range of memory 712c and device 1504b may access a different, second address range of memory 712c. The locations of memory 712c corresponding to the first range may be tagged with a first tag, and the locations of memory 712c corresponding to the second range may be tagged with a second tag. In this manner, rules may be used that enforce or restrict access by device 1504a to memory locations in the first range and access by device 1504b to memory locations in the second range. As a variation, different tags may be associated with the type of access allowed (e.g., read-only, write-only, or read and write). In a similar manner, in an embodiment with multiple DMA engines accessing the same memory 712c, the different portions of the single memory 712c that are exclusively accessible to each DMA engine may be uniquely tagged, whereby rules enforce and restrict access by each DMA engine to its specified address range of memory locations.
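The per-device range restriction described above may be sketched as follows. This is an illustrative model only: the function names, the tag strings, and the representation of rules as a set of (CI tag, memory tag, operation) triples are assumptions made for the example.

```python
def tag_range(mem_tags, start, length, tag):
    """Tag every location in [start, start + length) with the given tag."""
    for addr in range(start, start + length):
        mem_tags[addr] = tag


def dma_allowed(rules, mem_tags, citag, addr, op):
    """Check whether the device identified by citag may perform op at addr.

    rules is a set of (citag, mtag, op) triples that are permitted;
    anything not listed is disallowed.
    """
    mtag = mem_tags.get(addr)
    return (citag, mtag, op) in rules


# Hypothetical setup: device A owns addresses 0..15 (read and write),
# device B owns addresses 16..31 (read-only).
mem_tags = {}
tag_range(mem_tags, 0, 16, "buf-A")
tag_range(mem_tags, 16, 16, "buf-B")
rules = {
    ("dev-A", "buf-A", "read"),
    ("dev-A", "buf-A", "write"),
    ("dev-B", "buf-B", "read"),
}
```

With this setup, `dma_allowed(rules, mem_tags, "dev-A", 20, "read")` is false because address 20 carries the tag of the second range, for which no rule names device A.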

Referring to FIG. 76, shown is an example illustrating data flow between a trusted fabric 1532 (e.g., corresponding to the tagged interconnect fabric 710) and an untrusted fabric 1536 (e.g., corresponding to the untagged interconnect fabric 715) in an embodiment in accordance with the techniques herein. Element 1534 generally represents processing performed by the I/O PUMP 1534a in connection with DMA mediation between 1532 and 1536. Element 1534 may represent the Trustbridge or DMA pipeline 1534c that is performed to validate and service DMA operations as part of DMA mediation. Element 1538a may represent the output channel from the untrusted fabric 1536 (e.g., to the DMA devices 1504a through 1504c of example 1500). Element 1538b may represent the input channel to the untrusted fabric 1536 (e.g., from one of devices 1504a through 1504c). In general, the I/O PUMP 1534a needs to issue read requests during both DMA read and DMA write operations in order to verify that the tag on the target memory permits the requested DMA access. The I/O PUMP will need to buffer requests (as described between processing stages in example 1540 below) and perform master control of the tagged communication operations.

Element 1537 represents values provided as input to load (or retrieve) the I/O PUMP CSRs as described in example 1520. Additionally, device state information for the different DMA device initiators may be stored in the untrusted fabric device register file 1534b, which includes the PCtag (e.g., the state of the DMA device, such as whether requests from this DMA device are disabled) and the CItag (e.g., the unique identifier of the DMA device) of each DMA device initiator (e.g., 1504a through 1504c on the untrusted fabric 715). The entry of the device register file 1534b for the particular DMA device performing a DMA load or store may supply the CItag and PCtag values for the current DMA load or store. Element 1535a denotes the channel used by devices on the untrusted fabric 1536 to make DMA requests that are mediated by the processing of 1534. Element 1535b may denote the channel used to return the results of the mediated DMA requests of 1534 to the untrusted fabric 1536.

Elements 1531a and 1531b denote channels for forwarding DMA requests from the untrusted fabric 1536 (through the DMA mediation processing of 1534) to the trusted fabric 1532. In particular, channel 1531a forwards the initial, unvalidated DMA request, which reads the tag, to the trusted fabric 1532, and channel 1531b is a second channel that forwards the final write of the data with the updated tag. The use of two channels may become more apparent in light of the further discussion of the DMA or Trustbridge pipeline described below in connection with example 1540. Element 1531c denotes the channel from the trusted fabric 1532, through the DMA mediation processing 1534, to the untrusted fabric 1536.

In one embodiment, element 1534 may represent a DMA processing pipeline as illustrated in example 1540 of FIG. 77. Example 1540 illustrates a four-stage processing pipeline for servicing DMA operations performed by DMA devices 1504a through 1504c from the untrusted or untagged fabric (e.g., 1506 of example 1500, 1536 of example 1530). Elements 1542, 1544, 1546, and 1548 may represent rules triggered as a result of a DMA request. Element 1545 represents the I/O PUMP, such as described in connection with other figures (e.g., 1502 of 1500). Element 1543 represents the stages of the DMA processing pipeline. In the first stage 1541a, a DMA request is received from the untrusted fabric, and the unvalidated request is passed via 1542 to the second, memory fetch stage 1541b to obtain the requested DMA data and its associated tag from memory 712c. The tag information fetched from memory for the DMA-requested data is provided as an input to the third, validation stage 1541c, in which a lookup is performed in the I/O PUMP cache 1545 for a rule corresponding to the current DMA request. If no rule is found in the I/O PUMP, I/O PUMP processing may stall at stage 1541c and the I/O PUMP may be disabled while a DMA miss handler executes on processor 702 to compute the output Rtag and PCnew tag for the DMA request, or otherwise to determine that the current DMA request is not allowed (thereby triggering a fault or trap). Assuming a rule for the current DMA request is found in the I/O PUMP, the DMA request is determined to be allowed to proceed. If the DMA request is a write request, the write data of the DMA request, along with its tag information, is written back to memory 712c in stage 4 (1541d). For a DMA write operation, once the write has completed, a response 1548a may be provided to the untrusted fabric (and then to the DMA device that initiated the DMA request). For a DMA read operation, a response 1546a is returned to the untrusted fabric (and then to the DMA device that initiated the DMA request), where the response includes the requested data fetched in stage 2 (1541b).
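The four stages described above can be sketched in software as a single function. This is a simplified, sequential illustration rather than a description of the hardware: the request format, the rule key (operation, PC tag, CI tag, M tag), and the convention that the miss handler returns None for a disallowed request are all assumptions made for the example.

```python
def service_dma(req, memory, mem_tags, rule_cache, miss_handler):
    """Service one DMA request through four illustrative stages.

    req: dict with "op" ("read" or "write"), "addr", "citag", "pctag",
         and, for writes, "data".
    rule_cache: dict mapping (op, pctag, citag, mtag) -> (newpctag, rtag).
    miss_handler: callable taking the rule key and returning
         (newpctag, rtag), or None if the request is not allowed.
    """
    # Stage 1: receive the unvalidated request from the untrusted fabric.
    op, addr = req["op"], req["addr"]

    # Stage 2: fetch the data and its tag from memory.
    data, mtag = memory[addr], mem_tags[addr]

    # Stage 3: validate by looking up the rule in the I/O PUMP cache.
    key = (op, req["pctag"], req["citag"], mtag)
    if key not in rule_cache:
        outputs = miss_handler(key)  # runs on the processor in the text
        if outputs is None:
            raise PermissionError(f"DMA request not allowed: {key}")
        rule_cache[key] = outputs    # rule is written into the cache
    newpctag, rtag = rule_cache[key]

    # Stage 4: write back data and tag (writes) and build the response.
    if op == "write":
        memory[addr] = req["data"]
        mem_tags[addr] = rtag
        return {"ok": True, "pctag": newpctag}
    return {"ok": True, "pctag": newpctag, "data": data}
```

Note that both reads and writes perform the stage 2 fetch, which mirrors the statement that read requests are issued for both kinds of DMA operation so that the tag on the target memory can be checked.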

Element 1542 may represent rules that carry the request from the untrusted fabric while the memory fetch is performed in stage 2 (1541b), passing information about the I/O request (from stage 1 (1541a)) to the I/O PUMP 1545 (in stage 3 (1541c)). Element 1544 may represent rules that gather the tag response from the trusted fabric, formulate the actual rule input to the I/O PUMP, and propagate information from stage 1541b to the writeback stage 1541d to be merged with the output of the I/O PUMP.

As a variation of the foregoing embodiment, reference is again made to example 1500. In at least one embodiment, rather than storing rules in an I/O PUMP cache as described above, the I/O PUMP may be implemented as a hardwired I/O PUMP in which the rules are implemented using dedicated hardware, such as logic gates wired to implement a fixed set of I/O PUMP load and store rules.

As yet another variation, the I/O PUMP may alternatively be defined as a programmable cache as described in connection with example 1500, with the difference that the I/O PUMP, as a rule cache, has finite capacity and is populated with a fixed set of rules, all of which are stored in the I/O PUMP cache. In this latter embodiment, the I/O PUMP can be populated with the complete set of all DMA rules, so there is never a rule cache miss for the I/O PUMP. Thus, there is never a need to service an I/O PUMP rule cache miss.

What will now be described are techniques that may be used in connection with initializing, setting, or resetting the tags that may be associated with memory locations. Consistent with other description herein, a tag used in connection with such techniques may be a non-pointer tag (where the non-pointer tag is the actual tag value for the associated memory location) or a pointer tag (where the pointer tag is a pointer to, or the address of, another memory location that contains the actual tag value or values). For example, a pointer tag associated with a memory location may be used in connection with composite tags, where the pointer identifies an address in memory containing multiple tag values, such as for multiple composed policies enforced in parallel. As described elsewhere herein, exemplary composed policies that may be supported in parallel include the memory safety policy and the CFI (control flow integrity) policy described herein.

Processing performed in connection with the memory safety and stack policies may include, for example, setting or initializing a large number of tags associated with memory locations to a particular value. For example, when an allocated memory region is associated with a particular color, each tag associated with a memory location in the region must be initialized to that particular color value. As another example, when a memory region is returned, such as when the memory region is freed, all memory locations in the freed or unallocated region may be initialized to a particular tag value denoting a free or unallocated memory location.

Processing performed to initialize or reset the tags of all memory locations in a region may consume an unacceptable amount of time, particularly as the size of the memory region to be tagged grows. Accordingly, techniques for efficiently initializing or setting the tags of memory locations (tagging) are described in the following paragraphs. In at least one embodiment, tag initialization or setting may be performed, for example, in connection with allocating a region of memory or freeing a region of memory. The techniques described herein scale for use with large memory regions. Although the techniques are illustrated below with respect to tags of memory locations, more generally they may be used in connection with initializing, setting, or resetting values that are each associated with a data item or entity.

In at least one embodiment, the tags of a memory region and its associated memory locations may be represented in a hierarchical structure or arrangement in which the leaves of the hierarchy denote the tags for memory locations. For purposes of illustration, the following discussion refers to a tree as the hierarchical structure. More generally, however, any suitable hierarchical structure may be used to represent the address space associated with a region of memory locations.

At one extreme, in one embodiment, the leaves of the tree or hierarchy may represent individual words in memory and hold their tags. However, if an entire subtree is homogeneously tagged with the same tag value, the techniques herein may simply store the tag value at a particular node and associated level in the tree without further representing any descendant nodes of that subtree. In this case, the tag value of the node specifies the tag value for multiple memory locations of a particular region (e.g., a range of contiguous or adjacent memory addresses). In this manner, storage for tag values can be saved when there are large, homogeneously tagged regions. In the worst-case scenario where no tag values are homogeneous (e.g., no two memory locations with consecutive addresses have the same tag value), each leaf of the tree denotes the tag value for a single memory location, such as a single word, in the region.

In a hierarchical structure such as the tree described in the following paragraphs, processing to retag or initialize a memory region whose size is a power of two may be performed by simply rewriting a single node of the tree. For a region whose size is not a power of two, processing may be performed to split the region into a minimal set of power-of-two regions (e.g., at most 2*log₂(region size) such regions in the minimal set). When the tag of a particular word or memory location is needed (e.g., when reading the tag for the associated memory location), processing may be performed to determine the tag using the tree. In at least one embodiment described below, a hierarchy of cache memories may be used for the different levels of the tree. The tag value may be supplied by the cache associated with the highest level of the tree that has a cache hit for the desired memory location (e.g., by performing a cache lookup of the tag value for the address of the desired memory location). In connection with processing performed to write or modify the tag value associated with a memory location, processing may include performing a single write denoting a subtree, or multiple writes (e.g., on the order of 2*log₂(region size) writes). Such multiple writes may be performed, for example, in response to modifying or setting the tag of a first memory location that, prior to the modification, was included in a homogeneously tagged first memory region. In this case, setting or changing the tag value causes the first memory region to no longer be homogeneously tagged, and the hierarchy representing the tag values for the first memory region is accordingly updated to further decompose the subtree representing those tag values.
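The splitting of an arbitrary region into power-of-two regions may be illustrated by the following sketch. It assumes, as an interpretation of the text, that each power-of-two region must also be aligned to its own size so that it corresponds to exactly one node of a binary tree; the function name is hypothetical.

```python
def split_into_aligned_blocks(start, length):
    """Split [start, start + length) into aligned power-of-two blocks.

    Returns a list of (block_start, block_size) pairs where each size is
    a power of two and each block_start is a multiple of its size, so
    that each block maps to a single node of a binary tag tree.
    """
    blocks = []
    end = start + length
    while start < end:
        remaining = end - start
        # Largest power of two that fits in what is left.
        fit = 1 << (remaining.bit_length() - 1)
        # Largest power of two dividing start (address 0 is aligned to any size).
        align = (start & -start) or fit
        size = min(align, fit)
        blocks.append((start, size))
        start += size
    return blocks
```

For example, the six locations at addresses 1 through 6 split into four blocks, (1, 1), (2, 2), (4, 2), and (6, 1), which is within the 2*log₂(6) ≈ 5.2 bound, whereas the aligned region of eight locations starting at address 0 is a single block and hence a single node write.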

Referring to FIG. 78, shown is an example of a hierarchical structure that may be used to represent the tag values for an address space corresponding to a memory region in an embodiment in accordance with the techniques herein. For simplicity of illustration, example 100 shows a tree as the hierarchical structure used to represent a memory region containing 8 memory locations. More generally, the techniques herein may be used to represent the tag values for any address space or memory region using any suitable hierarchy having any number of levels, any suitable number of nodes at each level, and any suitable number of child nodes per parent node.

The example 100 shows a binary tree representation of tag values for 8 memory locations having addresses 0 through 7, inclusive. In this example, the tree may include up to 4 levels of nodes, depending on which of the 8 memory locations, if any, are homogeneously tagged and can therefore be represented by the same subtree of the structure 100. The entire memory region of 8 memory locations may be repeatedly partitioned into two smaller regions, each a power of two in size, with each such partitioning corresponding to a different level of nodes in the tree. For example, level 1 (104) includes node A1, which corresponds to the entire address space 0 through 7 and which is partitioned into two smaller regions each represented by a node at level 2 (106) (nodes B1 and B2). Level 2 (106) includes node B1, associated with addresses 0 through 3, and node B2, associated with addresses 4 through 7.

Each of the nodes B1 and B2 at level 2 (106) may be further partitioned into two smaller regions, each represented by a node at level 3 (108). Node B1 and its associated address range 0 through 3 are partitioned into two regions represented by nodes C1 and C2, where C1 is associated with address range 0-1 and C2 is associated with address range 2-3. Similarly, node B2 and its associated address range 4 through 7 are partitioned into two regions represented by nodes C3 and C4, where C3 is associated with address range 4-5 and C4 is associated with address range 6-7.

Each of the nodes C1 through C4 at level 3 (108) may be further partitioned into two smaller regions, each represented by a node at level 4 (110). In this example, each node at level 4 represents the tag value for a single word or memory location. Node C1 and its associated address range 0-1 are partitioned into two regions represented by nodes D1 and D2, where D1 is associated with address 0 and D2 is associated with address 1. Node C2 and its associated address range 2-3 are partitioned into two regions represented by nodes D3 and D4, where D3 is associated with address 2 and D4 is associated with address 3. Node C3 and its associated address range 4-5 are partitioned into two regions represented by nodes D5 and D6, where D5 is associated with address 4 and D6 is associated with address 5. Node C4 and its associated address range 6-7 are partitioned into two regions represented by nodes D7 and D8, where D7 is associated with address 6 and D8 is associated with address 7.

All of the nodes (A1, B1-B2, C1 through C4 and D1 through D8) together denote the maximum number of nodes that may exist in the hierarchical representation of tag values for the region of 8 memory locations. However, as described in more detail below, the particular nodes included in the tree representing the tag values stored for memory locations 0 through 7 may vary with the particular tag values and with the homogeneously and non-homogeneously tagged regions represented at various points in time. The levels of the hierarchy may be ranked from the highest level, corresponding to the root or level 1 (104) node, to the lowest level, corresponding to level 4 (110).
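The mapping from a node's level and position to the address range it covers, as walked through for FIG. 78, can be reproduced with a short sketch. The function name `node_span`, the 0-based node index, and the letter-based naming loop are illustrative assumptions; only the node names and address ranges come from the description.

```python
REGION_SIZE = 8  # memory locations 0..7, as in the example of FIG. 78
LEVELS = 4       # levels 1 (root) through 4 (one node per location)


def node_span(level, index):
    """Inclusive address range covered by the index-th node (0-based)
    at the given level (1-based, level 1 being the root)."""
    width = REGION_SIZE >> (level - 1)  # 8, 4, 2, 1 locations per node
    first = index * width
    return first, first + width - 1


# Print every possible node in the naming used by the description.
for level, letter in zip(range(1, LEVELS + 1), "ABCD"):
    for index in range(1 << (level - 1)):
        lo, hi = node_span(level, index)
        print(f"{letter}{index + 1}: addresses {lo}-{hi}")
```

The loop lists 15 nodes in total, which is the maximum number of nodes for a region of 8 memory locations.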

For a first example in connection with the techniques herein, reference is made to 120 of FIG. 79. In this first example, all memory locations associated with a node at a particular level of the hierarchy 120 have the same tag value T1 and thus denote a subtree of homogeneously tagged memory locations. The node at the particular level holds the tag value for all such memory locations, and no further descendant nodes in the subtree need to be consulted to determine the tag value for any of the homogeneously tagged memory locations. For example, if all memory locations 0 through 7 contain the same tag value T1, such as in connection with initializing a memory region with a single tag, the tag value T1 for memory locations 0 through 7 may be stored in node A1 (e.g., as denoted by the "tag=T1" notation of node A1). In at least one embodiment, there is no need to store additional tag values for the other nodes of the tree, because the tag value for the entire region of addresses 0 through 7 is represented by the single node A1. In this case, element 122 denotes the single node that may be included in the hierarchical representation of the tag values stored for the memory locations having addresses 0 through 7, and the remaining nodes (B1, B2, C1 through C4 and D1 through D8) may be omitted from the hierarchical representation.

For a second example, reference is made to 130 of FIG. 80. In this second example, assume that memory locations 0 through 3 have the same first tag value T1 and that memory locations 4 through 7 have the same second tag value T2 (the first and second tag values being different). In this case, node A1 may include an indicator (e.g., as denoted by the "TAG VALUE = NO TAG VALUE" notation of node A1) that node A1 does not specify a homogeneous tag for memory locations 0 through 7 and that the tag values for memory locations 0 through 7 are specified by nodes at one or more lower levels of the hierarchy. At level 2 of the hierarchy, the first tag value T1 may be stored in node B1 (as denoted by the "TAG VALUE=T1" notation of node B1), and the second tag value T2 may be stored in node B2 (as denoted by the "TAG VALUE=T2" notation of node B2). The subtree rooted at B1 (B1, C1, C2, D1 through D4) represents one homogeneously tagged set of memory locations 0 through 3. The subtree rooted at B2 (B2, C3, C4, D5 through D8) represents another homogeneously tagged set of memory locations 4 through 7. In at least one embodiment, there is no need to store additional tag values for the other nodes of the tree at levels 3 and 4 (e.g., nodes C1 through C4 at level 3 and nodes D1 through D8 at level 4), because the tag values for the entire region of addresses 0 through 7 are represented by nodes B1 and B2 at level 2. In this case, element 132 denotes the nodes that may be included in the hierarchical representation of the tag values stored for the memory locations having addresses 0 through 7, and the remaining nodes (C1 through C4 and D1 through D8) may be omitted from the hierarchical representation.

At a first point in time, all tags have the same tag value, so the tag hierarchy may be as described in connection with the example 120, with only the single node 122. At a subsequent second point in time, the tag values for addresses 0 through 3 may be modified to the same first tag value T1, and those for addresses 4 through 7 may be modified to the same second tag value T2. As a result of these tag modifications, the two additional nodes B1 and B2 described above for the example 130 may be added to the hierarchy. At a subsequent third point in time, assume that the tag values for addresses 0 through 3 remain the same as in the example 130. However, the tag values for addresses 4 through 7 may be modified as described below in connection with FIG. 81, whereby nodes C3, C4 and D5-D6 are added to the tag hierarchy.

For a third example, reference is made to 140 of FIG. 81. In this third example, assume that memory locations 0 through 3 have the same first tag value T1 as described above (the first tag value T1 being stored in node B1, where the subtree B1, C1, C2, D1 through D4 rooted at B1 represents a homogeneously tagged set of memory locations 0 through 3). Also assume that memory locations 4 and 5 each contain a different tag value, with memory location 4 having tag value T3 and memory location 5 having tag value T4, and that memory locations 6 and 7 are homogeneously tagged and contain the same tag value T5. In this case, consistent with the description above, nodes A1, B2 and C3 may each include an indicator (e.g., TAG=NO TAG) that the particular node does not specify a tag value, whereby nodes at one or more lower levels of the hierarchy specify the tag values for the particular memory locations associated with nodes A1, B2 and C3. For example, node C3, corresponding to memory locations 4-5, may include an indicator that node C3 does not specify a tag value, whereby nodes at one or more lower levels of the hierarchy specify the tag values for memory locations 4-5. Node D5 at level 4 may specify the tag value T3 for memory location 4 (e.g., as denoted by the TAG=T3 indicator of node D5), and node D6 at level 4 may specify the tag value T4 for memory location 5 (e.g., as denoted by the TAG=T4 indicator of node D6). Node C4, corresponding to memory locations 6-7, may specify the tag value T5 as the homogeneous tag value common to memory locations 6-7 (e.g., as denoted by the TAG=T5 indicator of node C4), indicating that there is no need to store additional tag values in nodes D7 and D8 (e.g., no need to further examine the descendant nodes D7 and D8 of C4). In this case, element 142 denotes the nodes that may be included in the hierarchical representation of the tag values stored for the memory locations having addresses 0 through 7, and the remaining nodes (C1-C2, D1 through D4 and D7 through D8) may be omitted from the hierarchical representation.
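The third example can be written out as a sparse tree and queried, as a minimal sketch. The dictionary layout keyed by `(level, index)`, the use of `None` as the NO TAG indicator, and the function name `lookup` are illustrative assumptions rather than the representation used by any embodiment.

```python
NO_TAG = None  # indicator: this node defers to nodes at lower levels

# Sparse tree for the third example (FIG. 81). Only present nodes are stored.
# Keys are (level, index): level 1 is the root, index is 0-based within a level.
tree = {
    (1, 0): NO_TAG,  # A1, addresses 0-7
    (2, 0): "T1",    # B1, addresses 0-3 (homogeneous)
    (2, 1): NO_TAG,  # B2, addresses 4-7
    (3, 2): NO_TAG,  # C3, addresses 4-5
    (3, 3): "T5",    # C4, addresses 6-7 (homogeneous)
    (4, 4): "T3",    # D5, address 4
    (4, 5): "T4",    # D6, address 5
}


def lookup(tree, addr, levels=4):
    """Walk from the root toward the leaves and return (tag, level) for the
    first node on the path to `addr` that specifies a tag value."""
    for level in range(1, levels + 1):
        index = addr >> (levels - level)  # node on this level covering addr
        tag = tree[(level, index)]
        if tag is not NO_TAG:
            return tag, level
    raise KeyError(addr)


for addr in range(8):
    print(addr, lookup(tree, addr))
```

Addresses 0 through 3 resolve at level 2, addresses 6 and 7 at level 3, and only addresses 4 and 5 require a level 4 node.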

The foregoing examples 120, 130 and 140 may denote the hierarchical representation of the tag values for the memory region of addresses or memory locations 0 through 7 at different points in time, since the tag values associated with memory locations may change over time. In a manner similar to adding nodes to the hierarchy as described above, nodes may be removed from the hierarchy as needed when a subtree of an existing node is modified so that all of its locations have the same tag value (e.g., if all descendants of a node have the same tag value, all of the descendant nodes may be removed from the hierarchy, and the node may be used as the only node of the subtree to denote the single homogeneous tag value of the node and its descendants).
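Removing descendant nodes once they all carry the same tag can be sketched as a recursive merge. The nested-tuple representation (a bare tag for a homogeneous node, a 2-tuple of children otherwise) and the name `collapse` are assumptions for illustration; the sketch also assumes that tag values themselves are never tuples.

```python
def collapse(node):
    """Return an equivalent tree in which every subtree whose memory
    locations all share one tag is replaced by a single node.

    A node is either a bare tag (homogeneous region) or a (left, right)
    tuple of child nodes. Tags are assumed not to be tuples.
    """
    if not isinstance(node, tuple):
        return node  # already a single homogeneous node
    left, right = collapse(node[0]), collapse(node[1])
    if not isinstance(left, tuple) and left == right:
        return left  # both halves are homogeneous with the same tag: merge
    return (left, right)


# Addresses 4-7 were split into four leaves but now all hold T2 again.
before = ("T1", (("T2", "T2"), ("T2", "T2")))
print(collapse(before))  # the right-hand subtree shrinks to one node
```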

In at least one embodiment, when a first node at a level of the tree specifies the value for the one or more memory locations associated with the first node, there is no need to further represent the descendant nodes of the first node (e.g., no need to represent any nodes of the subtree other than the first node). To illustrate, reference is made to the first example 120 of FIG. 79 noted above, in which only the tag value of node A1 is needed to denote the single homogeneous tag value for memory locations 0 through 7. To further illustrate, reference is made to the third example 140 of FIG. 81 noted above, in which an embodiment need not represent nodes C1 through C2, D1 through D4 and D7 through D8. In this manner, using such a hierarchical representation of memory locations and their associated tag values may save storage for the tag values. In other words, in at least one embodiment, rather than always allocating and storing an individual tag value for each memory location, storage may be allocated such that a single tag value in the hierarchy denotes the homogeneous tag value for multiple memory locations having consecutive or contiguous addresses. Referring to the first example noted at 120, rather than allocating storage for 8 tag values for memory locations 0 through 7, each containing the same tag value, memory may be allocated to store the single tag value of node A1.

In the worst-case scenario, in which no memory locations in the memory region of addresses 0 through 7 are homogeneously tagged, the entire node hierarchy of FIG. 78 is used to represent the tag values stored for addresses 0 through 7. For example, each leaf of the hierarchy may represent a different word of memory. Thus, the lowest level 4 (110) of the hierarchy may represent the tag values for the address space 0 through 7.
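The storage effect of the hierarchy, from the single-node best case to the fully populated worst case, can be checked by counting the nodes that each example has to store. The nested-tuple representation (a bare tag for a homogeneous node, a 2-tuple of children for a NO TAG node) and the name `stored_nodes` are illustrative assumptions.

```python
def stored_nodes(node):
    """Count the nodes that must be stored for a tree in which a bare tag is
    a homogeneous node and a (left, right) tuple is a NO TAG node."""
    if not isinstance(node, tuple):
        return 1
    return 1 + stored_nodes(node[0]) + stored_nodes(node[1])


first = "T1"                              # FIG. 79: node A1 only
second = ("T1", "T2")                     # FIG. 80: A1, B1, B2
third = ("T1", (("T3", "T4"), "T5"))      # FIG. 81: A1, B1, B2, C3, C4, D5, D6
# Worst case: no two adjacent locations share a tag (tag names are made up).
worst = ((("a", "b"), ("c", "d")), (("e", "f"), ("g", "h")))

for name, tree in [("first", first), ("second", second),
                   ("third", third), ("worst", worst)]:
    print(name, stored_nodes(tree))
```

The counts are 1, 3, 7 and 15, so in the worst case the hierarchy stores more nodes than the 8 per-location tags it replaces.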

Referring again to FIG. 78, assume that the addresses of memory locations 0 through 7 are expressed in an 8-bit address space. In at least one embodiment, the entire 8-bit address space may be partitioned into different memory regions each containing 8 memory locations, and each different memory region may have its tag values represented by a different instance of the tag value hierarchy. Thus, for the memory region of addresses 0 through 7 just described, the 5 most significant bits are all 0, and addresses 0 through 7 are expressed in the remaining 3 least significant bits. Accordingly, a value of 0 in the 5 most significant bits may be used to denote the memory region of addresses 0 through 7. In such an embodiment, each memory region of 8 memory locations may have a separate tag value hierarchy, such as that shown in FIG. 78, representing the tag values of that particular memory region. In this example, each of the different memory regions, denoting a different range of 8 addresses or memory locations, may be distinguished by examining the 5 most significant bits of the 8-bit address of a memory location.
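The split of an 8-bit address into a region selector and an offset within the region is plain bit arithmetic, sketched below. The constant and function names are illustrative assumptions.

```python
ADDRESS_BITS = 8
OFFSET_BITS = 3  # 8 memory locations per region
REGION_BITS = ADDRESS_BITS - OFFSET_BITS  # the 5 most significant bits


def split_address(addr):
    """Return (region, offset): the top 5 bits select the tag hierarchy
    instance, the low 3 bits select the location within that region."""
    region = addr >> OFFSET_BITS
    offset = addr & ((1 << OFFSET_BITS) - 1)
    return region, offset


print(split_address(5))           # region 0, location 5
print(split_address(0b10110011))  # region 0b10110, location 0b011
print(1 << REGION_BITS)           # number of distinct 8-location regions
```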

In at least one embodiment in accordance with the techniques herein, a series of tag cache memories may be used, where the number of tag caches may correspond to the number of levels in the hierarchy of nodes representing tag values. Continuing the example discussed above and referring again to 100 of FIG. 78, each instance of the tag hierarchy for a memory region of 8 memory locations has 4 levels. In such a case, an embodiment may use 4 tag cache memories 152, 154, 156 and 158, as shown in the example 150 of FIG. 82, to store tags for memory locations. Generally, each of the 4 tag cache memories 152, 154, 156 and 158 may be associated with a different level of the tag value hierarchy and may store information about the nodes at its associated level. For example, the tag level cache 152 may include information about the level 1 (104) node, that is, the root of the tag value hierarchy, for each memory region (as noted above, a memory region of 8 memory locations in this particular example). The tag level cache 154 may include information about the level 2 (106) nodes of the tag value hierarchy for each memory region. The tag level cache 156 may include information about the level 3 (108) nodes of the tag value hierarchy for each memory region. The tag level cache 158 may include information about the level 4 (110) nodes of the tag value hierarchy for each memory region. The lowest or bottom level of the hierarchy, level 4 (110) in this example, may correspond to the cache lines for memory locations that may be stored in the data cache (e.g., denoted L1-D$, as represented by element 20 of FIG. 1). An embodiment may have the level 4 tag cache 158 and may additionally have metadata tags stored separately in the cache lines of the data cache. Each of the node caches 152, 154, 156 and 158 has an associated representation in main memory.

In connection with the embodiment described herein having an 8-bit address space, the 5 most significant bits (152a) of the address of a memory location may be used by the level 1 cache 152 to look up whether the cache 152 includes any level 1 node for that address. The 6 most significant bits (154a) of the address may be used by the level 2 cache 154 to look up whether the cache 154 includes any level 2 node for that address. The 7 most significant bits (156a) of the address may be used by the level 3 cache 156 to look up whether the cache 156 includes any level 3 node for that address. All 8 bits (158a) of the address may be used by the level 4 cache 158 to look up whether the cache 158 includes any level 4 node for that address.
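The per-level lookup keys described above are address prefixes of increasing length, which a short sketch makes concrete. The table `PREFIX_BITS` and the function name `level_key` are illustrative assumptions.

```python
ADDRESS_BITS = 8
# Number of most significant address bits used by each tag cache level.
PREFIX_BITS = {1: 5, 2: 6, 3: 7, 4: 8}


def level_key(addr, level):
    """Key used to probe the tag cache of the given level for `addr`."""
    return addr >> (ADDRESS_BITS - PREFIX_BITS[level])


for level in PREFIX_BITS:
    print(f"level {level}: key for address 5 = {level_key(5, level):#b}")
```

For address 5 the keys are 0, 1, 2 and 5, which select nodes A1, B2, C3 and D6 respectively in the naming of FIG. 78.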

For a particular address, each cache associated with a level other than the lowest level may return one of the following:

1) a tag value for the particular address (indicating that there is a homogeneous tag value for multiple addresses at that level);

2) an indicator that the cache does not specify a tag value for the particular address and that a cache at a lower level of the hierarchy needs to be consulted to obtain the tag value for the particular address (i.e., this particular level does not specify a homogeneous tag value for the particular address); or

3) a null or second indicator denoting that no cache location in the particular level cache contains node or tag information corresponding to the particular address. The second indicator also denotes that none of the lower, non-bottom level caches contains node or tag information for the address. This is described in more detail below.

Consistent with the discussion above, the indicator returned in item 2) may be, for example, the "NO TAG" indicator associated with nodes as shown in the examples 120, 130 and 140. For example, with reference to the example 130 of FIG. 80, assume that processing is performed to determine the tag for memory location 5. In this case, the level 1 cache 152 may return the NO TAG indicator, denoting that the tag value for memory location 5 is specified by one of the other lower level caches 154, 156 or 158. The level 2 cache 154 may return the tag value T2 for memory location 5, illustrating returned item 1) above. To illustrate returned item 3) above, in which the second indicator is returned, consider the level 3 cache 156. The level 3 cache 156 may not contain any node information corresponding to memory location 5 (e.g., no cache location contains associated node or tag information), so the second indicator described in 3) above may be returned, denoting that this level 3 cache has no node information for memory location 5. In such an embodiment, processing may generally use the tag value returned from the highest tag level cache that returns one. For example, with reference to the example 130 and memory location 5, the level 2 cache 154 is the highest level tag cache that returns a tag value for memory location 5.
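The three possible per-level results and the rule of using the tag from the highest level that returns one can be sketched for the second example (FIG. 80). The `Probe` enum, the dictionary-based caches and the function names are illustrative assumptions; real tag caches would be hardware structures.

```python
from enum import Enum


class Probe(Enum):
    NO_TAG = "level holds a node for this address but defers to a lower level"
    ABSENT = "level holds no node or tag information for this address"


# Tag caches for levels 1-3 in the second example, keyed by address prefix.
level_caches = {
    1: {0b00000: Probe.NO_TAG},              # A1
    2: {0b000000: "T1", 0b000001: "T2"},     # B1, B2
    3: {},                                   # no level 3 nodes exist
}


def probe(level, addr):
    """Result of one level cache: a tag value, NO_TAG, or ABSENT."""
    key = addr >> (4 - level)  # levels 1, 2, 3 use the top 5, 6, 7 bits
    return level_caches[level].get(key, Probe.ABSENT)


def resolve(addr):
    """Use the tag from the highest level (closest to the root) that has one."""
    for level in sorted(level_caches):
        result = probe(level, addr)
        if not isinstance(result, Probe):
            return result, level
    return None


print([probe(level, 5) for level in (1, 2, 3)])
print(resolve(5))
```

For memory location 5 the three probes yield the NO TAG indicator, the tag T2, and the absent indicator, so the tag is taken from level 2.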

For the contents of a memory location stored in the L1 (level 1) data cache, the cached information may include the current tag value and the level of the tag cache hierarchy at which the tag value is defined. Referring again to the above example for memory location 5 using the hierarchy 130 of FIG. 80, if the contents of memory location 5 are also stored in the data cache, the data cache may contain the tag value T2 and may also contain information indicating that the current tag value T2 is defined by a level 2 node (e.g., B2) whose node information is stored in the level 2 cache 154. Thus, the example 150 shows the 4 tag caches of the tag cache hierarchy in which tag values may be stored, and additionally includes a tagged data cache (e.g., the L1 data cache) that is separate from any tag information stored in the tag cache hierarchy.

In an embodiment in accordance with the techniques herein, processing may be performed by the PUMP to resolve or determine the tag value for a particular memory location having a particular address. When performing processing to obtain the tag value and contents for a particular memory location, there may be a data cache hit, in which the contents of the memory location and its tag are stored in the data cache. When a data cache hit occurs for a memory location, processing may be performed that consults the stored level of the tag cache hierarchy defining the tag value for this memory location, to ensure that a first cached tag value obtained from that tag cache level matches a second tag value of the memory location stored in the data cache. If the two do not match, this indicates that the second tag value stored in the data cache is stale: it is out of date because the tag has since been modified. In this case, processing may be performed that includes updating the second tag value stored in the data cache for the memory location (e.g., to match that stored in the tag cache hierarchy). In at least one embodiment, when a memory location has its data, and thus its tag, cached in the data cache, the information tracked in the data cache for the memory location's tag may include the level of the tag cache hierarchy at which the tag is defined. Storing the level of the tag cache hierarchy in this way may be an optimization, in that the stored level can be used to readily access the tag value from the tag hierarchy (e.g., rather than having to consult all tag level caches or otherwise walk the hierarchy of existing nodes, such as in a downward search from the root or top of the hierarchy toward the leaf nodes). Thus, when a data cache hit occurs and the tag value stored in the data cache for the memory location does not match the tag value stored in the tag hierarchy for the memory location, processing may include updating the tag value stored in the data cache and additionally updating the hierarchy level information stored in the data cache regarding where the memory location's tag is defined in the tag hierarchy. Processing performed by the PUMP to resolve or determine the tag value for the memory location may then be restarted.
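The staleness check on a data cache hit can be sketched as follows. The dictionary-based tag caches, the `line` record with `tag` and `level` fields, and the function names are illustrative assumptions, and the sketch omits the PUMP restart that follows a refresh.

```python
def level_key(addr, level):
    """Levels 1..4 are keyed by the top 5..8 bits of an 8-bit address."""
    return addr >> (4 - level)


def resolve(tag_levels, addr):
    """Full lookup: first level, from the root down, that holds a tag."""
    for level in (1, 2, 3, 4):
        tag = tag_levels[level].get(level_key(addr, level))
        if tag is not None:
            return tag, level
    raise LookupError(addr)


def tag_on_data_cache_hit(line, tag_levels, addr):
    """Validate the tag cached with a data cache line against the level of
    the tag hierarchy recorded for it; refresh tag and level if stale."""
    current = tag_levels[line["level"]].get(level_key(addr, line["level"]))
    if current != line["tag"]:
        line["tag"], line["level"] = resolve(tag_levels, addr)
    return line["tag"]


# The line for address 5 was cached when A1 held T1 (first example);
# the hierarchy has since changed to the second example.
tag_levels = {1: {}, 2: {0: "T1", 1: "T2"}, 3: {}, 4: {}}
line = {"tag": "T1", "level": 1}
print(tag_on_data_cache_hit(line, tag_levels, 5), line)
```

Only the recorded level is probed on the common path, which is the benefit the description attributes to storing the level with the cached tag.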

When a data cache miss occurs for a memory location (e.g., the contents and tag of the memory location are not found in the data cache), processing may be performed that looks up the tag value in the levels of the tag cache (e.g., those other than the bottom tag cache level) in parallel. For example, the lookup of the tag value for a memory location may be performed in parallel by consulting the 4 caches 152, 154, 156 and 158 for levels 1, 2, 3 and 4 of the tag cache, respectively. As discussed above, the tag value returned by the highest level one of the tag caches 152, 154, 156 and 158 that returns a tag value is used as the tag value for the memory location. It should also be noted that, in a properly formed tag value hierarchy, only one of the caches 152, 154, 156 and 158 can return a tag value for a memory location. In at least one embodiment, the caches 152, 154, 156 and 158 may be indexed so as to allow parallel access.
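The property that only one level returns a tag for any address in a properly formed hierarchy can be checked with a small sketch over the third example (FIG. 81). The probes run sequentially here and only model the combined outcome of a parallel lookup; the dictionary layout and function names are illustrative assumptions.

```python
def level_key(addr, level):
    """Levels 1..4 are keyed by the top 5..8 bits of an 8-bit address."""
    return addr >> (4 - level)


def probe_all_levels(tag_levels, addr):
    """Model of probing every level cache for `addr`: return the list of
    (level, tag) pairs for the levels that hold an actual tag value."""
    return [(level, cache[level_key(addr, level)])
            for level, cache in sorted(tag_levels.items())
            if level_key(addr, level) in cache]


# Third example: B1=T1, C4=T5, D5=T3, D6=T4 (NO TAG nodes are not listed).
tag_levels = {
    1: {},
    2: {0: "T1"},
    3: {3: "T5"},
    4: {4: "T3", 5: "T4"},
}

for addr in range(8):
    hits = probe_all_levels(tag_levels, addr)
    assert len(hits) == 1, "properly formed hierarchy: exactly one tag"
    print(addr, hits[0])
```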

An embodiment may also choose not to perform a parallel search or lookup across all four tag caches 152, 154, 156 and 158 when finding the tag of a particular memory location. As a variation, an embodiment may traverse the tag caches of the hierarchy downward from the root node level (level 1) toward the leaf nodes (e.g., level 4). On a tag cache miss at level N, the tree or hierarchy may be walked, inserting nodes into the various levels of the tag cache for the particular memory access. With respect to level cache misses during a parallel search of the tag caches, an embodiment may choose to insert a node only into the level cache at which a tag exists. Thus, as long as some level cache supplies the tag, the other level caches need not hold NO TAG entries.

As discussed elsewhere herein, the tag of a memory location may be modified. In response to modifying the tag of a memory location, processing may be performed to update the hierarchy specifying the tag value for that location accordingly. Such an update may include invalidating any one or more levels of the hierarchy that are no longer homogeneous. Processing may also be performed to update the cache levels accordingly, such as those illustrated in example 150 of FIG. 82.

When performing an operation that sets or initializes the tag of a memory location, the processing may include a PUMP check of whether the desired operation is valid. For example, consider retagging every memory location in a memory region with a new tag. Processing may include obtaining the current tag values of all memory locations in the region and checking, through PUMP processing, that the retag is valid. If permitted, processing may include clearing the tags of the memory locations in the region and then updating the tag values for those locations. Consistent with the discussion above, updating, modifying or setting the tag of a memory region may include modifying the hierarchy and its associated nodes to reflect the tag values of the different memory locations in the region (e.g., decomposing a portion of the region that was homogeneous before the modification and is not homogeneous afterward, so that additional child or descendant nodes may be added to reflect the tag value modification(s)).
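As a rough sketch of the retagging flow just described, the Python below first collects the current tags in the region, checks each against a validity predicate, and only then updates the hierarchy, splitting a homogeneous node when part of its range changes. The binary split, the `Node` layout and the `allowed` callback (a stand-in for the PUMP check) are illustrative assumptions, not details taken from the specification.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set


@dataclass
class Node:
    lo: int                       # first address covered (inclusive)
    hi: int                       # end of covered range (exclusive)
    tag: Optional[str] = None     # set only while the whole range is homogeneous
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def tags_in(node: Node, lo: int, hi: int) -> Set[str]:
    """Collect the current tags of every location in [lo, hi)."""
    if hi <= node.lo or node.hi <= lo:
        return set()
    if node.tag is not None:
        return {node.tag}
    return tags_in(node.left, lo, hi) | tags_in(node.right, lo, hi)


def _apply(node: Node, lo: int, hi: int, new_tag: str) -> None:
    if hi <= node.lo or node.hi <= lo:
        return                                    # no overlap with this node
    if node.tag is not None:
        if lo <= node.lo and node.hi <= hi:
            node.tag = new_tag                    # range stays homogeneous
            return
        mid = (node.lo + node.hi) // 2            # no longer homogeneous:
        node.left = Node(node.lo, mid, node.tag)  # push the old tag down to
        node.right = Node(mid, node.hi, node.tag) # two children
        node.tag = None
    _apply(node.left, lo, hi, new_tag)
    _apply(node.right, lo, hi, new_tag)


def retag(root: Node, lo: int, hi: int, new_tag: str,
          allowed: Callable[[str, str], bool]) -> None:
    """Check validity for every current tag first, then update the hierarchy."""
    if not all(allowed(old, new_tag) for old in tags_in(root, lo, hi)):
        raise PermissionError(f"retag of [{lo}, {hi}) to {new_tag!r} denied")
    _apply(root, lo, hi, new_tag)


def tag_of(node: Node, addr: int) -> str:
    """Walk from the root to the node that defines the tag for addr."""
    while node.tag is None:
        node = node.left if addr < node.left.hi else node.right
    return node.tag
```

Checking before modifying keeps a denied retag from leaving the hierarchy partially updated. The sketch never merges children back into a homogeneous parent; a fuller version might.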

In at least one embodiment, the hierarchical representation of tag values for a memory region may be a tree. For example, the tree may be a binary tree in which each node has 0, 1 or 2 children. As a variation, the hierarchical representation may be a tree that is not a binary tree. For example, each node of the tree may be allowed any suitable number of child nodes up to a specified maximum. The hierarchical representation may include any suitable number of levels, nodes per level, children per parent node, and the like. As known in the art, there are trade-offs among the various parameters of a hierarchical representation, such as the depth or number of levels and the number of children per parent node at each level. For example, the more nodes there are at each level, the fewer levels there are, and so fewer levels need to be consulted, and less time is needed, to determine the tag value for a memory location. In that case, however, more writes must be performed to clear a region.
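The depth side of this trade-off can be made concrete with a little arithmetic. The helper below is a hypothetical illustration: `levels_needed` and the region size of 2**20 words are assumptions chosen only to show how depth shrinks as the number of children per node grows.

```python
def levels_needed(num_words: int, fanout: int) -> int:
    """Smallest depth d such that fanout**d >= num_words (integer arithmetic)."""
    depth, covered = 0, 1
    while covered < num_words:
        covered *= fanout
        depth += 1
    return depth


# Depth of a tree covering 2**20 tagged words for several fan-outs.
for fanout in (2, 16, 32):
    print(fanout, levels_needed(2 ** 20, fanout))
```

A wider node means fewer levels to consult per lookup, but each node that is split or cleared has more child entries to write.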

Techniques will now be described that may be performed by rules triggered as a result of loader code, in connection with a CFI policy in an embodiment in accordance with techniques herein. To enforce a CFI policy using metadata processing rules that access tag information, information regarding the allowable control flow must be communicated to the metadata processing domain. To do so, an embodiment may use the approach described in the following paragraphs. Generally, a transfer of control is from a branch source to a target or destination. For a particular control flow target or destination, a set of sources permitted to transfer control to it may be identified. The set of sources for each possible control flow target may be communicated to the metadata processing domain, for example as stored metadata tag information, and then used by rules of the CFI policy to enforce that policy during runtime execution of user code (e.g., code executing in the code execution, or non-metadata processing, domain).

The processing performed may include uniquely tagging each source and then tagging each target with the set of allowable sources (e.g., the addresses of the sources) permitted to transfer control to that particular target. For example, reference is made to FIG. 83. In example 1700, element 1701 may represent a code portion of instructions of an application executing in the code execution, or non-metadata processing, domain. Elements 1702a and 1704a through 1704c denote locations of instructions in the code portion. Element 1702a denotes control flow target A. Elements 1704a through 1704c denote the control flow sources permitted to transfer control to target A (1702a). The transfer of control from each of 1704a through 1704c is denoted by the instruction JMP (jump) A. Element 1706 denotes the set of allowable sources permitted to transfer control to target A. D7, C3 and E8 denote the unique source tags of instructions 1704a, 1704b and 1704c, respectively. As illustrated by 1710, the JMP instructions 1704a through 1704c are tagged D7, C3 and E8, respectively, and are stored at addresses 1020, 1028 and 1034, respectively. Target location A has address 800. In this case, the allowable source set, that is, the addresses of the source instructions permitted to transfer control to target A, may be the set {1020, 1028, 1034} denoted by 1706. Set 1706 is thus an example of allowable control flow information that must be communicated to the metadata processing domain, where it is stored as tag metadata for use by rules of the CFI policy. In at least one embodiment in accordance with techniques herein, code of the loader may trigger rules that perform processing to gather the control flow information needed by the metadata processing domain to enforce the CFI policy for the application's code portion 1701. The loader code executes in connection with loading the application (e.g., loading the executable code needed by the application); thus, while executing to load the application, the loader code triggers rules that perform the processing needed to gather the control flow information later used by metadata processing to enforce the CFI policy during execution of the application.
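To make the FIG. 83 values concrete, the sketch below encodes the source tags and the allowable source set and adds a membership check of the kind a CFI rule might perform at run time. The addresses and tags come from the description above; the dictionary layout and the function `transfer_allowed` are assumptions introduced for illustration.

```python
# Values taken from the description of FIG. 83.
SOURCE_TAG = {1020: "D7", 1028: "C3", 1034: "E8"}   # JMP A instructions 1704a-c
ALLOWED_SOURCES = {800: {1020, 1028, 1034}}         # set 1706 for target A


def transfer_allowed(source_addr: int, target_addr: int) -> bool:
    """Hypothetical check: may source_addr transfer control to target_addr?"""
    return source_addr in ALLOWED_SOURCES.get(target_addr, set())


print(transfer_allowed(1028, 800))   # a listed source jumping to A
print(transfer_allowed(1040, 800))   # an unlisted source jumping to A
```

A transfer from any address outside set 1706, or to a target with no recorded set, is rejected in this sketch.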

In at least one embodiment consistent with discussion elsewhere herein, execution of kernel code may trigger rules that tag code of the loader with special instruction tags. These tags enable the tagged loader instructions, when executed, to trigger rules that generate a sequence of source tags, each tag in the sequence being unique, used to tag the sources (e.g., generating source tags D7, C3 and E8). For example, reference is made to FIG. 84, which includes the logical processing performed by rules triggered as a result of executing code of the loader. The logical processing is described in 1720 using a C-like pseudocode description, and such processing may be performed for each control flow target, such as A 1702a. At step 1721, the source set is initialized to the empty set. At step 1722, steps 1723, 1724 and 1725 may be performed for each source permitted to transfer control to the target. At step 1723, t is assigned a newly allocated CFI source tag. At step 1724, the source location (of the instruction transferring control to the target) is tagged with the newly allocated tag t generated in step 1723. At step 1725, the source set is updated to also include tag t. In one aspect, step 1725 may be characterized as forming the union of the set of allowable sources for the target, with one union operation performed at 1725 in each iteration of the loop beginning at 1722. Step 1726 tags or marks the target with the source set.
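Steps 1721 through 1726 can be sketched in Python as below. This models only the logical effect on tags, not the loader instructions or the rules themselves; the `mem_tags` dictionary, the `fresh_tags` generator and the `cfi-N` tag names are hypothetical.

```python
import itertools
from typing import Dict, Iterable, Iterator


def tag_target(target_addr: int, source_addrs: Iterable[int],
               mem_tags: Dict[int, object], fresh_tags: Iterator[str]) -> None:
    """Tag each source uniquely, then tag the target with the set of source tags."""
    source_set = frozenset()                  # step 1721: start with the empty set
    for src in source_addrs:                  # step 1722: each allowed source
        t = next(fresh_tags)                  # step 1723: allocate a new source tag
        mem_tags[src] = t                     # step 1724: tag the source location
        source_set = source_set | {t}         # step 1725: union into the set
    mem_tags[target_addr] = source_set        # step 1726: tag the target


mem_tags: Dict[int, object] = {}
fresh = (f"cfi-{n}" for n in itertools.count(1))
tag_target(800, [1020, 1028, 1034], mem_tags, fresh)
```

After the call, each of the three source addresses carries its own tag and address 800 carries the set of all three.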

Element 1723 may be the following instruction, included in the loader code, that triggers a rule to generate or allocate a new CFI source tag:

Here, the ADD instruction (such as ADDI in the RISC-V instruction set) has been tagged by kernel code with the special CI tag CFI-alloc-tag, which marks the instruction as a permitted tag generator instruction. In at least one embodiment, a different sequence of source tags may be generated by the loader for each application in connection with the CFI policy (e.g., in example 1620, the loader may use a different sequence of CFI tags 1630 as the unique sequence of CFI source tags for an application, where that sequence may be generated from a particular one of the CFI tag generator App-n tags of 1627). CFI-alloc-tag is the CI tag placed on the loader ADD instruction above to indicate that the ADD instruction may allocate or generate the next tag in the application-specific CFI sequence. CFI-alloc-tag may be one of the special instruction types of 1624, as described in connection with example 1620. In the ADD instruction above, the tag on R1 holds the state of the CFI sequence, which may be the last tag previously generated in the sequence. Executing the ADD instruction triggers a rule that generates the next new tag in the CFI sequence and updates the tag on R1 to the newly generated tag. Using the rule conventions described elsewhere herein, the rule triggered by the ADD instruction above may be expressed as the following ADD rule:

This rule checks that the CI tag on the ADDI instruction is CFI-alloc-tag. In this ADD rule, t1 denotes the previous tag in the sequence (stored as the current state of the application's CFI tag sequence) and is used to generate t1next, the next tag in the sequence, which is then stored as the tag on RD (the destination or result register). The tag t1next may be used as the unique CFI source tag placed on a source location.
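The behavior of such a tag-allocation rule might be modeled as in the sketch below. It is an assumption-laden illustration, not the rule as written in the specification: tags are modeled as integers, the successor function is simply "add one", and returning `None` stands for "no rule matches".

```python
from typing import Optional


def add_alloc_rule(ci_tag: str, t1: int) -> Optional[int]:
    """Model of the ADD rule: given the CI tag and the tag t1 on R1,
    return the tag t1next to place on RD, or None if the rule does not apply."""
    if ci_tag != "CFI-alloc-tag":
        return None            # instruction is not a permitted tag generator
    t1next = t1 + 1            # assumed way of producing the next tag in sequence
    return t1next


print(add_alloc_rule("CFI-alloc-tag", 7))
print(add_alloc_rule("ordinary-code", 7))
```

Gating on the CI tag is the point of the sketch: only an instruction that the kernel has marked as a tag generator can advance the sequence.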

Element 1724 may be an instruction of the loader code, such as the ST (store) instruction below, used to trigger a rule that tags a source location with its unique CFI source tag:

Here, R3 is a pointer to the control flow source location in the user program code being tagged (e.g., 1704a of example 1700), and the tag on R1 is the unique CFI source tag to be placed on the source location. The ST instruction above may also be tagged with a special CI tag, such as CI-LDR, indicating that the ST instruction is included in the loader code, and it triggers the ST rule below:

Here, the CI tag is CI-LDR, t1 is the CFI source tag currently stored as the tag on R1, and codetag is the instruction tag on the source location at the address in R3 (e.g., a tag confirming that the source location is currently tagged as code). As a result, the destination location addressed by R3 is tagged with t1, the unique CFI source tag.
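A minimal model of this store rule is sketched below, under the assumption that tags are plain strings and that a non-matching rule is reported as `None`; the function name and the dictionary used as tagged memory are hypothetical.

```python
from typing import Dict, Optional


def st_source_rule(ci_tag: str, t1: str, mem_tag: str) -> Optional[str]:
    """Model of the ST rule: return the new tag for the stored-to location."""
    if ci_tag == "CI-LDR" and mem_tag == "codetag":
        return t1              # the source location takes the CFI source tag
    return None                # not loader code, or destination is not code


# Hypothetical tagged memory: the JMP at address 1020 starts out tagged as code.
mem_tags: Dict[int, str] = {1020: "codetag"}
new_tag = st_source_rule("CI-LDR", "D7", mem_tags[1020])
if new_tag is not None:
    mem_tags[1020] = new_tag
```

The rule checks both who is storing (the loader) and what is being overwritten (a location currently tagged as code) before the source tag is placed.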

Element 1725 may be an instruction of the loader code, such as the ADD instruction below, used to trigger a rule that adds the source's address (e.g., the address currently held in R3, which points to the source) to the accumulated set of CFI source tags denoting the allowable source locations that may transfer control to the target:

Here, the tag on R2 points to a memory location representing the accumulated set of allowable source locations. The ADD instruction above may be tagged with the special instruction tag CFI UNION, indicating that this ADD instruction performs a union of CFI sources and that the union is stored as the tag on R2. As a result of the ADD instruction above, the following ADD rule may be triggered:

This rule checks that the CI tag is CFI UNION, that tset is the target's set, and that tsrc is a source tag. The rule generates a new CFI set, tunion, that represents adding tsrc to tset.
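The union rule can be modeled in a few lines. In this sketch tset is represented as a Python `frozenset` and tsrc as a string, which are illustrative choices; the text above describes the tag on R2 as pointing to a memory location that represents the set.

```python
from typing import FrozenSet, Optional


def add_union_rule(ci_tag: str, tset: FrozenSet[str],
                   tsrc: str) -> Optional[FrozenSet[str]]:
    """Model of the ADD rule tagged CFI UNION: return tunion = tset plus tsrc."""
    if ci_tag != "CFI UNION":
        return None
    return tset | {tsrc}       # a new set; tset itself is left unchanged


tunion = frozenset()
for source_tag in ("D7", "C3", "E8"):
    tunion = add_union_rule("CFI UNION", tunion, source_tag)
```

Producing a new immutable set on each step matches the description of the rule generating a new CFI set rather than mutating the old one.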

Element 1726 may be an instruction of the loader code, such as the ST instruction below, used to trigger a rule that tags the target with the union, or accumulated list, of allowable source locations that may transfer control to the target:

R17 may be a register containing the address of the target location, and R2 may be the register tagged, as noted above, with the union forming the currently accumulated set of allowable source locations (e.g., the tag on R2 denotes the set of allowable source locations for the target location whose address is contained in R17). The ST instruction above may be tagged with the special instruction tag CFI MARK TARGET, which marks it as a special instruction permitted to tag a control transfer target location (e.g., this STORE instruction 1726 of the loader code may be tagged by kernel code in a manner similar to the other code tags on loader code instructions that trigger rules performing the processing of 1720). As a result of the STORE instruction above for 1726, the following ST rule may be triggered:

This rule triggers when the CI tag is CFI MARK TARGET and the target (the location whose address is contained in R17) is tagged with a code tag indicating an instruction, and it places the tset annotation on the target.

Other tag structures or layouts that may be defined for use with sources, targets and sets of allowable source locations are described elsewhere herein, and any other suitable structure definition may be used (see, e.g., examples 240, 250, 260, 267, 270 and 280, which describe tag layouts for use with tagged source and target locations and which may be used more generally with arbitrary instructions and with multiple instructions per tagged word).

Thus, the processing steps described above, such as those of example 1720, may be performed by suitably tagging code of the loader so that, when the loader code executes, rules are triggered that cause the steps of example 1720 to be performed by the metadata processing domain in an embodiment in accordance with techniques herein. It should be noted that the foregoing instruction sequences and the rules triggered by them are merely one example of instructions and rules that may be used in an embodiment. For example, an embodiment may include in the loader code an instruction other than ADD to trigger the rule that performs the processing described above (e.g., for element 1725).

In the foregoing description, certain terms have been used for brevity, clarity and understanding. No unnecessary limitations beyond the requirements of the prior art are to be implied from them, because such terms are used for descriptive purposes and are intended to be broadly construed. Moreover, the description and illustration of the preferred embodiments of the present disclosure are examples, and the disclosure is not limited to the exact details shown or described.

Various aspects of the techniques described herein may be performed by executing code stored on any one or more different forms of computer readable media. Computer readable media may include different forms of volatile (e.g., RAM) and non-volatile (e.g., ROM, flash memory, magnetic or optical disks, or tape) storage, which may be removable or non-removable.

While the invention has been disclosed in connection with various embodiments shown and described in detail, modifications and improvements thereto will be readily apparent to those skilled in the art. Accordingly, the spirit and scope of the present invention should be limited only by the following claims.

PUMP Programming

Hardware-Assisted Micro-Policies for Security

Abstract

A broad range of security policies can be formulated as rules over ISA-level metadata and enforced efficiently in programmable hardware. We describe in detail a programming model for such policies, based on the Programmable Unit for Metadata Processing (PUMP) architecture, which supports flexible rule evaluation over uninterpreted metadata alongside the primary computation. We demonstrate the generality of the model by implementing a diverse set of safety and security policies of varying complexity in four specific areas: spatial and temporal memory safety, taint tracking, control flow integrity, and primitive typing. We characterize the performance of these policies, alone and in combination, on a simple RISC ISA. The average runtime overhead for most policies is only 8%. This shows that the PUMP model can achieve the flexibility and adaptability of software enforcement with the performance of dedicated hardware.

1. Introduction

It is far too easy for attackers to subvert a program's intent. Modern processors, designed to be indifferent to the intended high-level semantics of the operations they perform, contribute to this state of affairs; this is a legacy of a technology era in which transistors were expensive and the primary design goal was runtime performance. As computer systems take on increasingly critical tasks, system security has finally become a key design target. At the same time, processors are now small relative to even modest system-on-chip dies, making it feasible and inexpensive to augment them with security-enhancing hardware. For tomorrow's computers to adequately protect the privacy and integrity of the data they manage, we must fundamentally re-architect the entire computing stack with security mechanisms consistent with modern threats and hardware costs.

The security literature offers a broad range of runtime policies that can reduce vulnerabilities due to malicious and erroneous code. These policies often encode high-level language abstractions (this is a numeric array, this is a code pointer, ...) or user-level security invariants (this string came from the network) as metadata annotations on the program's data and code. The high-level semantics or policy is enforced by propagating this metadata as computation proceeds and dynamically checking for violations at appropriate points. We call these low-level, fine-grained enforcement mechanisms micro-policies (or, informally, just "policies").

A software realization of micro-policies can define arbitrary metadata and arbitrarily powerful computations over them. Software implementation facilitates rapid deployment of new policies, but it can be prohibitively expensive in runtime and energy cost (1.5x to 10x) [42], leading to unfavorable security-performance trade-offs. Simple micro-policies can be supported in hardware with low overhead [41,?]; however, hardware customized to support a single policy can take years to deploy and is slow to adapt. Today's dynamic cyber-attack landscape calls for mechanisms that support rapid in-field responses to evolving threats.

The desire for greater flexibility has prompted many recent efforts to make policy enforcement hardware more programmable [18, 45, 19, 13] (see §5). Here we consider a design called the PUMP [7], a "Programmable Unit for Metadata Processing," which allows a wide range of low-level runtime policies to be defined in terms of fine-grained, instruction-level computations over arbitrary metadata. At the hardware level, every word of data is associated with a word-sized metadata tag. These tags are uninterpreted by the hardware; in software, a tag may be mapped to a representation of information such as the type, provenance, classification level, or reliability of the data to which it is attached. Because tags are large enough to hold pointers, they can refer to data structures of arbitrary size and complexity, including tuples of metadata, which allows multiple orthogonal policies to be enforced in parallel. The program counter is tagged to support tracking the history of the program's control state; program code is tagged to support policies concerning code provenance, control flow, and compartmentalization. The processor core is augmented with a rule cache that enables high-performance rule resolution in parallel with instruction execution, and with a special operating mode for fast context switches to policy handling code when a lookup misses in this cache. This allows the PUMP to enforce a wide range of low-level policies with the expressiveness and adaptability of software and the performance of hardware.
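The interplay between the rule cache and the software miss handler can be sketched as follows. This is a deliberately reduced model: the cache key here is only an opgroup and two operand tags, the class and function names are hypothetical, and the toy taint policy (including its refusal of tainted jump targets) is an assumption chosen for illustration.

```python
from typing import Callable, Dict, FrozenSet, Tuple

Tag = FrozenSet[str]
Key = Tuple[str, Tag, Tag]                 # (opgroup, operand 1 tag, operand 2 tag)


def taint_policy(opgroup: str, op1_tag: Tag, op2_tag: Tag) -> Tag:
    """Software miss handler: the result taint is the union of operand taints."""
    if opgroup == "jump" and op1_tag:
        raise PermissionError("jump through a tainted value")   # assumed rule
    return op1_tag | op2_tag


class RuleCache:
    """Caches rule results so that repeated tag combinations skip the handler."""

    def __init__(self, miss_handler: Callable[[str, Tag, Tag], Tag]) -> None:
        self.miss_handler = miss_handler
        self.rules: Dict[Key, Tag] = {}
        self.misses = 0

    def result_tag(self, opgroup: str, op1_tag: Tag, op2_tag: Tag) -> Tag:
        key = (opgroup, op1_tag, op2_tag)
        if key not in self.rules:          # miss: fall back to policy software
            self.misses += 1
            self.rules[key] = self.miss_handler(*key)
        return self.rules[key]             # hit: answered from the cache


cache = RuleCache(taint_policy)
network, clean = frozenset({"network"}), frozenset()
first = cache.result_tag("arith", network, clean)
second = cache.result_tag("arith", network, clean)   # served from the cache
```

Only the first occurrence of a tag combination pays for the software handler, which is why a policy that produces few distinct tags tends to hit in the cache more often.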

One goal of this paper is to show both that PUMP-like tagging and rule processing are useful against real threats and that policies are easy to write in the form of rules. We do so by describing in detail how the PUMP can be programmed to support a diverse collection of low-level security and safety policies. We present detailed implementations and evaluations of four families of policies, all of which appear in the literature: (i) primitive types, which enforce a weak form of type safety; (ii) spatial and temporal memory safety, catching out-of-bounds and use-after-free errors on heap-allocated data; (iii) control flow integrity (CFI) [2], which prevents code-reuse attacks; and (iv) taint tracking, where a taint may denote a data source or component that may have contributed to a given piece of data. Many of these policies go beyond what current systems can support efficiently in software. Finally, we show how these policies can be applied simultaneously. Because these policies have been well studied in the existing literature, we do not focus primarily on the security guarantees they provide, but rather explore how they can be expressed as rules and enforced using the PUMP. We use instruction trace simulation to estimate the runtime impact of these policies on the SPEC CPU2006 benchmark suite when the PUMP is attached to a simple in-order RISC processor (Alpha [1]). We show that the PUMP can support policies across a wide range of complexity, and we quantify their performance impact. This range illustrates the ability to refine policies as threats evolve and how such evolution may affect performance.

This paper is an expanded, enriched, and refocused version of [7], a short paper to be presented at a workshop later this summer. That earlier paper focuses on the direct hardware integration of the PUMP into a RISC processor, establishes reasonable performance for most benchmarks, and identifies areas for improvement. In this work, we avoid the microarchitectural considerations that are well described in [7] and focus instead on the programming model and on a much more detailed description and evaluation of the policies themselves. We also describe how the PUMP software services protect themselves from abuse. The performance we report improves on [7] due to (i) the use of opgroups (§2), (ii) a more accurate estimate of miss costs (§3), and (iii) a reduction in DRAM accesses from using pointer tags only when needed (§4).

In summary, the main contributions of this work are: (i) a programming model and supporting interface model for describing the policies supported by this architecture compactly and precisely (§ and §3); (ii) detailed examples of policy encoding and composition using four diverse classes of well-studied policies; and (iii) a quantification of the requirements, complexity, and performance of these policies (§4). In §5 and §, we discuss related and future work. Some additional materials are available anonymously at http://git.io/8K7IKA. These materials include an appendix with complete definitions of the policies studied, the source code of our experiments, and an anonymized version of [7].

2. POLICY PROGRAMMING MODEL

A PUMP policy consists of a set of tag values together with a collection of rules that manipulate these tags to implement some desired tracking and enforcement mechanism. Rules come in two forms, depending on whether we are talking about the software layer of the system (symbolic rules) or the hardware layer (concrete rules).

Example. To illustrate the operation of the PUMP, consider a simple example policy for restricting return points during program execution. The motivation for this policy comes from a class of attacks known as return-oriented programming (ROP) [39]. In such an attack, the attacker identifies a set of "gadgets" in the binary executable of the program under attack and assembles complex malicious behavior by constructing an appropriate sequence of stack frames, each containing a return address that points to some gadget. A buffer overflow or other vulnerability is then exploited to overwrite the top of the stack with the desired sequence, causing the gadgets to be executed in order.

One simple way of limiting ROP attacks is to restrict the targets of return instructions to well-defined return points. We can do this using the PUMP by tagging instructions that are valid return points with the metadata tag target. Whenever we execute a return instruction, we set the metadata tag on the PC to check to indicate that a return has just occurred. On the next instruction, we notice that the PC tag is check, verify that the tag on the current instruction is target, and signal a security violation if it is not. We show later in this section that, by making the metadata richer, we can control precisely which return instructions can return to which return points. By making it richer still, we can implement full-fledged CFI checking [2] (see §4.3).

Symbolic rules. From the point of view of the policy designer and the software parts of the PUMP, policies are described compactly using symbolic rules written in a small domain-specific language. Each symbolic rule has the following form:

opgroup : (PC, CI, OP1, OP2, MR) ⇒ (PC', R') if guard?

This says that the rule matches on a set of instruction opcodes (the opgroup) together with the metadata tags on the program counter (PC), the current instruction (CI), up to two operands from the register file (OP1, OP2), and the memory location referenced by the instruction (MR), if any. The rule applies if all the relevant tag expressions match and the guard? predicate holds. In that case, the right-hand side determines how to update the tag on the PC (PC') and the tag on the result of the operation (R'). We use opgroups instead of opcodes because, in most policies, many opcodes share the same rules. We write "-" to indicate input or output fields that are ignored ("wildcards"). When the guard? condition is simply true, we omit it.
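The rule shape described above can be made concrete with a small data structure. The following Python sketch is only an illustration: the class name `SymbolicRule`, the example tags, and the restriction of patterns to constants or the wildcard are assumptions introduced here, not part of the source (which allows more general tag expressions).

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

WILD = "-"  # wildcard: the field is ignored when matching


@dataclass(frozen=True)
class SymbolicRule:
    """Hypothetical model of one symbolic rule (names are illustrative)."""
    opgroup: str
    inputs: Tuple[str, str, str, str, str]  # patterns for PC, CI, OP1, OP2, MR
    outputs: Tuple[str, str]                # tags for PC' and R'
    guard: Optional[Callable[[Tuple[str, ...]], bool]] = None  # None means "true"

    def applies(self, opgroup: str, tags: Tuple[str, ...]) -> bool:
        """True if the opgroup matches, every non-wildcard pattern equals the
        corresponding tag, and the guard (if any) holds."""
        if opgroup != self.opgroup:
            return False
        for pattern, tag in zip(self.inputs, tags):
            if pattern != WILD and pattern != tag:
                return False
        return self.guard is None or self.guard(tags)


# Example: an "add" rule that requires both operands to carry the tag "int".
add_ints = SymbolicRule("add", (WILD, WILD, "int", "int", WILD), (WILD, "int"))
```

Omitting the guard corresponds to the convention of dropping a guard that is always true.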

For the simple ROP policy just outlined, we divide the opcodes into two opgroups, return (containing just a single opcode) and a second opgroup containing all the rest; the possible tag values are check, target, and the default tag ⊥. The PC will always be tagged either check or ⊥, and each instruction will be tagged either target or ⊥. (Instruction tags are supplied by a trusted loader; see §.) The symbolic rules are:

Rule 1 says that, when the current operation is a return (and the PC is not already tagged check), we change the tag on the PC to check. When we execute an instruction with the PC tagged check (rule 2), we check that the instruction's tag, CI, is target; if so, we allow the operation and clear the tag on the PC. If the current operation is not a return and the PC tag is the default tag ⊥, we simply proceed (rule 3). Rule 4 handles the special case where the valid target of a return is itself a return. If no rule applies, the operation is disallowed (for example, the configuration PC=check and CI=⊥ is not allowed). We assume that symbolic rules do not overlap.
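The four rules described above can be sketched as a single transition function on the PC tag. This is a reading of the prose only, since the rule listing itself is not reproduced in this text: the opgroup name `"other"`, the string `"bottom"` standing in for the default tag, and the interpretation of rule 4 (a return that is itself a valid target keeps the PC tagged check) are assumptions.

```python
BOTTOM = "bottom"   # stands in for the default tag
CHECK, TARGET = "check", "target"


class SecurityViolation(Exception):
    """Raised when no rule applies to the current configuration."""


def rop_step(opgroup, pc_tag, ci_tag):
    """Return the new PC tag for one instruction, or raise if none applies."""
    if opgroup == "return":
        if pc_tag == BOTTOM:                       # rule 1: mark pending check
            return CHECK
        if pc_tag == CHECK and ci_tag == TARGET:   # rule 4: target is a return
            return CHECK
    else:
        if pc_tag == CHECK and ci_tag == TARGET:   # rule 2: valid return point
            return BOTTOM
        if pc_tag == BOTTOM:                       # rule 3: nothing pending
            return BOTTOM
    raise SecurityViolation((opgroup, pc_tag, ci_tag))
```

Because the branches are mutually exclusive, the sketch respects the stated assumption that symbolic rules do not overlap.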

Next, consider a more precise variant of this policy, in which we guarantee not only that every return lands on some valid return target, but also that it targets a code point from which it could actually have been called. This policy assumes that the compiler has complete knowledge of return points and can analyze, for each return point, which call sites it might return to. Using this information, we attach a unique tag to each return and to each potential return target. On encountering a return, the PUMP copies the tag on the instruction to the PC (rules 1' and 4'), instead of the generic tag check. On the next step, it checks whether the actual return point is among the expected ones, that is, whether a return from PC to CI is allowed (rules 2' and 4').

In these rules, we use X (a set of pairs of code location identifiers, provided by the compiler) to represent the allowed indirect control flows through returns in the code. As this shows, the expressions describing tags in symbolic rules are not limited to constant values: we can write more general expressions that compactly describe large sets of tags.
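A minimal sketch of the refined policy, assuming that X is given as a Python set of (return identifier, target identifier) pairs and that `None` stands in for the default tag. The function names and the opgroup name `"other"` are hypothetical, and the primed rules are reconstructed from the prose rather than copied from the source.

```python
BOTTOM = None  # stands in for the default tag


def make_refined_rop(allowed):
    """Build a step function from X, the set of allowed (return, target) pairs."""
    def step(opgroup, pc_tag, ci_tag):
        # A non-default PC tag names the return that just executed; the
        # current instruction must be one of its permitted targets.
        if pc_tag is not BOTTOM and (pc_tag, ci_tag) not in allowed:
            raise PermissionError(f"return {pc_tag!r} may not land on {ci_tag!r}")
        # A return copies its own tag to the PC (rules 1' and 4');
        # any other instruction clears the PC tag (rules 2' and 3').
        return ci_tag if opgroup == "return" else BOTTOM
    return step


step = make_refined_rop({("t1", "t2"), ("t1", "t3")})
```

The membership test plays the role of the guard: one symbolic rule covers every pair in X without listing the pairs individually.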

Concrete rules. Symbolic rules can compactly encode a great variety of metadata tracking mechanisms. At the hardware level, however, we need a rule representation tuned for efficient interpretation, to avoid slowing down the primary computation. To this end, we introduce a lower-level rule format called concrete rules. Intuitively, each symbolic rule for a given policy can be expanded into an equivalent set of concrete rules. However, since a single symbolic rule might in general generate an unbounded number of concrete rules, we perform this elaboration lazily, generating concrete rules as needed while the system executes.

The PUMP hardware includes a cache of concrete rules that can be consulted in parallel with the processor's ALU operation. When an instruction is issued, the rule cache performs an associative match of the tags from the current machine state (the current PC tag, the tags on the current instruction's operands, and so on) against all concrete rules in the cache. If a match is found, the cache returns a new tag for the PC and a tag for the result of the instruction. Otherwise, the processor faults to the rule miss handler, a software routine that consults the policy's symbolic rules and decides whether the faulting machine state should be allowed to proceed. If so, it generates an appropriate concrete rule, installs it in the cache, and restarts the faulting instruction; if not, it invokes a suitable security fault handler. The general format of a concrete rule is:

Here the input and output fields are fixed tags. Note that the guard? field of the symbolic rule format is not needed, since the miss handler checks the corresponding condition before adding any concrete rule to the cache.
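The interplay between the rule cache and the miss handler can be sketched in software as a memoizing lookup. This is a simplified model, not the hardware design: the class and function names are hypothetical, the eviction choice (oldest entry first) is an assumption, and the default capacity of 4096 simply echoes the cache size simulated later in this document.

```python
class PolicyViolation(Exception):
    """Raised when no symbolic rule allows the faulting machine state."""


class RuleCache:
    def __init__(self, miss_handler, capacity=4096):
        self.miss_handler = miss_handler  # software: inputs -> (PC', R') or None
        self.capacity = capacity
        self.rules = {}                   # concrete rules: inputs -> (PC', R')
        self.misses = 0

    def lookup(self, opgroup, pc, ci, op1, op2, mr):
        key = (opgroup, pc, ci, op1, op2, mr)
        if key not in self.rules:                 # miss: fault to software
            self.misses += 1
            outputs = self.miss_handler(*key)
            if outputs is None:
                raise PolicyViolation(key)
            if len(self.rules) >= self.capacity:  # make room (oldest entry first)
                del self.rules[next(iter(self.rules))]
            self.rules[key] = outputs             # install, then "restart"
        return self.rules[key]


def toy_policy(opgroup, pc, ci, op1, op2, mr):
    """Toy symbolic rule: an add is allowed only on equally tagged operands."""
    if opgroup == "add" and op1 == op2:
        return (pc, op1)
    return None
```

The guard (`op1 == op2` in the toy policy) is evaluated only in the miss handler; the cached entry holds fixed tags, which matches the remark that concrete rules need no guard field.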

One handy encoding trick greatly reduces the number of concrete rules. We observe that it is very common for all the symbolic rules for a given opgroup to mark certain inputs or outputs as wildcards. For example, in the ROP policy, the rules for both the return opgroup and the all-others opgroup need not match on the OP1, OP2, and MR inputs and need not produce an R' result. To avoid generating concrete rules for every possible value of the unused input fields, we define a bit vector containing a don't-care bit for each opgroup and input field, which determines whether the corresponding tag is actually used in the rule cache lookup. Similarly, a don't-care vector marks unused outputs, for which a default tag is returned (we write ⊥ for this below).

For example, for the ROP policy, since the all-others opgroup has the don't-care bits set for OP1, OP2, MR, and R', if the compiler knows that the return instruction tagged t1 is the only return in the code and that it can return only to the return targets tagged t2 and t3, then rule 2' yields just two concrete rules.

The don't-care positions are masked with the default tag ⊥. Symbolic rule 3', on the other hand, corresponds to four concrete rules:

Since CI is not a don't-care position for this opgroup (rule 3' marks CI as a wildcard, but rule 2' does not, and both rules concern the same opgroup), we get a distinct concrete rule for each value it can take: ⊥ and all the identifiers (in this example just t1, t2, and t3).
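The counts in this example can be checked by brute force. The sketch below enumerates every machine state over the four tags, masks the don't-care fields before forming the cache key, and counts the distinct keys each symbolic rule produces. The opgroup name `"other"`, the string `"bottom"` for the default tag, and the predicate forms of rules 2' and 3' are assumptions reconstructed from the prose.

```python
from itertools import product

BOTTOM = "bottom"                       # stands in for the default tag
FIELDS = ("PC", "CI", "OP1", "OP2", "MR")
TAGS = (BOTTOM, "t1", "t2", "t3")
X = {("t1", "t2"), ("t1", "t3")}        # allowed (return, target) pairs
CARE = {"other": {"PC", "CI"}}          # fields used in the cache lookup


def cache_key(opgroup, tags, masked=True):
    """Build the lookup key, masking don't-care fields with the default tag."""
    if masked:
        tags = tuple(t if f in CARE[opgroup] else BOTTOM
                     for f, t in zip(FIELDS, tags))
    return (opgroup,) + tuple(tags)


def concrete_rules(applies, masked=True):
    """Distinct cache keys over all machine states where `applies` holds."""
    return {cache_key("other", tags, masked)
            for tags in product(TAGS, repeat=len(FIELDS)) if applies(tags)}


def rule2(tags):   # PC holds a return id and (PC, CI) is an allowed pair
    return (tags[0], tags[1]) in X


def rule3(tags):   # no return pending: PC carries the default tag
    return tags[0] == BOTTOM


print(len(concrete_rules(rule2)), len(concrete_rules(rule3)))                # 2 4
print(len(concrete_rules(rule2, False)), len(concrete_rules(rule3, False)))  # 128 256
```

With masking, the two rules need 2 and 4 cache entries; without it, every combination of the three unused input tags would occupy its own entry.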

The mapping from opcodes to opgroups and the don't-care vectors are programmable. The ROP policy uses only two opgroups (return and the all-others opgroup), but other policies may need more; for example, the primitive type policy (§4.1) uses ten.

Structured tags. For policies with richer metadata tags than ROP, the translation from symbolic to concrete rules follows the same general lines, but the details become a bit more intricate. For example, the taint tracking policy (§4.4) takes tags to be pointers to memory data structures, each describing an arbitrarily sized set of taints (representing data sources or system components that may have contributed to a given piece of data). The symbolic rule for the load opgroup says that the taint on the loaded value should be the union of the taints on the instruction itself, the target address of the load, and the memory at that address:

Suppose that, at some moment, (i) the next instruction to be executed is ld r0 r1 and its tag is tci, register r0 contains a pointer p tagged tp, and the memory at address p holds a tagged value; (ii) tci points to a data structure (say, an array of taint ids) representing the set {TA, TB}; (iii) tp points to a representation of {TC, TD}; and (iv) the tag on the value at address p points to the empty set. Suppose, moreover, that the taint set {TA, TB, TC, TD} has not occurred before, that is, no data structure representing the set we should use to taint the result of the load currently exists in memory. In this case, the rule cache lookup will miss and execution will fault to the rule miss handler, which will generate an appropriate concrete rule and install it in the cache, possibly evicting another rule to make room. Doing so requires allocating new memory (say, at address tnew) and initializing it to represent {TA, TB, TC, TD}. The generated concrete rule will then be:

After the instruction is restarted, the next cache lookup will succeed, and the value loaded into r1 will be tagged tnew.

To reduce the number of distinct tags (and hence the pressure on the rule cache), metadata structures are stored internally in canonical form, and, since tags are immutable, sharing is fully exploited (for example, set elements are kept in a canonical order so that sets can be compactly represented by sharing common prefix subsets). When no longer needed, these structures can be reclaimed (for example, by garbage collection).
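The load example and the canonical-form idea can be sketched together by interning taint sets, so that equal sets always map to the same tag. This is a simplification under stated assumptions: tags are small integers rather than pointers, `frozenset` provides the canonical form, and neither prefix sharing nor reclamation is modeled. The class and method names are hypothetical.

```python
class TaintTags:
    """Interning table: each distinct taint set gets exactly one tag."""

    def __init__(self):
        self._by_set = {}   # canonical frozenset -> tag
        self._sets = []     # tag -> frozenset

    def tag_for(self, taints):
        """Return the tag for this set of taints, allocating one if it is new."""
        key = frozenset(taints)
        if key not in self._by_set:
            self._by_set[key] = len(self._sets)
            self._sets.append(key)
        return self._by_set[key]

    def taints_of(self, tag):
        return self._sets[tag]

    def load_rule(self, t_ci, t_addr, t_mem):
        """Result tag of a load: union of instruction, address, and memory taints."""
        union = self._sets[t_ci] | self._sets[t_addr] | self._sets[t_mem]
        return self.tag_for(union)
```

In the scenario above, the first call to `load_rule` allocates a new tag for {TA, TB, TC, TD}; any later load with the same input tags reuses it, which is what keeps the number of distinct tags, and therefore of concrete rules, down.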

Composite policies. Going one step further, we can simultaneously enforce multiple orthogonal policies by letting tags be pointers to tuples of tags from several component policies. (In general, multiple policies might not be orthogonal; we return to this point in §6.) For example, to compose the first ROP policy with the taint tracking policy just sketched, we let each tag be a pointer to a representation of a tuple (r, t), where r is an ROP tag (a code location identifier or the default tag ⊥) and t is a taint tag (a pointer to a set of taints). The cache lookup process is exactly the same, but when a miss occurs, the miss handler extracts the components of the tuple and dispatches to routines that evaluate both sets of symbolic rules. The operation is allowed only if both policies have a rule that applies; in this case, the resulting tag is a pointer to a pair containing the results from the two sub-policies.
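The dispatch step performed by the miss handler for a composite policy can be sketched as follows. Each component policy is modeled as a hypothetical function taking an opgroup and a tuple of input tags and returning a (PC', R') pair, or `None` when no rule applies; composite tags are plain Python tuples rather than pointers. All names are assumptions for illustration.

```python
def compose(*policies):
    """Combine component policies into one policy over tuple-valued tags."""
    def step(opgroup, inputs):
        # `inputs` holds one composite tag per input field; composite tag k
        # is a tuple with one component per policy.
        results = []
        for i, policy in enumerate(policies):
            component_inputs = tuple(tag[i] for tag in inputs)
            out = policy(opgroup, component_inputs)
            if out is None:          # one sub-policy has no applicable rule
                return None          # so the operation is disallowed
            results.append(out)
        # Regroup per output: ((pc'_1, pc'_2, ...), (r'_1, r'_2, ...)).
        return tuple(zip(*results))
    return step
```

The composition is generic in the number of components, so the two-policy case in the text is just `compose(rop_policy, taint_policy)` for suitable component functions.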

Instruction modifiers and ephemeral rules. In some policies (for example, memory safety), new tags must be generated dynamically. One way to achieve this effect is to use the tag on an instruction such as move as a modifier, conveying a request for a fresh tag to the policy management system.

This says that a move instruction tagged tpolicygen is interpreted as a request to generate a new tag. The result, tnewtag, is a unique tag associated with the specified policy. The tag tpolicygen on the instruction also serves as an authorization, or capability, for this service request: without that tag, the service cannot be invoked, and the trusted loader ensures that only specially designated regions of code (for example, the malloc routine in the memory safety policy of §4.2) are annotated with such tags. The "1" denotes an ephemeral rule, whose result is not stored permanently in the hardware rule cache (since it changes on every invocation).
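A minimal sketch of tag generation, assuming the miss handler reports, alongside the result tag, whether the resolved rule may be cached. The class name, the tag naming scheme, and the treatment of ordinary moves (propagating the operand tag) are assumptions for illustration; real policies define their own move rules.

```python
import itertools

T_POLICYGEN = "t_policygen"  # modifier tag placed on designated move instructions


class TagGenerator:
    def __init__(self):
        self._ids = itertools.count()

    def resolve_move(self, ci_tag, op_tag):
        """Return (result_tag, cacheable) for a move instruction."""
        if ci_tag == T_POLICYGEN:
            # Ephemeral rule: the result differs on every invocation,
            # so it must not be installed in the rule cache.
            return f"t_new{next(self._ids)}", False
        # Assumption: an ordinary move just propagates the operand's tag.
        return op_tag, True
```

Only code whose instructions carry the modifier tag can reach the first branch, which is how the tag doubles as a capability.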

Code that initializes tags may also need to override the "steady state" rules. For example, in the memory safety policy, malloc must initialize the tags of a newly allocated memory region. The standard rule is that a pointer may be used to write only to a memory region that is tagged to match the pointer. But malloc must be allowed to bypass this rule while writing the freshly generated tag onto each word of the new region. We do this by giving the store operation a special modifier tag (used only in malloc):

3. POLICY SYSTEM AND PROTECTION

The policy system lives as a separate region of memory within each user process. It contains the miss handler, the policy rules, and the data structures representing the policy's metadata tags. Placing the policy system in the process minimizes intrusion on the existing Unix process model and facilitates lightweight switching between the policy system and user code. The policy system is isolated from user code using the mechanisms described next.

Metadata threat model. Clearly, the protections offered by the PUMP are useless if the attacker can rewrite metadata tags or change their interpretation. Our system is designed to prevent such attacks. We trust the kernel, the loader, and (for some policies) the compiler. In particular, we rely on the compiler to assign initial tags to words and, where needed, to communicate rules to the policy system. We assume that the loader preserves the tags provided by the compiler and that the path from the compiler to the loader is protected from tampering, for example by cryptographic signatures. We assume a standard Unix-style kernel that sets up the initial memory image for each process. (It may be possible to use micro-policies to remove some of these assumptions and further reduce the size of the TCB; see §6.) We assume that the rule-cache-miss-handling software is correctly implemented. It is small and hence a good target for formal verification; recent work [8] demonstrates feasibility for a programming model similar to the PUMP's.

Our main concern is to prevent user code running in a process from undermining the protection provided by the process's policy. User code must not be able to (i) manipulate tags directly: all tag changes must be carried out in accordance with the policy rules currently in effect; (ii) manipulate the data structures and code used by the miss handler; or (iii) insert rules directly into the hardware rule cache.

Addressing. To prevent direct manipulation of tags by user code, the tag attached to every 64-bit word is not itself separately addressable. In particular, it is not possible to specify an address that corresponds only to a tag, or to a portion of a tag, in order to read or write it. All user-accessible instructions operate on (data, tag) pairs as atomic units, with the standard ALU operating on the value portion and the PUMP operating on the tag portion.

Miss handler architecture. The policy system is activated only on misses in the PUMP cache. To isolate the policy system from user code, we add a miss-handler operating mode to the processor; we also extend the integer register file with 16 additional registers that are available only to the miss handler, to avoid saving and restoring registers. The PC of the faulting instruction, the rule inputs (opgroup and tags), and the rule outputs appear as registers while in miss-handler mode. We also add a miss-handler-return instruction, which installs a concrete rule in the cache and returns to user code.

The normal behavior of the PUMP is disengaged while the processor is in miss-handler mode. Instead, a single hardwired rule applies: all instructions and data touched by the miss handler must be tagged with a predefined miss-handler tag that is distinct from the tags used by any policy. This ensures isolation between miss handler code and data and user code in the same address space. User code cannot touch or execute policy system data or code, and the miss handler cannot accidentally touch user data or code. The miss-handler-return instruction can be issued only in miss-handler mode, which prevents user code from inserting rules into the PUMP.

4. POLICIES AND EXPERIMENTS

In this section, we show how to use the PUMP to implement four families of policies that enforce a diverse set of security invariants. For each family, we first sketch the threat model. We then describe the policy that mitigates it and the corresponding rules. Using examples from a public vulnerability suite [10], we show how each policy catches typical attacks. Most importantly, we characterize the load each policy places on the system. We close by comparing with similar policies from the literature.

To evaluate the policy load, we use 28 C, C++, and Fortran applications [25] from the SPEC CPU2006 benchmark suite and simulate them in the gem5 simulation environment [9] for the 64-bit Alpha ISA [1] (we exclude the tonto and xalancbmk benchmarks, on which gem5 fails). The gem5 simulation does not model the PUMP directly; rather, it generates an instruction trace that is run through a separate PUMP simulator.

Figure 1:

This phased simulation suffices for the policies described here, because they affect the computation only when a policy violation occurs, in which case execution halts. We simulate a 4096-entry pre-miss-handler rule cache.

The abstract programming model described in §2 places no limit on the number of unique tags, the number of concrete rules, or the size of the data structures used to represent metadata at the software level. To understand how the PUMP will perform in practice, a number of questions must be considered. How many unique metadata tags do a given policy, application, and dataset actually generate? With O opgroups and T tags, a program could in theory need O·T⁵ concrete rules, but how many does it need in the typical case? How do the total number of metadata tags and the size of the metadata representation affect performance? How much locality is there in tag and rule usage? How much does it cost to resolve a concrete rule, and how much do rule cache misses affect performance? Does performance degrade gracefully as the number of tags, the number of rules, the metadata size, or the rule resolution time increases?
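The theoretical bound is simple arithmetic: each of the O opgroups can be paired with any of T tags in each of the five input fields. The helper below is an illustrative sketch (the function name and the `inputs` parameter are assumptions), evaluated for the small ROP example of §2 with two opgroups and four tags.

```python
def worst_case_rules(opgroups, tags, inputs=5):
    """Upper bound O * T**inputs on the number of distinct concrete rules."""
    return opgroups * tags ** inputs


print(worst_case_rules(2, 4))            # 2048: all five input fields matter
print(worst_case_rules(2, 4, inputs=2))  # 32: only PC and CI are looked up
```

The second figure shows how don't-care masking alone shrinks the bound when only two of the five input fields take part in the lookup.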

To begin to understand these results, we measure, for each policy, several characteristics in addition to runtime overhead (see Figure 1). Tag usage shows which tags are not used by any rule in the policy. Opgroups is the minimum number of opgroups needed to capture the policy; the fewer opgroups we use, the more compression we obtain for concrete rules and the larger the effective PUMP capacity. Symbolic rules is the number of symbolic rules we wrote to express the policy. Initial tags is the number of tags in the initial memory image before execution starts. More tags are allocated dynamically during execution (dyn. alloc. tags). In addition, policies such as taint tracking create tags that represent unions of taint sets, and composite policies form tuples of individual policy tags. Final tags is the number of tags that exist at the end of the one-billion-instruction simulation period; it gives some sense of policy complexity and can be used to infer the rate of tag creation. Concrete rules is the number of unique concrete rules generated during the simulation period; it characterizes the number of compulsory misses needed to resolve symbolic rules into concrete rules and, in effect, the compulsory miss rate. Metadata structure is the average size, in words, of the data structure pointed to by each tag; it shows the value of having unbounded metadata. Metadata space is the number of words needed for all the data structures that hold the policy-related information pointed to by metadata tags; it characterizes the memory overhead beyond the tags themselves. Policy-depend instrs is the total number of instructions in the code that resolves symbolic rules into concrete rules; it is useful for understanding the complexity of a policy. Policy-depend instrs (dynamic) is the average number of policy-dependent instructions executed to resolve a symbolic rule into a concrete rule; it indicates the runtime complexity of the miss handler for each policy. The impact of the policy-dependent part depends on the complexity of the rules, the metadata data structures, the locality of those data structures, and the need to allocate new result tags. The policy-independent part of the miss handler requires only a few tens of instructions (see the corresponding column of Figure 1). Runtime overhead is the ratio of the wall-clock runtime of the application running under the policy to that of a baseline Alpha without the PUMP. Even when no policy is used, there is some runtime overhead from adding tags and the PUMP hardware structures. In particular, to achieve the same cycle time while accommodating the wider tagged words, the L1 caches of the tag-extended processor have half the effective capacity of those of the PUMP-less baseline Alpha. This gives the tag-extended processor a higher L1 miss rate. This overhead is captured in the first column of Figure 1, in which all tags are default, there is a single rule, and the miss handler is effectively never invoked.
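The relationship between unique concrete rules and compulsory misses can be sketched with a toy rule cache. This is only an illustration under stated assumptions: the class name, the five-field key layout, and the toy symbolic rule are hypothetical, and the cache is unbounded, so every miss it records is a compulsory miss.

```python
from typing import Callable, Dict, Hashable, Tuple

Key = Tuple[Hashable, ...]


class RuleCache:
    """Unbounded cache of concrete rules, filled on demand by a miss handler."""

    def __init__(self, symbolic_rule: Callable[[Key], Hashable]) -> None:
        self.symbolic_rule = symbolic_rule   # stands in for the software miss handler
        self.rules: Dict[Key, Hashable] = {}
        self.misses = 0

    def lookup(self, key: Key) -> Hashable:
        if key not in self.rules:
            # First use of this combination of opgroup and tags: resolve the
            # symbolic rule into a concrete rule and install it.
            self.misses += 1
            self.rules[key] = self.symbolic_rule(key)
        return self.rules[key]


def copy_first_operand_tag(key: Key) -> Hashable:
    # Toy symbolic rule (an assumption): the result carries the first operand's tag.
    _opgroup, _pc_tag, _insn_tag, op1_tag, _op2_tag = key
    return op1_tag


if __name__ == "__main__":
    cache = RuleCache(copy_first_operand_tag)
    trace = [
        ("add", "pc", "insn", "a", "b"),
        ("add", "pc", "insn", "a", "b"),   # hit: same concrete rule as above
        ("add", "pc", "insn", "b", "b"),
    ]
    for key in trace:
        cache.lookup(key)
    print(cache.misses, len(cache.rules))
```

With an unbounded cache, the miss count equals the number of unique concrete rules, which is why that characteristic is a proxy for compulsory misses. A finite cache would add capacity misses on top.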

The averages in Figure 1 are an unavoidable simplification for brevity; the benchmarks show a range of results. Figures 3 to 6 use boxplots to show the distribution of each characteristic across the applications of the SPEC CPU2006 benchmark suite. Figure 6 plots the runtime overhead in excess of [symbol not reproduced in the source].

We measure only runtime performance and set aside other nontrivial costs. In particular, in a naive implementation, adding a word-sized tag to every word in the caches and memory imposes an area overhead of at least 2x. Adding the effect of the PUMP caches and of the larger memories brings the energy overhead to 4x. We are optimistic that careful optimization can reduce these figures to around 30% for area and 50% for energy, or even lower; we are working to substantiate this claim.

4.1 Primitive Types

Threat model. Misinterpreting data is a common way to make a processor perform unintended actions. Here we are concerned with forms of low-level type confusion in which code running on behalf of an adversary may attempt to use an arbitrary data value as a pointer or to execute a word as an instruction. We enforce that data cannot be executed and that code cannot be created or modified at runtime (see also §4.3).

Policies and rules. In policy (C), we use tags to separate instructions (tagged insn), addresses (tagged addr), and all other data (tagged other). Instructions cannot be created or modified, and only instructions can be executed. Only addresses can be used by memory access instructions. The other type tag serves as a catch-all for words that are neither instructions nor addresses. The following rule verifies that, for example, a nop is actually tagged insn before it is executed:

(5)

Address arithmetic is allowed. For example, when one of the arguments of an addition is an address, the result is an address:

(6)

We also enforce that load and store instructions dereference only pointers and never read or write instructions:

(7)

(8)
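The behavior that rules (5) to (8) are described as enforcing can be sketched as a small checker. This is an illustrative sketch, not the rule notation of the source: the function names and the argument layout are assumptions, and treating "at least one operand is an address" as sufficient for an addr result is one reading of the text.

```python
INSN, ADDR, OTHER = "insn", "addr", "other"


class PolicyViolation(Exception):
    """Raised when an instruction is not permitted by the typing policy."""


def _require(condition: bool, message: str) -> None:
    if not condition:
        raise PolicyViolation(message)


def check(op: str, insn_tag: str, op1: str = OTHER, op2: str = OTHER,
          mem: str = OTHER):
    """Return the result tag for one instruction, or raise PolicyViolation.

    op1 is the pointer operand for load/store, op2 is the value being stored,
    and mem is the tag of the memory word that is read or overwritten.
    """
    _require(insn_tag == INSN, "only words tagged insn may be executed")
    if op == "nop":
        return None
    if op == "add":
        # Assumption: at least one address operand yields an address.
        return ADDR if ADDR in (op1, op2) else OTHER
    if op == "load":
        _require(op1 == ADDR, "load must dereference a word tagged addr")
        _require(mem != INSN, "instructions may not be read as data")
        return mem
    if op == "store":
        _require(op1 == ADDR, "store must dereference a word tagged addr")
        _require(mem != INSN, "instructions may not be overwritten")
        return op2
    raise ValueError(f"unknown opgroup: {op}")


def allowed(*args, **kwargs) -> bool:
    """Convenience wrapper: True if the policy permits the instruction."""
    try:
        check(*args, **kwargs)
        return True
    except PolicyViolation:
        return False


if __name__ == "__main__":
    print(check("add", INSN, ADDR, OTHER))               # address arithmetic
    print(allowed("load", INSN, op1=OTHER, mem=OTHER))   # integer used as pointer
```

In hardware, each permitted combination would be installed as one concrete rule; the checker above plays the role of the symbolic rules from which those concrete rules are resolved.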

To prevent attacks in which a return address is overwritten (for example, through stack smashing), we consider an extended policy (D) that adds a fourth tag (retaddr) for return addresses. We use it to tag the return address of a call (rule 9). In the Alpha ISA, a call places the return address in reg26, and a return transfers control to the address held in that register (the register is spilled to the stack on another call). Rule 10 checks that the value in reg26 is tagged retaddr when a return instruction executes.

(9)

(10)
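The effect described for rules (9) and (10) can be sketched as two small functions. The function names are hypothetical, and the sketch models only the tag on the link register, not the spill of that register to the stack.

```python
INSN, ADDR, OTHER, RETADDR = "insn", "addr", "other", "retaddr"


class PolicyViolation(Exception):
    """Raised when the return-address policy rejects an instruction."""


def call_rule(insn_tag: str) -> str:
    """A call writes its return address to the link register tagged retaddr."""
    if insn_tag != INSN:
        raise PolicyViolation("only words tagged insn may be executed")
    return RETADDR


def return_rule(insn_tag: str, link_register_tag: str) -> None:
    """A return is permitted only through a word tagged retaddr."""
    if insn_tag != INSN:
        raise PolicyViolation("only words tagged insn may be executed")
    if link_register_tag != RETADDR:
        raise PolicyViolation("return through a word that is not tagged retaddr")


def return_allowed(link_register_tag: str) -> bool:
    try:
        return_rule(INSN, link_register_tag)
        return True
    except PolicyViolation:
        return False


if __name__ == "__main__":
    genuine = call_rule(INSN)          # tag produced by a real call
    forged = OTHER                     # tag on attacker-supplied data
    print(return_allowed(genuine), return_allowed(forged))
```

A value written over a spilled return address by ordinary stores carries the tag of the stored data rather than retaddr, so the later return is rejected.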

An instrumented compiler could infer these type tags and apply them to the initial binary memory image: every emitted instruction is tagged insn, pointers to stack-allocated memory are tagged addr, and everything else is tagged other; new words of type addr arise through dynamic address allocation. Because we do not currently have such a compiler, we infer the tags in a different way for simulation and analysis. First, we tag every instruction in the binary executable insn. To infer which words should be tagged addr, we run a post-mortem analysis of the execution trace that tracks when and where each register is loaded and whether it is later used as the pointer operand of a load or store. Everything else is tagged other. Obtaining the initial tags in this way lets us measure the runtime impact of the typing policy on the SPEC benchmarks. With this setup, however, we can make no claim about whether our typing policy is permissive enough to accept all the benchmarks without raising spurious alarms. This limitation stems from the tight compiler integration that typing requires and does not arise for the other policies presented below.

Protection demo. We use an instance of CWE-843 (type confusion) [30] in which the programmer typecasts an integer to a function pointer and later calls that function. This translates into loading an immediate value tagged other into a register and, at a later point, jumping to the address held in that register. With policy (C), we catch the faulting instruction because the policy allows indirect jumps only to values tagged addr.

Characteristics. Policies (C) and (D) generate no new tags. (C) can be encoded with 15 symbolic rules that produce only 17 concrete rules, while (D) requires 16 symbolic rules and 19 concrete rules. Because the total number of rules is small, we see only negligible runtime overhead (below 0.01%) compared with the miss-handler-free policy (A). The PUMP thus delivers the performance of simple, hardwired type tags without baking the policy into hardware.

Related work. One of the first uses of tags in computer architecture was to distinguish the types of words in a machine [34, 23]. Symbolics LISP Machines [31] allocated 2 to 8 bits of each 36-bit primitive word to tags that distinguished a set of primitive types, including instructions, several flavors of pointers, integers, floating-point values, and uninitialized values; the Berkeley SPUR [43] used 6-bit object type tags.

4.2 Spatial and Temporal Memory Safety

Threat model. The next group of policies targets memory safety for heap-allocated data. It prevents an attacker from exploiting programming errors such as referencing beyond an object's bounds (spatial violations), referencing through a pointer after its region has been freed, or freeing an invalid pointer (temporal violations). This covers classic heap-based attacks such as heap smashing and pointer forging. The policies we study here protect only heap-allocated data, for which calls to malloc and free indicate how memory regions are set up and torn down; we do not address stack allocation or unboxed structs. These could in principle be handled given some compiler support (see [32]).

Policies and rules. Intuitively, for each new allocation we create a fresh block id, called c (for "color"), and write c as the tag on each memory location of the newly created block, in the manner of memset. The pointer to the new block is also tagged c. Later, when we dereference a pointer, we check that its tag matches the tag on the memory cell it points to. When a block is freed, the tags on all of its cells are changed to a constant F that denotes free memory.

We use one additional tag for non-pointers, and we write t for a tag that is either a color c or this non-pointer tag. One further detail must be handled: memory cells can themselves contain pointers, so each word in memory must be associated with two tags. We handle this by making the tag on each memory cell a pointer to a pair (c, t), where c is the id of the memory block to which the cell was allocated and t is the tag on the word stored in the cell. The rules for load and store take care of packing and unpacking these pairs, along with checking that each memory access is valid (that is, that the accessed cell lies within the block the pointer refers to).

(11)

(12)

(13)

To maintain the invariant that tags on pointers arise only from allocation, operations that create data from scratch (such as loading a constant) set its tag to the additional non-pointer tag.

We can extend malloc and free to tag memory regions using the instruction modifiers and temporary rules described at the end of §2. In malloc, we generate a fresh tag for the pointer to the new region through a temporary rule. Then, before returning the tagged pointer, we use it with a special store rule to write zeros to every word in the allocated region.

(14)

Conversely, free uses a modified store instruction to retag the region as unallocated before returning it to the free list:

(15)
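The coloring scheme described for rules (11) to (15) can be simulated in software. This is a minimal sketch under stated assumptions, not the hardware rule set: the class and helper names, the bump allocator, the string stand-in for the non-pointer tag, and the internal size table used by free are all hypothetical.

```python
import itertools

FREE = "F"            # color of unallocated cells
NONPTR = "nonptr"     # stand-in for the non-pointer tag


class MemorySafetyViolation(Exception):
    """Raised when a pointer's color does not match the accessed cell."""


class ColoredHeap:
    def __init__(self, size: int) -> None:
        self.values = [0] * size
        # Each cell carries a pair (block color, tag of the stored word).
        self.cell_tags = [(FREE, NONPTR)] * size
        self._colors = itertools.count(1)
        self._next_free = 0               # simple bump allocator
        self._sizes = {}                  # base address -> block length

    def malloc(self, n: int):
        if self._next_free + n > len(self.values):
            raise MemoryError("toy heap exhausted")
        color, base = next(self._colors), self._next_free
        self._next_free += n
        self._sizes[base] = n
        for addr in range(base, base + n):
            self.values[addr] = 0                     # zero the region
            self.cell_tags[addr] = (color, NONPTR)    # and color every cell
        return (base, color)                          # pointer = (address, color)

    def _check(self, addr: int, color) -> None:
        if not 0 <= addr < len(self.values) or self.cell_tags[addr][0] != color:
            raise MemorySafetyViolation(f"color mismatch at address {addr}")

    def store(self, ptr, value, value_tag=NONPTR) -> None:
        addr, color = ptr
        self._check(addr, color)
        self.values[addr] = value
        self.cell_tags[addr] = (color, value_tag)     # pack (c, t)

    def load(self, ptr):
        addr, color = ptr
        self._check(addr, color)
        return self.values[addr], self.cell_tags[addr][1]   # unpack t

    def free(self, ptr) -> None:
        base, color = ptr
        if base not in self._sizes:
            raise MemorySafetyViolation("free of an invalid pointer")
        self._check(base, color)
        for addr in range(base, base + self._sizes.pop(base)):
            self.cell_tags[addr] = (FREE, NONPTR)     # retag as unallocated


def offset(ptr, k: int):
    """Pointer arithmetic keeps the pointer's color."""
    return (ptr[0] + k, ptr[1])


def violates(action, *args) -> bool:
    try:
        action(*args)
        return False
    except MemorySafetyViolation:
        return True
```

For example, after `p = heap.malloc(4)` and `q = heap.malloc(4)`, the access `heap.load(offset(p, 4))` lands in the block colored for `q` and is rejected, and any access through `p` after `heap.free(p)` is rejected because the cells are tagged F.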

We implemented several variants of this policy to illustrate the performance/security trade-off. In the first (E), we assign a single color to all memory regions allocated by a given source module. This sandboxing policy provides module-level isolation within a process, similar to software-based fault isolation [46]. In the next variants, we use different numbers of colors to tag the regions returned by successive calls to malloc. These range from a single color (F), which provides the weakest form of spatial and temporal memory safety and only distinguishes allocated from unallocated memory, up to 8 (G) and 32 (H) colors. Increasing the number of colors reduces the aliasing that results from color reuse. Finally, we implement a precise, full memory safety policy (I) that uses the entire 64-bit tag space for colors.
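The aliasing effect of a small palette can be quantified with a short sketch. It assumes, purely for illustration, that colors are handed out round-robin to successive allocations; the text does not specify the assignment order, and the function names are hypothetical.

```python
def colors_for(num_allocations: int, palette_size: int) -> list:
    """Assign colors round-robin to successive allocations (an assumption)."""
    return [i % palette_size for i in range(num_allocations)]


def aliased_pairs(num_allocations: int, palette_size: int) -> int:
    """Count pairs of distinct allocations that end up sharing a color."""
    colors = colors_for(num_allocations, palette_size)
    return sum(
        1
        for i in range(num_allocations)
        for j in range(i + 1, num_allocations)
        if colors[i] == colors[j]
    )


if __name__ == "__main__":
    for palette in (1, 8, 32, 64):
        print(palette, aliased_pairs(64, palette))
```

A pair of allocations that share a color cannot be told apart by the policy, so an out-of-bounds access from one into the other goes undetected. With one color every pair aliases, and with at least as many colors as live allocations none do, which corresponds to the precise policy (I).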

Protection demo. We use two attacks from the Juliet test suite [10]. The first is an instance of CWE-416 (Use After Free) [28]; with policy (I), the attempt to load from a memory location tagged F is caught. The second is an instance of CWE-122 (Heap-Based Buffer Overflow) [27], in which a buffer is allocated and later written beyond its bounds (using strcpy), overwriting a valid region. With (I), the PUMP stops the instruction that tries to write a character into a memory location tagged F.

Characteristics. Sandboxing (E) and the policies with few colors, (F), (G), and (H), allocate only a handful of tags and produce only a small number of rules (fewer than 600 for 32 colors). They add no runtime overhead, since all of their rules fit in the cache. Full memory safety (I) is more expensive: it allocates one tag per memory allocation, so new rules must be added to the cache. This means more trips through the miss handler and, for some benchmarks, a concrete rule set larger than the cache. Nevertheless, rule locality is high (see Figure 7), and the average runtime overhead is only 13%. We observe a maximum overhead of about 130%, for GemsFDTD.

Related work. Clause et al. [16] were the first to demonstrate spatial and temporal memory protection using metadata tainting. Deng et al. [19, 20] supported such tainting with hardware tag management. HardBound [21] is an approach to spatial memory safety that places bounds information in a shadow space in order to preserve data structure layout compatibility between monitored and unmonitored code. The runtime overhead of HardBound is 10 to 20%. Watchdog [32] is a follow-on to HardBound that additionally prevents temporal violations by generating a unique identifier for each allocation; its average runtime overhead is 24%. SoftBound [33] is a software approach that, like HardBound, provides spatial memory safety for C, but at a higher runtime cost (67% on the SPEC and Olden benchmarks). Baggy Bounds [3] likewise targets only spatial violations and achieves 60% runtime overhead on SPEC2000.

4.3 Control-Flow Integrity

Threat model. This group of policies targets code-reuse attacks. We make the standard assumption [2] that the attacker cannot execute data or inject or modify code. (This assumption can be enforced with the primitive type policy of §4.1, as we do in §5 with our composite policy.) Instead, the attacker tries to chain existing code snippets (gadgets) together to induce malicious behavior.

Policies and rules. The common element of all code-reuse attacks is the introduction of control flows that do not exist in the original binary. We implement a family of CFI policies that validate each indirect control flow (computed jump) against the program's control-flow graph. Because the code is immutable, direct jumps need not be checked dynamically [2]. We first implement the coarse-grained CFI policies of [2, 51] as (J), (K), and (L). Policy (J) tags all indirect call, indirect jump, and return instructions, together with their potential targets, with a single tag {f}. On executing an instruction that is the source of an indirect control flow, we transfer this tag to the PC:

(16)

All other instructions are tagged Ø. Whenever the PC is tagged {f}, the current instruction must carry the same tag:

(17)
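The coarse-grained policy (J) described around rules (16) and (17) can be sketched as a single step function over the PC tag. The names are hypothetical, and clearing the PC tag after any instruction that is not an indirect transfer is an assumption of this sketch.

```python
EMPTY = frozenset()          # tag Ø: ordinary instructions
F = frozenset({"f"})         # tag {f}: indirect sources and their potential targets


class CfiViolation(Exception):
    """Raised when an indirect transfer lands on an instruction not tagged {f}."""


def step(pc_tag: frozenset, insn_tag: frozenset, is_indirect_transfer: bool) -> frozenset:
    """Check the current instruction and return the PC tag for the next one."""
    # Check described for rule (17): after an indirect transfer, the
    # instruction we land on must carry the same tag as the PC.
    if pc_tag == F and insn_tag != F:
        raise CfiViolation("indirect transfer to an instruction not tagged {f}")
    # Effect described for rule (16): an indirect source passes its tag to the PC.
    return insn_tag if is_indirect_transfer else EMPTY


if __name__ == "__main__":
    pc = step(EMPTY, F, True)        # execute an indirect jump tagged {f}
    pc = step(pc, F, False)          # land on a permitted target
    print(pc == EMPTY)
```

Because every source and every target share the one tag {f}, this policy accepts a jump from any indirect source to any marked target, which is what makes it coarse-grained.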

Policy (K) uses more tags (Ø, {r}, {c}, and {r, c}) to track control flows that originate from returns (whose tags contain r) separately from those that originate from indirect calls and jumps (whose tags contain c). Policy (L) extends (K) with two additional tags ({p} and {p, c}) for returns into privileged code (whose tags contain p), allowing additional protection for sensitive code snippets [51].

As Goktas et al. [22] show, these coarse-grained CFI policies do not provide sufficient protection against sophisticated code-reuse attacks. We therefore also implemented a set of fine-grained CFI policies, which Goktas et al. describe as "ideal CFI". We first introduce two orthogonal policies: PUMP JOP (M), which precisely tracks the association between indirect jumps and calls and their targets, and PUMP ROP (N), which does the same for returns, as presented in §2 (rules 1' to 4'). We then merge the two into PUMP CFI (O), a single policy that precisely tracks and validates all indirect control flows. For all of these policies, the compiler or linker is assumed to compute a sound overapproximation of the indirect control flows and to tag instructions accordingly.
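A fine-grained check of this kind can be sketched by tagging the PC with the identity of the indirect source and tagging each target with the set of sources permitted to reach it. The class, method, and label names below are hypothetical, and the tag encoding is an assumption of this sketch rather than the encoding of rules 1' to 4'.

```python
from typing import Dict, Optional, Set


class FineGrainedCfi:
    def __init__(self, edges: Dict[str, Set[str]]) -> None:
        # edges maps each indirect source to its permitted targets, as a
        # compiler or linker might compute. Invert it so that each target
        # is tagged with the set of sources allowed to reach it.
        self.allowed_sources: Dict[str, Set[str]] = {}
        for source, targets in edges.items():
            for target in targets:
                self.allowed_sources.setdefault(target, set()).add(source)

    def leave(self, source: str) -> str:
        """Executing an indirect transfer tags the PC with the source's id."""
        return source

    def permits(self, pc_tag: Optional[str], target: str) -> bool:
        """True if arriving at target is consistent with the PC tag."""
        if pc_tag is None:               # not arriving via an indirect transfer
            return True
        return pc_tag in self.allowed_sources.get(target, set())


if __name__ == "__main__":
    # Hypothetical labels: the return in harmless() may only go back to main.
    cfi = FineGrainedCfi({"ret@harmless": {"main+8"}})
    pc_tag = cfi.leave("ret@harmless")
    print(cfi.permits(pc_tag, "main+8"), cfi.permits(pc_tag, "bad"))
```

Unlike the single-tag policy (J), a return here is accepted only at a target whose tag lists that particular return, so a hijacked return into an otherwise valid target is still rejected.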

Protection demo. We tested these policies on a specially crafted program consisting of a single call to a "harmless" function. The code also contains a "bad" function that is never called, mimicking a dormant gadget that is not part of the execution path but could be exploited to cause unintended behavior. To simulate a return-oriented attack, inline assembly in the harmless function overwrites the stack pointer with the address of the bad function, tricking execution into returning to the bad function. Policy (N) detects this simulated attack by recognizing that the bad return is not in the set of valid control flows.

Characteristics. Each of the CFI policies above can be encoded very compactly, with only 2 to 4 symbolic rules. The simpler policies (J), (K), and (L) also need very few tags and concrete rules. As Figure 1 shows, the largest of them, (L), uses 6 constant tags and needs at most 21 concrete rules. With such small working sets, these policies incur no runtime overhead beyond that of the empty policy. Applying the stronger CFI policies (M), (N), and (O) to the SPEC benchmarks produces up to a few thousand concrete rules per policy, which fit entirely in the 4096-entry, pre-miss-handler PUMP cache. As a result, we obtain the added protection of ideal CFI with no additional runtime overhead. The complete CFI policy (O) needs 28K words on average to store the control-flow graph of the application. (For this simulation, we extract the graph from the instruction trace generated by gem5; in practice it would need more space than shown, to include allowed control-flow paths that are never exercised in our simulation.)

Related work. CFI [2] offers an attractive defense against common code-reuse attacks, but it has often been considered too expensive. Recent work [51] demonstrated a low-overhead CFI scheme that provides branch-target checking and uses randomized "springboards" as an additional defense against successfully composing gadgets. However, that work restricts only the set of allowed call and return targets, similar to the single-target example of rules 1 to 4 in §2, rather than tying specific return points to specific targets as policies (N), (M), and (O) do, which leaves it vulnerable to attack; nor does it address intra-procedural CFI, as our (M) and (O) do.

4.4 Taint Tracking

Threat model. This policy addresses the case in which an attacker feeds malformed data to a program that does not sanitize its input, causing unintended or malicious behavior (for example, SQL or OS command injection).

Policies and rules. Taint tracking mitigates this threat by detecting when untrusted data may flow into sensitive operations. The PUMP facilitates fine-grained taint tracking with an unbounded number of sources, a separate taint per source, and multiple taints on each piece of data, by allowing each tag to be a pointer to a set of source ids. The taint on a value is the union of the taints on the values used to compute it. Typical taint propagation rules include:

(18)

(19)

(20)
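The union-based propagation described above can be sketched with tags represented as canonical sets of source ids. This is an illustrative sketch only: the class and function names and the source labels are hypothetical, and the interning table is an assumption standing in for "each tag is a pointer to a set of source ids".

```python
from typing import Dict, FrozenSet, Iterable

Tag = FrozenSet[str]


class TaintTags:
    """Canonicalizes taint sets so that equal sets share a single tag."""

    def __init__(self) -> None:
        self._canonical: Dict[Tag, Tag] = {}

    def tag(self, sources: Iterable[str]) -> Tag:
        candidate = frozenset(sources)
        # Reuse the existing tag for this set, or register a new one.
        return self._canonical.setdefault(candidate, candidate)

    def union(self, left: Tag, right: Tag) -> Tag:
        """Result tag of an operation that combines two tainted values."""
        return self.tag(left | right)

    def __len__(self) -> int:
        return len(self._canonical)       # number of distinct tags created


def reaches_sink(value_tag: Tag, forbidden: Iterable[str]) -> bool:
    """True if a value carrying value_tag must be kept out of a sensitive sink."""
    return not value_tag.isdisjoint(forbidden)


if __name__ == "__main__":
    tags = TaintTags()
    from_stdin = tags.tag({"stdin"})          # hypothetical source ids
    from_lib = tags.tag({"libfoo"})
    combined = tags.union(from_stdin, from_lib)
    print(sorted(combined), len(tags), reaches_sink(combined, {"stdin"}))
```

Because unions are canonicalized, the number of tags grows only with the distinct taint sets that actually occur at runtime, not with the number of all possible subsets of sources.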

All of the policies we study use the same set of symbolic rules and differ only in the number and source of the initial taints. We introduce taints in two different ways: by input source, (P) and (Q), and by code region, (R), (S), and (T). In (P), we use a single taint for all input sources (for the SPEC programs, the standard I/O streams and input files). This resembles most prior work [41], in which a single-bit taint t simply indicates whether any data from an untrusted source was used in computing the value tagged t. Policy (Q) extends (P) by assigning a unique taint id to each input stream; the number of streams is unlimited. Tainting by program code protects against untrusted libraries and buggy components. We vary the granularity by using a unique taint for (i) each library (R), (ii) each included header file (S), or (iii) each function in the code (T). These policies require the compiler to tag instructions with the relevant taint identifiers. Finally, we combine (Q) and (R) to form policy (U).

Protection demo. We consider an instance of CWE-78 (OS Command Injection) [29], in which the user is supposed to be able only to parameterize the arguments of an ls command passed to the system call named system. A malicious user supplies a parameter string that begins with a command terminator followed by an arbitrary command. This translates into data tagged untrusted being passed, without sanitization, as an argument to the execve system call. With policy (U), the PUMP halts execution when it sees that an untrusted taint is about to be combined with the system-call taint.

Characteristics. All of these policies use the same 8 symbolic rules, defined in terms of 7 opgroups. The first two, (P) and (Q), use input streams as taint sources. For (Q), across all SPEC programs, we see only 2 sources on average, and only 10 and 14 concrete rules are needed. Consequently, these policies incur no noticeable runtime overhead. For policies (R) to (U), we see larger working sets.

The taint-by-function experiment (T) deliberately pushes the mechanism to an extreme, providing finer-grained tagging than is likely to be useful in practice. The large number of taints yields many more rules than the PUMP cache can hold at once. In addition, the tag-processing overhead is large (4110 instructions). Together these factors produce an average runtime overhead of 314%. This shows that the PUMP mechanism is strained by complex policies but can still support them. Per-file taint (S) is also finer-grained than is likely to be useful, yet it achieves a much lower runtime overhead of 9% because its rule set is smaller and its miss-handler resolution is tighter.

Policies (R) and (Q), in which we assign taints to whole libraries, represent a more reasonable usage. Here the average runtime overhead is indistinguishable from the case with no miss handler. This shows that the PUMP can express and support much richer models (compared with prior work that uses 1b or 4b taints) with essentially no additional runtime overhead. Furthermore, across these taint cases, the number of final tags is only 2 to 3 times the number of initial and dynamically allocated tags. This shows that, although we do create non-singleton tag sets, we see nothing close to the theoretical worst-case power-set effect.
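To make the gap between the observed growth and the power-set worst case concrete, the following sketch compares the two counts. It is only arithmetic: the number of base taints (16) is an illustrative assumption rather than a figure from the measurements, and both function names are hypothetical.

```python
def worst_case_tag_sets(base_taints):
    """Number of distinct taint sets that could arise from base_taints labels."""
    return 2 ** base_taints


def observed_bound(base_taints, factor=3):
    """Upper end of the 2-3x growth described above, as a count of final tags."""
    return factor * base_taints


example_taints = 16                        # illustrative assumption
worst = worst_case_tag_sets(example_taints)
observed = observed_bound(example_taints)
```

With 16 base taints the worst case is 65536 distinct sets, while 3x growth corresponds to 48 tags.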

Related work. Vulnerabilities addressed using taint tracking include format string attacks [48, 17, 41, 18, 12], cross-site scripting [48, 18, 12], memory exploits [48, 17, 41, 14, 18, 36, 12], code injection [48, 17, 18, 12], and others [49, 18]. Most existing work focuses on software techniques in which programs are instrumented. Typically, these techniques introduce significant runtime overhead (often 2x or more, in some cases up to 20x), in addition to other obstacles (e.g., handling race conditions in dynamic binary translation [15]).

Hardware approaches such as DIFT [41], Minos [17], and SIFT [35] use a single taint bit. Raksha, in both its on-core [18] and dedicated-coprocessor [26] variants, uses 4-bit tags to support up to four simultaneous policies. In contrast, we allow arbitrary sets of taints, corresponding to multiple trusted sources, possibly with different levels of trustworthiness. More flexible tagging schemes are discussed in §5.

4.5 Composite Policies

Each of these policies is potentially useful, but it would be a shame to have to choose a single policy to enforce at a time, for example choosing between protection against buffer overflows and protection against command-injection vulnerabilities. Instead, we generally want the protection that comes from composing multiple policies. Indeed, some of our individual policies need each other's protection in order to guard against all threats (CFI relies on the type protection to guarantee that data cannot be executed and that code cannot be created or modified).

Composition potentially increases the number of tags and the number of rules generated, which can degrade performance considerably. To characterize the effect of composition, we implement two composite policies. First, we implement a fairly small composite (V) built from the simplest instance of each of the four protection classes: 3 primitive types (C), simple memory safety (F), CCFIR (L), and single-bit input taint (P). Second, we implement a larger, stronger composite (W) that combines 4 primitive types (D), full spatial and temporal memory safety (I), PUMP CFI (O), and the composite of per-input-stream and per-library taint (U).

Characteristics. The simple composite policy (V) fits in the cache and has the same performance as its constituent policies. For the larger composite (W), the number of instructions needed for rule resolution in the miss handler rises significantly, from 38 to 710, because all of the policies must be resolved. The number of final tags grows by only 2.5x, which suggests that there is some product-set effect from composing tags but that it is nowhere near the worst case. Likewise, the concrete rules grow by only about 3x, owing to the larger tag set and the additional opgroups. The combination of a larger concrete rule set (now far larger than the PUMP cache capacity) and a higher miss-handler cost yields an average overhead of 38%, with a worst-case overhead as high as 280% (GemsFDTD). This shows that the PUMP can handle the larger number of rules that composition produces, at a cost in performance. For many applications the overhead is modest, but for some it becomes excessive. Together with the taint-by-function experiment (T), this points to the need for further software and microarchitectural optimizations that reduce miss-handler service time in order to obtain reasonable performance on rich compositions like this one; this is the focus of our ongoing work.
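One way to picture why the miss handler gets more expensive under composition is the sketch below. It is a toy model under stated assumptions, not the actual miss-handler code: a composite tag is represented as a tuple with one component per constituent policy, and taint_rule, type_rule, and resolve_composite are hypothetical names for illustration.

```python
# Hypothetical sketch: a composite tag is a tuple with one component per
# constituent policy, and the miss handler resolves each component in turn.

def taint_rule(opgroup, a, b):
    """Toy taint policy: the result carries the union of operand taints."""
    return a | b


def type_rule(opgroup, a, b):
    """Toy type policy: 'add' needs two ints; returns None when disallowed."""
    if opgroup == "add" and a == b == "int":
        return "int"
    return None


POLICIES = (taint_rule, type_rule)   # order fixes the tuple layout


def resolve_composite(opgroup, tag_a, tag_b):
    """Resolve one composite rule; any constituent may veto by returning None."""
    result = []
    for policy, a, b in zip(POLICIES, tag_a, tag_b):
        component = policy(opgroup, a, b)
        if component is None:
            return None              # violation: no rule would be installed
        result.append(component)
    return tuple(result)


x = (frozenset({"net"}), "int")
y = (frozenset(), "int")
z = (frozenset(), "ptr")
allowed = resolve_composite("add", x, y)
denied = resolve_composite("add", x, z)
```

Every constituent policy runs on each miss, so the handler's work grows with the number of policies composed, and the set of distinct tuples can approach the product of the component tag sets in the worst case.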

4.6 Discussion

The total number of rules does not fully capture the locality of the rules and hence the effective working-set size. Figure 7 shows the cumulative distribution function (CDF) of the number of unique rules used within 1-million-instruction sequences over a 1-billion-instruction simulation of the gcc benchmark. It shows that full memory safety (I) has a very tight working set (mostly under 3000); this is significant because this policy has the largest number of concrete rules among the non-composite policies. This locality helps explain why the performance overhead is low even though the rule set is much larger. The CFI policy (O) has a larger working set, but its entire rule set fits in the 4096-entry cache, so there are no evictions.
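The working-set measurement described here can be sketched as follows. This is an illustration under stated assumptions, not the simulator's actual tooling: rule_trace is a hypothetical list with one rule identifier per executed instruction, and the window is shrunk from 1 million instructions to 4 so the example stays readable.

```python
def unique_rules_per_window(rule_trace, window):
    """Count the distinct rules used in each consecutive window of the trace."""
    counts = []
    for start in range(0, len(rule_trace), window):
        counts.append(len(set(rule_trace[start:start + window])))
    return counts


def cdf(values):
    """Return sorted (value, fraction of windows <= value) pairs."""
    ordered = sorted(values)
    total = len(ordered)
    points = []
    for index, value in enumerate(ordered, start=1):
        if points and points[-1][0] == value:
            points[-1] = (value, index / total)   # merge ties into one point
        else:
            points.append((value, index / total))
    return points


# Toy trace: one rule identifier per executed instruction (hypothetical data).
trace = ["r1", "r2", "r1", "r1", "r3", "r3", "r3", "r3", "r1", "r2", "r3", "r4"]
counts = unique_rules_per_window(trace, window=4)
points = cdf(counts)
```

A policy with many concrete rules overall can still show small per-window counts, which is the locality effect that the figure is meant to expose.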

Prior work has used clever schemes to compactly represent or approximate safety and security policies (e.g., [42]), but these are often compromises on the intended policy, and they may trade complexity for compactness. We show that it is possible to include richer metadata that captures the needs of security policies both more completely and more naturally, with little or no additional runtime overhead. Rather than imposing a fixed bound on metadata representation and policy complexity, the PUMP degrades gracefully. This allows policies to use more data where needed without affecting the performance or size of the common case. It also allows incremental refinement and performance tuning of policies, since even complex policies can easily be expressed and executed.

4.7 Other Micro-Policies

We believe our programming model can encode a number of other policies. Information-flow control (e.g., [6, 37, 40, 24, 8]) is richer than the simple taint-tracking model shown here, but tracking implicit flows can be supported either with RIFLE-style binary translation [44] or by using the PC tag with some support from the compiler. Micro-policies can support lightweight access control and compartmentalization [47]. Tags can be used to distinguish unforgeable resources [50]. Unique, generated tokens can act as keys for sealing and endorsing data, which in turn can be used for strong abstraction, guaranteeing that data is created and destructured only by authorized code components. Micro-policy rules can enforce data invariants such as immutability and linearity. Micro-policies can support parallelism, either as out-of-band metadata for synchronization primitives such as full/empty bits on data or futures (e.g., [5]), or as state for detecting race conditions on locks (e.g., [38, 52]). System architects can apply specific micro-policies to existing code without auditing or rewriting every line.

5. Related Work

Our example policies and the work related to them were covered in §4. Here we discuss related work on hardware tag checking and propagation more generally. With a few exceptions noted below, most prior work uses small sets of tag bits with hardwired or very restricted policies (see Figure 2). The first wave of taint hardware supported a single taint bit attached to each word, using hardwired taint-propagation logic. More recent systems added the ability to handle multiple independent taint tags (e.g., [18]), multi-bit tags (e.g., [45]), and more flexible policies (e.g., [19]). Raksha, the only design that supports more than one policy at a time, supported up to four taint-tracking policies [18].

The prior systems closest to ours are Aries [11], FlexiTaint [45], Log-Based Architectures (LBA) [13], and Harmoni [20], all of which propose programmable rule caches backed by software handlers. Only FlexiTaint and LBA detail specific example security policies that use the programmable rule cache. In all cases except LBA, the rule cache takes two inputs, one for each of the operation's two operands, and produces a single output, whereas the PUMP takes potentially five inputs and produces two outputs; Figure 1 summarizes how these tag sources and destinations are used in our security policies.
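A programmable rule cache backed by a software handler, with five tag inputs and two tag outputs, can be sketched as below. This is a toy model, not the PUMP hardware: the input names (pc, ci, op1, op2, mr) and the output order (next PC tag, result tag) are assumptions made for illustration, as are the first-in-first-out eviction and the toy_handler policy. The 4096-entry default echoes the cache size mentioned in §4.6.

```python
class RuleCache:
    """Toy rule cache: maps (opgroup, five input tags) to two output tags."""

    def __init__(self, miss_handler, capacity=4096):
        self.miss_handler = miss_handler
        self.capacity = capacity
        self.entries = {}
        self.misses = 0

    def lookup(self, opgroup, pc, ci, op1, op2, mr):
        key = (opgroup, pc, ci, op1, op2, mr)
        if key not in self.entries:
            self.misses += 1
            outputs = self.miss_handler(*key)    # software resolves the rule
            if outputs is None:
                raise PermissionError("policy violation")
            if len(self.entries) >= self.capacity:
                # Assumed policy: evict the oldest entry (dicts keep insertion order).
                self.entries.pop(next(iter(self.entries)))
            self.entries[key] = outputs
        return self.entries[key]


def toy_handler(opgroup, pc, ci, op1, op2, mr):
    """Illustrative policy: PC tag unchanged; result tag is the union of data tags."""
    return pc, op1 | op2 | mr


cache = RuleCache(toy_handler, capacity=2)
a, e = frozenset({"a"}), frozenset()
first = cache.lookup("add", e, e, a, e, e)     # miss: handler runs
second = cache.lookup("add", e, e, a, e, e)    # hit: served from the cache
```

Only the first lookup pays the handler cost; repeated instructions with the same tag combination hit in the cache, which is why rule locality matters more than the total rule count.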

Figure 2: Hardware tagging approaches

LBA potentially has multiple inputs, but it does not handle metadata generation in hardware. Some of LBA's innovations (e.g., restricting general propagation tracking to unary inheritance tracking, which includes giving up taint combination) specifically give up the generality that our solution provides in order to run fast. Even with these restricted policies, LBA has ~50% runtime overhead, compared with the 8% average overhead reported here for most single policies. The policies we show here are richer than those supported by FlexiTaint because of the extra tag inputs and outputs and the richer tag metadata.

6. Future Work

The PUMP design offers an attractive combination of flexibility and performance: it supports a diverse collection of low-level, fine-grained security policies with single-policy performance comparable to dedicated mechanisms in most cases, while also supporting richer, composite policies whose performance mostly degrades gracefully as rule complexity grows. To understand this design space more thoroughly, a number of issues need further investigation. First, once we have a running hardware implementation, we will need to integrate the PUMP hardware and its low-level software with a host operating system and software toolchain (e.g., compilers, linkers, and loaders). Second, we wonder whether the mechanisms provided by the PUMP can be used to protect its own software structures. We believe we could replace the special miss-handler operating mode by implementing a "compartmentalization" micro-policy using the PUMP and using it to protect the miss-handler code. Finally, we have found that it is easy to combine orthogonal sets of policies, where the protection provided by each one is completely independent of the others. But policies often interact: for example, an information-flow policy may need to place tags on fresh regions that are being allocated by a memory-safety policy. Policy composition needs more study, both in how it is expressed and in efficient hardware support.

Figure 3: Distribution of initial tags

Figure 4: Distribution of final tags

Figure 5: Distribution of concrete rules

Figure 6: Distribution of runtime overhead (for policy A above)

Figure 7: CDF of working-set size for gcc

7. References

[1] Alpha Architecture Handbook. Digital Equipment Corporation, 1992.

[2] M. Abadi, M. Budiu, Ú. Erlingsson, and J. Ligatti. Control-flow integrity. In Proc. ACM CCS, pages 340-353, 2005.

[3] P. Akritidis, M. Costa, M. Castro, and S. Hand. Baggy bounds checking: an efficient and backwards-compatible defense against out-of-bounds errors. In Proc. USENIX Security, pages 51-66, 2009.

[4] D. Arora, S. Ravi, A. Raghunathan, and N. K. Jha. Architectural support for run-time validation of program data properties. IEEE Trans. VLSI Sys., 15(5):546-559, May 2007.

[5] Arvind, R. S. Nikhil, and K. K. Pingali. I-structures: Data structures for parallel computing. In Proc. Wkshp on Graph Reduction (Springer-Verlag LNCS 279), Sept. 1986.

[6] T. H. Austin and C. Flanagan. Efficient purely-dynamic information flow analysis. In Workshop on Programming Languages and Analysis for Security (PLAS), pages 113-124. ACM, 2009.

[7] (authors removed for anonymity). PUMP - A Programmable Unit for Metadata Processing, 2014. To appear.

[8] A. Azevedo de Amorim, N. Collins, A. DeHon, D. Demange, C. Hriţcu, D. Pichardie, B. C. Pierce, R. Pollack, and A. Tolmach. A verified information-flow architecture. In POPL, pages 165-178. ACM, Jan. 2014.

[9] N. Binkert, B. Beckmann, G. Black, S. K. Reinhardt, A. Saidi, A. Basu, J. Hestness, D. R. Hower, T. Krishna, S. Sardashti, R. Sen, K. Sewell, M. Shoaib, N. Vaish, M. D. Hill, and D. A. Wood. The gem5 simulator. SIGARCH Comput. Archit. News, 39(2):1-7, Aug. 2011.

[10] T. Boland and P. E. Black. Juliet 1.1 C/C++ and Java test suite. Computer, pages 88-90, 2012.

[11] J. Brown and T. F. Knight, Jr. A minimally trusted computing base for dynamically ensuring secure information flow. Technical Report 5, MIT CSAIL, November 2001. Aries Memo No. 15.

[12] H. Chen, X. Wu, L. Yuan, B. Zang, P.-C. Yew, and F. T. Chong. From Speculation to Security: Practical and Efficient Information Flow Tracking Using Speculative Hardware. In Proc. ISCA, pages 401-412, 2008.

[13] S. Chen, M. Kozuch, T. Strigkos, B. Falsafi, P. B. Gibbons, T. C. Mowry, V. Ramachandran, O. Ruwase, M. P. Ryan, and E. Vlachos. Flexible Hardware Acceleration for Instruction-Grain Program Monitoring. In Proc. ISCA, pages 377-388, 2008.

[14] S. Chen, J. Xu, N. Nakka, Z. Kalbarczyk, and R. Iyer. Defeating memory corruption attacks via pointer taintedness detection. In Proc. IEEE DSN, pages 378-387, 2005.

[15] J. Chung, M. Dalton, H. Kannan, and C. Kozyrakis. Thread-safe dynamic binary translation using transactional memory. In HPCA, pages 279-289. IEEE, 2008.

[16] J. A. Clause, I. Doudalis, A. Orso, and M. Prvulovic. Effective memory protection using dynamic tainting. In Proc. ASE, pages 284-292. ACM, 2007.

[17] J. R. Crandall and F. T. Chong. Minos: Control data attack prevention orthogonal to memory model. In Proc. IEEE MICRO, pages 221-232, 2004.

[18] M. Dalton, H. Kannan, and C. Kozyrakis. Raksha: a flexible information flow architecture for software security. In Proc. ISCA, pages 482-493, 2007.

[19] D. Y. Deng, D. Lo, G. Malysa, S. Schneider, and G. E. Suh. Flexible and Efficient Instruction-Grained Run-Time Monitoring Using On-Chip Reconfigurable Fabric. In Proc. IEEE MICRO, pages 137-148, 2010.

[20] D. Y. Deng and G. E. Suh. High-performance parallel accelerator for flexible and efficient run-time monitoring. In Proc. IEEE DSN, pages 1-12, 2012.

[21] J. Devietti, C. Blundell, M. M. K. Martin, and S. Zdancewic. HardBound: Architectural support for spatial safety of the C programming language. In S. J. Eggers and J. R. Larus, editors, ASPLOS, pages 103-114. ACM, 2008.

[22] E. Göktaş, E. Athanasopoulos, H. Bos, and G. Portokalidis. Out Of Control: Overcoming Control-Flow Integrity. In Proc. IEEE S&P, 2014.

[23] C. J. Haley, S. M. Luera, M. D. Schanken, and W. B. Geer. Final evaluation report Unisys A Series MCP/AS release 3.7. Technical Report CSC-EPL-871003, Library No. S-228,515, National Computer Security Center, Fort Meade, MD, August 5, 1987.

[24] D. Hedin and A. Sabelfeld. Information-flow security for a core of JavaScript. In 25th IEEE Computer Security Foundations Symposium (CSF), pages 3-18. IEEE, 2012.

[25] J. L. Henning. SPEC CPU2006 benchmark descriptions. SIGARCH Comput. Archit. News, 34(4):1-17, Sept. 2006.

[26] H. Kannan, M. Dalton, and C. Kozyrakis. Decoupling Dynamic Information Flow Tracking with a Dedicated Coprocessor. In Proc. IEEE DSN, pages 105-114, 2009.

[27] MITRE Corp. CWE-122: Heap-based buffer overflow.

[28] MITRE Corp. CWE-416: Use after free.

[29] MITRE Corp. CWE-78: Improper neutralization of special elements used in an OS command (OS command injection).

[30] MITRE Corp. CWE-843: Access of resource using incompatible type (type confusion).

[31] D. A. Moon. Architecture of the Symbolics 3600. In Proc. ISCA, pages 76-83, Los Alamitos, CA, USA, 1985. IEEE Computer Society.

[32] S. Nagarakatte, M. M. K. Martin, and S. Zdancewic. Hardware-Enforced Comprehensive Memory Safety. IEEE Micro, 33(3):38-47, May-June 2013.

[33] S. Nagarakatte, J. Zhao, M. M. K. Martin, and S. Zdancewic. SoftBound: highly compatible and complete spatial memory safety for C. In Proc. PLDI, pages 245-258. ACM, 2009.

[34] E. I. Organick. Computer System Organization: The B5700/B6700 Series. Academic Press, 1973.

[35] M. Ozsoy, D. Ponomarev, N. B. Abu-Ghazaleh, and T. Suri. SIFT: a low-overhead dynamic information flow tracking architecture for SMT processors. In Conf. Computing Frontiers, page 37, 2011.

[36] F. Qin, C. Wang, Z. Li, H. Kim, Y. Zhou, and Y. Wu. LIFT: A low-overhead practical information flow tracking system for detecting security attacks. In Proc. IEEE MICRO, pages 135-148, 2006.

[37] A. Russo and A. Sabelfeld. Dynamic vs. static flow-sensitive security analysis. In Proc. CSF, pages 186-199, 2010.

[38] S. Savage, M. Burrows, G. Nelson, P. Sobalvarro, and T. Anderson. Eraser: A dynamic race detector for multi-threaded programs. ACM Trans. Comp. Sys., 15(4), 1997.

[39] H. Shacham. The Geometry of Innocent Flesh on the Bone: Return-into-libc without Function Calls (on the x86). In Proc. ACM CCS, pages 552-561, Oct. 2007.

[40] D. Stefan, A. Russo, J. C. Mitchell, and D. Mazières. Flexible dynamic information flow control in Haskell. In 4th Symposium on Haskell, pages 95-106. ACM, 2011.

[41] G. E. Suh, J. W. Lee, D. Zhang, and S. Devadas. Secure Program Execution via Dynamic Information Flow Tracking. In Proc. ASPLOS, pages 85-96, 2004.

[42] L. Szekeres, M. Payer, T. Wei, and D. Song. SoK: Eternal war in memory. In Proc. IEEE S&P, pages 48-62, 2013.

[43] G. S. Taylor, P. N. Hilfinger, J. R. Larus, D. A. Patterson, and B. G. Zorn. Evaluation of the SPUR Lisp architecture. In Proc. ISCA, pages 444-452, 1986.

[44] N. Vachharajani, M. J. Bridges, J. Chang, R. Rangan, G. Ottoni, J. A. Blome, G. A. Reis, M. Vachharajani, and D. I. August. RIFLE: An architectural framework for user-centric information-flow security. In Proc. IEEE MICRO, 2004. [44] N. Vachharajani, M. J. Bridges, J. Chang, R. Rangan, G. Ottoni, J. A. Blome, G. A. Reis, M. Vachharajani, and D. I. August. RIFLE: An architectural framework for user-centric information-flow security. In Proc. IEEE MICRO, 2004.

[45] G. Venkataramani, I. Doudalis, Y. Solihin, and M. Prvulovic. FlexiTaint: A programmable accelerator for dynamic taint propagation. In Proc. HPCA, pages 173-184, Feb. 2008. [45] G. Venkataramani, I. Doudalis, Y. Solihin, and M. Prvulovic. FlexiTaint: A programmable accelerator for dynamic taint propagation. In Proc. HPCA, pages 173-184, Feb. 2008.

[46] R. Wahbe, S. Lucco, T. E. Anderson, and S. L. Graham. Efficient software-based fault isolation. In SOSP, pages 203-216, 1993. [46] R. Wahbe, S. Lucco, T. E. Anderson, and S. L. Graham. Efficient software-based fault isolation. In SOSP, pages 203-216, 1993.

[47] E. Witchel, J. Cates, and K. Asanovi c. ondrian memory protection. In Proc. ASPLOS, pages 304-316, New York, NY, USA, 2002. ACM. [47] E. Witchel, J. Cates, and K. Asanovi c. ondrian memory protection. In Proc. ASPLOS, pages 304-316, New York, NY, USA, 2002. ACM.

[48] W. Xu, S. Bhatkar, and R. Sekar. Taint-enhanced policy enforcement: a practical approach to defeat a wide range of attacks. In Proc. USENIX Security, Berkeley, CA, USA, 2006. [48] W. Xu, S. Bhatkar, and R. Sekar. Taint-enhanced policy enforcement: a practical approach to defeat a wide range of attacks. In Proc. USENIX Security, Berkeley, CA, USA, 2006.

[49] H. Yin, D. X. Song, M. Egele, C. Kruegel, and E. Kirda. Panorama: capturing system-wide information flow for malware detection and analysis. In Proc. CCS, pages 1 16-127. ACM, 2007. [49] H. Yin, D. X. Song, M. Egele, C. Kruegel, and E. Kirda. Panorama: capturing system-wide information flow for malware detection and analysis. In Proc. CCS, pages 1 16-127. ACM, 2007.

[50] N. Zeldovich, H. Kannan, M. Dalton, and C. Kozyrakis. Hardware enforcement of application security policies using tagged memory. In Proceedings of the 8th USENIX conference on Operating systems design and implementation, OSDI, pages 225-240. USENIX Association, 2008. [50] N. Zeldovich, H. Kannan, M. Dalton, and C. Kozyrakis. Hardware enforcement of application security policies using tagged memory. In Proceedings of the 8th USENIX conference on Operating systems design and implementation, OSDI, pages 225-240. USENIX Association, 2008.

[51] C. Zhang, T. Wei, Z. Chen, L. Duan, L. Szekeres, S. McCamant, D. Song, and W. Zou. Practical Control Flow Integrity & Randomization for Binary Executables. In Proc. IEEE S&P, 2013.

[52] P. Zhou, R. Teodorescu, and Y. Zhou. HARD: Hardware-assisted lockset-based race recording. In Proc. HPCA, 2007.

8. Symbolic Rules

8.1 Primitive Types

Alternate rules for checking return addresses

8.2 Memory Safety

N-coloring with N = 2⁶⁴ - k for full memory safety. We write c for colors, and use colors to tag pointers to the heap. We assume a special tag (its symbol is not reproduced in this text), distinct from the colors, which is used to tag all data that is not a pointer to the heap. Tags for registers are either colors or the special tag (written t). Tags for memory are either pairs of a color and either a color or the special tag (written (c1, t2)), or F (unallocated). The heap is initially tagged F everywhere. Finally, tags on instructions are drawn from the set {tmalloc, tmallocinit, tfreeinit, tsomething}.
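The tag structure described above can be modeled as plain data. The following is a minimal sketch, not the rules of this section (which are not reproduced in the text). The names `NON_POINTER`, `MemTag`, and `check_load` are hypothetical. Reading c1 as the color of the memory cell and t2 as the tag of the stored value is an assumption, as is the load check itself.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical stand-ins (not names from the text):
NON_POINTER = "non-pointer"   # the special tag for data that is not a heap pointer
FREE = "F"                    # F: unallocated memory

RegTag = Union[int, str]      # t: a color (modeled as an int) or NON_POINTER


@dataclass(frozen=True)
class MemTag:
    """Tag on an allocated memory word, written (c1, t2) in the text."""
    cell_color: int           # c1: assumed to be the color of the memory cell
    value_tag: RegTag         # t2: assumed to be the tag of the stored value


def initial_heap_tags(n_words: int) -> List[Union[MemTag, str]]:
    """The heap is initially tagged F everywhere."""
    return [FREE] * n_words


def check_load(pointer_tag: RegTag, mem_tag: Union[MemTag, str]) -> RegTag:
    """Assumed load check: the pointer's color must equal the cell color c1.

    Returns t2, the tag of the loaded value, or raises PermissionError.
    """
    if not isinstance(mem_tag, MemTag):
        raise PermissionError("access to unallocated (F) memory")
    if pointer_tag == NON_POINTER or pointer_tag != mem_tag.cell_color:
        raise PermissionError("pointer color does not match cell color")
    return mem_tag.value_tag
```

Under these assumptions, a pointer colored 7 may load from a cell tagged (7, t2) and receives t2 as the tag of the result. A pointer of any other color, or any access to an F-tagged word, is rejected.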

8.3 CFI

8.3.1 CFI-1ID [2]

We use 2 tags, written as sets: ∅ and {f}. The tag {f} is used to tag all indirect control-flow instructions as well as all their potential destinations. The tag ∅ is used for everything else.
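As an illustration of this two-tag scheme, the sketch below assigns ∅ or {f} to each address and checks an indirect transfer against the tag of its destination. The function names and the address sets are hypothetical. The check shown (an indirect transfer must land on an {f}-tagged address) is an assumption about how the tags are used, not a rule quoted from this section.

```python
EMPTY = frozenset()        # the tag ∅
F = frozenset({"f"})       # the tag {f}


def tag_addresses(addresses, indirect_sites, potential_targets):
    """Tag indirect control-flow sites and their potential targets {f}, all else ∅."""
    marked = set(indirect_sites) | set(potential_targets)
    return {a: (F if a in marked else EMPTY) for a in addresses}


def indirect_transfer_allowed(tags, dest):
    """Assumed check: an indirect transfer may only land on an {f}-tagged address."""
    return tags[dest] == F
```

With a single ID, any indirect transfer may reach any marked destination, which is what distinguishes this policy from the finer-grained ones that follow.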

8.3.2 CFI-2ID [2]

In this policy, r is used to mark returns and the potential targets of returns, and c is used for indirect calls and jumps and their potential targets. Since these two cases can overlap, we use 4 tags, written as sets: ∅, {r}, {c} and {r, c}.
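The need for the fourth tag {r, c} can be seen by building tags as sets. The sketch below is illustrative only. `make_tag` and `transfer_allowed` are hypothetical names, and the membership check is an assumption about how the tags are consulted.

```python
def make_tag(return_related: bool, call_jump_related: bool) -> frozenset:
    """Build one of the four tags ∅, {r}, {c}, {r, c}."""
    members = set()
    if return_related:
        members.add("r")
    if call_jump_related:
        members.add("c")
    return frozenset(members)


def transfer_allowed(kind: str, dest_tag: frozenset) -> bool:
    """Assumed check: a transfer of kind 'r' or 'c' needs that ID in the target's tag."""
    if kind not in ("r", "c"):
        raise ValueError("kind must be 'r' (return) or 'c' (indirect call/jump)")
    return kind in dest_tag
```

An address that is both a return target and an indirect call target receives {r, c} and accepts both kinds of transfer, while an address tagged {r} accepts returns only.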

8.3.3 CCFIR [51]

Here r is return-id, c is call-id, and p is return-into-privileged-code-id. We assume 6 tags, written as sets: ∅, {r}, {p}, {c}, {r, c} and {p, c}.

8.3.4 CFI-ROP

We assume an allowed control-flow graph χ containing pairs of return IDs and possible destination IDs. We write IDs below as ci or pc. A tag is either a valid ID or a special tag (its symbol is not reproduced in this text).
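The allowed control-flow graph χ can be represented directly as a set of (return ID, destination ID) pairs. The sketch below is a minimal illustration under that representation. `NO_ID` is a hypothetical stand-in for the special non-ID tag, and the IDs in the usage example are invented.

```python
NO_ID = None   # hypothetical stand-in for the special tag that is not a valid ID


def return_allowed(chi, return_id, dest_id):
    """Allow a return only if (return ID, destination ID) is an edge of chi."""
    if return_id is NO_ID or dest_id is NO_ID:
        return False
    return (return_id, dest_id) in chi


# Invented example graph: return 1 may reach destinations 10 and 11,
# return 2 may reach destination 20 only.
EXAMPLE_CHI = {(1, 10), (1, 11), (2, 20)}
```

Unlike the coarse ID policies above, each return here is checked against its own set of permitted destinations.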

8.3.5 CFI-JOP

We assume an allowed control-flow graph χ.

8.3.6 Complete CFI

We assume an allowed control-flow graph χ.

8.4 Taint Tracking

8.5 Subword Operations

The rules above, which we used in our experiments, do not take subword operations into account. To support subword operations properly, we would have had to split the load and store opgroups into two opgroups for word operations (wload and wstore) and two opgroups for byte operations (bload and bstore).
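The split can be pictured as a change to the opcode-to-opgroup mapping. The sketch below assumes RISC-V style mnemonics (`lw`, `sw`, `lb`, `lbu`, `sb`). The mapping tables and the `opgroup` helper are hypothetical and cover only word and byte accesses.

```python
# Before the split: one opgroup for all loads, one for all stores.
OPGROUPS_WORD_ONLY = {
    "lw": "load", "lb": "load", "lbu": "load",
    "sw": "store", "sb": "store",
}

# After the split: word and byte accesses fall into separate opgroups,
# so that policies can apply different rules to them.
OPGROUPS_SUBWORD = {
    "lw": "wload", "sw": "wstore",
    "lb": "bload", "lbu": "bload", "sb": "bstore",
}


def opgroup(opcode: str, subword_aware: bool = True) -> str:
    """Return the opgroup used to select a rule for this opcode."""
    table = OPGROUPS_SUBWORD if subword_aware else OPGROUPS_WORD_ONLY
    return table[opcode]
```

With the split mapping, a byte store selects a bstore rule, which can take the existing tag on the containing word into account, while a word store continues to use the earlier rule under the wstore opgroup.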

The rules for every policy that explicitly mentions loads or stores need to change (simple types, memory safety, and taint tracking). Here is how the simple types policy (without the retaddr variant) changes; the w opgroups correspond to the previous rules.

Here is the b rule for memory safety.

Here is the bstore rule for taint tracking.

Claims

A method for performing metadata rule-based mediated data transfer, comprising:
issuing a first direct memory access instruction from a first device, wherein the first direct memory access instruction accesses first data stored in a first memory location having a first metadata tag, the first memory location being contained in a first memory coupled to a first interconnect fabric, and the first device being an untrusted device coupled to a second interconnect fabric; and
performing processing to mediate the first direct memory access instruction issued from the second interconnect fabric to the first interconnect fabric,
wherein the processing comprises:
determining whether to allow execution of the first direct memory access instruction, using a first rule cache containing metadata rules for direct memory access instructions and in accordance with one or more metadata tags used in the first direct memory access instruction, wherein the metadata rules of the first rule cache define allowed direct memory access operations.

The method of claim 1,
further comprising performing, in a metadata processing domain and using metadata rules of a second rule cache, metadata processing to determine whether to allow execution of a second instruction in a code execution domain isolated from the metadata processing domain, wherein the second instruction is included in user code executing on a processor of the code execution domain.

The method of claim 2,
wherein the processing comprises:
determining, in accordance with the one or more metadata tags used in the first direct memory access instruction, whether a rule exists in the first rule cache for the first direct memory access instruction; and
in response to determining that no rule exists in the first rule cache for the first direct memory access instruction, performing rule cache miss processing in the metadata processing domain,
wherein performing the rule cache miss processing in the metadata processing domain comprises:
determining whether execution of the first direct memory access instruction is allowed;
in response to determining that the first direct memory access instruction is allowed, generating a new rule for the first direct memory access instruction; and
inserting the new rule into the first rule cache.

The method of claim 1,
wherein the first direct memory access instruction performs an operation comprising one of reading from and writing to the first memory location.

The method of claim 1,
wherein the second interconnect fabric includes a plurality of untrusted devices including the first device, and direct memory access instructions issued by the plurality of untrusted devices are untrusted requests to access memory locations of the first memory.

The method of claim 1,
wherein the processing comprises:
obtaining the first metadata tag for the first memory location from the first memory; and
providing the first metadata tag as one of the one or more metadata tags used to determine whether to allow execution of the first direct memory access instruction.

The method of claim 6,
wherein the one or more metadata tags used to determine whether to allow execution of the first direct memory access instruction are a plurality of metadata tags that include the first metadata tag and additionally include a current instruction metadata tag for the first direct memory access instruction, the current instruction metadata tag indicating a device identifier that uniquely identifies the first device.

The method of claim 7,
wherein the plurality of metadata tags further comprises a program counter metadata tag indicating a current state of the first device and indicating whether direct memory access instructions from the first device are enabled or disabled.

The method of claim 8,
wherein the plurality of metadata tags further comprises a byte enable metadata tag identifying which of one or more bytes of the first memory location are accessed by the first direct memory access instruction.

The method of claim 9,
wherein the plurality of metadata tags used to determine whether to allow execution of the first direct memory access instruction are provided as inputs to the first rule cache in a first set of registers, and a second set of registers stores outputs of the first rule cache.

The method of claim 10,
wherein the second set of registers includes a first output register containing a next tag of the program counter for a next instruction, the next tag indicating the state of the first device after the first direct memory access instruction is performed.

The method of claim 11,
wherein the second set of registers comprises a second output register containing a tag placed on a memory result if the first direct memory access instruction is a store or write to the first memory location.

The method of claim 3,
wherein a current transaction identifier is stored in a transaction identifier register, and a write to a commit register results in a comparison between the current contents of the commit register and the current contents of the transaction identifier register, and wherein, in response to determining that the current contents of the commit register match the current contents of the transaction identifier register, the new rule is written to the first rule cache.
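By way of illustration only, the rule cache lookup, the rule cache miss processing, and the commit register comparison recited above can be sketched in software. The sketch is a simplified model under stated assumptions and not the claimed hardware. The class `RuleCache`, the functions `mediate_dma` and `example_policy`, the tag values, and the ordering of the input tags (current instruction tag, program counter tag, memory tag, byte enable tag) and output tags (next program counter tag, memory result tag) are all hypothetical.

```python
class RuleCache:
    """Toy rule cache: maps a tuple of input tags to a tuple of output tags."""

    def __init__(self):
        self.rules = {}
        self.transaction_id = 0   # models the transaction identifier register
        self.commit = None        # models the commit register
        self.pending = None       # rule staged by miss processing

    def lookup(self, tags):
        """Return the output tags on a hit, or None on a miss."""
        return self.rules.get(tags)

    def begin_insert(self, tags, outputs):
        """Stage a new rule and return the current transaction identifier."""
        self.transaction_id += 1
        self.pending = (tags, outputs)
        return self.transaction_id

    def write_commit(self, value):
        """A write to the commit register triggers the comparison; the staged
        rule is written to the cache only if the two registers match."""
        self.commit = value
        if self.pending is None or self.commit != self.transaction_id:
            return False
        tags, outputs = self.pending
        self.rules[tags] = outputs
        self.pending = None
        return True


def example_policy(tags):
    """Hypothetical policy: allow DMA only from an enabled device into memory
    tagged 'dma-buffer'. Returns output tags, or None to deny."""
    ci_tag, pc_tag, mem_tag, byte_enable = tags
    if pc_tag == "enabled" and mem_tag == "dma-buffer":
        return (pc_tag, mem_tag)   # (next PC tag, memory result tag)
    return None


def mediate_dma(cache, policy, tags):
    """Return True if the DMA instruction described by `tags` is allowed."""
    if cache.lookup(tags) is not None:
        return True                          # rule cache hit
    outputs = policy(tags)                   # rule cache miss processing
    if outputs is None:
        return False                         # not allowed: no rule is inserted
    txn = cache.begin_insert(tags, outputs)  # generate the new rule
    return cache.write_commit(txn)           # insert it if the identifiers match
```

In this model a denied request leaves the cache unchanged, and a commit write whose value does not match the transaction identifier register does not install the staged rule.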