KR20220044094A

KR20220044094A - FPGA based cache invalidation method and apparatus performing the same

Info

Publication number: KR20220044094A
Application number: KR1020210108793A
Authority: KR
Inventors: 이희민; 장선경; 서태원
Original assignee: 고려대학교 산학협력단
Priority date: 2020-09-28
Filing date: 2021-08-18
Publication date: 2022-04-06
Also published as: KR102526499B1

Abstract

The present invention relates to an FPGA-based cache invalidation apparatus that provides a cache invalidation function on an Armv8 CPU. According to the present invention, the FPGA-based cache invalidation apparatus comprises: a processing system that generates and transmits an AMBA transmission signal according to the Advanced Microcontroller Bus Architecture (AMBA); and a cache invalidation module configured to transmit an AMBA reception signal for cache invalidation to the processing system upon receiving the AMBA transmission signal. The cache invalidation module sets AWCACHE, AWUSER, WSTRB, and AWADDR associated with the AMBA reception signal, transmits the AMBA reception signal to the processing system when the value of AWADDR differs from an initial value, and receives a response from the processing system when the cache invalidation task has been processed normally.

Description

FPGA based cache invalidation method and apparatus performing the same

The present invention relates to computer security and, more particularly, to a method of invalidating data loaded in a CPU cache without specific privileges, using FPGA-based hardware logic on an ARM processor.

Processors based on the Arm architecture are used in many embedded systems today. Recently announced Arm-based processors consist of multiple cores (multi-core) to improve performance, and each core executes programs independently.

A processor has a cache, a small, fast memory inside the CPU, to speed up reading and writing data between DRAM (memory) and each core. In particular, an Arm-based multi-core processor has L1 data and instruction caches used exclusively by each core and an L2 cache shared by multiple cores. In this structure, each core executes instructions independently and sends read/write requests to memory individually, so the caches must maintain data consistency across these requests.

The Arm architecture has Snoop Control Unit (SCU) hardware to maintain data consistency. The SCU snoops each core's memory requests to keep the L1 caches, the L2 cache, and memory consistent. In addition, memory read/write requests can come not only from the cores but also from external I/O devices, and these requests are delivered to the SCU through the Accelerator Coherency Port (ACP).

In 2013, Arm announced the Armv8 architecture to support both 32-bit and 64-bit instructions, and most recent high-performance embedded systems use Armv8-A-based processors. The Armv8-A ISA (Instruction Set Architecture) supports a variety of instructions, including cache maintenance instructions. An OS (Operating System) or VM (Virtual Machine) must occasionally clean or invalidate the cache directly to improve CPU performance and maintain security, and in doing so uses the instruction set provided by Armv8-A. However, specific privileges are required to use these instructions, so low-privileged applications cannot use cache-related instructions.

The present invention has been proposed to solve the above problem, and proposes a method and an apparatus for invalidating data loaded in a CPU cache without specific privileges through FPGA-based hardware logic.

In addition, the problem to be solved by the present invention is to provide a method by which a low-privileged application can use a cache invalidation function through an FPGA.

An apparatus for performing an FPGA-based cache invalidation function according to the present invention includes: a processing system that generates and transmits an AMBA transmission signal according to AMBA (Advanced Microcontroller Bus Architecture); and a cache invalidation module that, upon receiving the AMBA transmission signal, transmits an AMBA reception signal for cache invalidation to the processing system. The cache invalidation module sets AWCACHE, AWUSER, WSTRB, and AWADDR associated with the AMBA reception signal, transmits the AMBA reception signal to the processing system if the value of AWADDR differs from an initial value, and may receive a response from the processing system if the cache invalidation operation has been processed normally.

According to an embodiment, the cache invalidation module may set AWCACHE, AWUSER, WSTRB, and AWADDR associated with the AMBA reception signal, check whether the value of AWADDR is set to a value different from the initial value, which is a default value, and, if the value of AWADDR differs from the initial value, determine that the value of AWADDR has been changed to a target address.

According to an embodiment, the cache invalidation module may set AWCACHE[1] of the 4-bit AWCACHE to 1, set AWUSER[0] of the 2-bit AWUSER (301) to 1, set all bits of the 16-bit WSTRB to 0 so that the data corresponding to the target address is not changed, and set the target address in the 40-bit AWADDR.

According to an embodiment, the cache invalidation module may check whether an Enable bit is set and, when the Enable bit is set to 1, set the AWCACHE, AWUSER, WSTRB, and AWADDR to be transmitted as the AMBA reception signal. The processing system may include a snoop control unit that can interwork with an L1 cache included in each of a plurality of cores and with a common L2 cache, and that is configured to receive the AMBA reception signal when AWADDR differs from the initial value.

According to an embodiment, upon receiving the AMBA reception signal, the snoop control unit may control the L2 cache shared by the plurality of cores so that inconsistencies in shared data are resolved when data write operations are performed by the plurality of cores.

According to an embodiment, the cache invalidation module may include an AXI master port configured to transmit the AMBA reception signal for cache invalidation according to the AXI protocol, and the processing system may include an AXI slave port configured to receive the AMBA reception signal according to the AXI protocol.

According to an embodiment, when the AXI master port transmits AWADDR and drives AWVALID high on the AW channel, the AXI slave port may transmit an AWREADY signal on the AW channel to indicate that the processing system is ready for the data write operation.

According to an embodiment, upon receiving the AWREADY signal, the AXI master port may change AWVALID to low so that the data transmission process starts. In the data transmission process, the AXI master port transmits data and WVALID set high to the AXI slave port over the W channel, and the AXI slave port transmits WREADY.

According to an embodiment, when the AXI master port transmits a WLAST signal set high while transmitting data, the AXI slave port may transmit BVALID set high to the AXI master port over the B channel, and the data transmission process may end when the AXI master port receives the BVALID set high.

According to an embodiment, if a write response is not received as the response after the AMBA reception signal has been transmitted, the cache invalidation module may check whether the Enable bit is set and, when the Enable bit is set to 1, set again the AWCACHE, AWUSER, WSTRB, and AWADDR to be transmitted as the AMBA reception signal.

According to an embodiment, the cache invalidation module may transmit the AMBA reception signal to the processing system if AWADDR differs from the initial value, and may control the AXI slave port so that, if the cache invalidation operation has been processed normally, the write response is received again as the response from the processing system.

The present invention can provide a cache invalidation function on an Armv8 CPU using the AXI ACP and the AMBA communication protocol.

The present invention provides an environment in which users and applications that previously could not perform cache invalidation can develop more broadly than before.

Through the present invention, a user can perform cache invalidation as simply as calling a function, by means of a hardware IP (Intellectual Property).

The features and effects of the present invention described above will become more apparent from the following detailed description taken in conjunction with the accompanying drawings, whereby those of ordinary skill in the art to which the present invention pertains will be able to readily practice the technical idea of the present invention.

FIG. 1 shows the overall structure of a system including the cache invalidation module proposed in the present invention.
FIG. 2 shows a structural diagram of the AXI4 protocol between a master and a slave according to the present invention.
FIG. 3 is a structural diagram of the AMBA reception signal transmitted from the cache invalidation module according to the present invention.
FIG. 4 is a flowchart of the operations performed by the cache invalidation module according to the present invention.
FIG. 5 is a block diagram of a system including the ACP Invalidation IP according to the present invention.

Since the present invention can be modified in various ways and can have various embodiments, specific embodiments are illustrated in the drawings and described in detail in the detailed description. However, this is not intended to limit the present invention to specific embodiments, and the invention should be understood to include all modifications, equivalents, and substitutes included in its spirit and technical scope.

In describing each figure, like reference numerals are used for like elements.

Terms such as first and second may be used to describe various elements, but the elements should not be limited by these terms. The terms are used only to distinguish one element from another.

For example, without departing from the scope of the present invention, a first element may be referred to as a second element, and similarly, a second element may be referred to as a first element. The term "and/or" includes a combination of a plurality of related listed items or any one of a plurality of related listed items.

Unless defined otherwise, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art, and should not be interpreted in an idealized or excessively formal sense unless explicitly so defined in the present application.

The suffixes "module", "block", and "unit" for the elements used in the following description are assigned or used interchangeably only for ease of drafting the specification, and do not by themselves have distinct meanings or roles.

Hereinafter, preferred embodiments of the present invention are described with reference to the accompanying drawings so that those of ordinary skill in the art can readily practice them. In describing the embodiments below, a detailed description of a related known function or configuration is omitted where it is judged that it could unnecessarily obscure the gist of the present invention.

Hereinafter, an FPGA-based cache invalidation method and an apparatus for performing the same according to the present invention are described with reference to the drawings.

FIG. 1 shows the overall structure of a system including the cache invalidation module proposed in the present invention. Referring to FIG. 1, the hardware IP (Intellectual Property) proposed by the present invention is the cache invalidation module 111. The cache invalidation module 111 may be implemented in the programmable logic 110 and may interwork with the processing system 100. An apparatus that performs the FPGA-based cache invalidation method according to the present invention may include the processing system 100 and the programmable logic 110.

With respect to an apparatus including the processing system 100 and the programmable logic 110 of the present invention, a processor contains various devices and I/Os, which communicate with one another over a bus. Each processor manufacturer defines and uses its own protocol for this purpose. Arm, for example, uses the AMBA (Advanced Microcontroller Bus Architecture) protocol. AMBA 4, released in 2010, broadly supports the ACE, AXI4, and APB4 protocols.

In this regard, cache side-channel attacks such as FLUSH+RELOAD and FLUSH+FLUSH exploit microarchitectural vulnerabilities on targets such as x86. A cache side-channel attack is a method of extracting sensitive kernel information by using the side information generated mainly when data in the LLC (Last Level Cache) is written or read.

Most research on cache side-channel attacks has targeted Intel and AMD processors. This is because, on Intel and AMD processors, the flush instruction is available even to ordinary users without privileges. On Arm processors, however, executing the corresponding instruction requires EL1 or higher privileges, so an unprivileged user cannot use the flush instruction and a cache side-channel attack cannot be mounted in the same way.

The present invention is directed to cache invalidation, using the ACP (Accelerator Coherency Port), for the case where a cache side-channel attack is possible with unprivileged rights on an Arm Cortex-A53 processor. To this end, the cache invalidation module 111 according to the present invention is placed in the PL (Programmable Logic) region of a Programmable SoC (System-on-Chip). As a result, ordinary users can also execute a cache invalidate operation, which is functionally similar to a cache flush instruction.

The present invention describes a method of designing hardware that can send a cache invalidation request to the ACP (Accelerator Coherency Port) 124 by adjusting some signals of the AXI4 (Advanced eXtensible Interface 4) protocol.

The cache invalidation module 111 performs cache invalidation based on values set by the user. To invalidate the cache, the user must set an Enable bit and a target address through the processing system 100. The Enable bit determines whether the cache invalidation module 111 operates; when it is set to 1, the AMBA reception signal 121 is generated. The target address holds the address to be invalidated. To invalidate the target address, the cache invalidation module 111 sets the WSTRB, AWCACHE, and AWUSER values of the AXI4 protocol and sends them as the AMBA reception signal 121. WSTRB determines which bytes of the data carried by the AXI4 transfer are valid, and the cache invalidation is requested through the AWCACHE and AWUSER settings.
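The role of WSTRB described above can be illustrated with a small software model. In AXI4, each WSTRB bit marks one byte lane of the write data as valid, so a write whose strobes are all 0 carries an address but changes no bytes. The following is a minimal sketch of that behavior only; the function name `apply_write` and the 16-byte example line are assumptions made for illustration and are not part of the patent.

```python
def apply_write(stored: bytes, wdata: bytes, wstrb: int) -> bytes:
    """Model how byte-lane strobes gate a write (illustrative helper, not from the patent).

    Bit n of wstrb enables byte lane n: only lanes whose strobe bit is 1
    are overwritten with the corresponding byte of wdata.
    """
    if len(stored) != len(wdata):
        raise ValueError("stored data and write data must have the same width")
    result = bytearray(stored)
    for lane in range(len(wdata)):
        if (wstrb >> lane) & 1:
            result[lane] = wdata[lane]
    return bytes(result)


# A 16-byte data beat matches the 16-bit WSTRB described in the text.
line = bytes(range(16))
unchanged = apply_write(line, bytes([0xFF] * 16), 0x0000)  # all strobes low
one_lane = apply_write(line, bytes([0xFF] * 16), 0x0001)   # only lane 0 enabled
```

With all strobes low, `unchanged` equals the original `line`, which is why the module can issue a write to the target address without altering the data stored there.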

Referring to FIG. 1, the apparatus that performs cache invalidation broadly consists of a processing system (PS) 100, programmable logic (PL) 110, and a memory 105.

The PS 100 includes Armv8 cores 101, L1 caches 102, a snoop control unit (SCU) 103, an L2 cache 104, an AXI master port 122, and an ACP 124. The PL 110 includes the cache invalidation module 111, an AXI slave port 123, and an AXI master port 125. The AXI master port 122 sends the AMBA transmission signal 120 to the AXI slave port 123, and the AXI master port 125 sends the AMBA reception signal 121 to the ACP 124.

PS 100 stands for Processing System and denotes the processor; it includes the Armv8 cores 101, the L1 caches 102, the SCU 103, the L2 cache 104, the AXI master port 122, and the ACP 124. Each Armv8 core 101 has its own L1 cache 102 for fast data processing. The SCU 103 maintains data consistency among the L1 caches 102 of the Armv8 cores 101. Maintaining data consistency refers to the series of steps that resolve inconsistencies in shared data when data writes are performed by multiple Armv8 cores 101. Unlike the L1 caches 102, which are used exclusively, the L2 cache 104 is shared by the Armv8 cores 101. The L2 cache 104 is slower to read and write than the L1 cache 102 but faster than the memory 105. The AXI master port 122 is the interface the user uses to send data to the cache invalidation module 111, and it sends the AMBA transmission signal 120 according to the AMBA AXI4 protocol. Because the ACP 124 follows the AMBA AXI4 protocol and is connected to the SCU 103, the cache invalidation module 111 can take part in the data coherency of the L1 caches 102.

PL 110 stands for Programmable Logic and includes the cache invalidation module 111 proposed in the present invention, the AXI slave port 123, and the AXI master port 125. Through the cache invalidation module 111, the user can invalidate specific cache data by way of the apparatus. To do so, the user sends the AMBA transmission signal 120 to the cache invalidation module 111 through the apparatus, and the cache invalidation module 111 then sends the AMBA reception signal 121 for cache invalidation to the ACP 124.

FIG. 2 shows a structural diagram of the AXI4 protocol between a master and a slave according to the present invention. The AXI master 200 is the device that transmits signals according to the AXI4 protocol, and the AXI slave 201 is the device that receives them. The AXI master 200 and the AXI slave 201 of FIG. 2 may correspond to the AXI master port 122 and the AXI slave port 123 of FIG. 1, but are not limited thereto. As another example, the AXI master 200 and the AXI slave 201 of FIG. 2 may correspond to the AXI master port 125 and the ACP 124 of FIG. 1.

Referring to FIGS. 1 and 2, the AXI4 protocol broadly consists of five channels. The AR channel 210 is the Read Address Channel, which carries the address of the data the AXI master 200 wants to read. The R channel 211 is the Read Data Channel, through which the AXI master 200 receives the data it wants to read. These two channels 210 and 211 are associated with read operations. When the AXI master 200 sends a signal on the AR channel 210, the AXI slave 201 returns the data at the address received on the AR channel 210 over the R channel 211. The AW channel 212 is the Write Address Channel, which carries the address of the data the AXI master 200 wants to write. The W channel 213 is the Write Data Channel, through which the AXI master 200 sends the data it wants to write. The B channel 214 is the Write Response Channel, through which the AXI slave 201 responds to the write signal it received. These three channels 212, 213, and 214 are associated with write operations. When the AXI master 200 sends signals on the AW channel 212 and the W channel 213, the AXI slave 201 returns the status of the write operation on the B channel 214.

Specifically, a write operation is divided into a start phase, a data transfer phase, and an end phase. In the start phase, the AXI master 200 sends AWADDR 303 and drives AWVALID high on the AW channel 212, and the AXI slave 201 responds with AWREADY on the AW channel 212 to indicate that it is ready. When the AXI master 200 receives the AWREADY signal, it changes AWVALID to low and the data transfer phase begins. This exchange is called a handshake. The data transfer phase takes place on the W channel 213: as in the start phase, the AXI master 200 sends data with WVALID high, and the AXI slave 201 responds with WREADY. From the first data through the last, BREADY is held high on the B channel 214. When the AXI master 200 receives WREADY, it regards the data as correctly transferred and sends the next data. To end the transfer with the final data, the end phase is executed. In the end phase, the AXI master 200 sends WLAST high in addition to WVALID along with the data; the AXI slave 201 recognizes that this is the last data, sends WREADY, and sends BVALID high on the B channel 214. From the BVALID signal, the AXI master 200 knows that the write has completed normally and finishes the transfer.
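The three phases of the write operation can be traced with a small cycle-level model. The sketch below follows the sequential description in the text (address first, then data, then response); it is not a model of any particular AXI implementation. The function names and the `aw_stall` and `w_stall` parameters, which set how many cycles the slave keeps AWREADY or WREADY low, are hypothetical and introduced only for illustration.

```python
def axi_write(awaddr, beats, aw_stall=0, w_stall=0):
    """Return a cycle-by-cycle log of one write, as (cycle, channel, signals) tuples.

    aw_stall / w_stall are hypothetical knobs: the number of cycles the slave
    holds AWREADY / WREADY low before accepting a transfer.
    """
    log = []
    cycle = 0

    # Start phase: the master holds AWVALID high until it sees AWREADY.
    for remaining in range(aw_stall, -1, -1):
        log.append((cycle, "AW", {"AWADDR": awaddr, "AWVALID": 1,
                                  "AWREADY": int(remaining == 0)}))
        cycle += 1

    # Data transfer phase: each beat is held with WVALID high until WREADY.
    for index, data in enumerate(beats):
        wlast = int(index == len(beats) - 1)  # WLAST marks the final beat
        for remaining in range(w_stall, -1, -1):
            log.append((cycle, "W", {"WDATA": data, "WVALID": 1, "WLAST": wlast,
                                     "WREADY": int(remaining == 0)}))
            cycle += 1

    # End phase: the slave reports completion with BVALID; the master has BREADY high.
    log.append((cycle, "B", {"BVALID": 1, "BREADY": 1}))
    return log


def accepted(log, channel):
    """Signals of the cycles where VALID and READY are both high on a channel."""
    valid, ready = {"AW": ("AWVALID", "AWREADY"),
                    "W": ("WVALID", "WREADY"),
                    "B": ("BVALID", "BREADY")}[channel]
    return [signals for _, ch, signals in log
            if ch == channel and signals[valid] and signals[ready]]


trace = axi_write(0x1000, [0xAA, 0xBB], aw_stall=1, w_stall=1)
```

A transfer on a channel counts as completed only in a cycle where both VALID and READY are high, which is what `accepted` filters for; in the trace above, one address and two data beats are accepted before the single write response.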

FIG. 3 is a structural diagram of the AMBA reception signal transmitted from the cache invalidation module according to the present invention. Referring to FIGS. 1 and 3, the signals that the cache invalidation module 111 sets for cache invalidation are AWCACHE 300, AWUSER 301, WSTRB 302, and AWADDR 303; all other signals follow the AXI4 standard. The settings for cache invalidation are as follows. AWCACHE[1] of the 4-bit AWCACHE 300 is set to 1, and AWUSER[0] of the 2-bit AWUSER 301 is also set to 1. At the same time, all bits of the 16-bit WSTRB 302 must be set to 0 so that the data corresponding to the target address is not changed, and the target address is set in the 40-bit AWADDR 303.
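The four field settings listed above can be written down as a small data structure. This is a minimal sketch of the values only, not of the hardware: the class and function names are hypothetical, and leaving the remaining AWCACHE and AWUSER bits at 0 is an assumption, since the text specifies only AWCACHE[1] and AWUSER[0].

```python
from dataclasses import dataclass

ADDR_BITS = 40  # AWADDR width stated in the text


@dataclass(frozen=True)
class InvalidationRequest:
    """Field values of the AMBA reception signal (hypothetical container)."""
    awcache: int  # 4-bit field
    awuser: int   # 2-bit field
    wstrb: int    # 16-bit field
    awaddr: int   # 40-bit field


def build_request(target_addr: int) -> InvalidationRequest:
    """Fill in the fields as described for FIG. 3."""
    if not 0 <= target_addr < (1 << ADDR_BITS):
        raise ValueError("target address does not fit in 40 bits")
    return InvalidationRequest(
        awcache=1 << 1,   # AWCACHE[1] = 1; other bits left at 0 (assumption)
        awuser=1 << 0,    # AWUSER[0] = 1; other bit left at 0 (assumption)
        wstrb=0x0000,     # all strobes low so the stored data is not modified
        awaddr=target_addr,
    )


request = build_request(0x12_3456_7890)
```

Keeping the range check on the address makes the 40-bit limit explicit; a wider value would otherwise be silently truncated by a real 40-bit signal.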

FIG. 4 is a flowchart of the operations performed by the cache invalidation module according to the present invention. Referring to FIGS. 1 and 4, the cache invalidation module 111 first checks whether the Enable bit is set (S410). When the user has set the Enable bit to 1, the module sets the AWCACHE, AWUSER, WSTRB, and AWADDR to be sent as the AMBA reception signal 121 as described for FIG. 3 (S420). Next, before sending the AMBA reception signal 121, the module checks whether the AWADDR value has been changed to the target address by checking whether it holds a value different from the initial value, which is the default value (S430). If AWADDR differs from the initial value, the module sends the AMBA reception signal 121 (S440), and if the cache invalidation has been processed normally, the AXI slave 201 responds with BVALID high (write response). If no write response arrives after the AMBA reception signal 121 has been sent, the cache invalidation module 111 returns to step S410 and sends the signal again (S450); if the write response is received, the operation ends normally (S450).
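The flow of steps S410 to S450 can be summarized as a retry loop. The sketch below is a software approximation under stated assumptions, not the hardware logic itself: the `send` callback stands in for the AXI transfer and returns whether a write response arrived, `initial_addr=0` is an assumed default value, and `max_attempts` is a hypothetical guard added so the example always terminates. The statuses returned when the Enable bit is clear or AWADDR still holds the initial value are also assumptions, since the text does not describe those branches.

```python
def run_invalidation(enable, target_addr, send, initial_addr=0, max_attempts=3):
    """Walk through S410-S450 and return (status, attempts made)."""
    for attempt in range(1, max_attempts + 1):
        # S410: check whether the Enable bit is set.
        if enable != 1:
            return ("disabled", attempt - 1)

        # S420: set the fields of the AMBA reception signal.
        signals = {"AWCACHE": 0b0010, "AWUSER": 0b01,
                   "WSTRB": 0x0000, "AWADDR": target_addr}

        # S430: AWADDR must differ from the initial (default) value.
        if signals["AWADDR"] == initial_addr:
            return ("no_target", attempt - 1)

        # S440: send the signal; send() reports whether BVALID came back.
        got_write_response = send(signals)

        # S450: finish on a write response, otherwise go back to S410.
        if got_write_response:
            return ("done", attempt)

    return ("gave_up", max_attempts)


# Example: the first attempt gets no response, the second one succeeds.
responses = iter([False, True])
result = run_invalidation(1, 0x8000_0000, lambda signals: next(responses))
```

Passing the transfer in as a callback keeps the control flow separate from the bus model, so the same loop can be exercised with a stub, as here, or with a more detailed handshake simulation.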

Accordingly, by using the cache invalidation module (111) proposed by the present invention, which operates according to the AXI4 protocol, a user can perform cache invalidation without specific privileges. Referring to FIGS. 1 to 4, the method and apparatus for performing the FPGA-based cache invalidation function according to the present invention can carry out cache side-channel attack techniques such as FLUSH+RELOAD and FLUSH+FLUSH. Accordingly, the method and apparatus according to the present invention can perform a cache invalidation function. The present invention proposes an ACP Invalidation hardware IP for cache invalidation using the ACP.

In this regard, FIG. 5 is a block diagram of a system including the ACP Invalidation IP according to the present invention. Referring to FIG. 5, the ACP configuration module (112), which corresponds to the ACP Invalidation IP, may be placed in the PL (Programmable Logic) (110) region of FIG. 1. The FPGA-based cache invalidation method according to the present invention may be performed by the cache invalidation module (111) and/or the ACP configuration module (112). The ACP configuration module (112) may be implemented in the programmable logic (110) separately from the cache invalidation module (111). As another example, the ACP configuration module (112) may be implemented inside the cache invalidation module (111) and placed in the programmable logic (110) region.

Referring to FIGS. 1 and 5, the ACP configuration module (112) uses data received from the processing system (100) to send a cache invalidation signal to the S_AXI_ACP_FPD port of the PS core (101). The signal values sent are AWADDR[39:0], WSTRB[15:0], and AWCACHE[3:0]. AWADDR[39:0] is 40 bits wide and indicates the target address to invalidate. WSTRB[15:0] is a bit signal that indicates, for the 128-bit data sent through the ACP, whether each byte is valid (because 128 bits are 16 bytes, WSTRB[15:0] is 16 bits wide).

To perform only the invalidation without changing the cache contents of the processor core (101), the ACP configuration module (112) sends WSTRB[15:0] set to 0. Finally, AWCACHE[3:0] holds the bits that set the cache policy for the address given in AWADDR[39:0]. Because write-back and write-allocate are used, the AWCACHE[3:0] value can be sent as 4'.
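The relationship between the data width and WSTRB stated above is simple arithmetic, sketched here under the assumption of one strobe bit per byte of write data. The helper names `strobe_width` and `valid_byte_count` are hypothetical and introduced only for illustration.

```python
ACP_DATA_WIDTH_BITS = 128  # width of the data sent through the ACP


def strobe_width(data_width_bits: int) -> int:
    """Number of WSTRB bits: one strobe bit per byte of write data."""
    if data_width_bits <= 0 or data_width_bits % 8 != 0:
        raise ValueError("data width must be a positive multiple of 8 bits")
    return data_width_bits // 8


def valid_byte_count(wstrb: int, data_width_bits: int = ACP_DATA_WIDTH_BITS) -> int:
    """Count how many bytes of the beat are marked valid by WSTRB."""
    mask = (1 << strobe_width(data_width_bits)) - 1
    return bin(wstrb & mask).count("1")
```

With a 128-bit bus, `strobe_width(128)` gives 16, and `valid_byte_count(0)` gives 0, which matches the intent of sending WSTRB[15:0] = 0: no byte of the 128-bit data is marked valid.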

According to the present invention, by using the ACP, even an unprivileged user on the Arm architecture can perform cache invalidation in a manner similar to the FLUSH+RELOAD attack without privilege escalation. To this end, an ACP Invalidation IP that plays a role similar to the privileged instruction was developed, and an actual attack was carried out in which the target data was identified from the difference in read times. In view of this, the ACP Invalidation IP allows even an unprivileged user to perform cache invalidation on an Arm-based processor.

The FPGA-based cache invalidation method according to the present invention and the apparatus for performing it have been described above. The apparatus for performing the FPGA-based cache invalidation function claimed in the present invention is described below. In this regard, referring to FIGS. 1 to 4, the apparatus for performing the FPGA-based cache invalidation function may include a processing system (100) and a cache invalidation module (111).

The processing system (100) is configured to generate and transmit an AMBA transmission signal according to AMBA (Advanced Microcontroller Bus Architecture). Upon receiving the AMBA transmission signal, the cache invalidation module (111) may be configured to transmit an AMBA reception signal for cache invalidation to the processing system (100).

The cache invalidation module (111) may be configured to set AWCACHE, AWUSER, WSTRB, and AWADDR associated with the AMBA reception signal. AWCACHE and AWADDR belong to the write address channel. AWADDR is the write address. AWCACHE corresponds to the memory type; the master may generate a transaction when it is bufferable (0011), and the slave may be configured not to use it.

AWUSER is the write address user signal. WSTRB, meanwhile, belongs to the write data channel. WSTRB consists of a 4-bit signal corresponding to 4 bytes of write data, and the slave may assume that all bytes are valid.

Specifically, the cache invalidation module (111) checks whether the value of AWADDR is set to a value different from its default initial value. If the value of AWADDR differs from the initial value, the module can determine that AWADDR has been changed to the target address.

The cache invalidation module (111) may be configured to transmit the AMBA reception signal to the processing system (100) if the value of AWADDR differs from the initial value. In addition, once the AMBA reception signal has been transmitted to the processing system (100), the cache invalidation module (111) may receive a response from the processing system (100) if the cache invalidation operation has been processed normally.

Meanwhile, the cache invalidation module (111) may check whether the Enable bit is set and, if the Enable bit is set to 1, set the AWCACHE, AWUSER, WSTRB, and AWADDR to be transmitted as the AMBA reception signal. In this regard, the cache invalidation module (111) may set AWCACHE[1] of the 4-bit AWCACHE to 1 and set AWUSER[0] of the 2-bit AWUSER (301) to 1. In addition, the cache invalidation module (111) may set the 16-bit WSTRB to all zeros so that the data corresponding to the target address is not changed.

Meanwhile, the cache invalidation module (111) can operate in conjunction with the L1 cache (102) included in each of the plurality of cores (101) and with the common L2 cache (104). The cache invalidation module (111) may further include a snoop control unit (103) configured to receive the AMBA reception signal when AWADDR differs from the initial value.

Upon receiving the AMBA reception signal, the snoop control unit (103) may perform control so that any inconsistency in shared data is resolved when a data write operation is performed in the plurality of cores. To this end, the snoop control unit (103) may control the L2 cache (104) shared by the plurality of cores.

Meanwhile, the cache invalidation module (111) may include an AXI master port (200) configured to transmit the AMBA reception signal for cache invalidation according to the AXI protocol. The processing system (100) may include an AXI slave port (201) configured to receive the AMBA reception signal according to the AXI protocol.

When the AXI master port (200) sends AWADDR and drives AWVALID high on the AW channel, the AXI slave port (201) sends the AWREADY signal on the AW channel. This indicates that the processing system (100) is ready for the data write operation.

Accordingly, upon receiving the AWREADY signal, the AXI master port (200) may change AWVALID to low so that the data transfer process begins. During the data transfer process, the AXI master port (200) may send data, together with WVALID set high, to the AXI slave port (201) over the W channel. In response, the AXI slave port (201) sends WREADY, and the data transfer takes place.

Meanwhile, when the AXI master port (200) sends the WLAST signal set high while transmitting data, the AXI slave port (201) may send BVALID set high to the AXI master port (200) over the B channel. The AXI master port (200) then receives the high BVALID and ends the data transfer process.
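The AW, W, and B channel exchange described above can be traced with a toy, untimed Python model. This is a sketch under simplifying assumptions: a single data beat, no clocking or back-pressure, and a slave that always accepts. `AxiSlaveModel` and `single_beat_write` are hypothetical names, and the per-byte strobe handling follows the usual AXI convention that WSTRB[n] qualifies byte lane n.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class AxiSlaveModel:
    """Toy model of the AXI slave port (201); not a real bus implementation."""
    memory: Dict[int, int] = field(default_factory=dict)
    pending_addr: Optional[int] = None

    def aw_channel(self, awaddr: int, awvalid: bool) -> bool:
        """Latch the write address and return AWREADY."""
        if not awvalid:
            return False
        self.pending_addr = awaddr
        return True

    def w_channel(self, wdata: bytes, wstrb: int, wvalid: bool,
                  wlast: bool) -> Tuple[bool, bool]:
        """Accept one data beat and return (WREADY, BVALID)."""
        if not wvalid or self.pending_addr is None:
            return False, False
        for lane, byte in enumerate(wdata):
            if (wstrb >> lane) & 1:  # only strobed byte lanes are written
                self.memory[self.pending_addr + lane] = byte
        if wlast:
            self.pending_addr = None
        return True, wlast  # BVALID follows the last beat


def single_beat_write(slave: AxiSlaveModel, awaddr: int, wdata: bytes,
                      wstrb: int) -> List[str]:
    """Run one write transaction from the master side and log each step."""
    trace: List[str] = []
    if not slave.aw_channel(awaddr, awvalid=True):
        return trace
    trace.append("AW: AWVALID high, AWREADY received")
    trace.append("AW: AWVALID driven low")
    wready, bvalid = slave.w_channel(wdata, wstrb, wvalid=True, wlast=True)
    if wready:
        trace.append("W: WVALID and WLAST high, WREADY received")
    if bvalid:
        trace.append("B: BVALID high, transfer complete")
    return trace
```

Calling `single_beat_write(AxiSlaveModel(), 0x1000, bytes(16), 0)` walks through all three channels while leaving the model's memory empty, which mirrors the invalidation request: the handshake completes, but with WSTRB at 0 no byte is written.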

Meanwhile, after transmitting the AMBA reception signal, the cache invalidation module (111) may check whether a write response is received as the response. If the write response is not received, the module may check whether the Enable bit is set and, if the Enable bit is set to 1, set again the AWCACHE, AWUSER, WSTRB, and AWADDR to be transmitted as the AMBA reception signal.

Accordingly, the cache invalidation module (111) transmits the AMBA reception signal to the processing system if AWADDR differs from the initial value. If the cache invalidation operation is processed normally, the cache invalidation module (111) may control the AXI slave port (123) so that the write response is received again as the response from the processing system (100).

The FPGA-based cache invalidation method according to the present invention and the apparatus for performing it have been described above. Products and services to which the FPGA-based cache invalidation technology according to the present invention can be applied are as follows.

The technology can be applied to 1) optimization of applications that use Arm-based SoCs, such as mobile, IoT, and automotive; 2) FPGA-based performance optimization of cloud and big data services; and 3) performance management technologies such as operating systems and virtual machines.

Meanwhile, as an elemental technology forming part of a product or service, it can be used 1) in cache management technology based on AXI4 signal modification and 2) in power optimization technology for battery-powered devices such as mobile and IoT devices. It can also be used 3) to improve the performance of Arm FPGA-based high-speed power supply controllers and of parallel processing.

Markets to which the technology according to the present invention can be applied include 1) the mobile, IoT, and automotive markets, which are sensitive to performance and power consumption, and 2) the server and cloud service markets, which actively use Arm FPGAs.

In addition, companies that can make use of the technology according to the invention may include FPGA SoC manufacturers (Intel Altera, Xilinx, etc.), companies that use FPGAs (Amazon, Google, etc.), and developers of Arm SoC-based applications and frameworks (smartphones, automobiles, etc.).

The technical effects of the FPGA-based cache invalidation method according to the present invention and of the apparatus for performing it are as follows.

In a software implementation, the procedures and functions described in this specification, as well as the design and parameter optimization of each component, may also be implemented as separate software modules. The software code may be implemented as a software application written in a suitable programming language. The software code may be stored in a memory and executed by a controller or a processor.

100: processing system 101: core
102: L1 cache 103: snoop control unit
104: L2 cache 105: memory
110: programmable logic 111: cache invalidation module
112: ACP configuration module
120: AMBA transmission signal 121: AMBA reception signal
122, 125: AXI master port 123: AXI slave port
124: ACP
200: AXI master 201: AXI slave
210: Read Address Channel 211: Read Data Channel
212: Write Address Channel 213: Write Data Channel
214: Write Response Channel
300: AWCACHE 301: AWUSER
302: WSTRB 303: AWADDR

Claims

1. An apparatus for performing an FPGA-based cache invalidation function, comprising:
a processing system that generates and transmits an AMBA transmission signal according to AMBA (Advanced Microcontroller Bus Architecture); and
a cache invalidation module that, upon receiving the AMBA transmission signal, transmits an AMBA reception signal for cache invalidation to the processing system,
wherein the cache invalidation module:
sets AWCACHE, AWUSER, WSTRB, and AWADDR associated with the AMBA reception signal;
checks whether the value of the AWADDR is set to a value different from an initial value, which is a default value, and, if the value of the AWADDR differs from the initial value, determines that the value of the AWADDR has been changed to a target address; and
transmits the AMBA reception signal to the processing system if the value of the AWADDR differs from the initial value, and receives a response from the processing system if the cache invalidation operation is processed normally.

2. The apparatus of claim 1,
wherein the cache invalidation module
sets AWCACHE[1] of the 4-bit AWCACHE to 1 and sets AWUSER[0] of the 2-bit AWUSER (301) to 1, and
sets the 16-bit WSTRB to all zeros so that data corresponding to the target address is not changed.

3. The apparatus of claim 1,
wherein the cache invalidation module
checks whether an Enable bit is set and, if the Enable bit is set to 1, sets the AWCACHE, AWUSER, WSTRB, and AWADDR to be transmitted as the AMBA reception signal, and
wherein the processing system
is interoperable with an L1 cache included in each of a plurality of cores and with a common L2 cache, and includes a snoop control unit configured to receive the AMBA reception signal if the AWADDR differs from the initial value.

4. The apparatus of claim 3,
wherein the snoop control unit,
upon receiving the AMBA reception signal,
controls the L2 cache shared by the plurality of cores so that an inconsistency in shared data is resolved when a data write operation is performed in the plurality of cores.

5. The apparatus of claim 4,
wherein the cache invalidation module comprises an AXI master port configured to transmit the AMBA reception signal for cache invalidation according to an AXI protocol, and wherein the processing system comprises an AXI slave port configured to receive the AMBA reception signal according to the AXI protocol.

6. The apparatus of claim 5,
wherein, when the AXI master port sends the AWADDR and drives AWVALID high on an AW channel, the AXI slave port sends an AWREADY signal on the AW channel to indicate that the processing system is ready for the data write operation.

7. The apparatus of claim 6,
wherein, upon receiving the AWREADY signal, the AXI master port changes the AWVALID to low so that a data transfer process begins, and
wherein, in the data transfer process, the AXI master port sends data and WVALID set high to the AXI slave port over a W channel, and the AXI slave port sends WREADY.

8. The apparatus of claim 7,
wherein, when the AXI master port sends a WLAST signal set high while transmitting data, the AXI slave port sends BVALID set high to the AXI master port over a B channel, and
wherein the AXI master port ends the data transfer process upon receiving the BVALID set high.

9. The apparatus of claim 1,
wherein the cache invalidation module,
if a write response is not received as the response after the AMBA reception signal is transmitted, checks whether an Enable bit is set and, if the Enable bit is set to 1, sets again the AWCACHE, AWUSER, WSTRB, and AWADDR to be transmitted as the AMBA reception signal.

10. The apparatus of claim 9,
wherein the cache invalidation module transmits the AMBA reception signal to the processing system if the AWADDR differs from the initial value, and controls the AXI slave port so that the write response is received again as the response from the processing system if the cache invalidation operation is processed normally.