KR20240034258A

KR20240034258A - Dynamic allocation of cache memory as RAM

Info

Publication number: KR20240034258A
Application number: KR1020247006761A
Authority: KR
Inventors: 로히트 나타라잔; 위르겐 엠. 슐츠; 크리스토퍼 디. 슐러; 로히트 케이. 구프타; 토마스 티. 조우; 스리니바사 랑간 스리다란
Original assignee: Apple Inc.
Priority date: 2021-08-31
Filing date: 2022-07-28
Publication date: 2024-03-13
Also published as: WO2023033955A1; DE112022003363T5

Abstract

An apparatus includes a cache controller circuit and a cache memory circuit that further includes cache memory having a plurality of cache lines. The cache controller circuit may be configured to receive a request to reallocate a portion of the cache memory circuit that is currently in use. The request may identify an address region corresponding to one or more of the cache lines. The cache controller circuit may be further configured, in response to the request, to convert the one or more cache lines to directly addressable random-access memory (RAM) by excluding the one or more cache lines from cache operations.

Description

Dynamic allocation of cache memory as RAM

Embodiments described herein relate to systems-on-a-chip (SoCs) and, more particularly, to methods for operating a cache memory.

System-on-a-chip (SoC) integrated circuits (ICs) generally include one or more processors that serve as central processing units (CPUs) for a system, along with various other components such as memory controllers and other agents. As used herein, an "agent" refers to a functional circuit that is capable of initiating a transaction over a bus circuit or of being the destination for one. Accordingly, general-purpose processors, graphics processors, network interfaces, memory controllers, and other similar circuits may be referred to as agents. As used herein, a "transaction" refers to a data exchange between two agents across one or more bus circuits. Transactions from an agent to read data from, or store data to, a memory circuit are a typical type of transaction and may include large amounts of data. Memory circuits may use multiple clock cycles to access data within their memory cells.

Cache memories are frequently used in SoCs to support increased performance of processors by reducing delays associated with transactions to system memories and/or non-volatile storage memories. Cache memories may store local copies of information stored at frequently accessed memory addresses. These local copies may reach agents with shorter delays than a memory access to the target memory address would require. When a memory access is made to a target address that is not currently cached, the addressed memory may be accessed, and values from a plurality of sequential addresses, including the target address, are read as a group and may then be cached to reduce future access times. When the cached information in a cache line becomes invalid, or a determination is made that the cached information has not been accessed frequently, the cached information may be invalidated and marked for eviction, thereby allowing it to be overwritten by other information being accessed by the processors of the SoC.
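The group read described above can be illustrated with a minimal sketch. It assumes a 64-byte, power-of-two line size and a hypothetical helper name, `line_fill_range`; neither comes from the disclosure.

```python
LINE_SIZE = 64  # assumed cache line size in bytes (must be a power of two)


def line_fill_range(target_address, line_size=LINE_SIZE):
    """Return (first, last) byte addresses of the line that contains target_address."""
    # Clearing the low-order offset bits gives the line-aligned base address.
    base = target_address & ~(line_size - 1)
    return base, base + line_size - 1


first, last = line_fill_range(0x1234)
print(hex(first), hex(last))
```

A miss on any address inside that range would fetch the same group of sequential addresses, so later accesses to its neighbors can be served from the cache.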

In one embodiment, an apparatus includes a cache controller circuit and a cache memory circuit that further includes cache memory having a plurality of cache lines. The cache controller circuit may be configured to receive a request to reallocate a portion of the cache memory circuit that is currently in use. The request may identify an address region corresponding to one or more of the cache lines. The cache controller circuit may also be configured, in response to the request, to convert the one or more cache lines to directly addressable random-access memory (RAM) by excluding the one or more cache lines from cache operations.

In a further example, the cache controller circuit may be further configured to support a real-time virtual channel for memory transactions in the identified address region, and to prioritize a memory transaction received via the real-time virtual channel over a memory transaction received via a bulk virtual channel. In one example, the cache controller circuit may be further configured to determine that the address region is included in a secure access region. In response to the determination, the cache controller circuit may be further configured to ignore a memory transaction in the address region from an agent that is not authorized to access the secure access region.

In another example, the cache controller circuit may also be configured to flush the one or more cache lines prior to the conversion to directly addressable RAM. In one example, the cache controller circuit may be further configured to issue a write-back request for a valid cache line in response to data in the valid cache line being written. The cache controller circuit may also be configured to exclude the one or more cache lines from write-back requests.

In one embodiment, the cache controller circuit may be further configured to receive a different request to deallocate the portion of the cache memory from the directly addressable RAM. In response to the different request, the cache controller circuit may be further configured to include the one or more cache lines in cache operations without copying the data that was stored in the directly addressable RAM while the one or more cache lines were reallocated. In another embodiment, the cache controller circuit may also be configured to generate an error in response to a memory transaction in the directly addressable RAM that is received after the portion of the cache memory has been deallocated.

The following detailed description refers to the accompanying drawings, which are now briefly described.
FIG. 1 illustrates a block diagram of one embodiment of a system that includes a cache memory and an address map, at two points in time.
FIG. 2 shows a block diagram of one embodiment of a system that includes a processor, a cache memory with a plurality of ways, and an address map.
FIG. 3 shows a block diagram of one embodiment of the system of FIG. 1 at two different points in time.
FIG. 4 illustrates a block diagram of one embodiment of the system of FIG. 1 receiving two write requests.
FIG. 5 shows a block diagram of one embodiment of a system that includes two agents sending memory transactions, via a network arbiter, to a cache memory.
FIG. 6 illustrates a block diagram of one embodiment of a system that includes two agents, one trusted and one untrusted, sending memory transactions to a cache memory.
FIG. 7 shows a flow diagram of one embodiment of a method for reallocating a portion of a cache memory system to a directly addressable address region.
FIG. 8 shows a flow diagram of one embodiment of a method for receiving a memory transaction from an unauthorized agent and for deallocating a portion of a cache memory system from a directly addressable address region.
FIG. 9 illustrates a block diagram of one embodiment of a system in which a buffer located in system memory is allocated to cache memory, at two points in time.
FIG. 10 shows a block diagram of one embodiment of the system of FIG. 9 in which an attempt to allocate a storage location to cache memory is repeated, at two different points in time.
FIG. 11 shows a block diagram of one embodiment of a system that includes a processor core and a DMA, in which a buffer is allocated to cache memory.
FIG. 12 illustrates a block diagram of one embodiment of the system of FIG. 9 in which bulk transactions and real-time transactions are used with a buffer and cache memory, at two different points in time.
FIG. 13 shows a flow diagram of one embodiment of a method for allocating a buffer to a cache memory system.
FIG. 14 shows a flow diagram of one embodiment of a method for using bulk transactions and real-time transactions with a buffer and cache memory.
FIG. 15 illustrates a flow diagram of one embodiment of a method for determining a cache miss rate when accessing a buffer allocated to cache memory.
FIG. 16 shows various embodiments of systems that include coupled integrated circuits.
FIG. 17 shows a block diagram of an example computer-readable medium, according to some embodiments.
While the embodiments described in this disclosure may be susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will be described herein in detail. It should be understood, however, that the drawings and detailed description thereto are not intended to limit the embodiments to the particular form disclosed; on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the appended claims.

Transactions may be classified into two or more priority levels, such as "real-time" and "bulk." A real-time transaction may have a higher priority level than a bulk transaction and, therefore, may be processed faster through the bus circuits and any intermediate agents through which the real-time transaction passes. Agents may use real-time transactions to satisfy deadlines for completing processing that, if not met, may lead to poor performance, incorrect calculations, or even failure in the system. For example, playback of a video may stall or glitch if low-latency deadlines are not met. Competition from other agents for access to bus circuits and memory circuits is one source of difficulty in meeting such low-latency deadlines.
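As a rough illustration of the two priority levels, the sketch below models an arbiter that always forwards pending real-time transactions before bulk ones. The class name `Arbiter`, the numeric priority encoding, and the transaction names are assumptions made for the example and do not describe the disclosed circuits.

```python
import heapq
from itertools import count

# Hypothetical priority encoding: the lower number is served first.
REAL_TIME, BULK = 0, 1


class Arbiter:
    """Toy arbiter that forwards real-time transactions before bulk ones."""

    def __init__(self):
        self._queue = []
        self._order = count()  # tie-breaker keeps arrival order within a level

    def submit(self, priority, name):
        heapq.heappush(self._queue, (priority, next(self._order), name))

    def next_transaction(self):
        """Return the name of the next transaction to forward, or None if idle."""
        if not self._queue:
            return None
        _, _, name = heapq.heappop(self._queue)
        return name


arbiter = Arbiter()
arbiter.submit(BULK, "bulk-read-A")
arbiter.submit(REAL_TIME, "rt-video-fetch")
arbiter.submit(BULK, "bulk-write-B")
served = [arbiter.next_transaction() for _ in range(3)]
print(served)
```

The real-time request is served first even though it arrived second, while the two bulk requests keep their arrival order.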

Cache memories may be used to mitigate some of the difficulty in meeting low-latency deadlines by storing copies of memory locations closer to a corresponding agent, thereby reducing the number of bus circuits and/or intermediate agents that a real-time transaction must traverse when being processed. In addition, if a plurality of cache memories is available, a given cache memory may be accessed by fewer agents, thereby increasing the probability that an agent can access the cache memory while meeting low-latency deadlines. Cached values, however, may be evicted from the cache memory if they are not accessed frequently and/or if there is competition from other agents for caching values in the cache memory.

Accordingly, techniques are desired for using cache memory to store values that are, or will be, used for real-time transactions. Two general approaches for implementing a faster-access memory region using cache memory are presented herein. In the first approach, a portion of a cache memory may be allocated to a system-bus accessible address region, where the addressable cache memory can be accessed in a manner similar to random-access memory (RAM). To implement such a technique, a control circuit may be used to allocate a portion of the cache memory as RAM. Cache lines within the allocated portion of the cache memory are flushed, and any dirty data (e.g., data that has been modified in the cache memory without updating the corresponding memory locations in system memory) is written back to system memory. The cache lines within the allocated portion are enabled for access via a memory-mapped address region and are removed from the available cache memory lines. Agents may then directly access the memory-mapped address region with real-time transactions that can be processed in an amount of time similar to an access to a cached location. Because the memory-mapped address region is not treated as part of the cache while allocated, values stored in this region are not at risk of being evicted if they are not accessed for an extended period. FIGS. 1-8, described below, illustrate various details of this cache-as-RAM approach.

In the second approach, a buffer is allocated within system memory, where the buffer is intended for use with low-latency memory transactions. To reduce the latency for accessing values in this buffer, the buffer may also be allocated to the cache memory. In some embodiments, the cache may include support for high-priority data, including techniques for associating a particular cache line with low-latency transactions. Such support may include limiting or eliminating evictions for the associated cache lines. Cache allocation of a large buffer (e.g., a buffer sized for use with a video frame, another image, an audio file, and the like), however, may begin to experience cache misses, because portions of the buffer allocated near the end of the cache allocation process have a higher probability of mapping to cache lines that are now occupied by portions of the buffer that were cached earlier. If the cache allocation begins at one end of the buffer, the opposite end will experience a higher number of cache misses, making cache misses more frequent as accesses to the buffer move toward the last portions to be allocated.

The disclosed techniques attempt to spread the cache allocation of portions of the buffer across various locations within the buffer. To accomplish this, the buffer may be logically divided into a plurality of blocks. Attempts may then be made to allocate a first sub-block from each block to the cache. Subsequently, further attempts may be made to allocate a second sub-block from each block to the cache. This may be repeated for a number of sub-blocks in each block until an attempt has been made to cache all sub-blocks. Such a technique may distribute the cache misses for the various sub-block allocations throughout the buffer, so that misses are not concentrated toward the end of the buffer. For example, if the buffer is used for processing an image, cache misses may occur more consistently, but at a lower concentration, across the processing of the entire image, rather than occurring rarely at the beginning of the processing and then more frequently as processing nears the end of the image. As misses occur more frequently, more real-time transactions may be generated to retrieve the requested data from system memory. Having a greater number of real-time transactions being processed concurrently may increase the likelihood that low-latency deadlines are missed, which in turn increases the likelihood that a user of the system experiences poor performance or a system failure. FIGS. 9-13, described later in this disclosure, illustrate details of this distributed caching approach.
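The interleaved order described above can be sketched as a short generator. This is a simplified illustration that assumes the buffer divides evenly into equally sized blocks and sub-blocks; the function name and the sizes in the example are hypothetical.

```python
def distributed_allocation_order(buffer_size, block_size, subblock_size):
    """Yield (offset, length) pairs: sub-block 0 of every block, then sub-block 1, etc."""
    if block_size % subblock_size != 0 or buffer_size % block_size != 0:
        raise ValueError("sizes must divide evenly in this simplified sketch")
    num_blocks = buffer_size // block_size
    subblocks_per_block = block_size // subblock_size
    for sub in range(subblocks_per_block):   # outer loop: which sub-block position
        for block in range(num_blocks):      # inner loop: visit every block
            yield (block * block_size + sub * subblock_size, subblock_size)


# A 4 KiB buffer split into four 1 KiB blocks of four 256-byte sub-blocks each.
order = [offset for offset, _ in distributed_allocation_order(4096, 1024, 256)]
print(order[:5])
```

Because consecutive allocation attempts land in different blocks, any attempts that fail late in the process are scattered across the buffer instead of clustering at one end.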

FIG. 1 illustrates a block diagram of one embodiment of a cache memory system at two points in time. As illustrated, system 100 includes cache controller circuit 101, cache memory circuit 105, and address map 110. Cache memory circuit 105 includes cache lines 120-127. Address map 110 is shown with four address regions 115a-115d (collectively, address regions 115). System 100 may correspond to a processor circuit, such as a microprocessor, microcontroller, or other form of system-on-chip (SoC). System 100 may be implemented on a single integrated circuit or by use of multiple circuit elements coupled on a circuit board.

As illustrated, cache memory circuit 105 may be implemented using any suitable type of memory circuit design, such as static random-access memory (SRAM), dynamic RAM (DRAM), ferroelectric RAM (FeRAM or FRAM), magnetoresistive RAM (MRAM), flash memory, and the like. Cache memory circuit 105 may be organized using any suitable cache structure, including use of multiple ways and/or sets. Cache controller circuit 101 includes circuits for performing cache operations in cache memory circuit 105, such as maintaining cache tags, determining whether an address related to a memory transaction is a hit (a cache line currently corresponds to the address) or a miss (no cache line has been filled with data corresponding to the address), issuing cache line fill requests in response to a miss, marking cache lines for eviction, and the like. Address map 110 includes any suitable combination of software, firmware, and hardware circuits for determining a physical address for memory-mapped registers and memory circuits. In some embodiments, address map 110 includes translation tables for converting logical addresses into physical addresses.

As shown, cache controller circuit 101 receives allocation request 145 at time t0. Cache controller circuit 101 is configured to receive allocation request 145 to reallocate a portion of cache memory circuit 105 that is currently in use. Allocation request 145 identifies one of address regions 115 (e.g., address region 115b) that corresponds to one or more of cache lines 120-127 (e.g., cache line 123). At time t0, when allocation request 145 is received, cache memory circuit 105 is in use, and one or more of cache lines 120-127 may be in use for caching locations in a system memory (not shown). Allocation request 145 may indicate address region 115b by including an address value that corresponds to address region 115b. In other embodiments, other forms of indication, such as an index value corresponding to address region 115b, may be used to identify address region 115b.

Each of address regions 115, when active, may correspond to a plurality of addresses that correspond to memory locations in cache memory circuit 105, such as one or more of cache lines 120-127. When a particular one of address regions 115 is active, the corresponding cache line(s) are not used for caching data, but rather are used as RAM. When an address region is inactive, the corresponding cache line(s) may be used for caching data values. Addresses associated with an inactive address region may be treated as illegal addresses and, therefore, may generate an exception if included in a transaction. At time t0, address region 115b is not active (as indicated by the hatching in FIG. 1) and, therefore, no data values may be stored within the region. Address regions 115c and 115d are also inactive, while address region 115a is currently active. In some embodiments, address region 115a may correspond to main system memory and may include addresses for all memory locations and memory-mapped registers that are always enabled when system 100 is active.

In response to allocation request 145, cache controller circuit 101 may be further configured to convert, at time t1, cache line 123 to directly addressable random-access memory (RAM) by excluding cache line 123 from cache operations. Cache line 123 may then be addressed directly using memory transactions addressed to locations within address region 115b. For example, activating address region 115b may include modifying address map 110 such that transactions addressed to locations within address region 115b are routed to the memory cells that correspond to cache line 123. In addition, cache controller circuit 101 may further set an indication that cache line 123 is unavailable and that no cached data is currently stored in cache line 123. For example, cache controller circuit 101 may set one or more bits in the cache tag corresponding to cache line 123 that provide such indications.

Use of such a cache-to-RAM technique may enable a process executing in system 100 to allocate a portion of cache memory circuit 105 for use as directly addressable RAM at any point in time at which the system is active. As described above, such an allocation may allow a particular agent to reserve memory space with low-latency access times for use with data related to high-priority transactions, such as real-time transactions. Allocating this space in cache memory circuit 105 may further prevent other agents from gaining use of the allocated portion until the particular agent is done with its real-time transactions and deallocates the portion for use as cache line 123 again.
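The allocate, access, and deallocate sequence described for FIG. 1 can be modeled in software as a minimal sketch. Every name below (`CacheController`, `is_ram`, `IllegalAddressError`, the region key `"115b"`, the 64-byte line size, and the base address `0x8000`) is an assumption chosen for illustration; the disclosure describes hardware circuits and does not define this interface.

```python
LINE_SIZE = 64  # assumed line size in bytes


class IllegalAddressError(Exception):
    """Raised for a transaction that targets an inactive address region."""


class CacheLine:
    def __init__(self):
        self.valid = False
        self.dirty = False
        self.is_ram = False  # hypothetical tag bit: line is excluded from caching
        self.tag = None
        self.data = bytearray(LINE_SIZE)


class CacheController:
    def __init__(self, num_lines, regions):
        # regions: hypothetical map of region name -> (base address, line index)
        self.lines = [CacheLine() for _ in range(num_lines)]
        self.regions = regions
        self.active = set()
        self.written_back = []  # (tag, data) pairs flushed to system memory

    def allocate(self, region):
        _, index = self.regions[region]
        line = self.lines[index]
        if line.valid and line.dirty:
            # Flush: write dirty data back before the line stops being a cache line.
            self.written_back.append((line.tag, bytes(line.data)))
        line.valid = line.dirty = False
        line.tag = None
        line.is_ram = True       # exclude the line from cache operations
        self.active.add(region)  # the address map now routes the region to the line

    def deallocate(self, region):
        _, index = self.regions[region]
        self.active.discard(region)
        self.lines[index].is_ram = False  # usable for caching again; data is not copied

    def cacheable_lines(self):
        return [i for i, line in enumerate(self.lines) if not line.is_ram]

    def _resolve(self, address):
        for region in self.active:
            base, index = self.regions[region]
            if base <= address < base + LINE_SIZE:
                return self.lines[index], address - base
        raise IllegalAddressError(f"address {address:#x} is not in an active region")

    def write(self, address, value):
        line, offset = self._resolve(address)
        line.data[offset] = value

    def read(self, address):
        line, offset = self._resolve(address)
        return line.data[offset]


ctrl = CacheController(num_lines=8, regions={"115b": (0x8000, 3)})
ctrl.lines[3].valid = ctrl.lines[3].dirty = True  # line 3 holds dirty cached data
ctrl.lines[3].tag = 0x1234
ctrl.allocate("115b")
ctrl.write(0x8004, 0xAB)
print(hex(ctrl.read(0x8004)), ctrl.cacheable_lines())
```

In this model, a read or write to the region before `allocate` or after `deallocate` raises `IllegalAddressError`, mirroring the treatment of addresses in an inactive region as illegal.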

It is noted that system 100, as illustrated in FIG. 1, is merely an example. The illustration of FIG. 1 has been simplified to highlight features relevant to this disclosure. Various embodiments may include additional elements and/or different configurations of the elements. For example, only eight cache lines and four address regions are shown. Any suitable number of cache lines and address regions may be implemented in other embodiments. Although address region 115b is shown as corresponding to a single cache line, in other embodiments an address region may correspond to any suitable number of cache lines, as well as to a portion of a cache line.

The system illustrated in FIG. 1 is shown in a simplified depiction. Cache memory systems may be implemented in various fashions. Another example of a system with a cache memory is shown in FIG. 2.

Referring to FIG. 2, a block diagram of one embodiment of a cache memory system that employs ways in a cache memory circuit is shown. As illustrated, system 200 includes cache controller circuit 201, cache memory circuit 205, address map 210, and processor 230. Processor 230 may correspond to a general-purpose processing core, or to another type of processing circuit capable of processing data and issuing memory requests. Except as described below, elements of system 200 perform functions as described for the similarly named and numbered elements of FIG. 1.

As shown, cache memory circuit 205 includes a cache memory having a plurality of ways 240a-240d as well as a plurality of sets 250-257. Processor 230 is configured to issue memory requests using address map 210, which includes active and inactive address regions 215. In the illustrated embodiment, address region 215m is always active when system 200 is active, and may include addresses for main system memory as well as for various registers. When processor 230 issues a memory request to system memory (e.g., to an address in address region 215m), cache controller circuit 201 uses the fetch address included in the memory fetch to determine whether a cache line in cache memory circuit 205 currently holds valid values for the system memory locations corresponding to the fetch address. To make this determination, cache controller circuit 201 may use the fetch address to identify a particular one of sets 250-257. For example, cache controller circuit 201 may use at least a portion of the fetch address in a hashing algorithm to determine a particular hash value. This hash value may then be used to identify a particular one of sets 250-257. Each of sets 250-257 includes at least one cache line from each of ways 240. If a cache line in any of ways 240 for the particular set holds valid values corresponding to the fetch address, the memory request is said to "hit" in cache memory circuit 205. Otherwise, the memory request is a "miss" in cache memory circuit 205. The use of multiple ways may allow some flexibility in how cache controller circuit 201 maps fetched values into cache lines of cache memory circuit 205.
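The set selection and hit/miss check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 64-byte line size, the XOR-fold hash in `set_index`, and the names `Line`, `lookup`, and `fill` are all hypothetical, since the text does not specify a line size or a hashing algorithm.

```python
from dataclasses import dataclass

NUM_SETS = 8     # sets 250-257 in FIG. 2
NUM_WAYS = 4     # ways 240a-240d in FIG. 2
LINE_BYTES = 64  # assumed line size; the text does not give one


@dataclass
class Line:
    valid: bool = False
    tag: int = 0


def set_index(addr: int) -> int:
    """Hypothetical hash: fold the line-number bits down to a set index."""
    line_no = addr // LINE_BYTES
    return (line_no ^ (line_no >> 3)) % NUM_SETS


def lookup(cache, addr: int):
    """Return the hitting way number, or None on a miss."""
    tag = addr // LINE_BYTES  # the full line number serves as the tag here
    for way, line in enumerate(cache[set_index(addr)]):
        if line.valid and line.tag == tag:
            return way
    return None


def fill(cache, addr: int, way: int) -> None:
    """Install the line holding addr into the chosen way of its set."""
    cache[set_index(addr)][way] = Line(valid=True, tag=addr // LINE_BYTES)


cache = [[Line() for _ in range(NUM_WAYS)] for _ in range(NUM_SETS)]
fill(cache, 0x1000, way=2)
hit_way = lookup(cache, 0x1000)    # same line: hit in way 2
same_line = lookup(cache, 0x1004)  # 0x1004 lies in the same 64-byte line
miss = lookup(cache, 0x2000)       # never filled: miss
```

Because any of the four ways in a set may hold a given line, `fill` is free to choose the way, which is the flexibility the paragraph above refers to.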

As described above, cache memory circuit 205 may provide lower-latency access, also referred to as a higher quality of service (QoS), than access to system memory, such as in address region 215m. Under certain conditions, processor 230 may be required to process a block of data with a high-QoS deadline. Because processing the data from system memory could jeopardize successful processing of the data within the limits of the high-QoS deadline, processor 230 may send allocation request 245 to cache controller circuit 201, requesting that a portion of cache memory circuit 205 be reallocated.

Cache controller circuit 201, as shown, is configured to receive allocation request 245 from processor 230 to reallocate a portion of cache memory circuit 205 as directly addressable memory. Allocation request 245 identifies address region 215b, which is inactive at the time allocation request 245 is received. For example, allocation request 245 may include a particular address value, or another type of indication that identifies address region 215b. Based on allocation request 245, cache controller circuit 201 is further configured to select a portion of ways 240 to convert. As shown, each of ways 240 may correspond to one of address regions 215, including, as indicated, way 240b corresponding to address region 215b. In other embodiments, cache memory circuit 205 may include additional ways such that two or more ways may be associated with a given address region. Allocation request 245 may also indicate more than one address region, such as address regions 215b and 215c. In some embodiments, the portion of ways 240 may be one half, or another fraction, of a particular way. For example, way 240b may include multiple cache lines in each of sets 250-257, such as two lines per set. In such embodiments, one of the two cache lines from each of sets 250-257 may be reallocated, thereby leaving half of way 240b for use as cache while the other half is reallocated to address region 215b.

To convert way 240b, cache controller circuit 201 may be configured to set respective indications in the cache tags corresponding to the particular cache lines included in the selected portion of the ways. Cache lines 250b-257b are included in way 240b and, as shown, are selected for reallocation to address region 215b. Adding the respective indication to the cache tag of each of cache lines 250b-257b removes the corresponding cache line from use as cache memory. Such indications may cause cache controller circuit 201 to ignore cache lines 250b-257b when determining whether a received memory request is a hit or a miss in cache memory circuit 205, and may further prevent cache controller circuit 201 from mapping an address from a cache miss to any of cache lines 250b-257b. Accordingly, cache lines 250b-257b are effectively removed from cache memory usage for as long as these indications remain set in the cache tags.
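A minimal sketch of how such a tag indication might affect both the hit check and the choice of a line to fill on a miss is given below. The field name `is_ram`, the helper names, and the "first invalid way, else lowest-numbered eligible way" victim choice are assumptions made for illustration; the text does not describe the tag layout or the replacement policy.

```python
from dataclasses import dataclass


@dataclass
class Tag:
    valid: bool = False
    addr_tag: int = 0
    is_ram: bool = False  # hypothetical name for the "indication" in the cache tag


def lookup_way(set_tags, addr_tag):
    """Hit check that ignores ways converted to RAM."""
    for way, t in enumerate(set_tags):
        if not t.is_ram and t.valid and t.addr_tag == addr_tag:
            return way
    return None


def pick_victim(set_tags):
    """Choose a way for a miss fill, never one converted to RAM.

    Prefers an invalid way, else the lowest-numbered eligible way
    (a stand-in for whatever replacement policy a real design uses).
    """
    eligible = [w for w, t in enumerate(set_tags) if not t.is_ram]
    if not eligible:
        return None
    for w in eligible:
        if not set_tags[w].valid:
            return w
    return eligible[0]


def convert_way(tags, way):
    """Set the indication for one way across every set."""
    for set_tags in tags:
        set_tags[way].is_ram = True


tags = [[Tag() for _ in range(4)] for _ in range(8)]  # 8 sets x 4 ways
tags[0][1] = Tag(valid=True, addr_tag=0x40)           # a line cached in way 1
before = lookup_way(tags[0], 0x40)  # hits in way 1
convert_way(tags, 1)
after = lookup_way(tags[0], 0x40)   # the converted way is now ignored
tags[0][0].valid = True             # occupy way 0 so the victim search moves on
victim = pick_victim(tags[0])       # way 1 is skipped; way 2 is the first free way
```

Clearing `is_ram` again would return the way to normal cache use without any other change to the lookup logic.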

Cache controller circuit 201 is further configured to map cache lines 250b-257b in way 240b for use in the identified address region 215b. Address region 215b, as illustrated, includes a number of addresses that may be reserved for use with reallocated cache lines 250b-257b and, therefore, may not be mapped to any other memory locations or registers. When address region 215b is inactive, an attempt to access these addresses may result in generation of an exception and/or return of a default value. Cache controller circuit 201 may be further configured to set respective real-time indicators in the cache tags corresponding to the particular cache lines 250b-257b. Such real-time indicators may denote that cache lines 250b-257b, and therefore the addresses in address region 215b, are associated with real-time transactions, which have higher priority than bulk transactions. Accordingly, memory accesses to any of reallocated cache lines 250b-257b may be treated as real-time transactions even if a real-time transaction is not explicitly used for the memory access.

In addition, cache controller circuit 201 may be further configured to flush one or more of cache lines 250b-257b in way 240b before mapping way 240b for use in address region 215b. Because cache memory circuit 205 may be in use before processor 230 issues allocation request 245, one or more of the cache lines in way 240b may be in use to cache memory locations, such as locations in address region 215m. If the currently cached values match the values at the respective locations in address region 215m, these values may simply be cleared or ignored when the respective cache line is mapped to address region 215b. If, however, a value cached in way 240b has been modified but not yet written back to address region 215m, such a value may be referred to as "dirty," and a flush command may be issued to write the dirty values back to the system memory locations in address region 215m. For example, cache lines 251b, 254b, and 257b contain dirty data in the illustrated example. Cache controller circuit 201 issues flush command 248 to write the dirty values of these cache lines back to the corresponding locations in address region 215m. One or more flush commands may be issued before the cache lines of way 240b are converted to directly addressable memory locations in address region 215b.
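The flush-before-convert step can be illustrated with a short sketch. It assumes a simple dictionary model: each cache line is a dict with hypothetical keys `valid`, `dirty`, `addr`, and `data`, and `system_memory` stands in for address region 215m. Only dirty lines are written back; clean lines are simply dropped.

```python
def convert_way_to_ram(way_lines, system_memory):
    """Write back dirty lines, then clear every line in the way.

    way_lines: list of dicts with keys 'valid', 'dirty', 'addr', 'data'.
    system_memory: dict mapping address -> data (stands in for region 215m).
    Returns the list of addresses that were flushed.
    """
    flushed = []
    for line in way_lines:
        if line["valid"] and line["dirty"]:
            system_memory[line["addr"]] = line["data"]  # the flush command
            flushed.append(line["addr"])
        # Clean or invalid contents can simply be dropped.
        line.update(valid=False, dirty=False, addr=None, data=None)
    return flushed


# Mirror the illustrated example: lines 251b, 254b, and 257b are dirty.
DIRTY = (1, 4, 7)
system_memory = {0x100 + i: "old" for i in range(8)}
way_b = [
    {"valid": True, "dirty": i in DIRTY, "addr": 0x100 + i,
     "data": "new" if i in DIRTY else "old"}
    for i in range(8)
]
flushed = convert_way_to_ram(way_b, system_memory)
```

After the call, the three dirty values have reached `system_memory`, and every line in the way is empty and ready to be mapped into the directly addressable region.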

After processing the block of data with the high-QoS deadline, processor 230 may have no immediate use for the high-QoS directly addressable memory in address region 215b, and may be configured to issue a request to deallocate way 240b. Cache controller circuit 201 may be further configured, in response to receiving the request to deallocate the directly addressable memory in address region 215b, to include way 240b in cache operations. Values stored in the directly addressable memory while way 240b was reallocated are not relocated in response to the request to deallocate the directly addressable memory. Any values written to address region 215b during the reallocation may be deleted or ignored, and may subsequently be overwritten as way 240b is returned to use in the operations of cache memory circuit 205.

Note that using ways 240 of cache memory circuit 205 to reallocate cache memory as directly addressable memory may be implemented with an acceptable amount of additional logic circuitry, while allowing operation of cache memory circuit 205 to continue with little to no interruption. In contrast, implementing reallocation of cache memory on an individual cache line basis may require the addition of a larger amount of logic circuitry, particularly if the cache memory is large and/or has many sets and ways. On the other hand, further limiting the amount of cache memory that can be reallocated may not provide an adequate solution for balancing the need for high-QoS memory locations against ongoing cache operations. For example, if reallocation of cache memory circuit 205 were limited to one half of the cache, the amount of memory allocated could be far larger than needed to process the high-QoS data, further reducing the capacity of the cache and possibly reducing the efficiency of the agents that use the cache.

Note also that the embodiment of FIG. 2 is one depiction of a cache memory system. Although only four ways and eight sets are shown, any suitable number of cache ways and sets may be included in other embodiments. In addition, although five address regions are shown, address map 210 may be divided into any suitable number of regions.

The description of FIG. 2 mentioned deallocation of the directly addressable memory locations. Deallocating a directly addressable memory region back to cache memory may be implemented in a variety of ways. FIG. 3 illustrates one such approach.

Turning to FIG. 3, the system of FIG. 1 is shown again, at two different points in time. System 100 includes cache controller circuit 101, cache memory circuit 105, and address map 110, as described with reference to FIG. 1. At time t0, cache line 123 is mapped to directly addressable memory in address region 115b and is therefore unavailable for cache operations. As illustrated, cache controller circuit 101 is configured to receive deallocation request 345, for example from an agent such as processor 230 of FIG. 2, to deallocate cache line 123 from the directly addressable memory in address region 115b. In response to deallocation request 345, cache controller circuit 101 is further configured to include cache line 123 in cache operations without copying the data stored in the directly addressable memory while cache line 123 was reallocated to address region 115b. For example, as described above, cache controller circuit 101 may have set an indication in the cache tag associated with cache line 123 to indicate that cache line 123 was reallocated to directly addressable memory. To deallocate cache line 123, cache controller circuit 101 may clear this indication, thereby removing cache line 123 from address region 115b and including cache line 123 in subsequent cache operations.

Values written to cache line 123 while it was allocated to address region 115b may be deleted or ignored when the indication in the associated cache tag is cleared. Because the addresses in address region 115b may not be implemented elsewhere in address map 110, no writeback requests may be issued to copy these values. Unless an agent using address region 115b while it is active explicitly copies data from address region 115b to other locations in address map 110, the values in address region 115b may be lost once the deallocation completes.

At time t1, memory transaction 350 is issued by an agent to access a value in address region 115b. Cache controller circuit 101 is configured to generate error message 355 in response to memory transaction 350 being received after address region 115b has been deallocated. In some embodiments, error message 355 may be generated when memory transaction 350 includes a write to, or a modification of, an address in address region 115b. In contrast, if memory transaction 350 includes only read accesses to address region 115b, cache controller circuit 101 may, instead of or in addition to generating error message 355, return a particular default value, such as all zero bits or all one bits, to the requesting agent. Generating error message 355 may be implemented using a variety of techniques. For example, error message 355 may be generated by asserting an exception signal that, in turn, causes a particular process to be executed by one or more processor cores in system 100. Generating error message 355 may include returning, to the agent that issued memory transaction 350, a particular value indicating that address region 115b has been deallocated.
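One of the behaviors described above (writes to a deallocated region raise an error, reads return a default value, and nothing is copied out on deallocation) can be modeled as follows. The class names `RamRegion` and `RegionFault`, and the choice of all zeros as the default read value, are assumptions for this sketch; the text allows other combinations, such as raising an error on reads as well.

```python
class RegionFault(Exception):
    """Stands in for error message 355."""


class RamRegion:
    DEFAULT_READ = 0  # assumed default; the text mentions all zeros or all ones

    def __init__(self, size):
        self.size = size
        self.active = False
        self.storage = [0] * size

    def allocate(self):
        self.active = True

    def deallocate(self):
        # No copy-out: the contents are simply abandoned.
        self.active = False

    def write(self, offset, value):
        if not self.active:
            raise RegionFault("write to deallocated region")
        self.storage[offset] = value

    def read(self, offset):
        if not self.active:
            return self.DEFAULT_READ
        return self.storage[offset]


region = RamRegion(size=16)
region.allocate()
region.write(3, 0xAB)
live_value = region.read(3)   # the value written while the region was active
region.deallocate()
stale_read = region.read(3)   # default value; the earlier data is not visible
try:
    region.write(3, 0xCD)
    write_faulted = False
except RegionFault:
    write_faulted = True
```

An agent that still needs the data is expected to copy it elsewhere before requesting deallocation, since the model (like the described circuit) makes no attempt to preserve it.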

FIG. 3 illustrates deallocation of cache lines from a directly addressable address region. Proceeding to FIG. 4, system 100 is shown with address region 115b active (using cache line 123), to illustrate how accesses to address region 115b and cache operations may be handled concurrently. In FIG. 4, system 100 shows cache controller circuit 101 receiving write requests 445 and 446.

As shown, write request 445 includes a request to write data to one or more locations that are currently cached in cache line 121 of cache memory circuit 105. In a similar manner, write request 446 includes a request to write data to one or more locations in address region 115b, which is implemented by reallocating cache line 123 from cache memory circuit 105 to address map 110. Cache controller circuit 101 is configured to issue writeback request 447 for cache line 121 in response to write request 445. The modified values in cache line 121 are included in writeback request 447, along with the corresponding target addresses in system memory. Writeback request 447 causes these modified values to be updated at the target addresses in system memory. If cache line 121 is evicted and then mapped to different addresses in system memory, the target addresses in system memory will still have up-to-date values.

Cache controller circuit 101, as illustrated, is further configured to exclude cache line 123 from writeback requests. Write request 446 may modify one or more values in address region 115b (which includes cache line 123). Even though values in cache line 123 are modified, cache controller circuit 101 is configured to ignore these modifications with respect to writeback commands. Address region 115b is treated as an endpoint memory destination despite including cache line 123. No target address in system memory corresponds to the addresses in address region 115b. Accordingly, the modified values stored in cache line 123 may not be updated in any other memory circuit in response to write request 446.
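The contrast between a normally cached line and a line serving as directly addressable memory can be sketched as below. This is a deliberately simplified model: the writeback is performed immediately on every write (the text does not say when writeback request 447 is issued relative to the write), and the `is_ram` flag and dictionary layout are hypothetical.

```python
def handle_write(line, addr, value, system_memory):
    """Apply a write to a line; return True if a writeback was issued.

    line: dict with 'is_ram' (bool) and 'data' (dict of addr -> value).
    Assumes an immediate writeback for cached lines, which is a
    simplification of writeback request 447.
    """
    line["data"][addr] = value
    if line["is_ram"]:
        return False  # endpoint memory: there is no backing location to update
    system_memory[addr] = value
    return True


system_memory = {0x10: 0}
line_121 = {"is_ram": False, "data": {0x10: 0}}  # caches a system memory location
line_123 = {"is_ram": True, "data": {}}          # reallocated as RAM
wb_cached = handle_write(line_121, 0x10, 7, system_memory)
wb_ram = handle_write(line_123, 0x9000, 9, system_memory)
```

The write to `line_123` changes only the line itself; `system_memory` never gains an entry for that address, matching the treatment of the region as an endpoint destination.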

Note, however, that a different cache memory may reside between cache memory circuit 105 and the processing circuit that issues write request 446. For example, cache memory circuit 105 may be an L2 cache, and the processing circuit that issues write request 446 may include an L1 cache. In such an embodiment, the L1 cache may cache at least some of the values stored in address region 115b (e.g., in cache line 123).

Note further that the embodiments of FIGS. 3 and 4 are merely examples for demonstrating the disclosed concepts. System 100 as shown in these figures is simplified for clarity. In other embodiments, additional elements may be included, such as one or more agents that issue the memory transactions that cause the described operations. In addition, although FIGS. 3 and 4 use system 100 of FIG. 1, the described techniques may be applied to system 200 of FIG. 2.

Use of real-time transactions is described above, in various capacities, in connection with the disclosed techniques. Both real-time and bulk transactions may be used for memory requests that target the cache-based address regions described herein. FIG. 5 illustrates an example of how transactions with different QoS levels may be handled.

Moving now to FIG. 5, a system is shown in which arbiters are used to schedule transactions across a system network. System 500 includes cache controller circuit 501 and address map 510, which may correspond to the similarly named and numbered elements of FIGS. 1, 3, and 4, except as described below. System 500 further includes agents 530a and 530b (collectively, agents 530), network arbiter circuit 540, and bus circuit 545. FIG. 5 illustrates the flow of two memory transactions (memory transactions 550 and 555, a real-time transaction and a bulk transaction, respectively) addressed to address region 515b, which is implemented using one or more cache lines associated with cache controller circuit 501, using a technique such as described above.

At a first point in time, agent 530b issues bulk memory transaction 555 to a destination in address region 515b. In a manner similar to address regions 115b and 215b of FIGS. 1 and 2, address region 515b includes a portion of the cache memory associated with cache controller circuit 501. As illustrated, memory transaction 555 is received by network arbiter circuit 540 on its way to cache controller circuit 501. Because memory transaction 555 is a bulk transaction, network arbiter circuit 540 places memory transaction 555 in bulk queue 565a until bus circuit 545 has bandwidth available to forward memory transaction 555 to cache controller circuit 501.

Bus circuit 545 includes a set of wires that couple cache controller circuit 501 to network arbiter circuit 540. In some embodiments, bus circuit 545 may include a sufficient number of wires to support independent physical bulk and real-time channels. As shown, however, bus circuit 545 does not include such a number of wires, so both real-time and bulk memory transactions are conveyed over the same set of wires, using virtual bulk and real-time channels to support the respective QoS levels of each type of transaction. Accordingly, network arbiter circuit 540 uses a prioritization scheme to select between real-time (RT) queue 560a and bulk queue 565a for the next transaction to send over bus circuit 545. For example, network arbiter circuit 540 may send the transactions in RT queue 560a first, and then send the transactions in bulk queue 565a once RT queue 560a is empty. In other embodiments, additional considerations may be included in the selection process to avoid bulk queue 565a reaching a full state, or a bulk transaction stalling in bulk queue 565a for an excessive amount of time.
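A possible selection scheme of this kind, real-time first with a simple guard against bulk starvation, might look like the sketch below. The `Arbiter` class, the `max_bulk_wait` limit, and the counter-based starvation rule are assumptions for illustration; the text only says that additional considerations may be included, without specifying them.

```python
from collections import deque


class Arbiter:
    def __init__(self, max_bulk_wait=4):
        self.rt = deque()    # stands in for RT queue 560a
        self.bulk = deque()  # stands in for bulk queue 565a
        self.max_bulk_wait = max_bulk_wait  # hypothetical starvation limit
        self.bulk_wait = 0   # RT grants issued while a bulk transaction waited

    def submit(self, txn, real_time):
        (self.rt if real_time else self.bulk).append(txn)

    def select(self):
        """Pick the next transaction to send, or None if both queues are empty."""
        starving = bool(self.bulk) and self.bulk_wait >= self.max_bulk_wait
        if self.rt and not starving:
            if self.bulk:
                self.bulk_wait += 1
            return self.rt.popleft()
        if self.bulk:
            self.bulk_wait = 0
            return self.bulk.popleft()
        return None


arb = Arbiter(max_bulk_wait=2)
arb.submit("bulk-555", real_time=False)
for name in ("rt-1", "rt-2", "rt-3"):
    arb.submit(name, real_time=True)
order = [arb.select() for _ in range(5)]
```

With `max_bulk_wait=2`, the bulk transaction is granted after two real-time transactions have passed it, even though a third real-time transaction is still queued. Setting the limit very high reduces the scheme to strict real-time priority.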

As used herein, a "channel" is a medium used to transfer information between a source agent (e.g., a processor circuit) and a destination agent (e.g., a memory circuit). A channel may include wires (including conductive traces on a circuit board or integrated circuit) and various other circuit elements. In some embodiments, a channel may further include antennas and electromagnetic waves of a particular frequency or range of frequencies. A "physical" channel refers to the circuit elements that comprise a channel. A "virtual" channel refers to one of two or more different "channels" implemented over the same physical channel. Virtual channels may be implemented using a variety of techniques. For example, virtualization of a channel may be implemented in a channel interface by including a respective queue for each virtual channel. An agent sends and receives transactions over a given channel using the queue for that channel. Other circuits may then control channel arbitration between the respective queues to select particular transactions to send when the channel is available. In other embodiments, an agent may be responsible for associating the various transactions with corresponding virtual channels. In such embodiments, the agent may maintain appropriate data structures for assigning transactions to the appropriate virtual channels, and then arbitrate to select a given transaction to send when the channel is available.

At a second point in time, network arbiter circuit 540 selects memory transaction 555 from bulk queue 565a and forwards it to cache controller circuit 501. Cache controller circuit 501 may then place memory transaction 555 in bulk queue 565b until bandwidth is available to process memory transaction 555 in address region 515b. Meanwhile, at a third point in time after the second, agent 530a sends memory transaction 550 toward cache controller circuit 501 via bus circuit 545. Network arbiter circuit 540 receives real-time memory transaction 550 and places it in RT queue 560a. At a subsequent fourth point in time, memory transaction 550 is selected by network arbiter circuit 540 and sent to cache controller circuit 501, which places the received memory transaction 550 in RT queue 560b.

In the illustrated example, memory transactions 550 and 555 are now in RT queue 560b and bulk queue 565b, respectively. Cache controller circuit 501 is configured to support real-time and bulk virtual channels for memory transactions in address regions 515a through 515d. Accordingly, cache controller circuit 501, using a selection scheme similar to that of network arbiter circuit 540, prioritizes memory transaction 550, received via the real-time virtual channel, over memory transaction 555, received via the bulk virtual channel. At a fifth point in time, after the fourth, cache controller circuit 501 skips memory transaction 555 waiting in bulk queue 565b and instead selects memory transaction 550 waiting in RT queue 560b. Later, at a sixth point in time, memory transaction 555 satisfies the selection criteria and is processed in address region 515b.
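The sequence from the second through the sixth points in time can be replayed with a small two-stage model. The following Python sketch assumes strict real-time-over-bulk selection at both stages; the `Stage` class and the `forward` helper are hypothetical names, and the actual selection criteria of the circuits are not specified in the text.

```python
from collections import deque

RT, BULK = "real_time", "bulk"


class Stage:
    """Hypothetical queueing point holding one RT queue and one bulk queue."""

    def __init__(self, name):
        self.name = name
        self.queues = {RT: deque(), BULK: deque()}

    def enqueue(self, channel, transaction):
        self.queues[channel].append(transaction)

    def select(self):
        # Assumed policy: a waiting real-time transaction always wins.
        for channel in (RT, BULK):
            if self.queues[channel]:
                return channel, self.queues[channel].popleft()
        return None


def forward(source, destination):
    """Move the selected transaction from one stage to the next."""
    picked = source.select()
    if picked is not None:
        destination.enqueue(*picked)
    return picked


arbiter = Stage("network arbiter circuit 540")      # queues 560a and 565a
controller = Stage("cache controller circuit 501")  # queues 560b and 565b

arbiter.enqueue(BULK, 555)    # bulk transaction 555 waits in queue 565a
forward(arbiter, controller)  # second time: 555 moves to bulk queue 565b
arbiter.enqueue(RT, 550)      # third time: real-time 550 enters queue 560a
forward(arbiter, controller)  # fourth time: 550 moves to RT queue 560b

processed = []
while (picked := controller.select()) is not None:
    processed.append(picked[1])  # fifth time, then sixth time
print(processed)
```

Although transaction 555 reaches the cache controller first, the model processes 550 before it, matching the order described for the fifth and sixth points in time.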

Note that system 500 is merely an example for highlighting the disclosed techniques. FIG. 5 has been simplified for clarity. In other embodiments, additional elements may be included, such as additional agents, multiple bus circuits, associated network arbiter circuits, and the like.

FIG. 5 illustrates how memory transactions with different QoS levels may be handled using the disclosed techniques. The disclosed cache controller circuit may be further configured to manage address regions that lie within different types of secure memory regions. A description of such an embodiment follows.

Turning to FIG. 6, an embodiment of a system that includes support for open-access and secure-access memory regions is shown. System 600 includes cache controller circuit 601, address map 610, system memory map 620, trusted agent 630, and untrusted agent 635. As shown, system memory map 620 is divided into two regions: open-access region 623 and secure-access region 627. Address region 615b in address map 610 corresponds to reallocated cache memory, as described above, and is mapped within secure-access region 627. Trusted agent 630 and untrusted agent 635 issue memory transactions 650 and 655, respectively, both of which target a destination address in address region 615b.

System memory map 620, as illustrated, includes a memory map of all address regions included in system 600. These address regions may be classified into two types of security regions: open-access region 623 and secure-access region 627. Open-access region 623 includes all memory ranges to which any agent in system 600, including both trusted agent 630 and untrusted agent 635, may issue memory transactions. Open-access region 623 may include memory used for general application use, including, for example, memory used for processing images and audio files and for executing general applications. Secure-access region 627 includes memory ranges with restricted access. Only agents classified as trusted, such as trusted agent 630, may access memory locations within secure-access region 627. A memory transaction from an untrusted agent to an address within secure-access region 627 may be ignored or may result in the generation of an error indication, such as an exception.

In the illustrated example, trusted agent 630 and untrusted agent 635 both issue their respective memory transactions 650 and 655 to a destination address within address region 615b. To support secure-access regions, cache controller circuit 601 is configured to determine that address region 615b is included in secure-access region 627. In response to this determination, cache controller circuit 601 is configured to ignore memory transaction 655 from untrusted agent 635, which is not authorized to access secure-access region 627. Trusted agent 630, however, is authorized to access secure-access region 627, and cache controller circuit 601 is therefore configured to process memory transaction 650 in address region 615b.

In response to receiving memory transaction 655, cache controller circuit 601 may be further configured to generate an error indication. For example, cache controller circuit 601 may return an error code to untrusted agent 635, the error code including a particular value that indicates an access to an unauthorized address. Cache controller circuit 601 may, instead or in addition, be configured to assert one or more exception signals, such as an illegal-address exception and/or a security-violation exception.
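The access check described for FIG. 6 can be illustrated with a short Python sketch. It is only a model under stated assumptions: the address range, the status values `OK` and `ERR_UNAUTHORIZED`, the agent labels, and the `SecureCacheController` class are hypothetical, and the sketch both returns an error code and records an exception, whereas the text allows either or both.

```python
from dataclasses import dataclass

OK = 0
ERR_UNAUTHORIZED = 1  # hypothetical value indicating an unauthorized address


@dataclass(frozen=True)
class Region:
    start: int
    end: int  # exclusive upper bound

    def contains(self, address):
        return self.start <= address < self.end


class SecureCacheController:
    """Hypothetical model of a controller guarding secure-access regions."""

    def __init__(self, secure_regions, trusted_agents):
        self.secure_regions = list(secure_regions)
        self.trusted_agents = set(trusted_agents)
        self.ram = {}         # contents of the reallocated cache lines
        self.exceptions = []  # stands in for asserted exception signals

    def handle_write(self, agent, address, value):
        in_secure = any(r.contains(address) for r in self.secure_regions)
        if in_secure and agent not in self.trusted_agents:
            # The transaction is ignored and an error indication is generated.
            self.exceptions.append(("security_violation", agent, address))
            return ERR_UNAUTHORIZED
        self.ram[address] = value
        return OK


region_615b = Region(0x1000, 0x1040)  # assumed address range
ctrl = SecureCacheController([region_615b], trusted_agents={"agent_630"})

status_650 = ctrl.handle_write("agent_630", 0x1008, 42)  # trusted agent
status_655 = ctrl.handle_write("agent_635", 0x1008, 99)  # untrusted agent
print(status_650, status_655, ctrl.ram, ctrl.exceptions)
```

The write from the untrusted agent leaves the stored value unchanged, which corresponds to the transaction being ignored.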

Note that system 600 is merely one example. Various elements may have been omitted from system 600 for clarity. In other embodiments, system 600 may include additional secure-access regions. For example, a plurality of different secure-access regions may be implemented, each corresponding to a different level of secure access and therefore accessible by a different combination of trusted agents.

The circuits and techniques described above with respect to FIGS. 1-6 present various techniques for reallocating a portion of a cache memory to a directly addressable address region. A variety of methods may be used to implement these disclosed techniques. Two such methods are described below with reference to FIGS. 7 and 8.

Turning now to FIG. 7, a flow diagram is shown for one embodiment of a method for reallocating a portion of a cache memory circuit to a directly addressable address region. Method 700 may be performed by a cache controller circuit, such as cache controller circuits 101, 201, 501, and 601 of FIGS. 1, 2, 5, and 6, respectively. Method 700 may be performed by a processing circuit executing software or firmware, by hardware circuits including, for example, logic gates, or by a combination thereof. Referring collectively to FIGS. 1 and 7, method 700 begins at block 710.

At block 710, method 700 includes receiving, by cache controller circuit 101, allocation request 145 to reallocate a portion of cache memory circuit 105, which is currently in use, to a directly addressable memory space. As shown, allocation request 145 identifies inactive address region 115b. Allocation request 145 may be received at time t0, at which point cache memory circuit 105 is in use and one or more of cache lines 120-127 may be in use for caching locations in system memory. Address region 115b may be indicated by the inclusion, in allocation request 145, of an address value within address region 115b or of an index value corresponding to address region 115b.

Method 700 further includes, at block 720, selecting, based on the identified address region 115b, cache line 123 of cache memory circuit 105 to convert. As illustrated, cache line 123 may be associated with address region 115b by software executing in system 100, such as an operating system. In other embodiments, cache line 123 may be hardcoded to address region 115b based on the circuit design of system 100. Although only one cache line is shown as being selected for use in address region 115b, any suitable number of cache lines may be selected. For example, as described with reference to FIG. 2, a cache memory circuit may include a plurality of ways, and an entire way or multiple ways may be selected for use in a directly addressable address region.

At block 730, method 700 also includes setting, by cache controller circuit 101, a respective indication for the selected cache line 123 to exclude cache line 123 from further cache operations. For example, cache controller circuit 101 may set a particular bit or group of bits in the cache tag corresponding to cache line 123 to indicate that cache line 123 is in use within address region 115b. In addition, cache controller circuit 101 may set a real-time memory indicator denoting that cache line 123 is associated with real-time transactions, which have higher priority than bulk transactions. Such an indication may prevent cache controller circuit 101 from evicting the contents of cache line 123 after it has been reallocated to address region 115b. The real-time indication may further prioritize any transaction whose destination is an address within address region 115b over any bulk transactions in the queues for cache controller circuit 101.

In some embodiments, method 700 may further include flushing cache line 123, by cache controller circuit 101, before setting the respective indication. Because cache memory circuit 105 was in use before allocation request 145 was received, valid data may be cached in cache line 123. If any value cached in cache line 123 has been modified and the modification has not been written back to its destination location in system memory, cache controller circuit 101 may issue a flush command that generates write-back requests for any locations with modified values currently cached in cache line 123. After the write-back requests have been issued, cache line 123 may be available for use in address region 115b.
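Blocks 710 through 730, together with the optional flush, can be sketched as follows. This Python model is an illustration under assumptions, not the disclosed hardware: the `CacheLine` fields, the `ram_mode` flag standing in for the tag bits, the fixed mapping from region identifier to line index, and the dictionary used as system memory are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheLine:
    tag: Optional[int] = None  # system-memory address currently cached
    data: int = 0
    valid: bool = False
    dirty: bool = False        # modified but not yet written back
    ram_mode: bool = False     # hypothetical tag bit: excluded from caching


class CacheController:
    """Hypothetical model of converting a cache line to addressable RAM."""

    def __init__(self, num_lines, region_to_line):
        self.lines = [CacheLine() for _ in range(num_lines)]
        self.region_to_line = dict(region_to_line)  # assumed fixed mapping
        self.system_memory = {}

    def allocate_as_ram(self, region_id):
        # Block 720: select the line associated with the identified region.
        index = self.region_to_line[region_id]
        line = self.lines[index]
        # Optional flush: write back a modified value before converting.
        if line.valid and line.dirty:
            self.system_memory[line.tag] = line.data
        line.tag, line.valid, line.dirty = None, False, False
        # Block 730: set the indication that excludes the line from caching.
        line.ram_mode = True
        return index

    def eviction_candidates(self):
        # Converted lines are never offered for eviction or refill.
        return [i for i, line in enumerate(self.lines) if not line.ram_mode]


# Eight lines model cache lines 120-127; index 3 models cache line 123.
controller = CacheController(num_lines=8, region_to_line={"115b": 3})
victim = controller.lines[3]
victim.tag, victim.data, victim.valid, victim.dirty = 0x8000, 7, True, True

converted = controller.allocate_as_ram("115b")  # block 710: request arrives
print(converted, controller.system_memory, controller.eviction_candidates())
```

The dirty value is written back before the flag is set, so no cached modification is lost when the line stops participating in cache operations.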

Use of a portion of the cache memory as a directly addressable address region may provide a low-latency memory range that a particular agent can use to perform memory accesses with high-QoS deadlines. Such deadlines may not be achievable with direct accesses to system memory, even when typical caching techniques are employed for those accesses. By creating a low-latency memory region from cache memory circuits, the particular agent can buffer data to be processed in that region without the risk that the buffered data will be evicted from the cache if it is not accessed within a certain time frame.

While address region 115b is active, cache lines 120-122 and 124-127 may be used for cache operations in cache memory circuit 105. For example, data written to a particular address that is currently cached in cache memory circuit 105 may be written back to that address in system memory. Cache line 123, however, is not used for cache operations. For example, data written to a different address, one located in cache line 123 within address region 115b, is not written back to system memory. Instead, cache line 123 may be used as the final destination for data written to address region 115b.

Method 700 may end at block 730 or may repeat some or all of its operations. For example, method 700 may return to block 710 in response to another allocation request being received by cache controller circuit 101. In some embodiments, multiple instances of method 700 may be performed concurrently. For example, cache controller circuit 101 may process a second allocation request while still performing a first allocation request. If system 100 includes multiple cache controller circuits (e.g., for respective cache memory circuits), each cache controller circuit may perform method 700 in parallel. Note that the method of FIG. 7 is merely one example of allocating a portion of a cache memory as a directly addressable address region.

Turning now to FIG. 8, a flow diagram is shown for one embodiment of a method for operating and deallocating a directly addressable address region that uses a portion of a cache memory. In a manner similar to method 700, method 800 may be performed by a cache controller circuit, such as cache controller circuits 101, 201, 501, and 601 shown in FIGS. 1, 2, 5, and 6, respectively. Method 800 may also be performed by a processing circuit executing software or firmware, by a hardware circuit, or by a combination thereof. Referring collectively to FIGS. 1, 3, and 8, method 800 begins at block 810 with cache line 123 already reallocated to address region 115b.

Method 800 includes, at block 810, receiving, by cache controller circuit 101, a memory transaction for address region 115b from an unauthorized agent. As described above with reference to FIG. 6, the system memory map for system 100 may include an open-access region and one or more secure-access regions. Various agents may attempt to access address region 115b. Some of these agents may be authorized to access one or more of the secure-access regions, while others may not be authorized to access any addresses other than those within the open-access region.

At block 820, method 800 includes ignoring, by cache controller circuit 101, the memory transaction from the unauthorized agent in response to determining that address region 115b is part of a secure-access region. As illustrated, the address included in the received memory transaction targets a location within address region 115b. Address region 115b may be determined to be within a secure-access region of the system memory map to which the unauthorized agent does not have access. In response to this determination, the received memory transaction is ignored. As described above, an error message may be returned to the unauthorized agent, and/or an exception signal may be asserted to indicate, e.g., to an operating system, that an unauthorized access was attempted.

At block 830, the method also includes receiving, by cache controller circuit 101, deallocation request 345 to deallocate cache line 123 of cache memory circuit 105 from directly addressable address region 115b. The agent that was using address region 115b may have completed the activities that prompted the request to reallocate cache line 123 to address region 115b. For example, a processor may have requested activation of address region 115b in response to the launch of a particular application or of a process within an application. Once the application or process has completed, address region 115b may no longer be needed and may therefore be returned for use by cache memory circuit 105, thereby increasing the amount of data that can be cached at a given time.

Method 800 further includes, at block 840, including cache line 123 in cache operations in response to deallocation request 345. As illustrated, cache line 123 is returned to cache memory circuit 105 for use as cache memory. For example, if one or more bits in the cache tag corresponding to cache line 123 were set to include cache line 123 in address region 115b, these bits may be cleared to return cache line 123 to cache memory circuit 105. Data stored in address region 115b while cache line 123 was reallocated may be overwritten without a write-back to the system memory circuit. Values stored in address region 115b may need to be explicitly copied to other memory locations, through the use of respective memory transactions, before cache line 123 is deallocated. Otherwise, any values from address region 115b may be lost after the deallocation.

At block 850, the method further includes returning a default value in response to a read request, received after cache line 123 of cache memory circuit 105 has been deallocated, for an address in address region 115b. As illustrated, if memory transaction 350 is directed to an address in address region 115b after deallocation request 345 has been performed, a default value indicating an access to an inactive address is returned to the agent that issued memory transaction 350.

At block 860, method 800 also includes generating an error, by cache controller circuit 101, in response to a write request, received after the deallocation, for an address in address region 115b. In addition to block 850, or in some embodiments instead of block 850, an error such as the assertion of an exception signal may be generated. Such an error may provide an indication to a supervisory processor, a security circuit, an exception handler circuit or process, and/or other hardware circuits or software processes that an access to an inactive address has been made. In some cases, such an access may indicate that the system is operating improperly, and a recovery operation such as a system reset or an exception routine may be initiated.
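The behavior of blocks 840 through 860 can be sketched in Python. This is a minimal model under assumptions: the `RamRegion` class, the `DEFAULT_READ_VALUE` constant, the use of a Python exception to represent the generated error, and the address range are hypothetical, and the sketch applies the default value to reads and the error to writes, which is only one of the combinations the text permits.

```python
DEFAULT_READ_VALUE = 0x0  # hypothetical value signalling an inactive address


class InactiveRegionError(Exception):
    """Stands in for the error generated on a write after deallocation."""


class RamRegion:
    """Hypothetical model of a directly addressable region backed by cache lines."""

    def __init__(self, base, size):
        self.base, self.size = base, size
        self.active = True
        self.storage = {}

    def _check_range(self, address):
        if not (self.base <= address < self.base + self.size):
            raise ValueError("address is outside this region")

    def write(self, address, value):
        self._check_range(address)
        if not self.active:
            raise InactiveRegionError(hex(address))  # block 860
        self.storage[address] = value

    def read(self, address):
        self._check_range(address)
        if not self.active:
            return DEFAULT_READ_VALUE                # block 850
        return self.storage.get(address, DEFAULT_READ_VALUE)

    def deallocate(self):
        # Block 840: the lines return to caching duty. Their contents are
        # dropped without any write-back to system memory.
        self.active = False
        self.storage.clear()


region = RamRegion(base=0x1000, size=64)
region.write(0x1004, 55)
saved_copy = region.read(0x1004)  # explicit copy made before deallocation
region.deallocate()

read_after = region.read(0x1004)
try:
    region.write(0x1004, 1)
    write_failed = False
except InactiveRegionError:
    write_failed = True
print(saved_copy, read_after, write_failed)
```

The explicit copy taken before `deallocate()` is the only surviving record of the stored value, which mirrors the requirement that values be copied elsewhere before the cache line is deallocated.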

In some embodiments, method 800 may end at block 860; in other embodiments, it may repeat some or all of its operations. For example, method 800 may return to block 830 to deallocate a different address region in response to a different deallocation request. Note that the operations of method 800 may be performed, in whole or in part, in a different order. For example, blocks 810 and 820 may be performed one or more times before block 830 is performed for the first time. Blocks 830 through 860 may be performed without blocks 810 and 820 being performed.

The various operations of methods 700 and 800 may be performed concurrently and/or in an interleaved manner. For example, cache controller circuit 101 may be configured to manage multiple address regions concurrently, thereby allowing different processor circuits to use different directly addressable address regions in an overlapping manner. Accordingly, method 800 may be performed, in whole or in part, while method 700 is in progress.

FIGS. 1-8 illustrate various embodiments of a cache-as-RAM technique in which a portion of a cache memory is allocated to a system-bus-accessible address region, thereby enabling a low-latency memory region for a given agent or group of agents. FIGS. 9-15, described below, illustrate a distributed buffer technique in which a buffer is allocated within system memory and then allocated into cache memory using a particular order that attempts to distribute cache misses across the whole of the buffer.

Proceeding to FIG. 9, a block diagram of one embodiment of a system that includes a cache memory is illustrated at two points in time. As shown, system 900 includes processing circuit 901, cache memory circuit 905, and system memory circuit 910. Cache memory circuit 905 includes cache lines 920a through 920h (collectively, cache lines 920). System memory circuit 910 is shown with nine storage locations 935a through 935i (collectively, storage locations 935). System 900 may correspond to a processor, such as a microprocessor, a microcontroller, or another form of system-on-chip (SoC). System 900 may be implemented on a single integrated circuit or by the use of multiple circuit elements coupled on a circuit board.

As illustrated, processing circuit 901 may be a processor core in a single-core or multi-core processor complex. System 900 may include a non-transitory computer-readable medium storing instructions executable by processing circuit 901 to perform the operations described below with respect to FIGS. 9-15. Such a non-transitory computer-readable medium may include non-volatile memory circuits included in and/or coupled to system memory circuit 910. The non-volatile memory circuits may include, for example, flash memory arrays, solid-state drives, hard disk drives, universal serial bus (USB) drives, optical disk drives, floppy disk drives, and the like. System memory circuit 910 and cache memory circuit 905 may each include one or more types of RAM, such as SRAM, DRAM, and the like.

Processing circuit 901, as shown, is configured to allocate storage locations 935 in system memory circuit 910 of system 900 to buffer 915. In various embodiments, processing circuit 901 and/or another agent in system 900 (not shown) may use buffer 915 to process information related to an application executing on system 900. To satisfy the desired performance of the application, accesses to buffer 915 may have particular quality-of-service (QoS) needs. To increase the probability of meeting these QoS needs, processing circuit 901 is further configured to allocate storage locations 935 into cache memory circuit 905. Accesses to cache memory circuit 905 may typically have a higher QoS level than accesses to system memory circuit 910.

To allocate buffer 915 into cache memory circuit 905, processing circuit 901 is configured to select a particular order for allocating storage locations 935 into cache memory circuit 905. This particular order may increase the uniformity of cache miss rates in comparison to a linear order. Allocating storage locations 935 in a linear order, e.g., starting with storage location 935a and progressing in sequence through storage locations 935b, 935c, 935d, and so on to storage location 935i, may result in cache misses occurring more frequently for the storage locations at the end of buffer 915. For example, storage locations 935g, 935h, and 935i may have a higher probability of failing to be allocated because the corresponding cache line has already been allocated to a different storage location. Accordingly, a particular order for allocating storage locations 935 into cache memory circuit 905 is selected that allocates storage locations 935 more equitably, increasing the likelihood that locations at the end of buffer 915 can be successfully allocated into cache memory circuit 905.

After the particular order has been selected, processing circuit 901 is further configured to cache ones of storage locations 935 of buffer 915 into cache memory circuit 905 in the particular order. In some embodiments, rather than selecting and allocating individual storage locations, processing circuit 901 may be further configured to select and allocate subsets of storage locations 935, each subset having multiple storage locations.

As an example, at time t0, processing circuit 901 allocates buffer 915, including storage locations 935, into system memory circuit 910. At time t1, processing circuit 901 is configured to segment buffer 915 into a plurality of blocks based on the particular order. This plurality of blocks corresponds to storage locations 935 and has a serial logical order as shown.

Each storage location 935 may include any suitable number of bytes of system memory circuit 910, such as one byte, sixteen bytes, 128 bytes, and so forth. In some embodiments, different storage locations 935 may include different numbers of bytes. For this example, one storage location 935 has the same number of bytes as one cache line 920. Sizes for storage locations 935 may be determined by processing circuit 901 based on the particular order. As shown, buffer 915 is divided into nine storage locations, and the particular order includes allocating every third storage location, starting with storage location 935a, then 935d, and then 935g. The order then wraps back to storage location 935b, then 935e, and then 935h. The final three storage locations are then allocated, starting with 935c, then 935f, and ending with 935i.
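The every-third-location order described above can be sketched in a few lines of Python. This is only an illustration: the function name `strided_order` and the use of list indices to stand in for storage locations 935a through 935i are assumptions made for the example, not part of the described circuit.

```python
def strided_order(num_locations: int, increment: int) -> list[int]:
    """Visit every `increment`-th index, wrapping back to the next start.

    Hypothetical helper: indices 0..num_locations-1 stand in for the
    storage locations of the buffer in their serial logical order.
    """
    order = []
    for start in range(increment):
        # One sweep through the buffer, beginning at `start`.
        order.extend(range(start, num_locations, increment))
    return order


labels = "abcdefghi"  # stands in for 935a .. 935i
order = [labels[i] for i in strided_order(len(labels), 3)]
print(order)
```

With nine locations and an increment of 3, the sweeps start at 935a, 935b, and 935c in turn, which matches the order given for the illustrated example.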

Processing circuit 901 is further configured to cache storage locations 935 using an increment that selects ones of storage locations 935 in the particular order, which differs from the serial order. In the illustrated example, this increment is three, although any suitable number may be used. Storage location 935a is allocated to cache line 920c, then storage location 935d is allocated to cache line 920f, and then storage location 935g is allocated to cache line 920h. Cache memory circuit 905, as shown, is configured to map a given storage location 935 to a corresponding cache line 920 based on a particular system address included in the given storage location 935. For example, cache memory circuit 905 may perform a hash of the particular address, or a portion thereof, and the resulting hash value is used to map the particular address to a corresponding cache line 920. Since cache memory circuit 905 may be much smaller than system memory circuit 910, two different system addresses may result in hash values that map to the same cache line 920. In such a case, the second of the two addresses may fail to be allocated.

In the example of FIG. 9, storage locations 935b, 935f, and 935i map to cache lines 920h, 920e, and 920c, respectively. These three cache lines 920, however, have already been allocated to storage locations 935g, 935h, and 935a, respectively. Accordingly, storage locations 935b, 935f, and 935i fail to be allocated. As shown by the bold italic text in buffer 915, the storage locations that failed to be allocated are spread throughout buffer 915. If the contents of buffer 915 are subsequently traversed by an agent in logical order starting at storage location 935a, then the cache misses occur one at a time, each separated from the next cache miss by two or more cache hits.

If, however, storage locations 935 had been allocated in the same linear order in which buffer 915 is traversed, then storage location 935b would have been allocated rather than storage location 935g, and storage location 935f would have been allocated instead of storage location 935h. This would have resulted in storage locations 935g, 935h, and 935i all failing to be allocated. When an agent traverses buffer 915 in this scenario, three cache misses occur in a row at the end of buffer 915, with no cache hits between the misses. Three fetches in a row to system memory circuit 910 may cause delays, since the second and third fetches may have to wait for the preceding fetches to be processed. Accordingly, allocating buffer 915 using the particular order rather than a linear order may reduce the overall time to traverse through buffer 915.
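The difference between the two allocation orders can be checked with a small simulation. The sketch below assumes a direct-mapped toy cache in which the first storage location to claim a cache line keeps it. The mappings for 935a, 935b, 935d, 935f, 935g, 935h, and 935i follow the FIG. 9 discussion; the mappings for 935c and 935e are not stated for FIG. 9 and are chosen here only so that they do not collide.

```python
# Cache line that each storage location hashes to (toy model).
# 935c -> 920a and 935e -> 920b are assumptions for illustration.
LINE_FOR = {
    "935a": "920c", "935b": "920h", "935c": "920a",
    "935d": "920f", "935e": "920b", "935f": "920e",
    "935g": "920h", "935h": "920e", "935i": "920c",
}
SERIAL = sorted(LINE_FOR)  # 935a .. 935i, the traversal order


def allocate(order):
    """Allocate in `order`; return the locations that fail to allocate."""
    owner = {}   # cache line -> storage location holding it
    failed = []
    for loc in order:
        line = LINE_FOR[loc]
        if line in owner:
            failed.append(loc)   # line already taken: allocation fails
        else:
            owner[line] = loc
    return failed


def longest_miss_run(failed):
    """Longest run of back-to-back misses when traversing serially."""
    longest = run = 0
    for loc in SERIAL:
        run = run + 1 if loc in failed else 0
        longest = max(longest, run)
    return longest


strided = [SERIAL[i] for start in range(3) for i in range(start, 9, 3)]
strided_failed = allocate(strided)
linear_failed = allocate(SERIAL)
print(strided_failed, longest_miss_run(strided_failed))
print(linear_failed, longest_miss_run(linear_failed))
```

Under these assumptions both orders lose three locations, but the strided order leaves the misses isolated, whereas the linear order groups all three at the end of the buffer.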

After the allocation of buffer 915 into cache memory circuit 905 is complete, processing circuit 901, or other agents in system 900, may access cache memory circuit 905 as a low-latency path to the values stored in buffer 915. Storage locations 935 that were successfully cached may provide faster access to the contents of buffer 915 than accessing storage locations 935 directly in system memory circuit 910.

It is noted that the embodiment of FIG. 9 is merely an example. FIG. 9 includes only the elements needed to describe the disclosed techniques. In other embodiments, additional elements may be included. For example, one or more bus circuits, memory management units, and the like may be included in other embodiments. The number of cache lines and storage locations is limited for clarity. In other embodiments, any suitable number of cache lines and storage locations may be included.

In the description of FIG. 9, a failure to successfully allocate a location of a buffer is briefly discussed. If a particular location in the buffer maps to a cache line that has already been allocated to a different location in the buffer, the allocation fails. In some embodiments, a particular location in the buffer may map to a cache line that is currently allocated to a different location in system memory that is not associated with the buffer. A technique for handling such a case is now presented.

Turning now to FIG. 10, a block diagram of an embodiment of system 900 of FIG. 9 is again illustrated, at two points in time. As shown, system 900 is the same as shown in FIG. 9, except that cache memory circuit 905 is shown with four additional cache lines, cache lines 920i through 920l. As described above, cache memory circuit 905 is shown in FIG. 9 with a limited number of cache lines for clarity. In various embodiments, cache memory circuit 905 may include any suitable number of cache lines, including, for example, additional cache lines beyond the twelve shown in FIG. 10. Processing circuit 901 is shown allocating storage locations 935b, 935e, and 935h of buffer 915 into cache memory circuit 905. At time t0, processing circuit 901 attempts to allocate storage location 935b into cache line 920k.

As shown in FIG. 9, storage location 935b mapped to cache line 920h, which had previously been allocated to storage location 935g of buffer 915. In the embodiment of FIG. 10, storage location 935b may additionally map to cache line 920k. For example, cache memory circuit 905 may be set-associative and include a plurality of ways, such that a given system memory address may map to two or more cache lines 920. Cache line 920k may therefore be in a different way than cache line 920h and thus provide an alternative cache line into which storage location 935b may be allocated.

At time t0, however, cache line 920k is allocated to storage location 1035y, which may be a location in system memory circuit 910 that is not associated with buffer 915. In response to the failure to cache storage location 935b into cache line 920k, processing circuit 901 is configured to retry the caching of storage location 935b before caching a different storage location. As shown, processing circuit 901 generates a new allocation request to cache storage location 935b. In some embodiments, processing circuit 901 may include a delay of a particular amount of time, or of a number of instruction cycles or bus cycles, between the original attempt to allocate storage location 935b and the retry.

At time t1, storage location 1035y may have been evicted from cache line 920k, and storage location 935b may therefore be successfully cached into cache line 920k. Subsequently, processing circuit 901 may further attempt to cache storage location 935e, followed by storage location 935h.
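The retry behavior can be sketched as follows. The `ToyCache` class, the `allocate_with_retry` function, and the `wait` callback that models the delay between attempts are hypothetical names introduced for this example; the description does not specify how many retries are made or how the delay is implemented.

```python
class ToyCache:
    """Toy model: maps a cache line name to its current occupant or None."""

    def __init__(self, lines):
        self.lines = dict(lines)

    def try_allocate(self, loc, line):
        """Claim `line` for `loc` if the line is free."""
        if self.lines.get(line) is None:
            self.lines[line] = loc
            return True
        return False

    def evict(self, line):
        self.lines[line] = None


def allocate_with_retry(cache, loc, line, max_retries, wait):
    """Retry the same location before moving on to a different one.

    Returns the number of retries that were needed, or None if every
    attempt failed. `wait(attempt)` stands in for the delay (time,
    instruction cycles, or bus cycles) between attempts.
    """
    for attempt in range(max_retries + 1):
        if cache.try_allocate(loc, line):
            return attempt
        if attempt < max_retries:
            wait(attempt)
    return None


cache = ToyCache({"920k": "1035y"})  # line 920k is occupied at time t0


def wait(attempt):
    # Assumption for the demo: 1035y is evicted during the first delay.
    if attempt == 0:
        cache.evict("920k")


retries = allocate_with_retry(cache, "935b", "920k", max_retries=3, wait=wait)
print(retries, cache.lines["920k"])
```

In this toy run the first attempt fails, cache line 920k is freed during the delay, and the single retry succeeds.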

By retrying the attempt to allocate storage location 935b into the cache, processing circuit 901 may increase the number of storage locations of buffer 915 that are successfully cached. The more storage locations of buffer 915 that can be allocated into cache memory circuit 905, the better the probability of meeting the QoS needs of the application that will utilize buffer 915.

It is noted that system 900 shown in FIG. 10 is an example for demonstrating the disclosed techniques. Only the elements needed to describe these techniques are illustrated. As described above, additional elements may be included in other embodiments, such as additional cache lines and storage locations, as well as additional processing circuits and other bus and memory management circuits.

The system of FIG. 9 describes a processing circuit as performing many of the actions associated with caching storage locations into a cache memory circuit. Various types of processing circuits may be utilized to perform such actions. One such processing circuit includes a direct-memory access (DMA) circuit, such as the one shown in FIG. 11.

Turning now to FIG. 11, an embodiment of a system that includes a DMA circuit for caching a buffer of system memory in a cache memory is shown. System 1100 includes processor core 1190 coupled to DMA circuit 1101, which is further coupled to cache memory circuit 905 and system memory circuit 910. In various embodiments, DMA circuit 1101, processor core 1190, or a combination of the two may correspond to processing circuit 901 of FIGS. 9 and 10.

Processor core 1190 may be a general-purpose processor that performs computational operations. In some embodiments, processor core 1190 may be a special-purpose processing core, such as a graphics processor, an audio processor, or a neural processor. Processor core 1190 may, in some embodiments, include a plurality of general-purpose and/or special-purpose processor cores, as well as supporting circuits for managing power signals, clock signals, memory requests, and the like. DMA circuit 1101, as depicted, is configured to issue memory transactions to copy or move values between various memory addresses across a memory map of system 1100. DMA circuit 1101 may be implemented as a specialized circuit, as a general-purpose circuit programmed to perform such tasks, or as a combination thereof. DMA circuit 1101 is programmable, at least by processor core 1190, to perform multiple memory transactions in a desired sequence.

As previously described, processing circuit 901 selects the particular order for caching the storage locations of buffer 915 into cache memory circuit 905. As shown in system 1100, the particular order is selected by processor core 1190, for example, based on a size of buffer 915 and/or an availability of cache lines in cache memory circuit 905. Processor core 1190 is configured to program the particular order into DMA circuit 1101 and to use DMA circuit 1101 to cache ones of storage locations 935 of buffer 915 into cache memory circuit 905. For example, DMA circuit 1101 may include various registers into which processor core 1190 may store source addresses for storage locations 935 and destination addresses for caching storage locations 935 into cache memory circuit 905, thereby providing the particular order in which memory transactions corresponding to ones of storage locations 935 are issued.

As illustrated, processor core 1190 is further configured to track a cache miss rate in cache memory circuit 905 for memory transactions that include accesses to storage locations 935. After buffer 915 has been allocated into cache memory circuit 905, processor core 1190, or a different agent in system 1100, may issue various memory transactions that access ones of storage locations 935. Depending on how many of storage locations 935 were successfully allocated into cache memory circuit 905, a particular cache miss rate may be determined for the memory transactions that target addresses in storage locations 935. For example, if ten percent of storage locations 935 fail to be allocated, and storage locations 935 are accessed equally by the particular agent using buffer 915, then the cache miss rate would be at or near 10%. If, however, the particular agent accesses particular ones of storage locations 935 more frequently than others, then the cache miss rate may be higher or lower than 10%, depending on whether the more frequently accessed storage locations were successfully allocated.

In response to a determination that the tracked cache miss rate satisfies a threshold rate, processor core 1190 may be further configured to modify the particular order in DMA circuit 1101. For example, if the threshold miss rate is 15% and the tracked miss rate is 18%, then processor core 1190 may identify storage locations 935 that were not cached but were frequently targeted by memory transactions, as well as successfully cached storage locations 935 that were not frequently targeted. A modified order may adjust the order for allocating these identified storage locations such that the more frequently accessed locations are allocated earlier in the modified order and the less frequently accessed locations are moved toward the end of the modified order. When a subsequent buffer is to be allocated into cache memory circuit 905, the modified order may be selected over the original particular order. In some embodiments, various orders may be determined and associated with particular agents, tasks, processes, and the like, such that a selected order for allocation takes into account past performance of similar tasks.
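One way to picture the miss-rate tracking and the resulting reorder is the sketch below. It is a simplified software model, not the described hardware: the function names, the per-location access counts, and the choice to treat "satisfies the threshold" as "greater than or equal to" are assumptions for illustration. The counts are made up so that the tracked miss rate comes out at 18%.

```python
def miss_rate(access_counts, cached):
    """Fraction of accesses that target locations not held in the cache."""
    total = sum(access_counts.values())
    misses = sum(n for loc, n in access_counts.items() if loc not in cached)
    return misses / total if total else 0.0


def modified_order(order, access_counts, cached, threshold):
    """Move frequently accessed locations earlier once the threshold is met.

    Python's sort is stable, so locations with equal counts keep their
    relative position from the original order.
    """
    if miss_rate(access_counts, cached) < threshold:
        return list(order)
    return sorted(order, key=lambda loc: -access_counts.get(loc, 0))


order = ["935a", "935d", "935g", "935b", "935e", "935h", "935c", "935f", "935i"]
cached = set(order) - {"935b", "935f", "935i"}  # three locations failed
counts = {  # hypothetical access counts, 100 accesses in total
    "935a": 20, "935b": 15, "935c": 20, "935d": 20, "935e": 10,
    "935f": 2, "935g": 2, "935h": 10, "935i": 1,
}

rate = miss_rate(counts, cached)
new_order = modified_order(order, counts, cached, threshold=0.15)
print(round(rate, 2), new_order)
```

Here the frequently accessed but uncached location 935b moves ahead of the rarely accessed 935g, so a later allocation using the modified order would attempt 935b before 935g can claim a shared cache line.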

In regard to determining an allocation order, a technique is disclosed above in which subsequent storage locations are selected using a particular increment between successive locations. FIG. 11 illustrates a technique that includes dividing buffer 915 into a plurality of blocks 1130a through 1130c (collectively, blocks 1130), each having a respective series of contiguous storage locations 935. The nine illustrated storage locations 935 are divided into three blocks 1130, each block 1130 including three consecutive storage locations 935. Although blocks 1130 are shown with an equal number of storage locations 935 per block, in other embodiments the number of storage locations 935 included in each block 1130 may vary. For example, a use for buffer 915 may be known, and based on the known use, particular storage locations 935, or groups of locations, may be known to be accessed infrequently while others are known to be accessed more frequently. The number of storage locations 935 assigned to each block may therefore be adjusted such that, for example, the initial storage location 935 of each block is a location known to be accessed more frequently.

After storage locations 935 have been divided into the respective blocks 1130, processor core 1190 may select a particular order that allocates a first storage location 935 of each of blocks 1130 into cache memory circuit 905 and then allocates a second storage location 935 of each of blocks 1130. As shown, block 1130a includes initial storage location 935a, followed by storage locations 935b and 935c. Similarly, block 1130b includes initial storage location 935d, followed by storage locations 935e and 935f, while block 1130c includes initial storage location 935g, followed by storage locations 935h and 935i.

In a first pass, processor core 1190 causes DMA circuit 1101 to cache the initial storage location from each of blocks 1130, namely storage locations 935a, 935d, and 935g. Subsequently, DMA circuit 1101 caches, in a second pass, the second storage location from each block 1130 (storage locations 935b, 935e, and 935h), and then, in a third pass, the third storage location from each block 1130 (storage locations 935c, 935f, and 935i).

As stated above, processor core 1190 may modify the particular order based on a monitored cache miss rate. This modification may include adjusting the number of locations included in each block, the number of locations stored at a time from each block, or the order for allocating the locations within each block. For example, processor core 1190 may determine that, in block 1130b, storage location 935e is accessed more frequently than storage location 935d. In a modified order, the initial storage location allocated from block 1130b may be 935e rather than 935d.
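The block-based passes, including blocks of unequal size and a reordering within one block, can be sketched as below. The function name `block_pass_order` and the one-location-per-block-per-pass policy are assumptions chosen to match the illustrated example; as noted, the number of locations taken from each block per pass may vary in other embodiments.

```python
from itertools import zip_longest

_SKIP = object()  # marks a block that has run out of locations


def block_pass_order(blocks):
    """Take one location from every block per pass, block by block."""
    order = []
    for pass_items in zip_longest(*blocks, fillvalue=_SKIP):
        order.extend(item for item in pass_items if item is not _SKIP)
    return order


blocks = [
    ["935a", "935b", "935c"],  # block 1130a
    ["935d", "935e", "935f"],  # block 1130b
    ["935g", "935h", "935i"],  # block 1130c
]
original = block_pass_order(blocks)

# Modified order: within block 1130b, 935e is now allocated before 935d.
modified_blocks = [blocks[0], ["935e", "935d", "935f"], blocks[2]]
modified = block_pass_order(modified_blocks)

print(original)
print(modified)
```

For equal-sized blocks this produces the same sequence as an increment of three; the block form additionally allows per-block reordering and blocks of different lengths.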

It is noted that system 1100 is merely an example. FIG. 11 has been simplified for clarity. Although nine storage locations and three blocks are shown, buffer 915 may include any suitable number of storage locations, and these locations may be divided into any suitable number of blocks. The number of locations included in each block may vary between blocks. In addition, the number of locations from each block that are allocated at a given time may vary between passes.

Various types of QoS levels are discussed above in regard to FIGS. 1-8. The transactions used to cache a buffer from system memory into a cache memory may also utilize different QoS levels for different tasks. FIG. 12 illustrates the use of bulk and real-time transactions with the disclosed techniques.

Proceeding now to FIG. 12, an embodiment of system 900 from FIGS. 9 and 10 is shown at two different times: during allocation of a buffer into the cache and during use of the allocated buffer. System 900 includes the elements previously shown in FIGS. 9 and 10. In addition, cache memory circuit 905 and system memory circuit 910 are configured to support bulk and real-time channels 1240 and 1245, respectively. In some embodiments, bulk and real-time channels 1240 and 1245 may utilize separate physical connections between the various agents and memory circuits to complete their respective transactions. In other embodiments, at least a portion of real-time channel 1245 and bulk channel 1240 is shared and may, in some embodiments, be implemented as virtual bulk and real-time channels as described above.

At time t0, buffer 915 is cached into cache memory circuit 905. In the present embodiment, buffer 915 is a real-time buffer. A "real-time buffer," as used herein, refers to a memory buffer whose locations are accessed predominantly by real-time transactions. A real-time buffer may be used with an agent and/or a task for which failure to meet a particular QoS demand could result in improper operation of the agent or task. For example, processing a frame of video for playback needs to be completed within a particular amount of time; otherwise, the video playback may produce a stall or glitch that is noticeable to a viewer.

Although buffer 915 is a real-time buffer, the initial allocation of buffer 915 into the cache may not be time sensitive. Accordingly, caching storage locations 935 of buffer 915 may be performed using bulk transactions 1242 across bulk channel 1240 to allocate the plurality of storage locations 935 into cache memory circuit 905. As shown at time t0, bulk channel 1240 is used to transfer bulk transactions 1242a, 1242b, and 1242c to allocate storage locations 935a, 935d, and 935g, respectively, in cache memory circuit 905. During this buffer allocation task, the agent that will be using buffer 915 (e.g., processing circuit 901) may not yet have values ready to read from or write to buffer 915. Bulk transactions 1242 may therefore be used to allocate buffer 915.

Since buffer 915 is expected to be used with real-time transactions, however, bulk transactions 1242 may include, with the successfully cached storage locations 935, an indication that these cached storage locations are associated with real-time transactions. For example, the cache tag associated with each successfully cached storage location 935 may have a particular bit or group of bits set to indicate that the associated cache line 920 will be used with real-time transactions. Cache lines 920 that have the real-time indication in their respective cache tags may receive a higher priority for being retained when cache lines are identified for eviction. For example, if the number of allocated cache lines 920 in cache memory circuit 905 reaches a threshold level, e.g., approaching a certain percentage of a maximum storage capacity, then particular ones of cache lines 920 that have not been accessed frequently may be selected for eviction. Cache lines 920 with the real-time indication set may be omitted from consideration for eviction, or may be placed very low in the order for being selected, such that other cache lines have higher likelihoods of being selected for eviction.
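The effect of a real-time indication on victim selection can be sketched as follows. The `Line` record, the `pick_victim` function, and the use of a simple access count as the measure of "not accessed frequently" are assumptions made for this illustration; the description leaves the replacement policy open.

```python
from dataclasses import dataclass


@dataclass
class Line:
    name: str
    access_count: int        # hypothetical usage counter
    real_time: bool = False  # models the real-time bit in the cache tag


def pick_victim(lines, exclude_real_time=True):
    """Choose a cache line to evict, favoring non-real-time lines.

    With exclude_real_time=True, tagged lines are never considered.
    With exclude_real_time=False, tagged lines are only placed last:
    they are chosen when no untagged line remains.
    """
    candidates = [
        line for line in lines if not (exclude_real_time and line.real_time)
    ]
    if not candidates:
        return None
    # False sorts before True, so untagged lines win; ties are broken
    # by the lowest access count.
    return min(candidates, key=lambda line: (line.real_time, line.access_count))


lines = [
    Line("920a", access_count=1, real_time=True),
    Line("920b", access_count=5),
    Line("920c", access_count=9),
]
victim = pick_victim(lines)
print(victim.name)
```

Although cache line 920a has the lowest access count in this toy data, its real-time indication keeps it from being selected, and 920b is chosen instead.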

Cache memory circuit 905 may also reserve a particular amount of bandwidth for fetching data from system memory circuit 910 in response to a cache miss associated with a real-time memory transaction. Cache memory circuit 905 may limit the number of bulk transactions that are issued and active at a given point in time so that bandwidth remains available for issuing a real-time transaction. For example, bus circuits between cache memory circuit 905 and system memory circuit 910 may include a credit-based arbiter circuit. For an issued transaction to be selected by this arbiter circuit, cache memory circuit 905 may need to maintain a particular number of bus credits. In such an embodiment, cache memory circuit 905 may delay issuing a bulk transaction if its number of bus credits is at or near that particular number. The bulk transaction may be sent after cache memory circuit 905 has accumulated a sufficient number of bus credits.
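A minimal sketch of the credit-based throttling described above follows. The class name `CreditGate`, the `reserve` floor, and the one-credit cost per transaction are hypothetical choices for illustration; the disclosure does not specify how credits are counted.

```python
class CreditGate:
    """Toy model: bulk transactions may not dip into credits kept for real-time use."""

    def __init__(self, credits, reserve):
        self.credits = credits   # bus credits currently held
        self.reserve = reserve   # hypothetical floor kept back for real-time fetches

    def try_issue(self, real_time, cost=1):
        """Spend credits and return True, or return False to signal a delay."""
        floor = 0 if real_time else self.reserve
        if self.credits - cost < floor:
            return False         # caller retries after credits accumulate
        self.credits -= cost
        return True

    def refill(self, n=1):
        """Model credits being returned by the arbiter."""
        self.credits += n
```

In this sketch a bulk transaction is delayed whenever issuing it would leave fewer than `reserve` credits, while a real-time transaction may use the reserved credits.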

At time t1, buffer 915 has been allocated to cache memory circuit 905. As indicated by the bold italic text, storage locations 935f and 935i were not cached successfully. For example, storage locations 935f and 935i may have mapped to cache lines 920i and 920l, which were previously allocated to storage locations 1235x and 1235y, respectively. Processing circuit 901 is further configured to access successfully cached storage location 935c using real-time transaction 1250a. Cache memory circuit 905 may be configured to process real-time transaction 1250a using values stored in cache line 920a.

Cache memory circuit 905 is configured to generate fetch requests to system memory circuit 910 in response to a cache miss associated with a respective memory transaction, the generated fetch requests having a QoS level compatible with the corresponding memory transaction. For example, cache memory circuit 905 may generate bulk fetches 1265a and 1265b in response to bulk transactions from a given agent. Processing circuit 901 may be further configured to access unsuccessfully cached storage location 935f using real-time transaction 1250b. Cache memory circuit 905 is configured, in response to a cache miss for storage location 935f, to fulfill real-time transaction 1250b using real-time fetch 1290. Because cache memory circuit 905 is configured to reserve bandwidth for real-time fetches, a real-time fetch may be processed ahead of other bulk fetches that have not yet been issued. For example, bulk fetch 1265b may be queued, waiting for bulk fetch 1265a to complete. If real-time fetch 1290 is generated before bulk fetch 1265b is issued, real-time fetch 1290 may be processed ahead of bulk fetch 1265b.
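The ordering of pending fetches described above can be sketched with a small priority queue. The numeric QoS encoding and the `FetchQueue` class are assumptions for illustration; fetches are identified here only by their reference numerals.

```python
import heapq
from itertools import count

REAL_TIME, BULK = 0, 1   # assumed encoding: lower value is served first


class FetchQueue:
    """Pending (not yet issued) fetches, ordered by QoS level and then by arrival."""

    def __init__(self):
        self._heap = []
        self._seq = count()   # tie-breaker that preserves arrival order per level

    def push(self, name, qos):
        heapq.heappush(self._heap, (qos, next(self._seq), name))

    def pop(self):
        """Remove and return the next fetch to issue."""
        return heapq.heappop(self._heap)[2]
```

If bulk fetches "1265a" and "1265b" are pushed, "1265a" is issued, and real-time fetch "1290" then arrives, the next `pop()` returns "1290" ahead of the still-queued "1265b". A fetch that has already been issued is no longer in the queue and is not preempted.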

Use of such real-time and bulk QoS levels may reduce access times for an agent using a real-time buffer allocated to cache memory. Use of the real-time QoS level may also reduce memory access times in the event that a portion of the real-time buffer fails to be allocated to the cache memory.

Note that the embodiment of FIG. 12 is an example used for demonstration purposes. For clarity, the number of elements shown in FIG. 12 has been minimized. Although a particular number of storage locations and cache lines is illustrated, any suitable number of storage locations and cache lines may be included in other embodiments. Although only real-time and bulk transactions are shown, any suitable number of QoS levels may be used in other embodiments.

The circuits, processes, and techniques described above with respect to FIGS. 9-12 describe various techniques for allocating a buffer that resides in system memory to cache memory. Various methods may be used to implement these techniques. Three such methods are described below with reference to FIGS. 13-15.

Referring now to FIG. 13, a flow diagram of one embodiment of a method for caching a buffer that resides in system memory in a cache memory circuit is shown. In various embodiments, method 1300 may be performed by processing circuit 901 of FIGS. 9, 10, and 12 as part of a process for caching buffer 915 in cache memory circuit 905. For example, processing circuit 901 may include (or have access to) a non-transitory computer-readable medium storing program instructions that are executable by the processing circuit to cause the operations described with reference to FIG. 13. Referring collectively to FIGS. 9 and 13, method 1300 begins at block 1310.

At block 1310, method 1300 includes allocating, by processing circuit 901, a plurality of storage locations 935 in system memory circuit 910 to buffer 915. As shown, processing circuit 901, or a different agent in system 900, may request that buffer 915 be allocated in system memory circuit 910 for use with a particular process or task that the agent is preparing to perform. For example, the task may involve processing an image or an audio file, encrypting or decrypting a file, analyzing input from a sensor, and the like. In some embodiments, buffer 915 may be a real-time buffer whose storage locations 935 are accessed using real-time transactions. As described above, real-time transactions have a higher QoS level than other transactions, such as bulk transactions.

Method 1300 further includes, at block 1320, determining a particular order for allocating storage locations 935 to cache memory circuit 905. This particular order may be selected to increase the uniformity of cache miss rates compared to use of a linear order. As described above, allocating storage locations 935 in a linear order may result in storage locations 935 near the start of the linear order being cached successfully, while storage locations 935 near the end of the linear order fail to be cached because they map to the same cache lines 920 as previously cached storage locations 935. If the data in buffer 915 is accessed from storage locations 935 in the same order in which they were allocated, more cache misses would be expected as processing moves toward the end of the order. The particular order is therefore selected so that caching occurs in an order that attempts to distribute cache misses evenly during use of buffer 915. As a result, during use of buffer 915, cache misses may not be concentrated in any particular portion of the buffer accesses.
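One simple way to obtain a non-linear allocation order is a strided order, which is consistent with the example above in which storage locations 935a, 935d, and 935g are allocated first. The sketch below assumes a stride of three and a deliberately crude failure model in which only the first `capacity` allocations succeed; both are illustrative assumptions, not details taken from the disclosure.

```python
def strided_order(n, stride):
    """Visit indices 0, stride, 2*stride, ... then 1, 1+stride, ... and so on."""
    return [i for start in range(stride) for i in range(start, n, stride)]


def missed_indices(order, capacity):
    """Toy model: allocations beyond the first `capacity` attempts fail."""
    return sorted(order[capacity:])


linear = list(range(9))          # 935a..935i as indices 0..8
strided = strided_order(9, 3)    # [0, 3, 6, 1, 4, 7, 2, 5, 8]

print(missed_indices(linear, 6))   # misses cluster at the end: [6, 7, 8]
print(missed_indices(strided, 6))  # misses are spread out:     [2, 5, 8]
```

When the buffer is later read linearly, the strided allocation turns a run of three consecutive misses into one miss in every three accesses under this toy model.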

At block 1330, method 1300 also includes caching storage locations 935 of buffer 915 using the particular order. After the particular order for allocating storage locations 935 has been selected, processing circuit 901 begins allocating ones of storage locations 935 to cache memory circuit 905. In some embodiments, such as shown in FIG. 10, method 1300 may include, in response to a failure to cache a particular storage location 935 (e.g., storage location 935b), retrying the caching of storage location 935b before caching a different storage location 935, such as storage location 935e.

Method 1300 may end at block 1330. In some embodiments, at least a portion of method 1300 may be repeated. For example, method 1300 may be repeated in response to receiving a request to allocate a different buffer in system memory circuit 910. In some cases, method 1300 may be performed concurrently with other instances of the method. For example, two or more instances of processing circuit 901, or multiple process threads in a single instance of processing circuit 901, may each perform method 1300 independently of one another.

Referring now to FIG. 14, a flow diagram of one embodiment of a method for using various QoS levels with a buffer allocated to a cache memory circuit is illustrated. In a manner similar to method 1300, method 1400 may be performed by processing circuit 901 of FIGS. 9, 10, and 12. As described above, processing circuit 901 may include (or have access to) a non-transitory computer-readable medium storing program instructions that are executable by processing circuit 901 to cause the operations described with reference to FIG. 14. Referring collectively to FIGS. 12 and 14, method 1400 begins at block 1410.

At block 1410, method 1400 includes allocating a plurality of storage locations to a cache memory circuit using bulk transactions. As illustrated, the allocation process for buffer 915 may not have a critical QoS demand. Accordingly, caching of storage locations 935 of buffer 915 may be performed using bulk transactions 1242 to allocate storage locations 935 to cache memory circuit 905. As shown at time t0 of FIG. 12, bulk channel 1240 is used to carry bulk transactions 1242a, 1242b, and 1242c to allocate storage locations 935a, 935d, and 935g, respectively, to cache memory circuit 905.

Method 1400 also includes, at block 1420, including, with successfully cached storage locations 935, an indication of use with real-time transactions. Although the allocation process for buffer 915 may not have had a real-time demand, buffer 915 may be expected to be accessed using real-time transactions. Accordingly, when a particular storage location 935 is successfully cached in a respective cache line 920, the corresponding cache tag for the cache line may include an indication that the cached contents are associated with real-time transactions. As described above, such indications may help avoid eviction of cache lines 920 that have been allocated to buffer 915.

At block 1430, method 1400 further includes accessing, by an agent (e.g., processing circuit 901), the successfully cached storage locations 935 using real-time transactions. After the allocation of buffer 915 is complete, processing circuit 901 may access ones of storage locations 935 using real-time transactions 1250a and 1250b, as shown in FIG. 12. Real-time transaction 1250a hits cache line 920a, in which storage location 935c was cached. If real-time transaction 1250a includes a read request, data from cache line 920a corresponding to the requested address may be sent from cache memory circuit 905 to processing circuit 901 using a real-time transaction.

Method 1400 also includes, at block 1440, in response to a cache miss for a particular one of storage locations 935 that failed to be cached, accessing, by cache memory circuit 905, the particular storage location 935 of buffer 915 in system memory circuit 910 using real-time transactions. As shown in FIG. 12, real-time transaction 1250b is targeted at storage location 935f. Storage location 935f, however, was not successfully cached in cache memory circuit 905. Accordingly, cache memory circuit 905 generates real-time fetch 1290 and issues it to system memory circuit 910 to retrieve values from storage location 935f. If either of bulk fetches 1265a and 1265b, also generated by cache memory circuit 905, has not been issued when real-time fetch 1290 is ready to be issued, real-time fetch 1290 may be prioritized ahead of the unissued bulk fetches.

Method 1400 may end at block 1440 or, in some embodiments, may be repeated in whole or in part. For example, block 1430 may be repeated while processing circuit 901 is processing values in buffer 915. Similarly, block 1440 may be repeated whenever processing circuit 901 accesses a storage location 935 that was not successfully cached. In a manner similar to method 1300, method 1400 may be performed concurrently with other instances of method 1400.

Turning now to FIG. 15, a flow diagram of one embodiment of a method for selecting and adjusting a particular order for allocating a buffer to a cache memory circuit is illustrated. As described for methods 1300 and 1400, method 1500 may be performed by processing circuit 901 of FIGS. 9, 10, and 12. As described, processing circuit 901 may include (or have access to) a non-transitory computer-readable medium storing program instructions that are executable by processing circuit 901 to cause the operations described with reference to FIG. 15. Referring collectively to FIGS. 12 and 15, method 1500 begins at block 1510.

At block 1510, method 1500 includes determining a particular order using a desired cache miss rate for a plurality of storage locations 935. As described above, the particular order for allocating buffer 915 may be selected with the goal of distributing cache misses across buffer 915. The agent that will use buffer 915 (e.g., processing circuit 901) may process the data stored in buffer 915 using a linear order. Processing circuit 901 may start at an initial storage location, such as 935a, and progress sequentially through storage locations 935, e.g., 935b, 935c, and so on, ending at storage location 935i. If storage locations 935 are allocated in this same linear order, more of the storage locations 935 toward the end of buffer 915 may fail to be cached. Processing the data in buffer 915 in that same order may then result in a cache miss rate that increases as processing progresses, potentially peaking toward the end of buffer 915. The particular order may be selected to distribute allocation failures of storage locations 935 across buffer 915 such that, as buffer 915 is processed, the peak cache miss rate remains below the desired cache miss rate.

Method 1500 also includes, at block 1520, accessing, by processing circuit 901 after the caching, the plurality of storage locations 935 using a linear order. As described, processing circuit 901 may access buffer 915 using a linear order that differs from the particular order. In other embodiments, processing circuit 901 may use an access order that differs from the linear order. In such embodiments, the particular order may be selected to differ from that access order, which may include, for example, using a linear order for allocating storage locations 935.

At block 1530, method 1500 further includes tracking a cache miss rate associated with use of the particular order for caching the plurality of storage locations 935. As processing circuit 901 uses buffer 915, an observed cache miss rate may be tracked and compared to the desired cache miss rate. If the particular order for allocating storage locations 935 was effective, the tracked cache miss rate should remain below the desired cache miss rate, because cache misses may occur more evenly throughout the processing of all of the data in buffer 915. With cache misses distributed evenly, the peak cache miss rate should remain reasonably low and should not exceed the desired cache miss rate.
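The tracking step can be sketched as a sliding-window miss-rate counter. The disclosure does not state how the rate is measured; the fixed window length and the `MissRateTracker` class below are assumptions chosen for illustration.

```python
from collections import deque


class MissRateTracker:
    """Miss rate over the most recent `window` accesses, plus the peak seen so far."""

    def __init__(self, window):
        self._events = deque(maxlen=window)   # True = miss, False = hit
        self.peak = 0.0

    def record(self, miss):
        """Record one access and return the current windowed miss rate."""
        self._events.append(bool(miss))
        rate = sum(self._events) / len(self._events)
        # Only count the peak once the window is full, to avoid start-up noise.
        if len(self._events) == self._events.maxlen:
            self.peak = max(self.peak, rate)
        return rate

    def exceeds(self, threshold):
        """True if the peak windowed rate has reached the threshold rate."""
        return self.peak >= threshold
```

A caller would invoke `record()` on each buffer access and compare `peak` against the desired cache miss rate once processing of the buffer completes.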

Method 1500 further includes, at block 1540, in response to determining that the tracked cache miss rate satisfies a threshold rate, adjusting the particular order for a subsequent use. As illustrated, if the tracked cache miss rate reaches or exceeds the desired cache miss rate, then allocating buffer 915 using the selected particular order did not achieve the desired results. The threshold rate may be equal to the desired cache miss rate, or may be adjusted higher or lower based on overall system operating goals. To adjust the particular order, the cache misses that occurred at the time the cache miss rate satisfied the threshold rate may be analyzed to identify the storage locations 935 that were being accessed. One or more of these identified storage locations 935 may be selected to be moved closer to the start of the adjusted allocation order. In addition, storage locations 935 that were accessed at times when the cache miss rate was low may also be identified. One or more of these storage locations may be selected to be moved toward the end of the adjusted allocation order.
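The reordering described above can be sketched as follows. As a simplification, this hypothetical `adjust_order` helper moves the identified high-miss locations all the way to the front and the low-miss locations all the way to the back, whereas the text only requires moving them closer to the start or the end.

```python
def adjust_order(order, hot, cold):
    """Return a new allocation order.

    hot:  locations accessed while the miss rate met the threshold (allocate earlier)
    cold: locations accessed while the miss rate was low (allocate later)
    Relative order within each group is preserved; `hot` wins if an index is in both.
    """
    hot_set, cold_set = set(hot), set(cold)
    front = [x for x in order if x in hot_set]
    back = [x for x in order if x in cold_set and x not in hot_set]
    middle = [x for x in order if x not in hot_set and x not in cold_set]
    return front + middle + back
```

For example, starting from the order `[0, 3, 6, 1, 4, 7, 2, 5, 8]` with `hot=[5, 8]` and `cold=[0]`, the adjusted order is `[5, 8, 3, 6, 1, 4, 7, 2, 0]`, so the locations that missed are given the first chance to claim cache lines on the next use.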

Method 1500 may end at block 1540 or, in some embodiments, may be repeated in whole or in part. For example, blocks 1520 and 1530 may be repeated while processing circuit 901 is accessing storage locations 935 in buffer 915. As described for methods 1300 and 1400, method 1500 may also be performed concurrently with other instances of method 1500. In addition, methods 1300, 1400, and 1500 may be performed concurrently with one another.

FIGS. 1-8 illustrate circuits and methods for a system that reallocates a portion of a cache memory for use as a directly addressable address region. FIGS. 9-15 depict circuits and techniques for caching a buffer that resides in system memory in a cache memory circuit. Any embodiment of the disclosed systems may be included in one or more of a variety of computer systems, such as a desktop computer, laptop computer, smartphone, tablet, wearable device, and the like. In some embodiments, the circuits described above may be implemented on a system-on-chip (SoC) or other type of integrated circuit. A block diagram illustrating an embodiment of computer system 1600 is shown in FIG. 16. Computer system 1600 may, in some embodiments, include any of the disclosed embodiments, such as systems 100, 200, 500, 600, 900, or 1100.

In the illustrated embodiment, system 1600 includes at least one instance of a system-on-chip (SoC) 1606, which may include multiple types of processing circuits, such as a central processing unit (CPU) and a graphics processing unit (GPU), as well as a communication fabric and interfaces to memories and input/output devices. In some embodiments, one or more processors in SoC 1606 include multiple execution lanes and an instruction issue queue. In various embodiments, SoC 1606 is coupled to external memory 1602, peripherals 1604, and power supply 1608.

A power supply 1608 is also provided, which supplies the supply voltages to SoC 1606 as well as one or more supply voltages to memory 1602 and/or peripherals 1604. In various embodiments, power supply 1608 represents a battery (e.g., a rechargeable battery in a smartphone, laptop or tablet computer, or other device). In some embodiments, more than one instance of SoC 1606 is included (and more than one external memory 1602 is included as well).

Memory 1602 is any type of memory, such as dynamic random access memory (DRAM), synchronous DRAM (SDRAM), double data rate (DDR, DDR2, DDR3, etc.) SDRAM (including mobile versions of the SDRAMs such as mDDR3, and/or low-power versions of the SDRAMs such as LPDDR2), RAMBUS DRAM (RDRAM), static RAM (SRAM), and so on. One or more memory devices are coupled onto a circuit board to form memory modules such as single inline memory modules (SIMMs), dual inline memory modules (DIMMs), and the like. Alternatively, the devices are mounted with an SoC or an integrated circuit in a chip-on-chip configuration, a package-on-package configuration, or a multi-chip module configuration.

Peripherals 1604 include any desired circuitry, depending on the type of system 1600. For example, in one embodiment, peripherals 1604 include devices for various types of wireless communication, such as Wi-Fi, Bluetooth, cellular, global positioning system, and the like. In some embodiments, peripherals 1604 also include additional storage, including RAM storage, solid-state storage, or disk storage. Peripherals 1604 include user interface devices such as a display screen, including touch display screens or multitouch display screens, a keyboard or other input devices, microphones, speakers, and the like.

As illustrated, system 1600 is shown to have application in a wide range of areas. For example, system 1600 may be utilized as part of the chips, circuitry, components, etc., of a desktop computer 1610, laptop computer 1620, tablet computer 1630, cellular or mobile phone 1640, or television 1650 (or a set-top box coupled to a television). Also illustrated is a smartwatch and health monitoring device 1660. In some embodiments, the smartwatch may include a variety of general-purpose computing-related functions. For example, the smartwatch may provide access to email, cellphone service, a user calendar, and so on. In various embodiments, a health monitoring device may be a dedicated medical device or may otherwise include dedicated health-related functionality. For example, a health monitoring device may monitor a user's vital signs, track the user's proximity to other users for the purpose of epidemiological social distancing, perform contact tracing, provide communication to an emergency service in the event of a health risk, and so on. In various embodiments, the above-mentioned smartwatch may or may not include some or any health monitoring related functions. Other wearable devices 1660 are also contemplated, such as devices worn around the neck, devices attached to hats or other headgear, devices that are implantable in the human body, eyeglasses designed to provide an augmented and/or virtual reality experience, and so on.

System 1600 may further be used as part of cloud-based service(s) 1670. For example, the previously mentioned devices, and/or other devices, may access computing resources in the cloud (i.e., remotely located hardware and/or software resources). Still further, system 1600 may be utilized in one or more devices of a home 1680 other than those previously mentioned. For example, appliances within the home may monitor and detect conditions that warrant attention. Various devices within the home (e.g., a refrigerator, a cooling system, etc.) may monitor the status of the device and provide an alert to the homeowner (or, for example, a repair facility) should a particular event be detected. Alternatively, a thermostat may monitor the temperature in the home and may automate adjustments to a heating/cooling system based on a history of responses to various conditions by the homeowner. Also illustrated in FIG. 16 is the application of system 1600 to various modes of transportation 1690. For example, system 1600 may be used in the control and/or entertainment systems of aircraft, trains, buses, cars for hire, private automobiles, waterborne vessels from private boats to cruise liners, scooters (for rent or owned), and so on. In various cases, system 1600 may be used to provide automated guidance (e.g., self-driving vehicles), general system control, and otherwise.

It is noted that the wide variety of potential applications for system 1600 may include a variety of performance, cost, and power consumption requirements. Accordingly, a scalable solution that can provide a suitable combination of performance, cost, and power consumption using one or more integrated circuits may be advantageous. These and many other embodiments are possible and contemplated. It is noted that the devices and applications illustrated in FIG. 16 are illustrative only and are not intended to be limiting. Other devices are possible and contemplated.

As disclosed with regard to FIG. 16, computer system 1600 may include one or more integrated circuits included within a personal computer, smartphone, tablet computer, or other type of computing device. A process for designing and producing an integrated circuit using design information is presented below in FIG. 17.

FIG. 17 is a block diagram illustrating an example of a non-transitory computer-readable storage medium that stores circuit design information, according to some embodiments. The embodiment of FIG. 17 may be utilized in a process to design and manufacture integrated circuits, for example, any of systems 100, 200, 500, 600, 900, or 1100 as shown and described throughout FIGS. 1-15. In the illustrated embodiment, semiconductor manufacturing system 1720 is configured to process design information 1715 stored on non-transitory computer-readable storage medium 1710 and to manufacture integrated circuit 1730 (e.g., system 100) based on design information 1715.

Non-transitory computer-readable storage medium 1710 may include any of various appropriate types of memory devices or storage devices. Non-transitory computer-readable storage medium 1710 may be an installation medium, e.g., a CD-ROM, floppy disks, or a tape device; a computer system memory or random access memory such as DRAM, DDR RAM, SRAM, EDO RAM, Rambus RAM, etc.; a non-volatile memory such as flash, magnetic media (e.g., a hard drive), or optical storage; registers, or other similar types of memory elements, etc. Non-transitory computer-readable storage medium 1710 may include other types of non-transitory memory as well, or combinations thereof. Non-transitory computer-readable storage medium 1710 may include two or more memory media that may reside in different locations, e.g., in different computer systems that are connected over a network.

Design information 1715 may be specified using any of various appropriate computer languages, including hardware description languages such as, without limitation, VHDL, Verilog, SystemC, SystemVerilog, RHDL, M, MyHDL, etc. Design information 1715 may be usable by semiconductor manufacturing system 1720 to manufacture at least a portion of integrated circuit 1730. The format of design information 1715 may be recognized by at least one semiconductor manufacturing system, such as semiconductor manufacturing system 1720. In some embodiments, design information 1715 may include a netlist that specifies elements of a cell library as well as their connectivity. One or more cell libraries used during logic synthesis of circuits included in integrated circuit 1730 may also be included in design information 1715. Such cell libraries may include information indicative of device- or transistor-level netlists, mask design data, characterization data, and the like, of the cells included in the cell library.

In various embodiments, integrated circuit 1730 may include one or more custom macrocells, such as memories, analog or mixed-signal circuits, and the like. In such cases, design information 1715 may include information related to the included macrocells. Such information may include, without limitation, a schematic capture database, mask design data, behavioral models, and device- or transistor-level netlists. As used herein, mask design data may be formatted according to the Graphic Data System (GDSII) format, or any other suitable format.

Semiconductor manufacturing system 1720 may include any of various appropriate elements configured to manufacture integrated circuits. This may include, for example, elements for depositing semiconductor materials (e.g., on a wafer, which may include masking), removing materials, altering the shape of deposited materials, modifying materials (e.g., by doping materials or by modifying dielectric constants using ultraviolet processing), etc. Semiconductor manufacturing system 1720 may also be configured to perform various tests of manufactured circuits for correct operation.

In various embodiments, integrated circuit 1730 is configured to operate according to a circuit design specified by design information 1715, which may include performing any of the functionality described herein. For example, integrated circuit 1730 may include any of the various elements shown or described herein. Further, integrated circuit 1730 may be configured to perform various functions described herein in conjunction with other components.

As used herein, a phrase of the form “design information that specifies a design of a circuit configured to ...” does not imply that the circuit in question must be manufactured in order for the element to be met. Rather, this phrase indicates that the design information describes a circuit that, upon being manufactured, will be configured to perform the indicated actions or will include the specified components.

******

This disclosure includes references to “an embodiment” or groups of “embodiments” (e.g., “some embodiments” or “various embodiments”). Embodiments are different implementations or instances of the disclosed concepts. References to “an embodiment,” “one embodiment,” “a particular embodiment,” and the like do not necessarily refer to the same embodiment. A large number of possible embodiments are contemplated, including those specifically disclosed, as well as modifications or alternatives that fall within the spirit or scope of the disclosure.

This disclosure may discuss potential advantages that may arise from the disclosed embodiments. Not all implementations of these embodiments will necessarily manifest any or all of the potential advantages. Whether an advantage is realized for a particular implementation depends on many factors, some of which are outside the scope of this disclosure. In fact, there are a number of reasons why an implementation that falls within the scope of the claims might not exhibit some or all of any disclosed advantages. For example, a particular implementation might include other circuitry outside the scope of the disclosure that, in conjunction with one of the disclosed embodiments, negates or diminishes one or more of the disclosed advantages. Furthermore, suboptimal design execution of a particular implementation (e.g., implementation techniques or tools) could also negate or diminish disclosed advantages. Even assuming a skilled implementation, realization of advantages may still depend upon other factors such as the environmental circumstances in which the implementation is deployed. For example, inputs supplied to a particular implementation may prevent one or more problems addressed in this disclosure from arising on a particular occasion, with the result that the benefit of its solution may not be realized. Given the existence of possible factors external to this disclosure, it is expressly intended that any potential advantages described herein are not to be construed as claim limitations that must be met to demonstrate infringement. Rather, identification of such potential advantages is intended to illustrate the type(s) of improvement available to designers having the benefit of this disclosure. That such advantages are described permissively (e.g., stating that a particular advantage “may arise”) is not intended to convey doubt about whether such advantages can in fact be realized, but rather to recognize the technical reality that realization of such advantages often depends on additional factors.

Unless stated otherwise, embodiments are non-limiting. That is, the disclosed embodiments are not intended to limit the scope of claims that are drafted based on this disclosure, even where only a single example is described with respect to a particular feature. The disclosed embodiments are intended to be illustrative rather than restrictive, absent any statements in the disclosure to the contrary. The application is thus intended to permit claims covering the disclosed embodiments, as well as such alternatives, modifications, and equivalents as would be apparent to a person skilled in the art having the benefit of this disclosure.

For example, features in this application may be combined in any suitable manner. Accordingly, new claims may be formulated during prosecution of this application (or an application claiming priority thereto) to any such combination of features. In particular, with reference to the appended claims, features from dependent claims may be combined with those of other dependent claims where appropriate, including claims that depend from other independent claims. Similarly, features from respective independent claims may be combined where appropriate.

Accordingly, while the appended dependent claims may be drafted such that each depends on a single other claim, additional dependencies are also contemplated. Any combinations of features in the dependent claims that are consistent with this disclosure are contemplated and may be claimed in this or another application. In short, combinations are not limited to those specifically enumerated in the appended claims.

Where appropriate, it is also contemplated that claims drafted in one format or statutory type (e.g., apparatus) are intended to support corresponding claims of another format or statutory type (e.g., method).

******

Because this disclosure is a legal document, various terms and phrases may be subject to administrative and judicial interpretation. Public notice is hereby given that the following paragraphs, as well as the definitions provided throughout the disclosure, are to be used in determining how to interpret claims that are drafted based on this disclosure.

References to a singular form of an item (i.e., a noun or noun phrase preceded by “a,” “an,” or “the”) are, unless context clearly dictates otherwise, intended to mean “one or more.” Reference to “an item” in a claim thus does not, without accompanying context, preclude additional instances of the item. A “plurality” of items refers to a set of two or more of the items.

The word “may” is used herein in a permissive sense (i.e., having the potential to, being able to) and not in a mandatory sense (i.e., must).

The terms “comprising” and “including,” and forms thereof, are open-ended and mean “including, but not limited to.”

When the term “or” is used in this disclosure with respect to a list of options, it will generally be understood to be used in the inclusive sense unless the context provides otherwise. Thus, a recitation of “x or y” is equivalent to “x or y, or both,” and thus covers 1) x but not y, 2) y but not x, and 3) both x and y. On the other hand, a phrase such as “either x or y, but not both” makes clear that “or” is being used in the exclusive sense.

A recitation of “w, x, y, or z, or any combination thereof” or “at least one of ... w, x, y, and z” is intended to cover all possibilities involving a single element up to the total number of elements in the set. For example, given the set [w, x, y, z], these phrasings cover any single element of the set (e.g., w but not x, y, or z), any two elements (e.g., w and x, but not y or z), any three elements (e.g., w, x, and y, but not z), and all four elements. The phrase “at least one of ... w, x, y, and z” thus refers to at least one element of the set [w, x, y, z], thereby covering all possible combinations in this list of elements. This phrase is not to be interpreted to require that there is at least one instance of w, at least one instance of x, at least one instance of y, and at least one instance of z.

Various “labels” may precede nouns or noun phrases in this disclosure. Unless context provides otherwise, different labels used for a feature (e.g., “first circuit,” “second circuit,” “particular circuit,” “given circuit,” etc.) refer to different instances of the feature. Additionally, the labels “first,” “second,” and “third,” when applied to a feature, do not imply any type of ordering (e.g., spatial, temporal, logical, etc.), unless stated otherwise.

The phrase “based on” is used to describe one or more factors that affect a determination. This term does not foreclose the possibility that additional factors may affect the determination. That is, a determination may be based solely on the specified factors or based on the specified factors as well as other, unspecified factors. Consider the phrase “determine A based on B.” This phrase specifies that B is a factor that is used to determine A or that affects the determination of A. This phrase does not foreclose that the determination of A may also be based on some other factor, such as C. This phrase is also intended to cover an embodiment in which A is determined based solely on B. As used herein, the phrase “based on” is synonymous with the phrase “based at least in part on.”

The phrases “in response to” and “responsive to” describe one or more factors that trigger an effect. These phrases do not foreclose the possibility that additional factors may affect or otherwise trigger the effect, either jointly with the specified factors or independently of them. That is, an effect may be solely in response to those factors, or may be in response to the specified factors as well as other, unspecified factors. Consider the phrase “perform A in response to B.” This phrase specifies that B is a factor that triggers the performance of A, or that triggers a particular result for A. This phrase does not foreclose that performing A may also be in response to some other factor, such as C. This phrase also does not foreclose that performing A may be jointly in response to B and C. This phrase is also intended to cover an embodiment in which A is performed solely in response to B. As used herein, the phrase “responsive to” is synonymous with the phrase “responsive at least in part to.” Similarly, the phrase “in response to” is synonymous with the phrase “at least in part in response to.”

Within this disclosure, different entities (which may variously be referred to as “units,” “circuits,” other components, etc.) may be described or claimed as “configured” to perform one or more tasks or operations. This formulation, [entity] configured to [perform one or more tasks], is used herein to refer to structure (i.e., something physical). More specifically, this formulation is used to indicate that this structure is arranged to perform the one or more tasks during operation. A structure can be said to be “configured to” perform some task even if the structure is not currently being operated. Thus, an entity described or recited as being “configured to” perform some task refers to something physical, such as a device, a circuit, or a system having a processor unit and a memory storing program instructions executable to implement the task. This phrase is not used herein to refer to something intangible.

In some cases, various units/circuits/components may be described herein as performing a set of tasks or operations. It is understood that those entities are “configured to” perform those tasks/operations, even if not specifically noted.

The term “configured to” is not intended to mean “configurable to.” An unprogrammed FPGA, for example, would not be considered to be “configured to” perform a particular function. This unprogrammed FPGA may, however, be “configurable to” perform that function. After appropriate programming, the FPGA may then be said to be “configured to” perform the particular function.

For purposes of United States patent applications based on this disclosure, reciting in a claim that a structure is “configured to” perform one or more tasks is expressly intended not to invoke 35 U.S.C. § 112(f) for that claim element. Should Applicant wish to invoke Section 112(f) during prosecution of a United States patent application based on this disclosure, it will recite claim elements using the “means for [performing a function]” construct.

******

Different “circuits” may be described in this disclosure. These circuits or “circuitry” constitute hardware that includes various types of circuit elements, such as combinatorial logic, clocked storage devices (e.g., flip-flops, registers, latches, etc.), finite state machines, memory (e.g., random-access memory, embedded dynamic random-access memory), programmable logic arrays, and so on. Circuitry may be custom designed or taken from standard libraries. In various implementations, circuitry can, as appropriate, include digital components, analog components, or a combination of both. Certain types of circuits may be commonly referred to as “units” (e.g., a decode unit, an arithmetic logic unit (ALU), a functional unit, a memory management unit (MMU), etc.). Such units also refer to circuits or circuitry.

The disclosed circuits/units/components and other elements illustrated in the drawings and described herein thus include hardware elements such as those described in the preceding paragraph. In many instances, the internal arrangement of hardware elements within a particular circuit may be specified by describing the function of that circuit. For example, a particular “decode unit” may be described as performing the function of “processing an opcode of an instruction and routing that instruction to one or more of a plurality of functional units,” which means that the decode unit is “configured to” perform this function. This specification of function is sufficient, to those skilled in the computer arts, to connote a set of possible structures for the circuit.

In various embodiments, as discussed in the preceding paragraph, circuits, units, and other elements may be defined by the functions or operations that they are configured to implement. The arrangement of such circuits/units/components with respect to each other and the manner in which they interact form a microarchitectural definition of the hardware that is ultimately manufactured in an integrated circuit or programmed into an FPGA to form a physical implementation of the microarchitectural definition. Thus, the microarchitectural definition is recognized by those of skill in the art as structure from which many physical implementations may be derived, all of which fall within the broader structure described by the microarchitectural definition. That is, a skilled artisan presented with the microarchitectural definition supplied in accordance with this disclosure may, without undue experimentation and with the application of ordinary skill, implement the structure by coding the description of the circuits/units/components in a hardware description language (HDL) such as Verilog or VHDL. The HDL description is often expressed in a fashion that may appear to be functional. But to those of skill in the art, this HDL description is the manner that is used to transform the structure of a circuit, unit, or component to the next level of implementation detail. Such an HDL description may take the form of behavioral code (which is typically not synthesizable), register transfer language (RTL) code (which, in contrast to behavioral code, is typically synthesizable), or structural code (e.g., a netlist specifying logic gates and their connectivity). The HDL description may subsequently be synthesized against a library of cells designed for a given integrated circuit fabrication technology, and may be modified for timing, power, and other reasons to produce a final design database that is transmitted to a foundry to generate masks and ultimately produce the integrated circuit. Some hardware circuits, or portions thereof, may also be custom designed in a schematic editor and captured into the integrated circuit design along with synthesized circuitry. The integrated circuits may include transistors and other circuit elements (e.g., passive elements such as capacitors, resistors, inductors, etc.) and interconnect between the transistors and circuit elements. Some embodiments may implement multiple integrated circuits coupled together to implement the hardware circuits, and/or discrete elements may be used in some embodiments. Alternatively, the HDL design may be synthesized to a programmable logic array such as a field-programmable gate array (FPGA) and implemented in the FPGA. This decoupling between the design of a group of circuits and the subsequent low-level implementation of these circuits commonly results in the scenario in which the circuit or logic designer never specifies a particular set of structures for the low-level implementation beyond a description of what the circuit is configured to do, because this process is performed at a different stage of the circuit implementation process.
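The structural form mentioned above (a netlist specifying logic gates and their connectivity) can be illustrated with a small sketch. The Python below is not HDL and is not taken from this disclosure; the cell names `AND2` and `XOR2`, the half-adder circuit, and the `evaluate` helper are all assumptions chosen only to show what "gates plus connections" looks like as data.

```python
# Hypothetical cell library: cell name -> Boolean function of its inputs.
CELLS = {
    "AND2": lambda a, b: a & b,
    "XOR2": lambda a, b: a ^ b,
}

# Structural description of a half adder (an assumed example circuit).
# Each instance names a library cell, the nets feeding its inputs,
# and the net that it drives.
HALF_ADDER = [
    {"cell": "XOR2", "inputs": ("a", "b"), "output": "sum"},
    {"cell": "AND2", "inputs": ("a", "b"), "output": "carry"},
]


def evaluate(netlist, inputs):
    """Compute every net value; instances are assumed to be listed in
    topological order, so each input net is known when it is needed."""
    nets = dict(inputs)
    for inst in netlist:
        args = [nets[name] for name in inst["inputs"]]
        nets[inst["output"]] = CELLS[inst["cell"]](*args)
    return nets


result = evaluate(HALF_ADDER, {"a": 1, "b": 1})
print(result["sum"], result["carry"])  # 0 1
```

The same list of instances could be mapped onto different cell libraries without changing the connectivity, which is one way to picture why a single description admits many physical implementations.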

The fact that many different low-level combinations of circuit elements may be used to implement the same specification of a circuit results in a large number of equivalent structures for that circuit. As noted, these low-level circuit implementations may vary according to changes in the fabrication technology, the foundry selected to manufacture the integrated circuit, the library of cells provided for a particular project, etc. In many cases, the choices made by different design tools or methodologies to produce these different implementations may be arbitrary.

Moreover, it is common for a single implementation of a particular functional specification of a circuit to include, for a given embodiment, a large number of devices (e.g., millions of transistors). Accordingly, the sheer volume of this information makes it impractical to provide a full recitation of the low-level structure used to implement a single embodiment, let alone the vast array of equivalent possible implementations. For this reason, the present disclosure describes the structure of circuits using the functional shorthand commonly employed in the industry.

Claims

1. An apparatus, comprising:
a cache memory circuit including a cache memory having a plurality of cache lines; and
a cache controller circuit configured to:
receive a request to reallocate a portion of the cache memory circuit that is currently in use, the request identifying an address region corresponding to one or more cache lines of the plurality of cache lines; and
in response to the request, convert the one or more cache lines to directly addressable random-access memory (RAM) by excluding the one or more cache lines from cache operations.

2. The apparatus of claim 1, wherein the cache controller circuit is further configured to:
support a real-time virtual channel for memory transactions in the identified address region; and
prioritize memory transactions received via the real-time virtual channel over memory transactions received via a bulk virtual channel.

3. The apparatus of claim 1, wherein the cache controller circuit is further configured to:
determine that the address region is included in a secure access region; and
in response to the determination, ignore memory transactions in the address region from agents that are not authorized to access the secure access region.

4. The apparatus of claim 1, wherein the cache controller circuit is further configured to flush the one or more cache lines prior to converting the one or more cache lines to the directly addressable RAM.

5. The apparatus of claim 1, wherein the cache controller circuit is further configured to:
issue a write-back request for a valid cache line in response to data in the valid cache line being written; and
exclude the one or more cache lines from write-back requests.

6. The apparatus of claim 1, wherein the cache controller circuit is further configured to:
receive a different request to deallocate the portion of the cache memory from the directly addressable RAM; and
in response to the different request, include the one or more cache lines in cache operations without copying data that was stored in the directly addressable RAM while the one or more cache lines were reallocated.

7. The apparatus of claim 6, wherein the cache controller circuit is further configured to generate an error in response to a memory transaction to the directly addressable RAM that is received after deallocating the portion of the cache memory.
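As a non-limiting illustration, the behavior recited for the apparatus (flushing and converting cache lines, excluding converted lines from write-backs, deallocating without copying, and signaling an error afterward) might be modeled in software as in the following Python sketch. The class and method names, the 64-byte line size, the one-integer-per-line payload, and the `BusError` exception are assumptions made for this example; the sketch is not the claimed circuit and omits tag lookup and eviction.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

LINE_SIZE = 64  # bytes per cache line (assumed value)


class BusError(Exception):
    """Raised for a transaction that targets an unmapped RAM region."""


@dataclass
class Line:
    valid: bool = False
    dirty: bool = False
    is_ram: bool = False        # indication: excluded from cache operations
    tag: Optional[int] = None   # address of the cached data, when valid
    data: int = 0               # payload, modeled as one integer per line


class CacheController:
    def __init__(self, num_lines: int) -> None:
        self.lines: List[Line] = [Line() for _ in range(num_lines)]
        self.ram_base: Optional[int] = None
        self.ram_lines: List[int] = []
        self.system_memory: Dict[int, int] = {}   # backing store

    def cache_candidates(self) -> List[int]:
        """Indices of lines that ordinary cache operations may touch."""
        return [i for i, line in enumerate(self.lines) if not line.is_ram]

    def cache_write(self, tag: int, data: int) -> int:
        """Place modified data in the first free line that is still cache."""
        for i in self.cache_candidates():
            if not self.lines[i].valid:
                self.lines[i] = Line(valid=True, dirty=True, tag=tag, data=data)
                return i
        raise RuntimeError("no free line; eviction is omitted from this sketch")

    def write_back_all(self) -> None:
        """Write dirty cache lines to system memory; RAM lines are skipped."""
        for i in self.cache_candidates():
            line = self.lines[i]
            if line.valid and line.dirty:
                self.system_memory[line.tag] = line.data
                line.dirty = False

    def reallocate(self, base: int, indices: List[int]) -> None:
        """Flush the chosen lines, then mark them as directly addressable RAM."""
        for i in indices:
            line = self.lines[i]
            if line.valid and line.dirty:
                self.system_memory[line.tag] = line.data
            self.lines[i] = Line(is_ram=True)
        self.ram_base, self.ram_lines = base, list(indices)

    def _ram_line(self, addr: int) -> Line:
        """Translate an address in the RAM region to its converted line."""
        if self.ram_base is None:
            raise BusError(hex(addr))
        slot = (addr - self.ram_base) // LINE_SIZE
        if not 0 <= slot < len(self.ram_lines):
            raise BusError(hex(addr))
        return self.lines[self.ram_lines[slot]]

    def ram_write(self, addr: int, data: int) -> None:
        self._ram_line(addr).data = data

    def ram_read(self, addr: int) -> int:
        return self._ram_line(addr).data

    def deallocate(self) -> None:
        """Return the lines to the cache pool without copying their contents."""
        for i in self.ram_lines:
            self.lines[i] = Line()
        self.ram_base, self.ram_lines = None, []
```

In this model, a call such as `reallocate(0x80000000, [0, 1])` first flushes any dirty data held in lines 0 and 1, after which `ram_write` and `ram_read` reach those lines by address while `cache_write` and `write_back_all` skip them. After `deallocate`, the former RAM contents are simply dropped and a later `ram_read` raises `BusError`.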

8. A method, comprising:
receiving, by a cache controller circuit, a request to reallocate a portion of a cache memory circuit that is currently in use to a directly addressable address region, the request identifying an inactive address region;
based on the identified address region, selecting one or more cache lines of the cache memory circuit to convert; and
setting, by the cache controller circuit, a respective indication for each of the selected cache lines to exclude the selected cache lines from further cache operations.

9. The method of claim 8, wherein setting the respective indication comprises setting, by the cache controller circuit, a real-time memory indicator, wherein the real-time memory indicator indicates that the selected cache lines are associated with real-time transactions having a higher priority than bulk transactions.

10. The method of claim 8, further comprising, in response to determining that the identified address region is part of a secure access region, ignoring, by the cache controller circuit, a memory transaction.

11. The method of claim 8, further comprising flushing, by the cache controller circuit, the selected cache lines prior to setting the respective indications.

12. The method of claim 8, wherein data written to a particular address that is currently cached in the cache memory circuit is written back to the particular address in a system memory, and
wherein data written to a different address within the identified address region is not written back to the system memory.

13. The method of claim 8, further comprising:
receiving, by the cache controller circuit, a different request to deallocate the portion of the cache memory circuit from the directly addressable address region; and
in response to the different request, including the selected cache lines in cache operations, wherein data stored in the directly addressable address region while the selected cache lines were reallocated is overwritten without a write-back to a system memory circuit.

14. The method of claim 13, further comprising returning, by the cache controller circuit, a default value in response to a read request for an address within the directly addressable address region that is received after deallocating the portion of the cache memory circuit.
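As a non-limiting illustration, the transaction handling recited above (real-time transactions taking priority over bulk transactions, the secure-access check, and the default value returned after deallocation) might be sketched in Python as follows. The `Transaction` fields, the channel labels `"rt"` and `"bulk"`, the agent names used with it, the default value of zero, and the strict-priority policy are all assumptions made for this example. The `accept` helper follows the agent-authorization form of the secure-access check recited for the apparatus.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional, Set

DEFAULT_VALUE = 0  # value returned for reads after deallocation (assumed)


@dataclass
class Transaction:
    agent: str      # name of the requesting agent (hypothetical)
    addr: int       # target address
    channel: str    # "rt" for real-time, "bulk" otherwise


class Arbiter:
    """Strict-priority arbiter: real-time traffic is always served first."""

    def __init__(self) -> None:
        self.rt: Deque[Transaction] = deque()
        self.bulk: Deque[Transaction] = deque()

    def submit(self, txn: Transaction) -> None:
        (self.rt if txn.channel == "rt" else self.bulk).append(txn)

    def next(self) -> Optional[Transaction]:
        """Return the next transaction to service, or None when idle."""
        if self.rt:
            return self.rt.popleft()
        if self.bulk:
            return self.bulk.popleft()
        return None


def accept(txn: Transaction, secure_region: range, authorized: Set[str]) -> bool:
    """Return False when an unauthorized agent targets the secure region."""
    return txn.addr not in secure_region or txn.agent in authorized


def read_ram(store: Dict[int, int], region_active: bool, addr: int) -> int:
    """Read from the RAM region, or return the default once deallocated."""
    if not region_active:
        return DEFAULT_VALUE
    return store.get(addr, DEFAULT_VALUE)
```

Strict priority is the simplest policy that satisfies the ordering described, but it can starve bulk traffic under sustained real-time load; a practical design might bound that with a weighted or aging scheme.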

15. A system, comprising:
a cache memory circuit including a cache memory having a plurality of ways;
a processor configured to issue memory requests using an address map that includes active and inactive address regions; and
a cache controller circuit configured to:
receive, from the processor, a request to reallocate a portion of the cache memory as directly addressable memory, the request identifying an inactive address region;
based on the request, select a portion of the ways to convert; and
map the selected portion of the ways for use in the identified address region.

16. The system of claim 15, wherein to convert the selected portion of the ways, the cache controller circuit is configured to set respective indications in cache tags corresponding to specific cache lines included in the selected portion of the ways, wherein the respective indications remove the specific cache lines from use as cache memory.

17. The system of claim 16, wherein the cache controller circuit is further configured to set respective real-time indicators in the cache tags corresponding to the specific cache lines, the real-time indicators indicating that the specific cache lines are associated with real-time transactions having a higher priority than bulk transactions.

18. The system of claim 15, wherein the cache controller circuit is further configured to include the selected portion of the ways in cache operations in response to receiving a request to deallocate the directly addressable memory, and wherein data stored in the directly addressable memory while the selected portion was reallocated is not relocated in response to the request to deallocate the directly addressable memory.

19. The system of claim 15, wherein the cache controller circuit is further configured to flush cache lines in the selected portion of the ways prior to mapping the selected portion of the ways for use in the identified address region.

20. The system of claim 15, wherein the portion of the ways is one-half of a particular way.
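As a non-limiting illustration, selecting part of a set-associative cache and mapping it onto an inactive address region might be sketched in Python as follows. The geometry (8 sets, 4 ways, 64-byte lines), the policy of filling from the highest-numbered way downward, and all function names are assumptions made for this example.

```python
from typing import Dict, List, Tuple

NUM_SETS, NUM_WAYS, LINE_SIZE = 8, 4, 64   # illustrative geometry (assumed)


def is_inactive(base: int, size: int, active_regions: List[range]) -> bool:
    """True when [base, base + size) overlaps no active region of the map."""
    return all(base + size <= r.start or base >= r.stop
               for r in active_regions)


def select_lines(size_bytes: int) -> List[Tuple[int, int]]:
    """Pick (way, set) pairs, filling from the highest-numbered way down."""
    needed = -(-size_bytes // LINE_SIZE)    # ceiling division
    if needed > NUM_SETS * NUM_WAYS:
        raise ValueError("request exceeds cache capacity")
    picked: List[Tuple[int, int]] = []
    for way in reversed(range(NUM_WAYS)):
        for set_index in range(NUM_SETS):
            if len(picked) == needed:
                return picked
            picked.append((way, set_index))
    return picked


def map_region(base: int, size_bytes: int) -> Dict[int, Tuple[int, int]]:
    """Map each line-aligned address in the region to a (way, set) pair."""
    return {base + i * LINE_SIZE: location
            for i, location in enumerate(select_lines(size_bytes))}
```

With this geometry one way holds 8 lines (512 bytes), so a 256-byte request converts sets 0 through 3 of way 3, that is, one-half of a single way, while the remaining ways stay available for ordinary caching.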