KR20230159602A

KR20230159602A - Address hashing in multi-memory controller systems

Info

Publication number: KR20230159602A
Application number: KR1020237036724A
Authority: KR
Inventors: 스티븐 피쉬윅; 제프리 이. 고니언; 펄 에이치. 하마르룬드; 이란 타마리; 리올 지메트; 제라드 알. 3세 윌리엄스; 하르샤바르드한 카우쉬카르
Original assignee: Apple Inc.
Priority date: 2021-04-26
Filing date: 2022-04-25
Publication date: 2023-11-21
Also published as: WO2022232008A1; DE112022002417T5

Abstract

In one embodiment, a system may support programmable hashing of address bits at multiple levels of granularity to map memory addresses to memory controllers and ultimately at least to memory devices. The hashing may be programmed to distribute pages of memory across the memory controllers, and consecutive blocks of a page may be mapped to physically distant memory controllers. In one embodiment, address bits may be dropped from each level of granularity to form a compressed pipe address, saving power within the memory controller. In one embodiment, a memory folding scheme may be employed to reduce the number of active memory devices and/or memory controllers in the system when the full complement of memory is not needed.

Description

Address hashing in multi-memory controller systems

Technical Field

Embodiments described herein relate to memory addressing in computer systems, and more particularly to distributing a memory address space across multiple memory devices.

Background

Various computer systems exist that include a large amount of system memory that is directly accessible to processors and other hardware agents in the system via a memory address space (as compared, e.g., to an I/O address space that is mapped to specific I/O devices). The system memory is generally implemented as multiple dynamic random access memory (DRAM) devices. In other cases, other types of memory may be used as well, such as static random access memory (SRAM) devices, magnetic memory devices of various types (e.g., MRAM), non-volatile memory devices such as flash memory or read-only memory (ROM), and other types of random access memory devices. In some cases, a portion of the memory address space may be mapped to such devices (and memory-mapped I/O devices may be used as well) in addition to the portions of the memory address space that are mapped to the RAM devices.

The mapping of memory addresses to memory devices can strongly affect the performance of the memory system (e.g., in terms of sustainable bandwidth and memory latency). For example, typical non-uniform memory architecture (NUMA) systems are constructed of computing nodes that include processors, peripheral devices, and memory. The computing nodes communicate, and one computing node can access data in another computing node, but at increased latency. The memory address space is mapped in large contiguous sections (e.g., one node includes addresses 0 to N-1, where N is the number of bytes of memory in the node, another node includes addresses N to 2N-1, etc.). This mapping optimizes access to local memory at the expense of accesses to non-local memory. However, this mapping also constrains the operating system both in how it maps virtual pages to physical pages and in the selection of the computing node on which a given process can execute in the system to achieve higher performance. Additionally, the bandwidth and latency of a process's accesses to large amounts of data are bounded by the performance of a given local memory system, and suffer further if memory in another computing node is accessed.

The following detailed description refers to the accompanying drawings, which are now briefly described.
Figure 1 is a block diagram of one embodiment of a plurality of systems on a chip (SOCs), where a given SOC includes a plurality of memory controllers.
Figure 2 is a block diagram illustrating one embodiment of memory controllers and their physical/logical arrangement on the SOCs.
Figure 3 is a block diagram of one embodiment of a binary decision tree for determining which memory controller services a particular address.
Figure 4 is a block diagram illustrating one embodiment of a plurality of memory location configuration registers.
Figure 5 is a flowchart illustrating the operation of one embodiment of the SOCs during boot/power-up.
Figure 6 is a flowchart illustrating the operation of one embodiment of the SOCs to route a memory request.
Figure 7 is a flowchart illustrating the operation of one embodiment of a memory controller in response to a memory request.
Figure 8 is a flowchart illustrating the operation of one embodiment of monitoring system operation to determine memory folding.
Figure 9 is a flowchart illustrating the operation of one embodiment of folding a memory slice.
Figure 10 is a flowchart illustrating the operation of one embodiment of unfolding a memory slice.
Figure 11 is a flowchart illustrating one embodiment of a method of memory folding.
Figure 12 is a flowchart illustrating one embodiment of a method of hashing a memory address.
Figure 13 is a flowchart illustrating one embodiment of a method of forming a compressed pipe address.
Figure 14 is a block diagram of one embodiment of a system and of various implementations of the system.
Figure 15 is a block diagram of a computer accessible storage medium.
While the embodiments described in this disclosure may be susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will be described in detail herein. It should be understood, however, that the drawings and detailed description thereto are not intended to limit the embodiments to the particular form disclosed; on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the appended claims. The headings used herein are for organizational purposes only and are not meant to be used to limit the scope of the description.

Figure 1 is a block diagram of one embodiment of a plurality of systems on a chip (SOCs) 10 forming a system. The SOCs 10 may be instances of a common integrated circuit design, and thus one of the SOCs 10 is shown in more detail. Other instances of the SOC 10 may be similar. In the illustrated embodiment, the SOC 10 includes a plurality of memory controllers 12A through 12H, one or more processor clusters (P clusters) 14A, 14B, one or more graphics processing units (GPUs) 16A, 16B, one or more I/O clusters 18A, 18B, and a communication fabric that comprises a west interconnect (IC) 20A and an east IC 20B. The I/O clusters 18A, 18B, the P clusters 14A, 14B, and the GPUs 16A, 16B may be coupled to the west IC 20A and the east IC 20B. The west IC 20A may be coupled to the memory controllers 12A through 12D, and the east IC 20B may be coupled to the memory controllers 12E through 12H.

The system shown in Figure 1 further includes a plurality of memory devices 28 coupled to the memory controllers 12A through 12H. In the example of Figure 1, four memory devices 28 are coupled to each memory controller 12A through 12H. Other embodiments may have more or fewer memory devices 28 coupled to a given memory controller 12A through 12H. Furthermore, different memory controllers 12A through 12H may have differing numbers of memory devices 28. The memory devices 28 may vary in capacity and configuration, or may be of consistent capacity and configuration (e.g., banks, bank groups, row size, ranks, etc.). Each memory device 28 may be coupled to its respective memory controller 12A through 12H via an independent channel in this implementation. Channels shared by two or more memory devices 28 may be supported in other embodiments. In one embodiment, the memory devices 28 may be mounted on the corresponding SOC 10 in a chip-on-chip (CoC) or package-on-package (PoP) implementation. In another embodiment, the memory devices 28 may be packaged with the SOC 10 in a multi-chip-module (MCM) implementation. In yet another embodiment, the memory devices 28 may be mounted on one or more memory modules such as single inline memory modules (SIMMs), dual inline memory modules (DIMMs), etc. In one embodiment, the memory devices 28 may be dynamic random access memory (DRAM), such as synchronous DRAM (SDRAM) and more particularly double data rate (DDR) SDRAM. In one embodiment, the memory devices 28 may be implemented to the low power (LP) DDR SDRAM specification, also known as mobile DDR (mDDR) SDRAM.

In one embodiment, the interconnects 20A, 20B may also be coupled to an off-SOC interface to another instance of the SOC 10, scaling the system to more than one SOC (e.g., more than one semiconductor die, where a given instance of the SOC 10 may be implemented on a single semiconductor die but multiple instances may be coupled to form a system). Thus, the system may be scalable to two or more semiconductor dies on which instances of the SOC 10 are implemented. For example, the two or more semiconductor dies may be configured as a single system in which the existence of multiple semiconductor dies is transparent to software executing on the single system. In one embodiment, the delays in die-to-die communication may be minimized, so that die-to-die communication typically does not incur significant additional latency compared to intra-die communication, as one aspect of software transparency for the multi-die system. In other embodiments, the communication fabric in the SOC 10 may not have physically distinct interconnects 20A, 20B, but rather may be a full interconnect (e.g., a full crossbar) between the source hardware agents in the system (which transmit memory requests) and the memory controllers 12A through 12H. Such embodiments may still include a logical notion of the interconnects 20A, 20B for hashing and routing purposes, in one embodiment.

The memory controller 12A is shown in greater detail in Figure 1 and may include a control circuit 24 and various internal buffer(s) 26. Other memory controllers 12B through 12H may be similar. The control circuit 24 is coupled to the internal buffers 26 and to the memory location configuration registers 22F (discussed below). Generally, the control circuit 24 may be configured to control access to the memory devices 28 to which the memory controller 12A is coupled, including controlling the channels to the memory devices 28, performing calibration, ensuring correct refresh, etc. The control circuit 24 may also be configured to schedule memory requests to attempt to minimize latency, maximize memory bandwidth, etc. In one embodiment, the memory controllers 12A through 12H may employ memory caches to reduce memory latency, and the control circuit 24 may be configured to access the memory cache for memory requests and to process hits and misses in the memory cache and evictions from the memory cache. In one embodiment, the memory controllers 12A through 12H may manage coherency for the memory attached to them (e.g., a directory-based coherency scheme), and the control circuit 24 may be configured to manage the coherency. A channel to a memory device 28 may comprise the physical connections to the device as well as low-level communication circuitry (e.g., physical layer (PHY) circuitry).

As illustrated in Figure 1, the I/O clusters 18A, 18B, the P clusters 14A, 14B, the GPUs 16A, 16B, and the memory controllers 12A through 12H include memory location configuration (MLC) registers (reference numerals 22A through 22H, 22J through 22N, and 22P). In some embodiments, the west and east ICs 20A, 20B may also include memory location configuration registers. Because the system includes multiple memory controllers 12A through 12H (and possibly multiple sets of memory controllers in multiple instances of the SOC 10), the address accessed by a memory request may be decoded (e.g., hashed) to determine the memory controller 12A through 12H, and eventually the specific memory device 28, that is mapped to the address. That is, the memory addresses may be defined within a memory address space that maps memory addresses to memory locations in the memory devices. A given memory address in the memory address space uniquely identifies a memory location in one of the memory devices 28 that is coupled to one of the plurality of memory controllers 12A through 12H. The MLC registers 22A through 22H, 22J through 22N, and 22P may be programmed to describe the mapping, such that hashing the memory address bits as specified by the MLC registers identifies the memory controller 12A through 12H, and eventually the memory device 28 (and, in one embodiment, the bank group and/or bank within the memory device 28), to which the memory request is directed.

There may be more than one MLC register in a given circuit. For example, there may be an MLC register for each level of granularity in a hierarchy of levels of granularity used to identify the memory controller 12A through 12H. The number of levels decoded by a given circuit may depend on how many levels of granularity the given circuit uses to determine how to route a memory request to the correct memory controller 12A through 12H, and in some cases on even lower levels of granularity within the correct memory controller 12A through 12H. The memory controllers 12A through 12H may include MLC registers for each level of the hierarchy, down to at least the specific memory device 28. Generally, the levels of granularity may be viewed as recursive powers of 2 of at least two of the plurality of memory controllers 12A through 12H. Accordingly, while the MLC registers 22A through 22H, 22J through 22N, and 22P are given the same general reference number, they may not all be the same set of registers. However, instances of the registers 22A through 22H, 22J through 22N, and 22P that correspond to the same level of granularity may be the same, and may be programmed consistently. Additional details are discussed further below.

The memory controllers 12A through 12H may be physically distributed over the integrated circuit die on which the SOC 10 is implemented. Thus, the memory controllers in the system may be physically distributed over multiple integrated circuit dies, and physically distributed within each integrated circuit die. That is, the memory controllers 12A through 12H may be distributed over the area of the semiconductor die on which the SOC 10 is formed. In Figure 1, for example, the locations of the memory controllers 12A through 12H within the SOC 10 may be representative of the physical locations of those memory controllers within the SOC 10 die area. Accordingly, determining the memory controller 12A through 12H to which a given memory request is mapped (the "targeted memory controller") may be used to route the memory request over the communication fabric in the SOC 10 to the targeted memory controller. The communication fabric may include, e.g., the west IC 20A and the east IC 20B, and may further include additional interconnect not shown in Figure 1. In other embodiments, the memory controllers 12A through 12H may not be physically distributed. Nevertheless, a hashing mechanism such as that described herein may be used to identify the targeted memory controller 12A through 12H.

The I/O clusters 18A, 18B, the P clusters 14A, 14B, and the GPUs 16A, 16B may be examples of hardware agents that are configured to access data in the memory devices 28 through the memory controllers 12A through 12H using memory addresses. Other hardware agents may be included as well. Generally, a hardware agent may be a hardware circuit that may be a source of a memory request (e.g., a read or a write request). The request is routed from the hardware agent to the targeted memory controller based on the contents of the MLC registers.

In one embodiment, memory addresses may be mapped over the memory controllers 12A through 12H (and the corresponding memory controllers in other instances of the SOC 10 included in the system) to distribute the data within a page throughout the memory system. Such a scheme may improve the bandwidth usage of the communication fabric and the memory controllers for applications that access most or all of the data in a page. That is, a given page within the memory address space may be divided into a plurality of blocks, and the plurality of blocks of the given page may be distributed over the plurality of memory controllers in the system. A page may be the unit of allocation of memory in a virtual memory system. That is, when memory is assigned to an application or other process/thread, the memory is allocated in units of pages. The virtual memory system creates a translation from the virtual addresses used by the application to physical addresses in the memory address space, which identify locations in the memory devices 28. Page sizes vary from embodiment to embodiment. For example, a 16 kilobyte (16 kB) page size may be used. Smaller or larger page sizes may be used (e.g., 4 kB, 8 kB, 1 megabyte (MB), 4 MB, etc.). In some embodiments, multiple page sizes are supported in a system concurrently. Generally, a page is aligned to a page-sized boundary (e.g., a 16 kB page is allocated on a 16 kB boundary, such that the least significant 14 address bits form an offset within the page and the remaining address bits identify the page).
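The page alignment described above can be illustrated with a short sketch. It assumes the 16 kB page size from the example; the names `PAGE_SIZE`, `OFFSET_BITS`, and `split_address` are hypothetical and chosen only for illustration.

```python
PAGE_SIZE = 16 * 1024                      # 16 kB page, as in the example above
OFFSET_BITS = PAGE_SIZE.bit_length() - 1   # 14 offset bits for a 16 kB page


def split_address(addr):
    """Return (page_number, offset_within_page) for a physical address.

    Hypothetical helper: the low OFFSET_BITS bits are the offset within the
    page, and the remaining upper bits identify the page.
    """
    return addr >> OFFSET_BITS, addr & (PAGE_SIZE - 1)


page, offset = split_address(0x12345)
print(page, hex(offset))
```

With a different page size (e.g., 4 kB), only `PAGE_SIZE` changes and the offset width follows from it.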

The number of blocks into which a given page is divided may be related to the number of memory controllers and/or memory channels in the system. For example, the number of blocks may be equal to the number of memory controllers (or the number of memory channels). In such an embodiment, if all of the data in the page is accessed, an equal number of memory requests may be sent to each memory controller/memory channel. Other embodiments may have a number of blocks equal to a multiple of the number of memory controllers, or to a fraction of the number of memory controllers (e.g., a power-of-2 fraction), such that a page is distributed over a subset of the memory controllers.
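As a rough illustration of the equal-distribution case, the sketch below divides a 16 kB page into one block per controller and counts the requests each controller receives when the whole page is read. The 64-byte request size and the simple block-index assignment are assumptions made for the example; the description itself uses a programmable hash rather than this fixed assignment.

```python
from collections import Counter

PAGE_SIZE = 16 * 1024          # 16 kB page from the description
NUM_CONTROLLERS = 8            # memory controllers 12A through 12H on one SOC
REQUEST_SIZE = 64              # hypothetical request size in bytes (assumption)
BLOCK_SIZE = PAGE_SIZE // NUM_CONTROLLERS   # one block per controller: 2 kB


def controller_for(offset):
    """Assumed stand-in mapping: block index within the page picks the controller."""
    return (offset // BLOCK_SIZE) % NUM_CONTROLLERS


# Access every request-sized chunk of one page and tally the targets.
requests_per_controller = Counter(
    controller_for(offset) for offset in range(0, PAGE_SIZE, REQUEST_SIZE)
)
print(dict(requests_per_controller))
```

Each of the eight controllers receives the same number of requests, which is the property the paragraph above describes.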

In one embodiment, the MLC registers may be programmed to map adjacent blocks of a page to memory controllers that are physically distant from each other within the SOC(s) 10 of the system. Accordingly, an access pattern in which consecutive blocks of a page are accessed may be distributed over the system, using different portions of the communication fabric and interfering with each other only minimally (or perhaps not at all). For example, memory requests to adjacent blocks may take different paths through the communication fabric, and thus would not consume the same fabric resources (e.g., portions of the interconnects 20A, 20B). That is, the paths may be at least partially non-overlapping. In some cases, the paths may be completely non-overlapping. Additional details regarding the distribution of memory accesses are provided below with regard to Figure 2. Maximizing the distribution of memory accesses may improve overall system performance by reducing overall latency and increasing bandwidth utilization. Additionally, flexibility in scheduling processes onto processors may be achieved, since similar performance may be obtained on any similar processor in any P cluster 14A, 14B.

The MLC registers 22A through 22H, 22J through 22N, and 22P may independently specify the address bits that are hashed to select each level of granularity in the system for a given memory address. For example, a first level of granularity may select the semiconductor die to which the memory request is routed. A second level of granularity may select a slice, which may be a set of memory controllers (e.g., the upper four memory controllers 12A, 12B and 12E, 12F may form one slice, and the lower four memory controllers 12C, 12D and 12G, 12H may form another slice). Other levels of granularity may include selecting a "side" (east or west in Figure 1) and a row within a slice. There may be additional levels of granularity within the memory controllers 12A through 12H, finally resulting in a selected memory device 28 (and perhaps a bank group and bank within the device 28, in one embodiment). Any number of levels of granularity may be supported in various embodiments. For example, if more than two dies are included, there may be one or more levels of granularity coarser than the die level, at which groups of dies are selected.
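The level-by-level selection described above can be sketched as a sequence of independent one-bit decisions. Everything specific in this sketch is an assumption made for illustration: the mask values in `LEVEL_MASKS`, the use of bit parity as a placeholder hash, and the assignment of rows within a slice in `CONTROLLERS` (only the slice and side groupings follow Figure 1 as described).

```python
# Hypothetical per-level masks standing in for programmed MLC register values.
LEVEL_MASKS = {
    "die":   (1 << 14) | (1 << 18),
    "slice": (1 << 15) | (1 << 19),
    "side":  (1 << 16) | (1 << 20),
    "row":   (1 << 17) | (1 << 21),
}

# (slice, side, row) -> controller. Slice 0 is the upper slice and side 0 is
# west; which controller is row 0 versus row 1 is an assumption.
CONTROLLERS = {
    (0, 0, 0): "12A", (0, 0, 1): "12B", (0, 1, 0): "12E", (0, 1, 1): "12F",
    (1, 0, 0): "12C", (1, 0, 1): "12D", (1, 1, 0): "12G", (1, 1, 1): "12H",
}


def parity(value):
    """Placeholder one-bit hash: 1 if an odd number of bits are set, else 0."""
    return bin(value).count("1") & 1


def decode(addr):
    """Evaluate each level's binary decision independently from its own mask."""
    return {level: parity(addr & mask) for level, mask in LEVEL_MASKS.items()}


def target(addr):
    """Return (die, controller) selected for an address under the assumed masks."""
    sel = decode(addr)
    return sel["die"], CONTROLLERS[(sel["slice"], sel["side"], sel["row"])]


print(target(0))
print(target((1 << 14) | (1 << 16) | (1 << 17)))
```

Because each level has its own mask, changing how one level is selected does not require touching the others, which mirrors the independent specification described in the text.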

The independent specification of address bits for each level of granularity may provide significant flexibility in the system. Additionally, changes to the design of the SOC 10 itself may be managed by using different programming in the MLC registers, so the hardware in the memory system and/or interconnect need not change to accommodate a different mapping of addresses to memory devices. Furthermore, the programmability in the MLC registers may allow memory devices 28 to be depopulated in a given product that includes the SOC(s) 10, reducing cost and power consumption if the full complement of memory devices 28 is not required in that product.

In one embodiment, each level of granularity is a binary determination: a result of binary zero from the hash selects one result at that level, and a result of binary one from the hash selects the other result. The hashes may be any combinatorial logic operation on the input bits selected for the levels by the programming of the MLC registers. In one embodiment, the hash may be an exclusive-OR reduction, in which the address bits are exclusive-ORed with each other to produce a binary output. Other embodiments may produce a multi-bit output value to select among more than two results.
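As an illustration only, the exclusive-OR reduction described above can be sketched in software as the parity of the address bits selected by a mask. The function name `xor_reduce_hash` and the example mask values are assumptions made for this sketch; the specification describes hardware logic and does not define a software interface.

```python
def xor_reduce_hash(address: int, mask: int) -> int:
    """XOR together the address bits selected by `mask`.

    Returns 0 or 1, i.e. the binary determination for one
    level of granularity.
    """
    selected = address & mask          # keep only the bits named by the mask
    return bin(selected).count("1") & 1  # parity == XOR reduction


# Hypothetical mask selecting address bits 1 and 2.
print(xor_reduce_hash(0b1011, 0b0110))
print(xor_reduce_hash(0b1111, 0b0110))
```

With this mask, only bits 1 and 2 of the address influence the result; all other address bits are ignored at that level.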

The internal buffers 26 in a given memory controller 12A-12H may be configured to store a significant number of memory requests. The internal buffers 26 may include static buffers such as transaction tables that track the status of various memory requests being processed in the given memory controller 12A-12H, as well as various pipeline stages through which the requests may flow as they are processed. The memory address accessed by a request may be a significant portion of the data describing the request, and thus may be a significant component of the power consumed in storing the requests and moving them through the various resources within the given memory controller 12A-12H. In one embodiment, the memory controllers 12A-12H may be configured to drop one address bit from each set of address bits (corresponding to each level of granularity) used to determine the targeted memory controller. In one embodiment, the remaining address bits, along with the fact that the request is at the targeted memory controller, may be used to recover the dropped address bits if needed. In some embodiments, the dropped bit may be an address bit that is not included in any other hash corresponding to any other level of granularity. Excluding the dropped bit from the other levels may allow the dropped bits to be recovered in parallel, since the operations are independent. If a given dropped bit is not excluded from the other levels, it may be recovered first and then used to recover the other dropped bits. Thus, the exclusion may be an optimization for recovery. Other embodiments may not require recovery of the original address, in which case the dropped bits need not be unique to each hash, or may recover the bits serially if exclusion is not implemented. The remaining address bits (without the dropped bits) may form a compressed pipe address that may be used internally in the memory controller for processing. The dropped address bits are not needed because the amount of memory in the memory devices 28 coupled to the given memory controller 12A-12H can be uniquely addressed using the compressed pipe address. The MLC registers 22A-22H, 22J-22N, and 22P may include, in one embodiment, registers programmable to identify the drop bits.

The SOC 10 in FIG. 1 includes a particular number of memory controllers 12A-12H, P clusters 14A-14B, GPUs 16A-16B, and I/O clusters 18A-18B. Generally, various embodiments may include any number of memory controllers 12A-12H, P clusters 14A-14B, GPUs 16A-16B, and I/O clusters 18A-18B, as desired. As mentioned above, the P clusters 14A-14B, the GPUs 16A-16B, and the I/O clusters 18A-18B generally comprise hardware circuits configured to implement the operation described herein for each component. Similarly, the memory controllers 12A-12H generally comprise hardware circuits (memory controller circuits) to implement the operation described herein for each component. The interconnect 20A-20B and other communication fabric generally comprise circuits to transport communications (e.g., memory requests) among the other components. The interconnect 20A-20B may comprise point-to-point interfaces, shared bus interfaces, and/or hierarchies of one or both interfaces. The fabric may be circuit-switched, packet-switched, etc.

FIG. 2 is a block diagram illustrating one embodiment of a plurality of memory controllers and their physical/logical arrangement on the SOC die(s). The memory controllers 12A-12H are illustrated for two instances of the SOC 10, shown as die 0 and die 1 in FIG. 2 (e.g., separated by short dotted line 30). Die 0 may be the portion illustrated above the dotted line 30, and die 1 may be the portion below the dotted line 30. The memory controllers 12A-12H on a given die may be divided into slices based on the physical location of the memory controllers 12A-12H. For example, in FIG. 2, slice 0 may include memory controllers 12A, 12B and 12E, 12F, physically located on one half of die 0 or die 1. Slice 1 may include memory controllers 12C, 12D and 12G, 12H, physically located on the other half of die 0 or die 1. The slices on a die are delineated by dashed lines 32 in FIG. 2. Within the slices, the memory controllers 12A-12H may be divided into rows based on physical location within the slice. For example, slice 0 of die 0 is shown in FIG. 2 to include two rows: the memory controllers 12A, 12E in row 0 are above dotted line 34, physically located on one half of the area occupied by slice 0. The memory controllers 12B, 12F in row 1 of slice 0 are below the dotted line 34, physically located on the other half of the area occupied by slice 0. Other slices may similarly be divided into rows. Additionally, a given memory controller 12A-12H may be reachable via either the west interconnect 20A or the east interconnect 20B.

Accordingly, to identify the given memory controller 12A-12H on a given die 0 or 1 to which a memory address is mapped, the memory address may be hashed at multiple levels of granularity. In this embodiment, the levels may include the die level, the slice level, the row level, and the side level (east or west). The die level may specify which of a plurality of integrated circuit dies includes the given memory controller. The slice level may specify which of a plurality of slices within the die includes the given memory controller, where the plurality of memory controllers on the die are logically divided into the plurality of slices based on physical location on the given integrated circuit die, and a given slice includes at least two of the plurality of memory controllers within the die. Within a given slice, the memory controllers may be logically divided into a plurality of rows based on physical location on the die, and more particularly within the given slice. The row level may specify which of the plurality of rows includes the given memory controller. A row may be divided into a plurality of sides, again based on physical location in the die, and more particularly within the given row. The side level may specify which side of a given row includes the given memory controller.

Other embodiments may include more or fewer levels, based on the number of memory controllers 12A-12H, the number of dies, etc. For example, an embodiment that includes more than two dies may include multiple levels of granularity to select the die (e.g., die groups may be used to group pairs of SOCs 10 in a four-die implementation, and the die level may select among the dies in the selected pair). Similarly, an implementation that includes four memory controllers per die instead of eight may eliminate one of the slice or row levels. An implementation that includes a single die rather than multiple dies may eliminate the die level.

At each of the levels of granularity, a binary determination is made based on a hash of a subset of the address bits to select one or the other result at that level. Thus, the hash may logically operate on the address bits to generate a binary output (one bit, either zero or one). Any logical function may be used for the hash. In one embodiment, for example, an exclusive-OR (XOR) reduction may be used, in which the hash XORs the subset of address bits together to produce the result. An XOR reduction may also provide reversibility of the hash. The reversibility may allow the recovery of the dropped bits by XORing the binary result with the address bits that were not dropped (one dropped bit per level). In particular, in one embodiment, the dropped address bit may be excluded from the subsets of address bits used for the other levels. Other bits in the hash may be shared between hashes, but not the bit that is to be dropped. While the XOR reduction is used in this embodiment, other embodiments may implement any logically reversible Boolean operation as the hash.

FIG. 3 is a block diagram of one embodiment of a binary decision tree to determine the memory controller 12A-12H (and die) that services a particular memory address (that is, the memory controller to which the particular memory address is mapped). The decision tree may include determining a die (reference numeral 40), a slice on the die (reference numeral 42), a row in the slice (reference numeral 44), and a side within the row (reference numeral 46). In one embodiment, there may be additional binary decisions to guide the processing of the memory request within the memory controller. For example, the embodiment of FIG. 3 may include a plane level 48 and a pipe level 50. The internal levels of granularity may map the memory request to the specific memory device 28 that stores the data affected by the memory request. That is, the finest level of granularity may be the level that maps to the specific memory device 28. The memory planes may be independent, allowing multiple memory requests to proceed in parallel. Additionally, the various structures included in the memory controller (e.g., a memory cache to cache data previously accessed in the memory devices 28, coherency control hardware such as duplicate tags or a directory, various buffers and queues, etc.) may be divided among the planes, and thus the structures may be smaller and easier to design to meet timing at a given frequency of operation. Accordingly, performance may be increased through both parallel processing and the higher achievable clock frequency for a given size of hardware structures. There may be additional internal levels of granularity within the memory controller as well, in other embodiments.

The binary decision tree illustrated in FIG. 3 is not intended to imply that the determinations of die level 40, slice level 42, row level 44, side level 46, plane level 48, and pipe level 50 are made serially. The logic to perform the determinations may operate in parallel, selecting the sets of address bits and performing the hashes to generate the resulting binary decisions.
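The independence of the per-level determinations can be sketched as follows: each level's decision depends only on the address and that level's own mask, so no level waits on another. The mask values in `EXAMPLE_MASKS` and the function names are hypothetical and chosen only for illustration; actual masks would come from the programming of the MLC registers.

```python
LEVELS = ("die", "slice", "row", "side", "plane", "pipe")

# Hypothetical per-level masks (bit positions chosen for illustration only).
EXAMPLE_MASKS = {
    "die":   (1 << 6) | (1 << 12),
    "slice": (1 << 7) | (1 << 12) | (1 << 13),
    "row":   (1 << 8) | (1 << 13),
    "side":  (1 << 9) | (1 << 14),
    "plane": (1 << 10) | (1 << 14),
    "pipe":  (1 << 11) | (1 << 15),
}


def parity(value: int) -> int:
    """XOR reduction of all bits in `value`."""
    return bin(value).count("1") & 1


def decode_levels(address: int, masks: dict) -> dict:
    """Compute every level's binary decision from the address alone.

    No decision feeds into another, which mirrors logic that can
    evaluate all levels in parallel rather than walking the tree.
    """
    return {level: parity(address & masks[level]) for level in LEVELS}


print(decode_levels(1 << 12, EXAMPLE_MASKS))
```

In this sketch, address bit 12 is shared by the die and slice hashes, so setting only that bit flips both decisions while leaving the other levels at zero.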

Returning to FIG. 2, the programmability of the address mapping to the memory controllers 12A-12H and the dies 0 and 1 may provide for a distribution of consecutive addresses among physically distant memory controllers 12A-12H. That is, if a source is accessing consecutive addresses of a page of memory, for example, the memory requests may be distributed over different memory controllers (at some address granularity). For example, consecutive cache blocks (e.g., aligned 64-byte or 128-byte blocks) may be mapped to different memory controllers 12A-12H. Coarser-granularity mappings may also be used (e.g., 256-byte, 512-byte, or 1-kilobyte blocks may be mapped to different memory controllers). That is, a number of consecutive memory addresses that access data in the same block may be routed to the same memory controller, and then the next number of consecutive memory addresses may be routed to a different memory controller.

Mapping consecutive blocks to physically distributed memory controllers 12A-12H may have performance benefits. For example, since the memory controllers 12A-12H are independent of each other, the bandwidth available in the set of memory controllers 12A-12H as a whole may be more fully utilized if a complete page is accessed. Additionally, in some embodiments, the routes of the memory requests in the communication fabric may be partially non-overlapped or fully non-overlapped. That is, for partially non-overlapped routes, at least one segment of the communication fabric that is part of the route for one memory request may not be part of the route for another memory request, and vice versa. Fully non-overlapped routes may use distinct, completely separate parts of the fabric (e.g., no segments may be the same). Thus, the traffic in the communication fabric may be spread out, and the requests may not interfere with each other as much as they otherwise might.

Accordingly, the MLC registers 22A-22H, 22J-22N, and 22P may be programmable with data that causes the circuitry to route a first memory request having a first address to a first memory controller of the plurality of memory controllers and to route a second memory request having a second address to a second memory controller of the plurality of memory controllers that is physically distant from the first memory controller, when the first address and the second address are adjacent addresses at a second level of granularity. In one embodiment, a first route of the first memory request through the communication fabric and a second route of the second memory request through the communication fabric are completely non-overlapped. In other cases, the first and second routes may be partially non-overlapped. The one or more registers may be programmable with data that causes the communication fabric to route a plurality of memory requests to consecutive addresses to different ones of the plurality of memory controllers, in a pattern that distributes the plurality of memory requests over physically distant memory controllers.

For example, in FIG. 2, the memory controllers 12A-12H on die 0 and die 1 are labeled MC 0 to MC 15. Beginning with address zero in a page, consecutive addresses at the level of granularity defined in the programming of the MLC registers 22A-22H, 22J-22N, and 22P may first access MC0 (memory controller 12A in die 0), then MC1 (memory controller 12G in die 1), MC2 (memory controller 12D in die 1), MC3 (memory controller 12F in die 0), MC4 (memory controller 12A in die 1), MC5 (memory controller 12G in die 0), MC6 (memory controller 12D in die 0), MC7 (memory controller 12F in die 1), MC8 (memory controller 12C in die 0), MC9 (memory controller 12E in die 1), MC10 (memory controller 12B in die 1), MC11 (memory controller 12H in die 0), MC12 (memory controller 12C in die 1), MC13 (memory controller 12E in die 0), MC14 (memory controller 12B in die 0), and then MC15 (memory controller 12H in die 1). If the second level of granularity is smaller than 1/N of a page size, where N is the number of memory controllers in the system (e.g., 16 in this embodiment), the next consecutive access after MC15 may return to MC0. While a more random access pattern may result in memory requests being routed to physically near memory controllers, the more common regular access patterns (even if a stride is involved in which one or more memory controllers are skipped in the above order) may be well distributed in the system.
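The access order listed above can be written out as a lookup table to make the wrap-around behavior concrete. This is a sketch only: in the described system the order results from the programmed hashes rather than from a table, and the 128-byte block size, the `MC_ORDER` name, and the `controller_for_address` helper are assumptions made for illustration.

```python
# (die, memory controller) for MC0..MC15, in the order given in the text.
MC_ORDER = [
    (0, "12A"), (1, "12G"), (1, "12D"), (0, "12F"),
    (1, "12A"), (0, "12G"), (0, "12D"), (1, "12F"),
    (0, "12C"), (1, "12E"), (1, "12B"), (0, "12H"),
    (1, "12C"), (0, "12E"), (0, "12B"), (1, "12H"),
]


def controller_for_address(address: int, block_bytes: int = 128):
    """Return (die, controller) for an address, assuming consecutive
    aligned blocks of `block_bytes` follow MC_ORDER and wrap after MC15."""
    block_index = address // block_bytes
    return MC_ORDER[block_index % len(MC_ORDER)]


for block in range(4):
    print(block, controller_for_address(block * 128))
```

All addresses within one aligned block land on the same controller, and the seventeenth block returns to MC0, matching the wrap-around described in the text.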

FIG. 4 is a block diagram illustrating one embodiment of a plurality of memory location configuration registers 60 and 62. Generally, the registers 60 in a given hardware agent may be programmable with data identifying which address bits are included in the hash at one or more of the plurality of levels of granularity. In the illustrated embodiment, the registers 60 may include a die register, a slice register, a row register, a side register, a plane register, and a pipe register corresponding to the previously described levels, as well as bank group (BankG) and bank registers that define the bank group and bank within the memory device 28 that stores the data (for an embodiment in which the DRAM memory devices have both bank groups and banks). It is noted that, while separate registers 60 are shown for each level of granularity in FIG. 4, other embodiments may combine two or more levels of granularity as fields within a single register, as desired.

The die register is shown in exploded view for one embodiment, and the other registers 60 may be similar. In the illustrated embodiment, the die register may include an invert field 66 and a mask field 68. The invert field 66 may be a bit with the set state indicating invert and the clear state indicating no invert (or vice versa, or a multi-bit value may be used). The mask field 68 may be a field of bits corresponding to respective address bits. For that level of granularity, the set state in a mask bit may indicate that the respective address bit is included in the hash, and the clear state may indicate that the respective address bit is excluded from the hash (or vice versa).

The invert field 66 may be used to specify that the result of the hash of the selected address bits is to be inverted. The inversion may permit additional flexibility in the determination of the memory controller. For example, programming a mask of all zeros results in a binary 0 at that level of granularity for any address, forcing the decision in the same direction each time. If a binary 1 is desired at a given level of granularity for any address, the mask may be programmed to all zeros and the invert bit may be set.
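A minimal sketch of how one level's register contents could be evaluated, assuming an XOR-reduction hash, a set mask bit meaning "included", and a set invert bit meaning "invert". The `LevelRegister` class and its field names are hypothetical stand-ins for the invert field 66 and mask field 68; they are not an interface defined by the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LevelRegister:
    """Software stand-in for one register 60 (one level of granularity)."""
    mask: int             # set bit => that address bit is included in the hash
    invert: bool = False  # set => the hash result is inverted

    def decide(self, address: int) -> int:
        # XOR-reduce the selected bits, then apply the optional inversion.
        hashed = bin(address & self.mask).count("1") & 1
        return hashed ^ int(self.invert)


always_zero = LevelRegister(mask=0, invert=False)
always_one = LevelRegister(mask=0, invert=True)
print(always_zero.decide(0xDEADBEEF), always_one.decide(0xDEADBEEF))
```

The two instances at the end correspond to the cases in the text: an all-zero mask forces a binary 0 for any address, and an all-zero mask with the invert bit set forces a binary 1.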

Each of the MLC registers 22A-22H, 22J-22N, and 22P may include a subset or all of the registers 60, depending on the hardware agent and the levels of granularity used by that hardware agent to route a memory request. Generally, a given hardware agent may employ all of the levels of granularity, down to the bank level (the curly brace labeled "Bank" in FIG. 4), if desired. However, some hardware agents need not implement that many levels of granularity. For example, a hardware agent may employ the die, slice, row, and side levels of granularity, delivering the memory requests to the targeted memory controller 12A-12H on the targeted die (the curly brace labeled "MC" in FIG. 4). The memory controller 12A-12H may handle the remaining hashing levels. Another hardware agent may have two routes to a given memory controller 12A-12H, one for each plane. Thus, such a hardware agent may employ the die, slice, row, side, and plane registers (the curly brace labeled "Plane" in FIG. 4). Yet another hardware agent may include the die, slice, row, side, and plane levels of granularity, as well as the pipe level, identifying the desired channel (the curly brace labeled "Channel" in FIG. 4). Thus, a first hardware agent may be programmable for a first number of the plurality of levels of granularity and a second hardware agent may be programmable for a second number of the plurality of levels of granularity, where the second number is different from the first number. In other embodiments, bank group, bank, and other intradevice levels of granularity may be specified differently than the other levels of granularity and thus may be separately defined registers not included in the registers 60. In still other embodiments, bank group, bank, and other intradevice levels of granularity may be fixed in hardware.

Another set of registers that may be included in some sets of MLC registers 22A-22H, 22J-22N, and 22P are the drop registers 62 shown in FIG. 4. Particularly, in one embodiment, the drop registers 62 may be included in the MLC registers 22F-22H and 22J-22N in the memory controllers 12A-12H. The drop registers 62 may include a register for each level of granularity and may be programmable to identify at least one address bit, in the subset of address bits corresponding to that level of granularity, that is to be dropped by the targeted memory controller 12A-12H. The specified bit is one of the bits specified in the corresponding register 60 as a bit included in the hash of that level of granularity. In one embodiment, the dropped address bit may be exclusively included in the hash for that level of granularity (e.g., the dropped address bit is not specified at any other level of granularity in the registers 60). Other bits included in a given hash may be shared with other levels of granularity, but the dropped bit may be unique to the given level of granularity. The drop registers 62 may be programmed in any way to indicate the address bit that is to be dropped (e.g., a bit number may be specified as a hexadecimal number, or a bit mask may be used as shown in FIG. 4). The bit mask may include a bit for each address bit (or each selectable address bit, if some address bits are not eligible for dropping). The bit mask may be a "one-hot" mask, in which there is one and only one set bit, which may indicate the selected drop bit. In other embodiments, a single bit mask in a single drop register 62 may specify a drop bit for each level of granularity and thus may not be a one-hot mask.
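The constraints on the drop registers described above (one-hot mask, drop bit taken from the level's own hash, and, in one embodiment, not used by any other level) can be expressed as a small configuration check. The function names and the dictionary layout are hypothetical; the exclusivity check reflects only the embodiment in which dropped bits are unique to a level.

```python
def is_one_hot(value: int) -> bool:
    """True if exactly one bit of `value` is set."""
    return value != 0 and (value & (value - 1)) == 0


def check_drop_config(masks: dict, drops: dict) -> list:
    """Return a list of problems found in a hypothetical register programming.

    masks: level name -> hash mask (registers 60)
    drops: level name -> drop bit mask (registers 62)
    """
    problems = []
    for level, drop in drops.items():
        if not is_one_hot(drop):
            problems.append(f"{level}: drop mask is not one-hot")
            continue
        if not (drop & masks[level]):
            problems.append(f"{level}: drop bit is not in this level's hash")
        for other, mask in masks.items():
            if other != level and (drop & mask):
                problems.append(f"{level}: drop bit is also hashed by {other}")
    return problems


masks = {"die": (1 << 6) | (1 << 12), "slice": (1 << 7) | (1 << 12)}
print(check_drop_config(masks, {"die": 1 << 6, "slice": 1 << 7}))
print(check_drop_config(masks, {"die": 1 << 12, "slice": 1 << 7}))
```

In the second call, bit 12 is shared between the die and slice hashes, so choosing it as the die drop bit violates the exclusivity assumed in this sketch.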

The memory controller may be programmed, via the drop registers 62, to specify the drop bits. The memory controller (and more particularly, the control circuit 24) may be configured to generate an internal address for each memory request (the "compressed pipe address" mentioned above, or more briefly the "compressed address") for use internally in the memory controller in the internal buffers 26 and to address the memory device 28. The compressed pipe address may be generated by dropping some or all of the specified address bits and shifting the remaining address bits together.
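The drop-and-shift step can be sketched in Python as follows. This is a minimal illustration, not the controller's actual logic: the names `is_one_hot` and `compress_address`, the `width` parameter, and the example bit positions are assumptions made here, and the combined drop mask is taken to be the OR of per-level one-hot masks such as those held in the drop registers 62.

```python
def is_one_hot(mask: int) -> bool:
    """Return True if exactly one bit of mask is set."""
    return mask != 0 and (mask & (mask - 1)) == 0


def compress_address(addr: int, drop_mask: int, width: int) -> int:
    """Drop the bits selected by drop_mask and shift the remaining bits together."""
    compressed = 0
    out_pos = 0
    for bit in range(width):
        if (drop_mask >> bit) & 1:
            continue  # this address bit is dropped
        compressed |= ((addr >> bit) & 1) << out_pos
        out_pos += 1
    return compressed


# Hypothetical per-level drop masks (one bit each), combined into one mask.
level_drop_masks = [0b00000010, 0b00001000]
assert all(is_one_hot(m) for m in level_drop_masks)
combined = 0
for m in level_drop_masks:
    combined |= m

print(bin(compress_address(0b10110110, combined, width=8)))  # prints 0b101110
```

With two bits dropped, the 8-bit example address is stored as a 6-bit value, so every buffer that holds a copy of the address is two bits narrower.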

As mentioned previously, the numerous internal buffers that hold copies of the address may save power when unnecessary address bits are removed. Additionally, with a reversible hash function, the dropped bits may be recovered to reconstruct the full address. The presence of a memory request in a given memory controller 12A-12H supplies the result of the hash at a given level of granularity, and hashing that result with the other address bits included in that level yields the dropped address bit. Recovering the full address may be useful when it is needed for a response to the request, for snoops for coherency reasons, etc.
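A sketch of this recovery, under stated assumptions, is shown below. It assumes the hash at each level is an XOR reduction of the masked address bits with an optional inversion, that each level drops exactly one bit that appears in its own hash mask, and that no level's hash mask includes another level's dropped bit. The function names, the tuple layout, and the example masks are hypothetical.

```python
def parity(value: int) -> int:
    """XOR-reduce the bits of value to a single bit."""
    return bin(value).count("1") & 1


def expand_address(compressed: int, drop_mask: int, width: int) -> int:
    """Reinsert a zero at every dropped position of a compressed address."""
    addr = 0
    src = 0
    for bit in range(width):
        if (drop_mask >> bit) & 1:
            continue  # leave the dropped position as 0 for now
        addr |= ((compressed >> src) & 1) << bit
        src += 1
    return addr


def recover_address(compressed: int, levels, width: int) -> int:
    """Rebuild the full address from a compressed address.

    levels is a list of (hash_mask, drop_mask, invert, result) tuples, where
    result is the hash output implied by the controller holding the request.
    """
    all_drops = 0
    for _, drop, _, _ in levels:
        all_drops |= drop
    addr = expand_address(compressed, all_drops, width)
    for hash_mask, drop, invert, result in levels:
        # XOR of the known result with the other hashed bits gives the dropped bit.
        others = parity(addr & hash_mask & ~drop)
        if others ^ int(invert) ^ result:
            addr |= drop  # the dropped bit must have been 1
    return addr


levels = [
    (0b00010110, 0b00000010, False, 1),  # hashes bits 1, 2, 4; drops bit 1
    (0b01011000, 0b00001000, False, 1),  # hashes bits 3, 4, 6; drops bit 3
]
print(bin(recover_address(0b101110, levels, width=8)))  # prints 0b10110110
```

The two levels share bit 4 in their hashes, which is permitted because only the dropped bit needs to be unique to its level.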

Turning now to FIG. 5, a flowchart illustrating operation of one embodiment of the SOCs during boot/power up is shown. For example, the operation illustrated in FIG. 5 may be performed by instructions executed by a processor (e.g., low-level boot code executed to initialize the system for execution of the operating system). Alternatively, all or a portion of the operation shown in FIG. 5 may be performed by hardware circuitry during boot. While the blocks are shown in a particular order for ease of understanding, other orders may be used. Blocks may be performed in parallel in combinatorial logic in the SOCs 10. Blocks, combinations of blocks, and/or the flowchart as a whole may be pipelined over multiple clock cycles.

The boot code may identify the SOC configuration (e.g., one or more chips including SOC 10 instances; SOC design differences such as a partial SOC that includes fewer memory controllers 12A-12H, or one of a plurality of SOC designs supported by the system; the memory devices 28 coupled to each memory controller 12A-12H; etc.) (block 70). Identifying the configuration may generally be an exercise in determining the number of destinations for memory requests (e.g., the number of memory controllers 12A-12H in the system, the number of planes in each memory controller 12A-12H, the number of memory controllers 12A-12H that will be enabled during use, etc.). A given memory controller 12A-12H could be unavailable during use, e.g., if the memory devices 28 are not populated at the given memory controller 12A-12H or if there is a hardware failure in the memory devices 28. In other cases, a given memory controller 12A-12H may be unavailable in certain test modes or diagnostic modes. Identifying the configuration may also include determining the total amount of memory available (e.g., the number of memory devices 28 coupled to each memory controller 12A-12H and the capacity of the memory devices 28).

These determinations may affect the size of the contiguous block within a page that is to be mapped to each memory controller 12A-12H, which represents a tradeoff between spreading the memory requests within a page among the memory controllers 12A-12H (and the SOC 10 instances, when more than one instance is provided) and the efficiencies that may be gained from grouping requests to the same addresses. The boot code may thus determine the block size to be mapped to each memory controller 12A-12H (block 72). In other modes, a linear mapping of addresses to memory controllers 12A-12H may be used (e.g., mapping the entirety of the memory devices 28 at a memory controller 12A-12H to one contiguous block of addresses in the memory address space), or a hybrid of interleaving at one or more levels of granularity and linear mapping at other levels may be used. The boot code may determine how to program the MLC registers 22A-22H, 22J-22N, and 22P to provide the desired mapping of addresses to memory controllers 12A-12H (block 74). For example, the mask registers 60 may be programmed to select the address bits at each level of granularity, and the drop registers 62 may be programmed to select the drop bit for each level of granularity.

FIG. 6 is a flowchart illustrating operation of various SOC components to determine the route for a memory request from a source component to the memory controller 12A-12H identified for that memory request. While the blocks are shown in a particular order for ease of understanding, other orders may be used. Blocks may be performed in parallel in combinatorial logic in the SOCs 10. Blocks, combinations of blocks, and/or the flowchart as a whole may be pipelined over multiple clock cycles.

The component may apply the registers 60 to the address of the memory request to determine the various levels of granularity, such as the die, slice, row, side, etc. (block 76). Based on the results at the levels of granularity, the component may route the memory request over the fabric to the identified memory controller 12A-12H (block 78).

FIG. 7 is a flowchart illustrating operation of one embodiment of a memory controller 12A-12H in response to a memory request. While the blocks are shown in a particular order for ease of understanding, other orders may be used. Blocks may be performed in parallel in combinatorial logic in the SOCs 10. Blocks, combinations of blocks, and/or the flowchart as a whole may be pipelined over multiple clock cycles.

The memory controller 12A-12H may use the plane, pipe, bank group, and bank mask registers 60 to identify the plane, pipe, bank group, and bank for the memory request (block 80). For example, the memory controller 12A-12H may logically AND the mask from the corresponding register 60 with the address, logically combine the resulting bits (e.g., an XOR reduction), and invert the result if indicated. The memory controller 12A-12H may use the drop masks from the drop registers 62 to drop the address bits specified for each level of granularity (e.g., die, slice, row, side, plane, pipe, bank group, and bank), and may shift the remaining address bits together to form the compressed pipe address (block 82). For example, the memory controller 12A-12H may mask the address with the logical AND of the inverses of the drop masks, and may shift the remaining bits together. Alternatively, the memory controller 12A-12H may simply shift the address bits together, naturally dropping the identified bits. The memory controller 12A-12H may perform the specified memory request (e.g., a read or write) (block 84) and may respond to the source (e.g., with read data, or with a write completion if the write is not a posted write). If the full address is needed for the response or for other reasons during processing, the full address may be recovered from the compressed pipe address, the contents of the registers 60 for each level, and the known result for each level that corresponds to the memory controller 12A-12H that received the memory request (block 86).
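The mask, XOR-reduce, and optional-invert sequence of block 80 can be illustrated with a short Python sketch. The `LevelRegister` type, the function names, and the mask values below are hypothetical and chosen only for the example; the sketch also assumes each level resolves to a single bit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LevelRegister:
    """Illustrative stand-in for one mask register: a bit mask plus an invert flag."""
    mask: int
    invert: bool = False


def hash_level(addr: int, reg: LevelRegister) -> int:
    """AND the mask with the address, XOR-reduce the result, invert if indicated."""
    bit = bin(addr & reg.mask).count("1") & 1
    return bit ^ int(reg.invert)


def decode(addr: int, registers: dict) -> dict:
    """Return the one-bit selection at every level of granularity."""
    return {name: hash_level(addr, reg) for name, reg in registers.items()}


# Hypothetical register programming for four levels.
registers = {
    "plane": LevelRegister(0x030),
    "pipe": LevelRegister(0x0C0, invert=True),
    "bank_group": LevelRegister(0x300),
    "bank": LevelRegister(0xC00),
}

print(decode(0xA50, registers))
# prints {'plane': 1, 'pipe': 0, 'bank_group': 1, 'bank': 1}
```

The same function could be applied by a source component with die, slice, row, and side masks to choose a route, since the text describes those levels as using the same registers 60.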

The large number of memory controllers 12A-12H in the system, and the large number of memory devices 28 coupled to the memory controllers 12A-12H, may be a significant source of power consumption in the system. At certain points during operation, a relatively small amount of memory may be in active use, and power could be conserved by disabling one or more slices of memory controllers/memory devices when accesses to those slices are infrequent. Disabling a slice may include any mechanism that reduces power consumption in the slice and that causes the slice to be unavailable until it is re-enabled. In one embodiment, data may be retained by the memory devices 28 while the slice is disabled. Accordingly, the power supply to the memory devices 28 may remain active, but the memory devices 28 may be placed in a lower power mode (e.g., DRAM devices may be placed in self-refresh mode, in which the devices internally generate refresh operations to retain data but are not accessible from the SOC 10 until self-refresh mode is exited). The memory controller(s) 12A-12H in the slice may also be in a low-power mode (e.g., clock gated). The memory controller(s) 12A-12H in the slice may be power gated, and thus may be powered up and reconfigured when the slice is enabled following a disable.

In one embodiment, software (e.g., a portion of the operating system) may monitor activity in the system to determine whether a slice or slices can be disabled. The software may also monitor attempts to access data in a slice while it is disabled, and may re-enable the slice as desired. Furthermore, in one embodiment, the monitor software may detect pages of data in a slice that are accessed at greater than a specified rate prior to disabling the slice, and may copy those pages to another slice that will not be disabled (remapping the virtual-to-physical address translations for those pages). Thus, some pages in the slice may remain available and may be accessed while the slice is disabled. The process of reallocating pages that are being accessed and disabling a slice is referred to herein as "folding" the slice. Re-enabling a folded slice may be referred to as "unfolding" the slice, and the re-enabling process may include remapping the previously reallocated pages to spread the pages across the available slices (and, if the data in the reallocated pages was modified while the slice was folded, copying the data to the reallocated physical page).

FIG. 8 is a flowchart illustrating operation of one embodiment of monitoring system operation to determine whether to fold or unfold memory. While the blocks are shown in a particular order for ease of understanding, other orders may be used. One or more code sequences ("code") comprising a plurality of instructions executed by one or more processors on the SOC(s) 10 may cause operations including the operations shown below. For example, memory monitor and fold/unfold code may include instructions which, when executed by the processors on the SOC(s) 10, cause the system including the SOCs to perform operations including the operations shown in FIG. 8.

The memory monitor and fold/unfold code may monitor conditions in the system to identify opportunities to fold a slice, or activity indicating that a folded slice is to be unfolded (block 90). Activity that may be monitored includes, for example, the access rates of the various pages included in a given slice. If the pages within a given slice are not accessed at a rate above a threshold rate (or a significant number of the pages are not accessed at a rate above the threshold rate), then the given slice may be a candidate for folding since the slice is often idle. Power states in the processors within the SOCs may be another factor monitored by the memory monitor and fold/unfold code, since processors in lower power states may access memory less frequently. In particular, processors that are in sleep states may not access pages of memory. Consumed bandwidth on the communication fabric in the SOC(s) 10 may be monitored. Other system factors may be monitored as well. For example, memory could be folded because the system detects that a battery supplying power is reaching a low state of charge. Another factor could be a change in power source, e.g., the system was connected to a continuous, effectively unlimited power source (e.g., a wall outlet) and was unplugged so that it is now relying on battery power. Another factor could be system temperature overload, power supply overload, or the like, where folding memory may reduce the thermal or electrical load. Any set of factors that indicate the activity level in the system may be monitored in various embodiments.
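The access-rate test for a fold candidate might look like the following Python sketch. The function name, the `min_idle_fraction` parameter, and the treatment of an empty slice as idle are assumptions for illustration; the description does not specify how "a significant number of the pages" is quantified, and real monitor code would weigh the other factors listed above as well.

```python
def is_fold_candidate(page_access_rates, threshold, min_idle_fraction=1.0):
    """Return True if enough pages in a slice are accessed at or below threshold.

    page_access_rates: per-page access rates for one slice (any consistent unit).
    min_idle_fraction: fraction of pages that must be idle; 1.0 means all pages.
    """
    if not page_access_rates:
        return True  # assumption: a slice with no tracked pages is idle
    idle = sum(1 for rate in page_access_rates if rate <= threshold)
    return idle / len(page_access_rates) >= min_idle_fraction


print(is_fold_candidate([0.1, 0.2], threshold=1.0))
print(is_fold_candidate([0.1, 5.0], threshold=1.0))
print(is_fold_candidate([0.1, 5.0], threshold=1.0, min_idle_fraction=0.5))
```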

If the activity indicates that one or more memory slices could be folded without a significant impact on performance (decision block 92, "yes" leg), the memory monitor and fold/unfold code may initiate a fold of at least one slice (block 94). If the activity indicates that demand for memory may be increasing (or may soon be increasing) (decision block 96, "yes" leg), the memory monitor and fold/unfold code may initiate an unfold (block 98).

In one embodiment, folding of slices may be gradual and may occur in phases. FIG. 9 is a flowchart illustrating one embodiment of a gradual fold of a slice. While the blocks are shown in a particular order for ease of understanding, other orders may be used. Code executed by one or more processors on the SOC(s) 10 may cause operations including the operations shown below.

The folding process may begin by determining a slice to fold (block 100). The slice may be selected by determining that it is the least frequently accessed of the slices, or is among the least frequently accessed. The slice may be selected randomly (in one embodiment, excluding slices that may be designated to remain active). The slice may be selected based on a lack of wired and/or copy-on-write pages (discussed below) in the slice, or because the slice has fewer wired and/or copy-on-write pages than other slices. The slice may be selected based on its relative independence from other folded slices (e.g., physical distance, lack of shared segments in the communication fabric with other folded slices, etc.). Any factor or factors may be used to determine the slice. The slice may be marked as folding. In one embodiment, the folding process may disable slices in powers of 2, matching the binary decision tree used for hashing. At least one slice may be designated as unfoldable, and may remain active to ensure that data is accessible in the memory system.
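One of the selection policies above, picking the least frequently accessed slice while skipping slices designated to remain active, can be sketched in Python as follows. The function name, the dictionary of access counts, and the tie-break on slice number are assumptions for illustration only.

```python
def choose_slice_to_fold(access_counts, non_foldable=frozenset(), folded=frozenset()):
    """Pick the least frequently accessed slice that is eligible for folding.

    access_counts: mapping of slice number -> observed access count.
    non_foldable: slices designated to remain active.
    folded: slices that are already folded.
    Returns the chosen slice number, or None if no slice is eligible.
    """
    candidates = {
        s: count
        for s, count in access_counts.items()
        if s not in non_foldable and s not in folded
    }
    if not candidates:
        return None
    # Ties are broken by the lower slice number (an arbitrary choice here).
    return min(candidates, key=lambda s: (candidates[s], s))


counts = {0: 5, 1: 900, 2: 40, 3: 12}
print(choose_slice_to_fold(counts, non_foldable={0}))  # prints 3
```

Slice 0 has the fewest accesses in this example but is skipped because it is marked as remaining active, so slice 3 is chosen.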

Initiating a fold may include inhibiting new memory allocations to physical pages in the folding slice. Thus, the memory monitor and fold/unfold code may communicate with the virtual memory page allocator code, which allocates physical pages for virtual pages that have not yet been mapped into memory, to cause the virtual memory page allocator to cease allocating physical pages in the slice (block 102). Deactivating/disabling the slice may also potentially wait for wired pages in the slice to become unwired. A wired page may be a page that is not permitted to be paged out by the virtual memory system. For example, pages of kernel code and pages of related data structures may be wired. When a copy-on-write page is allocated, it may be allocated to a slice that is to remain active and thus may not be allocated to a folding slice. Copy-on-write pages may be used to permit independent code sequences (e.g., processes, or threads within a process or processes) to share pages as long as none of the independent code sequences writes the pages. When an independent code sequence does generate a write, the write may cause the virtual memory page allocator to allocate a new page and copy the data to the newly allocated page.

Thus, the virtual memory page allocator may be aware of which physical pages are mapped to which slices. In one embodiment, when folding is used, a linear mapping of addresses to memory may be employed instead of spreading the blocks of each page across different memory controllers/memory. Alternatively, the mapping of addresses may be contiguous to a given slice, but the pages may be spread among the memory controllers/memory channels within the slice. In one particular embodiment, the address space may be mapped as a single contiguous block to each slice (e.g., one slice may be mapped to addresses 0 to slice_size-1, another slice may be mapped to addresses slice_size to 2*slice_size-1, etc.). Other mechanisms may use interleaving between page boundaries, or may map pages to a limited number of slices that may be folded/unfolded as a unit, etc.
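The contiguous-block mapping in the example above reduces to integer division. The Python sketch below illustrates it; the function names and the 1 GiB slice size are assumptions chosen for the example, not values from the description.

```python
def slice_of(addr: int, slice_size: int) -> int:
    """Return the slice that owns a physical address under a linear mapping."""
    return addr // slice_size


def slice_range(index: int, slice_size: int):
    """Return the inclusive (first, last) physical addresses of a slice."""
    return index * slice_size, (index + 1) * slice_size - 1


SLICE_SIZE = 1 << 30  # hypothetical 1 GiB slice

print(slice_of(SLICE_SIZE - 1, SLICE_SIZE))  # prints 0
print(slice_of(SLICE_SIZE, SLICE_SIZE))      # prints 1
print(slice_range(1, SLICE_SIZE))            # prints (1073741824, 2147483647)
```

Because a page never straddles a slice boundary under this mapping (assuming the slice size is a multiple of the page size), a page allocator can tell from the physical page address alone which slice a page belongs to.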

During the transition period in which a slice is being folded, the pages in the selected (folding) slice may be tracked over a period of time to determine which pages are actively accessed (block 104). For example, the access bits in the page table translations may be used to track which pages are being accessed (checking the access bits periodically and clearing them when checked, so that new accesses may be detected). Pages found to be active and dirty (the data has been modified since it was loaded into memory) may be moved to a slice that will remain active. That is, the pages may be remapped to a different slice by the virtual memory page allocator (block 106). Pages found to be active but clean (not modified since the initial load into memory) may optionally be remapped to a different slice (block 108). If an active but clean page is not remapped, an access to the page after the slice has been folded may cause the slice to be re-enabled/activated, which may limit the power savings that can be achieved. Thus, the general intent may be that actively accessed pages do not remain in a disabled/folded slice.
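The tracking and triage of blocks 104 to 108 can be sketched in Python as shown below. The `Page` record and the `pages_to_remap` function are hypothetical stand-ins for page table state; a real implementation would read and clear access bits in the page tables rather than in Python objects.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Illustrative page record; accessed/dirty mimic page table status bits."""
    number: int
    accessed: bool
    dirty: bool


def pages_to_remap(pages, remap_clean=True):
    """Return the page numbers to move out of a folding slice.

    Active and dirty pages are always selected; active but clean pages are
    selected only when remap_clean is True. Access bits are cleared after
    being checked so that later accesses can be detected.
    """
    selected = []
    for page in pages:
        if page.accessed and (page.dirty or remap_clean):
            selected.append(page.number)
        page.accessed = False  # clear so a new access sets the bit again
    return selected


pages = [Page(1, True, True), Page(2, True, False),
         Page(3, False, True), Page(4, False, False)]
print(pages_to_remap(pages, remap_clean=False))  # prints [1]
```

Calling the function repeatedly over the tracking period approximates the periodic check-and-clear described above; pages that never show a set access bit stay in the folding slice.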

Once the above is complete, the memory devices 28 (e.g., DRAMs) in the slice may be actively placed into self-refresh (block 110). Alternatively, the memory devices 28 may descend naturally into self-refresh because no accesses occur over time, relying on the power management mechanisms built into the memory controller 12A-12H hardware to cause the transition to self-refresh. Other types of memory devices may be actively placed in a low-power mode according to the definition of those devices (or may be allowed to descend naturally). Optionally, the memory controllers 12A-12H in the slice may be reduced to a lower power state due to the lack of traffic, but may continue to listen for memory requests and respond to them if they occur (block 112).

In one embodiment, if there is sufficiently high confidence that the data in the folded slice is not required, a hard fold may be applied as a more aggressive mode on top of the present folding. That is, the memory devices 28 may actually be powered off if there is no access to the folded slice over a prolonged period.

Unfolding (re-enabling or activating) a slice may be either gradual or rapid. Gradual unfolding may occur when the amount of active memory or bandwidth needed by the running applications is increasing and is approaching a threshold at which the currently active slices will be unable to meet the demand and would thus limit performance. Rapid unfolding may occur on a large memory allocation or a significant increase in bandwidth demand (e.g., if the display is turned on, a new application is launched, or a user engages with the system, such as by unlocking the system or otherwise interacting with it by pressing a button or other input device).

FIG. 10 is a flowchart illustrating one embodiment of unfolding a memory slice. While the blocks are shown in a particular order for ease of understanding, other orders may be used. Code executed by one or more processors on the SOC(s) 10 may cause operations including the operations shown below.

A slice to unfold may be selected (block 120), or multiple slices may be selected, such as a power-of-2 number of slices as discussed above. Any mechanism for selecting the slice or slices may be used. For example, if a memory access to a folded slice occurs, that slice may be selected. A slice may be selected randomly. A slice may be selected based on its relative independence from other non-folded slices (e.g., physical distance, lack of shared segments in the communication fabric with other non-folded slices, etc.). Any factor or combination of factors may be used to select a slice to unfold.

The power state of the memory controller(s) 12A-12H in the unfolding slice may optionally be increased, and/or the DRAMs may be actively caused to exit self-refresh (or another low-power mode, for other types of memory devices 28) (block 122). Alternatively, the memory controllers 12A-12H and the memory devices 28 may naturally transition to higher performance/power states in response to the arrival of memory requests directed to physical pages in the unfolding memory slice. The memory monitor and fold/unfold code may inform the virtual memory page allocator that the physical pages in the selected memory slice are available for allocation (block 124). Over time, the virtual memory page allocator may allocate pages in the selected memory slice to newly requested pages (block 126). Alternatively or in addition to allocating newly requested pages, the virtual memory page allocator may relocate pages that were previously allocated in the selected memory slice back to the selected memory slice. In other embodiments, the virtual memory page allocator may rapidly relocate pages to the selected slice.

A slice may be defined as previously described with regard to FIG. 2 (e.g., a slice may be coarser-grained than a row). In other embodiments, for the purposes of memory folding, a slice may be any size down to a single memory channel (e.g., a single memory device 28). Other embodiments may define a slice as one or more memory controllers 12A-12H. Generally, a slice is a physical memory resource to which a plurality of pages are mapped. In one embodiment, the mapping may be determined by the programming of the MLC registers 22A-22H, 22J-22N, and 22P. In another embodiment, the mapping may be fixed in hardware or may be programmable in another fashion.

In one embodiment, the choice of slice size may be based, in part, on the data capacity and bandwidth used by the low-power use cases of interest in the system. For example, the slice size may be chosen so that a single slice can sustain the system's primary display and has enough memory capacity to hold the operating system and a small number of background applications. Use cases may include, for example, watching a movie, playing music, or having the screen saver on while fetching email or downloading updates in the background.

FIG. 11 is a flowchart illustrating one embodiment of a method for folding a memory slice (e.g., to disable or deactivate the slice). While the blocks are shown in a particular order for ease of understanding, other orders may be used. Code executed by one or more processors on the SOC(s) 10 may cause operations including the operations shown below.

The method may include detecting whether a first memory slice of a plurality of memory slices in a memory system is to be disabled (decision block 130). If the detection indicates that the first memory slice is not to be disabled (decision block 130, "no" leg), the method may be complete. If the detection indicates that the first memory slice is to be disabled, the method may continue (decision block 130, "yes" leg). Based on detecting that the first memory slice is to be disabled, the method may include copying a subset of the physical pages within the first memory slice to another memory slice of the plurality of memory slices, where the data in the subset of physical pages is accessed at greater than a threshold rate (block 132). Based on detecting that the first memory slice is to be disabled, the method may include remapping the virtual addresses that correspond to the subset of physical pages to the other memory slice (block 134). The method may also include disabling the first memory slice based on detecting that the first memory slice is to be disabled (block 136). In one embodiment, disabling the first memory slice may include actively placing one or more dynamic random access memories (DRAMs) in the first memory slice into a self-refresh mode. In another embodiment, disabling the first memory slice may include permitting one or more DRAMs in the first memory slice to transition to a self-refresh mode due to a lack of access. In one embodiment, the memory system includes a plurality of memory controllers, and the physical memory resource includes at least one of the plurality of memory controllers. In another embodiment, the memory system includes a plurality of memory channels, and a given DRAM is coupled to one of the plurality of memory channels. A given memory slice includes at least one of the plurality of memory channels. For example, in one embodiment, a given memory slice is one memory channel of the plurality of memory channels.

In one embodiment, determining that the first memory slice is to be disabled may include detecting that an access rate to the first memory slice is lower than a first threshold, and identifying the subset of physical pages that is accessed more frequently than a second threshold. In one embodiment, the method may further include, based on detecting that the access rate is lower than the first threshold, disabling allocation of the plurality of physical pages corresponding to the first memory slice to virtual addresses in a memory allocator. The method may further include performing the identifying after disabling allocation of the plurality of physical pages. In one embodiment, the copying includes copying data from one or more physical pages of the subset that contain data that has been modified in the memory system to the other memory slice. In some embodiments, the copying further includes copying data from the remaining physical pages of the subset after copying the data from the one or more physical pages.
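The fold sequence of FIG. 11 (blocks 130 to 136) can be sketched in software as follows. This is only an illustrative model, not the disclosed code: the `Page` and `MemorySlice` classes, the `fold_slice` function, and the way rates are measured are all hypothetical names and assumptions introduced here. The sketch follows the order described above: check the slice access rate against a first threshold, disable allocation, identify the pages hotter than a second threshold, move modified pages before the remaining hot pages, then disable the slice.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    vaddr: int                # virtual address currently mapped to this page
    access_rate: float        # hypothetical unit: accesses per second
    modified: bool = False    # True if the page holds modified (dirty) data


@dataclass
class MemorySlice:
    name: str
    pages: list = field(default_factory=list)
    allocatable: bool = True  # may the allocator hand out pages from here?
    enabled: bool = True      # False models self-refresh / low-power state

    def access_rate(self) -> float:
        return sum(p.access_rate for p in self.pages)


def fold_slice(victim, target, first_threshold, second_threshold):
    """Model of blocks 130-136; returns the pages moved, or None if no fold."""
    # Decision block 130: only fold a slice whose access rate is low.
    if victim.access_rate() >= first_threshold:
        return None
    # Stop allocating from the victim before identifying hot pages.
    victim.allocatable = False
    hot = [p for p in victim.pages if p.access_rate > second_threshold]
    # Modified pages are copied first, then the remaining hot pages.
    ordered = [p for p in hot if p.modified] + [p for p in hot if not p.modified]
    for page in ordered:
        # Blocks 132/134: copy the data and remap the virtual address.
        # Moving the Page object between lists stands in for both steps.
        victim.pages.remove(page)
        target.pages.append(page)
    # Block 136: disable the slice (e.g., DRAMs enter self-refresh).
    victim.enabled = False
    return ordered


victim = MemorySlice("slice1", [
    Page(0x1000, 5.0),
    Page(0x2000, 0.1),
    Page(0x3000, 4.0, modified=True),
])
target = MemorySlice("slice0")
moved = fold_slice(victim, target, first_threshold=10.0, second_threshold=1.0)
```

In this model the cold page at `0x2000` stays in the folded slice; the text only requires that the frequently accessed subset be moved.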

In accordance with the above, a system may include one or more memory controllers coupled to one or more memory devices forming a memory system, where the memory system includes a plurality of memory slices, and a given memory slice of the plurality of memory slices is a physical memory resource to which a plurality of physical pages are mapped. The system may further include one or more processors, and a non-transitory computer-readable storage medium storing a plurality of instructions which, when executed by the one or more processors, cause the system to perform operations including the method outlined above. The non-transitory computer-readable storage medium is also an embodiment.

FIG. 12 is a flowchart illustrating one embodiment of a method for hashing an address to route a memory request for the address to a targeted memory controller and, in some cases, to a targeted memory device and/or a bank group and/or bank within the memory device. While the blocks are shown in a particular order for ease of understanding, other orders may be used. Various components of the SOC 10, such as source hardware agents, communication fabric components, and/or memory controller components, may be configured to perform portions or all of the method.

The method may include generating a memory request having a first address in a memory address space that is mapped to memory devices in a system having a plurality of memory controllers that are physically distributed over one or more integrated circuit die (block 140). In one embodiment, a given memory address in the memory address space uniquely identifies a memory location in one of the memory devices coupled to one of the plurality of memory controllers, a given page within the memory address space is divided into a plurality of blocks, and the plurality of blocks of the given page are distributed over the plurality of memory controllers. The method may further include hashing independently specified sets of address bits from the first address to direct the memory request to a first memory controller of the plurality of memory controllers, where the independently specified sets of address bits locate the first memory controller at a plurality of levels of granularity (block 142). The method may still further include routing the memory request to the first memory controller based on the hashing (block 144).
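The hashing step of block 142 can be illustrated with a small sketch. It assumes that each hash is an XOR reduction (parity) of the address bits selected by a mask; the paragraph above only says that independently specified sets of address bits are hashed, so the parity function, the mask values, and the names `parity`, `hash_level`, `route`, and `LEVEL_MASKS` are all hypothetical.

```python
def parity(value: int) -> int:
    """XOR of all bits in value (1 if an odd number of bits are set)."""
    return bin(value).count("1") & 1


def hash_level(addr: int, masks: list) -> int:
    """One hash output bit per mask; masks select the participating bits."""
    result = 0
    for i, mask in enumerate(masks):
        result |= parity(addr & mask) << i
    return result


def route(addr: int, level_masks: dict) -> dict:
    """Hash each granularity level independently from the same address."""
    return {level: hash_level(addr, masks) for level, masks in level_masks.items()}


# Hypothetical masks: each level has its own independently chosen bit set.
LEVEL_MASKS = {
    "die":   [(1 << 20) | (1 << 12)],
    "slice": [(1 << 21) | (1 << 13)],
    "row":   [(1 << 22) | (1 << 14)],
    "side":  [(1 << 23) | (1 << 15)],
}

example_addr = (1 << 20) | (1 << 21) | (1 << 13)
example_route = route(example_addr, LEVEL_MASKS)
```

Because each level draws on a different set of bits, consecutive blocks of a page can land on different controllers when low-order bits are included in the masks, which matches the distribution goal stated above.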

In one embodiment, the one or more integrated circuit die are a plurality of integrated circuit die; the plurality of levels of granularity include a die level; and the die level specifies which of the plurality of integrated circuit die includes the first memory controller. In one embodiment, the plurality of memory controllers on a given integrated circuit die are logically divided into a plurality of slices based on physical location on the given integrated circuit die; at least two memory controllers of the plurality of memory controllers are included in a given slice of the plurality of slices; the plurality of levels of granularity include a slice level; and the slice level specifies which of the plurality of slices includes the first memory controller. In one embodiment, the at least two memory controllers in the given slice are logically divided into a plurality of rows based on physical location on the given integrated circuit die; the plurality of levels of granularity include a row level; and the row level specifies which of the plurality of rows includes the first memory controller. In one embodiment, the plurality of rows include a plurality of sides based on physical location on the given integrated circuit die; the plurality of levels of granularity include a side level; and the side level specifies which side of a given row of the plurality of rows includes the first memory controller. In one embodiment, a given hardware agent of a plurality of hardware agents that generate memory requests includes one or more registers, and the method further includes programming the one or more registers with data identifying which address bits are included in the hash at one or more of the plurality of levels of granularity. In one embodiment, a first hardware agent of the plurality of hardware agents is programmable for a first number of the plurality of levels of granularity, and a second hardware agent of the plurality of hardware agents is programmable for a second number of the plurality of levels of granularity, where the second number is different from the first number. In one embodiment, a given memory controller of the plurality of memory controllers includes one or more registers programmable with data identifying which address bits are included in the plurality of levels of granularity and in one or more other levels of granularity internal to the given memory controller.
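As a rough illustration of agents that are programmable for different numbers of granularity levels, the sketch below models each agent's registers as a dictionary of masks. The `Agent` class, the level names in `LEVEL_ORDER`, the single-bit parity hash, and the mask values are assumptions made for this example; the disclosure does not give register layouts here.

```python
LEVEL_ORDER = ("die", "slice", "row", "side")


class Agent:
    """Hypothetical hardware agent holding per-level hash mask registers."""

    def __init__(self, name: str, level_masks: dict):
        self.name = name
        # Only the levels present in this dict are resolved by this agent.
        self.level_masks = dict(level_masks)

    def locate(self, addr: int) -> dict:
        """Return the hash result for each level this agent is programmed for."""
        located = {}
        for level in LEVEL_ORDER:
            if level in self.level_masks:
                selected = addr & self.level_masks[level]
                located[level] = bin(selected).count("1") & 1  # parity
        return located


# Hypothetical register contents shared by both agents.
MASKS = {
    "die": 0x0010_1000,
    "slice": 0x0020_2000,
    "row": 0x0040_4000,
    "side": 0x0080_8000,
}

# One agent resolves all four levels; another resolves only the first two
# and would rely on downstream components for the remaining levels.
cpu_agent = Agent("cpu", MASKS)
io_agent = Agent("io", {k: MASKS[k] for k in ("die", "slice")})

addr = 0x0030_4000
full = cpu_agent.locate(addr)
partial = io_agent.locate(addr)
```

Both agents agree on the levels they share because they are programmed with the same masks for those levels; that consistency is what lets a request be routed coarsely at the source and refined closer to the memory controllers.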

FIG. 13 is a flowchart illustrating one embodiment of a method for dropping address bits to form a compressed pipe address in a memory controller. While the blocks are shown in a particular order for ease of understanding, other orders may be used. The memory controller may be configured to perform portions or all of the method.

The method may include receiving an address comprising a plurality of address bits at a first memory controller of a plurality of memory controllers in a system. The address is routed to the first memory controller, and a first memory device of a plurality of memory devices controlled by the first memory controller is selected, based on a plurality of hashes of sets of the plurality of address bits (block 150). The method may further include dropping two or more of the plurality of address bits (block 152). A given dropped bit is included in one of the plurality of hashes and is excluded from the remaining hashes. The method may include shifting the remaining address bits to form a compressed address used within the first memory controller (block 154).
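Blocks 152 and 154 amount to removing selected bit positions and packing the remaining bits together. A minimal sketch of that operation follows; the function name `compress`, the fixed `width` parameter, and the example values are hypothetical, and a hardware implementation would use shift/multiplex logic rather than a loop.

```python
def compress(addr: int, drop_mask: int, width: int) -> int:
    """Drop the bits set in drop_mask and shift the rest down to close gaps."""
    compressed = 0
    out_pos = 0
    for bit in range(width):
        if (drop_mask >> bit) & 1:
            continue  # this address bit is dropped
        compressed |= ((addr >> bit) & 1) << out_pos
        out_pos += 1
    return compressed


# Drop bits 2 and 4 of an 8-bit example address.
example = compress(0b1011_0110, drop_mask=0b0001_0100, width=8)  # 0b101010
```

Each dropped bit shortens the address carried through the memory controller by one bit, which is the source of the power saving mentioned in the abstract.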

In one embodiment, the method may further include recovering the dropped address bits based on the sets of address bits used in the plurality of hashes and an identification of the first memory controller. In one embodiment, the method may further include accessing a memory device controlled by the memory controller based on the compressed address. In one embodiment, the method may further include programming a plurality of configuration registers to identify the sets of address bits that are included in respective ones of the plurality of hashes. In one embodiment, the programming may include programming the plurality of configuration registers with bit masks that identify the sets of address bits. In one embodiment, the method further includes programming a plurality of configuration registers to identify the address bits that are dropped. In one embodiment, the programming includes programming the plurality of configuration registers with one-hot bit masks.
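Recovery of the dropped bits works because each dropped bit participates in exactly one hash, and the result of that hash is implied by which memory controller received the request. The sketch below assumes parity hashes and represents each hash as a tuple of (hash mask, one-hot drop mask, known hash value); these structures and the names `expand` and `recover` are hypothetical illustrations rather than the disclosed register format.

```python
def parity(value: int) -> int:
    return bin(value).count("1") & 1


def compress(addr: int, drop_mask: int, width: int) -> int:
    """Drop the bits set in drop_mask and pack the remaining bits."""
    out, pos = 0, 0
    for bit in range(width):
        if not (drop_mask >> bit) & 1:
            out |= ((addr >> bit) & 1) << pos
            pos += 1
    return out


def expand(caddr: int, drop_mask: int, width: int) -> int:
    """Inverse of compress, with every dropped position filled with 0."""
    out, src = 0, 0
    for bit in range(width):
        if not (drop_mask >> bit) & 1:
            out |= ((caddr >> src) & 1) << bit
            src += 1
    return out


def recover(caddr: int, hashes: list, width: int) -> int:
    """hashes: (hash_mask, one_hot_drop_mask, hash_value) per dropped bit.

    Assumes each dropped bit appears in exactly one hash mask.
    """
    drop_mask = 0
    for _, one_hot, _ in hashes:
        drop_mask |= one_hot
    addr = expand(caddr, drop_mask, width)
    for hash_mask, one_hot, hash_value in hashes:
        # Parity of the surviving bits in this hash; the dropped bit must
        # make up the difference to reach the known hash value.
        if parity(addr & hash_mask & ~one_hot) != hash_value:
            addr |= one_hot
    return addr


# Hypothetical 8-bit configuration: bit 2 is dropped from the first hash,
# bit 4 from the second, and neither dropped bit appears in the other hash.
CONFIG = [(0b0010_0101, 0b0000_0100), (0b1101_0000, 0b0001_0000)]
DROP = 0b0001_0100


def round_trip(addr: int) -> int:
    known = [(m, d, parity(addr & m)) for m, d in CONFIG]
    return recover(compress(addr, DROP, 8), known, 8)
```

Here `round_trip` stands in for the full path: the hash values that selected the controller are retained as the controller's identity, the address is compressed for internal use, and the original address is rebuilt when it is needed again.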

Computer system

Turning next to FIG. 14, a block diagram of one embodiment of a system 700 is shown. In the illustrated embodiment, the system 700 includes at least one instance of a system on a chip (SOC) 10 coupled to one or more peripherals 704 and an external memory 702. A power supply (PMU) 708 is provided, which supplies the supply voltages to the SOC 10 as well as one or more supply voltages to the memory 702 and/or the peripherals 704. In some embodiments, as previously mentioned, more than one instance of the SOC 10 may be included (and more than one memory 702 may be included as well). In one embodiment, the memory 702 may include the memory devices 28 illustrated in FIG. 1.

The peripherals 704 may include any desired circuitry, depending on the type of system 700. For example, in one embodiment, the system 700 may be a mobile device (e.g., a personal digital assistant (PDA), smart phone, etc.) and the peripherals 704 may include devices for various types of wireless communication, such as Wi-Fi, Bluetooth, cellular, global positioning system, etc. The peripherals 704 may also include additional storage, including RAM storage, solid state storage, or disk storage. The peripherals 704 may include user interface devices such as a display screen, including touch display screens or multitouch display screens, a keyboard or other input devices, microphones, speakers, etc. In other embodiments, the system 700 may be any type of computing system (e.g., desktop personal computer, laptop, workstation, nettop, etc.).

The external memory 702 may include any type of memory. For example, the external memory 702 may be SRAM, dynamic RAM (DRAM) such as synchronous DRAM (SDRAM), double data rate (DDR, DDR2, DDR3, etc.) SDRAM, RAMBUS DRAM, low-power versions of DDR DRAM (e.g., LPDDR, mDDR, etc.), and so on. The external memory 702 may include one or more memory modules to which the memory devices are mounted, such as single inline memory modules (SIMMs), dual inline memory modules (DIMMs), etc. Alternatively, the external memory 702 may include one or more memory devices that are mounted on the SOC 10 in a chip-on-chip or package-on-package implementation.

As illustrated, the system 700 is shown to have application in a wide range of areas. For example, the system 700 may be used as part of the chips, circuitry, components, etc., of a desktop computer 710, a laptop computer 720, a tablet computer 730, a cellular or mobile phone 740, or a television 750 (or a set-top box coupled to a television). Also illustrated is a smartwatch and health monitoring device 760. In some embodiments, the smartwatch may include a variety of general-purpose computing related functions. For example, the smartwatch may provide access to email, cellphone service, a user calendar, and so on. In various embodiments, a health monitoring device may be a dedicated medical device or may otherwise include dedicated health-related functionality. For example, a health monitoring device may monitor a user's vital signs, track the proximity of a user to other users for the purpose of epidemiological social distancing, perform contact tracing, provide communication to an emergency service in the event of a health crisis, and so on. In various embodiments, the above-mentioned smartwatch may or may not include some or any health monitoring related functions. Other wearable devices are contemplated as well, such as devices worn around the neck, devices that are implantable in the human body, glasses designed to provide an augmented and/or virtual reality experience, and so on.

The system 700 may further be used as part of cloud-based service(s) 770. For example, the previously mentioned devices, and/or other devices, may access computing resources in the cloud (i.e., remotely located hardware and/or software resources). Still further, the system 700 may be used in one or more devices of a home other than those previously mentioned. For example, appliances within the home may monitor and detect conditions that warrant attention. For example, various devices within the home (e.g., a refrigerator, a cooling system, etc.) may monitor the status of the device and provide an alert to the homeowner (or, for example, a repair facility) if a particular event is detected. Alternatively, a thermostat may monitor the temperature in the home and may automate adjustments to a heating/cooling system based on a history of responses to various conditions by the homeowner. Also illustrated in FIG. 14 is the application of the system 700 to various modes of transportation. For example, the system 700 may be used in the control and/or entertainment systems of aircraft, trains, buses, cars for hire, private automobiles, waterborne vessels from private boats to cruise liners, scooters (for rent or owned), and so on. In various cases, the system 700 may be used to provide automated guidance (e.g., self-driving vehicles), general systems control, and otherwise. These and many other embodiments are possible and are contemplated. It is noted that the devices and applications illustrated in FIG. 14 are illustrative only and are not intended to be limiting. Other devices are possible and are contemplated.

Computer-readable storage medium

Turning now to FIG. 15, a block diagram of one embodiment of a computer accessible storage medium 800 is shown. Generally speaking, a computer accessible storage medium may include any storage media accessible by a computer during use to provide instructions and/or data to the computer. For example, a computer accessible storage medium may include storage media such as magnetic or optical media, e.g., disk (fixed or removable), tape, CD-ROM, DVD-ROM, CD-R, CD-RW, DVD-R, DVD-RW, or Blu-Ray. Storage media may further include volatile or non-volatile memory media such as RAM (e.g., synchronous dynamic RAM (SDRAM), Rambus DRAM (RDRAM), static RAM (SRAM), etc.), ROM, or flash memory. The storage media may be physically included within the computer to which the storage media provides instructions/data. Alternatively, the storage media may be connected to the computer. For example, the storage media may be connected to the computer over a network or wireless link, such as network attached storage. The storage media may be connected through a peripheral interface such as the Universal Serial Bus (USB). Generally, the computer accessible storage medium 800 may store data in a non-transitory manner, where non-transitory in this context refers to not transmitting the instructions/data on a signal. For example, non-transitory storage may be volatile (and may lose the stored instructions/data in response to a power down) or non-volatile.

The computer accessible storage medium 800 in FIG. 15 may store a database 804 representative of the SOC 10. Generally, the database 804 may be a database that can be read by a program and used, directly or indirectly, to fabricate the hardware comprising the SOC 10. For example, the database may be a behavioral-level description or register-transfer level (RTL) description of the hardware functionality in a high-level design language (HDL) such as Verilog or VHDL. The description may be read by a synthesis tool, which may synthesize the description to produce a netlist comprising a list of gates from a synthesis library. The netlist comprises a set of gates that also represent the functionality of the hardware comprising the SOC 10. The netlist may then be placed and routed to produce a data set describing geometric shapes to be applied to masks. The masks may then be used in various semiconductor fabrication steps to produce a semiconductor circuit or circuits corresponding to the SOC 10. Alternatively, the database 804 on the computer accessible storage medium 800 may be the netlist (with or without the synthesis library) or the data set, as desired.

While the computer accessible storage medium 800 stores a representation of the SOC 10, other embodiments may carry a representation of any portion of the SOC 10, as desired, including any subset of the components shown in FIG. 1. The database 804 may represent any portion of the above.

As illustrated in FIG. 15, the computer accessible storage medium 800 may further store one or more of a virtual memory page allocator 806 and memory monitor and fold/unfold code 808. The virtual memory page allocator 806 may include instructions which, when executed on a computer such as the various computer systems described herein that include one or more SOCs 10 (and more particularly when executed on a processor in one or more of the P clusters 14A-14B), cause the computer to perform operations including those described above for the virtual memory page allocator (e.g., with respect to FIGS. 8-11). Similarly, the memory monitor and fold/unfold code 808 may include instructions which, when executed on such a computer (and more particularly on a processor in one or more of the P clusters 14A-14B), cause the computer to perform operations including those described above for the memory monitor and fold/unfold code (e.g., with respect to FIGS. 8-11).

******

The present disclosure includes references to "an embodiment" or groups of "embodiments" (e.g., "some embodiments" or "various embodiments"). Embodiments are different implementations or instances of the disclosed concepts. References to "an embodiment," "one embodiment," "a particular embodiment," and the like do not necessarily refer to the same embodiment. A large number of possible embodiments are contemplated, including those specifically disclosed, as well as modifications or alternatives that fall within the spirit or scope of the disclosure.

This disclosure may discuss potential advantages that may arise from the disclosed embodiments. Not all implementations of these embodiments will necessarily manifest any or all of the potential advantages. Whether an advantage is realized for a particular implementation depends on many factors, some of which are outside the scope of this disclosure. In fact, there are a number of reasons why an implementation that falls within the scope of the claims might not exhibit some or all of any disclosed advantages. For example, a particular implementation might include other circuitry outside the scope of the disclosure that, in conjunction with one of the disclosed embodiments, negates or diminishes one or more of the disclosed advantages. Furthermore, suboptimal design execution of a particular implementation (e.g., implementation techniques or tools) could also negate or diminish disclosed advantages. Even assuming a skilled implementation, realization of advantages may still depend upon other factors, such as the environmental circumstances in which the implementation is deployed. For example, inputs supplied to a particular implementation may prevent one or more problems addressed in this disclosure from arising on a particular occasion, with the result that the benefit of its solution may not be realized. Given the existence of possible factors external to this disclosure, it is expressly intended that any potential advantages described herein are not to be construed as claim limitations that must be met to demonstrate infringement. Rather, identification of such potential advantages is intended to illustrate the type(s) of improvement available to designers having the benefit of this disclosure. That such advantages are described permissively (e.g., stating that a particular advantage "may arise") is not intended to convey doubt about whether such advantages can in fact be realized, but rather to recognize the technical reality that realization of such advantages often depends on additional factors.

Unless stated otherwise, embodiments are non-limiting. That is, the disclosed embodiments are not intended to limit the scope of claims that are drafted based on this disclosure, even where only a single example is described with respect to a particular feature. The disclosed embodiments are intended to be illustrative rather than restrictive, absent any statements in the disclosure to the contrary. The application is thus intended to permit claims covering the disclosed embodiments, as well as such alternatives, modifications, and equivalents that would be apparent to a person skilled in the art having the benefit of this disclosure.

For example, features in this application may be combined in any suitable manner. Accordingly, new claims may be formulated during prosecution of this application (or an application claiming priority thereto) to any such combination of features. In particular, with reference to the appended claims, features from dependent claims may be combined with those of other dependent claims where appropriate, including claims that depend from other independent claims. Similarly, features from respective independent claims may be combined where appropriate.

Accordingly, while the appended dependent claims may be drafted such that each depends on a single other claim, additional dependencies are also contemplated. Any combinations of features in the dependent claims that are consistent with this disclosure are contemplated and may be claimed in this or another application. In short, combinations are not limited to those specifically enumerated in the appended claims.

Where appropriate, it is also contemplated that claims drafted in one format or statutory type (e.g., apparatus) are intended to support corresponding claims of another format or statutory type (e.g., method).

******

Because this disclosure is a legal document, various terms and phrases may be subject to administrative and judicial interpretation. Public notice is hereby given that the following paragraphs, as well as the definitions provided throughout the disclosure, are to be used in determining how to interpret claims that are drafted based on this disclosure.

References to a singular form of an item (i.e., a noun or noun phrase preceded by "a," "an," or "the") are, unless context clearly dictates otherwise, intended to mean "one or more." Reference to "an item" in a claim thus does not, without accompanying context, preclude additional instances of the item. A "plurality" of items refers to a set of two or more of the items.

The word "may" is used herein in a permissive sense (i.e., having the potential to, being able to) and not in a mandatory sense (i.e., must).

The terms "comprising" and "including," and forms thereof, are open-ended and mean "including, but not limited to."

When the term "or" is used in this disclosure with respect to a list of options, it will generally be understood to be used in the inclusive sense unless the context provides otherwise. Thus, a recitation of "x or y" is equivalent to "x or y, or both," and thus covers 1) x but not y, 2) y but not x, and 3) both x and y. On the other hand, a phrase such as "either x or y, but not both" makes clear that "or" is being used in the exclusive sense.

A recitation of "w, x, y, or z, or any combination thereof" or "at least one of ... w, x, y, and z" is intended to cover all possibilities involving a single element up to the total number of elements in the set. For example, given the set [w, x, y, z], these phrasings cover any single element of the set (e.g., w but not x, y, or z), any two elements (e.g., w and x, but not y or z), any three elements (e.g., w, x, and y, but not z), and all four elements. The phrase "at least one of ... w, x, y, and z" thus refers to at least one element of the set [w, x, y, z], thereby covering all possible combinations in this list of elements. This phrase is not to be interpreted to require that there is at least one instance of w, at least one instance of x, at least one instance of y, and at least one instance of z.

Various "labels" may precede nouns or noun phrases in this disclosure. Unless context provides otherwise, different labels used for a feature (e.g., "first circuit," "second circuit," "particular circuit," "given circuit," etc.) refer to different instances of the feature. Additionally, the labels "first," "second," and "third," when applied to a feature, do not imply any type of ordering (e.g., spatial, temporal, logical, etc.), unless stated otherwise.

The phrase "based on" is used to describe one or more factors that affect a determination. This term does not foreclose the possibility that additional factors may affect the determination. That is, a determination may be based solely on the specified factors, or based on the specified factors as well as other, unspecified factors. Consider the phrase "determine A based on B." This phrase specifies that B is a factor that is used to determine A or that affects the determination of A. This phrase does not foreclose that the determination of A may also be based on some other factor, such as C. This phrase is also intended to cover an embodiment in which A is determined based solely on B. As used herein, the phrase "based on" is synonymous with the phrase "based at least in part on."

The phrases "in response to" and "responsive to" describe one or more factors that trigger an effect. This phrase does not foreclose the possibility that additional factors may affect or otherwise trigger the effect, either jointly with the specified factors or independently of the specified factors. That is, an effect may be solely in response to those factors, or may be in response to the specified factors as well as other, unspecified factors. Consider the phrase "perform A in response to B." This phrase specifies that B is a factor that triggers the performance of A, or that triggers a particular result for A. This phrase does not foreclose that performing A may also be in response to some other factor, such as C. This phrase also does not foreclose that performing A may be jointly in response to B and C. This phrase is also intended to cover an embodiment in which A is performed solely in response to B. As used herein, the phrase "responsive to" is synonymous with the phrase "responsive at least in part to." Similarly, the phrase "in response to" is synonymous with the phrase "at least in part in response to."

******

Within this disclosure, different entities (which may variously be referred to as "units," "circuits," other components, etc.) may be described or claimed as "configured" to perform one or more tasks or operations. This formulation, [entity] configured to [perform one or more tasks], is used herein to refer to structure (i.e., something physical). More specifically, this formulation is used to indicate that this structure is arranged to perform the one or more tasks during operation. A structure can be said to be "configured to" perform some task even if the structure is not currently being operated. Thus, an entity described or recited as being "configured to" perform some task refers to something physical, such as a device, a circuit, a system having a processor unit and a memory storing program instructions executable to implement the task, etc. This phrase is not used herein to refer to something intangible.

In some cases, various units/circuits/components may be described herein as performing a set of tasks or operations. It is understood that those entities are "configured to" perform those tasks/operations, even if not specifically noted.

The term "configured to" is not intended to mean "configurable to." An unprogrammed FPGA, for example, would not be considered to be "configured to" perform a particular function. This unprogrammed FPGA may, however, be "configurable to" perform that function. After appropriate programming, the FPGA may then be said to be "configured to" perform the particular function.

For purposes of United States patent applications based on this disclosure, reciting in a claim that a structure is "configured to" perform one or more tasks is expressly intended not to invoke 35 U.S.C. § 112(f) for that claim element. Should Applicant wish to invoke Section 112(f) during prosecution of a United States patent application based on this disclosure, it will recite claim elements using the "means for" [performing a function] construct.

Different "circuits" may be described in this disclosure. These circuits or "circuitry" constitute hardware that includes various types of circuit elements, such as combinatorial logic, clocked storage devices (e.g., flip-flops, registers, latches, etc.), finite state machines, memory (e.g., random access memory, embedded dynamic random access memory), programmable logic arrays, and so on. Circuitry may be custom designed or taken from standard libraries. In various implementations, circuitry can, as appropriate, include digital components, analog components, or a combination of both. Certain types of circuits may be commonly referred to as "units" (e.g., a decode unit, an arithmetic logic unit (ALU), a functional unit, a memory management unit (MMU), etc.). Such units also refer to circuits or circuitry.

The disclosed circuits/units/components and other elements illustrated in the drawings and described herein thus include hardware elements such as those described in the preceding paragraph. In many instances, the internal arrangement of hardware elements within a particular circuit may be specified by describing the function of that circuit. For example, a particular "decode unit" may be described as performing the function of "processing an opcode of an instruction and routing that instruction to one or more of a plurality of functional units," which means that the decode unit is "configured to" perform this function. This specification of function is sufficient, to those skilled in the computer arts, to connote a set of possible structures for the circuit.

In various embodiments, as discussed in the preceding paragraph, circuits, units, and other elements may be defined by the functions or operations that they are configured to implement. The arrangement of such circuits/units/components with respect to each other, and the manner in which they interact, form a microarchitectural definition of the hardware that is ultimately manufactured in an integrated circuit or programmed into an FPGA to form a physical implementation of the microarchitectural definition. Thus, the microarchitectural definition is recognized by those of skill in the art as structure from which many physical implementations may be derived, all of which fall into the broader structure described by the microarchitectural definition. That is, a skilled artisan presented with the microarchitectural definition supplied in accordance with this disclosure may, without undue experimentation and with the application of ordinary skill, implement the structure by coding the description of the circuits/units/components in a hardware description language (HDL) such as Verilog or VHDL. The HDL description is often expressed in a fashion that may appear to be functional. To those of skill in the art in this field, however, this HDL description is the manner that is used to transform the structure of a circuit, unit, or component to the next level of implementational detail. Such an HDL description may take the form of behavioral code (which is typically not synthesizable), register transfer language (RTL) code (which, in contrast to behavioral code, is typically synthesizable), or structural code (e.g., a netlist specifying logic gates and their connectivity). The HDL description may subsequently be synthesized against a library of cells designed for a given integrated circuit fabrication technology, and may be modified for timing, power, and other reasons to result in a final design database that is transmitted to a foundry to generate masks and ultimately produce the integrated circuit. Some hardware circuits or portions thereof may also be custom designed in a schematic editor and captured into the integrated circuit design along with synthesized circuitry. The integrated circuits may include transistors and other circuit elements (e.g., passive elements such as capacitors, resistors, inductors, etc.) and interconnect between the transistors and circuit elements. Some embodiments may implement multiple integrated circuits coupled together to implement the hardware circuits, and/or discrete elements may be used in some embodiments. Alternatively, the HDL design may be synthesized to a programmable logic array such as a field programmable gate array (FPGA) and may be implemented in the FPGA. This decoupling between the design of a group of circuits and the subsequent low-level implementation of these circuits commonly results in the scenario in which the circuit or logic designer never specifies a particular set of structures for the low-level implementation beyond a description of what the circuit is configured to do, as this process is performed at a different stage of the circuit implementation process.

The fact that many different low-level combinations of circuit elements may be used to implement the same specification of a circuit results in a large number of equivalent structures for that circuit. As noted, these low-level circuit implementations may vary according to changes in the fabrication technology, the foundry selected to manufacture the integrated circuit, the library of cells provided for a particular project, etc. In many cases, the choices made by different design tools or methodologies to produce these different implementations may be arbitrary.

Moreover, it is common for a single implementation of a particular functional specification of a circuit to include, for a given embodiment, a large number of devices (e.g., millions of transistors). Accordingly, the sheer volume of this information makes it impractical to provide a full recitation of the low-level structure used to implement a single embodiment, let alone the vast array of equivalent possible implementations. For this reason, the present disclosure describes the structure of circuits using the functional shorthand commonly employed in the industry.

Various embodiments are described herein, including those set forth in the following numbered examples:

1. A method comprising:

detecting that at least one first memory slice of a plurality of memory slices in a memory system is to be disabled, wherein a given memory slice of the plurality of memory slices is a physical memory resource to which a plurality of physical pages are mapped; and

based on detecting that the first memory slice is to be disabled:

copying a subset of physical pages within the first memory slice to another memory slice of the plurality of memory slices, wherein data in the subset of physical pages is being accessed at greater than a threshold rate;

remapping virtual addresses corresponding to the subset of physical pages to the other memory slice; and

disabling the first memory slice.

2. The method of example 1, wherein the at least one first memory slice is a plurality of the plurality of memory slices, and the number of memory slices in that plurality is a power of 2.

3. The method of example 1 or example 2, wherein disabling the first memory slice comprises actively placing one or more dynamic random access memories (DRAMs) within the first memory slice in a self-refresh mode.

4. The method of example 1 or example 2, wherein disabling the first memory slice comprises permitting one or more dynamic random access memories (DRAMs) within the first memory slice to transition to a self-refresh mode due to a lack of access.

5. The method of any of examples 1 to 4, wherein the memory system comprises a plurality of memory controllers, and the physical memory resource comprises at least one of the plurality of memory controllers.

6. The method of any of examples 1 to 4, wherein the memory system comprises a plurality of memory channels, a given dynamic random access memory (DRAM) is coupled to one of the plurality of memory channels, and the given memory slice comprises at least one of the plurality of memory channels.

7. The method of example 6, wherein the given memory slice is one memory channel of the plurality of memory channels.

8. The method of any of examples 1 to 7, wherein determining that the first memory slice is to be disabled comprises:

detecting that an access rate to the first memory slice is lower than a first threshold; and

identifying the subset of physical pages that is accessed more frequently than a second threshold.

9. The method of example 8, further comprising:

based on detecting that the access rate is lower than the first threshold, disabling allocation of the plurality of physical pages corresponding to the first memory slice to virtual addresses in a memory allocator.

10. The method of example 9, further comprising:

performing the identifying subsequent to disabling allocation of the plurality of physical pages.
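
The ordering recited in examples 8 to 10 can be sketched as follows. This is only an illustrative Python sketch under stated assumptions: the function name `select_fold_candidates`, the `allocation_disabled` set standing in for allocator state, and the dictionary of per-page access rates are all hypothetical, and the way access rates are measured is not modeled.

```python
def select_fold_candidates(slice_id, slice_access_rate, page_access_rates,
                           allocation_disabled, first_threshold, second_threshold):
    """Return the hot pages to migrate, or None if the slice stays enabled.

    page_access_rates maps a page identifier to its measured access rate.
    allocation_disabled is a set of slice identifiers the allocator must avoid.
    """
    # Example 8: the slice qualifies only if its access rate is below
    # the first threshold.
    if slice_access_rate >= first_threshold:
        return None
    # Example 9: stop allocating pages of this slice to virtual addresses.
    allocation_disabled.add(slice_id)
    # Example 10: identify the hot subset only after allocation is disabled,
    # so that no new pages land in the slice while it is being examined.
    return sorted(page for page, rate in page_access_rates.items()
                  if rate > second_threshold)
```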

11. The method of any of examples 1 to 10, wherein the copying comprises:

copying, to the other memory slice, data from one or more physical pages of the subset that include data that has been modified in the memory system.

12. The method of example 11, wherein the copying further comprises:

copying data from the remaining physical pages of the subset subsequent to copying the data from the one or more physical pages.
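
The copy ordering of examples 11 and 12 (pages holding modified data first, then the remaining pages of the subset) might be expressed as in the short Python sketch below. The function name `copy_order` and the `(page_id, is_modified)` tuple layout are assumptions made for illustration only.

```python
def copy_order(hot_pages):
    """Order the hot subset for copying: modified pages first, then the rest.

    hot_pages is a list of (page_id, is_modified) tuples; the relative order
    within each group is preserved.
    """
    modified = [page_id for page_id, is_modified in hot_pages if is_modified]
    remaining = [page_id for page_id, is_modified in hot_pages if not is_modified]
    return modified + remaining
```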

13. The method of any of examples 1 to 12, wherein determining that the first memory slice is to be disabled is based on detecting one or more factors in the system that indicate a reduction in power consumption.

14. A system comprising:

one or more memory controllers coupled to one or more memory devices forming a memory system, wherein the memory system includes a plurality of memory slices, and a given memory slice of the plurality of memory slices is a physical memory resource to which a plurality of physical pages are mapped;

one or more processors; and

a non-transitory computer readable storage medium storing a plurality of instructions which, when executed by the one or more processors, cause the system to perform operations comprising:

detecting that a first memory slice of the plurality of memory slices in the memory system is to be disabled; and

copying a subset of physical pages within the first memory slice to another memory slice of the plurality of memory slices, wherein data in the subset of physical pages is being accessed at greater than a threshold rate;

remapping virtual addresses corresponding to the subset of physical pages to the other memory slice; and

disabling the first memory slice.

15. The system of Example 14, wherein disabling the first memory slice comprises actively placing one or more dynamic random access memories (DRAMs) in the first memory slice in a self-refresh mode.

16. The system of Example 14, wherein disabling the first memory slice comprises permitting one or more dynamic random access memories (DRAMs) in the first memory slice to transition to a self-refresh mode due to a lack of access.

17. The system of any of Examples 14 to 16, wherein the memory system includes a plurality of memory controllers, and the physical memory resource comprises at least one of the plurality of memory controllers.

18. The system of any of Examples 14 to 17, wherein the memory system includes a plurality of memory channels, a given dynamic random access memory (DRAM) is coupled to one of the plurality of memory channels, and the given memory slice comprises at least one of the plurality of memory channels.

19. The system of any of Examples 14 to 18, wherein determining that the first memory slice should be disabled comprises:

detecting that an access rate to the first memory slice is lower than a first threshold; and

identifying the subset of physical pages that is accessed more frequently than a second threshold.

20. The system of Example 19,

further comprising, based on detecting that the access rate is lower than the first threshold, disabling allocation of the plurality of physical pages corresponding to the first memory slice to virtual addresses in a memory allocator.

21. The system of Example 20,

further comprising performing the identifying subsequent to disabling allocation of the plurality of physical pages.

22. A non-transitory computer-readable storage medium storing a plurality of instructions that, when executed by one or more processors in a system including a memory system, cause the system to perform operations comprising:

detecting that a first memory slice of a plurality of memory slices in the memory system should be disabled, wherein a given memory slice of the plurality of memory slices is a physical memory resource to which a plurality of physical pages are mapped; and

disabling the first memory slice.
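
As a non-limiting illustration of the folding operations recited in Examples 14 to 22, the following Python sketch models memory slices in software. The `Page` and `Slice` classes, the `fold_slice` function, the access-rate metric, and the choice to leave colder pages in the disabled slice are all assumptions made for this sketch and are not taken from the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    data: bytes
    access_rate: float  # hypothetical metric: accesses per unit time


@dataclass
class Slice:
    pages: dict = field(default_factory=dict)  # physical page number -> Page
    allocatable: bool = True   # may the allocator hand out pages from here?
    enabled: bool = True       # False models a disabled (e.g. self-refresh) slice

    def access_rate(self) -> float:
        return sum(p.access_rate for p in self.pages.values())


def fold_slice(slices, victim, target, page_table, slice_threshold, page_threshold):
    """Disable slices[victim] if it is accessed below slice_threshold.

    page_table maps virtual page -> (slice index, physical page number).
    Returns True when the slice was disabled.
    """
    src, dst = slices[victim], slices[target]
    if src.access_rate() >= slice_threshold:
        return False  # slice is still busy; do not fold

    # Stop allocating from the victim slice before scanning for hot pages.
    src.allocatable = False

    # Identify the subset of pages accessed more often than the second threshold.
    hot = [ppn for ppn, p in src.pages.items() if p.access_rate > page_threshold]

    # Copy the hot subset to the target slice.
    moved = {}
    for ppn in hot:
        new_ppn = max(dst.pages, default=-1) + 1
        dst.pages[new_ppn] = Page(src.pages[ppn].data, src.pages[ppn].access_rate)
        moved[ppn] = new_ppn

    # Remap virtual addresses that pointed at the copied pages.
    for vpn, (s, ppn) in list(page_table.items()):
        if s == victim and ppn in moved:
            page_table[vpn] = (target, moved[ppn])

    # Assumption of this sketch: colder pages stay behind in the disabled slice.
    src.enabled = False
    return True
```

In this sketch the allocator is switched off before the hot pages are identified, mirroring the ordering of Examples 20 and 21.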

23. A system comprising:

a plurality of memory controllers configured to control access to memory devices;

a plurality of hardware agents configured to access data in the memory devices using memory addresses; and

a communication fabric coupled to the plurality of memory controllers and the plurality of hardware agents, wherein:

the communication fabric is configured to route a memory request having a first memory address to a first memory controller of the plurality of memory controllers based on the first memory address;

a plurality of subsets of address bits of the first memory address are hashed to direct the memory request to the first memory controller at a plurality of granularity levels;

at least one address bit in a given subset of the plurality of subsets is not included in the remaining subsets of the plurality of subsets;

the first memory controller is configured to drop a plurality of address bits of the first memory address to form a second address used within the first memory controller; and

respective bits of the plurality of address bits are the at least one address bit in the given subset of the plurality of subsets.

24. The system of Example 23, wherein the hash is a logically reversible Boolean operation.

25. The system of Example 24, wherein the hash is an exclusive-OR (XOR) reduction of the address bits.

26. The system of any of Examples 23 to 25, wherein the first memory controller is configured to recover the dropped bits from the other address bits and an identification of the first memory controller to which the memory request is delivered.

27. The system of any of Examples 23 to 26, further comprising a plurality of configuration registers programmable to identify the plurality of subsets of address bits that are hashed at respective levels of the plurality of granularity levels.

28. The system of Example 27, wherein the plurality of configuration registers are programmable with bit masks that identify the address bits.

29. The system of any of Examples 23 to 28, further comprising a plurality of configuration registers programmable to identify the plurality of address bits that are dropped.

30. The system of Example 28 or Example 29, wherein the plurality of configuration registers are programmable with one-hot bit masks.

31. The system of any of Examples 23 to 30, wherein the plurality of memory controllers are physically distributed across one or more integrated circuit dies in the system, and a subset of the plurality of granularity levels is associated with a physical location of the first memory controller.

32. The system of Example 31, wherein the subset includes at least one die level that identifies the integrated circuit die on which the first memory controller is located.

33. The system of Example 31 or Example 32, wherein the subset includes a plurality of levels that identify a physical location within the integrated circuit die.

34. The system of any of Examples 31 to 33, wherein the subset includes a plurality of levels that identify which memory device of a plurality of memory devices controlled by the first memory controller stores data associated with the first memory address.
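
As a non-limiting illustration of the multi-level hashing of Examples 23 to 34, the following Python sketch selects a memory controller by taking an XOR reduction of a separately masked subset of address bits at each granularity level. The level names, the mask values, the 64-byte block size implied by starting at bit 6, and the function names `xor_reduce` and `route` are assumptions chosen for this sketch; in the described system the masks would be programmed into configuration registers.

```python
# Granularity levels, coarsest first (names are illustrative).
LEVELS = ("die", "slice", "row", "side")

# Hypothetical bit masks, one per level. Each mask has at least one address
# bit that no other mask uses (bits 6, 7, 8 and 9 respectively); bit 10 is
# deliberately shared by two levels to show that overlap is permitted.
MASKS = {
    "die":   (1 << 6) | (1 << 10) | (1 << 14),
    "slice": (1 << 7) | (1 << 10) | (1 << 15),
    "row":   (1 << 8) | (1 << 12) | (1 << 16),
    "side":  (1 << 9) | (1 << 13) | (1 << 17),
}


def xor_reduce(value: int) -> int:
    """XOR together all bits of value (that is, compute its parity)."""
    return bin(value).count("1") & 1


def route(address: int) -> tuple:
    """Return one selection bit per level; together they name a controller."""
    return tuple(xor_reduce(address & MASKS[level]) for level in LEVELS)


if __name__ == "__main__":
    # Consecutive 64-byte blocks land on different controllers.
    for block in range(4):
        print(hex(block * 64), route(block * 64))
```

Because the lowest hashed bit feeds the die level in this choice of masks, adjacent blocks alternate between dies, which is one way consecutive blocks of a page could be spread across physically distant memory controllers.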

35. A method comprising:

receiving an address comprising a plurality of address bits at a first memory controller of a plurality of memory controllers in a system, wherein the address is routed to the first memory controller, and a first memory device of a plurality of memory devices controlled by the first memory controller is selected, based on a plurality of hashes of sets of the plurality of address bits;

dropping a plurality of the plurality of address bits, wherein a given bit of the plurality of the plurality of address bits is included in one of the plurality of hashes and is excluded from the remaining ones of the plurality of hashes; and

shifting the remaining address bits of the plurality of address bits to form a compressed address used within the first memory controller.

36. The method of Example 35,

further comprising recovering the plurality of the plurality of address bits based on the sets of the plurality of address bits used in the plurality of hashes and an identification of the first memory controller.

37. The method of Example 35 or Example 36, further comprising accessing a memory device controlled by the memory controller based on the compressed address.

38. The method of any of Examples 35 to 37,

further comprising programming a plurality of configuration registers to identify the sets of the plurality of address bits included in respective ones of the plurality of hashes.

39. The method of Example 38, wherein the programming comprises programming the plurality of configuration registers with bit masks that identify the sets of the plurality of address bits.

40. The method of any of Examples 35 to 39,

further comprising programming a plurality of configuration registers to identify the plurality of the plurality of address bits that are dropped.

41. The method of Example 39 or Example 40, wherein the programming comprises programming the plurality of configuration registers with one-hot bit masks.

42. A memory controller comprising:

a plurality of configuration registers programmable to identify respective pluralities of address bits that are hashed to select the memory controller as a destination for a memory request that includes an address comprising the address bits; and

a control circuit coupled to the plurality of configuration registers, wherein the control circuit is configured to generate a compressed address to access one or more memory devices controlled by the memory controller, and the control circuit is configured to drop a plurality of the address bits and shift the remaining bits of the address to generate the compressed address.

43. The memory controller of Example 42, further comprising a second plurality of configuration registers coupled to the control circuit, wherein the second plurality of configuration registers are programmable to identify the plurality of address bits that are dropped.
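
As a non-limiting illustration of the compressed address of Examples 35 to 43, the following Python sketch drops one address bit per granularity level, shifts the remaining bits together, and later recovers the dropped bits from the remaining bits and the identification of the memory controller. The 20-bit address width, the mask values, the dropped bit positions, and every function name are assumptions chosen for this sketch. Recovery relies on two properties stated in the examples: the hash is an XOR reduction, which is reversible, and each dropped bit participates in exactly one hash.

```python
ADDRESS_BITS = 20  # assumed address width for this sketch
LEVELS = ("die", "slice", "row", "side")

# Hypothetical hash masks; bit 10 is shared, bits 6..9 are unique per level.
MASKS = {
    "die":   (1 << 6) | (1 << 10) | (1 << 14),
    "slice": (1 << 7) | (1 << 10) | (1 << 15),
    "row":   (1 << 8) | (1 << 12) | (1 << 16),
    "side":  (1 << 9) | (1 << 13) | (1 << 17),
}

# One dropped bit per level (equivalent to a one-hot mask per level).
DROPPED = {"die": 6, "slice": 7, "row": 8, "side": 9}
DROP_SET = set(DROPPED.values())


def xor_reduce(value: int) -> int:
    """XOR together all bits of value (its parity)."""
    return bin(value).count("1") & 1


def route(address: int) -> tuple:
    """One hash bit per level; the tuple identifies the memory controller."""
    return tuple(xor_reduce(address & MASKS[level]) for level in LEVELS)


def compress(address: int) -> int:
    """Drop the configured bits and shift the remaining bits down."""
    out, out_pos = 0, 0
    for pos in range(ADDRESS_BITS):
        if pos in DROP_SET:
            continue
        out |= ((address >> pos) & 1) << out_pos
        out_pos += 1
    return out


def expand(compressed: int) -> int:
    """Undo the shift, leaving zeros in the dropped bit positions."""
    out, in_pos = 0, 0
    for pos in range(ADDRESS_BITS):
        if pos in DROP_SET:
            continue
        out |= ((compressed >> in_pos) & 1) << pos
        in_pos += 1
    return out


def recover(compressed: int, controller_id: tuple) -> int:
    """Rebuild the full address from the compressed one and the controller id."""
    address = expand(compressed)
    for level, want in zip(LEVELS, controller_id):
        # The dropped bit is whatever makes this level's parity match the id.
        if xor_reduce(address & MASKS[level]) != want:
            address |= 1 << DROPPED[level]
    return address
```

With these assumed values a 20-bit address compresses to 16 bits, and `recover(compress(a), route(a))` returns `a` for any address `a` within the assumed width.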

Numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications.

Claims

1. A system comprising:
a plurality of memory controllers configured to control access to memory devices;
a plurality of hardware agents configured to access data in the memory devices using memory addresses, wherein:
the memory addresses are defined within a memory address space that maps memory addresses to memory locations within the memory devices,
a given memory address in the memory address space uniquely identifies a memory location within one of the memory devices coupled to one of the plurality of memory controllers,
a given page in the memory address space is divided into a plurality of blocks, and
the plurality of blocks of the given page are distributed across two or more of the plurality of memory controllers; and
a communication fabric coupled to the plurality of memory controllers and the plurality of hardware agents, wherein
the communication fabric is configured to route a memory request having a first memory address to a first memory controller of the plurality of memory controllers based on the first memory address,
wherein circuitry within the system is configured to hash independently-specified address bits of the first memory address to direct the memory request to the first memory controller at a plurality of levels of granularity.

2. The system of claim 1,
wherein a given level of the plurality of granularity levels selects between at least two memory controllers of the plurality of memory controllers.

3. The system of claim 1 or 2, wherein:
the plurality of memory controllers are physically distributed across one or more integrated circuit dies within the system;
the one or more integrated circuit dies are a plurality of integrated circuit dies;
the plurality of granularity levels include a die level; and
the die level specifies which of the plurality of integrated circuit dies includes the first memory controller.

4. The system of any one of claims 1 to 3, wherein:
the plurality of memory controllers are physically distributed across one or more integrated circuit dies within the system;
the plurality of memory controllers on a given integrated circuit die are logically divided into a plurality of slices based on physical locations on the given integrated circuit die;
at least two of the plurality of memory controllers are included in a given slice of the plurality of slices;
the plurality of granularity levels include a slice level; and
the slice level specifies which of the plurality of slices includes the first memory controller.

5. The system of claim 4, wherein:
the at least two memory controllers within the given slice are logically divided into a plurality of rows based on physical locations on the given integrated circuit die;
the plurality of granularity levels include a row level; and
the row level specifies which of the plurality of rows includes the first memory controller.

6. The system of claim 5, wherein:
the plurality of rows include a plurality of sides based on physical locations on the given integrated circuit die;
the plurality of granularity levels include a side level; and
the side level specifies which side of a given row of the plurality of rows includes the first memory controller.

7. The system of any one of claims 1 to 6, wherein a given hardware agent of the plurality of hardware agents comprises one or more registers programmable with data identifying which address bits are included in the hash at one or more of the plurality of granularity levels.

8. The system of claim 7, wherein a first hardware agent of the plurality of hardware agents is programmable for a first number of the plurality of granularity levels, and a second hardware agent of the plurality of hardware agents is programmable for a second number of the plurality of granularity levels, the second number being different from the first number.

9. The system of claim 7 or 8, wherein a given memory controller of the plurality of memory controllers comprises one or more registers programmable with data identifying which address bits are included in the plurality of granularity levels and in one or more other granularity levels internal to the given memory controller.

10. The system of claim 9, wherein the finest level of granularity identifies a memory device coupled to the given memory controller.

11. The system of any one of claims 7 to 10, wherein the one or more registers are programmable with data that causes the communication fabric to route a first memory request having a first address to a first memory controller of the plurality of memory controllers and to route a second memory request having a second address to a second memory controller of the plurality of memory controllers that is physically distant from the first memory controller, wherein the first address and the second address are adjacent addresses at a second granularity level.

12. The system of claim 11, wherein a first route of the first memory request through the communication fabric and a second route of the second memory request through the communication fabric are completely non-overlapping.

13. The system of any one of claims 7 to 10, wherein the one or more registers are programmable with data that causes the communication fabric to route a plurality of memory requests to consecutive addresses to different memory controllers of the plurality of memory controllers, in a pattern that distributes the plurality of memory requests across physically distant memory controllers.

14. The system of any one of claims 1 to 13, wherein:
the independently-specified address bits include a plurality of subsets of address bits corresponding to respective levels of the plurality of granularity levels;
at least one address bit in a given subset of the plurality of subsets is not included in the remaining subsets of the plurality of subsets;
the first memory controller is configured to drop a plurality of address bits of the first memory address to form a second address used within the first memory controller; and
respective bits of the plurality of address bits are the at least one address bit within the given subset of the plurality of subsets.

15. The system of claim 14, wherein the hash is a logically reversible Boolean operation.

16. The system of claim 15, wherein the hash is an exclusive-or (XOR) reduction of the address bits.

17. The system of claim 15 or 16, wherein the first memory controller is configured to recover the dropped bits from the other address bits and an identification of the first memory controller to which the memory request is directed.

18. A method comprising:
generating a memory request having a first address in a memory address space that is mapped to memory devices in a system having a plurality of memory controllers, wherein:
a given memory address within the memory address space uniquely identifies a memory location within one of the memory devices coupled to one of the plurality of memory controllers,
a given page in the memory address space is divided into a plurality of blocks, and
the plurality of blocks of the given page are distributed across two or more of the plurality of memory controllers;
directing the memory request to a first memory controller of the plurality of memory controllers by hashing independently-specified sets of address bits from the first address, wherein the independently-specified sets of address bits locate the first memory controller at a plurality of granularity levels; and
routing the memory request to the first memory controller based on the hashing.

19. The method of claim 18,
wherein a given level of the plurality of granularity levels selects between at least two memory controllers of the plurality of memory controllers.

20. The method of claim 18 or 19, wherein:
the plurality of memory controllers are physically distributed across one or more integrated circuit dies;
the one or more integrated circuit dies are a plurality of integrated circuit dies;
the plurality of granularity levels include a die level; and
the die level specifies which of the plurality of integrated circuit dies includes the first memory controller.

21. The method of any one of claims 18 to 20, wherein:
the plurality of memory controllers are physically distributed across one or more integrated circuit dies;
the plurality of memory controllers on a given integrated circuit die are logically divided into a plurality of slices based on physical locations on the given integrated circuit die;
at least two of the plurality of memory controllers are included in a given slice of the plurality of slices;
the plurality of granularity levels include a slice level; and
the slice level specifies which of the plurality of slices includes the first memory controller.

22. The method of claim 21, wherein:
the at least two memory controllers within the given slice are logically divided into a plurality of rows based on physical locations on the given integrated circuit die;
the plurality of granularity levels include a row level; and
the row level specifies which of the plurality of rows includes the first memory controller.

23. The method of claim 22, wherein:
the plurality of rows include a plurality of sides based on physical locations on the given integrated circuit die;
the plurality of granularity levels include a side level; and
the side level specifies which side of a given row of the plurality of rows includes the first memory controller.

24. The method of any one of claims 18 to 23, wherein a given hardware agent of a plurality of hardware agents that generate memory requests comprises one or more registers, the method further comprising programming the one or more registers with data identifying which address bits are included in the hash at one or more of the plurality of granularity levels.

25. The method of claim 24, wherein a first hardware agent of the plurality of hardware agents is programmable for a first number of the plurality of granularity levels, and a second hardware agent of the plurality of hardware agents is programmable for a second number of the plurality of granularity levels, the second number being different from the first number.

26. The method of claim 24 or 25, wherein a given memory controller of the plurality of memory controllers comprises one or more registers programmable with data identifying which address bits are included in the plurality of granularity levels and in one or more other granularity levels internal to the given memory controller.

27. The method of claim 18, further comprising:
detecting that at least one first memory slice of a plurality of memory slices in the system should be disabled, wherein a given memory slice of the plurality of memory slices is a physical memory resource to which a plurality of physical pages are mapped, and the at least one first memory slice includes two or more memory controllers of the plurality of memory controllers to which a given page of the plurality of pages is mapped; and
based on detecting that the first memory slice should be disabled:
copying a subset of physical pages in the first memory slice to another memory slice of the plurality of memory slices, wherein data in the subset of physical pages is being accessed at greater than a threshold rate;
remapping virtual addresses corresponding to the subset of physical pages to the other memory slice; and
disabling the first memory slice.