KR100987832B1 - System, apparatus and method for managing predictions of various access types to a memory associated with cache memory - Google Patents

System, apparatus and method for managing predictions of various access types to a memory associated with cache memory

Info

Publication number
KR100987832B1
Authority
KR
South Korea
Prior art keywords
address
prediction
addresses
predictions
trigger
Prior art date
Application number
KR1020077003839A
Other languages
Korean (ko)
Other versions
KR20070050443A (en)
Inventor
Radoslav Danilak
Brian Keith Langendorf
Dmitry Vyshetsky
Brad W. Simeral
Stefano A. Pescador
Ziyad S. Hakura
Original Assignee
NVIDIA Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to US10/921,026 (US7206902B2)
Priority to US10/920,682 (US7461211B2)
Priority to US10/920,995 (US7260686B2)
Priority to US10/920,610 (US7441087B2)
Application filed by NVIDIA Corporation
Publication of KR20070050443A
Application granted
Publication of KR100987832B1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/34 Addressing or accessing the instruction operand or the result; formation of operand address; addressing modes
    • G06F9/345 Addressing modes of multiple operands or results
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3802 Instruction prefetching
    • G06F9/3824 Operand accessing
    • G06F9/383 Operand prefetching
    • G06F9/3832 Value prediction for operands; operand history buffers

Abstract

Systems, apparatus, and methods are disclosed for predicting accesses to memory. In one embodiment, an exemplary apparatus includes a processor configured to execute program instructions and process program data, a memory containing the program instructions and program data, and a memory processor. The memory processor may include a speculator configured to receive an address associated with a program instruction or program data. Such a speculator may include a sequential predictor and a nonsequential predictor to generate configurable numbers of sequential and nonsequential addresses, respectively. In one embodiment, a prefetcher implements the apparatus. In various embodiments, the speculator can include any of a facilitator, a suppressor, a prediction inventory, an inventory filter, a post-inventory filter, and a data return cache memory, which can include short term and long term caches.
Sequential address predictor, non-sequential address predictor

Description

SYSTEM, APPARATUS AND METHOD FOR MANAGING PREDICTIONS OF VARIOUS ACCESS TYPES TO A MEMORY ASSOCIATED WITH CACHE MEMORY

Brief Description of the Invention

FIELD OF THE INVENTION The present invention relates generally to computing systems and, more particularly, to predicting sequential and nonsequential accesses to memory by generating a configurable amount of predictions, storing them, for example, in a prediction inventory and/or a multi-level cache, and suppressing and filtering unnecessary predictions.

Background of the Invention

A prefetcher is used to fetch program instructions and program data before they are needed, so that the retrieved information is immediately available to the processor. Prefetchers predict the instructions and data that a processor is likely to use in the future, so that the processor does not have to wait for instructions or data to be accessed from system memory, which typically operates at a slower rate than the processor. With a prefetcher implemented between the processor and system memory, the processor is less likely to remain idle while waiting for requested data from memory. As such, prefetchers generally improve processor performance.

In general, the more predictions a prefetcher generates, the more likely it is to have the necessary instructions and data available to the processor, and thus the more likely it is that processor latency will be reduced. However, conventional prefetchers typically lack sufficient management of the prediction process. Without such management, these prefetchers tend to overload computational and memory resources when the number of predicted addresses exceeds the amount the prefetcher can handle. Thus, to prevent resource overload, conventional prefetchers tend to be conservative in generating predictions, so as not to produce an amount of predictions that would overload either the prefetcher or the memory resources. In addition, conventional prefetchers typically generate predictions without considering the cost of implementing the prediction process, and so fail to weigh the benefits gained by streamlining that process against the amount of resources needed to support it. In particular, typical prefetchers rely on standard techniques for generating predictions that are sequential in nature, and they do not store predictions in a computationally or otherwise resource-conserving manner. Moreover, many conventional prefetchers lack the ability to manage predictions after they are generated and before the processor requests them. These prefetchers generally store prefetched data in a single cache memory that lacks functionality for limiting superfluous predictions, that is, predictions redundant with those already stored in the cache. The cache memory used in conventional prefetchers serves only to store data and is not designed to effectively manage the predicted addresses stored therein.

In view of the foregoing, it would be desirable to provide a system, apparatus, and method for efficiently predicting accesses to a memory. Ideally, such exemplary systems, apparatus, or methods would minimize or eliminate at least the drawbacks described above.

Summary of the Invention

Systems, apparatus, and methods are disclosed for predicting accesses to memory. In one embodiment, an exemplary apparatus includes a processor configured to execute program instructions and process program data, a memory including the program instructions and program data, and a memory processor. The memory processor may include a speculator configured to receive an address associated with a program instruction or program data. Such a speculator may include a sequential predictor for generating a configurable number of sequential addresses. The speculator may also include an out of order predictor configured to associate a subset of addresses with the received address. In addition, the out of order predictor may be configured to predict a group of addresses based on one or more addresses of the subset, wherein one or more addresses of the subset are not patternable with the received address. In one embodiment, an exemplary out of order predictor anticipates accesses to memory. The out of order predictor includes a prediction generator configured to generate an index and a tag from an address. The out of order predictor also includes a target cache coupled to the prediction generator. The target cache includes a number of memory portions, each having memory locations for storing trigger-target combinations. A trigger-target combination stored in a first portion of the memory is associated with a higher priority than other trigger-target combinations stored in a second portion of the memory.

In one embodiment of the invention, the apparatus includes a prediction inventory comprising queues, each configured to hold a group of items. Typically, a group of items is associated with a triggering address corresponding to that group, and each item in the group is a prediction of one type. The apparatus also includes an inventory filter configured to compare a number of predictions against one or more queues holding predictions of the same type. In some cases, the inventory filter is configured to compare the number of predictions against one or more other queues holding predictions of different types. For example, a number of forward sequential predictions may be filtered against a back queue, and the like. In at least one embodiment, the apparatus includes a data return cache memory to manage predicted accesses to the memory. The data return cache memory can include, for example, a short term cache memory configured to store predictions having an age less than a threshold, and a long term cache memory configured to store predictions having a lifetime greater than or equal to the threshold. The long term cache memory typically has a greater memory capacity than the short term cache memory. In addition, the prefetcher can include an interface configured to detect in parallel, over one or two cycles of operation, whether a number of predictions are stored in the short term cache memory, the long term cache memory, or both, wherein the interface uses two or more representations of the multiple predictions when examining the short term cache memory and the long term cache memory, respectively.

Brief description of the drawings

The invention is more fully understood from the following detailed description taken in conjunction with the accompanying drawings.

1 is a block diagram illustrating an exemplary speculator implemented in a memory processor, in accordance with certain embodiments of the present invention.

2 is a diagram of an exemplary speculator, according to one embodiment of the invention.

3A is a diagram of an exemplary forward sequential predictor, in accordance with certain embodiments of the present invention.

3B is a diagram of an exemplary blind back sequential predictor, in accordance with certain embodiments of the present invention.

FIG. 3C is a diagram of an exemplary back sector sequential predictor, in accordance with certain embodiments of the present invention.

3D is a diagram illustrating the operation of an exemplary reverse sequential predictor in accordance with certain embodiments of the present invention.

4 is a diagram illustrating an exemplary out of order predictor, in accordance with an embodiment of the invention.

5 is a diagram illustrating an example technique for suppressing out of order prediction for a stream of interleaved sequential addresses, in accordance with an embodiment of the present invention.

6 is a diagram illustrating an exemplary technique for suppressing out of order prediction for an interleaved sequential address across multiple threads, in accordance with an embodiment of the present invention.

7 is a diagram illustrating another technique for suppressing out of order prediction based on arrival times of a reference address and out of order address, in accordance with certain embodiments of the present invention.

8 is a diagram illustrating an exemplary technique for facilitating generation of prediction, in accordance with certain embodiments of the present invention.

9 is a diagram illustrating another exemplary speculator including a predictive filter, in accordance with an embodiment of the present invention.

10 is a block diagram illustrating a prefetcher implementing an exemplary out of order predictor, in accordance with certain embodiments of the present invention.

11 is a diagram illustrating an exemplary out of order predictor in accordance with an embodiment of the present invention.

12 is a diagram illustrating an exemplary prediction generator, in accordance with an embodiment of the present invention.

13 is a diagram illustrating an exemplary priority adjuster, in accordance with certain embodiments of the present invention.

14 is a diagram illustrating an exemplary pipeline for operating an out of order prediction generator when forming out of order predictions, in accordance with certain embodiments of the present invention.

FIG. 15 is a diagram illustrating an example pipeline for operating a priority adjuster that prioritizes out of order prediction, in accordance with certain embodiments of the present invention.

16 is a block diagram illustrating an exemplary predictive inventory in a memory processor, in accordance with certain embodiments of the present invention.

17 is a diagram illustrating an exemplary predictive inventory in accordance with an embodiment of the present invention.

18 is a diagram illustrating an example of an inventory filter according to a particular embodiment of the present invention.

19A and 19B are diagrams illustrating example techniques for filtering redundancy, in accordance with certain embodiments of the present invention.

20 is a diagram illustrating another exemplary predictive inventory placed in a prefetcher, in accordance with an embodiment of the present invention.

21 is a block diagram illustrating a prefetcher including an exemplary cache memory, in accordance with certain embodiments of the present invention.

22 is a diagram illustrating an exemplary multi-level cache, in accordance with an embodiment of the present invention.

FIG. 23A is a diagram illustrating an exemplary first query interface for a first address store, in accordance with certain embodiments of the present invention.

FIG. 23B illustrates a number of input addresses that can be checked simultaneously using the first query interface of FIG. 23A.

24 is a diagram illustrating an exemplary second query interface for a second address store, in accordance with certain embodiments of the present invention.

FIG. 25A is a diagram illustrating a possible arrangement (or representation thereof) of example addresses stored in a second address store, in accordance with an embodiment of the present invention.

FIG. 25B is a diagram illustrating an exemplary hit generator that generates a result based on ordering and valid bits, in accordance with one embodiment of the present invention.

FIG. 26 is a schematic representation of a component that generates R, one result of the hit generator of FIG. 25B, in accordance with an embodiment of the present invention.

27 is a diagram illustrating an example of a hit generator, in accordance with certain embodiments of the present invention.

28 is a diagram showing another example of a hit generator, according to another embodiment of the present invention.

The same reference numerals indicate corresponding parts throughout the several views.

Detailed Description of Exemplary Embodiments

The present invention provides a system, apparatus, and method for effectively predicting accesses to a memory so as to retrieve program instructions and program data that a processor can be expected to require. By effectively predicting accesses to memory, the latency of providing the necessary data to one or more processors can be minimized. In accordance with certain embodiments of the present invention, an apparatus includes a speculator configured to predict memory accesses. The exemplary speculator can be configured to generate a configurable amount of predictions so as to vary the rate of prediction generation. In another embodiment, the speculator can suppress the generation of certain predictions to limit the number of unnecessary predictions, such as redundant predictions, that a prefetcher might otherwise be required to manage. Further, in certain embodiments, the speculator can filter out unnecessary predictions by examining whether a cache memory, or an inventory containing predictions, already includes predictions more appropriate for presentation to the processor. In one embodiment, the cache memory stores predictions in a short term cache and a long term cache memory, which are checked simultaneously to filter out redundant predictions.

Generating Sequential and Out of Order Predictions

Exemplary Embodiments of Prefetchers and Speculators

FIG. 1 is a block diagram illustrating an exemplary speculator in accordance with certain embodiments of the present invention. In this example, speculator 108 is shown residing within prefetcher 106, and prefetcher 106 is shown residing within memory processor 104, which is designed to control at least memory accesses by one or more processors. Prefetcher 106 operates to "fetch" both program instructions and program data from memory 112 before they are required, and then to provide the fetched program instructions and program data to processor 102 upon request by the processor. By fetching them prior to use (ie, "prefetching"), processor idle time (eg, the time during which processor 102 is starved of data) is minimized. Prefetcher 106 also includes a cache memory 110 for storing and managing the presentation of prefetched data to processor 102. Cache memory 110 functions as a data store for speeding up instruction execution and data retrieval. In particular, cache memory 110 resides in prefetcher 106 and operates to supplement other memory caches, such as the "L1" and "L2" caches, which are generally employed to reduce some latency separately from memory processor 104.

In operation, speculator 108 monitors system bus 103 for requests by processor 102 ("read requests") to access memory 112. In particular, as processor 102 executes program instructions, speculator 108 detects read requests for addresses containing the program instructions and program data used by processor 102. For illustration purposes, an "address" is associated with a cache line or unit of memory that is generally transferred between memory 112 and cache memory 110. An "address" of a cache line may indicate a memory location, and the cache line may contain data from one or more addresses of memory 112. The term "data" refers to a unit of information that can be prefetched, while the terms "program instruction" and "program data" refer, respectively, to the instructions and data used by processor 102 in its processing. Thus, data (eg, any number of bits) can represent prefetchable information constituting program instructions and/or program data. The term "prediction" may also be used interchangeably with the term "predicted address". When a predicted address is used to access memory 112, one or more cache lines containing that predicted address, as well as other addresses (predicted or otherwise), are typically fetched.

Based on the detected read requests, speculator 108 can then generate a configurable number of predicted addresses that might subsequently be requested by processor 102. To do so, speculator 108 uses one or more speculation techniques in accordance with at least one embodiment of the present invention. Speculator 108 implements these speculation techniques as predictors, whose implementations are described below. In addition, speculator 108 suppresses the generation of some predictions and filters out others. By suppressing or filtering certain predictions, or both, the number of redundant predictions is reduced, thereby conserving resources. Examples of conserved resources include memory resources, such as cache memory 110, and bus resources (eg, bandwidth), such as memory bus 111.

After the predictions of speculator 108 undergo any additional filtering, memory processor 104 sends the remaining predictions (ie, the unfiltered predictions) to memory 112 via memory bus 111. In response, memory 112 returns the prefetched data associated with the predicted addresses. Cache memory 110 temporarily stores the returned data until memory processor 104 transmits that data to processor 102. At an appropriate time, memory processor 104 sends the prefetched data to processor 102 over system bus 103, ensuring, among other things, that latency is minimized.

FIG. 2 illustrates an exemplary speculator in accordance with an embodiment of the present invention. Speculator 108 is configured to receive read requests 201 and to generate predictions 203. As shown, speculator 108 includes a prediction controller 202 configured to provide control information and address information to a sequential predictor ("SEQ. Predictor") 206 and an out of order predictor ("NONSEQ. Predictor") 216, which generate the predictions 203. Prediction controller 202 functions, in whole or in part, to manage the prediction generation process so as to provide an optimal amount and optimal types of predictions. For example, prediction controller 202 can vary the number and the types of predictions generated for a particular cache line, or group of cache lines, specified in a read request 201. As another example, prediction controller 202 can include a suppressor 204 that suppresses the generation of certain predictions in order to conserve resources, such as the memory available in target cache 218, and to minimize unnecessary accesses to memory 112 caused by over-predicted addresses. Additionally, prediction controller 202 can include an accelerator 205 for expediting the generation of out of order predictions. As shown in FIG. 8, accelerator 205 operates to trigger the generation of an out of order prediction upon detection of an address earlier than the address immediately preceding the nonlinear portion of the address stream with which the out of order prediction is associated. A more detailed description of prediction controller 202 follows the descriptions below of sequential predictor 206 and out of order predictor 216.

Sequential predictor 206 is configured to generate predictions (ie, predicted addresses) that have some degree of expectancy. That is, sequential predictor 206 generates predictions that can be expected to follow one or more patterns observed in read requests 201 over time. These patterns arise from the fact that memory references have spatial locality among them. For example, as processor 102 executes program instructions, the stream of read requests 201 may be in a natural, sequential order as the requests cross system bus 103. To predict addresses that follow such a sequential pattern, a type of speculation technique referred to below as "forward sequential prediction" can be used to predict sequential addresses. This type of speculation technique is described below.

Forward sequential predictor 208 is configured to generate a number of sequential addresses in ascending order. Thus, when processor 102 sends a series of read requests 201 over system bus 103 that includes a stream of ascending addresses, forward sequential predictor 208 generates a number of predictions for additional ascending addresses to be prefetched. An example of a forward sequential predictor ("FSP") 208 is shown in FIG. 3A. As shown in FIG. 3A, FSP 208 receives an address, such as address A0, and generates one or more addresses in forward (ie, ascending) order from the A0 address. The notation A0 identifies the reference address (ie, A+0) from which one or more predictions are made. Thus, the notations A1, A2, A3, and so on represent addresses A+1, A+2, A+3, and so on, while A(-1), A(-2), A(-3), and so on represent addresses A-1, A-2, A-3, and so on. Although these notations represent series of addresses ascending or descending by one address at a time, any patternable set of addresses may be treated as sequential. As used throughout this specification, sequential addresses may be represented by a single letter. For example, "A" represents A0, A1, A2, A3, and so on, and "B" represents B0, B1, B2, B3, and so on. In this way, "A" and "B" each represent a sequential address stream, but the "B" address stream is nonsequential with respect to the "A" address stream.

Referring further to FIG. 3A, FSP 208 is shown receiving at least an enable signal and a batch signal, both of which are provided by prediction controller 202. The enable signal controls whether forward sequential prediction occurs and, if so, the batch signal controls the number of sequential addresses that FSP 208 generates. In this example, the batch signal indicates that "7" addresses beyond the reference address are to be predicted. As such, FSP 208 generates forward sequential addresses A1 to A7. Thus, when speculator 108 receives an address such as A0 as part of a read request 201, sequential predictor 206 can provide addresses A1, A2, A3, ..., Ab as part of prediction 203, where b is the "batch" number.
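
The batch mechanism can be sketched in software as follows. This is a minimal illustration rather than the patented hardware; the function name, the 64-byte line granularity, and the use of Python are assumptions made only for clarity.

    # Minimal sketch (not the patented hardware) of forward sequential prediction:
    # given a detected reference address A0 and a batch signal b, emit the next b
    # cache-line addresses A1..Ab. A 64-byte line size is assumed for illustration.
    CACHE_LINE_BYTES = 64

    def forward_sequential_predictions(ref_addr, batch, enabled=True):
        """Return the next `batch` cache-line addresses following ref_addr."""
        if not enabled:          # the enable signal gates prediction entirely
            return []
        line = ref_addr // CACHE_LINE_BYTES
        return [(line + i) * CACHE_LINE_BYTES for i in range(1, batch + 1)]

    # With batch = 7, a read of A0 yields predictions A1..A7.
    print([hex(a) for a in forward_sequential_predictions(0x1000, batch=7)])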

Blind back sequential predictor 210 of FIG. 2 generates one sequential address, in descending order from the reference address. An example of a blind back sequential predictor ("blind back") 210 is shown in FIG. 3B; it receives one or more addresses, such as address A0, and generates only one prediction, such as address A(-1), backward (ie, in descending order) from the A0 address. In addition, as with FSP 208, blind back sequential predictor 210 receives an enable signal that controls whether descending predictions are generated.

Back sector sequential predictor 214 of FIG. 2 is configured to generate a particular cache line as a prediction after detecting another particular cache line on system bus 103. In particular, when back sector sequential predictor 214 detects that a particular read request 201 is a request for a high-order cache line, the associated low-order cache line is generated as a prediction. High-order cache lines may be viewed as upper ("front") sectors containing odd addresses, while low-order cache lines may be viewed as lower ("back") sectors containing even addresses. For purposes of illustration, suppose the cache line contains 128 bytes and consists of a 64-byte high-order cache line (ie, the upper half of the 128 bytes) and a 64-byte low-order cache line (ie, the lower half of the 128 bytes).

One example of a back sector sequential predictor 214 is shown in FIG. 3C, which depicts a back sector sequential predictor ("back sector") 214 receiving one or more addresses. Upon receiving a read request 201 for the upper or front sector of a cache line, such as address AU, back sector sequential predictor 214 generates only one prediction: address AL. This type of speculation technique exploits the phenomenon in which processor 102 generally requests the upper or front sector of a cache line and requests the lower or back sector some time later. Back sector sequential predictor 214 also receives an enable signal that controls whether back sector predictions are generated.
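
For concreteness, the front-to-back sector relationship under the 128-byte line and 64-byte sector sizes assumed above can be sketched as follows; the function name and the Python rendering are illustrative only.

    # Minimal sketch of back sector prediction: a 128-byte cache line is split into
    # a 64-byte front (upper, odd) sector and a 64-byte back (lower, even) sector.
    # A read of the front sector yields one prediction: the back sector of the same line.
    LINE_BYTES = 128
    SECTOR_BYTES = 64

    def back_sector_prediction(addr):
        """Return the back-sector address if addr lies in the front sector, else None."""
        if addr % LINE_BYTES >= SECTOR_BYTES:       # request fell in the front (upper) sector
            return addr - (addr % LINE_BYTES)       # base of the line, ie, the back (lower) sector
        return None                                 # back-sector reads produce no prediction

    print(hex(back_sector_prediction(0x1040)))      # front-sector read AU yields prediction AL = 0x1000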

Reverse sequential predictor 212 of FIG. 2 is configured to generate multiple sequential addresses in descending order. Thus, when processor 102 sends a series of read requests over system bus 103 that includes a stream of descending addresses, reverse sequential predictor 212 generates multiple predictions for additional descending addresses. An example of a reverse sequential predictor ("RSP") 212 is shown in FIG. 3D. As shown in FIG. 3D, RSP 212 detects an address stream, such as addresses A0, A(-1), and A(-2), and, in response, generates one or more addresses sequentially in reverse (ie, descending) order from reference address A0. FIG. 3D also indicates that RSP 212 receives at least an enable signal, a batch signal, and a confidence level ("Conf.") signal, all of which are provided by prediction controller 202. Although the enable and batch signals operate in the same way as with FSP 208, the confidence level ("Conf.") signal defines a threshold for triggering the generation of reverse sequential predictions.

FIG. 3D also shows a chart 310 illustrating the operation of an exemplary RSP 212, in accordance with certain embodiments of the present invention. Here, a confidence level of "2" sets trigger level 312, and the batch signal indicates that "5" addresses beyond the trigger address are to be predicted. The trigger address is the address that causes the predictor to generate its predictions. After detecting A(0) during interval I1, assume that RSP 212 detects address A(-1) during the subsequent interval I2. Next, during interval I3, address A(-2) is detected, and the detected stream, a series of descending addresses, reaches a particular level of confidence. This level of confidence is reached when trigger level 312 is exceeded, which causes RSP 212 to generate the reverse sequential addresses A(-3) to A(-7). Thus, if speculator 108 receives a certain number of addresses, such as A0, A(-1), and A(-2), as a series of read requests 201, sequential predictor 206 can provide addresses A(-3), A(-4), A(-5), ..., A(-b) as part of prediction 203, where b is the "batch" number. In some embodiments, RSP 212 does not employ a confidence level, but instead generates predictions starting immediately after the reference address. In other embodiments of the present invention, the concept of a confidence level is employed in the other predictors described herein. Control of RSP 212 and the other constituent predictors of sequential predictor 206 is further described below; out of order predictor 216 of FIG. 2 is described next.
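
The trigger-level behavior of chart 310 can be sketched roughly as below; the class name, signal names, and the 64-byte line granularity are assumptions made for illustration.

    # Minimal sketch of reverse sequential prediction with a confidence threshold:
    # each consecutive descending line raises the confidence count, and once the
    # trigger level is reached, a batch of further descending addresses is predicted.
    CACHE_LINE_BYTES = 64

    class ReverseSequentialPredictor:
        def __init__(self, trigger_level=2, batch=5):
            self.trigger_level = trigger_level
            self.batch = batch
            self.last_line = None
            self.confidence = 0

        def observe(self, addr):
            line = addr // CACHE_LINE_BYTES
            if self.last_line is not None and line == self.last_line - 1:
                self.confidence += 1            # stream continues downward
            else:
                self.confidence = 0             # pattern broken; start over
            self.last_line = line
            if self.confidence >= self.trigger_level:
                # Predict `batch` lines below the trigger address.
                return [(line - i) * CACHE_LINE_BYTES for i in range(1, self.batch + 1)]
            return []

    rsp = ReverseSequentialPredictor()
    for a in (0x2000, 0x1FC0, 0x1F80):          # A0, A(-1), A(-2)
        predictions = rsp.observe(a)
    print([hex(p) for p in predictions])        # A(-3)..A(-7)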

Out of order predictor 216 is configured to generate one or more predictions (ie, predicted addresses) following an address detected by speculator 108, even when that address lies within a nonlinear stream of read requests 201. Typically, when there is no observable pattern among the requested addresses, predicting the next address based only on a previous address is difficult. However, in accordance with one embodiment of the present invention, out of order predictor 216 generates out of order predictions, which include predicted addresses that are not patternable from one or more prior addresses. A "nonpatternable" prediction is a prediction that cannot be patterned with, or is irregular with respect to, a prior address. One type of nonpatternable prediction is an out of order prediction. The prior address on which an out of order prediction is based may be the immediately preceding address or any address configured as a trigger address. In particular, the lack of one or more patterns across two or more addresses in the stream of read requests 201 may result from processor 102 executing program instructions in a somewhat scattered manner, fetching instructions and data from various spatial memory locations.

Out of order predictor 216 includes a target cache 218 as a repository for storing combinations between a prior address and one or more possible out of order addresses, each of which may qualify as an out of order prediction. Target cache 218 is designed to readily compare its contents against incoming detected addresses so as to generate out of order predictions in a timely manner. The detected address that produces an out of order prediction is referred to as the "trigger" address, and the resulting prediction is the "target" of the unpatternable combination between the two addresses. An exemplary out of order predictor 216 is described below.

FIG. 4 illustrates an exemplary out of order predictor 216, in accordance with an embodiment of the present invention. Out of order predictor 216 includes an out of order prediction engine ("NonSeq. Prediction Engine") 420 operatively coupled to a repository, target cache 422. Target cache 422 maintains combinations between each trigger address and one or more corresponding target addresses. FIG. 4 illustrates one of many ways of combining out of order addresses. Here, a tree structure associates a particular trigger address with its corresponding target addresses. In this example, target cache 422 includes address "A" as a trigger address that forms combinations with addresses of possible out of order predictions, such as addresses "B", "X", and "L". Moreover, those three target addresses are themselves trigger addresses for addresses "C" and "G", "Y", and "M", respectively. The formation and operation of target cache 422 are described in more detail below. Note that address "A" may also be a target address for a trigger address not shown in FIG. 4, and that many other combinations are possible among addresses not shown.

Out of order prediction engine 420 is configured to receive at least four signals and any number of addresses 402. To control the operation of out of order prediction engine 420, prediction controller 202 provides a "batch" signal and an "enable" signal, which are substantially similar to the signals previously described. Prediction controller 202 also provides two other signals: a width ("W") signal and a depth ("D") signal. These signals control the formation of target cache 422: the width signal W sets the number of possible targets that a trigger address can predict, and the depth signal D sets the number of levels of combination below a trigger address. An example of the latter is when D represents a depth of "4". This means that address A is at the first level, address B is at the second level, addresses C and G are at the third level, and address D is at the fourth level. An example of the former is when W is set to "2". This means that only two of the three addresses "B", "X", and "L" are used for out of order prediction.

FIG. 4 also illustrates out of order prediction engine 420 configured to receive exemplary addresses 402 from prediction controller 202, such as the addresses conceptually shown in nonsequential address streams 404, 406, 408, 410, and 412, where each out of order address stream includes an address that is not patternable with a previously detected address. For example, stream 404 includes address "A" before address "B", which in turn is followed by address "C". As is generally the case with out of order addresses, detecting a pattern for predicting "B" from "A" and "C" from "B" is a difficult problem when limited to monitoring read requests 201 from processor 102. To accomplish this, out of order predictor 216 forms target cache 422 to enable prediction of unpatternable combinations between a particular trigger address and its target addresses. When out of order prediction engine 420 forms an out of order prediction, it generates a group of predictions from the combined target address. Thus, if trigger address "A" leads to an out of order prediction of address "B" (ie, B0 as a reference address), then the predicted addresses include B0, B1, B2, ..., Bb, where b is the number set by the batch signal.

In one embodiment of the invention, out of order prediction engine 420 forms target cache 422 by storing the combination from each address 402 to a subsequent address. For example, upon detecting the addresses of stream 404, out of order prediction engine 420 causes target cache 422 to hold combinations such as A to B, B to C, C to D, and so on. Out of order prediction engine 420 operates in the same way when detecting the addresses of the other streams 406, 408, and so on.

According to a particular embodiment, target cache 422 stores these combinations in the form of tables, such as tables 430, 440, and 450. These tables include a trigger column 426 and a target column 428 for storing, respectively, the trigger address and the target address of each combination. Next, consider that the addresses 402 of all the streams have been stored in tables 430, 440, and 450 of target cache 422. As shown in table 430, trigger-target combinations 432, 434, and 436 describe the combinations A to B, B to C, and G to Q, respectively. Other trigger-target combinations 438 include the combination C to D, and the like. Similarly, table 440 includes trigger-target combination 442 describing the combination from A to X, and table 450 includes trigger-target combination 452 describing the combination from A to L.

In FIG. 4, tables 430, 440, and 450 are identified as "Way 0", "Way 1", and "Way 2", respectively, and describe the relative priorities of multiple trigger-target combinations for the same trigger address. In this case, Way 0 is associated with the highest priority and Way 1 with the second highest priority. In this example, trigger-target combination 432 of table 430 indicates that the combination from A to B has higher priority than the combination from A to X, which is trigger-target combination 442 of table 440. Thus, after target cache 422 contains these combinations, when out of order prediction engine 420 subsequently detects address A (assuming prediction controller 202 has enabled out of order prediction engine 420 to operate), address B is predicted with the highest priority, followed by address X with the second highest priority, according to the relative priorities of the tables.

According to one embodiment of the invention, the relative priorities are determined in at least two ways. First, a trigger-target combination is assigned the highest priority when it is first detected and placed into target cache 422. Second, a trigger-target combination is assigned the highest priority when out of order prediction engine 420 determines that the combination has been successful (eg, that the most recent cache hit resulted from an out of order prediction based on that particular combination). The "most recent" cache hit is the latest cache hit on a target address combined with a particular trigger address. In addition, the previous "highest priority" combination (also designated leg 0) is shuffled down to the second highest priority (also designated leg 1) by moving that combination into the Way 1 table. As an example, consider a first point in time when a combination of A to X is introduced into target cache 422 as the first trigger-target combination. As a result, it is assigned the highest priority (ie, initially leg 0) by being placed into table 430 (ie, Way 0). At a later point in time, target cache 422 inserts a combination of A to B into table 430 (highest priority, leg 0), and the combination of A to X is moved to table 440 (second highest priority, leg 1). In a particular embodiment of the invention, the table in which a trigger-target combination is stored depends on some of the address bits that make up an index.
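
A simplified software model of this priority handling might look like the following. The class, its dictionary-based storage, and the promotion rule are assumptions made for illustration and are not the hardware organization of target cache 422.

    # Minimal sketch of a target cache holding trigger -> target combinations in
    # priority-ordered "ways": way 0 holds the highest-priority target for a trigger.
    # A newly learned, or recently hit, combination is promoted to way 0 and the
    # previous occupant shifts down one way.
    class TargetCache:
        def __init__(self, width=2):
            self.width = width          # number of ways kept per trigger (the W signal)
            self.ways = {}              # trigger address -> list of targets, way 0 first

        def update(self, trigger, target):
            targets = self.ways.setdefault(trigger, [])
            if target in targets:
                targets.remove(target)  # a recent hit re-promotes the combination
            targets.insert(0, target)   # newest or most recently hit combination -> way 0
            del targets[self.width:]    # combinations beyond the last way are dropped

        def predict(self, trigger):
            return list(self.ways.get(trigger, []))

    tc = TargetCache(width=2)
    tc.update(0xA000, 0xE000)           # A -> X learned first: occupies way 0
    tc.update(0xA000, 0xB000)           # A -> B learned next: B takes way 0, X moves to way 1
    print([hex(t) for t in tc.predict(0xA000)])   # ['0xb000', '0xe000']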

Referring back to FIG. 2, prediction controller 202 is configured to control both sequential predictor 206 and out of order predictor 216. Prediction controller 202 controls the types as well as the amount of predictions generated by sequential predictor 206 or out of order predictor 216, or both. In addition, prediction controller 202 suppresses the generation of unnecessary predictions 203, such as redundant or duplicate predictions. Since each of predictors 208, 210, 212, 214, and 216 can operate simultaneously, the number of predictions 203 must be managed so as not to overload prefetcher resources. Prediction controller 202 employs suppressor 204 to perform this and other similar operations.

In one embodiment of the invention, suppressor 204 controls the amount of predictions that are generated. It does so by first determining a particular characteristic of a read request 201. In particular, suppressor 204 determines whether read request 201 relates to program instructions (ie, "code") or to program data (ie, "not code"). Typically, read requests 201 that retrieve code rather than program data tend to form a natural sequence, or at least a patternable one. This is because processor 102 commonly executes instructions in a more linear manner than it requests program data. As such, suppressor 204 can direct sequential predictor 206 or out of order predictor 216 to suppress the generation of predictions when read request 201 relates to program data. This helps prevent the generation of spurious predictions.

In addition, suppressor 204 can adjust the amount of predictions generated by sequential predictor 206 and out of order predictor 216 by checking whether a read request 201 is a non-prefetch ("demand") request or a prefetch. In some cases, processor 102 absolutely requires program instructions or program data to be retrieved from memory 112 (non-prefetch requests), while in other cases processor 102 merely anticipates a future need and requests that program instructions or program data be prefetched. Since an absolute need can be more important than an anticipated need, suppressor 204 can direct a particular predictor to suppress predictions based on prefetch read requests 201 more than predictions based on demand read requests 201.

Table I shows an exemplary technique for suppressing the number of predictions generated. If read request 201 relates both to code and to a demand request, suppressor 204 suppresses the least. In other words, prediction controller 202 sets "batch" to the large size indicated as Batch Size 4 in Table I; in a particular example, Batch Size 4 can be set to seven. However, for the reasons described above, if read request 201 relates to program data (ie, not code) and is a processor-generated prefetch, suppressor 204 suppresses the most. As such, prediction controller 202 sets "batch" to the small size indicated as Batch Size 1 in Table I; as an example, Batch Size 1 can be set to one. In other cases, prediction controller 202 can vary the level of prediction suppression by using other batch sizes, such as Batch Size 2 and Batch Size 3. Although the suppressor according to one embodiment of the invention is configured to suppress the generation of one or more predicted addresses by reducing the "batch" amount when the processor request is for data or is a prefetch request, or both, Table I is not limiting. For example, processor requests for code or instructions could instead decrease, rather than increase, the "batch" size. As another example, a demand request could also decrease, rather than increase, the "batch" size. Those of ordinary skill in the art will understand that many variations are possible within the scope of the present invention.

[Table I]

(Table image not reproduced; it tabulates the batch sizes used for the combinations of code versus data and non-prefetch (demand) versus prefetch read requests.)
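
One way to read Table I as code is sketched below. The concrete sizes, and which of the two mixed cases receives the larger batch, are assumptions; the text fixes only the two extremes (code plus demand largest, data plus prefetch smallest).

    # Minimal sketch of batch-size suppression keyed on whether a read request
    # fetches code or data and whether it is a demand (non-prefetch) request or a
    # processor-generated prefetch. Values for the two mixed cases are assumed.
    def batch_size(is_code, is_demand):
        if is_code and is_demand:
            return 7                    # least suppression (Batch Size 4 in Table I)
        if not is_code and not is_demand:
            return 1                    # most suppression (Batch Size 1 in Table I)
        return 4 if is_code else 2      # intermediate suppression levels, assumed split

    print(batch_size(True, True), batch_size(False, False))   # 7 1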

Suppressor 204 can also adjust the types of predictions that sequential predictor 206 and out of order predictor 216 generate. First, consider that prediction controller 202 can enable forward sequential predictor 208 and reverse sequential predictor 212 at the same time. In this case, when reverse sequential predictor 212 triggers (ie, the confidence level is exceeded) because processor 102 is issuing read requests in descending order, suppressor 204 instructs prediction controller 202 to disable at least forward sequential predictor 208, so as to minimize the prediction of addresses in ascending order.

Second, consider the case in which a particular address triggers a back prediction (ie, blind back sequential predictor 210 or back sector sequential predictor 214) while prediction controller 202 has enabled a sequential prediction (ie, forward sequential predictor 208 or reverse sequential predictor 212). In this case, suppressor 204 reduces the batch by one from the initial amount for either forward sequential predictor 208 or reverse sequential predictor 212. That is, if "batch" is initially set to seven, then "batch" is reduced when either blind back sequential predictor 210 or back sector sequential predictor 214 is triggered or enabled. For example, if forward sequential predictor 208 is set to generate addresses A0, A1, A2, ..., A7, and blind back sequential predictor 210 is enabled for one or more read requests 201, then forward sequential predictor 208 generates only predictions A1, A2, ..., A6. The final result is a set of predictions A(-1), A(0), A1, A2, ..., A6 for these read requests 201, where the back prediction provides prediction A(-1).

Third, prediction controller 202 can additionally disable either blind back sequential predictor 210 or back sector sequential predictor 214 to suppress those predictions after the first prediction has been made within a sequential stream of processor addresses 201. This is because, after a sequential reference address has been established, subsequent forward or reverse sequential predictions already cover the back-type speculation (albeit lagging by one address). For example, forward sequential predictions A2, A3, and A4 (if the reference address is A0) together cover the back-type predictions A1, A2, and A3. Suppressor 204 can be configured to suppress other types of predictions as well; examples are described below.

FIG. 5 illustrates an exemplary technique for suppressing out of order predictions, in accordance with an embodiment of the present invention. According to this technique, suppressor 204 detects interleaved sequential streams that might otherwise be considered out of order and require the storage of trigger-target combinations in target cache 422. To conserve resources, especially the memory available in target cache 422, suppressor 204 analyzes nonsequential addresses, such as those in stream 502, and models them as interleaved sequential streams. As shown, stream 502 consists of addresses A0, B0, C0, A1, B1, C1, A2, B2, and C2, detected during intervals I1, I2, I3, I4, I5, I6, I7, I8, and I9. Suppressor 204 includes a data structure, such as table 504, to model nonsequential addresses as sequential. Table 504 can include any number of stream trackers for decomposing stream 502. In particular, stream trackers 520, 522, and 524 are designed to model the sequential streams B0, B1, B2; A0, A1, A2; and C0, C1, respectively. Subsequently detected read addresses in stream 502, such as A7 (not shown), are compared against these streams to determine whether out of order predictions can be suppressed for the stream being tracked.

In operation, suppressor 204 tracks a sequential stream by storing a reference address 510, such as the first address of the sequence, and then retaining the last-detected address 514. For each new last-detected address (eg, B2 of stream tracker 520), the previous last-detected address (eg, B1 of stream tracker 520) is retired by placing it in an additional column 512 as a "void". Through this exemplary technique, suppressor 204 suppresses the generation of unnecessary out of order predictions when other types of predictions can be used instead. Thus, for the example shown in FIG. 5, forward sequential predictor 208 is sufficient to generate predictions for stream 502.
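
A simplified model of this stream-tracking suppression is sketched below; it keeps only a last-detected line per tracked stream and omits the reference-address and void columns, so the names and structure are illustrative rather than the patent's table 504.

    # Minimal sketch of interleaved-stream suppression: several concurrent
    # sequential streams are tracked, and a read that sequentially extends one of
    # them is treated as sequential, so no trigger -> target combination needs learning.
    CACHE_LINE_BYTES = 64

    class StreamTrackers:
        def __init__(self, max_streams=4):
            self.max_streams = max_streams
            self.streams = []           # each entry: [reference_line, last_detected_line]

        def observe(self, addr):
            """Return True if addr continues a tracked stream (suppress out of order prediction)."""
            line = addr // CACHE_LINE_BYTES
            for stream in self.streams:
                if line == stream[1] + 1:
                    stream[1] = line    # stream extended sequentially
                    return True
            self.streams.append([line, line])       # start tracking a new stream
            del self.streams[:-self.max_streams]    # keep only the most recent trackers
            return False

    trackers = StreamTrackers()
    for a in (0x0000, 0x4000, 0x8000, 0x0040, 0x4040, 0x8040):   # A0 B0 C0 A1 B1 C1
        suppress = trackers.observe(a)
    print(suppress)    # True: C1 extends the tracked "C" stream, so it is handled sequentially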

FIG. 6 illustrates another exemplary technique for suppressing out of order predictions, in accordance with an embodiment of the present invention. In accordance with this technique, suppressor 204 models out of order addresses as interleaved sequential streams, similar to the process shown in FIG. 5. However, the technique of FIG. 6 implements a separate data structure for detecting sequential streams on each of any number of threads. In this example, tables 604, 606, and 608 include respective stream trackers for thread 0 ("T"), thread 1 ("T'"), and thread 2 ("T''"). Through this technique, the out of order addresses of stream 602 can be modeled as multiple interleaved sequential streams across multiple threads in order to suppress out of order predictions. Note that this technique may also be applied to reverse sequential streams or to other types of predictions.

FIG. 7 illustrates another technique for suppressing out of order predictions, in accordance with certain embodiments of the present invention. In the stream of addresses 702, a nonsequential transition exists between addresses A4 and B0. However, in some cases, if the time difference between these requested read addresses is too short, there is not enough time for an out of order prediction to be useful. Matcher 706 of suppressor 204 operates to compare the time difference d between addresses A4 and B0 against a threshold. If d is equal to or greater than the threshold TH, matcher 706 signals that out of order predictor 216 should be enabled (ie, "do not suppress"). However, if d is less than TH, matcher 706 signals that out of order predictor 216 should be disabled, thereby suppressing the prediction.
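
The arrival-time check performed by matcher 706 reduces to a comparison like the one below; the units, the names, and the threshold value are illustrative only.

    # Minimal sketch of arrival-time suppression: if the gap between the last
    # address of one run and the first address of the next run is below the
    # threshold TH, there is too little time for the prediction to help, so the
    # out of order predictor is disabled for that pair.
    def suppress_nonsequential(t_prev, t_next, threshold):
        """True means suppress (disable) the out of order predictor."""
        return (t_next - t_prev) < threshold

    print(suppress_nonsequential(100.0, 104.0, threshold=10.0))   # True: too close together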

Other suppression mechanisms that can be implemented by suppressor 204 are described below. In general, a finite amount of time elapses after processor 102 requests a front sector address before it requests the corresponding back sector address. If that amount of time is long enough, the back sector read request may appear to be irregular (ie, not patternable with the front sector). To prevent this, suppressor 204 is configured to maintain a list of front sector reads by processor 102. Subsequently detected addresses are compared against the listed front sector addresses, so that when the corresponding back sector read arrives, it is recognized. Thus, an out of order prediction, as well as other predictions, can be suppressed.

FIG. 8 illustrates an exemplary technique for facilitating the generation of predictions, in accordance with certain embodiments of the present invention. Specifically, accelerator 205 (FIG. 2) operates in accordance with this technique to expedite the generation of out of order predictions. In this example, stream 802 includes two adjacent sequential streams, A0 to A4 and B0 to B3. Out of order predictor 216 would typically designate address A4 as trigger address 808, with address B0 as target address 810. However, to reduce the time needed to generate the out of order prediction, trigger address 808 can be changed to a new trigger address 804 (ie, A0). Thus, by designating a new trigger address for the target address, the next time processor 102 requests the addresses of stream 802, out of order predictor 216 can generate its prediction upon detection of an earlier address rather than a later one (ie, the prediction can be generated when A0, rather than A4, is detected as the "new" trigger address). This helps ensure that out of order predictions are generated at the most appropriate time.
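
The effect of the accelerator can be sketched with a plain dictionary standing in for the target cache; the addresses and the dictionary representation are assumptions used only for illustration.

    # Minimal sketch of the accelerator idea: the trigger for target B0 is moved
    # from A4 (the address just before the discontinuity) back to the earlier
    # address A0, so the prediction can be issued as soon as A0 is seen again.
    A0, A4, B0 = 0x1000, 0x1100, 0x9000
    trigger_to_target = {A4: B0}                       # combination as first learned: A4 -> B0

    trigger_to_target[A0] = trigger_to_target.pop(A4)  # accelerator re-keys it to trigger A0

    print(hex(trigger_to_target[A0]))                  # 0x9000: target available already at A0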

FIG. 9 shows another exemplary speculator, according to an embodiment of the present invention. In this example, prefetcher 900 includes a speculator 908 having a filter 914 for filtering out redundant addresses, so as to keep the generation of unnecessary predictions to a minimum. Prefetcher 900 of FIG. 9 also includes a multi-level cache 920 and a prediction inventory 916. Here, multi-level cache 920 is composed of a first level data return cache ("DRC1") 922 and a second level data return cache ("DRC2") 924. First level data return cache 922 can generally be described as a short term data store, and second level data return cache 924 can generally be described as a long term data store. Multi-level cache 920 stores program instructions and program data prefetched from memory 112 until processor 102 requires them. Similarly, prediction inventory 916 provides temporary storage for generated predictions until they are selected by arbiter 918 to access memory 112. Arbiter 918 is configured to determine, according to arbitration rules, which of the generated predictions are issued to access memory 112 for prefetching instructions and data.

Filter 914 includes at least two filters: cache filter 910 and inventory filter 912. Cache filter 910 is configured to compare newly generated predictions against previously generated predictions whose instructions and data are already stored in multi-level cache 920. Thus, if one or more newly generated predictions are redundant with any previously generated prediction associated with multi-level cache 920, the redundant predictions are canceled to minimize the number of predictions. In addition, inventory filter 912 is configured to compare newly generated predictions against predictions already generated and stored in prediction inventory 916. Thus, if one or more newly generated predictions are redundant with the prior predictions stored in prediction inventory 916, the redundant predictions can be canceled to minimize the number of predictions, thereby freeing prefetcher resources.
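
The redundancy filtering described here amounts to membership tests against the cache contents and the inventory contents; the sketch below uses plain sets, which is an assumption about representation, not the patent's implementation.

    # Minimal sketch of prediction filtering: newly generated predicted addresses
    # are dropped if they are already covered by data in the multi-level cache or
    # by a prediction already waiting in the prediction inventory.
    def filter_predictions(new_predictions, cached_addresses, inventoried_addresses):
        return [p for p in new_predictions
                if p not in cached_addresses and p not in inventoried_addresses]

    kept = filter_predictions([0x1000, 0x1040, 0x1080],
                              cached_addresses={0x1040},
                              inventoried_addresses={0x1080})
    print([hex(p) for p in kept])    # ['0x1000']: the redundant predictions are canceled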

Exemplary Embodiments of an Out of Order Predictor

FIG. 10 is a block diagram illustrating an exemplary out of order ("NONSEQ") predictor 1010, in accordance with certain embodiments of the present invention. In this example, out of order predictor 1010 is shown residing within speculator 1008, which also includes a sequential predictor 1012 for generating sequential predictions. Prefetcher 1006, which includes speculator 1008, operates to "fetch" both program instructions and program data from a memory (not shown) before they are required, and then to provide the fetched program instructions and program data to a processor (not shown) upon request by that processor. By fetching them before use (ie, "prefetching"), processor idle time (eg, the time during which the processor is starved of data) is minimized. Out of order predictor 1010 includes an out of order prediction engine ("prediction engine") 1020 for generating predictions and a target cache 1030 for storing and prioritizing the predictions.

Prefetcher 1006 also includes a filter 1014, an optional prediction inventory 1016, an optional arbiter 1018, and a multi-level cache 1040. Here, filter 1014 includes a cache filter (not shown) configured to compare newly generated predictions against the earlier predictions that previously caused program instructions and program data to be prefetched into multi-level cache 1040. Thus, if any newly generated prediction is redundant with any previously generated prediction stored in multi-level cache 1040, the redundant prediction is canceled to minimize the number of predictions, thereby freeing prefetcher resources. Prediction inventory 1016 provides temporary storage for generated predictions until they are selected by arbiter 1018 to access memory. Arbiter 1018 is configured to determine which of the generated predictions are issued to access the memory for prefetching instructions and data.

The multi-level cache 1040 is composed of a first level data return cache ("DRC1") 1042 and a second level data return cache ("DRC2") 1044. The first level data return cache 1042 can generally be described as a short term data store, and the second level data return cache 1044 can generally be described as a long term data store. According to one embodiment of the invention, either or both of the first level data return cache 1042 and the second level data return cache 1044 store program instructions and program data prefetched based on a predicted address (i.e., a target address). As shown, the prefetched prediction information stored in the multi-level cache 1040 is represented as data(TRT1) and data(TRT2). This notation indicates that the target addresses TRT1 and TRT2 caused the data constituting the prediction information to be prefetched. As shown, and as described below, data(TRT1) and data(TRT2) are stored in the multi-level cache 1040 with prediction identifiers ("PID") 1 and 2, respectively. When either data(TRT1) or data(TRT2) is requested by the processor, the corresponding target address (e.g., TRT1) and the prediction identifier are communicated to the out of order predictor 1010.

In operation, speculator 1008 monitors the system bus as the processor requests access to memory ("read requests"). As the processor executes program instructions, speculator 1008 detects read requests for addresses that contain program instructions and program data being used by the processor. For purposes of discussion, an "address" is associated with a cache line or unit of memory that is generally transferred between memory and a cache memory, such as the multi-level cache 1040. Note that the cache memory is an example of storage external to the target cache 1030.

Based on the detected read request, the out of order predictor 1010 may generate a configurable number of predicted addresses that are likely to be requested subsequently by the processor. In particular, out of order predictor 1010 is configured to generate one or more predictions (i.e., predicted addresses) following detection of an address, even when that address lies in a nonlinear stream of read requests. Typically, under such conditions there is no observable pattern among the requested addresses from which the next address could be predicted based on one previous address. However, according to one embodiment of the present invention, the out of order prediction engine 1020 generates out of order predictions, which are predicted addresses that are not patternable from one or more preceding addresses. A "nonpatternable" prediction is a prediction that cannot be patterned with, or is irregular relative to, a preceding address. One type of nonpatternable prediction is an out of order prediction. The preceding address on which an out of order prediction is based may be either the immediately preceding address or any earlier address configured as a trigger address. In particular, the lack of a pattern across two or more addresses in the stream of read requests suggests a process that executes program instructions in a somewhat indiscriminate manner, fetching instructions and data from various spatial locations in memory.

Out of order predictor 1010 includes a target cache 1030 as storage for maintaining combinations of preceding addresses with one or more potential out of order addresses, each of which may qualify as an out of order prediction. The target cache 1030 is designed to compare its contents against an incoming address in a fast manner to generate out of order predictions. The target cache 1030 is also configured to prioritize out of order predictions in response to, for example, a hit in the cache memory, or upon the first instance in which the out of order predictor 1010 establishes a combination between a new out of order prediction and a specific trigger address. A "trigger" address is a detected address that causes out of order predictor 1010 to generate an out of order prediction, with the resulting prediction called the "target" of the unpatternable combination between the two. In accordance with at least one embodiment of the present invention, note that the target cache 1030 may be a single-ported memory to conserve resources otherwise consumed by a multi-ported memory.

Prefetcher 1006 issues predictions from out of order predictor 1010, and the out of order predictions are used to access the memory. In response, the memory returns prefetched data together with reference information for the predicted address, where the reference information may include a prediction identifier ("PID") and the corresponding target address. Next, the multi-level cache memory 1040 temporarily stores the returned data until the processor requests it. As described below, when the processor requests the prefetched data (i.e., the prediction information), the reference information is passed to the out of order predictor 1010 to reprioritize the out of order predictions if necessary.

FIG. 11 illustrates an exemplary out of order predictor 1010, in accordance with an embodiment of the present invention. Out of order predictor 1010 includes an out of order prediction engine ("NonSeq. Prediction Engine") 1120 operably connected to the storage illustrated by target cache 1130. The out of order prediction engine 1120 also includes a prediction generator 1122 and a priority adjuster 1124. Prediction generator 1122 generates predictions and manages the trigger-target combinations stored in target cache 1130. Priority adjuster 1124 operates, for example, to prioritize the trigger-target combinations from the most recently successful target address to the least recently successful target address. Prediction generator 1122 and priority adjuster 1124 are described in more detail with reference to FIGS. 12 and 13, respectively.

The target cache 1130 maintains combinations between each trigger address ("TGR") and one or more corresponding target addresses ("TRT"). FIG. 11 illustrates one of a number of ways of combining out of order addresses. Here, a tree structure relates a particular trigger address to its corresponding target addresses. In this example, target cache 1130 includes address "A" as a trigger address that forms combinations with addresses of possible out of order predictions, such as addresses "B", "X", and "L". In addition, these three target addresses are themselves trigger addresses for the respective addresses "C" and "G", "Y", and "M". The formation and operation of target cache 1130, by which prediction generator 1122 discovers a new trigger-target combination and inserts the combination into target cache 1130, is described in detail below. Also, the address "A" may itself be a target address for a trigger address not shown in FIG. 11. In addition, many other combinations are possible among addresses not shown.
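As an informal illustration of the trigger-target combinations just described, the following Python sketch models the target cache as a mapping from a trigger address to an ordered list of target addresses, with list position standing in for way priority (position 0 corresponding to Way 0, the highest priority). The class name and the simple dict representation are assumptions made for clarity; they do not describe the actual tag/index hardware organization.

```python
# Illustrative sketch of a target cache holding trigger-to-target combinations.
# A dict keyed by trigger address stands in for the tag/index lookup, and list
# position stands in for way priority (index 0 ~ Way 0, the highest priority).

class TargetCacheSketch:
    def __init__(self, width=2):
        self.width = width                 # number of legs used for prediction ("w")
        self.combinations = {}             # trigger address -> ordered target addresses

    def insert(self, trigger, target):
        targets = self.combinations.setdefault(trigger, [])
        if target not in targets:
            targets.insert(0, target)      # a newly found combination gets highest priority

    def predict(self, trigger):
        # Return up to "width" target addresses as out of order predictions.
        return self.combinations.get(trigger, [])[: self.width]


if __name__ == "__main__":
    tc = TargetCacheSketch(width=2)
    for trig, tgt in [("A", "L"), ("A", "X"), ("A", "B"), ("B", "C"), ("B", "G")]:
        tc.insert(trig, tgt)
    print(tc.predict("A"))                 # -> ['B', 'X']: leg 0 and leg 1 for trigger A
```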

As shown, the target cache can be configured by the out of order prediction engine 1120 according to at least three variables, in accordance with one embodiment of the invention: width ("w"), depth ("d"), and height ("h"). The width, w, sets the number of possible targets that a trigger address can predict, and the depth, d, sets the number of levels associated with a trigger address. The height, h, sets the number of consecutive trigger addresses used to generate out of order predictions. As an example, consider d set to a depth of "4". This means that address A is at the first level, address B is at the second level, addresses C and G are at the third level, and address D is at the fourth level. As another example, consider w set to "2". This means that only two of the three addresses "B", "X", and "L" are used for out of order prediction, as leg 0 and leg 1, even though all three addresses are at the second level. In a particular embodiment, the variable h sets the number of levels beyond the first level used to enable multi-level prediction generation.

As shown in FIG. 11, consider that h is set to two. This means that there are two levels of trigger addresses: the trigger address (e.g., address A) at the first level and successive trigger addresses (e.g., address B) at the second level. Thus, with h set to 2, a first grouping of predictions is formed in response to triggering address A. That is, any of the target addresses at the second level may generate one or more groups of out of order addresses. For example, any of the addresses "B", "X", and "L" may serve as the basis for generating out of order predictions, where the number of such addresses is selected by the number of active legs defined by out of order prediction engine 1120 (e.g., leg 0 through leg 2). Further, under multi-level prediction generation (and with h set to 2), each of the addresses "B", "X", and "L" may act as a successive trigger address that generates a second grouping of predictions based on the target addresses of the next lower level. Thus, the third-level target addresses C and G can be used to generate additional out of order predictions based on the successive trigger address B. Similarly, target addresses Y and M may be used to generate out of order predictions based on successive trigger addresses X and L, respectively. Those of ordinary skill in the art will appreciate that many implementations are possible by changing one or more of the three aforementioned variables.
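The following sketch illustrates, under stated assumptions, how the width and height variables might govern multi-level prediction generation: each level's targets become successive triggers for the next level, up to a configurable number of levels. The function name and the flat list of resulting predictions are illustrative only; the actual engine would queue groupings through its pipelines as described later.

```python
# Hypothetical sketch of multi-level out of order prediction governed by the
# width (w) and height (h) variables: targets of a trigger become successive
# triggers for the next level of predictions.

def multilevel_predict(target_map, trigger, width=2, height=2):
    """target_map: trigger -> list of targets ordered by priority (Way 0 first)."""
    predictions = []
    current_level = [trigger]
    for _ in range(height):                      # one grouping of predictions per level
        next_level = []
        for t in current_level:
            legs = target_map.get(t, [])[:width] # leg 0 .. leg (w-1)
            predictions.extend(legs)
            next_level.extend(legs)              # targets act as successive triggers
        current_level = next_level
    return predictions


if __name__ == "__main__":
    # The combinations of FIG. 11: A -> {B, X, L}, B -> {C, G}, X -> {Y}, L -> {M}
    target_map = {"A": ["B", "X", "L"], "B": ["C", "G"], "X": ["Y"], "L": ["M"]}
    print(multilevel_predict(target_map, "A", width=2, height=2))
    # -> ['B', 'X', 'C', 'G', 'Y'] : first grouping from A, second grouping from B and X
```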

Out of order prediction engine 1120 is configured to receive exemplary addresses 1101 of read requests. FIG. 11 conceptually illustrates non-sequential address streams 1102, 1104, 1106, 1108, and 1110, each of which includes addresses that cannot be patterned with previously-detected addresses. For example, stream 1102 includes address "A" followed by address "B", followed by address "C". Because these are non-sequential addresses, detecting a pattern for predicting "B" from "A", or "C" from "B", is difficult with nothing more than monitoring of the read requests 1101. To accomplish this, prediction generator 1122 establishes the contents of target cache 1130 to enable prediction of an unpatternable combination between a particular trigger address and its target address. For example, upon detecting address A of stream 1102 (as well as the subsequent addresses), prediction generator 1122 populates the target cache 1130 with combinations such as an A-to-B combination, a B-to-C combination, a C-to-D combination, and the like. Out of order prediction engine 1120 operates in the same manner when the addresses of other streams 1104, 1106, etc. are detected.

According to a particular embodiment, the target cache 1130 stores these combinations in the form of tables, such as tables 1140, 1150, and 1160. These tables include a trigger column ("TGR") and a target column ("TRT") for storing a trigger address and a target address, respectively. Next, consider that the addresses 1101 of all the streams have been stored in tables 1140, 1150, and 1160. As shown in table 1140, trigger-target combinations 1142, 1144, and 1146 describe the A-to-B, B-to-C, and G-to-Q combinations, respectively. Other trigger-target combinations 1148 include the C-to-D combination and the like. Similarly, table 1150 includes a trigger-target combination 1152 that describes the A-to-X combination, and table 1160 includes a trigger-target combination 1162 that describes the A-to-L combination.

FIG. 11 shows that tables 1140, 1150, and 1160 are identified as "Way 0", "Way 1", and "Way 2", respectively, which indicate the relative positions of multiple trigger-target combinations within target cache 1130 for the same trigger address. Priority adjuster 1124 typically assigns priority to a trigger-target combination, and thus to its prediction, by associating the combination with a prioritized memory location. In this case, Way 0 is associated with the highest priority and Way 1 with the second highest priority. In this example, the trigger-target combination 1142 of table 1140 indicates that the A-to-B combination has higher priority than the A-to-X combination, which is the trigger-target combination 1152 of table 1150. Thus, after target cache 1130 includes these combinations, the next time out of order prediction engine 1120 detects address A, out of order prediction engine 1120 may provide one or more predictions. Typically, out of order prediction engine 1120 generates out of order predictions in order of priority. In particular, the out of order prediction engine 1120 generates the highest priority prediction before generating lower priority predictions. As such, out of order prediction engine 1120 may generate a configurable number of predictions based on priority. For example, the out of order prediction engine 1120 can limit the number of predictions to two: leg 0 and leg 1 (i.e., the top two trigger-target combinations). In some cases this means that the out of order prediction engine 1120 tends to provide address B rather than address X, by virtue of the relative priorities of the tables. Note that only the relative priority of the trigger-target combinations is relevant. This means that the target cache 1130 may, for example, locate the highest priority combination for a particular trigger address in Way 4 and place the second highest priority combination in Way 9. Note also that the target cache 1130 may associate any number of "legs" with one address; it is not limited to only leg 0 and leg 1.

FIG. 12 illustrates an example prediction generator 1222, in accordance with an embodiment of the present invention. In this example, prediction generator 1222 is connected to target cache 1230 to generate predictions as well as to manage the trigger-target combinations stored therein. Prediction generator 1222 includes an index generator 1204, a tag generator 1206, a target determiner 1208, and a combiner 1210. The prediction generator 1222 also includes an inserter 1202 for inserting newly found trigger-target combinations into the target cache 1230.

In generating a prediction, the index generator 1204 and the tag generator 1206 operate, respectively, to generate an index and a tag representing a first address "addr_1", which may be an address that precedes another address. Index generator 1204 forms an index, "index(addr_1)", from addr_1 to access a subset of memory locations in target cache 1230. Typically, the value of index(addr_1) selects a corresponding memory location in each way. In addition, the tag generator 1206 forms a tag, "tag(addr_1)", so that the prediction generator 1222 can access the specific trigger-target combination in the target cache 1230 associated with addr_1.

As an example, consider that addr_1 is "G". From this address, prediction generator 1222 generates index(G) to select the memory locations associated with that index. In this example, index(G) has a value "I" of 3 (i.e., I = 3). This means that index(G), identified by I = 3, can be used to select a corresponding memory location in each of way ("Way 0") 1240, way ("Way 1") 1250, through way ("Way N") 1260, where N is a configurable number representing the number of ways available within the target cache 1230. For the same address G, tag generator 1206 generates tag(G) to identify the particular memory locations associated with G. Therefore, given index(G) and tag(G), the target addresses Q and P (or surrogate representations thereof) may respectively be retrieved from, or stored at, memory locations in way 1240 and way 1250, as shown in FIG. 12. In a particular embodiment, each address consists of 36 bits. Bits [28:18] can represent the tag for an address, and any of bit groups [19:9], [18:8], [17:7], or [16:6] can represent a configurable index for that address. Alternatively, in one embodiment, only a portion of an address is used to represent a target address. For example, bits [30:6] of the 36-bit target address are kept in the TRT column of the target cache 1230. With reduced representations of the target address and trigger address, less hardware is required, thereby reducing costs associated with materials, resources, and the like.
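For concreteness, the following sketch forms a tag and an index from a 36-bit address using the example bit ranges given above (tag in bits [28:18], index drawn from one of the configurable ranges such as [16:6]). The helper names are hypothetical, and the choice of index range is one of the alternatives listed above, not a requirement.

```python
# Sketch of forming a tag and a configurable index from a 36-bit address,
# using the example bit ranges described above. Helper names are illustrative.

def bit_field(value, high, low):
    """Extract bits [high:low] (inclusive) from an integer."""
    mask = (1 << (high - low + 1)) - 1
    return (value >> low) & mask

def make_tag(address):
    return bit_field(address, 28, 18)          # 11-bit tag from bits [28:18]

def make_index(address, high=16, low=6):
    return bit_field(address, high, low)       # one of the configurable index ranges

def make_pid(address):
    """A prediction identifier pairing the trigger's index and tag, as in [index(G), tag(G)]."""
    return (make_index(address), make_tag(address))


if __name__ == "__main__":
    addr_G = 0xABCD1234                        # arbitrary example address that fits in 36 bits
    print("index:", make_index(addr_G), "tag:", make_tag(addr_G), "PID:", make_pid(addr_G))
```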

Target determiner 1208 determines whether a trigger-target combination exists for a particular trigger and, if one is present, determines each target address for that trigger. With respect to the previous example, the target determiner 1208 retrieves the target addresses Q and P in response to tag(G) matching against the tags stored at index(G), which may represent different trigger addresses. Those skilled in the art will understand that a known comparator circuit (not shown) may suitably be implemented in either the prediction generator 1222 or the target cache 1230 to identify matching tags. If one or more target addresses are found, these addresses are passed to combiner 1210. The combiner 1210 combines each target address 1214 with a prediction identifier ("PID") 1212, which consists of the index and the tag of the trigger address. PID 1212 identifies the trigger address that caused the target addresses Q and P to be predicted. Thus, if PID 1212 is represented as [index(G), tag(G)], the out of order prediction generated by prediction generator 1222 is represented as [[index(G), tag(G)], Q]. Note that Q is the prediction, with [index(G), tag(G)] associated with it as reference information. Therefore, the prediction information prefetched into the cache memory may be represented as data(Q) + [[index(G), tag(G)], Q].

The combiner 1210 can be configured to receive a "batch" signal 1226 that directs generation of a number of additional predictions that are out of order with respect to the trigger address. For example, assume that batch signal 1226 directs combiner 1210 to generate "b" predictions as a group of predictions spanning a range that includes the matched target address. Thus, when the trigger address "G" generates an out of order prediction of the address "Q" (i.e., Q0 as the reference address), the predicted addresses may include Q0, Q1, Q2, ... Qb, where b is the number set by the batch signal. In some cases, where a back sector or blind back sequential prediction occurs at the same time, the batch may be set to b-1. As such, the group of predicted addresses then includes Q(-1), Q0, Q1, Q2, ... Q(b-1). Note that each of the predicted addresses in the group can be associated with the PID 1212. In a particular embodiment, the target address 1214 inherits the attributes of the trigger address, where these attributes indicate whether the trigger address is associated with code or with program data, and whether or not the trigger address is a processor request address. Also, in another particular embodiment, fewer than all of the predicted addresses in the group may be associated with the PID 1212. In one example, only target address Q0 is associated with PID 1212, and one or more others of the group (e.g., Q(-1), Q2, Q3, etc.) need not be associated with PID 1212. As such, when the subsequent target address Q0 for trigger address G is hit, PID 1212 is returned to the out of order predictor. But when Q2 or any other member of the group is hit, the PID 1212 is not returned. This reduces the number of redundant entries in the target cache. Thus, only the combination "G->Q0" is stored and reprioritized as a result of a hit on the prediction. When the address Q1 is detected within the address stream, the out of order predictor need not insert the combination "G->Q1".
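A short sketch of the batch expansion just described follows. It assumes a 64-byte cache-line granularity purely for illustration and uses a hypothetical function name; the only point shown is how one matched target address Q0 can be expanded into the group Q0..Qb, or into Q(-1), Q0..Q(b-1) when a back-sector prediction occurs at the same time.

```python
# Sketch of expanding one matched target address into a "batch" of predictions.
# The 64-byte line size and function name are assumptions for illustration.

LINE = 64  # assumed cache-line size in bytes

def batch_predictions(target_q0, b, include_back_sector=False):
    """Return the group of predicted addresses: Q0..Qb, or Q(-1), Q0..Q(b-1)
    when a back-sector or blind back sequential prediction occurs at the same time."""
    if include_back_sector:
        return [target_q0 - LINE] + [target_q0 + i * LINE for i in range(b)]
    return [target_q0 + i * LINE for i in range(b + 1)]


if __name__ == "__main__":
    print([hex(a) for a in batch_predictions(0x4000, b=4)])
    print([hex(a) for a in batch_predictions(0x4000, b=4, include_back_sector=True)])
```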

Next, consider the case in which the target determiner 1208 does not detect a target address for addr_1. The target determiner 1208 then indicates to the inserter 1202 that no trigger-target combination exists for addr_1. In response, inserter 1202 forms a trigger-target combination for addr_1 and inserts the combination into target cache 1230. To do this, inserter 1202 first identifies, using index(addr_1), a memory location for storing tag(addr_1). In addition, inserter 1202 is configured to receive a subsequent address "addr_2" to be stored as the target address for trigger address addr_1. Since no trigger-target combination existed prior to the newly-formed one, inserter 1202 stores tag(addr_1) and addr_2 in the TGR and TRT columns, respectively, of way 1240, which is the highest priority way (i.e., Way 0). For example, considering the address streams of FIG. 11, suppose a stream represents a first instance in which "Z" follows "Y". After determining that no "Y-to-Z" trigger-target combination exists, the inserter 1202 of FIG. 12 stores the new trigger-target combination at index(Y). As such, the "Y-to-Z" combination is stored as trigger-target combination 1242 in way 1240. In a particular embodiment, inserter 1202 receives an insert signal ("INS") 1224 from the priority adjuster 1324 described below.

FIG. 13 illustrates an example priority adjuster 1324 in accordance with an embodiment of the present invention. In general, priority adjuster 1324 operates to prioritize the trigger-target combinations from the most recently successful target address to the least recently successful target address. For example, a trigger-target combination may be assigned the highest priority (i.e., placed in Way 0) when no prior target exists for a particular trigger. In addition, a trigger-target combination can be assigned the highest priority if its predicted target address proves successful (e.g., the processor reads data that was prefetched based on the out of order prediction). In this example, priority adjuster 1324 is coupled to target cache 1230 to prioritize the trigger-target combinations stored therein. Priority adjuster 1324 includes a register 1302, an index decoder 1308, a tag decoder 1310, a target determiner 1318, a matcher 1314, and a reprioritizer 1316.

In general, priority adjuster 1324 receives, from a source external to out of order predictor 1010, information indicating that a particular address was successful in providing data requested by the processor. Such information may be generated by a cache memory, such as the multi-level cache shown in FIG. 10. Priority adjuster 1324 receives this information in register 1302 as "Hit Info". The Hit Info is reference information that includes at least an address 1304 of the data (e.g., the program instructions and/or program data actually requested by the processor). Address 1304 is designated addr_2. The reference information also includes a PID 1306 associated with address 1304.

Index decoder 1308 and tag decoder 1310 extract index(addr_1) and tag(addr_1), respectively, from PID 1306 to determine whether addr_2 has an appropriate level of priority. To accomplish this, priority adjuster 1324 identifies whether addr_2 is the target address of an existing trigger-target combination in target cache 1230. Priority adjuster 1324 applies tag(addr_1) and index(addr_1) to target cache 1230, and any matching trigger address in the TGR column of target cache 1230 is received by target determiner 1318. Upon detection of one or more target addresses associated with addr_1, target determiner 1318 provides these target addresses to matcher 1314.

However, if the target determiner 1318 determines that no matching target address is present in a trigger-target combination (i.e., no addr_2 is associated with the address addr_1), then an insert signal ("INS") 1224 is communicated to the inserter 1202 of FIG. 12 to insert the new trigger-target combination. Insert signal 1224 typically includes address information such as addr_1 and addr_2. Typically, a situation in which no matching target address exists for the PID 1306 of the Hit Info means that the processor used data prefetched under a previously issued out of order prediction, but the target cache 1230 has since eliminated the trigger-target combination that formed the basis for that previously issued out of order prediction. As such, the out of order predictor 1010 inserts or reinserts a trigger-target combination that can again be used to predict out of order addresses successfully used by the processor.

If target determiner 1318 detects one or more target addresses, it provides the detected target addresses to matcher 1314. The matcher 1314 compares each detected target address against addr_2 (i.e., address 1304) to determine whether a matching target address already exists for addr_1 and, if so, in which way the corresponding trigger-target combination resides. If necessary, the matcher 1314 provides the result of the matching to the reprioritizer 1316 to modify the priority.

First, consider an example in which one or more target addresses are detected as being associated with PID 1306 (i.e., with addr_1 as the trigger address), but no trigger-target combination includes addr_2. In this case, reprioritizer 1316 inserts the new trigger-target combination into the position representing the highest priority (e.g., Way 0) and demotes the priority of the existing trigger-target combinations for the same trigger. For example, as shown in FIG. 12, consider that the "A-to-X" trigger-target combination is in the memory location with the highest priority, while the "A-to-L" combination has a lower priority. Next, assume that PID 1306 represents address A as addr_1 and that addr_2 is address B. Reprioritizer 1316 then operates to store the "A-to-B" combination in the highest priority position, with the other prior combinations stored in ways of lower priority, as shown in FIG. 13.

Second, consider an example in which two target addresses are detected as being associated with PID 1306 (i.e., addr_1), but the two trigger-target combinations need to have their priorities exchanged. In this case, reprioritizer 1316 inserts the now highest priority trigger-target combination into the position representing the highest priority (e.g., Way 0), and inserts the previous highest priority trigger-target combination into another position representing the second highest priority (e.g., Way 1). For example, as shown in FIG. 12, consider that the "B-to-G" trigger-target combination is at the memory location representing the highest priority, while the "B-to-C" combination has a lower priority. Next, assume that PID 1306 represents address B as addr_1 and that addr_2 is address C. Reprioritizer 1316 then operates to store the "B-to-C" combination in Way 0, with the other combination moved to Way 1 of lower priority, as shown in FIG. 13. Note that this prioritizing technique is useful when at least the two highest priority combinations are maintained as "leg 0" and "leg 1", respectively.

Next, consider an example in which two target addresses are detected as being associated with PID 1306 (i.e., addr_1) and the two trigger-target combinations already have their priorities properly assigned. In this case, the reprioritizer 1316 takes no action, since the corresponding trigger-target combinations are already where they should be.
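The three reprioritization cases above can be summarized with the following sketch. A per-trigger list stands in for the ways of the target cache, with position 0 playing the role of Way 0 (highest priority); the function name and data layout are assumptions for exposition, not the disclosed circuit.

```python
# Sketch of the three reprioritization cases described above. A list per trigger
# stands in for the ways of the target cache, position 0 ~ Way 0 (highest priority).

def reprioritize(target_cache, addr_1, addr_2):
    """addr_1 is the trigger recovered from the PID; addr_2 is the address the
    processor actually requested (the successful target)."""
    targets = target_cache.setdefault(addr_1, [])
    if addr_2 not in targets:
        # Case 1: no combination includes addr_2 -> insert it at the highest
        # priority and demote the existing combinations for the same trigger.
        targets.insert(0, addr_2)
    elif targets.index(addr_2) != 0:
        # Case 2: the combination exists but is not highest priority -> exchange.
        targets.remove(addr_2)
        targets.insert(0, addr_2)
    # Case 3: already highest priority -> no action taken.
    return targets


if __name__ == "__main__":
    cache = {"A": ["X", "L"], "B": ["G", "C"]}
    print(reprioritize(cache, "A", "B"))   # case 1 -> ['B', 'X', 'L']
    print(reprioritize(cache, "B", "C"))   # case 2 -> ['C', 'G']
    print(reprioritize(cache, "B", "C"))   # case 3 -> unchanged
```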

FIG. 14 illustrates an example pipeline 1400 for operating a prediction generator to form out of order predictions, in accordance with certain embodiments of the present invention. In FIG. 14, the solid boxes represent storage during or between stages, and the dotted boxes represent operations performed by the out of order predictor. During stage 0, addr_1 of a read request is decoded by the combine-tag-and-index generator 1402, which may be an amalgam of the index decoder 1308 and the tag decoder 1310 of FIG. 13. In one embodiment, the combine-tag-and-index generator 1402 is a multiplexer configured to separate addr_1 into a first portion of the address and a second portion of the address. The first portion is maintained as tag(addr_1) at 1406 and the second portion is maintained as index(addr_1) at 1408. Also during this stage, index(addr_1) is applied to the target cache at 1410 to retrieve data representing trigger-target combinations. Additionally, addr_1 of the read request may be temporarily stored in buffer 1404 while the target cache is being written.

During stage 1, tag(addr_1) and index(addr_1) are maintained at 1412 and 1414, respectively. At 1416, the target addresses are read from the target cache. During stage 2, the out of order prediction engine selects the appropriate out of order prediction by first matching tag(addr_1) against the tags associated with index(addr_1) at 1418. At 1420, the out of order prediction engine configures a multiplexer, for example, to send the highest priority target address (i.e., from the way storing the highest priority trigger-target combination) to the leg 0 prediction queue at 1422, and the second highest priority target address (i.e., from the way storing the second highest priority trigger-target combination) to the leg 1 prediction queue at 1424. In stage 3, these two out of order predictions are output, for example, to the combiner at 1430. Although FIG. 14 shows out of order predictions being generated in four stages, it is noted that out of order prediction pipelines in other embodiments may have more or fewer stages.

FIG. 15 illustrates an example pipeline 1500 for operating a priority adjuster to prioritize out of order predictions in accordance with certain embodiments of the present invention. Solid boxes represent storage during or between stages, and dashed boxes represent operations that can be performed by the priority adjuster. Pipeline 1500 illustrates an example method of inserting trigger-target combinations into a target cache and of reprioritizing the target cache combinations. Stage -1 determines whether the priority adjuster performs an insertion or a reprioritization. If the priority adjuster performs an insertion, then at 1502 the address addr_1 of the read request is stored at 1506 during this stage. This address has the potential to be a trigger address for a target address. If the priority adjuster performs reprioritization, then at 1504 the priority adjuster receives, at 1508, a PID representing the addr_1 address from an external source (e.g., cache memory) and, also during this stage, receives the address addr_2 at 1510.

FIGS. 14 and 15 illustrate out of order prediction using one level of prediction. To enable multi-level prediction generation, example pipelines 1400 and 1500 can be modified to feed the predictions generated at the end of each pipeline 1400 and 1500 back to the pipelines 1400 and 1500 as input addresses. These predictions are then queued for further levels of prediction generation. For example, if A is detected, then target cache 1130 generates target addresses B and X (e.g., from the two highest priority ways). Next, address B, as a successive trigger, is input again at the top of the pipeline, where the target cache 1130 generates addresses C and G. In other words, a feedback loop is added to the example pipelines 1400 and 1500 for implementing additional levels of prediction.

First, consider the case in which, during stage 0, the priority adjuster performs a trigger-target combination insertion. In this example, addr_1 is decoded by the combine-tag-and-index generator 1514 and addr_2 is selected from 1512 via the multiplexer 1516. Combine-tag-and-index generator 1514 performs the collective functions of the index generator and the tag generator. In one embodiment, the combine-tag-and-index generator 1514 is a multiplexer configured to select an address from either 1506 or 1508. In this case, the combine-tag-and-index generator 1514 forms a first address portion that is maintained as tag(addr_1) at 1520 and forms a second address portion that is maintained as index(addr_1) at 1522. In addition, during this stage, index(addr_1) is applied to the target cache at 1524 through multiplexer 1518 to retrieve data describing trigger-target combinations. Next, consider the case in which, during stage 0, the priority adjuster performs reprioritization of the target cache. In this example, addr_1 (or another representation thereof) is received from 1508 and addr_2 is selected from 1510 via multiplexer 1516. The combine-tag-and-index generator 1514 then forms the first and second portions from the PID 1508. Next, the index(addr_1) formed from the PID 1508 is applied to the target cache at 1524 through the multiplexer 1518 to retrieve data describing trigger-target combinations. In stages 1 through 3, the pipeline 1500 operates similarly regardless of whether the priority adjuster performs insertion or reprioritization.

During stage 1, tag(addr_1) and index(addr_1) are maintained at 1530 and 1532, respectively. At 1534, the target addresses are read from the target cache. During stage 2, the priority adjuster first matches tag(addr_1) against the stored tags. If no tag matches at 1540, the multiplexer is configured at 1542 to prepare to insert the trigger-target combination. However, if at least one tag matches at 1544, the trigger-target combinations are reprioritized at 1554 so that the highest priority trigger-target combination resides in the way corresponding to the highest priority. To accomplish this, the multiplexer is selected at 1552 to reorder or insert a new trigger-target combination. During stage 3, the fully-connected prioritization multiplexer is configured to store addr_2 at 1556. This address is written as the target address in Way 0 during stage 0, as determined by the index(addr_1) maintained at 1550. As shown, other trigger-target combinations determined by the fully-connected prioritization multiplexer at 1560 are written as cache write data into the target cache at 1524 using the index(addr_1) maintained at 1550. After the pipeline 1500 returns to stage 0, the priority adjuster continues to operate accordingly.

Example Embodiments That Issue Predictions from Inventory

FIG. 16 is a block diagram illustrating an example prediction inventory 1620 in accordance with certain embodiments of the present invention. In this example, prediction inventory 1620 is shown to reside within prefetcher 1606. Also, prefetcher 1606 is shown to operate within a memory processor 1604 designed to control at least memory accesses by one or more processors. Prefetcher 1606 operates to "fetch" both program instructions and program data from memory 1612 before they are requested, and then to provide the fetched program instructions and program data to processor 1602 upon a request by the processor. By fetching them before use (i.e., "prefetching"), the idle time of the processor (e.g., the time during which processor 1602 is starved of data) is minimized. Prefetcher 1606 also includes a speculator 1608 for generating predictions and a filter 1622 for removing unnecessary predictions.

Filter 1622 represents one or both of an inventory filter and a post-inventory filter. By eliminating unnecessary predictions, prefetcher 1606 can conserve the computational and memory resources otherwise used to manage unnecessary, duplicative predictions. The inventory filter (a pre-inventory filter) operates to remove unnecessary predictions prior to insertion into prediction inventory 1620, while the post-inventory filter removes unnecessary predictions before issuance to memory 1612. An example of a post-inventory filter is shown in FIG. 20. The operation of the prefetcher 1606 and its components is described below.

In operation, speculator 1608 monitors system bus 1603 for requests ("read requests") by processor 1602 to access memory 1612. As the processor 1602 executes program instructions, the speculator 1608 detects read requests for addresses that contain program instructions and program data subsequently used by the processor 1602. For illustration purposes, an "address" is generally associated with a cache line or unit of memory transferred between memory 1612 and a cache memory (not shown). Note that the cache memory is an example of prediction storage that is separate from the prediction inventory. An "address" of a cache line may refer to a memory location, and the cache line may contain data from one or more addresses of memory 1612. The term "data" refers to a unit of information that can be prefetched, while the terms "program instructions" and "program data" respectively refer to the instructions and data used by the processor 1602 in its processing. Thus, the data (e.g., any number of bits) may represent predictable information that constitutes program instructions and/or program data.

Based on the detected read requests, speculator 1608 can generate numerous predictions to improve the chances of accurately predicting accesses to memory 1612 by processor 1602, and these numerous predictions may include redundant predictions. Examples of such predictions include forward sequential predictions, reverse sequential predictions, blind back sequential predictions, back sector sequential predictions, non-sequential predictions, and the like. To remove this redundancy, the inventory filter 1622 filters out the duplicate predictions to yield surviving predictions, which are then stored in the prediction inventory 1620. To remove the redundancy, the inventory filter 1622 compares the generated predictions against existing items before inserting the predictions into the prediction inventory 1620. If a match is found between a prediction and an item remaining in the prediction inventory 1620, the inventory filter 1622 invalidates that prediction. However, if no match is found, inventory filter 1622 inserts the surviving prediction into prediction inventory 1620. Note that within a new group of predictions (i.e., predictions generated by one event, or by the same trigger address), some predictions may match existing contents while other predictions may not. In this case, the inventory filter 1622 invalidates the individual predictions that match and inserts the unmatched predictions (e.g., those not marked "invalid") into the prediction inventory 1620.

Once resident within the prediction inventory 1620, a prediction remains as an "item" of the inventory. The term "item" refers to either a prediction or a triggering address (which generated the predictions) stored in prediction inventory 1620. These items can be compared against later-generated predictions for filtering purposes. Prefetcher 1606 manages these items in the inventory while issuing them to memory 1612 at varying rates. The rate of issuance depends on the type of prediction (e.g., forward sequential prediction, non-sequential prediction, etc.), the priority of each type of prediction, and other factors described below.

One way that a prediction can be redundant is when processor 1602 issues an actual read request for a particular address and a prediction for that address already exists in the prediction inventory 1620. In this case, the prediction is filtered out (i.e., invalidated) and the actual read request of the processor 1602 is maintained. This is especially true for predictions such as sequential-type and back-type predictions. In addition, some predictions in the prediction inventory 1620 may become redundant between the time they are received and the time prefetcher 1606 issues them to memory, and prefetcher 1606 can also filter predictions before issuing the items. This again reduces the number of redundant predictions that arise during the interval in which later-generated predictions are inserted into the prediction inventory 1620. Also, as the number of redundant predictions decreases, more resources are conserved.

After the prefetcher 1606 issues predictions from the prediction inventory 1620, the memory processor 1604 transfers the remaining predictions (i.e., at least those not filtered out by the post-inventory filter) over the memory bus 1611 to memory 1612. In response, memory 1612 returns prefetched data for the predicted addresses. A cache memory (not shown), which may be located inside or outside the prefetcher 1606, temporarily stores the returned data until the time comes to transfer the data to the memory processor. At the appropriate point in time, the memory processor 1604 sends the prefetched data to the processor 1602 via the system bus 1603 to ensure, among other things, that data latency is minimized.

FIG. 17 illustrates an example prediction inventory 1620 according to one embodiment of the invention. Prediction inventory 1620 includes multiple queues 1710, 1712, 1714, and 1716 to store predictions, where the queues may be buffers or any such components for storing predictions until each is issued or filtered. In addition, prediction inventory 1620 includes an inventory manager 1704 and one or more queue attributes 1706, and the inventory manager 1704 configures the structure and/or operation of each of the queues according to the corresponding queue attributes 1706.

Each queue maintains predictions as items, all of which are generally of the same particular type of prediction, such as forward sequential prediction. As shown, prediction inventory 1620 includes four queues: a sequential queue ("S Queue") 1710, a back queue ("B Queue") 1712, an out of order zero-queue ("NS0 Queue") 1714, and an out of order one-queue ("NS1 Queue") 1716. The sequential queue 1710 may be configured to contain either forward sequential predictions or reverse sequential predictions, while the back queue 1712 may contain either blind back sequential predictions or back sector sequential predictions. For illustrative purposes, forward sequential predictions, reverse sequential predictions, and the like may be collectively referred to as "series-type" predictions, while blind back sequential predictions, back sector sequential predictions, and the like may be collectively referred to as "back-type" predictions.

Predictive inventory 1620 includes a "0th" out of order queue and a "first" out of order queue. Out of order ("zero-") queue 1714 and out of order ("one-") queue 1716 include out of order predictions having "highest" and "second highest" priorities, respectively. In particular, out of order zero-queue 1714 maintains out of order prediction including the highest priority target address (any number of target addresses) that may be generated by the corresponding trigger address. The "trigger" address is the detected address at which speculator 1608 generates the prediction. This prediction (ie, predicted address) is a "target" address that is not patterned (eg, out of order) with a trigger that generates the target. Similarly, out of order one-queue 1716 includes a second highest priority target address that can be generated by the corresponding trigger address, instead of maintaining out of order prediction.

Each queue may consist of any number of groups 1720, such as groups 0, 1, 2, and 3. Each group 1720 contains a configurable number of items, such as a triggering address and the corresponding predictions that the triggering address generated. For example, a group 1720 of sequential queue 1710 may include a triggering address and seven sequential predictions, while a group 1720 of back queue 1712 may include a triggering address and one back-type prediction (or, in some cases, these queues contain only predictions as items). In addition, a group 1720 of either or both of the out of order zero-queue 1714 and the out of order one-queue 1716 may include a trigger address and four out of order predictions (or, in some cases, these queues contain only predictions as items). In a particular embodiment, the number of items per group 1720 stored in prediction inventory 1620 is determined by the "batch" number that is set to cause speculator 1608 to generate a particular number of predictions. By storing predictions in prediction inventory 1620 as grouped items, group 1720 reduces the amount of information typically needed to manage each prediction separately and, in turn, facilitates arbitration when issuing predictions.

Inventory manager 1704 is configured to manage the inventory of items in each queue as well as to control the structure and/or operation of the queues. To manage prediction inventory 1620, inventory manager 1704 operates in whole or in part using one or more queue attributes 1706. A first example of a queue attribute is the type of queue. For example, any of queues 1710-1716 can be configured as a "first-in first-out" (FIFO) buffer, a "last-in first-out" (LIFO) buffer, or any other type of buffer. The type of queue, such as FIFO or LIFO, affects how items are inserted into and removed from that queue. In one embodiment, the sequential queue 1710 is configured as a LIFO, and each of the out of order zero-queue 1714 and the out of order one-queue 1716 is configured as a FIFO.

A second example of a queue attribute is an expiration time, or lifetime, assignable to a queue, group, or item. This attribute controls the degree of staleness of predictions. As the predictions in any group 1720, or the queue itself, age, the likelihood that they no longer reflect accurate predictions gradually increases. Thus, to minimize stale items, inventory manager 1704 allows a group to remain in the inventory only until a specific expiration time, after which inventory manager 1704 removes the entire stale group, or any remaining items of it that have not yet been issued. In one embodiment of the present invention, the lifetimes for queues, groups, or items can be configured to hold them indefinitely. In other words, they can be set to be "permanent," meaning that an item remains in a queue until it is issued or otherwise removed. In a particular embodiment, an expiration time is associated with a group when that group is inserted into a queue. Thereafter, a timer counts down from the expiration time; when it reaches zero, any remaining items of the group are invalidated. In another embodiment, the expiration time for a group 1720 of either the out of order zero-queue 1714 or the out of order one-queue 1716 is set longer than that of a group 1720 of the sequential queue 1710, to increase the likelihood that out of order predictions are issued and consequently hit within the data cache.

A third example of a queue attribute is an insertion indicator associated with a queue, which indicates how inventory manager 1704 inserts predictions into the queue when the queue is full. In one example, the insertion indicator indicates whether inventory manager 1704 drops a newly-generated prediction rather than inserting it, or writes over an older item residing in the particular queue. With the insertion indicator set to "drop," inventory manager 1704 discards any new prediction instead of inserting it. However, if the insertion indicator is set to "overwrite," the inventory manager 1704 takes one of two courses of action depending on the type of the particular queue. If the queue is configured as a LIFO, the inventory manager 1704 pushes new predictions onto the LIFO as a stack, effectively pushing out the oldest items and/or groups at the bottom of the LIFO. However, if the queue is configured as a FIFO, the new prediction overwrites the oldest item in the FIFO.

A fourth example of a queue attribute is a priority associated with each queue, used to determine the particular queue from which the next item is to be issued. In one embodiment, an order of priority is set for each of queues 1710, 1712, 1714, and 1716 to arbitrate among the queues when selecting the next prediction. In applications where series-type predictions occur more abundantly, servicing the sequential queue 1710 is important. Thus, this queue is typically associated with a relatively high priority. This means, for example, that the out of order zero-queue ("NS0 Queue") 1714 and the out of order one-queue ("NS1 Queue") 1716 are most likely set to a lower priority than the sequential queue 1710. Another example of a queue attribute is the queue size associated with each queue, which determines how many predictions can be stored therein temporarily. For example, the sequential queue may have a size, or depth, of two groups, the back queue may have a depth of one group, and the out of order queues may have a depth of four groups. Note that the queue size can control the number of predictions issued by the prefetcher 1606 by controlling how much inventory memory is allocated to the different types of predictions.
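Pulling the queue attributes above together, the following sketch shows one way such a configuration could be expressed. The specific expiration values, priorities, and field names are assumptions chosen only to mirror the examples in the text (LIFO sequential queue, FIFO out of order queues, depths of two, one, and four groups); they are not disclosed parameter values.

```python
# Illustrative configuration of per-queue attributes (type, expiration time,
# insertion policy, priority, size). Values and field names are assumptions.

from dataclasses import dataclass
from typing import Optional

@dataclass
class QueueAttributes:
    queue_type: str             # "LIFO" or "FIFO"
    expiration: Optional[int]   # lifetime in cycles; None means "permanent"
    insertion: str              # "drop" or "overwrite" when the queue is full
    priority: int               # lower number = higher priority for arbitration
    depth: int                  # number of groups the queue can hold

QUEUE_CONFIG = {
    "S Queue":   QueueAttributes("LIFO", expiration=256,  insertion="overwrite", priority=0, depth=2),
    "B Queue":   QueueAttributes("FIFO", expiration=256,  insertion="drop",      priority=1, depth=1),
    "NS0 Queue": QueueAttributes("FIFO", expiration=1024, insertion="overwrite", priority=2, depth=4),
    "NS1 Queue": QueueAttributes("FIFO", expiration=1024, insertion="overwrite", priority=3, depth=4),
}

if __name__ == "__main__":
    for name, attrs in QUEUE_CONFIG.items():
        print(name, attrs)
```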

The priority of the back queue 1712 may be dynamically raised, or modified to be higher than the priority of the sequential queue 1710, in accordance with one embodiment of the present invention. This feature aids in retrieving predictable information from the memory 1612 after the speculator 1608 detects an upper, or "front," sector. This is because the processor 1602 is likely to request the lower, or "back," sector soon after requesting the upper or front sector of a cache line. Thus, increasing the priority of the back queue 1712, especially while it holds back sector sequential predictions, increases the likelihood that the prefetcher 1606 will issue the appropriate back sector sequential prediction to the memory 1612. In a particular embodiment, a back queue counter (not shown) counts the number of items issued from queues other than the back queue 1712. When this counter reaches a threshold, the back queue 1712 is promoted to at least a higher priority than the sequential queue 1710. Thereafter, an item (e.g., a back sector item) can be issued from the back queue 1712. After one or more back-type items are issued, or the back queue 1712 is emptied (e.g., by aging or by issuing all items), the priority of the back queue 1712 returns to its initial priority and the back queue counter is reset.
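The counter-driven promotion just described is sketched below under simple assumptions: the threshold value, priority encodings, and class name are illustrative only, and the sketch treats "priority" as a small integer where lower means higher priority.

```python
# Sketch of back-queue promotion: a counter tracks issues from queues other
# than the back queue; reaching a threshold promotes the back queue above the
# sequential queue, and issuing a back-type item restores the initial priority.

class BackQueuePromoter:
    def __init__(self, threshold=8, normal_priority=1, promoted_priority=0):
        self.threshold = threshold
        self.normal_priority = normal_priority
        self.promoted_priority = promoted_priority
        self.back_priority = normal_priority
        self.counter = 0

    def on_issue(self, queue_name):
        if queue_name != "B Queue":
            self.counter += 1
            if self.counter >= self.threshold:
                self.back_priority = self.promoted_priority   # promote the back queue
        else:
            # a back-type item was issued: restore priority and reset the counter
            self.back_priority = self.normal_priority
            self.counter = 0


if __name__ == "__main__":
    p = BackQueuePromoter(threshold=3)
    for q in ["S Queue", "S Queue", "S Queue", "B Queue"]:
        p.on_issue(q)
        print(q, "-> back queue priority:", p.back_priority)
```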

In general, for any group 1720 of out of order predictions, there may be a mix of series-type and back-type predictions as target addresses for the out of order prediction. In particular, a group of non-sequential addresses may contain only series-type (i.e., forward or reverse) predictions. However, such a group can also include several series-type predictions mixed with back-type predictions. As in the previous example, suppose speculator 1608 has a trigger address "A" associated with a target address "B" and another target address "C". If target address B has higher priority than C, B remains in out of order zero-queue 1714 along with the group of out of order predictions for trigger address A. This group may include predictions B0 (i.e., address B), B1, B2, and B3, which are out of order with respect to address A but are all forward series-type. As a further example, group 1720 may include out of order predictions B(-1) (i.e., address B-1), B0, B1, and B2, where prediction B(-1) is a back-type prediction mixed with the other series-type predictions. Or, group 1720 may include any other arrangement of predictions not described here. Since C has the second highest priority after B, C remains in the out of order one-queue 1716 with a similar group of out of order predictions. As a result, predictions B0, B1, B2, and B3 can be inserted as group 3 of the out of order zero-queue 1714, and predictions C0, C1, C2, and C3 can be inserted as group 3 of the out of order one-queue 1716.

In addition, FIG. 17 illustrates that, in one embodiment, the prediction inventory 1620 is configured to receive predictions 1701 via the inventory filter 1702, with surviving predictions passing along the surviving prediction path. The surviving predictions are then inserted into the appropriate queue and managed by the inventory manager 1704 described above. An example inventory filter 1702 is described below.

FIG. 18 illustrates an example of an inventory filter 1702 in accordance with certain embodiments of the present invention. Although this example is applied to filter forward sequential predictions against a sequential queue, such as sequential queue 1710 of FIG. 17, inventory filter 1702 can be used in conjunction with any queue to filter any type of prediction. That is, inventory filter 1702 can be configured to compare any number of predictions of any prediction type against one or more other queues that contain predictions of different prediction types. For example, a number of forward sequential predictions can be filtered against the back queue, and the like. Inventory filter 1702 includes at least a matcher 1804 to match a number of predictions 1802 against the items in group 1806. Group 1806 includes items A1 through A7, each associated with item A0. A0 is the triggering address that previously generated the predictions identified as items A1 through A7. Further, group 1806 can reside as any group 1720 in sequential queue 1710. As for the number of predictions 1802, they include "TA" as the triggering address and include predictions B1 through B7, all generated by the speculator 1608 upon detection of the address TA. Note that although FIG. 18 shows only one group (i.e., group 1806), other groups 1720 of the same queue can be filtered simultaneously in the same manner.

In a particular embodiment, matcher 1804 consists of a number of comparators identified as CMP0, CMP1, CMP2, ... CMPM (not shown). Comparator CMP0 is configured to compare TA against N items in group 1806, and each of comparators CMP1, CMP2, ... CMPM compares a prediction of predictions 1802 against N items of group 1806, where M is set to accommodate the largest number of predictions generated. As an example, consider that M is 7 (therefore requiring seven comparators) and N is 3, so that each comparator compares one element of predictions 1802 against three items of group 1806. In addition, assume that each element of predictions 1802 is compared beginning with the corresponding item having the same position (e.g., first against first, second against second, etc.). As such, CMP0 compares TA against A0, item A1, and item A2, and CMP1 compares prediction B1 against items A1, A2, A3, and so on. The number N can be set to minimize the amount of comparator hardware, yet set large enough to sufficiently filter a sequential stream of predictions that may result from small jumps (i.e., jumps not greater than N) in the stream of addresses detected on the system bus 1603.
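The comparison pattern described above is sketched below in software form. The sliding window of N items aligned by position is an interpretation of the comparator arrangement just described, and the function name and return value (the set of matching positions) are assumptions made purely for illustration.

```python
# Sketch of the matcher of FIG. 18: the trigger address TA and predictions
# B1..BM are each compared against a window of N items in an inventory group,
# aligned by position. The window size N limits comparator hardware in the
# description; here it simply limits the slice compared.

def match_against_group(trigger, predictions, group_items, n=3):
    """Return the positions in [TA, B1, ..., BM] that match an inventory item."""
    matched = set()
    elements = [trigger] + list(predictions)          # TA, B1, B2, ... BM
    for pos, element in enumerate(elements):
        window = group_items[pos : pos + n]           # compare against N items, aligned by position
        if element in window:
            matched.add(pos)
    return matched


if __name__ == "__main__":
    group = ["A0", "A1", "A2", "A3", "A4", "A5", "A6", "A7"]
    # Suppose the new predictions happen to overlap the group starting at A1.
    print(match_against_group("A0", ["A1", "A2", "A9", "A4"], group, n=3))
    # -> {0, 1, 2, 4}: TA matches A0, B1 matches A1, B2 matches A2, B4 matches A4
```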

In one embodiment, the queue stores a page address to represent A0 and offsets to represent item A1, item A2, and so on. To determine whether a match exists in this case, the page address of address TA in predictions 1802 and the offset of a particular prediction are compared against the page address of A0 and the corresponding offset, respectively. In certain embodiments of the present invention, the inventory filter 1702 does not filter sequential predictions against out of order predictions, and thus does not operate with either the out of order zero-queue 1714 or the out of order one-queue 1716. This is because sequential predictions may be less likely to be redundant with out of order predictions.

FIGS. 19A and 19B are diagrams illustrating exemplary techniques for filtering redundancies in accordance with certain embodiments of the present invention. When the matcher 1804 determines a match, either the newly-generated prediction (i.e., new item K) or the previously generated item (i.e., existing item K) is invalidated. FIG. 19A shows which of new item K and existing item K is filtered, or invalidated, when queue 1902 is a FIFO. In this case, the new item K is invalidated, thereby keeping the existing item K. In contrast, FIG. 19B shows that if the queue 1904 is a LIFO, the existing item K is invalidated, thereby keeping the new item K. In general, whichever of new item K or existing item K will issue earliest is maintained, while the item that would issue later is invalidated. Those skilled in the art will appreciate that the inventory filter 1702 may employ other techniques without departing from the scope and spirit of the invention.

FIG. 20 illustrates another example prediction inventory placed in a prefetcher, according to one embodiment of the invention. In this example, the prefetcher 2000 includes a speculator 1608 and a filter 2014. The prefetcher 2000 of FIG. 20 also includes a multi-level cache 2020 and prediction inventory 1620. Here, the multi-level cache 2020 is composed of a first level data return cache ("DRC1") 2022 and a second level data return cache ("DRC2") 2024. The first level data return cache 2022 may generally be described as a short term data store, and the second level data return cache 2024 may generally be described as a long term data store. The multi-level cache 2020 stores program instructions and program data prefetched from memory 1612 until processor 1602 requires them. The caches of the multi-level cache 2020 also store reference information about the predictions that caused the predictable information to be prefetched, so that newly-generated predictions can be filtered against the multi-level cache 2020. For example, DRC1 2022 and DRC2 2024 store two types of information as references in addition to the data of a cache line or memory unit: (1) the address of the stored cache line, which is used to filter against new predictions, and (2) the trigger address, if the cache line was brought into the cache as a result of a prediction. In particular, the trigger address is used in reprioritizing the out of order predictions within speculator 1608.

Prediction inventory 1620 provides a temporary repository for generated predictions until they are selected by arbiter 2018. The predictions stored in prediction inventory 1620 are also used to filter out redundancies among predictions that would otherwise be issued. Arbiter 2018 is configured to determine, in accordance with arbitration rules, which generated predictions are issued for prefetching instructions and data. In general, such arbitration rules provide criteria for selecting the particular queue from which to issue a prediction. For example, arbiter 2018 selects and issues predictions based in part or in whole on the relative priorities among queues and/or groups.

Filter 2014 includes at least two filters: cache filter 2010 and inventory filter 1702. Cache filter 2010 is configured to compare newly-generated predictions against those previously-generated predictions that caused instructions and data already stored in multi-level cache 2020 to be prefetched. Thus, if one or more newly-generated predictions are redundant with respect to any previously-generated prediction represented in multi-level cache 2020, the redundant predictions are canceled to minimize the number of predictions requiring processing. Note that the redundant prediction (i.e., the surplus, unnecessary prediction) is typically the newly-generated prediction. Inventory filter 1702 is configured to compare newly-generated predictions against predictions that have already been generated and stored in prediction inventory 1620. In one embodiment, inventory filter 1702 is similar in structure and/or function to that shown in FIG. 18. Again, if one or more newly-generated predictions are redundant with respect to predictions previously stored in prediction inventory 1620, the redundant predictions are canceled to free prefetcher resources.

To further reduce the number of redundant predictions, a post-inventory filter 2016 is included in prefetcher 2000. Immediately before or as prefetcher 1606 issues predictions from prediction inventory 1620, post-inventory filter 2016 filters out any redundancies that arise between the time these predictions are first received and the time arbiter 2018 selects predictions for issuance. Typically, such a redundancy arises because a prediction representing the same prediction address as an item in the prediction inventory may already have been issued from prediction inventory 1620 to memory, while the predictable information (and the reference information against which to filter) has not yet returned to cache 2020. In one embodiment, post-inventory filter 2016 may be similar in structure and/or functionality to either inventory filter 1702 of FIG. 18 or cache filter 2010.

In one embodiment, post-inventory filter 2016 maintains issuance information for each item of each group 1720 in prediction inventory 1620. In particular, this issuance information indicates which items of a particular group have been issued. Post-inventory filter 2016 does not, however, remove an issued item from prediction inventory 1620; rather, issued items remain so that incoming predictions can be compared against them when filtering out redundancies. As each item of a particular group is issued, the issuance information is updated to reflect it. Once all items have been issued, the group is removed and the queue is freed to take additional items.
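
A minimal sketch (illustrative names only) of this bookkeeping: issued items stay in their group so that incoming redundant predictions can still be matched against them, and a group's queue slot is freed only once every item has issued.

    class Group:
        def __init__(self, items):
            self.items = list(items)            # trigger address plus predictions
            self.issued = [False] * len(items)  # issuance information per item

        def mark_issued(self, index):
            self.issued[index] = True

        def fully_issued(self):
            return all(self.issued)

    def retire_groups(queue):
        # Free queue slots whose groups have issued every item.
        queue[:] = [g for g in queue if not g.fully_issued()]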

In one embodiment, arbiter 2018 may control some aspects of prediction inventory 1620 related to issuing predictions. In particular, arbiter 2018 can change the relative priority among queues, groups or items so as to issue predictions most efficiently. In certain embodiments, arbiter 2018 is configured to modify the relative priorities in order to throttle the generation of numerous predictions that would excessively burden memory (i.e., cause memory over-utilization), such as memory 1612, cache memory 2020, or other components of the memory subsystem. For example, arbiter 2018 can assign a configurable load threshold to each queue. This threshold represents the maximum rate at which a particular queue may contribute predictions. The load threshold is compared against the contents of an average load accumulator (not shown), which maintains an accumulated measure of the units of work requested of memory 1612. A unit of work is any requested operation of memory 1612, such as a read, a write, and the like. As additional units of work are requested of memory 1612, the value in the average load accumulator increases; over time (e.g., every particular number of clock cycles), the value decreases. In operation, arbiter 2018 compares the load threshold of each queue with the value of the average load accumulator. If the load threshold is exceeded by the average load value, arbiter 2018 performs one of two example operations: it can instruct prediction inventory 1620 to stop accepting predictions for a particular queue until the items therein are either issued or discarded, or it can remove items from the queue by overwriting them. When arbiter 2018 detects that the average load value falls below the load threshold, the queue is again able to issue predictions.
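
A minimal sketch (parameter names are assumptions) of this load-throttling rule: each queue's configurable threshold is compared against an average load accumulator that rises as work is requested of memory and decays over time.

    class AverageLoadAccumulator:
        def __init__(self, decay_per_cycle=1):
            self.value = 0
            self.decay = decay_per_cycle

        def add_work(self, units=1):
            self.value += units                  # each requested read/write adds work

        def tick(self, cycles=1):
            self.value = max(0, self.value - self.decay * cycles)

    def queue_may_issue(queue_load_threshold, accumulator):
        # A queue contributes predictions only while its threshold is not exceeded.
        return accumulator.value <= queue_load_threshold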

Exemplary Embodiments for Performing Look-Ahead Lookups of Predictive Information in Cache Memory

FIG. 21 is a block diagram illustrating a prefetcher 2100 that includes an exemplary multi-level cache 2120 in accordance with certain embodiments of the present invention. In this example, multi-level cache 2120 includes a cache filter 2110, a first level data return cache ("DRC1") 2122, and a second level data return cache ("DRC2") 2124. Cache filter 2110 can quickly test, or perform a "look-ahead lookup" on, first level DRC 2122 and second level DRC 2124 to detect the presence or absence of an input address, such as a prediction address, within these caches. The look-ahead lookup is a test of cache memory that determines simultaneously whether a number of predictions already exist in multi-level cache 2120, for example.

Depending on the presence or absence of a prediction, multi-level cache 2120 manages the contents of first level DRC 2122 and second level DRC 2124 according to caching policies such as the examples described below. First level DRC 2122 may be described generally as a short-term data store and second level DRC 2124 as a long-term data store, with predictions in first level DRC 2122 eventually moving to second level DRC 2124 when the processor does not request them. In accordance with one embodiment of the present invention, either or both of first level DRC 2122 and second level DRC 2124 store program instructions and program data prefetched on the basis of predicted addresses as well as processor-requested addresses. In addition, cache filter 2110, first level DRC 2122 and second level DRC 2124 work together not only to reduce redundant predictions but also to speed up the prefetching of predictable information (e.g., to reduce the latency, such as that of opening a page, in providing prefetched program instructions and program data). Although the description below relates to multi-level cache memory (i.e., multiple caches), note that any of the following exemplary embodiments may instead include a single cache memory.

Cache filter 2110 is configured to compare a range of input addresses against each of a number of multiple caches at the same time, where the multiple caches are hierarchical in nature. For example, the first cache may be smaller in size and configured to store relatively short-term predictions, while the second cache may be larger in size and configured to store predictions for periods longer than those of the first cache. In addition, according to one embodiment of the present invention, the second cache receives only prediction addresses and corresponding prediction data that were previously held in the first cache. To test both caches at the same time, especially when the second cache is larger than the first, the cache filter generates two representations of each address being "looked up" or tested within the caches; both caches are tested simultaneously, with one representation used for the first cache and the other used for the second cache. One reason for this is that the larger cache contains more addresses and entries requiring testing than the smaller cache, so if both are tested at the same time, a more efficient technique is needed for testing addresses against the larger cache than against the smaller one. The query interfaces described below perform these functions.

Prefetcher 2100 also includes speculator 2108 to generate predictions. In particular, speculator 2108 includes a sequential predictor ("SEQ. Predictor") 2102 to generate sequential predictions, such as forward sequential predictions, reverse sequential predictions, blind back sequential predictions, back sector sequential predictions, and the like. Speculator 2108 also includes a nonsequential predictor ("NONSEQ. Predictor") 2104 for forming nonsequential predictions. Prefetcher 2100 uses these predictions to "fetch" both program instructions and program data from memory (not shown) and then store them in multi-level cache 2120 before the processor (not shown) requests those instructions or data. By fetching them before they are used (i.e., "prefetching"), processor idle time (e.g., time during which the processor is starved of data) is minimized.

Nonsequential predictor 2104 includes a target cache (not shown) as a repository for storing the above-described associations of an address with one or more potential nonsequential addresses, each of which may qualify as a nonsequential prediction. The target cache is designed to compare its contents against incoming detected addresses so as to generate nonsequential predictions quickly, and it is configured to prioritize its stored associations, for example in response to a hit in multi-level cache 2120. In particular, if multi-level cache 2120 provides a predicted address to the processor upon request, then the stored trigger-target association to which that address belongs is raised in priority. The "trigger" address is the detected address from which nonsequential predictor 2104 generates a nonsequential prediction, and the resulting prediction is referred to as the "target" of the unpatternable association formed between the two. Note that an address giving rise to a sequential prediction may also be referred to as a trigger address, and the resulting prediction as a target address.
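
A minimal sketch, with an illustrative software structure rather than the disclosed hardware, of a target cache that stores trigger-to-target associations and promotes an association when the multi-level cache reports that its target was requested by the processor.

    class TargetCache:
        def __init__(self):
            self.assoc = {}                       # trigger address -> targets, highest priority first

        def record(self, trigger, target):
            targets = self.assoc.setdefault(trigger, [])
            if target not in targets:
                targets.append(target)

        def predict(self, detected_address):
            # Return the highest-priority target for a detected trigger, if any.
            targets = self.assoc.get(detected_address)
            return targets[0] if targets else None

        def promote(self, trigger, target):
            # Called when a cache hit shows the prediction was useful.
            targets = self.assoc.get(trigger, [])
            if target in targets:
                targets.remove(target)
                targets.insert(0, target)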

In addition, prefetcher 2100 includes a filter 2114, an additional prediction inventory 2116, an additional post-inventory filter 2117, and an additional arbiter 2118. Here, filter 2114 may be configured to include an inventory filter (not shown) for comparing generated predictions against previously-generated predictions residing within prediction inventory 2116. Prediction inventory 2116 provides temporary storage for generated predictions until arbiter 2118 selects predictions to access memory. Arbiter 2118 is configured to determine which of the generated predictions are issued to access memory when prefetching instructions and data. In some embodiments, filter 2114 may also include cache filter 2110 and can be configured to compare generated predictions against previously-generated predictions that caused program instructions and program data already to be "prefetched" into multi-level cache 2120. Thus, if any generated prediction is redundant with respect to any previously-generated prediction represented in multi-level cache 2120, the redundant prediction is canceled (or invalidated) to minimize the number of predictions requiring handling, which can free prefetcher resources.

In operation, speculator 2108 monitors the system bus as the processor requests access to memory ("read requests"). As the processor executes program instructions, speculator 2108 detects read requests for addresses containing program instructions and program data not yet used by the processor. For purposes of explanation, an "address" is associated with a cache line or memory unit that is generally transferred between memory and a cache memory, such as multi-level cache 2120. An "address" of a cache line may refer to a memory location, and the cache line may contain data from more than one address of the memory. The term "data" refers to a unit of information that can be prefetched, whereas the terms "program instructions" and "program data" respectively refer to the instructions and data used by the processor in its processing. Thus, data (e.g., any number of bits) may represent "predictable information," which refers to information constituting either or both of program instructions and program data. The term "prediction" may also be used interchangeably with the term "prediction address." When a prediction address is used to access memory, typically one or more cache lines containing that prediction address, as well as other addresses (prediction addresses or otherwise), are fetched.

When prefetcher 2100 issues predictions, it may append or associate reference information with each prediction. If the prediction is a nonsequential prediction, the reference information associated with it may include a prediction identifier ("PID") and a corresponding target address. The PID (not shown) identifies the trigger address (or a representation thereof) that gave rise to the predicted corresponding target address. This reference information is received by multi-level cache 2120 when the memory returns the prefetched data. Multi-level cache 2120 then temporarily stores the returned data until the processor requires it. During the time that multi-level cache 2120 stores the prefetched data, it manages that data by filtering against newly-generated predictions, ensuring consistency of the data stored therein, and classifying the data as either short-term or long-term data. When the processor requests prefetched data (i.e., predictable information), that data is sent to the processor. If the data located in multi-level cache 2120 is the result of a nonsequential prediction, reference information may be sent to nonsequential predictor 2104 to reprioritize, if necessary, the nonsequential predictions stored in the target cache.

FIG. 22 illustrates an exemplary multi-level cache 2220 in accordance with an embodiment of the present invention. Multi-level cache 2220 includes a cache filter 2210, a first level data return cache ("DRC1") 2222 and a second level data return cache ("DRC2") 2224. Cache filter 2210 includes a DRC1 query interface 2204 and a DRC2 query interface 2214, which respectively interface first level DRC 2222 and second level DRC 2224 with other components, such as components of prefetcher 2100 as well as components of a memory processor (not shown). One such memory processor component is write-back cache 2290 of FIG. 21, which operates according to well-known caching methods whereby a modification to data in the cache is not copied to the cache source (e.g., system memory) until necessary. Write-back cache 2290 is similar in structure and functionality to write-back caches well known in the art and need not be described in detail. In addition, DRC1 query interface 2204 includes a DRC1 matcher 2206 and a DRC1 processor 2208, and DRC2 query interface 2214 includes a DRC2 matcher 2216 and a DRC2 processor 2218.

First level DRC 2222 includes a DRC1 address store 2230 for storing addresses (e.g., prediction addresses), and DRC1 address store 2230 is coupled to a DRC1 data store 2232 that stores data (i.e., predictable information) and a PID. For example, prefetched data derived from a prediction address ("PA") may be stored as data (PA) 2232a in association with PID 2232b; this notation indicates the prediction address PA that contributed to prefetching the data representing the predictable information. When data (PA) 2232a is requested by the processor, the corresponding prediction address PA and prediction identifier PID 2232b are passed to nonsequential predictor 2104 to change the priority of the prediction address if necessary. Prediction identifier PID 2232b typically includes information indicating the trigger address that generated PA. A PA generated by nonsequential predictor 2104 may also be referred to as a target address, and processor-requested addresses (and their associated data) may also be stored in multi-level cache 2220. Further, data (PA) 2232a need not have an associated PID 2232b.
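
A minimal sketch (field names are assumptions) of a DRC1 entry holding the prefetched line, its prediction address PA, and an optional PID identifying the trigger, together with hit handling that reports the association back to the nonsequential predictor using the promote operation sketched earlier.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class DRC1Entry:
        prediction_address: int               # PA used to prefetch the line
        data: bytes                           # e.g., a 64-byte cache line
        pid: Optional[int] = None             # identifies the trigger address; may be absent

    def on_processor_hit(entry, nonseq_predictor):
        if entry.pid is not None:
            # Report (trigger, target) back so the association's priority can be raised.
            nonseq_predictor.promote(entry.pid, entry.prediction_address)
        return entry.data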

In addition, DRC1 address store 2230 and DRC1 data store 2232 are communicatively coupled to a DRC1 manager 2234, which manages their functionality and/or structure. Second level DRC 2224 includes a DRC2 address store 2240 coupled to a DRC2 data store 2242, which stores the same types of information as data (PA) 2232a and PID 2232b. Both DRC2 address store 2240 and DRC2 data store 2242 are communicatively coupled to a DRC2 manager 2246, which manages their functionality and/or structure.

In a particular embodiment of the invention, second level DRC 2224 also includes a store of "valid bits" 2244 for maintaining valid bits separately from DRC2 address store 2240, each valid bit indicating whether the stored prediction is valid (available to service a processor request for data) or invalid (not available). Entries holding invalid predictions may be treated as empty entries. Keeping the bits of valid bits 2244 separate from the addresses makes resetting or setting one or more valid bits faster and computationally less burdensome than if DRC2 address store 2240 stored the valid bits together with the corresponding addresses. In most cases, the valid bits for the addresses of DRC1 are stored together with, or as part of, those addresses.

In operation, DRC1 query interface 2204 and DRC2 query interface 2214 are configured to test the contents of first level DRC 2222 and second level DRC 2224, respectively, to determine whether they contain any one or more of the addresses applied as "input addresses." An input address may originate in speculator 2108 as a generated prediction, or it may originate in the write-back cache, or another element external to multi-level cache 2220, as a write address. As described herein, the input address is generally a generated prediction that is compared against the contents of multi-level cache 2220 to filter out redundancies. Sometimes, however, the input address is a write address identifying a memory location to which data is being, or is to be, written. In that case, multi-level cache 2220 is tested to determine whether any operation is required to remain consistent among the memory, DRC1 data store 2232, and DRC2 data store 2242.

DRC1 matcher 2206 and DRC2 matcher 2216 are configured to determine whether one or more input addresses presented at input/output port ("I/O") 2250 reside in DRC1 address store 2230 and DRC2 address store 2240, respectively. If either DRC1 matcher 2206 or DRC2 matcher 2216 detects that an input address matches a content of first level DRC 2222 or second level DRC 2224, the associated processor, such as DRC1 processor 2208 or DRC2 processor 2218, operates either to filter out the redundant prediction or to ensure that the data in multi-level cache 2220 remains consistent with memory. DRC1 matcher 2206 and DRC2 matcher 2216 can be configured to compare a range of input addresses against the contents of first level DRC 2222 and second level DRC 2224 simultaneously (i.e., at the same time or nearly the same time, such as within one or two cycles of operation (e.g., clock cycles), or some other minimal number of cycles, depending on the structure of multi-level cache 2220). An example of a range of input addresses that can be compared simultaneously against a cache is address A0 (a trigger address) and prediction addresses A1, A2, A3, A4, A5, A6 and A7, the latter seven addresses being generated by sequential predictor 2102.

When the test is performed simultaneously, matchers 2206 and 2216 performing this comparison are said to perform a "look-ahead lookup." In some embodiments, the look-ahead lookup is performed when the processor is idle or is not requesting data from prefetcher 2100. Also, although similar in functionality, DRC1 matcher 2206 and DRC2 matcher 2216 are each structured to work with DRC1 address store 2230 and DRC2 address store 2240, respectively, and therefore need not be structurally similar to each other. Examples of DRC1 matcher 2206 and DRC2 matcher 2216 are described below with respect to FIGS. 23A and 24, respectively, in accordance with one or more specific embodiments of the present invention.

Next, consider a situation in which query interfaces 2204 and 2214 perform a filtering operation. By comparing a large number of input addresses against the contents of multi-level cache 2220 and detecting which input addresses do not match, processors 2208 and 2218 can take appropriate action: input addresses that do not match proceed as generated predictions for fetching predictable information, while matching predictions (i.e., redundant predictions) are filtered out. As such, multi-level cache 2220 and its cache filter 2210 reduce latency by determining more quickly which cache lines should begin to be fetched. Because first level DRC 2222 and second level DRC 2224 are then more likely to contain prefetched predictable information than they would be if predictions were not compared simultaneously or not filtered at all, the likelihood of reducing the latency experienced by the processor is correspondingly greater.

DRC1 address store 2230 and DRC2 address store 2240 each store addresses associated with the prefetched data stored in DRC1 data store 2232 and DRC2 data store 2242, respectively. Each of address stores 2230 and 2240 stores either the address itself or another representation of the address. According to one embodiment of the invention, the exemplary DRC1 address store 2230 is configured to store each address in full as a distinct entry. For example, bits 35:6 of each address are stored in DRC1 to identify these addresses individually. Addresses stored in DRC1 address store 2230 may be viewed as including a common portion (e.g., a tag) and a delta portion (e.g., an index), the common portion and delta portion being used to represent an address during a look-ahead lookup of DRC1 in accordance with at least one embodiment. In addition, DRC1 address store 2230 and DRC1 data store 2232 are configured to store 32 address entries and a 64-byte cache line of data per address entry. In general, although the prefetched data comes from a memory such as dynamic random access memory ("DRAM"), it can come from the write-back cache when the data in DRC1 data store 2232 requires updating.

By contrast, the exemplary DRC2 address store 2240 can be configured as a four-way set-associative store, with each entry able to reside in any of four ways, and can be configured to store a base portion (e.g., a tag) to represent an address. In addition, DRC2 address store 2240 and DRC2 data store 2242 are configured to store 1024 address entries and a 64-byte cache line of data per address entry, respectively. DRC2 data store 2242 stores prefetched data received from DRC1 data store 2232 and, in some implementations, can be composed of any number of memory banks (e.g., four banks: 0, 1, 2, and 3).

Although the memory from which the predictable information is prefetched is typically a DRAM memory (e.g., memory arranged in "Dual In-Line Memory Modules"), the memory can be of any other known memory technology. Typically, memory is subdivided into "pages," which are sections of memory available within a particular row address. When a particular page is accessed, or "opened," another page is closed; this process of opening and closing pages requires time to complete. Thus, when a processor executes program instructions in a somewhat indiscriminate manner, accesses to memory become nonsequential with respect to fetching instructions and data located at various memory locations in the DRAM. As such, a stream of read requests may extend across page boundaries. If the next address on the next page is not available, the processor must fetch the program instructions and program data directly from memory, which increases the latency of retrieving those instructions and data. Thus, by prefetching and storing predictable information spanning multiple pages in multi-level cache 2220, the latency associated with opening a page is reduced in accordance with the present invention. Since the prefetched data is served from the cache, the latency experienced by the processor while an accessed page is being opened is reduced.

For example, consider that nonsequential predictor 2104 accurately predicts that address "00200" will be accessed following a processor read of address "00100." Nonsequential predictor 2104 therefore causes a range of cache lines starting at address "00200" (as well as addresses 00201, 00202, 00203 and 00204 when the batch size is 4) to be fetched before the processor actually accesses address "00200" (i.e., one target address and four prediction addresses, the number of predictions being configurable and defined by the batch size "b"). When the processor actually performs a read of address "00200," the look-ahead lookup of multi-level cache 2220 quickly determines which cache lines within a particular range following address "00200" have already been prefetched. Since a nonsequential transition within a read address stream can require a DRAM page-open operation, the look-ahead lookup lets prefetcher 2100 quickly look ahead within the stream of read requests and determine which addresses or cache lines still need to be prefetched. By fetching them quickly, prefetcher 2100 can hide the latency of DRAM page-open operations, so that the sequential stream of cache lines based on the trigger address (here, the target address "00200") continues without incurring a delay penalty in the processor, despite the nonsequential transition.
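
A minimal sketch (addresses and batch size are illustrative) of the look-ahead lookup in this example: the range following the target address is tested against the cache contents in one pass, and only the missing lines are prefetched, hiding part of the page-open latency.

    def lookahead_lookup(cached_addresses, base, batch=4):
        wanted = [base + i for i in range(batch + 1)]     # target plus b predictions
        present = [a for a in wanted if a in cached_addresses]
        missing = [a for a in wanted if a not in cached_addresses]
        return present, missing

    cached = {0x00200, 0x00202}
    present, to_fetch = lookahead_lookup(cached, 0x00200)
    # to_fetch == [0x00201, 0x00203, 0x00204]; only these lines are prefetched.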

FIG. 22 shows DRC1 manager 2234 and DRC2 manager 2246 as separate entities, but they need not be separate. That is, DRC1 manager 2234 and DRC2 manager 2246 can be combined into a single management entity, can be placed outside multi-level cache 2220, or both. First level DRC 2222 and second level DRC 2224 are structurally and/or functionally different from the conventional L1 and L2 caches of a processor, and unique means of managing the predictable information stored in multi-level cache 2220 are employed. Examples of such means include: means for allocating memory in each data return cache; means for copying information between the short-term and long-term data stores; and means for maintaining consistency between multi-level cache 2220 and other entities, such as the write-back cache.

First, consider the copy means used to manage copying of predictable information from first level DRC 2222 to second level DRC 2224 as this information ages from short-term to long-term information. DRC1 manager 2234 cooperates with DRC2 manager 2246 to send data from DRC1 data store 2232 to DRC2 data store 2242 once that data has resided within first level DRC 2222 beyond a certain time threshold. Note that this threshold may be a fixed quantity or may vary during operation. Typically, aged data may be configured to be sent whenever DRC1 has fewer than N invalid (i.e., available) entries, where N is programmable. In operation, when data is copied from the short-term to the long-term store, the corresponding entry in first level DRC 2222 is deleted (i.e., invalidated).

Second, consider the allocation means for inserting predictable information into first level DRC 2222 and second level DRC 2224. When inserting predictable information into first level DRC 2222, DRC1 manager 2234 selects as a candidate any invalid entry of DRC1 data store 2232, excluding locked entries. If DRC1 manager 2234 does not detect any invalid entry in which the predictable information can be stored, the oldest entry can be used to allocate space for the new entry. With regard to allocating entries in DRC2 data store 2242, DRC2 manager 2246 can use any one of a number of ways (i.e., one of four ways) to receive data copied from first level DRC 2222 to second level DRC 2224. For example, the index of the prediction address may select a set of four entries for storing the data. Initially, DRC2 data store 2242 allocates any one of the unused (i.e., invalidated) ways. If all ways are already allocated, the first one in is the first one out (i.e., the oldest is overwritten); however, if the oldest entries are valid and of the same age, DRC2 manager 2246 allocates an unlocked entry among them. Finally, if all entries in the set of ways are locked, DRC2 manager 2246 suppresses writing from first level DRC 2222 to second level DRC 2224 and maintains the valid first level DRC 2222 entry. Note also that, typically, second level DRC 2224 receives only data stored in first level DRC 2222.
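
A minimal sketch (assumed data layout) of this way-allocation rule for one four-way DRC2 set: an unused way is preferred, otherwise the oldest unlocked entry is overwritten, and the copy is suppressed if every way is locked.

    def allocate_way(ways):
        # ways: list of dicts with 'valid', 'locked' and 'age' fields.
        # Returns the index of the way to use, or None to suppress the copy.
        for i, w in enumerate(ways):
            if not w["valid"]:
                return i                          # an invalid (unused) way is available
        unlocked = [i for i, w in enumerate(ways) if not w["locked"]]
        if not unlocked:
            return None                           # all ways locked: keep the DRC1 entry instead
        return max(unlocked, key=lambda i: ways[i]["age"])   # overwrite the oldest unlocked entry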

Another means that DRC1 manager 2234 and DRC2 manager 2246 can employ concerns maintaining consistency. DRC1 manager 2234 maintains first level DRC 2222 consistency by updating the data of any entry whose address matches a write address whose data is being written. Typically, write-back cache 2290 (FIG. 21) temporarily stores a write address (and corresponding data) until the write is performed to memory (e.g., DRAM). In cases where the address of a read request matches a write address in write-back cache 2290, multi-level cache 2220 merges the data of the write address with the data read from memory before sending that data to first level DRC 2222. DRC2 manager 2246 maintains second level DRC 2224 consistency by invalidating any entry whose address matches a write address loaded into write-back cache 2290. Since second level DRC 2224 receives data only from DRC1, and since first level DRC 2222 is kept consistent with memory and write-back cache 2290, second level DRC 2224 generally does not contain inconsistent data. In addition, any address being copied from DRC1 to DRC2 may first be checked against write-back cache ("WBC") 2290; if a match is found in WBC 2290, the copy operation fails, and otherwise the copy from DRC1 to DRC2 proceeds. This additional check further helps maintain consistency.
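
A minimal sketch (illustrative only) of these consistency rules: a write address updates a matching DRC1 entry, invalidates a matching DRC2 entry, and a DRC1-to-DRC2 copy is abandoned if the address is found in the write-back cache.

    def handle_write(write_addr, data, drc1, drc2):
        if write_addr in drc1:
            drc1[write_addr] = data               # keep the short-term store up to date
        if write_addr in drc2:
            del drc2[write_addr]                  # invalidate the long-term copy

    def copy_to_drc2(addr, drc1, drc2, write_back_cache):
        if addr in write_back_cache:
            return False                          # copy fails; consistency preserved
        drc2[addr] = drc1[addr]
        return True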

FIG. 23A illustrates an exemplary DRC1 query interface 2323 for a first address store 2305, in accordance with certain embodiments. In this example, an input address such as a trigger address ("A0") 2300 (e.g., a processor-requested address) consists of a common address portion 2302a and a delta address portion 2302b. Address 2300 may in some cases be a prediction address or, in other cases (when maintaining consistency), a write address. If address 2300 is a trigger address that generates a group of prediction addresses, this group 2307 may contain addresses such as those identified as address ("A1") 2301 through address ("Am") 2303, where "m" represents any number of predictions that can be used to perform a look-ahead lookup in accordance with at least one embodiment of the present invention. In some cases, "m" is set equal to the batch size "b."

Each entry 2306 of DRC1 address store 2305 includes a first entry portion 2306a (e.g., a tag) and a second entry portion 2306b (e.g., an index). In a particular embodiment, first entry portion 2306a and second entry portion 2306b correspond to common address portion 2302a and delta address portion 2302b, respectively. Second entry portion 2306b represents the displacement of the address held in that particular entry 2306 from the trigger address ("A0") 2300. Thus, when DRC1 matcher 2312 compares an input address, such as trigger address ("A0") 2300, to entries 2306, common portion 2302a can be used to represent the common portion of all the addresses of group 2307. Because common portion 2302a of address 2300 is generally the same as the common portions of addresses ("A1") 2301 through ("Am") 2303, only common portion 2302a needs to be compared against the first entry portions 2306a of entries 2306. The delta portions 2302b for addresses ("A1") 2301 through ("Am") 2303 are then matched against the multiple second entry portions 2306b of entries 2306.

In one embodiment, DRC1 matcher 2312 includes a common comparator 2308 for matching the common address portion against the first entry portions, and delta comparators 2310 for matching the delta address portions against the second entry portions. In particular, common portion 2302a is compared simultaneously against the first portions 2306a of entries 0 through n, and delta portions 2302b are compared simultaneously against the second portions 2306b of the same entries. In some embodiments, common comparator 2308 is a "wide" comparator for comparing high-order bits (e.g., bits 35:12 of a 36-bit address), and delta comparators 2310 are "narrow" comparators for comparing low-order bits (e.g., bits 11:6 of a 36-bit address). FIG. 23A shows one delta comparator per delta portion 2302b; in some cases, the number of delta comparators 2310 equals m*n (not shown), where each delta comparator receives as inputs one delta portion 2302b and one second entry portion 2306b. The comparator sizes limit the amount of physical resources required to perform these comparisons; accordingly, the addresses previewed simultaneously are constrained to lie within the same memory page (e.g., a memory page is typically 4K bytes). Although this prevents a look-ahead lookup from crossing page boundaries, such configurations reduce the cost, in physical resources, of performing the look-ahead lookup. Common portion 2302a and delta portions 2302b are compared against entries 2306 simultaneously, or nearly simultaneously.
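
A minimal sketch (bit positions taken from the 36-bit example above, otherwise illustrative) of the split comparison: one wide compare of the common portion and narrow compares of the delta portions, with all simultaneously previewed addresses assumed to lie within one page.

    COMMON_SHIFT = 12                             # bits 35:12 form the common portion
    DELTA_MASK = (1 << 12) - 1                    # bits 11:6 (above the line offset) form the delta

    def split(addr):
        return addr >> COMMON_SHIFT, (addr & DELTA_MASK) >> 6

    def drc1_hit_list(trigger, predictions, entries):
        # entries: list of (common, delta) pairs held in the DRC1 address store.
        common, _ = split(trigger)
        hits = []
        for addr in [trigger] + list(predictions):
            _, delta = split(addr)
            hits.append(any(e_common == common and e_delta == delta
                            for e_common, e_delta in entries))
        return hits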

The outputs of common comparator 2308 and delta comparators 2310 are Hbase(0), Hbase(1), ... Hbase(m) and H0, H1, H2, ... HN, respectively, each being either 0 (e.g., no match) or 1 (e.g., match). Together they form hit vectors of 0s and 1s, which are sent to DRC1 processor 2314 to take action, depending on whether it is filtering or maintaining consistency. Hit list generator 2313 generates a list of hits ("hit list") indicating which addresses of range "r" (i.e., group 2307) reside in DRC1 address store 2305. If an address matches (i.e., the prediction is already stored there), the address is included in the hit list, while an address that does not match (i.e., the prediction is not stored) is excluded from it. This hit list is used in issuing predictions or in maintaining consistency within DRC1 address store 2305.

FIG. 23B shows a number of exemplary input addresses 2352 that can be tested simultaneously using the DRC1 query interface 2323 of FIG. 23A, in accordance with certain embodiments. Here, DRC1 query interface 2350 can accept any range of addresses 2352 to match against DRC1 address store 2305. Matcher 2312 of FIG. 23A is replicated as many times as required to perform parallel look-ahead lookups over the number of input addresses. For example, for forward sequential predictions with a batch size "b" set to 7, DRC1 query interface 2350 requires matchers for matching addresses A1 through A7, as group 2307, based on the base (or trigger) address. For blind back prediction, only A(-1) requires matching against the base address as group 2307, whereas for backward sequential prediction, addresses A(-1) through A(-7) require matching. Note that the range of addresses 2352 is applied in parallel to the DRC1 and DRC2 query interfaces simultaneously.

FIG. 24 illustrates an exemplary DRC2 query interface 2403 for a DRC2 address store 2404, in accordance with certain embodiments. DRC2 query interface 2403 is configured to receive input addresses 2402 to compare against the contents of DRC2 address store 2404. In this example, input address 2402 is the base portion (e.g., tag) of an address, such as tag A0. Also in this example, DRC2 address store 2404 is composed of four banks 2406 of memory (bank 0, bank 1, bank 2 and bank 3), each bank containing entries 2410. In this case, an entry 2410 may be located in any one of four ways W0, W1, W2, and W3.

DRC2 matcher 2430 includes a number of comparators to compare tag A0 against entries 2410. In general, any matching address in DRC2 address store 2404 shares the same tag A0 but may differ in other groups of bits (e.g., in the index). In a particular embodiment of the present invention, determining whether a tag matches any entry in DRC2 address store 2404 generally proceeds as follows. First, for each bank 2406, one of the indexes in that bank is selected to be searched for a potentially matching address. As shown in FIG. 25A, the index selected for searching depends on the bank in which a specific address (e.g., A0 in FIG. 25A) resides, as identified by certain index bits of that address, so the selected index can vary per bank. Second, all four ways of the selected index are accessed for each bank 2406. Next, the tags stored for the four ways (e.g., W0 to W3) are compared against tag A0, which is the base address 2402 in this example. In general, it is sufficient to compare tag A0 without comparing other tags such as tag A1, because these tags are assumed to be identical (e.g., tag A0 = tag A1 = tag A2); simultaneous searching of predictions is usually limited to addresses within the same page, such as a 4-kilobyte page, which results in the same tag. Third, if an address match is found by DRC2 matcher 2430, the resulting hit vector and the valid bits are combined to obtain a final hit vector, similar to that described with respect to FIGS. 27 and 28.

Hit generator 2442 of DRC2 query interface 2403 receives the tag comparison results ("TCR") 2422 from DRC2 matcher 2430 and further compares those results against the corresponding valid bits to generate an ordered set of predictions ("ordered predictions") 2450. Here, the tag comparison results are the results of the tag comparisons from banks 0, 1, 2 and 3, represented by TCR(a), TCR(b), TCR(c) and TCR(d), respectively, each being one or more bits indicating whether the tag matches one or more entries 2410. The ordered predictions may be an ordered set of predictions that match (or do not match) input address 2402. The ordered predictions may also be a vector of bits, each indicating whether the corresponding input address is present in DRC2 address store 2404. Any number of input addresses 2402 can be matched in a similar manner by DRC2 query interface 2403 when additional DRC2 matchers 2430 are included. FIGS. 25A-28 illustrate exemplary hit generators in accordance with some embodiments of the present invention.

FIG. 25A illustrates possible arrangements (or representations thereof) of addresses stored in DRC2 address store 2404 in accordance with one embodiment of the present invention. Note that ways W0, W1, W2 and W3 are not shown, to simplify the description below. Input addresses A0, A1, A2, and A3 are stored in DRC2 address store 2404. As an example, sequential predictor 2102 (not shown) may generate sequential predictions A1, A2, and A3 based on trigger address A0 (which may be stored in any of the four ways). The first arrangement 2502 results from A0 being stored in bank 0. Similarly, the second arrangement 2504, third arrangement 2506 and fourth arrangement 2508 result from address A0 residing in banks 1, 2 and 3, respectively, with the subsequent addresses stored in series following the trigger address. As such, these addresses (or portions thereof, such as their tags) are generally output from DRC2 address store 2404 without any particular order.

FIG. 25B illustrates an exemplary hit generator 2430 that generates results based on unordered addresses and correspondingly ordered valid bits in accordance with one embodiment of the present invention. In this example, sequential predictor 2102 generates sequential predictions A1, A2, A3, A4, A5, A6, and A7 based on trigger address A0, and these are stored in the particular arrangement shown (i.e., trigger address A0 stored in bank 1, with the other addresses following). The hit generator receives the unordered addresses A2, A6, A1, A5, A0, A4, A3, A7 and the ordered valid bits VB0 through VB7, orders them, compares them, and produces results R0 through R7, which may be a bit vector (match or no match) or a list of addresses. Note that a valid bit indicating that a prediction has been invalidated prevents that invalidated prediction from being matched; this is one reason for matching valid bits against the contents of the address store. In accordance with certain embodiments of the present invention, four addresses are considered simultaneously rather than eight, such as addresses A2, A1, A0 and A3, or addresses A6, A5, A4 and A7. In that case, it is not necessary for addresses A0 through A7 to be accessible simultaneously in the "overlapped" manner shown in FIG. 25B. To consider addresses A0 through A7 of FIG. 25B simultaneously, however, DRC2 may be configured as a dual-port random access memory ("RAM") to permit two simultaneous accesses to the same RAM (or the same DRC2).

FIG. 26 is a schematic representation of a hit generator 2600 corresponding to hit generator 2442 of FIG. 24. Hit generator 2600 produces one or more results R0 through R7 by multiplexing, for each input address, the addresses in ways 0 to 3 and/or the valid bits, where a result R is determined by comparing the multiplexed address bits or valid bits. If a valid bit indicates that the tag indicated by the corresponding tag comparison result ("TCR") is valid, that tag is output as result R. Note that a TCR may be the tag of an address or may be a single bit having a value of either "1" (i.e., a hit in DRC2) or "0" (i.e., no hit in DRC2). As described below with respect to FIGS. 27 and 28, a tag for an address (e.g., tag A1) is generally represented by a single TCR bit.

FIG. 27 shows an example of hit generator 2442 in accordance with an embodiment of the present invention. Hit generator 2442 is configured to order the unordered tags for addresses A3, A0, A1 and A2 arriving from the ways of banks 0, 1, 2, and 3. Here, the tags for addresses A3, A0, A1 and A2 are each represented by a single bit, the TCR for that tag. Next, the ordered TCRs (shown as ordered tags for addresses A0, A1, A2, A3) are tested against valid bits VB0-VB3 of valid bits 2244; AND operator ("AND") 2706 performs the test as a logical AND function. Thus, if the valid bit is true and the single-bit TCR is true, there is a hit, and the result R reflects it. That is, results R0, R1, R2 and R3 may be bits indicating a match or mismatch, forming an ordered prediction result of tags that do or do not match an address. If the tag itself is used as the TCR (e.g., tag A3), then AND operator 2706 masks the bits of any tag whose corresponding valid bit is zero (e.g., the result R is forced to zero in that case).

FIG. 28 shows another example of hit generator 2442 in accordance with another embodiment of the present invention. Here, hit generator 2442 includes a valid bit ("VB") sequencer 2802 configured to un-order the valid bits VB0-VB3, which are stored in order in valid bits 2244. That is, valid bit sequencer 2802 reorders VB0, VB1, VB2, VB3 into VB3, VB0, VB1, VB2 so as to match the order of the TCRs representing the tags for addresses A3, A0, A1 and A2. Next, the unordered tags for the addresses (i.e., the unordered TCRs for these tags) are tested against the similarly unordered valid bits by an AND operator ("AND") 2806. The unordered results R3, R0, R1 and R2 then pass through a result sequencer 2810 to obtain R0, R1, R2 and R3 as an ordered prediction result, a form usable by elements of prefetcher 2100 for filtering, consistency maintenance, and so on. By reordering the valid bits and the results (which may each be a single result bit), less hardware is needed than if each multi-bit address were reordered. Orderer 2702 and result sequencer 2810 are illustrative; other mappings for ordering and reordering bits are within the scope of the present invention.
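
A minimal sketch (illustrative only) of the FIG. 28 arrangement: the ordered valid bits are permuted to match the unordered single-bit TCRs, combined by AND, and the results are permuted back into address order, so only small bit vectors are ever reordered.

    def hit_vector(unordered_tcr, ordered_valid_bits, order):
        # order[i] gives the address position of unordered_tcr[i]
        # (e.g., order = [3, 0, 1, 2] for tags arriving as A3, A0, A1, A2).
        permuted_valid = [ordered_valid_bits[pos] for pos in order]        # VB sequencer
        unordered_results = [t & v for t, v in zip(unordered_tcr, permuted_valid)]
        results = [0] * len(unordered_results)
        for i, pos in enumerate(order):                                    # result sequencer
            results[pos] = unordered_results[i]
        return results

    # Example: tags arrive in order A3, A0, A1, A2.
    print(hit_vector([1, 1, 0, 1], [1, 1, 1, 0], [3, 0, 1, 2]))   # -> [1, 0, 1, 0]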

In a particular embodiment of the invention, prefetcher 2100 of FIG. 21, which includes nonsequential predictor 2104 and multi-level cache 2120, is disposed within a memory processor having at least some of the functionality of a Northbridge chip in a Northbridge-Southbridge chipset architecture. Such a memory processor is designed to at least manage memory accesses by one or more processors, such as a CPU, a graphics processing unit ("GPU"), or the like. In a Northbridge implementation, prefetcher 2100 may be connected to the GPU via an AGP/PCI Express interface, a front side bus ("FSB") may be used as the system bus between the processor and the memory, and the memory may be system memory. Alternatively, multi-level cache 2120 may be employed in any other structure, circuit, device, or the like that controls access to memory, as a memory processor does. In addition, multi-level cache 2120 and its elements, as well as the other components of prefetcher 2100, may be composed of hardware or software modules, or both, and may be distributed or combined in any manner.

For purposes of explanation, the foregoing descriptions have used specific nomenclature to provide a thorough understanding of the present invention. However, it will be apparent to one skilled in the art that these specific details are not required in order to practice the invention. Accordingly, the foregoing descriptions of specific embodiments of the present invention are presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the invention to the precise forms disclosed; obviously, many modifications and variations are possible in view of the above teachings. Indeed, this description should not be read to limit any feature or aspect of the invention to any embodiment, and the features and aspects of one embodiment may readily be interchanged with those of other embodiments. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, thereby enabling others skilled in the art to best utilize the invention and its various embodiments with various modifications. It is intended that the following claims and their equivalents define the scope of the invention.

Claims (30)

  1. A prefetcher for predicting accesses to a memory, comprising:
    A first address predictor configured to associate a subset of addresses to one address and predict a group of addresses based on one or more addresses of the subset,
    wherein the one or more addresses of the subset are nonpatternable with respect to the one address.
  2. The prefetcher of claim 1,
    The first address predictor,
    And a nonsequential predictor for generating the group of addresses as nonsequential predictions when the address is detected.
  3. The prefetcher of claim 2,
    wherein the nonsequential predictor comprises:
    a repository storing the subset of addresses and combinations of the addresses, each address of the subset being prioritized relative to the remainder, wherein the address is stored as a trigger address and the subset of addresses is stored as target addresses; and
    a nonsequential prediction engine configured to detect the address in a stream of addresses and further configured to select the one or more addresses as nonsequential predictions based on their highest priority.
  4. The prefetcher of claim 3, wherein
    The highest priority is
    At least indicating that a processor has requested the one or more addresses most recently, relative to the remainder of the subset of addresses.
  5. The prefetcher of claim 3, wherein
    The first address predictor is configured to generate indexes and tags from addresses, the repository including a plurality of ways each having memory locations for storing trigger-target combinations,
    wherein a trigger-target combination stored in a first way has a higher priority than other trigger-target combinations stored in a second way.
  6. The prefetcher of claim 5,
    Each trigger-target combination comprises a tag having a tag size and at least a portion of a target address having a portion size;
    Wherein the tag represents a trigger address and the tag size and the size of the portions are configured to minimize a size requirement for a memory location.
  7. The prefetcher of claim 5,
    The first address predictor is configured to compare the first address tag with each tag identified by one or more of the indexes to detect any trigger-target combination that includes a first address tag,
    wherein the first address predictor uses either or both of a target address from one or more trigger-target combinations to form a nonsequential prediction, or two consecutive trigger addresses to form additional nonsequential predictions based on one or more other trigger-target combinations,
    Each of the one or more other trigger-target combinations relates to lower levels in the target cache than one of the trigger-target combination or the other trigger-target combination.
  8. The prefetcher of claim 5,
    Further comprising a priority adjuster configured to modify a priority for one of the trigger-target combinations that includes a target address that matches a second address,
    The target address is identified by a trigger address consisting of one or more of the index and a first address tag.
  9. The prefetcher of claim 3,
    further comprising an accelerator configured to designate a first address of a sequential stream of addresses as a new trigger address for the one or more addresses, so that, when the trigger address is within the sequential stream, the nonsequential prediction occurs earlier through the new trigger address than it would through the trigger address.
  10. The prefetcher of claim 3, wherein
    And a suppressor configured to suppress generating one or more prediction addresses.
  11. The prefetcher of claim 10,
    The suppressor is configured to reduce the batch quantity of addresses for the group if the address relates to one or both of a request for data or a prefetch request, thereby suppressing in advance the generation of the one or more prediction addresses.
  12. The prefetcher of claim 10,
    The suppressor is further configured to suppress generating the group of addresses as nonsequential predictions if the interval of time from the detection of the address as the trigger address to the generation of the group of addresses as nonsequential predictions is less than a threshold,
    the threshold being at least defined by an amount of time between a first processor request for the trigger address and a second processor request for the one or more addresses, that time being less than the time required to prefetch one or more of the group of addresses from memory.
  13. The prefetcher of claim 10,
    The suppressor is further configured to keep track of a base address and a last-detected address for each of a plurality of interleaved sequential streams, to determine whether another address is within the address stream from the base address to the last-detected address for any of the plurality of interleaved sequential streams, and, if so, to suppress the generation of at least the prediction address based on the other address.
  14. The prefetcher of claim 13,
    Each of the plurality of interleaved sequential streams is part of one of a plurality of threads.
  15. The prefetcher of claim 10,
    And a second address predictor comprising a sequential predictor to generate a plurality of additional predictive addresses based on the one or more other addresses.
  16. The prefetcher of claim 15,
    The plurality of additional prediction addresses;
    A first number of addresses sequentially ordered from the one or more other addresses, or
    Include a second number of addresses sequentially in descending order from the one or more other addresses, or
    Includes both the first number of addresses and the second number of addresses,
    wherein the suppressor is configured to:
    Detect the one or more other addresses are part of a first address stream in ascending order, and suppress the plurality of additional predictive addresses based on the second number of addresses ordered in descending order,
    And detect the one or more other addresses as being part of a second address stream in descending order and suppress the plurality of additional predictive addresses based on the first number of addresses ordered in ascending order.
  17. The prefetcher of claim 15,
    The plurality of additional prediction addresses;
    A back address sequentially ordered by one from the one or more other addresses in descending order, or
    One or both of the back sector addresses of the one or more other addresses,
    And the suppressor is further configured to reduce the batch amount by one when the plurality of additional predictive addresses includes one of the back address or the back sector address.
  18. The prefetcher of claim 15,
    A prediction inventory comprising a plurality of queues each configured to maintain predictions of the same prediction type until it is published or filtered; And
    An inventory filter for generating a subset of the filtered addresses,
    The inventory filter is configured to filter extra addresses in the predictive inventory, or one of the group of addresses and the plurality of additional predictive addresses,
    The prefetcher is configured to provide one or more of the filtered subset of addresses.
  19. The prefetcher of claim 18,
    And the plurality of queues further comprises one or more queues that maintain different types of predictions than predictions maintained in other queues of the plurality of queues.
  20. The prefetcher of claim 18,
    Further comprising an inventory manager configured to control each of the plurality of queues according to one or more queue attributes,
    The one or more queue attributes are
    Type of queue,
    Expiry Time,
    Cue size,
    An insertion indicator indicating a method for inserting an incoming prediction into a full queue, and
    Prefetcher, which is the priority for selecting the next prediction.
  21. The prefetcher of claim 20,
    The plurality of queues are:
    Sequential queues with sequential queue priorities;
    A back queue having a back queue priority configurable to indicate a precedence exceeding the sequential queue priority; And
    Further comprising one or more out of order queues, each having a unique priority relative to the priorities of the other queues.
  22. The prefetcher of claim 20,
    The inventory manager manages predictions by each group of items including a triggering address and one or more items,
    Wherein each of the plurality of queues is configured to be searched to match the predictions to other predictions that are independent of the plurality of queues upon publication.
  23. The prefetcher of claim 22,
    Further comprising an arbiter configured to publish the one or more items as an issued item to access a memory,
    The published item is selected based on a priority for a publication queue, and the priority is changeable by the arbiter upon detecting that the publication queue is contributing to memory over-use.
  24. The prefetcher of claim 23,
    A cache memory containing predictable information and references; And
    And a post-inventory filter configured to compare the published item against the references to filter out the published item as a redundant prediction.
  25. The prefetcher of claim 15,
    Further comprising a cache memory for managing predictable accesses to the memory;
    The cache memory,
    A short term cache memory configured to store predictions having a lifetime less than a threshold;
    A long term cache memory configured to store predictions that have a lifetime greater than or equal to the threshold and having more memory capacity than the short term cache memory; And
    An interface configured to detect in parallel whether multiple predictions are stored in one or both of the short term cache memory or the long term cache memory, and
    The interface uses two or more representations for each of the multiple predictions when examining the short term cache memory and the long term cache memory.
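Claim 25's interface checks the short-term and long-term caches for the same prediction; in hardware this would happen in parallel, while the sketch below simply probes both structures. The name DualPredictionCache and the map-based storage are assumptions.

```cpp
#include <cstdint>
#include <unordered_map>

// Illustrative lookup that consults both the short-term and the long-term
// prediction caches for the same address.
struct DualPredictionCache {
    std::unordered_map<uint64_t, int> short_term;  // address -> entry index
    std::unordered_map<uint64_t, int> long_term;

    bool contains(uint64_t addr) const {
        return short_term.count(addr) != 0 || long_term.count(addr) != 0;
    }
};
```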
  26. The prefetcher of claim 25,
    further comprising a data return cache manager configured to copy a stored prediction, as a copied prediction, from the short-term cache memory to the long-term cache memory when the stored prediction ages past the threshold,
    wherein the short-term cache memory is the only source of data for the long-term cache memory.
  27. The prefetcher of claim 26,
    wherein the data return cache manager is further configured to:
    store the copied prediction in an entry of the long-term cache memory located in one of a plurality of ways when such a way is available, or
    store the copied prediction, if none of the plurality of ways is available, in the entry of the long-term cache memory that contains the oldest stored prediction.
  28. The prefetcher of claim 26,
    wherein the data return cache manager is further configured to:
    store a prediction in an entry of the short-term cache memory that contains an invalid prediction, or
    store the prediction in another entry of the short-term cache memory that contains the oldest prediction.
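Claims 26 to 28 together describe placement and promotion policies: an aged-out prediction is copied from the short-term to the long-term cache, and new entries prefer an invalid (free) slot before displacing the oldest one. A minimal sketch under assumed capacities and an assumed aging threshold; freeing the short-term slot after promotion is also an assumption.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Illustrative placement rules: reuse an invalid slot first, otherwise
// displace the oldest entry; promote aged-out short-term predictions into
// the long-term cache.
struct Entry { bool valid; uint64_t addr; unsigned age; };

template <std::size_t N>
std::size_t pick_victim(const std::array<Entry, N>& entries) {
    std::size_t victim = 0;
    for (std::size_t i = 0; i < N; ++i) {
        if (!entries[i].valid) return i;                       // invalid slot first
        if (entries[i].age > entries[victim].age) victim = i;  // otherwise oldest
    }
    return victim;
}

struct PredictionCaches {
    std::array<Entry, 8> short_term{};  // assumed capacities, value-initialized
    std::array<Entry, 4> long_term{};
    unsigned threshold = 16;            // assumed aging threshold

    void insert_short(uint64_t addr) {
        short_term[pick_victim(short_term)] = Entry{true, addr, 0};
    }

    // The short-term cache is the only data source for the long-term cache.
    void age_and_promote() {
        for (Entry& e : short_term) {
            if (!e.valid) continue;
            if (++e.age > threshold) {
                long_term[pick_victim(long_term)] = e;  // the copied prediction
                e.valid = false;                        // assumed: free the slot
            }
        }
    }
};
```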
  29. The prefetcher of claim 26,
    wherein the data return cache manager is further configured to:
    match a write address against a next stored prediction to form a matched prediction,
    merge, if the next stored prediction is stored in the short-term cache memory, at least a portion of the data of the write address into a portion of the prediction information of the matched prediction, and
    invalidate the next stored prediction if the next stored prediction is stored in the long-term cache memory.
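Claim 29's write handling can be read as merge-on-match in the short-term cache and invalidate-on-match in the long-term cache. A hypothetical sketch; the 64-byte line size, the offset handling, and all names are assumptions.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unordered_map>

// Illustrative write handling: a write that matches a stored prediction
// merges its data in the short-term cache, but invalidates the matched
// prediction in the long-term cache.
struct CachedPrediction {
    std::array<uint8_t, 64> data{};  // assumed 64-byte line of prediction information
    bool valid = true;
};

struct PredictionStore {
    std::unordered_map<uint64_t, CachedPrediction> short_term;
    std::unordered_map<uint64_t, CachedPrediction> long_term;

    void on_write(uint64_t line_addr, const uint8_t* bytes,
                  std::size_t len, std::size_t offset) {
        auto st = short_term.find(line_addr);
        if (st != short_term.end() && offset < st->second.data.size()) {
            std::size_t n = std::min(len, st->second.data.size() - offset);
            std::memcpy(st->second.data.data() + offset, bytes, n);  // merge write data
            return;
        }
        auto lt = long_term.find(line_addr);
        if (lt != long_term.end())
            lt->second.valid = false;  // invalidate the matched long-term prediction
    }
};
```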
  30. The prefetcher of claim 25,
    wherein the short-term and long-term cache memories are configured to reduce processor-related latency due to opening pages of the memory by storing nonsequential predictions as a subset of a range of predictions,
    the subset including predictions in at least two pages of the memory,
    wherein the first address predictor generates the range of predictions in response to a trigger address,
    the trigger address being in a page of the memory different from any page that includes the range of predictions,
    wherein the short-term and long-term cache memories are configured to store a prediction identifier associated with each entry of stored predictions and to send the prediction identifier to the first address predictor,
    and wherein the long-term cache memory is configured to store valid bits separately from each entry configured to store a prediction.
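Claim 30 has each cached prediction carry a prediction identifier that is returned to the address predictor on a hit. The sketch below illustrates that round trip; the class and field names are invented for illustration, and keeping the valid flag alongside each slot is only a stand-in for the separately stored valid bits.

```cpp
#include <cstdint>
#include <unordered_map>

// Illustrative round trip: each cached prediction carries a prediction
// identifier, which is reported back to the address predictor on a hit.
class PredictionCache {
public:
    void store(uint64_t addr, uint32_t prediction_id) {
        entries_[addr] = Slot{prediction_id, true};
    }
    // Returns true on a hit and reports the identifier, which the predictor
    // could use to reinforce the trigger-to-range association.
    bool lookup(uint64_t addr, uint32_t& prediction_id_out) const {
        auto it = entries_.find(addr);
        if (it == entries_.end() || !it->second.valid) return false;
        prediction_id_out = it->second.prediction_id;
        return true;
    }
private:
    struct Slot { uint32_t prediction_id; bool valid; };
    std::unordered_map<uint64_t, Slot> entries_;
};
```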
KR1020077003839A 2004-08-17 2005-08-16 System, apparatus and method for managing predictions of various access types to a memory associated with cache memory KR100987832B1 (en)

Priority Applications (8)

Application Number Priority Date Filing Date Title
US10/920,682 US7461211B2 (en) 2004-08-17 2004-08-17 System, apparatus and method for generating nonsequential predictions to access a memory
US10/920,995 US7260686B2 (en) 2004-08-17 2004-08-17 System, apparatus and method for performing look-ahead lookup on predictive information in a cache memory
US10/920,610 2004-08-17
US10/920,995 2004-08-17
US10/921,026 US7206902B2 (en) 2004-08-17 2004-08-17 System, apparatus and method for predicting accesses to a memory
US10/920,682 2004-08-17
US10/921,026 2004-08-17
US10/920,610 US7441087B2 (en) 2004-08-17 2004-08-17 System, apparatus and method for issuing predictions from an inventory to access a memory

Publications (2)

Publication Number Publication Date
KR20070050443A (en) 2007-05-15
KR100987832B1 (en) 2010-10-13

Family

ID=36142947

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020077003839A KR100987832B1 (en) 2004-08-17 2005-08-16 System, apparatus and method for managing predictions of various access types to a memory associated with cache memory

Country Status (4)

Country Link
JP (1) JP5059609B2 (en)
KR (1) KR100987832B1 (en)
TW (1) TWI348097B (en)
WO (1) WO2006038991A2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020046517A1 (en) * 2018-08-30 2020-03-05 Micron Technology, Inc. Asynchronous forward caching memory systems and methods

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7636813B2 (en) * 2006-05-22 2009-12-22 International Business Machines Corporation Systems and methods for providing remote pre-fetch buffers
JP6252348B2 (en) * 2014-05-14 2017-12-27 富士通株式会社 Arithmetic processing device and control method of arithmetic processing device
US9817764B2 (en) 2014-12-14 2017-11-14 Via Alliance Semiconductor Co., Ltd Multiple data prefetchers that defer to one another based on prefetch effectiveness by memory access type
KR101757098B1 (en) * 2014-12-14 2017-07-26 비아 얼라이언스 세미컨덕터 씨오., 엘티디. Prefetching with level of aggressiveness based on effectiveness by memory access type
JP2017072929A (en) 2015-10-06 2017-04-13 富士通株式会社 Data management program, data management device, and data management method
US10509726B2 (en) * 2015-12-20 2019-12-17 Intel Corporation Instructions and logic for load-indices-and-prefetch-scatters operations
US20170177349A1 (en) * 2015-12-21 2017-06-22 Intel Corporation Instructions and Logic for Load-Indices-and-Prefetch-Gathers Operations
KR102142498B1 (en) 2018-10-05 2020-08-10 성균관대학교산학협력단 GPU memory controller for GPU prefetching through static analysis and method of control

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5561782A (en) 1994-06-30 1996-10-01 Intel Corporation Pipelined cache system having low effective latency for nonsequential accesses
US5623608A (en) 1994-11-14 1997-04-22 International Business Machines Corporation Method and apparatus for adaptive circular predictive buffer management
US6789171B2 (en) * 2002-05-31 2004-09-07 Veritas Operating Corporation Computer system implementing a multi-threaded stride prediction read ahead algorithm

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06103169A (en) * 1992-09-18 1994-04-15 Nec Corp Read data prefetching mechanism for central arithmetic processor
US5426764A (en) * 1993-08-24 1995-06-20 Ryan; Charles P. Cache miss prediction apparatus with priority encoder for multiple prediction matches and method therefor
JP3741945B2 (en) * 1999-09-30 2006-02-01 富士通株式会社 Instruction fetch control device


Also Published As

Publication number Publication date
KR20070050443A (en) 2007-05-15
TW200619937A (en) 2006-06-16
JP2008510258A (en) 2008-04-03
JP5059609B2 (en) 2012-10-24
TWI348097B (en) 2011-09-01
WO2006038991A2 (en) 2006-04-13
WO2006038991A3 (en) 2006-08-03

Similar Documents

Publication Publication Date Title
US8880807B2 (en) Bounding box prefetcher
US9524164B2 (en) Specialized memory disambiguation mechanisms for different memory read access types
US10474584B2 (en) Storing cache metadata separately from integrated circuit containing cache controller
US9720839B2 (en) Systems and methods for supporting a plurality of load and store accesses of a cache
KR100227278B1 (en) Cache control unit
US6684296B2 (en) Source controlled cache allocation
US8347039B2 (en) Programmable stream prefetch with resource optimization
US8271736B2 (en) Data block frequency map dependent caching
US6460114B1 (en) Storing a flushed cache line in a memory buffer of a controller
US5553305A (en) System for synchronizing execution by a processing element of threads within a process using a state indicator
US6675280B2 (en) Method and apparatus for identifying candidate virtual addresses in a content-aware prefetcher
DE69816044T2 (en) Timeline based cache storage and replacement techniques
US8645631B2 (en) Combined L2 cache and L1D cache prefetcher
US7133981B2 (en) Prioritized bus request scheduling mechanism for processing devices
US7461209B2 (en) Transient cache storage with discard function for disposable data
KR100240911B1 (en) Progressive data cache
US6839816B2 (en) Shared cache line update mechanism
US7493452B2 (en) Method to efficiently prefetch and batch compiler-assisted software cache accesses
US6496902B1 (en) Vector and scalar data cache for a vector multiprocessor
JP3618385B2 (en) Method and system for buffering data
US7904661B2 (en) Data stream prefetching in a microprocessor
US6578130B2 (en) Programmable data prefetch pacing
US7739477B2 (en) Multiple page size address translation incorporating page size prediction
CA1322058C (en) Multi-processor computer systems having shared memory and private cache memories
US9798590B2 (en) Post-retire scheme for tracking tentative accesses during transactional execution

Legal Events

Date Code Title Description
A201 Request for examination
E701 Decision to grant or registration of patent right
GRNT Written decision to grant
FPAY Annual fee payment

Payment date: 20130926

Year of fee payment: 4

FPAY Annual fee payment

Payment date: 20140923

Year of fee payment: 5

FPAY Annual fee payment

Payment date: 20181001

Year of fee payment: 9

FPAY Annual fee payment

Payment date: 20191001

Year of fee payment: 10