KR20050084232A

KR20050084232A - Counter based stride prediction for data prefetch

Info

Publication number: KR20050084232A
Application number: KR1020057010495A
Authority: KR
Inventors: Jan Hoogerbrugge; Jan-Willem van de Waerdt
Original assignee: Koninklijke Philips Electronics N.V.
Priority date: 2002-12-12
Filing date: 2003-12-09
Publication date: 2005-08-26
Also published as: EP1576465A1; JP2006510082A; CN1726459A; AU2003285604A1; WO2004053686A1

Abstract

A prefetching system (400) includes hysteresis in the determination and modification of a stride value (412) that is used for prefetching data (130) in a sequential process. Once a stride value is determined, intermittent stride inconsistencies are ignored (322-330), and the stride value retains its prior value. When the stride inconsistencies become frequent (322-330), the stride value is modified (230). When the modified stride value becomes repetitive, the system adopts this value as the stride, subsequent stride inconsistencies are again ignored, and the stride value retains its current value until inconsistencies again become frequent.

Description

COUNTER BASED STRIDE PREDICTION FOR DATA PREFETCH

The present invention relates to the field of electronics, and in particular to a method and system for predicting the location of the next data item in a process, so that data can readily be prefetched from that location.

Prefetching is a common technique for minimizing latency in a sequential process. Data that the process is expected to need in the future is retrieved from memory and stored in a cache for subsequent access. The cache is designed to provide a much faster access time than the memory. Thus, if the process needs data and the data is in the cache, the data is provided to the process at the faster cache access speed. Conversely, if the data is not in the cache, it is provided to the process only after the considerably slower memory access time, which introduces a memory access delay into the process.

A number of techniques are commonly available for predicting the data that will be needed in the future. One such technique is "stride prediction", in which the location of the next data item needed is predicted from the sequence of locations of previously accessed data items. For example, data is often stored as an array or list of data records, such as records of employees' names, addresses, social security numbers, and so on. Typically, these records are fixed-length records, so that the start of each record is separated from the adjacent record in memory by a fixed number of memory locations. An application that provides a printout or display of all employee names, for example, will successively access memory locations separated by this fixed number. If the application accesses the first employee name at location L and the second employee name at location L+S, it is likely to access the data at location L+S+S to obtain the next employee name. If the data at location L+S+S is retrieved from memory and stored in a fast cache memory before this access is requested, the third employee name can be provided to the application faster than the first and second names were.

FIG. 1 is an example flow diagram of a prior art stride prediction prefetch process. At 110, the prefetch process is invoked when the application issues an access request for a new data item. Although not shown, this prefetching may be part of a global scheme that includes multiple prefetch algorithms, and it may be invoked selectively depending on factors such as the proximity of sequential data accesses. Typically, a processor maintains a Stride Prediction Table (SPT) that records information about the accesses made to memory, including the interval between related sequential accesses, referred to herein as the stride of the accesses. Such a stride prediction table is typically configured to record multiple sets of access information, in order to keep track of multiple potential strides. For convenience and ease of understanding, the invention is presented herein using the paradigm of a single set of information that keeps track of a single stride between related memory accesses. Those skilled in the art will recognize that the principles presented herein apply directly to the conventional use of a stride prediction table that contains multiple sets of information for multiple strides.

The current stride is determined, at 120, as the difference between the address of the prior (old) access and the address of the newly requested access. On the assumption that the next requested address will be spaced equally far from the new address, a prefetch is executed at 130 to fetch the data at the address located the same distance beyond the current (new) address. At 140, the old address is replaced by the new address in preparation for the next data access, and the prefetch routine ends at 150. Because the prefetch of the next candidate data is initiated when the access to the new data is requested, the prefetched data should already be in the fast cache by the time the application requests it. If the next candidate data is not the data that is subsequently requested, the cache will not contain the requested data, and a memory access will be required.
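The FIG. 1 flow can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the function name `prefetch_always` is hypothetical, addresses are plain integers, and the very first access issues no prefetch because no old address exists yet (the text does not say how the first access is handled).

```python
def prefetch_always(addresses):
    """Model of the FIG. 1 flow: issue a prefetch on every access.

    Returns the list of prefetch addresses that would be issued.
    Assumption: the first access has no old address, so it is skipped.
    """
    prefetches = []
    old_address = None
    for new_address in addresses:
        if old_address is not None:
            stride = new_address - old_address        # step 120
            prefetches.append(new_address + stride)   # step 130
        old_address = new_address                     # step 140
    return prefetches


# Four accesses spaced 8 apart, then a jump to an unrelated address.
print(prefetch_always([100, 108, 116, 124, 300]))  # [116, 124, 132, 476]
```

The last two prefetches (132 and 476) fetch data that is not requested in this trace, which illustrates the extra memory traffic this unconditional scheme can generate.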

However, the prefetch process of FIG. 1 initiates a prefetch on every access to new data, whether or not there is any basis for assuming that the next candidate data will be equally spaced from the previously requested data. This can generate substantial memory access traffic and significantly reduce the efficiency of memory access. FIG. 2 is an example flow diagram of an improved prior art stride prediction prefetch process, in which a prefetch is initiated if and only if two consecutive accesses exhibit the same stride. In this example, at 220, the stride determined for the new access request is compared with the previously determined stride. If the new stride equals the previous stride, the likelihood that the next stride will equal these two strides is high enough to warrant a prefetch, at 130. If the new stride differs from the old stride, the old stride is replaced by the new stride, at 230, in preparation for the next cycle. Those skilled in the art will recognize that this two-in-a-row criterion for initiating a prefetch can be extended to three-in-a-row, four-in-a-row, and so on, to balance the likelihood of having the next requested data in the cache against excess memory access traffic.
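As a rough illustration of the FIG. 2 two-in-a-row criterion, the following Python sketch models the flow with integer addresses. The function name `prefetch_two_in_a_row` and the initial state (no old address, no old stride) are assumptions made for the example, not details taken from the text.

```python
def prefetch_two_in_a_row(addresses):
    """Model of the FIG. 2 flow: prefetch only when the new stride
    equals the previously determined stride.

    Assumption: initially there is no old address and no old stride.
    """
    prefetches = []
    old_address = None
    old_stride = None
    for new_address in addresses:
        if old_address is not None:
            stride = new_address - old_address            # step 120
            if stride == old_stride:                      # step 220
                prefetches.append(new_address + stride)   # step 130
            else:
                old_stride = stride                       # step 230
        old_address = new_address                         # step 140
    return prefetches


trace = [100, 108, 116, 124, 300, 308, 316]
print(prefetch_two_in_a_row(trace))  # [124, 132, 324]
```

In this trace the single jump from 124 to 300 suppresses prefetching for two accesses (300 and 308), because the stored stride is first replaced by 176 and then by 8 again before two equal strides are seen.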

FIG. 1 is an example flow diagram of a prior art stride prediction prefetch process;

FIG. 2 is an example flow diagram of an alternative prior art stride prediction prefetch process;

FIG. 3 is an example flow diagram of a stride prediction prefetch process in accordance with this invention;

FIG. 4 is an example block diagram of a stride prediction prefetch system in accordance with this invention.

Throughout the drawings, the same reference numerals indicate similar or corresponding features or functions.

It is an object of this invention to improve the likelihood of prefetching the data that an application will subsequently access. It is a further object of this invention to provide an efficient prefetch technique that is well suited to hardware implementation.

These objects and others are achieved by a prefetch system that includes hysteresis in the determination and correction of the stride value used for prefetching data in a sequential process. Once a stride value is determined, intermittent stride inconsistencies are ignored, and the stride value retains its prior value. When stride inconsistencies become frequent, the stride value is corrected. When the corrected stride value becomes repetitive, the system adopts this value as the stride, subsequent stride inconsistencies are again ignored, and the stride value retains its current value until inconsistencies again become frequent.

The invention is premised on the observation that a regular stride pattern in memory accesses often includes intermittent data accesses that do not follow the pattern. For example, a nested loop structure includes an inner loop that cycles through a list of data, and an outer loop that presets one or more variables used in each cycle of the inner loop. The inner loop cycling through the data list will exhibit a fixed stride. At each restart of the inner loop, however, the memory access is made at the start of the list, whereas the previous access was made at the end of the list, and the span between the end of the list and its start will not correspond to the stride of the inner loop. In addition, data accesses by the outer loop at the start or end of the loop will produce spans between accesses that do not correspond to the stride of the inner loop. Another example of an intermittent break in stride involves the processing of data organized as a multidimensional array. Typically, data is processed for a given range along one dimension, the index to another dimension is incremented, and the data at that next index is then processed for the given range along the first dimension. The stride along the range of the first dimension will generally be fixed, but incrementing the index to the next dimension is likely to result in an access whose span from the previous access does not match the stride along the first dimension.
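The multidimensional-array case can be made concrete with a small Python sketch that lists the strides seen by the memory system. The function name `access_strides` and its parameters (`base`, `rows`, `cols`, `element_size`, `row_pitch`) are hypothetical names chosen for illustration; the sketch assumes a row-major array in which only part of each row is processed.

```python
def access_strides(base, rows, cols, element_size, row_pitch):
    """Return the strides between successive accesses of a nested loop.

    The outer loop steps through rows; the inner loop reads `cols`
    elements of `element_size` bytes from each row of `row_pitch` bytes.
    """
    addresses = [
        base + r * row_pitch + c * element_size
        for r in range(rows)      # outer loop: index of the next dimension
        for c in range(cols)      # inner loop: fixed stride within a row
    ]
    return [b - a for a, b in zip(addresses, addresses[1:])]


strides = access_strides(base=0, rows=3, cols=4, element_size=8, row_pitch=64)
print(strides)  # [8, 8, 8, 40, 8, 8, 8, 40, 8, 8, 8]
```

The inner-loop stride of 8 is interrupted by a single span of 40 at each row change, which is the kind of intermittent break described above.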

In a conventional stride prediction process, each time the stride is "broken", or interrupted, the process of determining the stride is repeated to determine a new stride. While the new stride is being determined, no prefetch occurs, and the application is delayed by a memory access at each restart of the inner loop, or at each increment of the higher-level index during the processing of a multidimensional array.

In accordance with this invention, the stride value is retained during intermittent breaks in stride. Whether a data prefetch is performed with the current stride depends on the number of equal-valued strides within a given number of memory accesses, and whether the stride value used for prefetching is adjusted depends on the occurrence of multiple different-valued strides. In a simple example, data is prefetched whenever two out of three accesses have the same stride value, and the stride value is corrected whenever two consecutive accesses have different strides.

FIG. 3 is an example flow diagram of a stride prediction prefetch process in accordance with this invention. In this example, a "count" parameter is used to count the number of consecutive strides having the same value, up to a selected maximum. In a preferred embodiment of this invention, this count parameter is also used to distinguish an intermittent break in the stride from a substantial and sustained change of stride. Those skilled in the art will recognize, in view of this disclosure, that a separate parameter could instead be used to count the number of consecutive unequal strides.

After the current stride between the new access address and the previous access address is determined, the process compares the current stride with the previous stride, at 220. If the current stride equals the previous stride, the count is incremented, at 326; otherwise, the count is decremented, at 322. In a preferred embodiment, the count is limited to values from zero to a maximum count. Blocks 324 and 328 clip the decremented or incremented count from blocks 322 and 326, respectively, so that it stays within these limits.

At 330, the count is compared with an upper limit UL and a lower limit LL, where the lower limit LL is preferably less than the upper limit. In this example, the upper limit UL corresponds to the number of equal-stride occurrences required to warrant a prefetch from the next stride-predicted access location. If the count equals or exceeds the upper limit UL, a data prefetch is executed, at 130, from the address corresponding to the sum of the current address and the current stride value. For example, if the upper limit UL is 2, no prefetch is executed initially until two equal stride values in a row have occurred; if the upper limit is 3, no prefetch is executed until three equal stride values in a row have occurred. Thereafter, subsequent equal-valued strides continue to increase the count, via 326 and 328, up to the maximum. In a preferred embodiment, the maximum is chosen as the number of consecutive equal-valued strides required to conclude that the current stride value is "reliable".

Each time a different stride occurs, the count is decremented, at 322. The difference between the maximum count and the current count therefore corresponds to a measure of the "unreliability" of the current stride value. The lower limit LL is chosen as the value of the count that constitutes a sufficient measure of unreliability to warrant a change to the current stride value, at 230. In general, as illustrated in FIG. 3, if the current stride value is deemed unreliable enough to be changed at 230, no prefetch is executed at 130. Similarly, if the current stride value is reliable enough to warrant a prefetch at 130, the current stride value is not corrected at 230. The lower limit LL is therefore set below the upper limit, generally to UL-1. Optionally, as indicated by the dashed connection between test block 330 and block 140, the lower limit LL can be chosen relative to the upper limit UL such that, for some count values, the estimated reliability of the current stride is insufficient to warrant a prefetch at 130 (count < UL) while the estimated unreliability is also insufficient to warrant a correction of the current stride at 230 (count > LL).
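The FIG. 3 flow can be modeled roughly as follows. This Python sketch is one possible reading of the description, not a definitive implementation: the class name `StridePredictor` and its field names are hypothetical, the defaults (maximum 3, UL 2, LL 1) are taken from the example values in the claims, the initial stride and count of zero are assumptions, and the prefetch address is assumed to use the stored stride value rather than the stride of the access that just occurred.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StridePredictor:
    max_count: int = 3       # count is clipped to 0..max_count
    upper_limit: int = 2     # UL: prefetch when count >= UL
    lower_limit: int = 1     # LL: correct stride when count <= LL
    old_address: Optional[int] = None
    stride: int = 0          # assumed initial stride value
    count: int = 0           # assumed initial count

    def access(self, new_address: int) -> Optional[int]:
        """Process one access; return a prefetch address or None."""
        prefetch = None
        if self.old_address is not None:
            current = new_address - self.old_address              # 120
            if current == self.stride:                            # 220
                self.count = min(self.count + 1, self.max_count)  # 326, 328
            else:
                self.count = max(self.count - 1, 0)               # 322, 324
            if self.count >= self.upper_limit:                    # 330
                prefetch = new_address + self.stride              # 130
            elif self.count <= self.lower_limit:                  # 330
                self.stride = current                             # 230
        self.old_address = new_address                            # 140
        return prefetch


predictor = StridePredictor()
trace = [0, 8, 16, 24, 32, 64, 72, 80]
print([predictor.access(a) for a in trace])
# [None, None, None, 32, 40, 72, 80, 88]
```

In this trace the jump from 32 to 64 only lowers the count from 3 to 2, so the stored stride of 8 is kept and the prefetch of address 72 is still issued.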

FIG. 4 is an example block diagram of a stride prediction prefetch system in accordance with this invention. A fetch controller 430 prefetches data from a memory 450 into a cache 460, based on the contents of a control register 410 and on data access requests from a processor 420. As discussed above, the addresses of the requested data are used to determine whether the application is requesting data in a repetitive manner, exhibiting a consistent stride, or span of locations, between memory accesses. The control register 410 contains the address of the previous data access 416, the previous stride 412, and a counter 414. As noted above, two counters may be used in place of the single counter 414, to maintain independent counts of equal-stride accesses and different-stride accesses. The address of the data currently requested by the processor 420 is compared with the previous address 416 to determine the current stride. If the current stride corresponds to the previous stride 412, the counter 414 is incremented; otherwise, it is decremented. Based on the incremented or decremented value of the counter 414, the fetch controller 430 determines whether to initiate a prefetch of data from the next predicted location in the memory 450 into the cache 460. If a subsequent data request is for data that has been prefetched into the cache 460, the cache 460 provides the data directly to the process, avoiding the delay that would be incurred by retrieving the data from the memory 450.

Based on the incremented or decremented value of the counter 414, the fetch controller 430 also determines whether to correct the stride value 412, as discussed above. Because the decision to correct the stride value 412 is based on a count of the number of different strides, rather than on the occurrence of a single different stride as in conventional systems, the stride prediction prefetch system 400 is insensitive to intermittent breaks in stride.
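To give a feel for this insensitivity, the following Python sketch compares a two-in-a-row scheme in the style of FIG. 2 with a counter-based scheme on a nested-loop access trace. It is a toy model under several assumptions: all function names are hypothetical, the counter limits (maximum 3, upper 2, lower 1) follow the example values in the claims, both schemes start with no history, and a prefetch is counted as useful only if it matches the very next requested address.

```python
def two_in_a_row(addresses):
    """FIG. 2 style: prefetch only when the last two strides are equal."""
    out, old_addr, old_stride = [], None, None
    for addr in addresses:
        pf = None
        if old_addr is not None:
            stride = addr - old_addr
            if stride == old_stride:
                pf = addr + stride
            else:
                old_stride = stride      # a single break replaces the stride
        out.append(pf)
        old_addr = addr
    return out


def counter_based(addresses, max_count=3, upper=2, lower=1):
    """Counter scheme: stride is corrected only when the count is low."""
    out, old_addr, stride, count = [], None, 0, 0
    for addr in addresses:
        pf = None
        if old_addr is not None:
            current = addr - old_addr
            if current == stride:
                count = min(count + 1, max_count)
            else:
                count = max(count - 1, 0)
            if count >= upper:
                pf = addr + stride       # stride survives an isolated break
            elif count <= lower:
                stride = current
        out.append(pf)
        old_addr = addr
    return out


def useful_prefetches(addresses, prefetches):
    """Count prefetches that match the very next requested address."""
    return sum(p == nxt for p, nxt in zip(prefetches, addresses[1:]))


# Three rows of six 8-byte elements, rows 64 bytes apart.
trace = [r * 64 + c * 8 for r in range(3) for c in range(6)]
print(useful_prefetches(trace, two_in_a_row(trace)))   # 9
print(useful_prefetches(trace, counter_based(trace)))  # 12
```

In this small trace the counter-based model takes one access longer to start prefetching, but it keeps prefetching across each row change, whereas the two-in-a-row model pauses for two accesses after every break. With very short inner loops the counter may never reach its maximum, so the relative benefit depends on the access pattern and the chosen limits.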

The foregoing merely illustrates the principles of the invention. Those skilled in the art will be able to devise various arrangements that, although not explicitly described or shown herein, embody the principles of the invention and fall within the spirit and scope of the following claims.

Claims

A method of prefetching data from a memory (450) into a cache (460), comprising:

determining (326) a first measure of equal-stride memory accesses, based on a number of memory accesses having a stride equal to a previous stride value (412);

determining (322) a second measure of different-stride memory accesses, based on a number of memory accesses having a stride different from the previous stride value (412);

prefetching (130) data from the memory (450) based on the first measure; and

correcting (230) the previous stride value (412) based on the second measure.

The method of claim 1, wherein determining the first and second measures is performed by maintaining a count (414) that is incremented (326) for each equal-stride memory access and decremented (322) for each different-stride memory access, and wherein the prefetching and the correcting are performed based on the count (414).

The method of claim 2, wherein the prefetching (130) occurs when the count (414) is at or above an upper limit (330), and the correcting (230) occurs when the count (414) is at or below a lower limit (330).

The method of claim 3, wherein the count (414) is limited to a maximum of 3, the upper limit is 2, and the lower limit is 1.

A prefetch system (400) comprising:

a control register (410) configured to contain at least one measure (414) corresponding to a consistency of stride values between requested memory accesses; and

a prefetch controller (430) configured to prefetch data from a memory (450) into a cache (460) based on the measure of consistency,

wherein the consistency of the stride values depends on a comparison of a current stride with a previous stride value (412), and the prefetch controller (430) is further configured to correct the previous stride value (412) based on a measure of inconsistency, the measure of inconsistency being based on a number of different strides between requested memory accesses.

The prefetch system (400) of claim 5, wherein the measure of consistency and the measure of inconsistency correspond to a count (414) that is incremented, up to a maximum count, for each equal-stride requested memory access and decremented, down to a minimum count, for each different-stride requested memory access.

The prefetch system (400) of claim 6, wherein the prefetch controller (430) is configured to prefetch data when the count (414) is greater than or equal to an upper threshold level, and to correct the previous stride value (412) when the count (414) is less than or equal to a lower threshold level.

The prefetch system (400) of claim 7, wherein the maximum count is 3, the upper threshold level is 2, the lower threshold level is 1, and the minimum count is 0.

A processing system comprising:

a memory (450) configured to provide access to data based on an access address;

a cache (460), operatively coupled to the memory (450), configured to store data accessed from the memory (450) and to provide fast access to that data;

a processor (420), operatively coupled to the memory (450) and the cache (460), configured to provide the access address, to receive the data from the cache (460) if the data is in the cache (460), and to receive the data from the memory (450) if the data is not in the cache (460); and

a fetch controller (430), operatively coupled to the processor (420), the memory (450), and the cache (460), configured to transfer data from the memory (450) to the cache (460) based on the access address and a predicted stride value,

wherein the fetch controller (430) is configured to maintain a measure of stride consistency based on repeated occurrences of equal stride values, and a measure of stride inconsistency based on repeated occurrences of different stride values, and

the fetch controller (430) transfers the data based on the measure of stride consistency and corrects the predicted stride value based on the measure of stride inconsistency.

The processing system of claim 9, wherein the measure of stride consistency and the measure of stride inconsistency are based on a count (414) that is incremented, up to a maximum count, for each equal-stride requested memory access and decremented, down to a minimum count, for each different-stride requested memory access.

The processing system of claim 10, wherein the fetch controller (430) transfers the data when the count (414) is at or above an upper threshold level, and corrects the predicted stride value when the count (414) is at or below a lower threshold level.

The method of claim 11,

Processing system.