KR20050088292A

KR20050088292A - Selectively changeable line width memory

Info

Publication number: KR20050088292A
Application number: KR1020057008824A
Authority: KR
Inventors: Rafael Blanco; Jack R. Smith; Sebastian T. Ventrone
Original assignee: International Business Machines Corporation
Priority date: 2005-05-17
Filing date: 2002-12-17
Publication date: 2005-09-05
Also published as: KR100714933B1

Abstract

The invention provides for selectively changing a line width for a memory, i.e., selecting one of a plurality of line widths for a memory (14). The selected line width is used in communicating with one or more processors (12, 26). This provides increased flexibility and efficiency for communicating with the memory. In particular, a register (42) can be set based on a desired line width, and subsequently used when locating data in the memory. The selected line width can be associated with each data block (38) in the memory to allow multiple line widths to be used simultaneously. When implemented in a cache (30, 130), multiple ways (40) of the cache can be processed as a group to provide data during a single memory operation. The line width can be varied based on a task (13, 28), a processor, and/or a performance evaluation.

Description

Selectable variable line width memory {SELECTIVELY CHANGEABLE LINE WIDTH MEMORY}

The present invention relates to memory line widths.

A cache is a type of memory used to speed up data transfer between main memory and a processing unit. In general, a cache holds a smaller amount of data than main memory. Typically, data that has been accessed or is likely to be accessed by the processing unit (e.g., recently accessed data, neighboring data, data determined by a look-ahead algorithm, etc.) is loaded from main memory into one or more data blocks in the cache. When the processing unit provides a main memory address to the cache, all or part of that address is used to determine whether the requested data is present in the cache.

FIG. 1 shows an example cache 2 organized as a grid of data blocks (cells) 6. Each column is referred to as a way 8, and each row is labeled with an index. Example cache 2 provides four ways 8, way 0 through way 3, and eight rows indexed 0-7, so that 32 data blocks are shown. Each data block 6 contains one or more words of data. A "word" is the smallest quantity of data that is independently addressable in a processing system. A word is generally one or more bytes (e.g., 2 bytes, 4 bytes, etc.). To reduce memory overhead, a plurality of words is typically stored in each data block 6. Memory for a single task is reserved in multiples of the quantity of data stored in each data block 6.

Given a main memory address, cache 2 uses an index to locate the corresponding data block 6 in each way 8. Cache 2 then determines whether any of the located data blocks 6 contains data for the provided main memory address. When the data is present in one of the located data blocks 6, the desired operation (e.g., read, write, delete, etc.) is performed on the data. If the data is not present, the requested data is retrieved from main memory and loaded into one of the located data blocks 6, after which the operation can be performed.

FIG. 2 shows a conventional address lookup operation for cache 2. Cache 2 is shown as including N ways 8, way 0 through way N-1. Each way 8 includes 2^I data blocks 6 indexed from 0 to 2^I-1. Typically, a processor provides cache 2 with a main memory address 4 for the data. To locate the requested data, cache 2 treats main memory address 4 as comprising a tag portion 4A, an index portion 4B, and/or a block offset portion 4C. The size of cache 2 relative to main memory and the quantity of data 6D in each data block 6 determine the size of each address portion 4A-C. For example, a particular main memory may contain 4 megawords (2²² words), which requires an address 22 bits long. Each way 8 in cache 2, however, may contain only 1 kiloword (2¹⁰ words), stored in 256 data blocks of 4 words each. In this case, block offset portion 4C comprises 2 bits (to locate one of the 4 (2²) words), index portion 4B comprises 8 bits (to locate one of the 256 (2⁸) data blocks), and tag portion 4A comprises the remaining 12 bits. Index portion 4B may be located in main memory address 4 starting at the bits adjacent to block offset portion 4C. Tag portion 4A comprises the remaining bits (T) of main memory address 4 that are not used by block offset portion 4C or index portion 4B. Typically, tag portion 4A comprises the bits of main memory address 4 that are assigned the highest place values (the "most significant bits").
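The 22-bit example above can be checked with a short sketch. The function name `split_address` and the constant names are illustrative assumptions, not terms from the patent; the bit widths are the ones given in the example.

```python
# Bit widths taken from the example: 22-bit word address,
# 4 words per data block, 256 data blocks per way.
ADDRESS_BITS = 22
OFFSET_BITS = 2   # selects one of 4 (2**2) words in a block
INDEX_BITS = 8    # selects one of 256 (2**8) blocks in a way
TAG_BITS = ADDRESS_BITS - INDEX_BITS - OFFSET_BITS  # remaining bits


def split_address(address):
    """Split a word address into (tag, index, block_offset).

    Hypothetical helper: the offset occupies the least significant bits,
    the index sits next to it, and the tag takes the most significant bits.
    """
    if not 0 <= address < (1 << ADDRESS_BITS):
        raise ValueError("address does not fit in 22 bits")
    block_offset = address & ((1 << OFFSET_BITS) - 1)
    index = (address >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = address >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, block_offset
```

With these widths, `TAG_BITS` evaluates to 12, matching the "remaining 12 bits" stated in the example.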

To retrieve data, cache 2 uses index portion 4B to locate a row of data blocks 6; that is, index portion 4B is used as an index lookup 5 to be matched against indexes 0 through 7. Cache 2 then determines whether one of the data blocks 6 in the located row contains data 6D for the provided main memory address 4 by comparing tag portion 4A with the tag 6A stored in each data block 6. If the correct data is present, the desired operation is performed. Block offset portion 4C comprises the number of bits (B) of main memory address 4 required to determine the location of the requested data within data 6D. Typically, block offset portion 4C comprises the bits of main memory address 4 that are assigned the lowest place values (the "least significant bits"). Other information may also be included in each data block 6, such as a dirty bit 6B indicating whether data 6D in data block 6 matches the data in main memory, and a valid bit 6C indicating whether data block 6 holds valid data.
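The conventional lookup can be sketched as a small set-associative model. This is a minimal illustration under assumed names (`Block`, `Cache`, `load`, `read`) and the bit widths of the earlier 22-bit example; it is not the circuit of FIG. 2, and replacement policy and write-back are omitted.

```python
from dataclasses import dataclass, field

OFFSET_BITS, INDEX_BITS = 2, 8  # assumed widths from the 22-bit example


@dataclass
class Block:
    valid: bool = False   # plays the role of valid bit 6C
    dirty: bool = False   # plays the role of dirty bit 6B
    tag: int = 0          # plays the role of tag 6A
    data: list = field(default_factory=lambda: [0] * (1 << OFFSET_BITS))


class Cache:
    def __init__(self, num_ways=4):
        # One list of blocks per way; the index selects a row across ways.
        self.ways = [[Block() for _ in range(1 << INDEX_BITS)]
                     for _ in range(num_ways)]

    @staticmethod
    def _split(address):
        offset = address & ((1 << OFFSET_BITS) - 1)
        index = (address >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
        tag = address >> (OFFSET_BITS + INDEX_BITS)
        return tag, index, offset

    def load(self, address, words, way):
        """Fill the block at the address's index in the chosen way."""
        tag, index, _ = self._split(address)
        block = self.ways[way][index]
        block.valid, block.dirty = True, False
        block.tag, block.data = tag, list(words)

    def read(self, address):
        """Return the word on a hit, or None on a miss."""
        tag, index, offset = self._split(address)
        for way in self.ways:
            block = way[index]            # index lookup locates the row
            if block.valid and block.tag == tag:  # tag comparison
                return block.data[offset]
        return None
```

For instance, after `cache.load(1033, [10, 11, 12, 13], way=2)`, reading address 1033 hits in way 2, while an address with the same index but a different tag misses.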

To load data located at main memory address 4 into cache 2, index portion 4B is used as an index lookup 5 for a row of data blocks 6. The data block(s) 6 in one of the ways 8 is selected, and the data is loaded into that data block(s) 6. When data is loaded into a data block 6, tag portion 4A is written to the tag 6A of that data block 6. Subsequently, when main memory address 4 is provided for retrieval, index portion 4B is again used as an index lookup 5 to locate the row of data blocks 6 that may contain the data. Tag portion 4A is compared with the tag 6A in each located data block 6 to determine whether that data block 6 contains the requested data.

"Line width" is the amount of bits transferred to or from a memory in a single operation. Typically, the line width for transferring data to or from cache 2 corresponds to the amount of data 6D in a data block 6 and is fixed. In the example above, each data block 6 comprises four words, so the line width is four words. As a result, each way 8 is accessed individually during a memory operation.

For a given memory size, a larger line width is advantageous because fewer memory operations are required to perform a data operation. For example, reading 16 words with a line width of one word requires 16 read operations, whereas the same read with a line width of four words requires only four read operations. However, when a cache is used and the line width corresponds to the size of a data block, larger blocks can increase the likelihood that requested data is not found in the cache (i.e., a cache miss). A higher cache miss rate causes more frequent transfers between main memory and the cache and degrades performance. In general, a single task that performs many data operations while maintaining a small locality of code benefits from a large line width, which reduces the number of cache operations. Conversely, when the locality of code is more spread out and/or many tasks share one cache, a smaller line width is desirable because additional data blocks from unrelated physical addresses can be stored. Unfortunately, current technology does not provide different line widths for a single memory (cache). This is a problem for tasks that benefit from different line widths, and for tasks that benefit from using different line widths when performing different functions. In addition, the inability to provide varying line widths for a single cache is a problem when a particular processor architecture or legacy program code requires or expects a certain line width in order to function properly. The problem is compounded when processors and/or tasks that require or desire different line widths share one memory.
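The read-count arithmetic in the example above can be written out directly. The function name `memory_operations` is an illustrative assumption.

```python
def memory_operations(words, line_width_words):
    """Number of transfers needed to move `words` words when each
    transfer carries `line_width_words` words (rounded up)."""
    if words < 0 or line_width_words < 1:
        raise ValueError("words must be >= 0 and line width >= 1")
    # Integer ceiling division avoids floating-point rounding.
    return -(-words // line_width_words)
```

Reading 16 words takes `memory_operations(16, 1)` = 16 operations at a one-word line width and `memory_operations(16, 4)` = 4 operations at a four-word line width.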

In view of the foregoing, there is a need for a way to selectively change the line width of a memory.

FIG. 1 is a diagram of a prior art cache;

FIG. 2 is a diagram of a prior art address lookup operation for a cache;

FIG. 3 is a diagram of an exemplary system according to one embodiment of the present invention;

FIG. 4 is a diagram of an address lookup operation according to one embodiment of the present invention;

FIG. 5 is a diagram of an exemplary portion of a cache after various tasks have been executed;

FIG. 6 is a diagram of an address lookup operation for a cache according to another embodiment of the present invention.

The present invention provides for selectively changing the line width of a memory. The line width is used in communicating with one or more processors. This provides increased flexibility and efficiency in communicating with the memory. In particular, a register can store a value indicating the selected line width, and the selected line width is then used when managing data in the memory. A processor can write to the register to select a line width. The line width used when communicating with the memory is adjusted according to the register value. To allow multiple line widths to be used simultaneously, the selected line width can be associated with each data block in the memory. When implemented in a cache, data blocks in multiple ways of the cache can be processed as a group that provides data using a wider line width during a single memory operation. The line width can be varied based on a processing system, a task, a processor, and/or a performance evaluation.

The illustrative aspects of the present invention are designed to solve the problems described herein, as well as other problems not discussed that a person skilled in the art may discover.

These and other features of the present invention will be more readily understood from the following detailed description of various aspects of the invention, taken in conjunction with the accompanying drawings.

It should be noted that the drawings of the invention are not to scale. The drawings are intended only to depict typical aspects of the invention and therefore should not be considered as limiting the scope of the invention. In the drawings, like numbering represents like elements between the drawings.

Best Mode for Carrying Out the Invention

The present invention provides for selectively changing the line width of a memory, i.e., selecting one of a plurality of line widths for communicating with the memory. The line width can be selected based on a number of parameters including, for example, the processing system in which the memory is installed, a processor accessing the memory, a task using the memory, and/or a performance evaluation of how effectively the memory is being used. With respect to the processing system, the line width can be selected when a memory of the present invention is installed in the processing system. This allows the same memory to be manufactured and installed in any of various processing systems that use different line widths. With respect to a task or processor, the line width can be selected based on the loading/unloading of a task, or on the start/end of access by one of several processors sharing the memory. When a selectively changeable line width memory is implemented so that tasks can use different line widths, the selected line width must be associated with each task. As is known in the art, the line width can be selected when a task is loaded and saved, along with other task information (e.g., program counter, register information, etc.), when the task is unloaded. A default line width for the processing system can be used when a processor/task has not selected a particular line width. The line width can also be changed for an active task. For example, one or more compiler directives can be incorporated to allow a software developer to change the line width for a given portion of a program. For example, a SetCacheWidth #X instruction can specify the desired line width (X), while an EndCacheWidth instruction can return the selected line width to its previous or default size. This allows a software developer to specify a large line width when, for example, a portion of a task that will transfer a large amount of data is entered, so that the portion benefits from the large line width. With respect to performance evaluation, an operating system executing one or more tasks on a processor can detect inefficient memory performance and change the line width of the active task and/or other tasks. For example, the operating system can monitor the cache hit/miss ratio and determine that the proportion of misses is too high. In response, the operating system can issue a command to select a different line width for all or some of the tasks using the cache.
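The per-task handling described above can be sketched as follows. Everything here is a hypothetical software model: `WidthRegister`, `Task`, `load_task`, `unload_task`, and the `cache_width` context manager are assumed names, and the context manager only imitates the effect of a SetCacheWidth/EndCacheWidth pair (set a width, then return to the previous one). Line widths are expressed in data blocks, and the default of one block is an assumption.

```python
from contextlib import contextmanager

DEFAULT_LINE_WIDTH = 1  # assumed system default, in data blocks


class WidthRegister:
    """Software stand-in for a register holding the selected line width."""
    def __init__(self):
        self.line_width = DEFAULT_LINE_WIDTH


class Task:
    def __init__(self, name, line_width=None):
        self.name = name
        # Saved alongside other task state (program counter, registers, ...).
        self.saved_line_width = line_width


def load_task(register, task):
    """Select the task's line width, or the default if it never chose one."""
    if task.saved_line_width is None:
        register.line_width = DEFAULT_LINE_WIDTH
    else:
        register.line_width = task.saved_line_width


def unload_task(register, task):
    """Save the current line width with the task being swapped out."""
    task.saved_line_width = register.line_width


@contextmanager
def cache_width(register, line_width):
    """Imitates SetCacheWidth #X ... EndCacheWidth for a program region."""
    previous = register.line_width
    register.line_width = line_width
    try:
        yield register
    finally:
        register.line_width = previous  # return to the previous size
```

A region that moves a large amount of data could then be wrapped as `with cache_width(register, 4): ...`, after which the task's earlier line width is in effect again.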

Turning to the drawings, FIG. 3 shows an exemplary processing system 10 that implements various features of the present invention. Processing system 10 includes a processor 12 and a memory 14. In general, processor 12 performs memory operations such as read, write, and delete on data stored in memory 14. To perform a desired operation, processor 12 provides an address to memory 14 using address line(s) 16. Data is communicated between processor 12 and memory 14 using data lines 18. Processor 12 can communicate the desired operation using all or some of data lines 18, or using one or more operation lines (not shown).

When implemented as a cache, memory 14 stores a portion of the data stored in a main memory 20. In operation, main memory 20 includes one or more memory blocks 13 reserved for one or more tasks being executed by processor 12. Processor 12 provides an address for data stored in main memory 20. Memory 14 first determines, based on the main memory 20 address, whether it contains a copy of the data. If the requested data is present, the desired operation is performed on the data in memory 14. If the requested data is not present in memory 14, memory 14 obtains the data from main memory 20 before performing the operation. Memory 14 writes modified data back to main memory 20 before deleting it and/or swapping it out for other data.

Memory 14 can communicate with processor 12 using a line width for data lines 18 that can be selectively changed. To implement the selectively changeable line width, memory 14 is shown as including a width unit 22 and an address unit 24. Width unit 22 stores the line width for data lines 18 that is selected by, for example, processor 12. Address unit 24 generates a lookup based on the provided main memory address and the selected line width, as described below. Although width unit 22 and address unit 24 are shown as included in memory 14, it is understood that the functionality of units 22, 24 can be implemented within and/or apart from memory 14 using software (e.g., executed in processor 12), hardware, or a combination of software and hardware. Further, one or more additional processors, i.e., processor 26, may reserve one or more memory blocks 28 to execute one or more tasks, and may communicate with and perform operations on memory 14 and/or main memory 20.

FIG. 4 shows an address lookup operation for a cache 30 according to one embodiment of the present invention that allows the line width to be selectively changed. When the selected line width is a multiple of the quantity of data 38D in a data block 38, data blocks 38 located in multiple ways 40 are managed as a group. Further, the size and/or location of block offset portion 36C, index portion 36B, and/or tag portion 36A varies according to the selected line width.

To implement selection of the line width, cache 30 is shown as including a width unit 32. Width unit 32 includes a width register 42 that is set by a processor/task to select the desired line width. Cache 30 uses width unit 32 to determine the line width. Based on the selected line width, cache 30 manages the data blocks 38 in one or more ways 40 as a single data block of variable size. For example, when width register 42 indicates a line width of 2^(B+1) words (two data blocks), the data blocks located at index 0 in way 0 and way 1 are managed as a single data block of twice the size.

One issue that must be addressed is that, when the line width is changed, all or some of the data in cache 30 becomes inaccessible and/or invalid, because one or more data blocks no longer contain the correct data and/or because the data may be located in a different data block. For example, when the line width is changed from one data block to two data blocks, data previously written as a single data block cannot be retrieved as two data blocks, because the data block in the second way was never written. Similarly, when the line width is changed from two data blocks to one data block, the second data block holding data is located at a different index. As a result, all or some of the data in cache 30 may need to be invalidated when a new line width is selected.

To avoid invalidating all of the data, the selected line width can be associated with each data block 38 so that it can later be determined at which line width the data block 38 was written. This allows multiple processors/tasks to use cache 30 simultaneously without the data in cache 30 having to be invalidated each time the line width changes. In one embodiment, the selected line width (e.g., the value of width register 42) is associated with a data block 38 by storing that value as a size 38E in the data block 38. Alternatively, the value of width register 42 can be mapped to a different value corresponding to each possible line width, and the mapped value is stored in size 38E to associate the selected line width with the data block 38. Based on the value of size 38E, it can be determined whether a data block 38 was written with the current line width, and therefore whether it can be used with the current line width when tag portion 36A matches the tag 38A stored in the data block 38. When multiple data blocks 38 are managed as a group, the overhead for each data block (i.e., tag 38A, dirty bit 38B, valid bit 38C) need only be written to the first data block 38, because the overhead for the additional data blocks 38 would merely be a copy of that of the first data block 38. However, size 38E is written to every data block 38 in the group so that a subsequent access using a different line width can recognize the data block as stale and/or invalid. Alternatively, all or some of this information can continue to be written to each data block 38. For example, when the data 38D in fewer than all of the data blocks 38 is modified, dirty bit 38B can be updated separately for each data block 38 to limit the amount of data 38D that is copied to main memory.
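The role of the per-block size field can be illustrated with a small model. The names `Block`, `is_hit`, and `write_group` are assumptions for illustration; the `size` field stands in for size 38E and counts data blocks per line. The sketch follows the variant in which tag and valid information is written only to the first block of a group while the size is written to every block.

```python
from dataclasses import dataclass


@dataclass
class Block:
    valid: bool = False
    tag: int = 0
    size: int = 1  # line width (in data blocks) in effect when written


def is_hit(block, tag, current_width):
    """A block is usable only if it is valid, its tag matches, and it
    was written with the line width currently selected."""
    return block.valid and block.tag == tag and block.size == current_width


def write_group(ways, index, tag, width):
    """Write a line spanning `width` ways at `index`.

    Overhead (tag, valid) goes only to the first block; the size is
    recorded in every block so later accesses at another width see
    the block as stale.
    """
    for way in range(width):
        block = ways[way][index]
        block.size = width
        if way == 0:
            block.valid, block.tag = True, tag
```

For example, if way 1 at index 0 held a block written at width 1, and a width-2 line is then written across ways 0 and 1 at that index, a later width-1 access with the old tag no longer hits in way 1 because the recorded size differs.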

Cache 30 also includes an address unit 34 that generates a lookup 37 based on index portion 36B in order to locate data blocks 38. Address unit 34 modifies index portion 36B based on the selected line width so that, when the line width changes for a task/processor, the data in all or some of the data blocks 38 remains valid. To select a line width, an appropriate mask is written to width register 42. Width register 42 includes a number of bits (E) corresponding to the maximum number of bits of index portion 36B that are masked (set to zero) when the maximum line width (i.e., the maximum number of data blocks) is selected. In other words, for a cache having N ways, width register 42 will include up to log₂(N) bits (E). Address unit 34 includes a logical AND gate 44. AND gate 44 is used to combine the least significant E bits of index portion 36B with the contents of width register 42. The result is then combined with the remainder of index portion 36B to generate lookup 37. Lookup 37 is then used to locate the data blocks 38 in ways 40 that may contain data for main memory address 36.
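The masking step can be modeled in a few lines. This is a bit-level sketch under assumed names (`make_lookup`, `e_bits`), not a description of the actual gate-level circuit.

```python
def make_lookup(index, width_register, e_bits):
    """Combine the low `e_bits` bits of the index with the width
    register (the AND step), then rejoin the untouched upper bits."""
    low_mask = (1 << e_bits) - 1
    masked_low = index & low_mask & width_register  # models AND gate 44
    upper = index & ~low_mask                       # rest of the index
    return upper | masked_low
```

With `e_bits = 3`, a register value of `0b111` leaves the index unchanged, while `0b100` clears the two lowest index bits, so index `0b101` maps to lookup `0b100`.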

The example table below provides values of width register 42 when up to eight data blocks can be selected. As shown in row 1, when a line width of one data block is selected, all I bits of index portion 36B are used so that each data block can be accessed individually. Each time the line width is doubled, an additional index bit is masked, starting with the least significant index bit. As a result, lookup 37 accesses every second indexed data block in row 2 and every fourth indexed data block in row 3. When a line width of eight data blocks is selected (last row), the least significant three bits of index portion 36B are masked, so that every eighth data block is accessed. The masked index bit(s) (rows 2 through 4) are needed to determine which data block 38 in the group contains the desired data in its data 38D. As a result, the masked bits of index portion 36B can be regarded as part of block offset portion 36C (i.e., the size of block offset portion 36C increases by the number of masked bits).
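The register values described for the four rows can be regenerated with a short sketch. The values are derived from the description (one more low bit masked per doubling, E = 3 for up to eight data blocks) rather than copied from the patent's table, and the function name `width_register_value` is an assumption.

```python
def width_register_value(blocks_per_line, e_bits=3):
    """Mask for a line width given in data blocks (a power of two).

    One low-order bit is cleared for each doubling of the line width.
    """
    masked = blocks_per_line.bit_length() - 1  # log2 for a power of two
    if (blocks_per_line < 1
            or blocks_per_line & (blocks_per_line - 1)
            or masked > e_bits):
        raise ValueError("line width must be a power of two up to 2**e_bits")
    return ((1 << e_bits) - 1) & ~((1 << masked) - 1)


for blocks in (1, 2, 4, 8):
    value = width_register_value(blocks)
    print(f"{blocks} data block(s): width register = {value:03b}, "
          f"every {blocks} indexed block accessed")
```

Under these assumptions the four rows come out as 111, 110, 100, and 000 for line widths of one, two, four, and eight data blocks.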

Considering FIG. 5 in conjunction with FIG. 4, an exemplary portion of the cache 30 is shown after each of four tasks A through D has executed. The exemplary portion includes four ways 40 (way₀ through way₃), each having eight data blocks 38 (shown as cells in FIG. 5) indexed 0-7. The cache 30 is first shown after task A has executed. Because task A uses a line width of one data block 38 (i.e., each way is managed independently), the width register 42 is set to all ones, so that all I bits of the index portion 36B are used to generate the lookup 37 for locating a data block 38. As a result, task A can write data to any data block 38 located in any way 40.

Task B uses a line width of four data blocks (four ways). Thus, each time task B reads data from the cache 30, the data 38D of all data blocks at a given index (i.e., way₀ through way₃) is transferred. Further, because task B communicates with the cache 30 using a line width of four data blocks 38, the least significant two bits of the width register 42 are set to zero as described above. As a result, when generating the lookup 37 for task B, the address unit 34 sets the least significant two bits of the index portion 36B to zero, which allows the line width for task B to be varied without invalidating all of the data in the cache 30. Task B is therefore limited to writing data to the data blocks 38 at indexes 0 and 4 of the portion of the cache 30 shown.

Task C uses a line width of two data blocks 38 (two ways). As a result, the least significant bit of the width register 42 is set to zero, so that the address unit 34 sets the least significant bit of the index portion 36B to zero when generating the lookup 37 for task C. After task C executes, a portion of one of the entries of task B has been replaced (i.e., data block 0 of way₀ and way₁). As a result, the remainder of that task B entry is invalid and can no longer be accessed by task B.
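The index restrictions described for tasks B and C can be checked with a small Python sketch. It assumes the FIG. 5 geometry given in the text (eight indexes, so I = 3, and four ways, so E = 2); the function name `reachable_indexes` is hypothetical.

```python
def reachable_indexes(width_register: int, i_bits: int = 3, e_bits: int = 2) -> list:
    """Indexes that lookup 37 can produce for a given width register value.

    Hypothetical helper modeling the masking step for every possible index.
    """
    low = (1 << e_bits) - 1
    seen = set()
    for index in range(1 << i_bits):
        # Mask the low E bits with the width register, keep the upper bits.
        seen.add((index & ~low) | (index & low & width_register))
    return sorted(seen)


print(reachable_indexes(0b11))  # line width 1 (task A): every index
print(reachable_indexes(0b10))  # line width 2 (task C): every second index
print(reachable_indexes(0b00))  # line width 4 (task B): indexes 0 and 4
```

The last call reproduces the restriction stated for task B, which can only write at indexes 0 and 4.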

After task C executes, the data blocks 38 for tasks A, B, and C that have not been replaced for another task remain valid and can be accessed by each task using its own line width. Similarly, after task D, which uses a line width of one data block 38, executes, a significant number of the data blocks 38 used by tasks A and C remain valid. However, only one of the entries used by task B remains valid, because the data at index 0 for task B has been replaced. In addition, the task A data in the data blocks 38 at index 3 is invalidated for all ways once task D overwrites the task A data in way₀ at index 3. The cache 30 thus illustrates the trade-off between a small line width (more hits) and a large line width (fewer operations). As further shown, the cache 30 is able to store data for multiple tasks having different line widths at the same time, thereby improving the efficiency with which the cache 30 is used.

When the line width for an active task can be varied, data blocks 38 that contain data for the active task may be invalidated as described above. To ensure an efficient transition between line widths, the cache 30 can be run in a "store through" mode. In store through mode, the cache 30 writes any modified data to main memory without waiting for a task switch or for the data to be replaced. This allows data blocks 38 to be marked invalid without potentially requiring a large number of writes to main memory beforehand. Further, when the index portion 36B is masked based on the line width, a portion of the data in the ways 40 remains valid when the line width of a task is varied, because part of the task's data is stored in the same location for different line widths. For example, in FIG. 5, if the line width of task B is changed to two data blocks (two ways) after task B executes, the task B data in the data blocks 38 at indexes 0 and 4 of way₀ and way₁ will remain valid and usable. As a result, this data does not need to be marked invalid, which would otherwise require a load operation from main memory.

Returning to FIG. 4, instead of masking bits of the index portion 36B, the address unit 34 can provide the index portion 36B as the lookup 37 to locate a data block 38. This may be desirable when the width register 42 is varied based on processors that use addressable words of different sizes. For example, a processor with a one-byte word may use a line width of one data block 38 and share the cache 30 with a processor that has a two-byte word and uses a line width of two data blocks 38. In this configuration, the one-byte addressable processor has N possible data hits (one per way 40), while the two-byte addressable processor has N/2 possible data hits (one per pair of ways 40).

It is understood that the tag 38A can include a copy of the tag portion 36A, or any data capable of matching a specified main memory address 36 with the data stored in one or more data blocks 38. Further, it is understood that the main memory address 36 can include some or all of the portions 36A-C shown. For example, when each data block 38 is one word in size, the block offset portion 36C is zero bits (i.e., it is not included). In addition, the positions of the index portion 36B and the block offset portion 36C within the main memory address 36 can be reordered depending on how data is stored in and accessed from the cache 30.

FIG. 6 shows an alternative cache 130 according to another embodiment of the invention. The cache 130 allows access to all data blocks 38 regardless of the selected line width. As a result, a task/processor that uses a larger line width can access all of the data blocks 38 in the cache 130, rather than a limited number of data blocks 38 as discussed with reference to FIG. 4. In the cache 130, the index portion 36B is located within the main memory address 36 based on the selected line width. The width unit 32 includes a width register 42 that operates in the same manner as described above with reference to FIG. 4. The address unit 134 includes a logical AND gate 44 that generates a tag 139, which is compared with and stored as the tag 38A in a data block 38, and a shift circuit 146 that generates a lookup 137 for locating the data blocks 38.

Depending on the value of the width register 42, all I bits of the index portion 36B and the least significant E bits of the tag portion 36A are provided to the shift circuit 146. Based on the value of the width register 42, the provided bits are shifted right by zero or more bits. For example, the combined index portion 36B bits and least significant E bits of the tag portion 36A can be shifted right by one bit for each bit of the width register 42 that has a value of zero (i.e., is masked). Once shifted, the least significant I bits that remain are used as the lookup 137 for locating the data blocks 38. As a result, the lookup 137 always contains I unmasked bits, so every index for the data blocks 38 is accessible. Any bits that are shifted out to the right can subsequently be used as part of the block offset portion 36C, as described above.
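The behavior of the shift circuit 146 can be modeled with the following Python sketch. It is an assumption-laden illustration: the function name `shifted_lookup` and its parameters are hypothetical, fields are modeled as non-negative integers, and the shift amount is taken to be the number of zero bits in the width register, as the example in the text suggests.

```python
from typing import Tuple


def shifted_lookup(tag: int, index: int, width_register: int,
                   i_bits: int, e_bits: int) -> Tuple[int, int]:
    """Model of lookup 137 (hypothetical helper).

    Returns (lookup, shifted_out), where shifted_out holds the bits shifted
    off the right-hand end, usable as extra block offset bits.
    """
    e_mask = (1 << e_bits) - 1
    i_mask = (1 << i_bits) - 1
    # One right shift per zero (masked) bit in the width register.
    shift = bin(~width_register & e_mask).count("1")
    # Low E bits of the tag portion sit directly above the I index bits.
    combined = ((tag & e_mask) << i_bits) | (index & i_mask)
    shifted_out = combined & ((1 << shift) - 1)
    lookup = (combined >> shift) & i_mask  # keep the remaining low I bits
    return lookup, shifted_out


# I = 3, E = 2, tag portion 0b1011, index portion 0b101.
print(shifted_lookup(0b1011, 0b101, 0b11, 3, 2))  # no shift
print(shifted_lookup(0b1011, 0b101, 0b10, 3, 2))  # shift by one
print(shifted_lookup(0b1011, 0b101, 0b00, 3, 2))  # shift by two
```

Because low-order tag bits slide into the top of the lookup as index bits slide out, the lookup keeps its full I-bit range at every line width, which is the property the paragraph above attributes to the cache 130.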

The least significant E bits of the tag portion 36A are also provided to the AND gate 44, where they are masked using the width register 42. The masked bits are then combined with the remaining bits of the tag portion 36A to generate the tag 139. The tag 139 is compared with and/or copied to the tag 38A. This causes the least significant bits of the tag portion 36A that were shifted right and used in the lookup 137 to be zeroed before they are used as part of the tag. As a result, when a main memory address 36 is provided, no bit is used twice, i.e., once as part of the lookup 137 and again as part of the tag 139.
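A minimal Python sketch of the tag generation step follows. The function name `make_tag` is hypothetical and the tag portion is modeled as a non-negative integer; the sketch only illustrates the AND-and-recombine step described above.

```python
def make_tag(tag_portion: int, width_register: int, e_bits: int) -> int:
    """Model of tag 139 (hypothetical helper): mask the low E bits of the
    tag portion with the width register, keep the upper tag bits as they are.
    """
    e_mask = (1 << e_bits) - 1
    masked_low = tag_portion & e_mask & width_register  # AND gate 44
    return (tag_portion & ~e_mask) | masked_low


# E = 2, tag portion 0b1011.
print(bin(make_tag(0b1011, 0b11, 2)))  # nothing masked
print(bin(make_tag(0b1011, 0b10, 2)))  # lowest tag bit zeroed
print(bin(make_tag(0b1011, 0b00, 2)))  # lowest two tag bits zeroed
```

The zeroed bits are exactly the ones that the shift moved into the lookup, so each address bit contributes to either the lookup or the tag, never both.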

It is understood that various alternatives to the two embodiments are possible. For example, the address unit 134 and/or the width unit 32 shown in FIG. 6 can include software and/or circuitry for switching operation between the various embodiments discussed. For example, the width unit 32 can include a register for selecting a word size. The operation of the address unit 134 can be changed based on this selection and the value of the width register 42. The selected word size can also be associated with each data block 38 in a manner similar to the size 38E (i.e., stored with each data block 38). Combining the functionality in this way allows tasks executing on processors that use different addressable word sizes to select line widths of various sizes. Further, it is understood that masking bits of the main memory address 36 is not required. For example, the address lookup operation can simply ignore the unnecessary bits.

As used herein, computer program, software program, program, or software means any expression, in any language, code, or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after either or both of the following: (a) conversion to another language, code, or notation; and/or (b) reproduction in a different material form. The foregoing description of various aspects of the invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed, and many modifications and variations are clearly possible. Such modifications and variations that may be apparent to a person skilled in the art are intended to be included within the scope of the invention as defined by the accompanying claims.

The present invention is useful for accessing a memory, such as a cache, within a processing system.

Claims

1. A cache (30, 130), comprising:

means (38, 40) for storing data; and

means (32) for selectively changing a line width for the cache.

2. The cache of claim 1, further comprising:

means (12, 26) for communicating with the cache (30, 130) using the line width.

3. The cache of claim 1, further comprising:

means (18) for transferring data (38D) from a plurality of data blocks (38) in the cache (30, 130) during a data operation.

4. The cache of claim 1, wherein the line width is selected based on at least one of a processor (12, 26), a task (13, 28), and a performance evaluation.

5. The cache of claim 1, further comprising:

means for associating the line width with a data block (38).

6. The cache of claim 1, further comprising:

means (34, 134) for masking a portion of a main memory address (36) based on the selected line width.

7. A method of managing a memory that communicates using a line width, the method comprising:

selectively changing the line width; and

transferring data in a memory operation, wherein the amount of data transferred is based on the selected line width.

8. The method of claim 7, wherein the line width is selected based on at least one of a processor (12, 26), a task (13, 28), and a performance evaluation.

9. The method of claim 7, wherein the transferring step comprises:

providing a main memory address (36) to the memory (30, 130);

generating a lookup (37, 137) based on the main memory address and the line width; and

transferring data (38D) from at least one data block (38) located within the memory using the main memory address and the lookup.

10. The method of claim 7, further comprising:

associating the line width with the data.

11. The method of claim 10, wherein the memory (30, 130) comprises a first data block (38) associated with a first line width and a second data block associated with a second line width that is different from the first line width.

12. The method of claim 7, further comprising:

associating the line width with a task (13, 28);

selecting the line width when the task is loaded; and

storing the line width when the task is unloaded.

13. The method of claim 7, further comprising:

varying the line width for an active task.

14. A processing system (10), comprising:

a memory (14, 30, 130) comprising a plurality of data blocks (38);

a processor (12, 26) in communication with the memory; and

a width unit (22, 32) that stores a line width selected by the processor,

wherein the amount of data (38D) transferred from the data blocks (38) during a memory operation is based on the line width.

15. The processing system of claim 14, further comprising:

a main memory (20),

wherein the memory (14) comprises data copied from the main memory.

16. The processing system of claim 14, further comprising:

an address unit (24, 34, 134) for generating a lookup (37, 137) for locating at least one data block (38) in the memory (14), wherein the lookup is based on the line width and an index portion (36B) of a main memory address (36).

17. The processing system of claim 16, wherein the address unit (24, 34, 134) further generates a tag (139) for matching a data block (38) with the main memory address (36), the tag being based on the line width and the main memory address.

18. The processing system of claim 17, wherein the index portion (36B) is located within the main memory address (36) based on the line width, and wherein a tag portion (36A) is masked based on the line width to generate the tag (139).

19. The processing system of claim 14, further comprising:

means for varying the line width for an active task (13, 28).

20. The processing system of claim 14, further comprising:

means for associating the line width with at least one of a task (13, 28) and a processor (12, 26).