KR100343940B1

KR100343940B1 - Cache anti-aliasing during a write operation using translation lookahead buffer prediction bit

Info

Publication number: KR100343940B1
Application number: KR1020000009610A
Authority: KR
Inventors: 박성배
Original assignee: 삼성전자 주식회사
Priority date: 1999-02-26
Filing date: 2000-02-26
Publication date: 2002-07-20
Also published as: KR20000076741A

Abstract

In a data processor having a set associative cache, a virtual address is generated at the central processing unit. The virtual address comprises a virtual page number and a virtual offset. The virtual page number is translated into a physical page number. At least one of the translated bits carries information about the cache set to be written. The virtual page number is modified using the translated cache set bits, and the cache is then accessed using the modified virtual page number. In this way, the possibility of aliasing during cache write operations is eliminated while cache performance is improved and accurate segment prediction is maintained.

Description

Cache anti-aliasing during a write operation using translation lookahead buffer prediction bit

The present invention relates to data processors and, more particularly, to a method and system for determining the set value of a word to be written to a cache in a data processor having a set associative cache.

In typical data processors, the central processing unit (CPU) relies on memory to store and retrieve information. System memory is organized as a memory hierarchy comprising registers, caches, physical memory, and virtual memory. Registers, usually formed near the CPU within the processor, allow fast access and are generally the most active storage while the CPU is operating. Caches, located both on-chip and off-chip, provide an intermediate interface between the registers and the slower off-chip physical and virtual memory. A cache temporarily stores data written to memory and data read from memory, enabling efficient CPU access.

In operation, the CPU accesses data stored in off-chip memory, or writes data to off-chip memory, by referencing the fast on-chip cache. If the requested data is immediately available in the cache, a cache hit occurs. Otherwise, a cache miss occurs and the data must be fetched from off-chip memory, which takes much longer. A common goal of modern CPU architectures is to optimize the cache hit rate.

In most modern systems, off-chip memory is much larger than the on-chip cache. For this reason, a mapping technique is used to map between the cache and memory. Mapping techniques can generally be divided into direct cache mapping and virtual cache mapping. Both are described in Chapter 7, pages 454 to 527, of "Computer Organization and Design" by Patterson and Hennessy, published by Morgan Kaufmann in 1994.

The simplest cache structure assigns each memory location to a corresponding cache location, or cache block. Such a structure is known in the art as a direct-mapped cache. Each memory address comprises a cache index concatenated with a cache tag. The cache index identifies the cache block with which the memory location is associated. The cache tag uniquely identifies each memory location that has access to that cache block.
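The index and tag split described above can be sketched as follows. This is a minimal illustration only: the block size, the number of cache blocks, and the names `BLOCK_BITS`, `INDEX_BITS`, and `split_address` are assumptions made for the example and do not come from the patent.

```python
# Hypothetical geometry, chosen only for illustration (not from the patent).
BLOCK_BITS = 5   # assumed 32-byte cache blocks
INDEX_BITS = 8   # assumed 256 cache blocks


def split_address(addr):
    """Return (tag, index, block_offset) for a direct-mapped cache."""
    block_offset = addr & ((1 << BLOCK_BITS) - 1)
    index = (addr >> BLOCK_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (BLOCK_BITS + INDEX_BITS)
    return tag, index, block_offset


# Two addresses that differ only above the index bits compete for the
# same cache block and are told apart by their tags.
tag_a, index_a, _ = split_address(0x12345)
tag_b, index_b, _ = split_address(0x32345)
```

Here `index_a` and `index_b` are equal while `tag_a` and `tag_b` differ, which is why the tag must be stored alongside each cache block.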

A cache accessed using virtual addresses is called a virtual cache, and a cache accessed using physical addresses is called a physical cache. When virtual addresses are used, translation to a physical address is required. Modern CPU architectures commonly use virtual addressing.

Virtual memory is arranged as a number of pages, each referenced by a page index. An offset is used together with the page index to refer to a virtual memory location. When a virtual memory page is smaller than the cache and different virtual addresses map to the same physical address, two or more entries for the same physical address can exist in a given cache at the same time. This phenomenon is known in the art as cache aliasing. Modern systems must deal with the cache aliasing problem.
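A small sketch may make the aliasing condition concrete. It assumes a 14-bit page offset and a virtually indexed cache whose index uses one bit more than the offset; the page table contents and the names `cache_index` and `translate` are hypothetical and serve only the illustration.

```python
PAGE_OFFSET_BITS = 14   # assumed page offset width
CACHE_INDEX_BITS = 15   # assumed: the cache index uses one bit above the offset

# Hypothetical page table: two virtual pages share one physical page.
page_table = {0x10: 0x7, 0x11: 0x7}


def translate(vaddr):
    """Map a virtual address to a physical address via the page table."""
    vpn = vaddr >> PAGE_OFFSET_BITS
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)
    return (page_table[vpn] << PAGE_OFFSET_BITS) | offset


def cache_index(vaddr):
    """Index a virtually indexed cache with the low bits of the address."""
    return vaddr & ((1 << CACHE_INDEX_BITS) - 1)


va1 = (0x10 << PAGE_OFFSET_BITS) | 0x123
va2 = (0x11 << PAGE_OFFSET_BITS) | 0x123

same_physical = translate(va1) == translate(va2)
different_slots = cache_index(va1) != cache_index(va2)
```

Both `same_physical` and `different_slots` are true here: the two virtual addresses name the same physical location yet select different cache locations, so the cache could hold two copies of the same data.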

Conventional methods for eliminating cache aliasing include predicting the physical address for a virtual address using a pseudo-set associative cache; reading the tags at the same index and invalidating any tag found to be identical during a cache update; and allocating the same cache location when the virtual address and the physical address share overlapping lower bits. Each of these techniques has significant drawbacks, such as wasted cache capacity in cache allocation and reduced performance when cached contents are invalidated.

The technical problem to be solved by the present invention is to provide a technique that resolves the cache aliasing problem during cache write operations in a manner that overcomes the problems of the prior art.

FIG. 1 is a block diagram illustrating a preferred cache structure in accordance with the present invention.

FIG. 2 is a block diagram of a translation lookahead buffer (TLB) including a cache set prediction bit generator in accordance with the present invention.

FIG. 3 is a timing diagram illustrating the timing of a cache write and of the generation of the set prediction bit in accordance with the present invention.

To achieve the above technical object, one aspect of the present invention relates to a method for determining the set value of a word to be written to a cache in a data processor having a set associative cache. A virtual address is generated at the central processing unit. The virtual address comprises a virtual page number and a virtual offset. The virtual page number is translated into a physical page number. At least one of the translated bits carries information about the cache set to be written. The virtual page number is modified using the translated cache set bits. The cache is then accessed using the modified virtual page number.

In a preferred embodiment, after the cache is accessed using the modified virtual page number, the translated physical page number is compared with the modified virtual page number to determine whether a cache hit has occurred. During translation, a set prediction bit is generated that indicates the cache set to be written. The set prediction bit may be a single bit or a plurality of bits.

With the present invention, cache performance is improved without enlarging the cache. In addition, by using the set prediction bit during cache write operations, accurate segment prediction can be maintained.

For a full understanding of the present invention, its operational advantages, and the objects achieved by its practice, reference should be made to the accompanying drawings, which illustrate preferred embodiments of the invention, and to the description given with those drawings.

Hereinafter, the present invention is described in detail by explaining preferred embodiments with reference to the accompanying drawings. For convenience of description, signals that perform the same role are denoted by the same reference symbols throughout the drawings.

FIG. 1 is a block diagram of a cache structure representing a preferred embodiment of the present invention. The CPU 20 executes instructions and processes data according to the implemented technology. In general, the most efficient means of exchanging data and instructions between the CPU 20 and the memory 54 is through the on-chip cache 24. When the CPU 20 wants to read a word from the first cache 24A, it generates a virtual address VA, for example VA<41:0>. The virtual address VA includes a page index, for example VA<41:14>, and an offset, for example VA<13:0>. Here, <x:y> is a notation commonly used in the art to denote bit positions x through y of a data word or address word. The page index contains information about the physical address of the data to be accessed. The offset designates the block 25A of the first cache 24A to be accessed. During a cache read, the cache block 25A is accessed according to the offset, and the corresponding cache tag 26A is compared with the page index in the comparator 30 to determine whether a cache hit or a cache miss has occurred. On a cache hit, the corresponding data 32 from the first cache 24A is available to the CPU 20. On a cache miss, the data is fetched from the memory 54. In FIG. 1, the translation from the virtual memory address VA<41:14> to the physical memory address PA<31:14> is performed in the translation lookahead buffer (TLB) 22. This translation process is described in more detail below.
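The <x:y> bit-field notation and the split of VA<41:0> into a page index and an offset can be illustrated with a short sketch. The helper name `bits` and the sample address are assumptions made for the example; only the bit positions come from the description.

```python
def bits(value, hi, lo):
    """Return the field value<hi:lo>, with both bit positions inclusive."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


# A made-up 42-bit virtual address built from a known page index and offset.
va = (0x1234567 << 14) | 0x2ABC

page_index = bits(va, 41, 14)   # VA<41:14>
offset = bits(va, 13, 0)        # VA<13:0>
```

With this sample address, `page_index` recovers `0x1234567` and `offset` recovers `0x2ABC`.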

In modern CPU architectures it is often desirable to use a cache that is larger than the range covered by the virtual page offset. Enlarging the cache helps improve overall system performance, but it can also cause problems. For example, in the set associative cache structure shown in FIG. 1, the cache 24 may consist of two cache sets 24A, 24B, each containing 8K cache blocks 25A, 25B, for a total of 16K cache blocks. The offset portion of the address word, VA<13:0>, on the other hand, may span only 8K locations. When a cache read is performed, the valid data may be stored in either of the two cache sets 24A, 24B. Each set is accessed by the cache offset VA<13:0>, and the corresponding tags 26A, 26B are compared with the page index of the virtual memory. The data associated with the tag that matches the page index is the correct data. In this example, therefore, the cache read operation poses no problem.
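The read path just described, in which both sets are probed with the same offset and the tags decide which set holds the data, might be sketched as below. The class `TwoSetCache`, its methods, and the sample values are hypothetical; the sketch models behavior only, not the hardware of FIG. 1.

```python
class TwoSetCache:
    """Toy model of a cache with two sets probed by the same block number."""

    def __init__(self):
        # One dictionary per set: block number -> (tag, data).
        self.sets = [{}, {}]

    def fill(self, set_no, block, tag, data):
        self.sets[set_no][block] = (tag, data)

    def read(self, block, page_tag):
        """Probe both sets; return (set_no, data) on a hit, None on a miss."""
        for set_no, cache_set in enumerate(self.sets):
            entry = cache_set.get(block)
            if entry is not None and entry[0] == page_tag:
                return set_no, entry[1]
        return None


cache = TwoSetCache()
cache.fill(1, block=5, tag=0xAB, data="word")
hit = cache.read(5, 0xAB)    # found in the second set
miss = cache.read(5, 0xCD)   # no tag matches
```

A read works regardless of which set holds the data because every set is compared; a write, by contrast, must pick one set to update, which is where the ambiguity discussed next arises.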

During a cache write operation, however, the anomaly known as cache aliasing can occur. When the CPU 20 wants to write a word to a particular cache block 25A, it generates a virtual address VA that includes the cache offset position VA<13:0>. The cache 24, however, contains twice as many blocks as there are virtual address offset positions, for example two cache sets 24A, 24B, each indexed by the cache index <13:0>. For this reason, it cannot be determined whether the CPU intends to write to the first cache set 24A or to the second cache set 24B.

The present invention solves the cache aliasing problem by always predicting the proper cache set during a cache write operation, without degrading system performance.

Specifically, a TLB 22 that includes a set prediction bit generator 28 is used. TLBs are commonly used in CPU architectures to perform efficient address translation between virtual addresses (VA) and physical addresses (PA). As described on page 491 of the Patterson and Hennessy book cited above, a TLB operates as a cache that keeps track of recently used translation entries between virtual memory and physical memory. On each TLB reference, a TLB hit or TLB miss is determined in the same manner as for the data cache described above. On a miss, the translation entry is brought into the TLB from the page table (previously stored in memory) and can then be referenced. When the TLB reference hits, the virtual address VA is mapped to a physical address PA. The resulting physical address PA is compared with the cache tags 26A, 26B in the comparator 30. If a cache hit occurs, the associated data 32 is available to the CPU.
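The hit and miss behavior of a TLB described above can be sketched as a small cache of translations in front of a page table. The class name `SimpleTLB`, the capacity, and the least-recently-used replacement policy are assumptions for illustration; the description does not specify a replacement policy.

```python
from collections import OrderedDict


class SimpleTLB:
    """Toy TLB: caches recent virtual-to-physical page translations."""

    def __init__(self, page_table, capacity=4):
        self.page_table = page_table   # backing page table (in memory)
        self.capacity = capacity       # assumed number of TLB entries
        self.entries = OrderedDict()   # vpn -> ppn, oldest first

    def lookup(self, vpn):
        """Return (ppn, hit) for a virtual page number."""
        if vpn in self.entries:
            self.entries.move_to_end(vpn)      # mark as recently used
            return self.entries[vpn], True
        ppn = self.page_table[vpn]             # miss: walk the page table
        self.entries[vpn] = ppn
        if len(self.entries) > self.capacity:  # assumed LRU eviction
            self.entries.popitem(last=False)
        return ppn, False


tlb = SimpleTLB({0x10: 0x7, 0x11: 0x9})
first = tlb.lookup(0x10)    # miss, entry is loaded from the page table
second = tlb.lookup(0x10)   # hit, served from the TLB
```

The first lookup misses and loads the entry; the second returns the same physical page number as a hit.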

FIG. 2 is a detailed block diagram of the translation lookahead buffer (TLB) 22 of the present invention. In a preferred embodiment, the TLB 22 comprises a content addressable memory (CAM) 50 and a random access memory (RAM) 52. The virtual address VA is input to the CAM 50, and the output 51 of the CAM is used to reference the corresponding address in the RAM 52. The output of the RAM 52 is the physical address PA corresponding to the virtual address VA.

The TLB of the present invention further comprises a set prediction bit register 28 containing a number of addressable locations corresponding to the various locations of the CAM 50. When the cache is filled during a cache read, the set bit of the physical address PA, for example bit 14, is stored in the TLB 22 and recorded in the set prediction bit register 28. For example, if the data is stored in the first cache set 24A, the set prediction bit (APB) is set to '0'. If the data is stored in the second cache set 24B, the set prediction bit (APB) is set to '1'.

Referring again to FIG. 1, during a cache write the CPU generates a virtual address VA comprising the page index VA<41:14> and the offset VA<13:0>. The virtual address VA is immediately input to the TLB 22, and the corresponding set prediction bit (APB) is output from the set prediction bit register 28. The set prediction bit (APB) is combined with the virtual address VA, overwriting bit 14 before the virtual address VA is used to access the cache 24. This takes place while the virtual address VA is being translated into the physical address PA. For example, if two cache blocks have the same tag (tag0, tag1) during a cache write, the set prediction bit (APB) is used to select the correct block, and only the correct block is updated.
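The bit-14 overwrite described above can be sketched in a few lines. This is a behavioral illustration under stated assumptions, not the circuit of FIG. 2: the class `SetPredictingTLB`, its method names, and the sample page numbers are hypothetical, and the stored bit is taken to be PA bit 14, which with a 14-bit offset is the lowest bit of the physical page number.

```python
SET_BIT = 14  # bit position of the set bit named in the description


class SetPredictingTLB:
    """Toy TLB that keeps one set prediction bit (APB) per entry."""

    def __init__(self):
        self.ppn = {}   # virtual page number -> physical page number
        self.apb = {}   # virtual page number -> set prediction bit

    def record_fill(self, vpn, ppn):
        # With a 14-bit offset, PA bit 14 is the lowest bit of the
        # physical page number; it is stored when the cache is filled.
        self.ppn[vpn] = ppn
        self.apb[vpn] = ppn & 1

    def modified_va(self, va):
        """Overwrite VA bit 14 with the stored APB before cache access."""
        apb = self.apb[va >> SET_BIT]
        return (va & ~(1 << SET_BIT)) | (apb << SET_BIT)


tlb = SetPredictingTLB()
tlb.record_fill(0x10, 0x7)   # two virtual pages that alias
tlb.record_fill(0x11, 0x7)   # the same physical page

va1 = (0x10 << SET_BIT) | 0x123
va2 = (0x11 << SET_BIT) | 0x123
INDEX_MASK = (1 << (SET_BIT + 1)) - 1   # offset bits plus the set bit
idx1 = tlb.modified_va(va1) & INDEX_MASK
idx2 = tlb.modified_va(va2) & INDEX_MASK
```

Although `va1` and `va2` differ in bit 14, both writes resolve to the same set and offset once the prediction bit has replaced that bit, so only one cache location is updated.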

FIG. 3 is a timing diagram illustrating the timing of a cache write and of the generation of the set prediction bit (APB) in accordance with the present invention. The clock signal CLK represents a standard system clock. At time t₀, the virtual address VA becomes available as an input to the TLB 22. The virtual address VA is input to the CAM 50 on the rising edge of the clock signal CLK at time t₀. One clock cycle later, the corresponding physical address PA becomes available at the output of the TLB 22. The prior art relies on the physical address PA to resolve the aliasing problem in cache writes. The present invention, by contrast, uses the set prediction bit register 28, which immediately generates the set prediction bit (APB), available at time t₁. The tags 26A, 26B are modified immediately, and the appropriate cache set 24A, 24B is accessed with the modified page number. Unlike the prior art, which relies on the physical address that becomes available during the next clock cycle, this modification takes place within the same clock cycle in which the virtual address VA becomes available to the cache. In this way, the appropriate cache set is always accessed, and cache aliasing is thus eliminated.

Although the present invention has been shown and described with reference to preferred embodiments, those of ordinary skill in the art will understand that various modifications and other equivalent embodiments are possible. The true scope of technical protection of the present invention should therefore be determined by the technical spirit of the appended claims.

For example, a sophisticated architecture may have multiple cache configurations, such as separate caches for the data bus and the instruction bus. The present invention applies equally well to such systems.

In another embodiment, when more than two cache sets are used, the set prediction bit (APB) may consist of a plurality of bits.

With the present invention, the possibility of aliasing during cache write operations is eliminated, while cache performance is improved and accurate segment prediction is maintained.

Claims

In a data processor having a set associative cache, a method of determining a set value of a word to be written to the cache, the method comprising:

A) generating, at a central processing unit, a virtual address to be written to the cache, the virtual address comprising a virtual page number and a virtual offset;

B) converting the virtual page number into a physical page number, wherein at least one of the converted bits includes information regarding the cache set to be written;

C) modifying the virtual page number using the converted cache set bits before cache access; and

D) accessing the cache using the modified virtual page number.

The method of claim 1, further comprising,

after step D), comparing the physical page number with the modified virtual page number to determine whether a cache hit has occurred.

The method of claim 1, wherein step B) comprises

generating a set prediction bit that is an indication of the cache set.

The method of claim 3, wherein the set prediction bit

comprises a plurality of bits.

The method of claim 1, wherein step D)

is performed before the converted physical page number becomes available.

In a data processor having a set associative cache, a system for determining a set value of a word to be written to the cache, the system comprising:

a cache comprising a plurality of addressable blocks grouped into a plurality of sets;

a central processing unit for generating a virtual address to be written to the cache, the virtual address comprising a virtual page number and a virtual offset;

a converter for converting the virtual page number into a physical page number, wherein at least one of the converted bits includes information about the cache set to be written; and

circuitry for modifying the virtual page number using the converted cache set bits prior to cache access.

The system of claim 6, further comprising

a comparator for comparing the physical page number with the modified virtual page number to determine whether a cache hit has occurred.

The system of claim 6, wherein the converter comprises

means for generating a set prediction bit that is an indication of the cache set.

The system of claim 8, wherein the set prediction bit

comprises a plurality of bits.

The system of claim 6,

wherein the modification of the virtual page number is performed before the converted physical page number becomes available.