KR20080006136A

KR20080006136A - Cache memory apparatus for 3-dimensional graphic computation, and method of processing 3-dimensional graphic computation

Info

Publication number: KR20080006136A
Application number: KR1020060064908A
Authority: KR
Inventors: 김재우
Original assignee: 엠텍비젼 주식회사
Priority date: 2006-07-11
Filing date: 2006-07-11
Publication date: 2008-01-16
Also published as: KR100840011B1

Abstract

A cache memory device for 3D graphics computation and a 3D graphics computation processing method are provided. By referring to a vertex index array, which allows the processing order of the vertex data used in 3D graphics computation to be predicted more accurately, the cache hit rate is improved even when a cache memory of limited size is used. A cache memory array (411) stores vertex data used in 3D graphics computation. A tag memory (412) stores first vertex indices corresponding to the vertex data. An index buffer (421) stores second vertex indices associated with the graphics computation according to the processing order of the graphics computation. A cache memory control unit (413) compares a second vertex index with the first vertex indices and determines whether the vertex data is a cache hit.

Description

CACHE MEMORY APPARATUS FOR 3-DIMENSIONAL GRAPHIC COMPUTATION, AND METHOD OF PROCESSING 3-DIMENSIONAL GRAPHIC COMPUTATION

FIG. 1 is a block diagram illustrating the data flow of a conventional data processing system including a cache memory.

FIG. 2 is a flowchart illustrating, step by step, a method of performing a cache memory read operation in the system of FIG. 1.

FIG. 3 is a block diagram illustrating the data flow between the main memory, the cache memory, and the execution unit of a 3D graphics accelerator according to the present invention.

FIG. 4 is a block diagram showing in detail the internal configuration of the cache memory according to the present invention and its peripheral circuits.

FIG. 5 is a flowchart illustrating, step by step, a 3D graphics operation processing method using the cache memory according to the present invention.

<Explanation of reference numerals for the main parts of the drawings>

310: cache memory
320: cache memory controller
330, 430: processing order buffer
340: execution unit
350: main memory
360: main memory controller
370: memory bus
411: cache memory array
412: tag memory
413: cache memory control unit
421: index buffer
431: processing order buffer

The present invention relates to a cache memory applied to a three-dimensional (3D) graphics accelerator, and more particularly to the configuration of a vertex data cache memory that stores vertex data used in 3D graphics operations and uses vertex indices to determine whether a cache hit occurs.

3D graphics technology refers to technology that uses a digital computer and 3D software to visualize the shape, color, motion, and other properties of objects located in three-dimensional space. It has advanced greatly in recent years on the strength of the growing 3D game market, and it is already a core foundational technology widely used in fields such as education, medicine, the military, and the arts.

3D graphics technology is widely used because it presents objects realistically and can therefore convey information to the user intuitively. In a typical 3D graphics image, an object is composed of a number of lines (edges) or polygons. A 3D graphics processing apparatus, or 3D graphics accelerator, applies 3D graphics operations to the vertices that make up the polygons so that an object modeled in 3D space can be visualized for the user.

Because 3D graphics operations involve complex calculations that refer to a large number of parameters, an important task in designing a 3D graphics accelerator is to optimize performance by minimizing redundant computation and redundant data access.

In particular, a 3D graphics processing system often uses a main memory composed of synchronous dynamic random access memory (SDRAM) to store large volumes of image data, in which case a memory bus is used so that main memory resources can be shared between the 3D graphics accelerator and other devices. If the main memory is accessed directly every time the vertex data used as parameters in a 3D graphics operation is referenced, the overhead of using the memory bus lowers the operating efficiency of the graphics accelerator. To prevent this performance degradation, a cache memory is placed between the main memory and the graphics accelerator to reduce the main memory access overhead.

FIG. 1 is a block diagram illustrating the data flow of a conventional, general data processing system that includes a cache memory. Referring to FIG. 1, an execution unit 130, such as an arithmetic unit that performs 3D graphics operations, first consults the cache memory 110 through the cache memory controller 120 before reading data from the main memory 140.

When the data that the execution unit 130 wants to use is stored in the cache memory 110, that is, on a cache hit, the cache memory controller 120 reads the data from the cache memory 110 and delivers it to the execution unit 130. When the data is not stored in the cache memory 110, that is, on a cache miss, the data is first copied from the main memory 140 into the cache memory 110 through the main memory controller 150 connected to the memory bus 160, and is then delivered to the execution unit 130.

FIG. 2 is a flowchart illustrating, step by step, a method of performing a cache memory read operation in the system of FIG. 1. As shown in FIG. 2, when a cache read request or command is delivered from the execution unit 130 in step S210, the cache memory 110 is consulted in step S220 to check whether the data associated with the cache read request is a cache hit. On a cache hit, the data is read from the cache memory 110 and delivered to the execution unit 130 in step S230. On a cache miss, the main memory 140 is accessed in step S240 in order to copy the data. At this time, as described in step S250, not only the required data but also a certain amount of data at adjacent addresses is copied from the main memory.
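The read path of FIG. 2 can be modeled in a few lines. The following is only an illustrative sketch: the class name, the `LINE_SIZE` value, and the absence of any eviction are assumptions made for the example, not details given in the description.

```python
# Illustrative model of the conventional read path of FIG. 2 (S210 to S250).
# LINE_SIZE and all names here are assumptions, not taken from the patent.
LINE_SIZE = 4  # words copied from main memory per miss (hypothetical)


class ConventionalCache:
    def __init__(self, main_memory):
        self.main_memory = main_memory  # list of words, indexed by address
        self.lines = {}                 # line base address -> list of words
        self.hits = 0
        self.misses = 0

    def read(self, address):
        base = address - address % LINE_SIZE
        if base in self.lines:          # S220: hit check
            self.hits += 1              # S230: serve from the cache
        else:
            self.misses += 1            # S240: access main memory
            # S250: copy the requested word together with its neighbours
            self.lines[base] = self.main_memory[base:base + LINE_SIZE]
        return self.lines[base][address - base]


memory = list(range(100, 116))
cache = ConventionalCache(memory)
values = [cache.read(a) for a in (0, 1, 2, 3, 8)]
```

In this model the reads of addresses 1 to 3 hit only because the miss on address 0 pulled in the neighbouring words, which is the locality effect discussed next.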

The reason a certain amount of additional data is copied along with the required data lies in the locality of data. That is, given the execution logic of typical programs and the way data is stored, data stored near the memory location that the execution unit 130 is currently accessing is likely to be used in the near future.

However, applying such a general cache memory to a 3D graphics accelerator can cause the following problems.

First, if, as in a conventional general cache memory, the cache memory 110 is checked only at the moment the execution unit 130 needs the data and the main memory 140 is accessed when the data is absent, the execution unit 130 must stall for a long time while the cache memory controller 120 copies the data from the main memory 140 to the cache memory 110. This problem arises because a conventional general cache memory is applied without reflecting the particular characteristics of 3D graphics operations.

Second, for the conventional cache memory control scheme, which copies a certain amount of data in bulk on the basis of locality, to be effective, the cache memory must be at least several times larger than the amount of data copied at one time. Consequently, when the cache memory is designed to be small, or when the size of the available cache memory is limited by necessity, the conventional general cache memory structure cannot deliver satisfactory performance.

Accordingly, to solve the above problems, the present invention proposes a cache memory structure designed to operate more efficiently by reflecting the particular characteristics of 3D graphics operations.

The present invention has been devised to improve upon the prior art described above, and an object thereof is to provide a new cache memory configuration that determines whether a cache hit occurs by referring to an index array of vertex data.

Another object of the present invention is to improve the cache hit rate, even when a cache memory of limited size is used, by referring to a vertex index array that allows the processing order of the vertex data used in 3D graphics operations to be predicted more accurately.

Another object of the present invention is, through the above configuration, to greatly reduce the size of the cache memory array compared with that used in a conventional general cache memory, thereby saving circuit area and reducing process cost and the process defect rate.

Another object of the present invention is to improve the overall operating efficiency of the 3D graphics accelerator by copying the vertex data required for the next operation from the main memory while a 3D graphics operation is being performed, thereby reducing the idle waiting time of the 3D graphics operation unit.

Another object of the present invention is to allow a small cache memory to be used more efficiently by copying only the required vertex data from the main memory, rather than copying the required vertex data together with a certain amount of adjacent data in bulk.

Another object of the present invention is to minimize the performance degradation caused by main memory access overhead by copying several vertex indices at once when the vertex index array stored in the main memory is copied into the index buffer.

Another object of the present invention is to increase the operating throughput of the cache memory by storing, in a processing order buffer, the addresses in the cache memory array at which vertex data is stored, in the order in which the vertex data is to be processed, so that they can be referenced externally.

To achieve the above objects and to solve the problems of the prior art described above, a cache memory device according to the present invention comprises: a cache memory array that stores vertex data used in 3D graphics operations; a tag memory that stores first vertex indices corresponding to the respective vertex data; an index buffer that stores second vertex indices associated with the graphics operation according to the processing order of the graphics operation; and a cache memory control unit that compares a second vertex index with the first vertex indices to determine whether the vertex data is a cache hit.

Further, a method of storing vertex data in a cache memory according to the present invention comprises: a first step of storing previously referenced vertex data, and first vertex indices corresponding to the vertex data, in the cache memory; a second step of reading second vertex indices associated with the graphics operation from the main memory and storing them in an index buffer according to the processing order of the graphics operation; a third step of comparing a second vertex index with the first vertex indices to determine whether a cache hit occurs; a fourth step of, when the determination results in a cache miss, reading the vertex data corresponding to the second vertex index from the main memory and additionally storing it in the cache memory; and a fifth step of storing, in a processing order buffer, the address in the cache memory at which the vertex data corresponding to the first vertex index is stored.

Further, a 3D graphics operation processing method according to the present invention comprises: maintaining a processing order buffer that stores address values at which vertex data is stored in a cache memory; reading an address value from the processing order buffer and accessing the location in the cache memory corresponding to that address value to read the vertex data; and performing a 3D graphics operation using the read vertex data as a parameter.

Hereinafter, the configuration of a cache memory device according to the present invention, and a method of efficiently processing graphics operations using the cache memory device, will be described in detail with reference to the accompanying drawings.

Referring to FIG. 3, the cache memory controller 320, like a conventional general cache memory circuit, is connected to the memory bus 370 and accesses the main memory 350 through the main memory controller 360. The main memory 350 stores the vertex data of each vertex required for 3D graphics operations. The main memory 350 also stores a vertex index array in which the graphics operation processing order of the vertices is recorded. The vertex index array stored in the main memory 350 is input by the user; its use is described in more detail below. The main memory 350 is composed of SDRAM or the like, and a memory access to read data takes a considerable number of cycles.

As noted above, if the execution unit 340, typified by a T&L (Transform & Lighting) engine that performs coordinate transformation and lighting operations, accesses the main memory 350 directly each time it needs vertex data for a graphics operation, it cannot perform graphics operations while the vertex data is being read from the main memory 350, and graphics processing efficiency drops. For this reason, the cache memory controller 320 copies, from among the vertex data stored in the main memory 350, the vertex data that the execution unit 340 will use in the near future into the cache memory 310, which can be accessed relatively quickly, thereby reducing the overhead of main memory 350 accesses by the execution unit 340.

Unlike a conventional general cache memory circuit, however, in the cache memory device according to the present invention the execution unit 340 does not send a cache read request to the cache memory controller 320 when it needs vertex data. Instead, it refers to the processing order buffer 330 managed by the cache memory controller 320 and directly fetches the vertex data stored at a specific address of the cache memory 310.

That is, the cache memory controller 320 does not determine whether a cache hit occurs when a cache read request arrives from the execution unit 340. Instead, before the execution unit 340 refers to the cache memory 310, it stores in the cache memory 310 the vertex data that the execution unit 340 will use next. Because the next vertex data is stored in the cache memory 310 before the execution unit 340 refers to it, the cache memory device according to the present invention can eliminate or greatly reduce the idle waiting time that the execution unit 340 would otherwise spend while vertex data is copied from the main memory 350 on a cache miss.

The cache memory controller 320 can predict and store in advance the vertex data that the execution unit 340 will use next because of the particular characteristics of 3D graphics operations. As described above, an object in a 3D graphics image is composed of primitives such as lines (edges) or polygons, and each primitive is in turn composed of a plurality of vertices. For example, a line consists of two vertices and a triangular polygon consists of three vertices. A 3D graphics operation can therefore be regarded as a set of operations on a plurality of vertices.

In a 3D graphics operation, the processing order of the data related to each vertex, that is, the vertex data, can be determined from the vertex index array input by the user. The vertex index array stores the indices of the vertices to be processed in processing order.

OpenGL (Open Graphics Library), a widely used open graphics library standard, supports the API functions DrawArrays() and DrawElements() for visualizing 3D objects. Of these, DrawElements() performs vertex operations in the order specified by the user. The processing order is often chosen so that the vertices associated with consecutive primitives are processed in sequence, and consecutive primitives share vertices.

For example, adjacent triangular polygons share one or two vertices, and adjacent lines share one vertex. Vertex data currently being used in a 3D graphics operation is therefore shared with adjacent primitives and is likely to be used again in the near future.
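How often an index array revisits a recently used vertex can be measured directly. The sketch below is an illustration only: the index array, the `window` parameter, and the function name are hypothetical and merely mimic four adjacent triangles that each share two vertices with their neighbour.

```python
# Hypothetical index array for four adjacent triangles drawn as separate
# triangles; each triangle shares two vertices with the previous one.
indices = [0, 1, 2,  1, 2, 3,  2, 3, 4,  3, 4, 5]


def count_reuse(index_array, window):
    """Count indices that already appeared among the previous `window` entries."""
    reused = 0
    for pos, idx in enumerate(index_array):
        recent = index_array[max(0, pos - window):pos]
        if idx in recent:
            reused += 1
    return reused


reused = count_reuse(indices, window=4)
unique_vertices = len(set(indices))
```

For this array, half of the twelve references fall on a vertex seen within the last four references, which suggests why even a small cache keyed on vertex indices can achieve a useful hit rate.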

Accordingly, while the execution unit 340 is performing a 3D graphics operation, the cache memory controller 320 reads the vertex index array stored in the main memory 350 and determines whether the vertex data corresponding to each stored vertex index is held in the cache memory 310. If it is, the address in the cache memory at which the vertex data is stored is written to the next entry of the processing order buffer 330.

If, however, the vertex data corresponding to the vertex index is not stored in the cache memory 310, the cache memory controller 320 copies the vertex data corresponding to that vertex index from the main memory 350 into the cache memory 310 and adds the address at which it was stored to the processing order buffer 330.

At this time, the cache memory controller 320 may remove one of the previously stored vertex data items from the cache memory 310 according to a predetermined replacement algorithm. The vertex data removed is generally the data with the lowest probability of a hit. Replacement algorithms for selecting the vertex data to remove include the Least Recently Used (LRU) algorithm, the First-In-First-Out (FIFO) algorithm, the Least Frequently Used (LFU) algorithm, and the random selection algorithm.
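The look-ahead performed by the cache memory controller 320, including replacement on a miss, might be sketched as follows. This is a minimal model under stated assumptions: the four-entry cache, the choice of FIFO among the listed replacement algorithms, and every name in the code are illustrative and are not specified by the description.

```python
from collections import deque

# Sketch of the look-ahead performed by the cache memory controller (320).
# CACHE_ENTRIES, the FIFO policy, and all names are illustrative assumptions.
CACHE_ENTRIES = 4


def prefetch(index_array, main_memory):
    tags = [None] * CACHE_ENTRIES       # vertex index held by each entry
    data = [None] * CACHE_ENTRIES       # vertex data held by each entry
    fifo = deque()                      # entries in the order they were filled
    order_buffer = []                   # cache addresses in processing order
    misses = 0
    for vertex_index in index_array:
        if vertex_index in tags:        # hit: record where the data already is
            entry = tags.index(vertex_index)
        else:                           # miss: copy only this vertex's data
            misses += 1
            if None in tags:
                entry = tags.index(None)
            else:
                entry = fifo.popleft()  # FIFO victim
            tags[entry] = vertex_index
            data[entry] = main_memory[vertex_index]
            fifo.append(entry)
        order_buffer.append(entry)
    return order_buffer, misses


vertices = {i: (float(i), 0.0, 0.0) for i in range(6)}
order, misses = prefetch([0, 1, 2, 1, 2, 3, 2, 3, 4, 3, 4, 5], vertices)
```

The sketch runs the whole look-ahead before anything is consumed and therefore ignores synchronization with the execution unit; a real design would presumably have to avoid replacing an entry that an unread processing order buffer entry still points to.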

When a cache memory device operating in this way is used, the execution unit 340 does not read the required vertex data from the cache memory through the cache memory controller 320. Instead, it refers to the processing order buffer 330, accesses the location in the cache memory 310 corresponding to each address value stored sequentially in the processing order buffer, and reads the vertex data directly.
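From the execution unit's side, the access pattern reduces to walking the processing order buffer. The following sketch assumes a three-entry cache array, a hypothetical `translate_x` operation standing in for the graphics operation, and illustrative names throughout.

```python
# Illustrative consumer side; names and the example operation are assumptions.
cache_array = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
order_buffer = [0, 1, 2, 1, 2, 0]       # cache addresses, in processing order


def run_execution_unit(order_buffer, cache_array, operation):
    results = []
    for address in order_buffer:        # read the next address in order
        vertex = cache_array[address]   # direct read, no hit check
        results.append(operation(vertex))
    return results


def translate_x(vertex):
    """Stand-in for a graphics operation: shift the vertex by 1 along x."""
    x, y, z = vertex
    return (x + 1.0, y, z)


transformed = run_execution_unit(order_buffer, cache_array, translate_x)
```

Note that the loop contains no tag comparison at all: the hit or miss decision has already been made by the controller when it filled the buffer.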

FIG. 4 is a block diagram showing in detail the internal configuration of the cache memory operating in this way and its peripheral circuits. Specifically, FIG. 4 presents a more concrete configuration of the cache memory device comprising the cache memory 310, the cache memory controller 320, and the processing order buffer 330 shown in FIG. 3. For convenience in explaining the operation of the cache memory device according to the present invention, FIG. 3 is a simplified view in which some of the components of the concrete configuration of FIG. 4 are omitted or grouped together. Each component shown in FIG. 4 therefore corresponds to the cache memory 310, the cache memory controller 320, or the processing order buffer 330 of FIG. 3 one-to-one, to a part of one of them, or to a combination of them.

Referring to FIG. 4, the cache memory device according to the present invention includes a vertex data cache memory block 410, an index buffer block 420, and a processing order buffer block 430. FIG. 4 illustrates the case in which the cache memory device stores vertex data for 64 vertices.

The index buffer block 420 may in turn include an index buffer 421 that stores the vertex index array and an index buffer control unit 422 that controls data read/write operations of the index buffer 421. The vertex index array stored in the index buffer 421 is copied from the main memory 350 by the cache memory control unit 413. To minimize the number of accesses to the main memory 350, the cache memory control unit 413 may fetch the vertex indices for several vertices from the main memory 350 at one time and store them in the index buffer 421.
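The benefit of fetching several indices per access can be illustrated by counting main memory accesses for different burst sizes. The burst size of 8, the 20-entry index array, and the function name below are assumptions chosen for the example.

```python
def count_accesses(index_array, burst):
    """Copy index_array into an index buffer `burst` entries at a time."""
    if burst < 1:
        raise ValueError("burst must be at least 1")
    index_buffer = []
    accesses = 0
    position = 0
    while position < len(index_array):
        chunk = index_array[position:position + burst]  # one main memory access
        index_buffer.extend(chunk)
        accesses += 1
        position += len(chunk)
    return index_buffer, accesses


index_array = list(range(20))
buffer_single, accesses_single = count_accesses(index_array, burst=1)
buffer_burst, accesses_burst = count_accesses(index_array, burst=8)
```

Both calls produce the same index buffer contents; only the number of main memory accesses differs.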

Meanwhile, the cache memory array 411 included in the vertex data cache memory block 410 stores the vertex data used as parameters in 3D graphics operations. As illustrated, the cache memory array 411 may include a plurality of memory blocks, each storing one of the various types of vertex data.

As an example, the memory blocks of the cache memory array 411, organized by vertex data type, may include: a vertex coordinate block that stores the coordinates of a vertex; a normal coordinate block that stores the coordinates of the vertex's normal vector; a vertex color block that stores data related to the vertex's color; a texture 0 coordinate block that stores coordinates related to the 0th texture; a texture 1 coordinate block that stores coordinates related to the 1st texture; a point size block that stores the vertex's point size data; and a matrix palette weight block and a matrix palette index block that respectively store the weights and indices of the matrix palette related to the vertex's motion.
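One way to picture this organization is as one array per attribute type, all addressed by the same entry number. The sketch below follows the block names in the text and the 64-entry example of FIG. 4; the dictionary layout, helper functions, and sample values are assumptions.

```python
# Illustrative layout: one block per attribute type, each with the same number
# of entries. Block names follow the text; the layout and values are assumptions.
ENTRIES = 64

cache_memory_array = {
    "vertex_coordinate": [None] * ENTRIES,
    "normal_coordinate": [None] * ENTRIES,
    "vertex_color": [None] * ENTRIES,
    "texture0_coordinate": [None] * ENTRIES,
    "texture1_coordinate": [None] * ENTRIES,
    "point_size": [None] * ENTRIES,
    "matrix_palette_weight": [None] * ENTRIES,
    "matrix_palette_index": [None] * ENTRIES,
}


def store_vertex(entry, attributes):
    """Write each supplied attribute into the same entry of its own block."""
    for name, value in attributes.items():
        cache_memory_array[name][entry] = value


def load_vertex(entry):
    """Gather one vertex by reading the same entry from every block."""
    return {name: block[entry] for name, block in cache_memory_array.items()}


store_vertex(5, {"vertex_coordinate": (1.0, 2.0, 3.0),
                 "vertex_color": (255, 0, 0, 255)})
vertex = load_vertex(5)
```

With this layout a single entry number, such as the one recorded in the processing order buffer, is enough to locate every attribute of a vertex.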

The vertex data cache memory block 410 also includes a tag memory 412 that stores tag information for determining whether a cache hit occurs. The tag information stored in the tag memory 412 is the index of each vertex. Each entry of the tag memory 412 corresponds to an entry of the cache memory array 411. That is, the vertex index corresponding to the vertex data stored in the n-th of the 64 entries of the cache memory array 411 is stored in the n-th entry of the tag memory 412.

To determine whether a cache hit occurs, the cache memory control unit 413 reads one vertex index stored in the index buffer and compares it with the vertex indices stored in the tag memory 412. If an entry holding a matching vertex index is found in the tag memory 412, the cache memory control unit 413 stores the entry number or address of the corresponding entry of the cache memory array 411 in the processing order buffer 431.

If, however, no tag memory entry storing a matching vertex index is found, the cache memory controller 413 copies the vertex data corresponding to that vertex index from the main memory 350 into the cache memory array 411. The address value in the cache memory array 411 at which the data was stored is then stored in the processing order buffer 431.

Finally, the processing order buffer block 430 includes a processing order buffer 431, which stores the address values in the cache memory array 411 at which vertex data is stored, and a processing order buffer controller 432, which controls data read/write operations on the processing order buffer. The position at which each address value is stored in the processing order buffer 431 corresponds to the position at which each vertex index is stored in the index buffer 421. That is, the address value stored in the i-th of the 64 entries of the processing order buffer 431 is the address at which the vertex data corresponding to the vertex index in the i-th entry of the index buffer 421 is stored.

The operation of the cache memory device configured as in FIG. 4 is explained more simply through an example. Suppose that, when the cache memory controller 413 processes the i-th vertex stored in the vertex index array, the vertex index and the vertex data of the i-th vertex are stored in the n-th entries of the tag memory 412 and the cache memory array 411, respectively.

The cache memory controller 413 searches the entries of the tag memory 412 for an index value matching the i-th vertex index to determine whether a cache hit occurs. If a matching index value is found in the n-th entry, the cache memory controller 413 determines a cache hit and stores 'n', the address value of the n-th entry of the cache memory array 411, at the i-th position of the processing order buffer.

If the vertex data of the i-th vertex is not stored in the cache memory array 411, the corresponding vertex index is not stored in the tag memory 412 either. The cache memory controller 413 therefore finds no index value matching the i-th vertex index among the entries of the tag memory 412 and determines a cache miss. It then copies the vertex data corresponding to the i-th vertex index from the main memory 350 and stores it in the m-th entry of the cache memory array 411. At this time, the i-th vertex index is stored in the m-th entry of the tag memory, and 'm', the address value of the m-th entry of the cache memory array 411, is stored at the i-th position of the processing order buffer 431.

If all 64 entries of the cache memory array 411 are already in use storing vertex data, storing the additional vertex data means that the m-th entry is replaced with the new vertex data and vertex index. The replacement algorithm that selects the m-th entry to be replaced was described earlier.
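The hit and miss cases above can be sketched in software as follows. This is a minimal model, not the described circuit: the class `VertexCache`, its method names, and the dictionary standing in for the main memory 350 are all hypothetical, and the round-robin victim selection is a placeholder assumption, since the replacement algorithm the text refers to is described in an earlier part of the document.

```python
class VertexCache:
    """Toy model of tag memory 412, cache memory array 411 and processing order buffer 431."""

    def __init__(self, main_memory, num_entries=64):
        self.main_memory = main_memory      # stand-in for main memory 350
        self.tags = [None] * num_entries    # tag memory: one vertex index per entry
        self.data = [None] * num_entries    # cache memory array: vertex data per entry
        self.order = []                     # processing order buffer: entry numbers
        self.next_victim = 0                # ASSUMPTION: round-robin replacement
        self.hits = 0
        self.misses = 0

    def resolve(self, vertex_index):
        """Handle one vertex index and record its cache entry number."""
        if vertex_index in self.tags:
            # Cache hit: the matching tag entry gives the cache array entry.
            entry = self.tags.index(vertex_index)
            self.hits += 1
        else:
            # Cache miss: pick entry m, then copy only this vertex's data
            # from main memory and update the tag at the same entry.
            entry = self.next_victim
            self.next_victim = (self.next_victim + 1) % len(self.tags)
            self.tags[entry] = vertex_index
            self.data[entry] = self.main_memory[vertex_index]
            self.misses += 1
        # The i-th resolved index produces the i-th processing order entry.
        self.order.append(entry)
        return entry
```

With a two-entry cache and the index sequence 5, 7, 5, 9, 7, this model records the entry numbers 0, 1, 0, 0, 1 with two hits and three misses. Note that the sketch does not guard against replacing an entry that has not yet been read by the execution unit; a real design would need to handle that case.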

According to one embodiment, the operation of fetching the vertex index array from the main memory 350, or the operation of determining a hit for the i-th vertex index and updating the processing order buffer 431 accordingly, may be performed at the same time as the execution unit 340 performs a three-dimensional graphics operation using vertex data it has read from the cache memory device.

Referring to FIG. 5, in step S510 the vertex index array is read from the main memory 350 and stored in the index buffer 421. The vertex index array holds the vertex indices of the vertices to be processed, stored in the processing order of the graphics operation.
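Step S510 can be pictured as filling the index buffer one batch at a time. In the sketch below, the function name `index_batches` and the default batch size of 64 are assumptions (the size mirrors the 64-entry processing order buffer of the embodiment); the source only states that the indices are stored in processing order.

```python
def index_batches(vertex_index_array, buffer_size=64):
    """Yield successive slices of the vertex index array, preserving processing order."""
    if buffer_size <= 0:
        raise ValueError("buffer_size must be positive")
    for start in range(0, len(vertex_index_array), buffer_size):
        # Each slice models one bulk copy from main memory into the index buffer.
        yield vertex_index_array[start:start + buffer_size]
```

Copying a whole batch per main memory access, rather than one index at a time, is what lets the access overhead be shared across many indices.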

In step S520, one of the vertex indices stored in the index buffer is compared with the vertex index values stored in the entries of the tag memory 412 to determine whether a cache hit occurs.

If a matching index value is found in the tag memory 412, a cache hit is determined, and in step S530 the address value of the cache memory array 411 corresponding to the position found in the tag memory 412 is stored in the processing order buffer 431.

If, on the other hand, no matching index value is found in the tag memory 412, a cache miss is determined. The main memory 350 is then accessed through the memory bus (step S540), and the vertex data corresponding to the vertex index is copied from the main memory and stored in the cache memory array 411 (step S550). Next, the address value of the cache memory array 411 at which the vertex data was stored is written to the processing order buffer 431 so that the cache memory array 411 can be referenced directly from outside.

Step S560 reads vertex data from the cache memory array 411 with reference to the processing order buffer 431. This step is performed by the execution unit 340, which is connected externally to the cache memory device. In this step, the vertex data stored at the location of the cache memory array 411 given by the address value in the processing order buffer 431 is read out directly.

The vertex data read in step S560 is used as a parameter of the three-dimensional graphics operation in step S570. According to one embodiment, at least one of steps S510 to S550 is performed at the same time as step S570. In this way, while the three-dimensional graphics operation for one vertex is being performed, the vertex data for the next vertex is stored in the cache memory in advance, and the address value at which it is stored is made available for external reference. This eliminates or greatly reduces the idle waiting time during which, on a cache miss, the three-dimensional graphics operation would otherwise stall while vertex data is copied from the main memory 350.
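The benefit of overlapping the cache update with the graphics operation can be illustrated with a simple timing model. Everything here is an assumption made for illustration: the function names, the idea that a hit costs the controller no time, that the controller may run arbitrarily far ahead of the execution unit, and the time units themselves. It is not a measurement of the described device.

```python
def serialized_time(misses, compute_time, miss_penalty):
    """Execution unit waits for every miss before computing (no overlap)."""
    return sum(compute_time + (miss_penalty if miss else 0) for miss in misses)


def overlapped_time(misses, compute_time, miss_penalty):
    """Controller resolves upcoming vertices while the execution unit computes."""
    controller_done = 0  # time at which the controller has resolved the current vertex
    execution_done = 0   # time at which the execution unit finished the previous vertex
    for miss in misses:
        # The controller works through the indices in order; only misses cost time.
        controller_done += miss_penalty if miss else 0
        # A vertex can be computed once it is resolved and the unit is free.
        start = max(controller_done, execution_done)
        execution_done = start + compute_time
    return execution_done
```

For the pattern miss, hit, hit, miss with a compute time of 10 and a miss penalty of 30 (arbitrary units), the serialized model gives 100 and the overlapped model gives 70, because the second miss is fetched while earlier vertices are still being computed.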

The lighting processing computation method according to the present invention has been described above with reference to FIG. 5. Because the details of the embodiments described with reference to FIGS. 1 to 4 apply unchanged to this method, further description of those details is omitted here.

The method of storing vertex data in a cache memory according to the present invention may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and constructed for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.

The medium may also be a transmission medium, such as an optical or metal line or a waveguide, including a carrier wave that transmits signals specifying program instructions, data structures, and the like. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

As described above, the present invention has been explained in this specification through specific components, limited embodiments, and drawings, but these are provided only to aid a more general understanding of the invention. The invention is not limited to the above embodiments, and a person of ordinary skill in the field to which the invention pertains can, from this description, make various modifications and variations and conceive inventions that fall within the scope of the present invention.

Therefore, the spirit of the present invention should not be confined to the described embodiments. Not only the claims set out below but also everything equal or equivalent to those claims falls within the scope of the spirit of the invention.

According to the cache memory device of the present invention and the method of processing three-dimensional graphics operations using it, referring to a vertex index array that allows the processing order of vertex data to be predicted more accurately improves the cache hit rate even with a cache memory of limited size.

In addition, according to the present invention, the size of the cache memory array can be greatly reduced compared with that used in a conventional general-purpose cache memory, which saves circuit area and lowers process cost and the process defect rate.

In addition, the present invention copies the vertex data needed for the next operation from the main memory while a three-dimensional graphics operation is in progress, which reduces the idle waiting time of the three-dimensional graphics computing device and improves the operating efficiency of the three-dimensional graphics accelerator as a whole.

In addition, the present invention copies only the vertex data that is needed, rather than copying a fixed amount of data from the main memory to the cache memory in bulk, so a small cache memory can be used more efficiently.

In addition, when copying the vertex index array stored in the main memory to the index buffer, the present invention copies several vertex indices at once, which minimizes the performance loss caused by main memory access overhead.

In addition, according to the present invention, the address values of the cache memory array at which vertex data is stored are written to the processing order buffer in the processing order of the vertex data and made available for external reference, which improves the operating throughput of the cache memory.

Claims

A cache memory device comprising:

a cache memory array for storing vertex data used in a three-dimensional graphics operation;

a tag memory for storing first vertex indices, each corresponding to an item of the vertex data;

an index buffer for storing second vertex indices associated with the graphics operation in the processing order of the graphics operation; and

a cache memory controller configured to determine whether there is a cache hit for the vertex data by comparing a second vertex index with the first vertex indices.

The cache memory device of claim 1, further comprising:

a processing order buffer for storing, in the processing order, the address values at which the vertex data is stored in the cache memory array,

wherein the cache memory controller stores the address values with reference to the index buffer.

The cache memory device of claim 2,

wherein the cache memory controller determines a cache hit when the comparison finds an entry storing a matching first vertex index value, and stores the address value corresponding to the found entry in the processing order buffer.

The cache memory device of claim 2,

wherein the cache memory controller determines a cache miss when no entry storing a matching first vertex index value is found, reads the vertex data corresponding to the first vertex index from a main memory, stores it in the cache memory array, and stores the resulting address value in the processing order buffer.

The cache memory device of claim 2,

wherein the cache memory device is connected to a three-dimensional graphics computing device that performs a coordinate transformation or lighting operation, and

the graphics computing device reads the vertex data from the cache memory array with reference to the processing order buffer.

The cache memory device of claim 1,

wherein the cache memory controller reads a plurality of second vertex indices from a main memory at a time and stores them in the index buffer.

The cache memory device of claim 1,

wherein, when reading the vertex data from the main memory, the cache memory controller reads only the vertex data, excluding data at adjacent addresses.

The cache memory device of claim 1,

wherein the cache memory device comprises a plurality of memory blocks that store the vertex data by type.

The cache memory device of claim 1,

wherein the vertex data includes at least one of vertex coordinates, normal coordinates, vertex colors, vertex sizes, matrix palette weights, and matrix palette indices.

A method of storing, in a cache memory, vertex data for vertices in three-dimensional space, the method comprising:

a first step of storing, in a cache memory, previously referenced vertex data and first vertex indices corresponding to the vertex data;

a second step of reading second vertex indices associated with a graphics operation from a main memory and storing them in an index buffer in the processing order of the graphics operation;

a third step of determining whether there is a cache hit by comparing a second vertex index with the first vertex indices;

a fourth step of, when the determination is a cache miss, reading vertex data corresponding to the second vertex index from the main memory and additionally storing it in the cache memory; and

a fifth step of storing, in a processing order buffer, the address value at which the vertex data corresponding to the first vertex index is stored in the cache.

The vertex data storage method of claim 10, further comprising:

a sixth step of reading the vertex data from the cache memory with reference to the processing order buffer; and

a seventh step of performing a three-dimensional graphics operation using the read vertex data as a parameter.

The vertex data storage method of claim 11,

wherein at least one of the first to fifth steps is performed simultaneously with the seventh step.

The vertex data storage method of claim 10,

wherein the graphics operation is a coordinate transformation or lighting operation for the vertex.

A graphics operation processing method for vertices in three-dimensional space, the method comprising:

maintaining a processing order buffer that stores address values at which vertex data is stored in a cache memory;

reading an address value from the processing order buffer, accessing the address in the cache memory corresponding to the address value, and reading the vertex data; and

performing a three-dimensional graphics operation using the read vertex data as a parameter.

The graphics operation processing method of claim 14,

wherein maintaining the processing order buffer comprises:

reading a vertex index array input by a user and stored in a main memory;

determining, using a vertex index included in the vertex index array, whether there is a cache hit for the vertex data corresponding to that vertex index; and

storing the vertex data in the cache memory according to whether there is a cache hit, and updating the processing order buffer with the address value at which it is stored.

A computer-readable recording medium having recorded thereon a program for executing the method according to any one of claims 10 to 15.